Consider a design for access control #266
Comments
This sounds like a really really really bad idea to me. If we have an HTTP API and encourage our users to let anyone on the internet make arbitrary requests to it, then we're responsible for security, and our code base isn't anywhere near ready for that. This is the sort of thing we could plausibly do once we have a rock-solid code base, a real QA team, and enough money to hire pen testers. |
Is this issue about access control in the sense of who can read/write a given database/table? |
The whole "I have no app server, just the db" idea is flawed. Writing your own HTTP API in an app server in front of Rethink is pretty simple. Access control and security are really difficult to get right. Better to focus on easier problems first (like pushing messages out of the database). |
I agree with Raynos; I think this is too difficult to do right in the near future. If people really want to do this in a scenario where security isn't a huge issue, e.g. for an intranet app, they can already load the JavaScript driver into their web page and execute commands that way. |
I also agree that this is a bad idea. It's difficult to get right, and the end product will probably be pretty error-prone too. For example, suppose you wanted to build a basic banking app: you'd probably have a table of transactions with a user id field on each row. Now you need to express permissions as a predicate over the rows. This gets complicated fast, and the stakes of getting it wrong are really high. |
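The banking example can be made concrete with a small Python sketch. All names here (`Transaction`, `owner_only`, `visible_rows`) are hypothetical and not RethinkDB API: a permission is modeled as a predicate over the requesting user and a row, and every read is filtered through it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Transaction:
    id: int
    user_id: str
    amount_cents: int


# A policy is a predicate over (requesting user, row).
Policy = Callable[[str, Transaction], bool]


def owner_only(user: str, row: Transaction) -> bool:
    """A user may see only the transactions that carry their own user_id."""
    return row.user_id == user


def visible_rows(user: str, rows: Iterable[Transaction], policy: Policy) -> List[Transaction]:
    """Apply the policy to every row; anything the predicate rejects is hidden."""
    return [row for row in rows if policy(user, row)]


rows = [
    Transaction(1, "alice", 5000),
    Transaction(2, "bob", -1200),
    Transaction(3, "alice", -300),
]
print([r.id for r in visible_rows("alice", rows, owner_only)])  # [1, 3]
```

Even this simplest owner-only rule would have to be enforced on every query path (filters, joins, and aggregates such as a table-wide sum), which is where the complexity the comment warns about starts to pile up.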
CouchDB, which is probably the database closest to offering a direct web-to-db connection, has a security issue reported every couple of months. It's a risky path to take right now. Implementing security features will be important, but let's get there first. |
Quietly moving to backlog... |
(Better?) Access control is a good idea, but I don't think exposing an HTTP API by default is a good one. I've seen too many exposed ElasticSearch and CouchDB clusters to recommend otherwise. |
Uh, HTTP API aside for a moment, what about the client driver interface? Only binding to the local network interface isn't enough. If I want my DB handled by another service (like MongoLab and MongoHQ for MongoDB) or even just another shard, without authentication, I can't do it without openly binding to an external network interface and giving every other app in the cluster/datacenter/world access to my database. I prototype my apps in a mock production environment. Right now, this is the one factor keeping me off of RethinkDB. |
The only way I've seen so far is to use SSH tunnels. |
How would people feel about a completely bare-bones access control scheme where:
(In particular, this means no encryption, no per-database, per-table, or per-datacenter access control, etc. etc.) I think that's the bare minimum we could do, and we could probably do it fairly quickly (1.7 maybe?). |
What about specifying the API key (or the hash of it) in the config? |
We could support that too, but the config is only re-read when RethinkDB restarts and it would be nice to be able to add/remove API keys while keeping the server up. |
The servers would also need to share the API keys, so that clients don't need to remember a different key for each server in the cluster. |
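Storing only a hash of the API key in the config, as suggested above, could look roughly like the following Python sketch. The helper names are hypothetical and this is not how RethinkDB stores its key; it only shows the idea. Every server in the cluster would hold the same digest, so one key works against any of them.

```python
import hashlib
import hmac
import secrets


def generate_key() -> str:
    """Create a random API key to hand to clients (32 random bytes, URL-safe text)."""
    return secrets.token_urlsafe(32)


def hash_key(key: str) -> str:
    """Digest kept in the config instead of the key itself.

    Plain SHA-256 is reasonable here only because generated keys are long and
    random; human-chosen passwords would need a slow KDF such as hashlib.scrypt.
    """
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def check_key(presented: str, stored_digest: str) -> bool:
    """Compare digests in constant time so response timing leaks nothing useful."""
    return hmac.compare_digest(hash_key(presented), stored_digest)
```

The digest can be copied into every server's config without exposing the key itself, though changing it would still require the restart mentioned above.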
What about access to the web interface? |
Allow the web interface to bind to network interfaces separately from the client driver interface and include a tutorial in the documentation on how to set up a secure remote proxy using SSH tunneling / lightweight HTTP servers? |
That would be one way to do it, but I'm not sure it could be done if we decide to use only one port (see #768). About the documentation for the SSH tunnel: I just wrote it today, so it should be available around next week. |
I don't think combining the ports should be mandatory. (Say I want different firewall rules for HTTP, native client, and cluster connections.) Just mention in the tutorial that you have to split your services into multiple ports to use port-based mechanisms for access control. |
Also, I wouldn't have a problem with having SSH tunneling being the mechanism of access control for the native interface (I mean, heck, Git does it, and they get by), so long as it wouldn't require any additional setup for the client driver (like how connecting to an SSH'd Git server is as simple as entering a remote server URL that begins with "ssh+git://"). |
As an access control system, has any consideration been given to using TLS? My thought is to have this work by creating a certificate authority (CA) for a DB cluster; a certificate would be issued for each server, and client (and cluster) connections would have their client certs verified as issued by the same CA. Upsides: encrypted communication, very secure, could be applied to client->server, cluster->cluster and web interface connections, well supported by a variety of client environments, and should be equally compatible with using only one network port. Downsides: requires incorporating a library dependency (like openssl, gnutls, etc.), relatively complex server configuration (although this could be aided by automated tools), and it might not be a good precursor to a "fine-grained" access control system. |
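The CA-based scheme can be sketched with Python's standard `ssl` module standing in for the "openssl, gnutls, etc." dependency. The file paths are hypothetical, and generating the cluster CA and per-node certificates is not shown. The key point is that the server sets `CERT_REQUIRED` and trusts only the cluster CA, so a peer without a certificate issued by that CA fails the handshake.

```python
import ssl
from typing import Optional


def server_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for a node that only accepts peers holding a cert from the cluster CA."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Reject any client that does not present a certificate...
    ctx.verify_mode = ssl.CERT_REQUIRED
    # ...and trust only certificates chaining up to the cluster's own CA.
    if ca_file is not None:
        ctx.load_verify_locations(cafile=ca_file)
    # This node's own certificate, also issued by the cluster CA.
    if cert_file is not None:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def client_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for a driver or peer node; verifies the server cert and hostname by default."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file is not None:
        ctx.load_verify_locations(cafile=ca_file)
    # The client certificate is what the server checks against the cluster CA.
    if cert_file is not None:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A server would wrap each accepted socket with `server_context(...).wrap_socket(sock, server_side=True)`. The same two contexts cover client->server and cluster->cluster links, which is the "could be applied everywhere" upside listed above.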
This issue had a slightly different original intention, but since The People took it in the direction of client driver access, we'll treat it this way. We'll likely need some version of this for some cool things we're doing internally, so I'm moving this to 1.6. I'm not 100% sure it'll make it, but the odds are good, FYI. We'll figure out specifically what to do this week. |
Ok, we need a less-is-more solution here to get #892 out the door, so if anyone has ideas (pinging @srh), please add them here. I think full-blown TLS support might be a bit of overkill, since most of the time the database infrastructure runs on a protected subnet with a clear list of IP addresses permitted to contact the db server. (This might be much more relevant in the case of Amazon EC2, though.) One option I see might be to simply support an API key, which wouldn't be impervious to man-in-the-middle attacks, but would be sufficient for most needs (OK in protected subnets, OK on Amazon EC2 with a firewall, or for simple cases where security isn't a big risk). We could just support a cluster-wide API key in the semi-lattices, allow setting and changing it via the CLI, and allow providing it to the clients via What does everyone think? |
What about supporting a list of keys so different endpoints can have different connection credentials that can be revoked / changed at any time without disrupting other connections? |
I think as long as we're very clear that the API key doesn't provide high security, and we document the SSH tunneling solution, that sounds like a good compromise. Adding support for a set of API keys probably wouldn't be too hard. |
@stuartpb -- That's a really cool idea (and probably very useful in production). I worry that this is a bit outside of the scope of a minimum-viable-feature as it would make things significantly more complicated and delay the release. If we go with the API key solution, I'd be inclined to do a single key for now, and then add support for multiple keys later if there's demand (but we'll wait to get some more feedback on this before deciding). |
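For comparison, the multiple-key idea from @stuartpb could be sketched in Python as a hypothetical registry (not RethinkDB code): each endpoint gets its own key, and revoking or rotating one key leaves the others working.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class KeyRegistry:
    """Hypothetical cluster-wide set of API keys, one per client endpoint."""

    def __init__(self) -> None:
        self._digests: Dict[str, str] = {}  # endpoint name -> SHA-256 hex digest

    @staticmethod
    def _digest(key: str) -> str:
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def issue(self, endpoint: str) -> str:
        """Create (or rotate) one endpoint's key; the plaintext is returned only here."""
        key = secrets.token_urlsafe(32)
        self._digests[endpoint] = self._digest(key)
        return key

    def revoke(self, endpoint: str) -> None:
        """Remove one endpoint's key without touching anyone else's."""
        self._digests.pop(endpoint, None)

    def authenticate(self, key: str) -> Optional[str]:
        """Return the endpoint the key belongs to, or None if no live key matches."""
        presented = self._digest(key)
        match = None
        for endpoint, stored in self._digests.items():
            # Constant-time compare against every entry, with no early exit.
            if hmac.compare_digest(presented, stored):
                match = endpoint
        return match
```

Compared with a single cluster-wide key, the extra cost is the endpoint bookkeeping and a CLI to manage it, which is the added scope the reply above is wary of for a first release.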
I have an idea, use TLS instead of having a completely fake access control system that sends everything in the clear. |
Redis-like auth (single key, defined in the config file) seems pretty good for a minimum viable feature, particularly if you make it clear that it's only for local or semi-restricted EC2 traffic. TLS won't buy you much if someone can snoop internal Amazon traffic. SSL in Mongo makes things pretty contorted at times, and adds another level of complexity to drivers ... things break frequently for hard-to-diagnose reasons. SSH tunneling or similar seems like a simpler way to teach drivers to speak securely. |
Shipping access control would be a good workaround for #486. Connecting two machines through SSH tunnels is a piece of cake. |
@mrkurt -- I'm curious why it's necessary to implement an auth key for EC2 at all. Here's one possible way to implement security without it:
The only downside I can see for this scheme is that client drivers can't connect to the nodes from outside of EC2 (e.g. my laptop) without first ssh'ing into one of the authenticated EC2 machines (or using an SSH tunnel). I have two questions:
What do you think? |
One more thought: another downside of this scheme is that folks running on VPS nodes would have to set up their own firewall rules, which is more complicated than setting up EC2 security groups. Arguably in this case passing around auth keys in plain text won't help them much anyway, so having good docs on how to set that up sounds like a better solution to me. |
@coffeemug For EC2 proper, I wouldn't bother with auth and would instead lean on security groups. If you hope to have something that people can use from Heroku or similar, though, IP/security group restrictions aren't good enough (you have to allow Heroku's entire security group). I think key-based auth for use inside EC2's network is a reasonably easy way to let those users get going quickly. I do think this is an interesting problem with probably a better solution out there, though. We've toyed around with various ways of securing Redis/Mongo connections from "untrusted" app server networks (SSH tunnels, proxies, etc.), but they're difficult to roll out in an unofficial way to general users without solid driver support. If you guys can come up with a straightforward way to let Rethink clients talk "into" a firewalled cluster, that would be cool. |
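The security-group approach boils down to an address allowlist, the "clear list of IP addresses permitted to contact the db server" mentioned earlier. A hypothetical Python helper using the standard `ipaddress` module shows the check; in practice it would live in EC2 security groups or firewall rules rather than in the database.

```python
import ipaddress
from typing import Iterable


def is_allowed(peer_ip: str, allowed_networks: Iterable[str]) -> bool:
    """True if the connecting address falls inside any permitted network (CIDR notation).

    An IPv4 address is never "in" an IPv6 network (and vice versa), so mixed
    lists are safe to pass.
    """
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)


# e.g. a private subnet for the app servers plus one admin workstation
ALLOWED = ["10.0.0.0/16", "192.168.1.20/32"]
print(is_allowed("10.0.3.7", ALLOWED), is_allowed("203.0.113.5", ALLOWED))  # True False
```

This also shows the Heroku problem described above: if the allowed range has to cover a whole shared platform, every tenant on it passes the check, so an address test alone no longer tells your app servers apart from anyone else's.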
Ok, let's go for Redis-like auth (http://redis.io/topics/security). Here's the draft of the spec. Please feel free to comment, but let's keep it within the confines of this specific authentication scheme.
|
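Redis-style auth means the client presents a single shared key before anything else, and the server either accepts the connection or refuses it (see http://redis.io/topics/security). The following Python sketch uses a hypothetical wire format (4-byte little-endian length plus the key, answered by a NUL-terminated status string); it is an illustration, not the handshake RethinkDB actually shipped. As the thread stresses, the key travels in plaintext.

```python
import hmac
import socket
import struct

# Hypothetical wire format: 4-byte little-endian length, then the key bytes;
# the server answers with a NUL-terminated status string.
SUCCESS = b"SUCCESS\x00"
FAILURE = b"ERROR: incorrect authorization key\x00"
MAX_KEY_LEN = 2048  # refuse oversized length prefixes before reading them


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer hangs up first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection during the handshake")
        buf += chunk
    return bytes(buf)


def send_key(sock: socket.socket, key: str) -> None:
    """Client side: send the auth key (in plaintext) as the first message."""
    data = key.encode("utf-8")
    sock.sendall(struct.pack("<I", len(data)) + data)


def check_key(sock: socket.socket, expected: str) -> bool:
    """Server side: read the key, compare it in constant time, and reply.

    A real server would also close the connection after sending FAILURE.
    """
    (length,) = struct.unpack("<I", recv_exact(sock, 4))
    if length > MAX_KEY_LEN:
        sock.sendall(FAILURE)
        return False
    presented = recv_exact(sock, length)
    ok = hmac.compare_digest(presented, expected.encode("utf-8"))
    sock.sendall(SUCCESS if ok else FAILURE)
    return ok


def read_reply(sock: socket.socket) -> bytes:
    """Client side: read the server's NUL-terminated reply (terminator stripped)."""
    out = bytearray()
    while True:
        byte = recv_exact(sock, 1)
        if byte == b"\x00":
            return bytes(out)
        out += byte


client, server = socket.socketpair()
send_key(client, "s3cret")
print(check_key(server, "s3cret"), read_reply(client))  # True b'SUCCESS'
client.close()
server.close()
```

Keeping the check to a single step at connection time is what makes this a minimum viable feature: drivers only need one extra message before their normal protocol starts.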
Working on this now. |
One thing I should probably point out: changing |
@jdoliner has talked me into putting the auth key into a different metadata, which will put off the problem I just mentioned. |
In general, could we just bump the serializer file version to avoid this problem or do we need to introduce a separate metadata version number? |
I don't see how either of those solve the problem in the future. There is still no upgrade path that preserves the cluster configuration, so users would be required to recreate their entire configuration. Bumping the serializer version should work to let the users know it's incompatible, but if we change the serializer version, we'll require them to reimport all their data without a very good reason. I think we should have separate versions for each layer (i.e. serializer, btree, and metadata), as well as an easy path to migrate a cluster. |
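The per-layer versioning proposed above can be sketched in Python. The layer names come from the comment; the version numbers and helper names are hypothetical. A metadata-only bump then calls for a metadata migration instead of a full data reimport.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class LayerVersions:
    serializer: int
    btree: int
    metadata: int


def outdated_layers(on_disk: LayerVersions, binary: LayerVersions) -> List[str]:
    """Names of the layers whose on-disk version differs from what this binary writes."""
    return [f.name for f in fields(LayerVersions)
            if getattr(on_disk, f.name) != getattr(binary, f.name)]


def needs_data_reimport(on_disk: LayerVersions, binary: LayerVersions) -> bool:
    """Only serializer or btree changes affect the stored data itself."""
    return any(layer in ("serializer", "btree") for layer in outdated_layers(on_disk, binary))


old = LayerVersions(serializer=7, btree=3, metadata=1)
new = LayerVersions(serializer=7, btree=3, metadata=2)
print(outdated_layers(old, new), needs_data_reimport(old, new))  # ['metadata'] False
```

With a single serializer version, the same metadata change could only be signaled by bumping that version, which forces the reimport the comment wants to avoid.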
I understand there is no easy upgrade path. For now that's fine -- when we change the metadata we'll ask people to reimport data. It isn't ideal, but for the time being the additional development speed it gives us outweighs the disadvantages. |
This is also all theoretical though seeing as we're not actually changing the metadata for this issue. |
This is implemented and up in review 622, including the updated Python client. Ruby and JS clients are being updated separately. |
Alright, this has been merged to next in a ton of commits, will be in release 1.6. |
When this was closed in 2013, was it closed in reference to @coffeemug's initial thought of HTTP API for browser-only apps (‘if you expose the http api to people, they could write apps entirely in the browser without servers (which is really nice)’), or with reference to client driver access, which this issue morphed into? I would love to see an HTTP API, or anything that allowed me to write a purely client-side app, without anything server-side beyond RethinkDB—but I fully appreciate that this is likely beyond the scope of this project. Edit: I also fully appreciate that numerous other non-RethinkDB products have come out in the last 2.5 years to address this need :) |
@fasiha -- we've been thinking about it a lot (and still are). This is definitely tricky -- we don't have a plan for it yet, and don't yet know if it will ever become a core part of RethinkDB. It will take more time to figure it out, but it's definitely on our minds. |
If RethinkDB supported access control, we could allow people to hit the database via the HTTP API straight from the browser and bypass servers. For a lot of people this would be really, really nice.
I'm slating this for 1.5 for now, but we should brainstorm what we can do here once 1.4 is out.