
Consider a design for access control #266

Closed
coffeemug opened this issue Jan 30, 2013 · 44 comments

@coffeemug (Contributor)

If RethinkDB supported access control, we could allow people to hit the database via the http API straight from the browser and bypass servers. For a lot of people this would be really, really nice.

I'm slating for 1.5 for now, but we should brainstorm what we can do here once 1.4 is out.

@mlucy (Member) commented Jan 30, 2013

This sounds like a really really really bad idea to me. If we have an HTTP API and encourage our users to let anyone on the internet make arbitrary requests to it, then we're responsible for security, and our code base isn't anywhere near ready for that.

This is the sort of thing we could plausibly do once we have a rock-solid code base, a real QA team, and enough money to hire pen testers.

@neumino (Member) commented Jan 30, 2013

Is this issue for an access control like who can read/write on this database/table?
Or is it just to provide people an HTTP api?

@coffeemug (Contributor, Author)

@neumino -- if you expose the http api to people, they could write apps entirely in the browser without servers (which is really nice). The downside is that it opens up a whole world of security issues that we'd have to solve, which is what @mlucy is talking about here.

@Raynos commented Jan 30, 2013

The whole "I have no app server, just the db" idea is flawed. Writing your own HTTP API in an app server in front of RethinkDB is pretty simple.

Access control and security is really difficult to get right. Better focus on easier problems first. (like push messages out of the database)

@mlucy (Member) commented Jan 30, 2013

I agree with Raynos; I think this is too difficult to do right in the near future.

If people really want to do this in a scenario where security isn't a huge issue, e.g. for an intranet app, they can already load the javascript driver into their webpage and execute commands that way.

@jdoliner (Contributor)

I also agree that this is a bad idea. It's difficult to get right, and the end product will probably be pretty error-prone too. For example, suppose you wanted to build a basic banking app: you'd probably have a table of transactions with a user id field on each row, and you'd then need to express permissions as a predicate over the rows. This gets complicated fast, and the stakes of getting it wrong are really high.

@al3xandru (Contributor)

CouchDB, which is probably the database closest to offering a direct web-to-db connection, has a security issue reported every couple of months. It's a risky path to go down right now. Implementing security features will be important, but let's get there first.

@coffeemug (Contributor, Author)

Quietly moving to backlog...

@bitemyapp

(Better?) Access control is a good idea, but I don't think exposing an HTTP API by default is a good one. I've seen too many exposed ElasticSearch and CouchDB clusters to recommend otherwise.

@stuartpb

Uh, HTTP API aside for a moment, what about the client driver interface? Only binding to the local network interface isn't enough. If I want my DB handled by another service (like MongoLab and MongoHQ for MongoDB) or even just another shard, without authentication, I can't do it without openly binding to an external network interface and giving every other app in the cluster/datacenter/world access to my database.

I prototype my apps in a mock production environment. Right now, this is the one factor keeping me off of RethinkDB.

@neumino (Member) commented May 25, 2013

The only way I've seen so far is to use some ssh tunnels.

@mlucy (Member) commented May 25, 2013

How would people feel about a completely bare-bones access control scheme where:

  • You register an API key with the server over ssh (or maybe through the web interface).
  • All connecting clients send the API key when they open the connection.
  • The server rejects all connections without a valid API key.

(In particular, this means no encryption, no per-database, per-table, or per-datacenter access control, etc. etc.)

I think that's the bare minimum we could do, and we could probably do it fairly quickly (1.7 maybe?).
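The server-side check in such a scheme could be sketched as follows (a minimal illustration, not RethinkDB's implementation; all names are hypothetical). One detail worth getting right even in a bare-bones version is using a constant-time comparison, so that timing differences don't leak information about the key:

```python
import hmac

# Hypothetical sketch of the bare-bones scheme: a single registered key,
# and every incoming connection must present it to be accepted.
REGISTERED_API_KEY = "s3cret-key"  # registered over ssh / the web UI

def connection_allowed(presented_key: str) -> bool:
    # hmac.compare_digest compares in constant time, so an attacker
    # can't recover the key byte-by-byte by timing rejections.
    return hmac.compare_digest(REGISTERED_API_KEY.encode(),
                               presented_key.encode())
```

Any connection for which `connection_allowed` returns `False` would simply be dropped.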

@stuartpb

What about specifying the API key (or the hash of it) in the config?

@mlucy (Member) commented May 26, 2013

We could support that too, but the config is only re-read when RethinkDB restarts and it would be nice to be able to add/remove API keys while keeping the server up.

@AtnNn (Member) commented May 26, 2013

The servers would also need to share the API keys, so that clients don't need to remember a different key for each server in the cluster.

@neumino (Member) commented May 27, 2013

What about access to the web interface?
That sounds like a lot of work. Or maybe someone has a simple (and secure) solution?

@stuartpb

Allow the web interface to bind to network interfaces separately from the client driver interface and include a tutorial in the documentation on how to set up a secure remote proxy using SSH tunneling / lightweight HTTP servers?

@neumino (Member) commented May 27, 2013

That would be one way to do it, but I'm not sure it could be done if we decide to use only one port (see #768).

About the documentation for the ssh tunnel, I just wrote it today, so it should be available around next week.

@stuartpb

I don't think combining the ports should be mandatory. (Say I want different firewall rules for HTTP, native client, and cluster connections.) Just mention in the tutorial that you have to split your services into multiple ports to use port-based mechanisms for access control.

@stuartpb

Also, I wouldn't have a problem with SSH tunneling being the mechanism of access control for the native interface (I mean, heck, Git does it, and they get by), so long as it wouldn't require any additional setup for the client driver (like how connecting to a Git server over SSH is as simple as entering a remote server URL that begins with "ssh+git://").

@mfenniak

As an access control system, has any consideration been given to using TLS? My thought is to have this work by creating a certificate authority (CA) for a DB cluster; a certificate would be issued for each server, and client (and cluster) connections would have their client certs verified as issued by the same CA.

Upsides would be: encrypted communication, very secure, could be applied to client->server & cluster->cluster & web interface, well supported by a variety of client environments, should be equally compatible with using only one network port.

Downsides: requires incorporating a library dependency (like openssl, gnutls, etc.), relatively complex server configuration (although could be aided by automated tools), and it might not be a good precursor to a "fine-grained" access control system.
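The client side of this mutual-TLS proposal could be sketched with Python's standard `ssl` module (an illustration of the idea, not a RethinkDB feature; the file paths and function name are assumptions, and a real deployment would first generate the CA and per-server/per-client certificates with a tool such as openssl):

```python
import ssl

def build_client_context(cafile=None, certfile=None, keyfile=None):
    """Sketch of the proposed scheme: the client verifies the server's
    certificate against the cluster CA, and presents its own CA-issued
    certificate so the server can verify the client in turn."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject servers not signed by our CA
    if cafile:
        # The cluster's CA certificate (hypothetical path).
        ctx.load_verify_locations(cafile=cafile)
    if certfile:
        # This client's CA-issued certificate and private key.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx
```

The context would then be used to wrap the driver's TCP socket before the normal protocol handshake, which is how the "one network port" compatibility mentioned above would fall out naturally.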

@coffeemug (Contributor, Author)

This issue had a slightly different original intention, but since The People took it in the direction of client driver access, we'll treat it this way.

We'll likely need some version of this for some cool things we're doing internally, so I'm moving this to 1.6. I'm not 100% sure it'll make it, but the odds are good, FYI.

We'll figure out specifically what to do this week.

@coffeemug (Contributor, Author)

Ok, we need a less-is-more solution here to get #892 out the door, so if anyone has ideas (pinging @srh), please add them here.

I think full-blown TLS support might be overkill, since most of the time the database infrastructure runs on a protected subnet with a clear list of IP addresses permitted to contact the db server. (This might be much more relevant in the case of Amazon EC2, though.)

One option I see might be to simply support an API key, which wouldn't be impervious to man-in-the-middle attacks, but would be sufficient for most needs (ok in protected subnets, ok on Amazon EC2 with a firewall or for simple cases where security isn't a big risk). We could just support a cluster-wide API key in the semi-lattices, allow setting and changing it via the CLI, and allow providing it to the clients via r.connect(..., api_key=key). This would probably be sufficient right now -- advanced users could use SSH tunneling for more security, and we can choose to support TLS and per-database settings later.

What does everyone think?

@stuartpb

What about supporting a list of keys so different endpoints can have different connection credentials that can be revoked / changed at any time without disrupting other connections?

@mlucy (Member) commented May 28, 2013

I think as long as we're very clear that the API key doesn't provide high security, and we document the SSH tunneling solution, that sounds like a good compromise. Adding support for a set of API keys probably wouldn't be too hard.
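The multi-key variant could be sketched like this (hypothetical names; a minimal illustration of per-endpoint keys that can be revoked independently, not the eventual implementation):

```python
import hmac

class ApiKeySet:
    """Sketch of the multi-key idea: each endpoint gets its own key,
    and any key can be revoked without disrupting the others."""

    def __init__(self):
        self._keys = set()

    def add(self, key: str) -> None:
        self._keys.add(key)

    def revoke(self, key: str) -> None:
        # Connections already authenticated with other keys are unaffected.
        self._keys.discard(key)

    def allowed(self, presented: str) -> bool:
        # Constant-time comparison against each registered key.
        return any(hmac.compare_digest(k.encode(), presented.encode())
                   for k in self._keys)
```

A usage pattern might be one key per deployed service (`web-frontend`, `batch-jobs`, ...), so rotating one service's credentials doesn't force redeploying the others.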

@coffeemug (Contributor, Author)

> What about supporting a list of keys so different endpoints can have different connection credentials that can be revoked / changed at any time without disrupting other connections?

@stuartpb -- That's a really cool idea (and probably very useful in production). I worry that this is a bit outside of the scope of a minimum-viable-feature as it would make things significantly more complicated and delay the release. If we go with the API key solution, I'd be inclined to do a single key for now, and then add support for multiple keys later if there's demand (but we'll wait to get some more feedback on this before deciding).

@srh (Contributor) commented May 28, 2013

I have an idea: use TLS instead of a completely fake access control system that sends everything in the clear.

@mrkurt commented May 29, 2013

Redis-like auth (a single key, defined in the config file) seems pretty good for a minimum viable feature, particularly if you make it clear that it's only for local or semi-restricted EC2 traffic. TLS won't buy you much if someone can snoop internal Amazon traffic.

SSL in Mongo makes things pretty contorted at times, and adds another level of complexity to drivers: things break frequently for hard-to-diagnose reasons. SSH tunneling or similar seems like a simpler way to teach drivers to speak securely.

@neumino (Member) commented May 30, 2013

Shipping access control would be a good workaround for #486.

Connecting two machines through ssh tunnels is a piece of cake.
Connecting three machines is a whole different story.

@ghost assigned Tryneus, May 31, 2013
@coffeemug (Contributor, Author)

@mrkurt -- I'm curious why it's necessary to implement an auth key for EC2 at all. Here's one possible way to implement security without it:

  • Put all the nodes in the cluster in one EC2 security group and allow cluster connections only within that security group. This way only authorized nodes can connect to the cluster.
  • Add a rule to this security group allowing client driver connections from all the groups where the clients run.
  • Set up a reverse proxy for the web UI that uses http authentication.

The only downside I can see for this scheme is that client drivers can't connect to the nodes from outside of EC2 (e.g. my laptop) without first ssh'ing into one of the authenticated EC2 machines (or using an SSH tunnel). I have two questions:

  1. Am I missing anything?
  2. How important/common is it for client drivers to connect from outside of EC2? I can see it being annoying that I can't connect to an EC2 host from my laptop easily, but arguably if I'm running on EC2 already, my client drivers should run there as well.

What do you think?
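For concreteness, the rules in the first two bullets can be written out in the `IpPermissions` shape that EC2's `authorize_security_group_ingress` API expects (the group IDs are placeholders; the ports are RethinkDB's defaults of 29015 for intra-cluster and 28015 for client drivers; this is a sketch of the scheme, not an official setup):

```python
# Hypothetical security group IDs.
CLUSTER_GROUP = "sg-cluster"     # all RethinkDB nodes live here
CLIENT_GROUP = "sg-appservers"   # all client-driver machines live here

def ingress_rules():
    """Return EC2-style ingress rules implementing the scheme above:
    cluster traffic only from cluster nodes, driver traffic only from
    the app-server group."""
    return [
        # Bullet 1: intra-cluster connections (port 29015) allowed only
        # from members of the cluster's own security group.
        {"IpProtocol": "tcp", "FromPort": 29015, "ToPort": 29015,
         "UserIdGroupPairs": [{"GroupId": CLUSTER_GROUP}]},
        # Bullet 2: client-driver connections (port 28015) allowed only
        # from the groups where the clients run.
        {"IpProtocol": "tcp", "FromPort": 28015, "ToPort": 28015,
         "UserIdGroupPairs": [{"GroupId": CLIENT_GROUP}]},
    ]
```

The web UI (port 8080 by default) is deliberately absent: per bullet 3, it would sit behind a reverse proxy doing HTTP authentication rather than being opened in the security group.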

@coffeemug (Contributor, Author)

One more thought: another downside of this scheme is that folks running on VPS nodes would have to set up their own firewall rules, which is more complicated than setting up EC2 security groups. Arguably in this case passing around auth keys in plain text won't help them much anyway, so having good docs on how to set that up sounds like a better solution to me.

@mrkurt commented May 31, 2013

@coffeemug For EC2 proper, I wouldn't bother with auth and would instead lean on security groups. If you hope to have something that people can use from Heroku or similar, though, IP/security-group restrictions aren't good enough (you have to allow Heroku's entire security group). I think key-based auth for use inside EC2's network is a reasonably easy way to let those users get going quickly.

I do think this is an interesting problem with probably a better solution out there, though. We've toyed around with various ways of securing Redis/Mongo connections from "untrusted" app server networks (SSH tunnels, proxies, etc), but they're difficult to roll out in an unofficial way to general users without solid driver support. If you guys can come up with a straightforward way to let Rethink clients talk "into" a firewalled cluster, that would be cool.

@coffeemug (Contributor, Author)

Ok, let's go for Redis-like auth (http://redis.io/topics/security).

Here's the draft of the spec. Please feel free to comment, but let's keep it within the confines of this specific authentication scheme.

  • Users can set a cluster-wide auth key (stored in plain text in the semi-lattices) via the admin CLI only. To set the key: set auth [KEYHERE]. To unset it: unset auth.
  • In order to connect to a cluster that has an auth key set up, the client drivers need to connect with the auth key as follows: r.connect(host, port, auth="[KEYHERE]")
  • The auth key will not be required to access the http server. We'll explain in the instructions how to set up a reverse proxy instead.
  • The auth key will not be required for the nodes in the cluster to connect to each other. We'll explain in the instructions how to set up proper security groups to make sure unauthorized nodes can't connect to each other.
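End to end, the proposed handshake amounts to "client sends the key when it opens the connection; server accepts or drops." The sketch below simulates that over a local socket pair (the length-prefixed framing, names, and key are all made up for illustration; this is not RethinkDB's actual wire protocol):

```python
import hmac
import socket
import struct

CLUSTER_AUTH_KEY = "hunter2"  # hypothetical: what `set auth hunter2` stores

def client_hello(sock, auth_key):
    # Client side of r.connect(host, port, auth=...): send the key,
    # length-prefixed, as the first thing on the connection.
    payload = auth_key.encode()
    sock.sendall(struct.pack("<I", len(payload)) + payload)

def server_check(sock):
    # Server side: read the key and compare in constant time.
    (n,) = struct.unpack("<I", sock.recv(4))
    presented = sock.recv(n)
    return hmac.compare_digest(CLUSTER_AUTH_KEY.encode(), presented)

def simulate(auth_key):
    """Run the handshake over an in-process socket pair."""
    a, b = socket.socketpair()
    try:
        client_hello(a, auth_key)
        return server_check(b)
    finally:
        a.close()
        b.close()
```

Note that, exactly as discussed above, the key crosses the wire in the clear, which is why this only makes sense inside a protected subnet or alongside an SSH tunnel.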

@Tryneus (Member) commented Jun 4, 2013

Working on this now.

@Tryneus (Member) commented Jun 5, 2013

One thing I should probably point out: changing cluster_semilattice_metadata_t makes the new version incompatible with old metadata files (which results in an ugly crash at startup), so something should probably be done to give people an upgrade path, and to make a cleaner error message.

@Tryneus (Member) commented Jun 5, 2013

@jdoliner has talked me into putting the auth key into a separate metadata structure, which will put off the problem I just mentioned.

@coffeemug (Contributor, Author)

In general, could we just bump the serializer file version to avoid this problem or do we need to introduce a separate metadata version number?

@Tryneus (Member) commented Jun 6, 2013

I don't see how either of those solve the problem in the future. There is still no upgrade path that preserves the cluster configuration, so users would be required to recreate their entire configuration.

Bumping the serializer version should work to let the users know it's incompatible, but if we change the serializer version, we'll require them to reimport all their data without a very good reason. I think we should have separate versions for each layer (i.e. serializer, btree, and metadata), as well as an easy path to migrate a cluster.
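The per-layer versioning idea could be sketched like this (all names and version numbers are hypothetical; the point is just that a metadata-only bump should trigger a config migration rather than a full data reimport):

```python
# Hypothetical per-layer versions for the running binary.
CURRENT = {"serializer": 1, "btree": 1, "metadata": 2}

def compatibility(on_disk):
    """Classify an on-disk file's versions against the running binary."""
    if (on_disk["serializer"] != CURRENT["serializer"]
            or on_disk["btree"] != CURRENT["btree"]):
        # The data format itself changed: full export/reimport needed.
        return "reimport"
    if on_disk["metadata"] != CURRENT["metadata"]:
        # Only the cluster configuration format changed: migrate it
        # in place and keep the user's data untouched.
        return "migrate-metadata"
    return "ok"
```

This also gives startup a place to emit a clean "please migrate" message instead of the ugly crash mentioned earlier in the thread.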

@coffeemug (Contributor, Author)

I understand there is no easy upgrade path. For now that's fine -- when we change the metadata we'll ask people to reimport data. It isn't ideal, but for the time being the additional development speed it gives us outweighs the disadvantages.

@jdoliner (Contributor) commented Jun 6, 2013

This is all theoretical, though, seeing as we're not actually changing the metadata for this issue.

@Tryneus (Member) commented Jun 11, 2013

This is implemented and up in review 622, including the updated Python client. The Ruby and JS clients are being updated separately.

@Tryneus (Member) commented Jun 13, 2013

Alright, this has been merged into next in a ton of commits; it will be in release 1.6.

@fasiha commented May 27, 2015

When this was closed in 2013, was it closed in reference to @coffeemug's initial thought of HTTP API for browser-only apps (‘if you expose the http api to people, they could write apps entirely in the browser without servers (which is really nice)’), or with reference to client driver access, which this issue morphed into?

I would love to see an HTTP API, or anything that allowed me to write a purely client-side app, without anything server-side beyond RethinkDB—but I fully appreciate that this is likely beyond the scope of this project.

Edit: I also fully appreciate that numerous other non-RethinkDB products have come out in the last 2.5 years to address this need :)

@coffeemug (Contributor, Author)

@fasiha -- we've been thinking about it a lot (and still are). This is definitely tricky -- we don't have a plan for it yet, and don't yet know if it will ever become a core part of RethinkDB. It will take more time to figure it out, but it's definitely on our minds.
