
Consider a design for access control #266

Closed
coffeemug opened this issue Jan 30, 2013 · 44 comments

@coffeemug (Contributor)

If RethinkDB supported access control, we could allow people to hit the database via the http API straight from the browser and bypass servers. For a lot of people this would be really, really nice.

I'm slating for 1.5 for now, but we should brainstorm what we can do here once 1.4 is out.

@mlucy (Member) commented Jan 30, 2013

This sounds like a really really really bad idea to me. If we have an HTTP API and encourage our users to let anyone on the internet make arbitrary requests to it, then we're responsible for security, and our code base isn't anywhere near ready for that.

This is the sort of thing we could plausibly do once we have a rock-solid code base, a real QA team, and enough money to hire pen testers.

@neumino (Member) commented Jan 30, 2013

Is this issue for an access control like who can read/write on this database/table?
Or is it just to provide people an HTTP api?

@coffeemug (Contributor, Author)

@neumino -- if you expose the http api to people, they could write apps entirely in the browser without servers (which is really nice). The downside is that it opens up a whole world of security issues that we'd have to solve, which is what @mlucy is talking about here.

@Raynos commented Jan 30, 2013

The whole "I have no app server, just the db" idea is flawed. Writing your own HTTP API in an app server in front of RethinkDB is pretty simple.

Access control and security is really difficult to get right. Better focus on easier problems first. (like push messages out of the database)

@mlucy (Member) commented Jan 30, 2013

I agree with Raynos; I think this is too difficult to do right in the near future.

If people really want to do this in a scenario where security isn't a huge issue, e.g. for an intranet app, they can already load the javascript driver into their webpage and execute commands that way.

@jdoliner (Contributor)

I also agree that this is a bad idea. It's difficult to get right, and the end product will probably be pretty error-prone too. For example, suppose you wanted to build a basic banking app: you'd probably have a table of transactions with a user id field on each row, and you'd then need to express permissions as a predicate over the rows. This gets complicated fast, and the stakes of getting it wrong are really high.

@al3xandru (Contributor)

CouchDB, which is probably the database closest to offering a direct web-to-db connection, has a security issue reported every couple of months. It's a risky path to go down right now. Implementing security features will be important, but let's get there first.

@coffeemug (Contributor, Author)

Quietly moving to backlog...

@bitemyapp

(Better?) Access control is a good idea, but I don't think exposing an HTTP API by default is a good one. I've seen too many exposed ElasticSearch and CouchDB clusters to recommend otherwise.

@stuartpb

Uh, HTTP API aside for a moment, what about the client driver interface? Only binding to the local network interface isn't enough. If I want my DB handled by another service (like MongoLab and MongoHQ for MongoDB) or even just another shard, without authentication, I can't do it without openly binding to an external network interface and giving every other app in the cluster/datacenter/world access to my database.

I prototype my apps in a mock production environment. Right now, this is the one factor keeping me off of RethinkDB.

@neumino (Member) commented May 25, 2013

The only way I've seen so far is to use some ssh tunnels.

@mlucy (Member) commented May 25, 2013

How would people feel about a completely bare-bones access control scheme where:

  • You register an API key with the server over ssh (or maybe through the web interface).
  • All connecting clients send the API key when they open the connection.
  • The server rejects all connections without a valid API key.

(In particular, this means no encryption, no per-database, per-table, or per-datacenter access control, etc. etc.)

I think that's the bare minimum we could do, and we could probably do it fairly quickly (1.7 maybe?).
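The server-side check in such a scheme could be sketched as follows (a minimal illustration, not RethinkDB's implementation; all names are hypothetical). One detail worth getting right even in a bare-bones version is using a constant-time comparison, so that timing differences don't leak information about the key:

```python
import hmac

# Hypothetical sketch of the bare-bones scheme: a single registered key,
# and every incoming connection must present it to be accepted.
REGISTERED_API_KEY = "s3cret-key"  # registered over ssh / the web UI

def connection_allowed(presented_key: str) -> bool:
    # hmac.compare_digest compares in constant time, so an attacker
    # can't recover the key byte-by-byte by timing rejections.
    return hmac.compare_digest(REGISTERED_API_KEY.encode(),
                               presented_key.encode())
```

Any connection for which `connection_allowed` returns `False` would simply be dropped.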

@stuartpb

What about specifying the API key (or the hash of it) in the config?

@mlucy (Member) commented May 26, 2013

We could support that too, but the config is only re-read when RethinkDB restarts and it would be nice to be able to add/remove API keys while keeping the server up.

@AtnNn (Member) commented May 26, 2013

The servers would also need to share the API keys, so that clients don't need to remember a different key for each server in the cluster.

@neumino (Member) commented May 27, 2013

What about access to the web interface?
That sounds like a lot of work. Or maybe someone has a simple (and secure) solution?

@stuartpb

Allow the web interface to bind to network interfaces separately from the client driver interface and include a tutorial in the documentation on how to set up a secure remote proxy using SSH tunneling / lightweight HTTP servers?

@neumino (Member) commented May 27, 2013

That would be one way to do it, but I'm not sure it could be done if we decide to use only one port (see #768).

About the documentation for the ssh tunnel, I just wrote it today, so it should be available around next week.

@stuartpb

I don't think combining the ports should be mandatory. (Say I want different firewall rules for HTTP, native client, and cluster connections.) Just mention in the tutorial that you have to split your services into multiple ports to use port-based mechanisms for access control.

@stuartpb

Also, I wouldn't have a problem with SSH tunneling being the mechanism of access control for the native interface (I mean, heck, Git does it, and they get by), so long as it wouldn't require any additional setup for the client driver (like how connecting to a Git server over SSH is as simple as entering a remote server URL that begins with "ssh+git://").

@mfenniak

As an access control system, has any consideration been given to using TLS? My thought is to have this work by creating a certificate authority (CA) for a DB cluster; a certificate would be issued for each server, and client (and cluster) connections would have their client certs verified as issued by the same CA.

Upsides would be: encrypted communication, very secure, could be applied to client->server & cluster->cluster & web interface, well supported by a variety of client environments, should be equally compatible with using only one network port.

Downsides: requires incorporating a library dependency (like openssl, gnutls, etc.), relatively complex server configuration (although could be aided by automated tools), and it might not be a good precursor to a "fine-grained" access control system.
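The client side of this mutual-TLS proposal could be sketched with Python's standard `ssl` module (an illustration of the idea, not a RethinkDB feature; the file paths and function name are assumptions, and a real deployment would first generate the CA and per-server/per-client certificates with a tool such as openssl):

```python
import ssl

def build_client_context(cafile=None, certfile=None, keyfile=None):
    """Sketch of the proposed scheme: the client verifies the server's
    certificate against the cluster CA, and presents its own CA-issued
    certificate so the server can verify the client in turn."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject servers not signed by our CA
    if cafile:
        # The cluster's CA certificate (hypothetical path).
        ctx.load_verify_locations(cafile=cafile)
    if certfile:
        # This client's CA-issued certificate and private key.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx
```

The context would then be used to wrap the driver's TCP socket before the normal protocol handshake, which is how the "one network port" compatibility mentioned above would fall out naturally.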

@coffeemug (Contributor, Author)

This issue had a slightly different original intention, but since The People took it in the direction of client driver access, we'll treat it this way.

We'll likely need some version of this for some cool things we're doing internally, so I'm moving this to 1.6. I'm not 100% sure it'll make it, but the odds are good, FYI.

We'll figure out specifically what to do this week.

@coffeemug (Contributor, Author)

Ok, we need a less-is-more solution here to get #892 out the door, so if anyone has ideas (pinging @srh), please add them here.

I think full-blown TLS support might be overkill, since most of the time the database infrastructure runs on a protected subnet with a clear list of IP addresses permitted to contact the db server. (This might be much more relevant in the case of Amazon EC2, though.)

One option I see might be to simply support an API key, which wouldn't be impervious to man-in-the-middle attacks, but would be sufficient for most needs (ok in protected subnets, ok on Amazon EC2 with a firewall or for simple cases where security isn't a big risk). We could just support a cluster-wide API key in the semi-lattices, allow setting and changing it via the CLI, and allow providing it to the clients via r.connect(..., api_key=key). This would probably be sufficient right now -- advanced users could use SSH tunneling for more security, and we can choose to support TLS and per-database settings later.

What does everyone think?

@stuartpb

What about supporting a list of keys so different endpoints can have different connection credentials that can be revoked / changed at any time without disrupting other connections?

@mlucy (Member) commented May 28, 2013

I think as long as we're very clear that the API key doesn't provide high security, and we document the SSH tunneling solution, that sounds like a good compromise. Adding support for a set of API keys probably wouldn't be too hard.
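The multi-key variant could be sketched like this (hypothetical names; a minimal illustration of per-endpoint keys that can be revoked independently, not the eventual implementation):

```python
import hmac

class ApiKeySet:
    """Sketch of the multi-key idea: each endpoint gets its own key,
    and any key can be revoked without disrupting the others."""

    def __init__(self):
        self._keys = set()

    def add(self, key: str) -> None:
        self._keys.add(key)

    def revoke(self, key: str) -> None:
        # Connections already authenticated with other keys are unaffected.
        self._keys.discard(key)

    def allowed(self, presented: str) -> bool:
        # Constant-time comparison against each registered key.
        return any(hmac.compare_digest(k.encode(), presented.encode())
                   for k in self._keys)
```

A usage pattern might be one key per deployed service (`web-frontend`, `batch-jobs`, ...), so rotating one service's credentials doesn't force redeploying the others.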

@coffeemug (Contributor, Author)

> What about supporting a list of keys so different endpoints can have different connection credentials that can be revoked / changed at any time without disrupting other connections?

@stuartpb -- That's a really cool idea (and probably very useful in production). I worry that this is a bit outside of the scope of a minimum-viable-feature as it would make things significantly more complicated and delay the release. If we go with the API key solution, I'd be inclined to do a single key for now, and then add support for multiple keys later if there's demand (but we'll wait to get some more feedback on this before deciding).

@srh (Contributor) commented May 28, 2013

I have an idea: use TLS instead of a completely fake access control system that sends everything in the clear.

@mrkurt commented May 29, 2013

Redis-like auth (a single key, defined in the config file) seems pretty good for a minimum viable feature, particularly if you make it clear that it's only for local or semi-restricted EC2 traffic. TLS won't buy you much if someone can snoop internal Amazon traffic.

SSL in Mongo makes things pretty contorted at times, and adds another level of complexity to drivers: things break frequently for hard-to-diagnose reasons. SSH tunneling or similar seems like a simpler way to teach drivers to speak securely.

@neumino (Member) commented May 30, 2013

Shipping access control would be a good workaround for #486.

Connecting two machines through ssh tunnels is a piece of cake.
Connecting three machines is a whole different story.

@ghost assigned Tryneus, May 31, 2013
@coffeemug (Contributor, Author)

@mrkurt -- I'm curious why it's necessary to implement an auth key for EC2 at all. Here's one possible way to implement security without it:

  • Put all the nodes in the cluster in one EC2 security group and allow cluster connections only within that security group. This way only authorized nodes can connect to the cluster.
  • Add a rule to this security group allowing client driver connections from all the groups where the clients run.
  • Set up a reverse proxy for the web UI that uses http authentication.

The only downside I can see for this scheme is that client drivers can't connect to the nodes from outside of EC2 (e.g. my laptop) without first ssh'ing into one of the authenticated EC2 machines (or using an SSH tunnel). I have two questions:

  1. Am I missing anything?
  2. How important/common is it for client drivers to connect from outside of EC2? I can see it being annoying that I can't connect to an EC2 host from my laptop easily, but arguably if I'm running on EC2 already, my client drivers should run there as well.

What do you think?
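For concreteness, the rules in the first two bullets can be written out in the `IpPermissions` shape that EC2's `authorize_security_group_ingress` API expects (the group IDs are placeholders; the ports are RethinkDB's defaults of 29015 for intra-cluster and 28015 for client drivers; this is a sketch of the scheme, not an official setup):

```python
# Hypothetical security group IDs.
CLUSTER_GROUP = "sg-cluster"     # all RethinkDB nodes live here
CLIENT_GROUP = "sg-appservers"   # all client-driver machines live here

def ingress_rules():
    """Return EC2-style ingress rules implementing the scheme above:
    cluster traffic only from cluster nodes, driver traffic only from
    the app-server group."""
    return [
        # Bullet 1: intra-cluster connections (port 29015) allowed only
        # from members of the cluster's own security group.
        {"IpProtocol": "tcp", "FromPort": 29015, "ToPort": 29015,
         "UserIdGroupPairs": [{"GroupId": CLUSTER_GROUP}]},
        # Bullet 2: client-driver connections (port 28015) allowed only
        # from the groups where the clients run.
        {"IpProtocol": "tcp", "FromPort": 28015, "ToPort": 28015,
         "UserIdGroupPairs": [{"GroupId": CLIENT_GROUP}]},
    ]
```

The web UI (port 8080 by default) is deliberately absent: per bullet 3, it would sit behind a reverse proxy doing HTTP authentication rather than being opened in the security group.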

@coffeemug (Contributor, Author)

One more thought: another downside of this scheme is that folks running on VPS nodes would have to set up their own firewall rules, which is more complicated than setting up EC2 security groups. Arguably in this case passing around auth keys in plain text won't help them much anyway, so having good docs on how to set that up sounds like a better solution to me.

@mrkurt commented May 31, 2013

@coffeemug For EC2 proper, I wouldn't bother with auth and would instead lean on security groups. If you hope to have something that people can use from Heroku or similar, though, IP/security-group restrictions aren't good enough (you have to allow Heroku's entire security group). I think key-based auth for use inside EC2's network is a reasonably easy way to let those users get going quickly.

I do think this is an interesting problem with probably a better solution out there, though. We've toyed around with various ways of securing Redis/Mongo connections from "untrusted" app server networks (SSH tunnels, proxies, etc), but they're difficult to roll out in an unofficial way to general users without solid driver support. If you guys can come up with a straightforward way to let Rethink clients talk "into" a firewalled cluster, that would be cool.

@coffeemug (Contributor, Author)

Ok, let's go for Redis-like auth (http://redis.io/topics/security).

Here's the draft of the spec. Please feel free to comment, but let's keep it within the confines of this specific authentication scheme.

  • Users can set a cluster-wide auth key (stored in plain text in the semi-lattices) via the admin CLI only. To set the key: set auth [KEYHERE]. To unset it: unset auth.
  • In order to connect to a cluster that has an auth key set up, the client drivers need to connect with the auth key as follows: r.connect(host, port, auth="[KEYHERE]")
  • The auth key will not be required to access the http server. We'll explain in the instructions how to set up a reverse proxy instead.
  • The auth key will not be required for the nodes in the cluster to connect to each other. We'll explain in the instructions how to set up proper security groups to make sure unauthorized nodes can't connect to each other.
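End to end, the proposed handshake amounts to "client sends the key when it opens the connection; server accepts or drops." The sketch below simulates that over a local socket pair (the length-prefixed framing, names, and key are all made up for illustration; this is not RethinkDB's actual wire protocol):

```python
import hmac
import socket
import struct

CLUSTER_AUTH_KEY = "hunter2"  # hypothetical: what `set auth hunter2` stores

def client_hello(sock, auth_key):
    # Client side of r.connect(host, port, auth=...): send the key,
    # length-prefixed, as the first thing on the connection.
    payload = auth_key.encode()
    sock.sendall(struct.pack("<I", len(payload)) + payload)

def server_check(sock):
    # Server side: read the key and compare in constant time.
    (n,) = struct.unpack("<I", sock.recv(4))
    presented = sock.recv(n)
    return hmac.compare_digest(CLUSTER_AUTH_KEY.encode(), presented)

def simulate(auth_key):
    """Run the handshake over an in-process socket pair."""
    a, b = socket.socketpair()
    try:
        client_hello(a, auth_key)
        return server_check(b)
    finally:
        a.close()
        b.close()
```

Note that, exactly as discussed above, the key crosses the wire in the clear, which is why this only makes sense inside a protected subnet or alongside an SSH tunnel.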

@Tryneus (Member) commented Jun 4, 2013

Working on this now.

@Tryneus (Member) commented Jun 5, 2013

One thing I should probably point out: changing cluster_semilattice_metadata_t makes the new version incompatible with old metadata files (which results in an ugly crash at startup), so something should probably be done to give people an upgrade path, and to make a cleaner error message.

@Tryneus (Member) commented Jun 5, 2013

@jdoliner has talked me into putting the auth key into a separate metadata structure, which will put off the problem I just mentioned.

@coffeemug (Contributor, Author)

In general, could we just bump the serializer file version to avoid this problem or do we need to introduce a separate metadata version number?

@Tryneus (Member) commented Jun 6, 2013

I don't see how either of those solve the problem in the future. There is still no upgrade path that preserves the cluster configuration, so users would be required to recreate their entire configuration.

Bumping the serializer version should work to let the users know it's incompatible, but if we change the serializer version, we'll require them to reimport all their data without a very good reason. I think we should have separate versions for each layer (i.e. serializer, btree, and metadata), as well as an easy path to migrate a cluster.
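The per-layer versioning idea could be sketched like this (all names and version numbers are hypothetical; the point is just that a metadata-only bump should trigger a config migration rather than a full data reimport):

```python
# Hypothetical per-layer versions for the running binary.
CURRENT = {"serializer": 1, "btree": 1, "metadata": 2}

def compatibility(on_disk):
    """Classify an on-disk file's versions against the running binary."""
    if (on_disk["serializer"] != CURRENT["serializer"]
            or on_disk["btree"] != CURRENT["btree"]):
        # The data format itself changed: full export/reimport needed.
        return "reimport"
    if on_disk["metadata"] != CURRENT["metadata"]:
        # Only the cluster configuration format changed: migrate it
        # in place and keep the user's data untouched.
        return "migrate-metadata"
    return "ok"
```

This also gives startup a place to emit a clean "please migrate" message instead of the ugly crash mentioned earlier in the thread.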

@coffeemug (Contributor, Author)

I understand there is no easy upgrade path. For now that's fine -- when we change the metadata we'll ask people to reimport data. It isn't ideal, but for the time being the additional development speed it gives us outweighs the disadvantages.

@jdoliner (Contributor) commented Jun 6, 2013

This is all theoretical, though, seeing as we're not actually changing the metadata for this issue.

@Tryneus (Member) commented Jun 11, 2013

This is implemented and up in review 622, including the updated Python client. The Ruby and JS clients are being updated separately.

@Tryneus (Member) commented Jun 13, 2013

Alright, this has been merged into next in a ton of commits; it will be in release 1.6.

@fasiha commented May 27, 2015

When this was closed in 2013, was it closed in reference to @coffeemug's initial thought of HTTP API for browser-only apps (‘if you expose the http api to people, they could write apps entirely in the browser without servers (which is really nice)’), or with reference to client driver access, which this issue morphed into?

I would love to see an HTTP API, or anything that allowed me to write a purely client-side app, without anything server-side beyond RethinkDB—but I fully appreciate that this is likely beyond the scope of this project.

Edit: I also fully appreciate that numerous other non-RethinkDB products have come out in the last 2.5 years to address this need :)

@coffeemug (Contributor, Author)

@fasiha -- we've been thinking about it a lot (and still are). This is definitely tricky -- we don't have a plan for it yet, and don't yet know if it will ever become a core part of RethinkDB. It will take more time to figure it out, but it's definitely on our minds.
