Add HTTPS and basic authentication support to elasticsearch #664

Closed
casperOne opened this Issue · 97 comments
@casperOne

This would help those who host elasticsearch and need security when making calls from another machine that cannot use the TCP transport.

HTTPS support should allow for configuration of the SSL certificate from a path, along with any other easily discoverable certificate sources.

@michaelcaplan

I second this. I have legal requirements to protect private patient information in transit and at rest. Native HTTPS, or SSL support would help solve a good piece of this problem.

@whateverdood

+1 - an encrypted transport is required at my client site.

@michaelcaplan

Related to this would be locking down ES node to node communication over SSL as well.

@whateverdood

Yep - I'd like both an HTTPS entry point and SSL inter-node comms.

@jdzurik

I agree +1 for me too.

@ghost

+1000!

@karmi
Owner

I don't think there's a real need for supporting this in ElasticSearch itself? Most installations where this is necessary could easily use a reverse proxy such as Nginx?

@ghost

I think data is very sensitive and shouldn't rely on validation from third parties.

The data is in ES, so ES should do the auth, since it knows its own data structure best.

@michaelcaplan

The simplicity of native encrypted transport is highly attractive. The complexity and fragility of wrapping all data in transit through an SSL proxy would be a barrier.

@jdzurik

SSL at least ... I mean it's an HTTP transport layer, not some abstract piece of code.

@jdzurik

Well, since it seems people want this but it doesn't sound like something you want to build in directly ... maybe a plugin would be good? I looked at Nginx, and while it looks super lightweight and pretty clean, it would add another virtual hop. When you already have the extra weight of SSL, it would be nice to be able to access ES directly with the least amount of extra overhead. I wonder if http://www.openssl.org/ would be helpful in integrating.

@ibotty

I guess a simple tutorial on how to set up nginx would suffice. nginx is not that hard to set up.

@BenHall

I was wondering if anyone was working on this issue, if not I would like to take a shot.

@casperOne

I don't believe so (and I think Shay has a lot of other things going on smiles). I'd contribute, but I'm a .NET programmer by trade, and don't have the skills in Java to make an effective contribution on this issue.

If you're going to take a shot, note, that they really have to go hand-in-hand; HTTPS is nice, but without some kind of authentication, you don't gain much for it. Additionally, you'll have to come up with a way to specify the credentials (username/password) of the users that can access the cluster.

@skade

I don't have any use-case for HTTPS (i use Nginx as SSL offloader to great effect), but needed HTTP Basic. You can find my plugin here:

https://github.com/Asquera/elasticsearch-http-basic

My two cents on SSL: Supporting SSL does not only mean implementing all the nuts and bolts (certificate management, etc.), it also has to be efficient and safe. There are other projects (stunnel, nginx, your favourite hardware load balancer) that are much better at doing all that. If you want your elasticsearch to speak ssl externally without configuring nginx - bind it to a local port and put stunnel in front of it. This is a common and tested solution.

@whateverdood
@casperOne

@skade: That isn't always feasible; especially when you are on a cloud infrastructure. As an example, Azure, which I've been able to get it to run on, won't let me install any of those.

It might be common and tested, but it's not always applicable.

@prb

@skade Your suggestion for using nginx or stunnel is fine from a whiteboard perspective, but nginx isn't going to deal with cluster communications, and stunnel can be a bit of an operational mess.

@BenHall I'd be interested in lending a hand on the issue as well. I see it as touching two core components: the transport module and the http module.

@karmi
Owner

@prb: That's interesting -- could you elaborate why an Nginx-based proxy (for example) isn't enough? (I don't understand the nginx isn't going to deal with cluster communications part. You can restrict the in-cluster communication based on IPs.)

@prb

Elasticsearch uses the network for both external communications (e.g., over HTTP via the http module or via the transport module) and internal communications (e.g., shuttling data between nodes, cluster membership, etc.). All of that communication should be secured, and not all of it is over HTTP.

@karmi
Owner

@prb: You can disable http on nodes entirely, and they will communicate via transport, which is not open, right? You can restrict how nodes communicate. At EC2, you can further restrict access to 9200/9300 to certain IPs, etc. So I still don't get why auth is needed here. Of course, something like an Nginx-based proxy is only meant for authorizing access from outside the cluster.

@asanderson

+1 This is a major deficiency from a government security requirements perspective, and may be a blocker for our project moving from Solr to ElasticSearch, since we have our Solr shards locked down via tomcat.

@kimchy
Owner

@asanderson you can deploy elasticsearch as a war file within a web container if you want, using the wares plugin (check the transport-wares repo). Questions in the mailing list.

@asanderson

@kimchy Excellent! Other than performance, are there any other disadvantages to deploying via war?

@kimchy
Owner

@asanderson not really, same API, quite simple wrapping as well. If you are up to it, would love to get async support to the servlet based on the servlet 3.0 async feature :)

@asanderson

@kimchy good to know.

One of these days, I'd love to contribute, but right now I don't have the bandwidth.

Sorry, but the best I can do is to continue to evangelize ElasticSearch. ;-)

FWIW, I've spent the last 5+ years replacing expensive COTS products with Solr, and I must say that ElasticSearch out-of-the-box seems to address all of Solr's shortcomings, IMHO.

Now, I get to spend the next year or so replacing Solr with ElasticSearch. ;-)

@imotov
Owner

We just released Jetty plugin for elasticsearch. It is a drop-in HTTP transport replacement that exposes full power of embedded Jetty including support for SSL, logging, authentication and access control. It is similar to transport-wares plugin that Shay mentioned above, but instead of running elasticsearch inside a web server, it embeds Jetty web server into elasticsearch. See https://github.com/sonian/elasticsearch-jetty for more information.

@asanderson

Outstanding! Great job! It just keeps getting better and better. ;-)

@ejain

So I can set up a reverse proxy or use the Jetty plugin to secure access to elasticsearch, but now I can no longer use the Java API...

@kevandunsmore

There's been some good progress on this issue from plugin contributors (thanks imotov and kimchy) but I've not yet seen anything that addresses the inter node cluster communications. HTTP proxies like Apache and Nginx won't cut it there, as the communications between nodes are lower level.

So, any form of movement on that front?

@pulkitsinghal

For those using the jetty plugin for SSL:
https://github.com/sonian/elasticsearch-jetty

You can also utilize the Chef cookbook to speed-up your AWS deployments:
https://github.com/pulkitsinghal/cookbook-elasticsearch

@tlrx
Owner

I've made a pull request for SSL support in transport client and node-to-node communications:
#2105

(This does not concern the HTTPS authentication)

@skade

@tlrx
This is a weird coincidence, I wrote something similar during the last week. It's much worse in the configuration department, but it uses a different approach: it makes NettyTransport fit for implementing the SSL functionality as a plugin.

It's a bit different in some regards, but the broad direction is the same.

@otisg

+1 (watching)

@clintongormley

Has anybody tried running the elasticsearch transport over ssh forwarding or a VPN? Or, for that matter, the http transport. If so, any comments about performance impact?

@kirmorozov

This is a search engine; there is no point in having HTTPS support, since you can handle it at the proxy/web server level.

@asanderson

Absolutely disagree. It is a DISTRIBUTED search engine and there are legitimate security requirements to support SSL across the distributed nodes even if the entire node set is behind a firewall. Trust me. We have clients that require it. Considering that Netty supports SSL and ElasticSearch is built upon Netty, why is this a big deal?

@asanderson

... especially in the world of multi-tenant cloud environments. ;-)

@devoncrouse

Perhaps more of a philosophical question, but why is it important to have these features baked in, if they're already provided by the elasticsearch-jetty project? Works great for us...

@brusic

Many new "big data" systems are built under the assumption that security is handled at a higher-level. MongoDB does not support authentication between nodes (correct me if I am wrong, it has been years since I used MongoDB) and even the earliest versions of Hadoop did not have security between nodes/HDFS.

Clients requiring a feature does not make it an essential feature. If that were the case, I have a few essential features that ElasticSearch needs to support as well. :) I have used embedded Jetty app security before, and it works well. Security could/should be expanded on, but I'd rather see other features worked on first.

@awick

If you only require https or very simple authentication then elasticsearch-jetty solves the issue.

If you want true authorization and authentication then you need something more. This probably hits those of us using ES as a db and/or multi-tenant more than others. Not that it matters, but Mongo and others now support this. I'm sure it's a large ES change under the covers to do this at the index level and stop data leakage.

@asanderson

@devoncrouse It's about having to support YET ANOTHER software configuration item which needs to be patched and/or upgraded for this one feature whereas we use Apache Tomcat for everything else. Again, ElasticSearch is built on Netty, and Netty supports SSL, so why wouldn't ElasticSearch leverage that support out-of-the-box.

@asanderson

@brusic Not in the government sector where public key infrastructure (PKI) is required and security certification & accreditation requires encryption between every node. So, it's not about a single customer requiring a feature; it's about a barrier-to-entry for an entire potential customer vertical sector. ;-)

@asanderson

@awick It's about supporting public key infrastructure (PKI) between nodes. My impression was that it would not be a big ES change, since Netty already supports SSL. So, my impression was that it would be another ES optional set of configuration properties that expose the Netty-supported configuration. Of course, YMMV. ;-)

@dangarthwaite

I'm having a devil of a time getting zen transport to work with forwarded ports. If I'm interpreting the tea leaves correctly - I have remotenode connecting to masternode via a forwarded port 9300 - which then immediately tells remotenode that the masternode is to be reached by a completely different IP address.

Master binds to: 192.168.1.30:9300

Client forwards the above to 127.0.2.2:9300 via an ssh tunnel.

Client's elasticsearch has conf file set to find unicast master at ["127.0.2.2:9300"]

Client's elasticsearch succeeds in connecting to that port and talks to master.

Master then says master is 192.168.1.20, which is unreachable, client dumps stacktrace.

@thejohnfreeman

I'm with @asanderson here. My installation needs encryption between nodes because users on the internal network cannot be trusted. It's not just a nice feature to have, it is prohibiting adoption of ES at all.

@hrdcore0x1

I second what thejohnfreeman is saying. I cannot implement ES in our environment until I can send to and receive from it over an encrypted connection.

@clintongormley
@thejohnfreeman

How does network.publish_host help?

@clintongormley

Sorry @thejohnfreeman, that was directed to @dangarthwaite. publish_host allows you to set the host/port that ES publishes to other nodes, instead of the host/port that it is bound to. Which should fix his issue with port forwarding.

@thejohnfreeman

I see, thank you for the clarification.

@tmaiaroto

The Jetty plugin doesn't have the proper CORS response headers. Specifically when it comes to an OPTIONS request. It would be really awesome to have this feature. Especially when we also have Kibana at play here. It only makes sense.

@salyh

You may want to have a look here: https://github.com/salyh/elasticsearch-security-plugin (it does not really offer basic authentication, but goes beyond it). It already supports PKI and Kerberos authentication (and authorization through LDAP) for the REST layer.

@MartinHatas

+100 for this feature.

@thefosk

+1

@crisen

+1

@kirmorozov

Please leave core search functionality in core. Extra functionality is available via plugins; use them.
Moreover, it's easy to add an SSL layer with basic authentication by using nginx as a reverse proxy.

location / {
    proxy_set_header X-Forwarded-Host $host;
    proxy_set_header X-Forwarded-Server $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_pass http://127.0.0.1:8090/;
    proxy_redirect off;

    # Password
    auth_basic "Restricted";
    auth_basic_user_file /home/passwd/.htpasswd;

    # Don't forward auth header
    proxy_set_header   Authorization "";
}

@mbarrien

@kirmorozov This isn't about SSL for the end user, this is about SSL between nodes for internal communication at the transport layer. That's f*ing difficult to add as a reverse proxy, since every port choice is so dynamic, chosen internally by Elasticsearch, and it's hard to override so that it points to Nginx's ports instead of the internal ports that are being reverse proxied.

I argue internode transport is core to Elasticsearch. And if it's preventing deployment of Elasticsearch in government and HIPAA cases (as it does for me), then the feature is needed.

@kirmorozov

Ok, @mbarrien, you have a second option, which is a VPN. A third option is to alter the code on your own. Anyway, your HIPAA adds extra complexity to any system.

P.S. HIPA2 must close access to system, even for users :)

@mbarrien

@kirmorozov No we don't have the VPN option. To host HIPAA/gov data on Amazon EC2, Amazon's agreement requires encryption of data between all machines, even if they are all in the same datacenter/same VPC/dedicated machines. Thus a VPN would have to be set up between every machine to get the necessary encryption; you can't just set up a VPN once into a private area where everything in that private network is unencrypted (like you're proposing), even if it's completely locked down from the outside world and doesn't add any actual security.

@FollowMyDev

Hi all,

I am trying to set up SSL on ES.
Jetty Plugin: 1.1.0-beta
Elasticsearch: 1.1.0

So I followed the steps described here: https://github.com/sonian/elasticsearch-jetty.
When I launched ES, I get this:

D:\TESTS\elasticsearch\bin>elasticsearch.bat
...
[2014-06-06 13:31:39,685][INFO ][org.eclipse.jetty.util.ssl.SslContextFactory] [myCluster.myHost6D] Enabled Protocols [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2] of [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2]
[2014-06-06 13:31:39,688][INFO ][org.eclipse.jetty.server.AbstractConnector] [myCluster.myHost6D] Started SslSelectChannelConnector@0.0.0.0:9443
...
[2014-06-06 13:31:40,048][INFO ][node ] [myCluster.myHost6D] started

Then, from my web app (https://localhost:9880/head/index.html), which has SSL integrated and the head plugin, I try to connect to ES at https://localhost:9443/.

I get these errors:
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://localhost:9443/_cluster/health. This can be fixed by moving the resource to the same domain or enabling CORS. health
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://localhost:9443/_nodes/stats?all=true. This can be fixed by moving the resource to the same domain or enabling CORS. stats
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://localhost:9443/_nodes. This can be fixed by moving the resource to the same domain or enabling CORS. _nodes
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://localhost:9443/. This can be fixed by moving the resource to the same domain or enabling CORS.

And in ES I get this error:
[2014-06-06 13:43:36,667][WARN ][org.eclipse.jetty.io.nio ] [myCluster.myHost6D] javax.net.ssl.SSLException: bad record MAC

Could you help me?

@jerrac

Anyone know of a post or mailing list issue that explains why systems like Elasticsearch and MongoDB choose to not support encryption and/or authentication? In detail, not just "you can use this other tool to do the job", but "here's the negative effects on our product supporting this would have, and here's why it's not worth it."

I, personally, have always found it odd when products like this don't support authentication and encryption. Enabling that kind of security is something I consider a default action, even if I'm behind a firewall or otherwise secured. What happens if something unrelated to ES gets compromised and the attacker can now scan traffic between ES nodes? Or just starts querying port 9200?

The nginx option is not satisfactory to me. It means I'd have to configure it on every single ES node, and that I'd have yet another tool that might break or present security holes. And that's assuming I could figure out how to get nginx to secure data transfer between nodes, as well as basic authentication. I've had less than stellar experiences with nginx in the past, so...

Using the jetty plugin is also not appealing. It's a third party product that might be trustworthy, and could easily lose support down the road if the company supporting it stops using ES for some reason, and the developer doesn't have the time to do it on his/her own.

Ultimately, I want this kind of feature embedded in Elasticsearch itself. No third parties needed. That means that when ES gets updated, so does that feature. It means less configuration for me. It means I don't have to trust another product with my data.

Anyway, all that noise aside, here's yet another vote for secure data transfer between nodes, and basic authentication support for the API (if that's what people wanted, I think it is....)

@erikringsmuth

Implementing TLS/SSL on both HTTP and the Transport protocol looks simple. Netty has built in support for TLS/SSL. All we need to do is add the Netty SslHandler as the first step in elasticsearch's NettyHttpServerTransport and NettyTransport pipelines.

The Netty SslHandler is documented here.
http://netty.io/4.0/api/io/netty/handler/ssl/SslHandler.html

This article gives a quick overview of using TLS in Netty. Look at step 7 for example code.
http://maxrohde.com/2013/09/07/setting-up-ssl-with-netty/

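SSLContext serverContext = SSLContext.getInstance("TLS");
// load the keystore
...
// SslHandler wraps an SSLEngine, so create a server-mode engine from the context
SSLEngine engine = serverContext.createSSLEngine();
engine.setUseClientMode(false);
pipeline.addLast("ssl", new SslHandler(engine));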

This could all be configured through a few new properties. Something like this:

ssl.keystore.location: /config/keystore
ssl.keystore.password: password
http.ssl.enabled: true
http.ssl.port: 9243
transport.tcp.ssl.enabled: true
transport.tcp.ssl.port: 9343
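
For readers who don't work with Netty, the same two steps (load the key material, then put a TLS layer in front of the plain listener) can be sketched with Python's standard `ssl` module. This is only an analogue under stated assumptions, not Elasticsearch or Netty code: the function name `make_server_context` is hypothetical, and it assumes PEM certificate and key files rather than the Java keystore proposed above.

```python
import ssl
from typing import Optional


def make_server_context(certfile: Optional[str] = None,
                        keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Create a server-side TLS context, optionally loading a certificate.

    certfile/keyfile are assumed to be PEM files; they stand in for the
    keystore location and password in the proposed settings.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse legacy protocol versions on this listener.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile is not None:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


if __name__ == "__main__":
    ctx = make_server_context()  # no certificate loaded in this dry run
    print(ctx.minimum_version)
```

With a certificate loaded, `context.wrap_socket(plain_socket, server_side=True)` plays roughly the role that placing `SslHandler` first in the pipeline plays in Netty.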

A few plugins and pull requests have done partial SSL implementations. They either worked on the HTTP or Transport protocols but not both. We also don't need the extra features the plugins add.

Performance of TLS/SSL has been mentioned quite a bit. This is a real concern, but adding TLS/SSL support will never force a user to enable it. The performance vs. encryption trade-off is worth it for the users who are required to encrypt their data over the wire.

We could go as far as implementing SPDY (see the Netty SPDY example). The slowest part of TLS/SSL is the RSA encryption during the handshake. This still happens in SPDY, but SPDY uses persistent connections, multiplexed streams, header compression, and a few other optimizations to speed it up after the handshake. It would probably be worth looking into after TLS/SSL is implemented.

Does this seem like a reasonable proposal? Adding the Netty SslHandler should be a pretty small amount of work. We just need to keep it focused on TLS/SSL and not do authentication or authorization like the current plugins.

@jprante

@erikringsmuth I agree with @kimchy that SSL adds close to nothing regarding security for many reasons.
Imagine I want to implement the Amazon / HIPAA story seriously: I would have not only to set up SSL between nodes, but also to secure the keystore file at /config/keystore.
How do I protect a keystore file at /config/keystore from being copied while the whole server is running in a hostile virtual environment and nothing is known about the physical host? That is impossible. The only safe place for the keystore file would be outside the Amazon virtual host (or other HIPAA cloud providers). But then, how would I secure the access to the keystore? And there we are - I couldn't. There is no security in this solution.
So I warn against assuming that SSL between Elasticsearch nodes, set up while ignoring an unsafe environment, is suddenly secure, or even HIPAA compliant. There is also the nightmare of relying on a method that has not been proven secure.

@mbarrien

@jprante Securing of persistent material on disk for HIPAA is orthogonal to securing the network connection. We already have best practices and documentation to comply with HIPAA documentation requirements (keep in mind that HIPAA doesn't necessarily dictate specific protocols or specific encryption; HIPAA mostly dictates the documentation of it, and it's up to auditors and customers to decide if it's secure enough, even if hosted on Amazon virtual hosts).

And as stated before, yes this doesn't necessarily add security, but then again security is always about end-to-end security; don't think people don't know this. However, the agreements to host HIPAA services on Amazon require encryption between nodes. This is a contractual requirement between Amazon and any service wanting to store HIPAA on Amazon, not a HIPAA requirement. As of now, this will preclude Elasticsearch because of the inability/extreme difficulty to encrypt the internal communication inside an Elasticsearch cluster, as contractually required.

So please let's not add extraneous issues such as perceived security or disk security to the question of securing the network connections, which is all this issue asks about.

@erikringsmuth

@jprante Encryption is one part of security as a whole. I understand very well that TLS/SSL isn't a one stop shop for security. You also need to check data integrity, user/product identity, authentication, and authorization.

At this point I'm more concerned about authentication and authorization. I work for a company with 65k employees and once you're in our internal network you have HTTP access to our servers. I'm trying to prevent the average Joe from calling DELETE /* and similar destructive actions.

My team wrote a simple plugin to do basic authentication for products and another form of authentication for users. Unfortunately basic authentication is useless unless the message is encrypted since it only Base64 encodes the username and password. Adding encryption will make basic authentication a usable option.
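
To make the Base64 point concrete, here is a minimal sketch showing that anyone who captures a Basic `Authorization` header on an unencrypted connection can recover the credentials without any key. The function names and the credentials are made up for illustration.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth_header(header: str) -> Tuple[str, str]:
    """Recover the credentials from a captured header; no key is required."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # The username cannot contain ':', so split on the first one only.
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    header = basic_auth_header("es", "s3cret")  # hypothetical credentials
    print(header)
    print(decode_basic_auth_header(header))
```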

The other option for authentication is to use digest authentication to sign every message with an HMAC. Amazon Web Services does this http://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html. Digest authentication calculates a hash of the message and sets it on the Authorization header. It never sends a password or secret key so you don't need to encrypt the message. I would love to see elasticsearch announce an official digest authentication scheme, but they have to specify how to sign the message before any elasticsearch clients would support it. Right now this isn't a viable option.
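
A rough sketch of what such request signing could look like follows. Since elasticsearch defines no official scheme, the canonical form used here (method, path and a SHA-256 digest of the body joined by newlines) is purely an assumption for illustration, as are the function names. A real scheme such as AWS Signature Version 4 also signs a timestamp and selected headers to limit replay, which this sketch omits.

```python
import hashlib
import hmac


def sign_request(secret_key: bytes, method: str, path: str, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over a hypothetical canonical form."""
    body_digest = hashlib.sha256(body).hexdigest()
    message = "\n".join([method.upper(), path, body_digest]).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def verify_request(secret_key: bytes, method: str, path: str,
                   body: bytes, signature: str) -> bool:
    """Recompute the signature on the server side and compare in constant time."""
    expected = sign_request(secret_key, method, path, body)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    key = b"shared-secret"  # hypothetical key known to client and server
    sig = sign_request(key, "DELETE", "/logs-2014", b"")
    print(verify_request(key, "DELETE", "/logs-2014", b"", sig))
    print(verify_request(key, "DELETE", "/other-index", b"", sig))
```

The secret never travels with the request, but the request itself is still readable on the wire; signing addresses authentication and integrity, not confidentiality.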

Back to the Amazon / HIPAA story. I haven't dealt with keystores on insecure hardware so I can't say what the solution is there. I'm working with in-house servers and so are many other elasticsearch customers who still need encryption. Adding TLS/SSL support will not hinder anyone. It will give a lot of elasticsearch customers a critical piece of security.

@skurfuerst

Hi everybody,

I'd also just like to :+1: implementing SSL as an optional feature for server-to-server communication; or otherwise opening up the API a little so that it can be implemented as a module.

Of course one needs to deal with the key exchanges oneself, but that can be done by server orchestration systems easily...

Greets, Sebastian

@nickminutello

I will add another datapoint here:
1) In most corporate environments, there will be a requirement for some level of security. Either we would have to turn the http access off (which is horrible from a support point of view) or we need some basic level of authentication so we can tick the box that says "no, it's not wide open to anyone with a browser".
2) Using nginx etc. is way overkill, and raises another problem of having another piece of tech that requires some level of approval (or at least a means to be able to support it ourselves).
3) Ultimately, our requirement is not to make elastic internet-safe - but safe enough to satisfy corporate security oversight.

I'm going to look at the plugins/jetty integrations to see how simple they are.
But having some basic/crude security would be better than none for us.

FWIW, my 2p

@jprante

Nginx is not overkill. Jetty is overkill.

# yum install nginx
# htpasswd -c /etc/nginx/conf.d/es.htpasswd es

Add this to /etc/nginx/conf.d/es.conf

server {
  listen *:80 ;
  server_name yourhost;
  access_log /var/log/nginx/es.access.log;
  root /var/www;
  location / {
      proxy_pass http://127.0.0.1:9200;
      auth_basic "Restricted Elasticsearch";
      auth_basic_user_file /etc/nginx/conf.d/es.htpasswd;
  }
}

Add this to config/elasticsearch.yml

http.host: 127.0.0.1
http.port: 9200

Restart services

# service nginx restart
# service elasticsearch restart

and you have a minimal HTTP Basic authentication for Elasticsearch.
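
A client then has to send credentials with every request to the proxy. The sketch below builds such a request with Python's standard library; `build_proxy_request` and the password are hypothetical, while `yourhost` and the `es` user come from the configuration above. Note that the sample server block listens on plain port 80, so these credentials would cross the network merely Base64-encoded.

```python
import base64
import urllib.request


def build_proxy_request(base_url: str, path: str,
                        username: str, password: str) -> urllib.request.Request:
    """Build a GET request for the nginx proxy carrying Basic credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + path)
    request.add_header("Authorization", "Basic " + token)
    return request


# Hypothetical password; the request is only built here, not sent.
request = build_proxy_request("http://yourhost", "/_cluster/health", "es", "changeme")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it; without the header, nginx would be expected to answer 401 before anything reaches Elasticsearch.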

For Kibana, also note this

https://github.com/elasticsearch/kibana/blob/master/sample/nginx.conf

If your "corporate security oversight" is satisfied with this, do no longer trust them. It isn't security. It's just a plain authentication.

@nickminutello

Thanks Jörg.

> Nginx is not overkill. Jetty is overkill.

I agree with you. Jetty happens not to be overkill in my case because I am already using it to host the app's rest services.

> yum install nginx

We fail at that first step in most corporate environments because we typically don't have the access to install anything, and moreover most package managers are removed from the server installations. We have to go through the bureaucratic process of requesting that someone else install it for us, with all the attendant hassle/futility that typically involves. It sucks. But that's the env we tend to have in most large corporates. Stupid and slow bureaucracy. Stuff that can be 'installed' by tar -zxf rules.

That said, thanks for the info above - will definitely be handy (even if not immediately).

> If your "corporate security oversight" is satisfied with this, do no longer trust them.

Ha. Most corporate security oversight bods know next-to-nothing about actual security.... sadly...
In cases where we actually care about the security we often have to fight with them to take necessary measures. In cases where it really doesn't matter, we just have to meet the minimum standard.

In the end, the Servlet transport for elastic https://github.com/elasticsearch/elasticsearch-transport-wares/ has ticked our box. I am hosting the servlet in our existing jetty, fronted by our existing security filter.

@thomascramer

@mbarrien I'm also in the same boat with having to secure ES for Amazon's HIPAA compliance agreement. I'm curious if you or anyone has come up with a solution to easily do SSL with the TCP transport. As you've pointed out, a VPN type setup really doesn't work for this. I've been trying to set up something with iptables and an SSL tunnel to create a transparent proxy, but can't seem to get it to work quite right. In general, though, I'd like to avoid a web of SSL tunnels.

I do think that securing the http transport is trivial enough, and thanks to those who have posted their examples and insight; but I think there is still some interest here in securing the tcp transport. Whether it diminishes performance or whether it is truly a reasonable security measure (points I perfectly understand) is beside the point in my opinion; it is more about helping us deal with the cards we are dealt. Thanks!

@javadevmtl

A few more points...

Using reverse proxies or even any of the added basic http plugins mentioned only allows you to protect ES from the "outside" world.

Furthermore, if you do firewall off 9300 and only open up http, you pretty much lose all the nice functionality you get with the native Java client.

And from the "inside" world, anyone with a bit of will could simply spin up a quick java app that connects to 9300 and do what they want... but what do I know...

Also, how do you explain to your security manager... "So I want to take data which is inside our nice secure SQL DB and put it on this search thingy which has NO security whatsoever"

@javadevmtl

Or a user running inside an authenticated environment can connect to ES on port 9300 un-authenticated and do what they want.

@jerrac

Assuming you set up firewall rules to only allow specific sets of ip addresses to connect to the ES ports, how hard would it be for someone to spoof one of those ip addresses?

As in, an attacker gets in, notices that you use logstash, and then spoofs the ip address of a server that can connect to ES in order to delete logs from that day, or maybe figure out what logs relate to the attack, and only delete those logs.

As I think about it, I really don't like the idea of using the firewall to limit what servers can connect to the ES cluster. You'd have to update the rules on every ES server every time you added a node. Or open it up more than is safe...

I'd still like to see a detailed explanation of why ES doesn't include user authentication, or ssl authentication between nodes. Relying on a web server like nginx to authenticate means you have to run the web server, not really something I'd like to do when I'm trying to minimize attack surfaces. Relying on a third party plugin means I'm out of luck if that third party stops supporting it.

Plus, well, every SQL database I know about offers user authentication. Why shouldn't ES do the same? What aspect of ES's (or NoSQL DBs in general) infrastructure eliminates the need for it?

@blakeja

We have ES installed on Windows and in our case, we do not have the option of using Linux/nginx whatsoever. Nginx support for Windows is terrible (connection/worker limits) so that is off the plate. The few plugins I have looked at which add some type of auth/ssl have issues mentioned by others in this thread.

It would be ideal for those of us that are unable to run nginx and not willing to use the plugins, for whatever reason, if ES offered native support for auth/ssl as part of the core package.

@javadevmtl

@blakeja You can use stunnel; it works on Windows also. But I must admit it's a B I T C H to set up. I mean, stunnel itself is fairly straightforward. But getting ES to play nice with stunnel requires a bit of thinking and knowing very well how ES communicates between all the nodes. Plus most of the solutions proposed above don't really scale well (management wise) in large clusters anyway.

But good news the security stack has been announced: http://www.elasticsearch.com/products/shield/

:)

@clintongormley

Shield has been released - closing

@mbarrien

Is Shield released in the open source project?

@jmacmahon

No, but there is an upcoming project which seems to do similar things to Shield: https://github.com/salyh/elastic-defender (not sure what state of development it's in though...)

@jerrac

No actual code in that project...

As for Shield, I really wish they had released it as part of ES. I get that they need to be making money, but the lack of authentication is a glaring hole in the feature set. Nearly every other database product has authentication; ES at least needs built-in users per index, or something like that. Maybe, user A has access to index logstash-, user B has access to index applicationA? At least IMO.
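A rough sketch of what "users per index" could look like, purely as an illustration. The grant table, the user names, and the prefix-matching rule are all assumptions made up here; this is not how ES or Shield actually models permissions.

```python
# Hypothetical grants: each user maps to the index-name prefixes they may touch.
INDEX_GRANTS = {
    "userA": ["logstash-"],
    "userB": ["applicationA"],
}


def can_access(user: str, index: str) -> bool:
    """Allow access only when the index name starts with a prefix granted to the user."""
    prefixes = INDEX_GRANTS.get(user, [])  # unknown users get no grants
    return any(index.startswith(prefix) for prefix in prefixes)


print(can_access("userA", "logstash-2015.02.01"))
print(can_access("userA", "applicationA"))
```

Note that plain prefix matching is deliberately crude: a grant on `applicationA` would also match an index named `applicationA2`, so a real scheme would need stricter rules.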

@jprante

There are a lot of folks out there who assume every piece of software has some measures of security built in and works even in "hostile" environments, like cloud, Amazon etc.

The challenge is to keep the search engine code in ES as simple as possible without getting overloaded by "security concerns". Note there is no "security" software that can meet 100% expectations of all users. Shield has also some weakness in setup, which is no surprise because it is still up to the admin skills to set up a "safe enough" environment.

See also the MongoDB security flaw, which exposed thousands of databases to the public internet because admins were not skilled enough:

http://www.mongodb.com/blog/post/mongodb-security-best-practices

Jörg

@jerrac

To preface this, the ELK stack is open source, and free. I deeply appreciate the work the various contributors have put into it. I'm glad the ES company exists and is making money so they can pay many of the contributors. Heck, I've contacted sales asking about how much support would cost my workplace in the hope we could afford it. (Though, I never got a reply...) I say this because my comments below could come across as somewhat demanding or entitled. I tried to word things as well as I could, but I'm not sure how well I did.

If I were a java developer, and I had the time, I'd be trying to create a pull request to add the security support I want. As it is, I'm a sysadmin, and what development work I do is usually in PHP. So...

So far, the only reason I've heard for not adding security from the start is that it adds extra work to the ES team's lives.

It's not like they think it isn't needed/wanted. If that was the case, Shield would not exist.

If it's more than just the extra work, I'd love to read a post that goes into details. If it is just the work, then I'd love a detailed explanation of how that's a valid excuse.

From my perspective, it's just plain common sense to add at least basic auth to the http api. Not having that just doesn't make any sense to me.

Why using third party plugins or a proxy server isn't a good idea has already been addressed in this thread. But, to summarize my thoughts: It complicates deployment, and makes keeping things secure harder. You have to figure out if third party plugins are secure, if they stop being developed, you have to replace them. Adding a proxy server is yet another thing using resources, and yet another thing to configure and keep up to date. Basically, on an ideal ES node, the only thing other than basic OS/Logging/monitoring stuff running should be ES.

@pulkitsinghal

It complicates deployment, and makes keeping things secure harder. You have to figure out if third party plugins are secure, if they stop being developed, you have to replace them. Adding a proxy server is yet another thing using resources, and yet another thing to configure and keep up to date

@jerrac - Have you already looked at service providers like https://www.found.no/ who've been offering the security layer for ES for quite a long time now? Do they not meet your needs?

@jerrac

@pulkitsinghal We need the data to be stored in our data center, and our budget is really really tight. So, a third party host isn't something we can consider.

@pulkitsinghal

Gosh I'll sound like an arse now ... when you say:

If it is just the work, then I'd love a detailed explanation of how that's a valid excuse.

Aren't budget constraints and time constraints the same thing? Isn't it fair for them to hold-off like you do?

As for your on-premise piece, you should email Found, they may be willing to do an on-premise install or something for you depending on the price. I'll try to think of more constructive approaches as well and reach out as ideas come. You can IM me in gitter.im too!

@jerrac

Aren't budget constraints and time constraints the same thing? Isn't it fair for them to hold-off like you do?

That's why I asked for an explanation. Maybe it is valid. But, well, when it comes to security, you should at least try to get it taken care of, not deliberately avoid dealing with it.

I recently put a lot of work into closing a security hole on a service we're trying to sunset. It certainly would have been easier to just leave it alone, but it would have left users at risk. That isn't something that is acceptable.

So, I am willing to believe that the ES team has reasons for why they didn't implement any security. I just want to know those reasons. 'Cause I can't think of any good ones...

@awick

I also want elasticsearch.com to make money and stay in business, and think support and plugins like marvel make great sense to offer for a fee, but having to pay for basic security just seems wrong and broken.

And when you hit things like

http://www.elasticsearch.org/blog/scripting-security/
http://www.elasticsearch.org/blog/elasticsearch-1-4-3-and-1-3-8-released/

you start to wonder. Yes, you shouldn't run ES "on the internet", but if someone is "in your network" and scanning, that advice doesn't matter. (More common than you might think.)

Maybe splitting Shield into a Free vs Pay might help?
Free might include

  • ip filtering (iptables can be hard, and I wouldn't have to worry about dynamic script holes :)
  • basic auth with ES implementation and no roles
  • https

Pay could add

  • LDAP, MSFT, ... integration
  • auditing
  • full role support

The other issue is you can't buy Shield without a support subscription right now. Those of us writing/using open source software can't afford that.

I guess we are all waiting for elastic-defender :)

@jprante

@jerrac in the beginning, ES started just as a toolbox with kimchy as the sole developer, it was not product ready. Read kimchy's comments from late 2010 at http://elasticsearch-users.115913.n3.nabble.com/Document-level-security-and-Connectors-td1818688.html and you'll understand that security features are orthogonal to search features and could easily have been added by external software add-ons (as it is now, you can still protect ES today without Shield). It was a matter of priority for completing the core features first.

@kimchy
Owner

Heya fellows, let me try and share my thoughts around this:

I have always claimed that "just adding basic auth" to ES is not enough. I deeply feel that when it comes to security, you need to have "rounded" features that are defined by what you are trying to achieve, and then how it's implemented.

Specifically, basic auth as a generic feature in ES itself (and not implemented through a proxy, which is quite simple, check our nginx blog about how to get it configured quickly), means having the ability to authenticate. But that spans more than just HTTP, it's also for things like transport client, and potentially node to node authentication too. Sometimes "just" http is good enough, not as a generic function in core ES (where it wouldn't be enough), but provided by a proxy.
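For reference, this is roughly what the HTTP-only piece amounts to: a minimal sketch of building and checking an `Authorization: Basic` header, the kind of check a proxy in front of ES would perform. The credentials and function names are hypothetical, and this is not code from ES, Shield, or nginx.

```python
import base64
import hmac

# Hypothetical credentials, for illustration only.
EXPECTED_USER = "es_user"
EXPECTED_PASSWORD = "change-me"


def make_basic_auth(user: str, password: str) -> str:
    """Build the Authorization header value a client would send."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def check_basic_auth(header_value: str) -> bool:
    """Validate an Authorization header value against the expected credentials."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers invalid base64 and invalid UTF-8
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Constant-time comparisons avoid leaking how much of a value matched.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    pass_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and pass_ok


print(check_basic_auth(make_basic_auth("es_user", "change-me")))
print(check_basic_auth(make_basic_auth("es_user", "wrong")))
```

The header is only base64-encoded, not encrypted, so it needs HTTPS underneath it, and a check like this does nothing for the transport client or node to node traffic.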

The full scale of security was implemented in Shield; it was easier to just take the whole set and implement it in a single package. Note, we went back to ES open source and made sure Shield can be implemented as a plugin to add all the security features; it's not a closed source redistribution of ES. This is super important to me personally and generally in the work we do. I think that the ability to add it as an extension to open source is very important.

In the future, as the usage of ES increases and expands more, and as our usage grows, I definitely can see a case where some "rounded" security features end up being contributed back to the open source. It's a whole different set of work in terms of engineering (compared to building it in a single package). The authentication example in this case definitely falls into this bucket, but then if it happens, its scope will, and must be, more than "just" http basic auth.

Just a note on the amount of effort we put into open source, to give it some color, as I saw it mentioned: today we have close to 80 developers working at the company, almost all of them focused on projects like Lucene (we have 8 Lucene committers and a lot of work goes into Lucene every day), Elasticsearch, Kibana, Logstash, language clients (perl, python, .NET, ruby, ...), Hadoop and more.

@jerrac

@kimchy So, you want to do security the right way, if you're going to do it at all. And the "right" way involves a LOT of work. I can understand that. I've always tried to do things "right" the first time, even if that takes more initial work.

I guess where I disagree with you is that I view security as a feature just as important as every other feature in ES. Even if it doesn't directly relate to what ES is supposed to do. To (hopefully) exaggerate, if my data isn't secure and my users are having their identity stolen, or my servers are getting profiled for future hacking, or a disgruntled employee is wiping my logs, then I don't really care how well search works.

Here's an imaginary situation, I have no idea how viable this attack would actually be... An attacker owns a server in the same dmz as ES, and is able to see that ES exists (nmap scan). They then add a node to the cluster that copies all the data out of it, and sends it to their servers. Or they just wipe it. Setting some kind of basic auth between nodes, and on the http api, could help prevent that from happening (or at least reduce the damage done), since they'd then have to actually own an ES node, or a server that stores the auth info, to do anything.

Compare that to mysql. They know mysql exists, but they can only access what databases they have credentials for. Or they'd have to own the entire mysql server. With ES, it's just, "oh, hey, I can connect to the ES api, let's delete everything."

Some form of auth wouldn't fix all the security issues, but it sure would help.

Anyway, you've answered my questions about why auth isn't part of ES by default. If nothing I've said makes a good point to you, I'm willing to just leave it here.

(Side note: Heh, I was just imagining how painful it would be if I had to build a database server for every single mysql based app I manage... If they were based on ES, I'd have to have 20+ ES clusters running, or some kind of complicated custom layer between the ES api and the apps...)

@skurfuerst

Hey everybody,

I've been following this thread for quite some time; and I can fully understand "both sides" of the equation. I don't really have any facts to add, but I think it's really really awesome that you, @kimchy, joined the discussion as the project lead -- even though you have a big company to run and I bet lots of other tasks.

Besides that, I really like the constructive tone in the whole thread...

For me that just resembles how great of a community / product ElasticSearch is.

Thanks everybody :+1:

Sebastian
