
Vault CLI SSL not behaving properly through reverse proxy #611

Closed
ericchapman opened this issue Sep 11, 2015 · 18 comments

@ericchapman commented Sep 11, 2015

I have Vault configured to use SSL. In a browser, I can access the Vault server with no issues. When I use the command-line CLI, it defaults to port 443 even though it should be accessing 8200. My setup is as follows:

Vault:

Vault v0.2.0
Config:
backend "consul" {
  address = "consul.thehq.io:8500"
  advertise_addr = "https://<PRIVATE IP>"
  path = "vault"
}

listener "tcp" {
  address = "0.0.0.0:8200"
  tls_cert_file = "/path/to/cert/cert.pem"
  tls_key_file = "/path/to/cert/key.pem"
}

NGINX Load Balancer:

upstream vault {
    least_conn;
    server <IP 1>:8200 max_fails=3 fail_timeout=60 weight=1;
    server <IP 2>:8200 max_fails=3 fail_timeout=60 weight=1;
    server <IP 3>:8200 max_fails=3 fail_timeout=60 weight=1;
}

server {
   .. Config Information
}

When I hit my reverse proxy via URL, I get:

URL: https://vault.<domain>/v1/policies
Return: {"errors":["missing client token"]}

When I use the CLI tool, I get:

%> vault policies -address=https://vault.<domain>
Error: Get https://10.136.8.178/v1/sys/policy: dial tcp 10.136.8.178:443: i/o timeout

Notice it is forcing port 443. It looks like the Vault CLI is not respecting my reverse proxy settings and is still going to port 443.

@jefferai (Member) commented Sep 11, 2015

You didn't paste your nginx config, so I can't look at the setup, and I'm not quite clear what it is you want here. Do you want the CLI to go through your reverse proxy on port 443, or is the CLI going through your reverse proxy when you don't want it to, i.e. you want it to go straight to Vault?

@ericchapman (Author) commented Sep 11, 2015

Thanks Jeff. My point was that my reverse proxy is working through the browser and NOT working through the Vault CLI, meaning it is set up right, but the Vault CLI is trying to force port 443 after the lookup instead of 8200. Here is my entire config:

upstream vault {
    least_conn;
    server 10.136.8.177:8200 max_fails=3 fail_timeout=60 weight=1;
    server 10.136.8.178:8200 max_fails=3 fail_timeout=60 weight=1;
    server 10.136.8.179:8200 max_fails=3 fail_timeout=60 weight=1;
}

server {
  listen 80;
  server_name ~^(?<subdomain>.+).<domain>$;
  return 301 https://$subdomain.<domain>$request_uri;
}

server {
  listen 443 ssl;
  server_name ~^(?<subdomain>.+).<domain>$;

  ssl_certificate     /path/to/certs/cert.pem;
  ssl_certificate_key /path/to/certs/key.pem;
  ssl_verify_client off;

  proxy_set_header   X-Real-IP  $remote_addr; # pass on real client IP
  proxy_read_timeout 900;

  charset utf-8;

  location / {
    proxy_pass https://$subdomain;
    proxy_set_header Host $http_host;
    proxy_set_header X_FORWARDED_PROTO https;
    proxy_ssl_certificate      /path/to/certs/cert.pem;
    proxy_ssl_certificate_key  /path/to/certs/key.pem;
  }
}

I actually just noticed something else: it is trying to access "10.136.8.178", which is the private IP from the upstream lookup, meaning I don't think it is working at all. Note that I know the reverse proxy config is correct, given the browser hits the API with no problem. Is there something special I need to do for the Vault CLI?

@jefferai (Member) commented Sep 11, 2015

The -address parameter to the vault client doesn't automatically select a port of 8200. Without a port specified, it will use the system default behavior -- which will see the "https" prefix and assume port 443. So if you're trying to get your Vault CLI to not go through a reverse proxy on 443, put :8200 at the end of the address.
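For illustration, a hedged example of pinning the port explicitly (the hostname here must resolve to a Vault server directly rather than the proxy; <vault server> is a placeholder):

%> vault policies -address=https://<vault server>:8200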

The reason it's trying to access the private IP is because that's the address you put in the advertise_addr field. The advertise_addr parameter is a value given to clients when a Vault server needs to redirect them to the leader -- the leader's advertise_addr will be used. This affects any client that properly follows 307 redirects. So if you want any redirects to go back through nginx, you need to specify nginx's external-facing IP. But if this means that you set the advertise_addr the same for all of your Vault servers, you should make sure nginx is going to select the correct server behind it (the current leader), or you could end up having a long chain of redirects until the client lands at the right one.
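To make redirects come back through nginx, the advertise_addr in the Vault config above would point at the external name instead of the private IP. A sketch, assuming nginx answers at https://vault.<domain>:

backend "consul" {
  address        = "consul.thehq.io:8500"
  # standbys redirect clients here, so use the external-facing address
  advertise_addr = "https://vault.<domain>"
  path           = "vault"
}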

@jefferai (Member) commented Sep 11, 2015

FWIW, I'm not sure what your end-goal is, security-wise, but you have a transitive trust issue using nginx like that -- a process that compromises nginx can see all traffic between your clients and the Vault backends. In terms of security, you're better off doing a straight tcp proxy; nginx's support for this is rather limited, but haproxy works very well for this.
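A minimal haproxy sketch of the straight TCP proxy idea, reusing the backend addresses from the nginx upstream config earlier in the thread (with mode tcp, TLS passes through to Vault untouched; the check here is only a TCP connect):

listen vault
    bind 0.0.0.0:443
    mode tcp
    balance leastconn
    server vault1 10.136.8.177:8200 check
    server vault2 10.136.8.178:8200 check
    server vault3 10.136.8.179:8200 check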

@ericchapman (Author) commented Sep 11, 2015

Thanks Jeff. My end goal was to have Vault handle HTTPS at the Vault level, meaning traffic is encrypted even on my internal network. I was simply trying to get the reverse proxy/LB to expose the server cluster externally in a load-balanced configuration. I'll look into the TCP proxy.

I see, regarding the advertise IP. This was all working great on my internal private network with HTTP and the LB handling the SSL. That would explain the issue I am seeing. As you stated, it looks like it is redirecting to the leader, but the leader is a private IP, so it is breaking.

Thanks, you can close this issue as user error. I'll figure out a better topology.

@jefferai closed this Sep 11, 2015

@ericchapman (Author) commented Sep 12, 2015

Thanks Jeff. You hit the nail on the head with the difficulty I am seeing here. I realized I had several problems.

Advertise Address: I was advertising the internal address, so if the LB didn't hit the leader, the request was redirected to the leader, but externally you couldn't see it. I realized the browser was also breaking the same way the CLI was. I also realized the CLI was actually working some of the time, i.e. when the LB resolved to the leader. Once I saw they were both breaking the same way, it became apparent. At first I thought the browser was working 100% and the CLI 0%, so my debugging led me in the wrong direction.

HTTPS: I had Vault configured to provide the HTTPS (trying to make it encrypted everywhere) but was getting a MITM at the LB when trying to forward. Hence I was terminating SSL at the LB as well.

CONSUL: I am using Consul to dynamically configure the LB using service discovery, but as you pointed out, I don't know who the leader is from that directly without an extra step to ask Vault.

After thinking about it, I realized I have to almost treat the entire system as being public to get this to work. Working setup:

  • Have Vault handle HTTPS (had this before)
  • Advertise Public IP Address
  • Open and bind to public port

With the above settings, the redirect from hitting a "non-leader" resolves correctly since the advertise address is the public IP and port. The system is working as expected now. The next step will be to replace the LB with HAProxy doing a TCP forward rather than actually terminating the SSL.
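A sketch of what that working configuration might look like, with <PUBLIC IP> as a placeholder (per the bullets above; paths reuse the config from the top of the thread):

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/path/to/cert/cert.pem"
  tls_key_file  = "/path/to/cert/key.pem"
}

backend "consul" {
  address        = "consul.thehq.io:8500"
  # advertise the public address so redirects resolve externally
  advertise_addr = "https://<PUBLIC IP>:8200"
  path           = "vault"
}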

Thanks for your help.

@jefferai (Member) commented Sep 12, 2015

That all sounds good!

@ericchapman (Author) commented Sep 12, 2015

Hey Jeff,

Sorry, one more question. I hit a mental conundrum on this and am not sure where to turn. As you know, I now have my Vault server terminating HTTPS using my wildcard cert for "*.example.com". Let's say I want to use Consul service discovery to find the Vault server, and let's say it is registered as the name "api" with the tag "vault".

Externally this works fine since I would type "vault.example.com" and it would hit my proxy and eventually the Vault server and all good.

Internally I would simply use the name "vault.api.service.consul" and it will resolve fine. The problem is that since the Vault server is now terminating HTTPS, I will get a cert mismatch, since the name is not "*.example.com" but rather "vault.api.service.consul".

To try to alleviate this, I set my Consul domain to "example.com", so to use discovery I would type "vault.api.service.example.com". The problem there is that the cert is only good for one level of subdomains. I read a lot of posts with people saying it is possible to get a "*.*.example.com" cert, but it may break IE, is more expensive since only a few providers do it, etc.

What is the recommended way to handle this scenario? I see the following options:

  • Internal proxy that uses Consul Template to populate its config. This would allow me to use the URL "vault.example.com" internally while still using service discovery to populate the proxy. The problem here is that I would still need to know where the proxy is.
  • Just use HTTP everywhere, assume my private network is secure, and only terminate SSL at the proxy (I don't like this; just stating it as an option).

My internal radar is going off that I am missing something fundamental here, but I am not sure what it is.

@ericchapman (Author) commented Sep 12, 2015

Hey @jefferai ,

I promise I will leave you alone after this. I hit a weird conundrum with the way Vault does HA and just wanted to run it by you. It is still relevant to this thread.

As you know, the cluster will elect a leader and then redirect API requests to that leader using the "advertise" address of the leader. The problem occurs when using HTTPS: the advertise address has to be a host name that matches the certificate and cannot be an IP address. Possible solutions:

  • Spin up your cluster and then in DNS statically link to each node using "vault1.", "vault2.", etc., and use these static links as the advertise addresses. Hook all of these to the proxy, and then no matter which one the proxy picks, you will find the master and get a certificate match. My issue with this is that you seem to be bypassing all of the cool features of Consul and service discovery in general by statically mapping to the servers.
  • Spin up your cluster and then, before making any real requests, figure out who the leader is via the /sys/leader call (see the sketch after this list) and then use "/etc/hosts" to map a domain name to that IP address so the certificate matches. This seems really hacky.
  • Hook up all servers behind a proxy, point DNS to the proxy, then have all servers advertise that same address. This will keep forwarding you back to the proxy until you find the master. This seems REALLY hacky.
  • Don't use HA, and put a watch on the service so that if it goes down, you spin up a new Vault server and unseal it with the old values. I am leaning towards this one since all of the issues go away.
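For reference, a hedged sketch of the /sys/leader lookup from the second option (placeholder hostname; illustrative output whose exact shape may vary by version):

%> curl https://vault.<domain>/v1/sys/leader
{"ha_enabled":true,"is_self":false,"leader_address":"https://10.136.8.178:8200"}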

I think the root of the problem here is that Vault redirects the request rather than allowing each server to process it. I think of Consul, where I can hit the API on any agent, so I can easily cluster them and not think about it. My guess here is that you guys have some wicked corner cases of simultaneous accesses that are unsolvable without server RPC, so you had to punt and just redirect the request to the master with a 307.

I am not really requesting a change here since this seems like a logic overhaul, but just kind of pointing out that if you are trying to create truly dynamic high availability solutions, your HA implementation does not behave nicely with HTTPS.

Please let me know if there is some other possible solution that I missed above that would help me get this thing playing friendly with HTTPS while still taking advantage of service discovery. I am thinking large scale, where I can automate the bootstrap of entire data centers, meaning I am trying to eliminate any manual step altogether.

@jefferai (Member) commented Sep 12, 2015

It's possible that some future enhancement would be to allow Vault to make requests on a client's behalf rather than redirect, like Consul does, but as you noted this opens up some cans of worms. So let's look at how things are now.

The way I see it, you have a couple of options:

(1) Use a reverse proxy that is constantly running health checks to always send the external client to the leader. In my opinion, this is probably your best bet, as you can then do straight TCP proxying, but whether you care about that strongly depends on what your security needs are. At the same time, how difficult this is to configure depends on a number of things, including which reverse proxy you want to use.

I think you'd be best off with haproxy here; not only can you do TCP proxying, but it also supports HTTP status checks to a custom URI (https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#option%20httpchk) and allows you to specify what status codes you want to see to indicate good or bad status (https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#4-http-check%20expect). FWIW, in my own usage I had gone down the nginx route for a while due to familiarity with it, but in the end switched to haproxy because I just needed the features it supported.

The wrinkle here is if you want to do dynamic service discovery with Consul. You could use consul-template to write a haproxy config file, but due to the way that haproxy reloads it can be super racy. I ended up writing a reloader script (https://gist.github.com/jefferai/8b1d226b148c15110df3) that worked well enough for what I needed. But if I didn't expect to have my Vault servers changing around and had to do it again, I would just hardcode it, rely on status checks, and only reload it when I literally needed to add or remove potential servers.
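A sketch of the health-check idea, combining TCP pass-through with the /v1/sys/health behavior noted later in this thread (200 from the leader, 429 from standbys); the check-ssl/verify settings are assumptions for a Vault listener that terminates TLS itself:

backend vault
    mode tcp
    option httpchk GET /v1/sys/health
    # only the leader answers 200, so only the leader is marked healthy
    http-check expect status 200
    # check-ssl makes the health probe speak TLS to Vault's HTTPS listener
    server vault1 10.136.8.177:8200 check check-ssl verify none
    server vault2 10.136.8.178:8200 check check-ssl verify none
    server vault3 10.136.8.179:8200 check check-ssl verify none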

(2) Use external certificates on the reverse proxy, and internally run your own CA (potentially using Vault's PKI backend) and issue your own certs. This may help, but then you still have to deal with redirects -- either they have to go to an external address that then maps directly into an internal address (e.g. you have vault.x.com, vault1.x.com, vault2.x.com and vault3.x.com and configure your reverse proxy to send requests for specific hosts using the SNI header to specific backend servers), or you have to basically do what I suggested in (1) above anyways.
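A hedged nginx sketch of that host-to-backend mapping (vault1.x.com etc. as in the text; cert paths and backend IPs reused from the configs earlier in the thread; one server block per external name):

server {
  listen 443 ssl;
  server_name vault1.x.com;

  ssl_certificate     /path/to/certs/cert.pem;
  ssl_certificate_key /path/to/certs/key.pem;

  location / {
    proxy_pass https://10.136.8.177:8200;  # and likewise vault2 -> .178, vault3 -> .179
  }
}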

Both of those options should solve your problem, but it does mean that internal clients then need to go through your external balancer/reverse proxy...which is probably not a problem.

@ericchapman (Author) commented Sep 12, 2015

Thanks @jefferai

I didn't think of the health checks. Also, your suggestion in (2) of using Vault's PKI backend to create internal certificates may solve my other question. I was trying to just use the external cert throughout the entire system but that may actually be burning me. My only true requirement here is that everything, whether internal or external, is over TLS.

Thanks for your help and the additional suggestions. I'll step back with the knowledge I have now and decide how I want to define the overall topology. I kind of just got everything working without SSL internally and then tried to superimpose it onto the system, and that is clearly causing me a lot of headache.

@jefferai (Member) commented Sep 12, 2015

Sure. Vault's PKI capabilities are pretty good at this point but there will be more coming later, including the ability to self-generate CA certs and sign CSRs. In the meantime, if you create your CA cert elsewhere, you can do a lot with what's there now.

@ericchapman (Author) commented Sep 13, 2015

Just as a note, in case anyone else hits this thread while googling and it can help them out.

Vault has a health check at "/v1/sys/health" for which only the cluster leader will return "200", so this can be used to populate the LB and guarantee that only the leader gets accessed. Jeff may have mentioned this above, but I am just being explicit.
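A quick way to see this behavior (placeholder hostname; the leader returns 200 while standbys return 429):

%> curl -s -o /dev/null -w "%{http_code}\n" https://vault.<domain>/v1/sys/health
200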

@feliksik commented Dec 8, 2015

@jefferai, I'm attempting to set up your suggestion (1) with haproxy, without full service discovery. But how do you make the haproxy http-check succeed with 200, given that it in turn needs to reach the host via the appropriate domain name for SSL to work? Do you also use a wildcard certificate, with unique subdomains for the vault backends you watch with haproxy?

@jefferai (Member) commented Dec 8, 2015

@feliksik Honestly, that's totally up to you. You could use wildcard, or a cert with each of your backends as DNS SANs, or you could use IP SANs on the certificate, or...

Do note that the http-check will get different statuses depending on whether it hits the active node or a standby node. So if you get a 429, it's a successful call, just to a standby.

@ericchapman (Author) commented Dec 8, 2015

@feliksik I had actually punted on this when I submitted the ticket, and I am now picking it back up. I will let you know if I discover anything this week.

@feliksik commented Dec 8, 2015

@ericchapman I now have Vault set up with SSL, without a proxy; AWS ELB does the HTTPS health check on /v1/sys/health, thus only declaring the master healthy. I think this makes the standby redirect functionality of Vault irrelevant.

I added localhost.domain.com to /etc/hosts so the Vault wildcard cert works there; this is for unsealing and diagnostics.

It looks like AWS ELB does not verify SSL. You could also configure haproxy that way, I suppose.

Alternatively, when using wildcard certs, you could set up v1.domain.com (and v2 and v3) in /etc/hosts and have validation work, too.
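A sketch of that /etc/hosts approach, using the v1/v2/v3 naming above (illustrative IPs borrowed from earlier in the thread):

10.136.8.177  v1.domain.com
10.136.8.178  v2.domain.com
10.136.8.179  v3.domain.com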
