Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TLS support to the ctl gateway #7923

Merged
merged 33 commits into from
Oct 15, 2020
Merged

Conversation

davidMcneil
Copy link
Contributor

@davidMcneil davidMcneil commented Sep 18, 2020

Resolves #7523 #7928

This PR adds TLS support to the ctl gateway. Below is an explanation of the general UI/UX around this feature. There is still work to be done on the overall workflow. Specific questions/pain points are called out.

Generate a Self Signed Key and Certificate

Users can use their own keys and certificates, but the most straightforward option is to let the Habitat CLI generate a private key and self signed certificate. This can be done with the command hab sup secret generate-tls --subject-alternative-name <dns name>. This will automatically create two files ctl-gateway-<timestamp>.crt.pem and ctl-gateway-<timestamp>.key.pem storing them in /hab/cache/keys/ctl by default.

Questions

  1. Where should this command live hab sup secret generate-tls was used because we already had hab sup secret generate for creating the ctl secret token, but it does not feel like quite the right place.
  2. Is /hab/cache/keys/ctl a good place to store the keys and certs

Start the Supervisor with TLS Enabled

This PR adds three new flags to the hab sup run command.

--ctl-client-ca-certificate <CTL_CLIENT_CA_CERTIFICATE>
    Enable client authentication for the control gateway and set the certificate authority to use when
    authenticating the client [default: /hab/cache/keys/ctl]
--ctl-server-certificate <CTL_SERVER_CERTIFICATE>
    The control gateway server certificate to use for TLS [default: /hab/cache/keys/ctl
--ctl-server-key <CTL_SERVER_KEY>
    Enable TLS for the control gateway and set the private key
    
    See `ctl-server-certificate` and `ctl-client-certificate` for additional settings. [default:
    /hab/cache/keys/ctl]

If these paths are pointed at a directory (eg /hab/cache/keys/ctl by default) they will use the latest file matching the ctl-gateway-<timestamp>.crt|key.pem format. If these paths point to a file --ctl-client-ca-certificate and --ctl-server-certificate will try to read PEM formatted certificates and --ctl-server-key will try to read PEM, PKCS8 private keys. If --ctl-client-ca-certificate is not specified no client authentication will be used.

Connect with the Habitat CLI Client

This is currently the least polished part of the process. All configuration is accomplished through three environment variables
HAB_CTL_GATEWAY_CLIENT_CERTIFICATE, HAB_CTL_GATEWAY_CLIENT_KEY HAB_CTL_GATEWAY_SERVER_CA_CERTIFICATE these variables are the client-side equivalent of the hab sup run CLI arguments. Environment variables are used, as opposed to CLI flags, due to the large number of commands requiring these options.

This PR now adds five fields to cli.toml instead of using environment variables: listen_ctl, ctl_client_certificate, ctl_client_key, ctl_server_ca_certificate, and ctl_server_name_indication.

Questions
3. How should we allow these options to be set? Leave the environment variables, add CLI options, use a config file as we do with /hab/sup/default/CTL_SECRET
4. Currently disabling server authentication is not possible. Would that be a desired feature?
5. Should we add logic to automatically detect the server is using TLS and upgrade the client appropriately? Or should we require the user to specify that a TLS connection should be used?

Example with both Client and Server Authentication

# Generate new keys
> hab sup secret generate-tls --subject-alternative-name localhost
# Start the server with TLS using the default values (ie the key and cert we just generated in `/hab/cache/keys/ctl`)
> hab sup run --ctl-server-certificate --ctl-server-key --ctl-client-ca-certificate
# Start the client with the defaults and get the service status. Note setting `--remote-sup` to `localhost` is important. It must match the `--subject-alternative-name` we used when generating the key
> env HAB_CTL_GATEWAY_CLIENT_CERTIFICATE=/hab/cache/keys/ctl \
  env HAB_CTL_GATEWAY_CLIENT_KEY=/hab/cache/keys/ctl \
  env HAB_CTL_GATEWAY_SERVER_CA_CERTIFICATE=/hab/cache/keys/ctl \
  hab sup status --remote-sup localhost

Example with only Server Authentication

# Generate new keys
> hab sup secret generate-tls --subject-alternative-name localhost
# Start the server with TLS using the default values (ie the key and cert we just generated in `/hab/cache/keys/ctl`)
> hab sup run --ctl-server-certificate --ctl-server-key
# Start the client with the defaults and get the service status. Note setting `--remote-sup` to `localhost` is important. It must match the `--subject-alternative-name` we used when generating the key
> env HAB_CTL_GATEWAY_SERVER_CA_CERTIFICATE=/hab/cache/keys/ctl \
  hab sup status --remote-sup localhost

Other Questions
6. Currently the traffic sent over the TLS connection is identical to what is sent when not using TLS. Specifically, we still send the CTL_SECRET and do our custom handshake. How should we handle this?

TODO

  • Refine the UI/UX
  • Investigate all error conditions and ensure the error messages are reasonably helpful
  • Add testing

@@ -47,9 +47,7 @@ prost-types = "*"
rand = "*"
rants = { version = "*", features = ["native-tls"] }
regex = "*"
# Pinning for now. Since upgrading to 0.17.0 results in conflicts with other crates
# See https://github.com/habitat-sh/habitat/issues/7523
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we'll be able to close this issue, then?

Cursor},
path::{Path,
PathBuf}};
use thiserror::Error;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

return Err(Error::FailedToReadCertificatesFromRootStore(failed));
}
Ok(root_certificate_store)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The various X_from_buf functions look like they could be private, and private_key_from_buf appears unused.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private_keys_from_file also looks like it could be private.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it would also be better to use BufReader instead of Cursor for these functions. Using fs::read will pull the contents of the entire file into memory, which we don't really need to do in this case. I think Cursor is mainly useful for taking data that already lives in memory, but adapting it for the kind of operations of BufRead. Since we don't already have this in memory, but have to read it off disk anyway, this seems like an unnecessary complication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Addressed in a0e8d3b


/// Upgrade a `TcpStream` into a `TlsStream` using server configuration
pub async fn to_tls_server(self, tls_config: Arc<TlsServerConfig>) -> io::Result<Self> {
Ok(match self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this match block is so big, I think it might be a little more clear to compute the new value, and then have Ok(stream) be the final line of the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Addressed in 28bd76a

.map_err(|e| e.0)?;
Self::TlsStream(TlsStream::Server(tls_stream))
}
Self::TlsStream(stream) => Self::TlsStream(stream),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if having a stream.is_tls() function might be nice here. That could make it a little more clear that this branch is really returning the incoming TLS stream unchanged.

My main motivation here is that TlsStream and TcpStream look very orthographically similar at a glance, so taking pains to make it incredibly clear what's going on could be a big win.

if self.is_tls() {
    Ok(self)
} else {
    // do stuff
    Ok(upgraded)
}

Another thought: would it ever be plausible to want to change the TLS configuration of an existing TLS stream? Would it be better to call this function maybe_upgrade_to_tls? 🤔

Copy link
Contributor Author

@davidMcneil davidMcneil Sep 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if having a stream.is_tls() function might be nice here

I think we need to do some type of matching/destructuring in order to access the interior stream in the non-tls case. I could use if let here, but I generally prefer match in these situations so the compiler will direct future developers to conditions based on the enum variant.

My main motivation here is that TlsStream and TcpStream look very orthographically similar at a glance

Agreed, maybe RawTcpStream and TlsTcpStream would be better names?

Another thought: would it ever be plausible to want to change the TLS configuration of an existing TLS stream? Would it be better to call this function maybe_upgrade_to_tls?

Probably not. I made those methods private and added the maybe_ prefix which I agree makes it more clear.

Changes in 28bd76a

};
Ok(Self { path, private_key })
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comments re: FromStr vs TryFrom<Path> apply here, as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same explanation as #7923 (comment)

type Error = habitat_core::Error;

fn try_from(s: &str) -> Result<Self, Self::Error> { Self::from_str(s) }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments apply.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same explanation as #7923 (comment)


impl From<PrivateKey> for String {
fn from(pkg_ident: PrivateKey) -> Self { pkg_ident.to_string() }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments apply.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same explanation as #7923 (comment)

}

impl RootCertificateStore {
pub fn root_certificate_store(self) -> RootCertStore { self.root_certificate_store }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a bit more idiomatic as into_inner, or perhaps just an Into or From implementation.

We're basically using RootCertificateStore as a way to get information from the user interacting with our CLI down to the point in the code when we can start to actually use the underlying TLS machinery. Really, it's just a way to tie a file path from the user to a certificate store.

Viewed in that light, maybe RootCertificateStore is actually just RootCertificateStore(PathBuf), which then has a From<RootCertificateStore> for RootCertStore implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This same reasoning applies to the PrivateKey type above, as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a bit more idiomatic as into_inner

Agreed. Addressed in 4dae869

Viewed in that light, maybe RootCertificateStore is actually just RootCertificateStore(PathBuf), which then has a From<RootCertificateStore> for RootCertStore implementation.

This is certainly possible and I agree makes more sense. However, by having RootCertificateStore parse the file we can catch errors as part of CLI parsing. If RootCertificateStore only had to be a valid PathBuf that would require parsing that PathBuf deep in the code and then bubbling that error back up. I would rather fail early, but I think there are arguments to be made for both approaches.

@@ -141,24 +144,35 @@ impl SrvClient {
secret_key: &str,
request: impl Into<SrvMessage> + fmt::Debug)
-> Result<impl Stream<Item = Result<SrvMessage, io::Error>>, SrvClientError> {
let socket = TcpStream::connect(address.as_ref()).await?;
let mut socket = Framed::new(socket, SrvCodec::new());
let tcp_stream = TcpStream::connect(address.as_ref()).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good... tcp_stream is a better variable name than socket 👍

-> Result<(), Error> {
let mut params = CertificateParams::new(vec![subject_alternate_name.to_string(),
"localhost".to_string(),
"127.0.0.1".to_string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it will matter for our use here, but one strange thing about this library is that it assumes all subject alt names are dnsNames:

          X509v3 Subject Alternative Name: 
                DNS:foo, DNS:localhost, DNS:127.0.0.1, DNS:0.0.0.0

Ideally, IP addresses would get ip address entries there.

Copy link
Contributor Author

@davidMcneil davidMcneil Sep 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, nice catch! I should remove the IP addresses. The tokio_rustls library does not allow using an IP address when specifying the domain. As a result, adding those IP addresses does not buy us anything.

Copy link
Contributor

@stevendanna stevendanna Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine the implication then is also that users who want to use this feature for remote supervisors will have to use a hostname rather than an IP to connect.

One nice "development" feature you might consider is allowing the client to specify the server name via some option that then gets passed through to the SNI we send from the client. This would allow doing small tests where you specify the IP address as the host to connect to while still allowing the hostname verification to work. It isn't a big deal but can be helpful sometime for situation where you can't get DNS to work out consistently for one reason or another.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, that is a nice feature. The field ctl_server_name_indication in cli.toml allows setting this.

@netlify
Copy link

netlify bot commented Sep 22, 2020

Deploy preview for chef-habitat processing.

Building with commit 86c97bc

https://app.netlify.com/sites/chef-habitat/deploys/5f7643c9231ed90007085964

Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
@davidMcneil davidMcneil marked this pull request as ready for review September 29, 2020 19:31
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
let mut params =
CertificateParams::new(vec![subject_alternate_name.to_string(), "localhost".to_string(),]);
let mut distinguished_name = DistinguishedName::new();
distinguished_name.push(DnType::CommonName, "Habitat Supervisor Control Gateway");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought a CommonName had to be a FQDN or wildcard domain name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I switched to use DynType::Organization. Addressed in 34a7f27

It's odd it looks like changing it does not actually change the cert. The default value uses an invalid common name https://github.com/est31/rcgen/blob/ac4813acbbe90c458fbb016b5a08631edecabeca/src/lib.rs#L458

path: impl AsRef<Path>)
-> Result<(), Error> {
let mut params =
CertificateParams::new(vec![subject_alternate_name.to_string(), "localhost".to_string(),]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the subject alternate name is supposed to be a valid DNS name... do we check for that anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Addressed in 6e5fa2a

Ok(())
}

fn get_latest_path(path: impl AsRef<Path>, pattern: &str) -> Result<PathBuf, Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, path is really a directory, while pattern describes files within that directory, correct?

If so, it would be nice to have the variable names reflect that, along with some documentation comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 86c97bc

Ok(rustls_wrapper::private_key_from_file(&path)?)
}

pub fn latest_root_certificate_store(path: impl AsRef<Path>) -> Result<RootCertStore, Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that these functions are in this module, but are called from tls::rustls_wrapper::types and call functions from tls::rustls_wrapper themselves suggests that maybe things are not located in the right place in the module hierarchy.

Over in tls::rustls_wrapper::types, the only thing that seems to distinguish things is whether something is called with a directory or a file path, whereas over here, it seems like "control gateway-ness" of the files is the distinguishing feature, which seem at odds with each other.

(On the other hand, generate_self_signed_certificate_and_key definitely pertains to the control gateway, so its location here makes complete sense.)

Is there a better organization that would help clarify this?

net::{IpAddr,
SocketAddr},
path::PathBuf,
str::FromStr};
use structopt::{clap::AppSettings,
StructOpt};

const HAB_CTL_KEYS_CACHE: &str = "/hab/cache/keys/ctl";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to put this into habitat_core::fs for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 4ed067a

Some(Arc::new(tls_config))
} else {
None
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Rather than having this function implicitly depend on CliConfig, it might be more clear to pass that in as an argument.

Possibly better would be to give CliConfig a function that could generate the necessary TLS config object, and then pass that into this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 this is a difficult question; I wrestled with it as well. I actually made the opposite decision and removed the secret parameter from the request function. My reasoning was that whenever request was called we did the exact same thing: lookup the secret and then pass it to the function. This simply complicates the logic when we go to call request. Looking up the TLS parameters would be the same story. I agree it does not feel 100% correct to encapsulate this logic inside request. However, I do think we want it encapsulated somewhere. I am inclined to leave it in request for now and if we have a need for a more general request function we can make the change then. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could split the difference... still depend implicitly on CliConfig within the function, but encapsulate the TLS parameter creation as a method on CliConfig. If that's not too gnarly, that seems like a more natural location for the logic than here (similar arguments might be made for the server TLS config).

If that looks gross, though, we could leave it as is for the time being.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that sounds like a good approach. Addressed in 91a106e

Some(Arc::new(tls_config))
} else {
None
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, it might be better for the CtlGatewayServer to be able to take an optional TlsServerConfig, rather than having the logic for creating that embedded here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 8643bcf

Signed-off-by: David McNeil <mcneil.david2@gmail.com>
params.distinguished_name = distinguished_name;
params.alg = &PKCS_ECDSA_P256_SHA256;

let certificate = RcgenCertificate::from_params(params)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the current use this likely doesn't matter since we also control the client, but I noted that this library uses a fixed Serial Number if you don't provide one:

        Serial Number: 42 (0x2a)

we've hit problems related to reused serial numbers in various browsers. I've wrote a few words about it here:

https://github.com/chef/automate/blob/b42fa46102d7fb0aad9bdfb236e16d40ccf55e05/api/config/deployment/init_config.go#L354-L369

Might be worth fixing just in case this function gets reused for other purposes.

In the past we've also hit problems with self-signed certificates where the CA:TRUE basic constraint wasn't set (setting this constraint also requires adding KeyUsageCertSign to the key usage extensions. However, we also have a lot of self-sgined certs out in the wild that don't set this and haven't hit a problem so perhaps most of the clients that cared about this aren't an issue.

Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Copy link
Contributor

@christophermaier christophermaier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@christophermaier christophermaier merged commit 0f03c89 into master Oct 15, 2020
christophermaier added a commit that referenced this pull request Oct 16, 2020
…tls"

This reverts commit 0f03c89, reversing
changes made to e3d33d0.

This is a temporary action, pending some investigation of e2e test failures.
christophermaier added a commit that referenced this pull request Oct 16, 2020
Revert "Merge pull request #7923 from habitat-sh/dmcneil/ctl-gateway-…
christophermaier added a commit that referenced this pull request Oct 23, 2020
This is a second attempt at merging #7923.

All changes are by @davidMcneil; I'm just the messenger.

This reverts commit 5494773, reversing
changes made to 59e3c3b.

Signed-off-by: David McNeil <mcneil.david2@gmail.com>
Signed-off-by: Christopher Maier <cmaier@chef.io>
christophermaier added a commit that referenced this pull request Oct 23, 2020
Per [the migration guide][1] this wraps `AppState` as `Data<AppState>`
to prevent panics in the HTTP gateway.

The rest of the code currently assumes the data is returned in this
way; future refactorings may allow us to drop the `Data` wrapper
entirely, but this is the minimal change for now.

Actix 3.0 was introduced in #7949 / #7923.

[1]: https://github.com/actix/actix-web/blob/master/MIGRATION.md#300

Signed-off-by: Christopher Maier <cmaier@chef.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

relax version pin on rustls crate
3 participants