Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update http-client and builder-api-client from hyper to reqwest. #6759

Merged
merged 3 commits into from Aug 7, 2019

Conversation

@chefsalim
Copy link
Member

commented Jul 25, 2019

This change has two significant components:

  1. Migrate http-client and builder-api-client crates from hyper to reqwest
  2. Modify the cert handling to be more flexible and sane

The reqwest migration allows us to remove custom handlers for proxy and SSL support and use the library implementations instead. It also simplifies the handling in the builder api client crate. It also allows us to build on newer versions of Ubuntu which have updated openssl libraries.

The cert handling changes allow more robustness and flexibility in the client. We now add the cacerts to the root certificate list instead of replacing it. We also allow dropping new SSL certs into the hab cache ssl directory that also get added to the root certificate list. Finally we can now support both PEM and DER cert formats.

The behavior in cert handling for the Mac platform has also been brought in line with the other platforms. Previously, we did basically left the Mac to its system certs. Now the same logic is used consistently.

Resolves #6682 and #6460

Signed-off-by: Salim Alam salam@chef.io


This change is Reviewable

@chefsalim chefsalim force-pushed the SA/update-reqwest branch from 2836641 to 6651b47 Jul 25, 2019
@mwrock

This comment has been minimized.

Copy link
Contributor

commented Jul 25, 2019

I have not reviewed the code yet but I have validated that this builds and successfully interacts with bldr on Windows.

@@ -32,8 +37,8 @@ use crate::{error::{Error,
OriginKeyIdent,
SchedulerResponse};

This comment has been minimized.

Copy link
@raskchanky

raskchanky Jul 25, 2019

Member

We typically delete all the blank lines in between use clauses now and let rustfmt sort them.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 25, 2019

Author Member

Will do

@@ -31,8 +30,7 @@ use crate::{error::{Error,
PackageIdent,
PackageTarget},
ChannelIdent},
hab_http::{util::decoded_response,
ApiClient},

This comment has been minimized.

Copy link
@raskchanky

raskchanky Jul 25, 2019

Member

Same comment here as above re: deleting all the blank lines in between the use statements.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 25, 2019

Author Member

👍

.get(X_FILENAME)
.expect("XFileName missing from response")
.to_str()
.expect("Invalid X-Filename");

This comment has been minimized.

Copy link
@raskchanky

raskchanky Jul 25, 2019

Member

I notice this pattern is repeated a lot, where in order to retrieve the value of some header, we have to call headers().get(NAME_OF_HEADER).expect("something's missing").to_str().expect("Invalid something or other").

It'd be nice if there was a way to abstract that out into something that could be reused every time we need to get a header value out, ideally without the need to call expect() twice, but at the very least just to avoid all the repetition.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 25, 2019

Author Member

Good point, will look at that

} else {
let mut encoded = String::new();
resp.read_to_string(&mut encoded).map_err(Error::IO)?;
trace!("Body: {:?}", encoded);

This comment has been minimized.

Copy link
@raskchanky

raskchanky Jul 25, 2019

Member

Good call on moving this to trace!.

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

An idea just occurred to me for this stuff. It would also be something that justifies its own PR, but I figured I'd mention it.

One interesting thing we might want to consider here (at least until we can move our logging over to some kind of tagged / structured logging system, that is), is to add a target to the log call, like so:

trace!(target: "builder_api_traffic::fetch_rdeps::body", "{:?}", encoded);

If you don't explicitly set a target, it defaults to the module you're currently in (thus the module-like ::-delimited path). This is what you're actually operating on when you do something like RUST_LOG=foo::bar::baz=trace.

However, nothing stops you from adding your own targets. This basically lets us establish our own custom parallel logging hierarchy. If you want to see all traffic at trace level, use builder_api_traffic=trace. If you only want to see the info for this particular API call, narrow it to builder_api_traffic::fetch_rdeps. I added ::body as an example of how to further refine, but that's not needed.

Ideally, we'd have structured logging so that we could just annotate what's happening here (tack on IDs, status codes, header information, responses, etc.). Depending on how things are actually happening under the hood, we might already be able to use log_mdc for this (which I've been using on some other in-progress work; however using it with futures requires a bit of extra consideration). However, until we get there, we could use this parallel hierarchy technique to squeeze a bit more functionality and usefulness out of the base logging facilities to extract more information out of our code.

If we were to go this route, I'd recommend putting the logic behind some helper functions, both to ensure we're logging consistently and with the same overall structure, and to ease a future transition to a real structured logging setup.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 26, 2019

Author Member

Interesting idea... I didn't realize you could add a target to the trace command... adding that could indeed be very useful... lets start with that and we can discuss the future state of logging separately

} else {
Err(err_from_response(res))
Ok(())

This comment has been minimized.

Copy link
@raskchanky

raskchanky Jul 25, 2019

Member

Slight nitpick, but I think the logic for this read better how it originally was:

if status == NoContent { Ok } else { Err }

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 2, 2019

Author Member

Addressed by other fixes

if resp.status() != StatusCode::OK {
Err(err_from_response(resp))
} else {
Ok(())

This comment has been minimized.

Copy link
@raskchanky

raskchanky Jul 25, 2019

Member

I left a similar comment previously, and this is totally personal preference, but I think this reads better as:

if resp.status() == StatusCode::Ok {
    Ok(())
} else {
    Err(err_from_response(resp))
}

Similarly elsewhere (there's a lot of these if statements).

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 25, 2019

Author Member

I went back and forth on this, and ended up keeping the negated checks first (as there were already many of those) and after a while it seemed to read a bit better that way to me. I'll look at it again...

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

Thinking a bit more, I get why you might want to get the negated scenario out of the way first. If you know it's bad, just get rid of it now; otherwise move on.

I was messing around a bit, and a nice alternative might be a macro like this:

macro_rules! fail_if_not {
    ($resp:expr, $status:expr) => {
        if $resp.status() != $status {
            return Err(err_from_response($resp));
        }
    };
}

This is nice because you can recast the above code like this:

fail_if_not!(resp, StatusCode::OK);
Ok(())

When the non-error case is more complex, the benefits are greater.

This lets you quickly get the errors out of the way, and lets you do so in a pretty descriptive and low-overhead way. You also don't need to add an extra level of indentation to the code with an if/else block.

I played around with a function formulation, but then you have to concern yourself with return types and potentially reassigning the response, which can be done and is valid, but adds a bit of unnecessary ceremony and line noise in my opinion.

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

And because it's a macro, you could also modify it to accept one-or-more status codes, a pattern I see in at least one other place in this module.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 26, 2019

Author Member

Nice... a set of macros for these common patterns could be useful..

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 2, 2019

Author Member

Addressed by other fixes

Copy link
Contributor

left a comment

Still working through this, but I've gotta run to dinner now.

Overall it's looking pretty good. I think we could take this opportunity to do some further cleanup, as there's still a lot of boilerplate. But I'm very much liking where it's going!

header! { (XFileName, "X-Artifactory-Filename") => [String] }
header! { (XJFrogArtApi, "X-JFrog-Art-Api") => [String] }
const X_JFROG_ART_API: &str = "x-jfrog-art-api";
const X_ARTIFACTORY_FILENAME: &str = "x-artifactory-filename";

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 25, 2019

Contributor

Why the case change?

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 25, 2019

Author Member

HeaderName::from_static requires the string be lowercase

.expect("X-Artifactory-Filename missing from response")
.to_string();
.to_str()
.expect("Invalid X-Artifactory-Filename");

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 25, 2019

Contributor

Is a panic the appropriate action to take here? Wouldn't an Err be better?

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 25, 2019

Author Member

Mostly convenience.. I'll look at encapsulating out this API as its a pain

.get(CONTENT_LENGTH)
.expect("Content length missing")
.to_str()
.expect("Content length invalid");

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 25, 2019

Contributor

Same re: expect

It's also a bit unexpected to panic on a failed to_str() call here, but then use unwrap_or_else when parsing that string to a u64 right below. Would those not be the same kind of error (i.e., "you gave me trash for content length; I'm going to set it to 0")?

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 25, 2019

Author Member

Yep

.get(X_FILENAME)
.expect("XFileName missing from response")
.to_str()
.expect("Invalid X-Filename");

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 25, 2019

Contributor

Same questions re: expect usage vs. error

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 3, 2019

Author Member

Fixed

.expect("Content length missing")
.to_str()
.expect("Content length invalid");
let size = cl.parse::<u64>().unwrap_or_else(|_| 0);

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 25, 2019

Contributor

Ditto re: expect and parsing

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 27, 2019

Contributor

And same comments as before re turbofish and the cl binding

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 2, 2019

Author Member

Fixed

if res.status != StatusCode::Ok {
return Err(err_from_response(res));
if resp.status() != StatusCode::OK {
Err(err_from_response(resp))

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 25, 2019

Contributor

positive first

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 25, 2019

Author Member

Will look at switching

.map_err(Error::BadResponseBody)?;
trace!("Body: {:?}", encoded);

let secret_keys: Vec<String> =

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 25, 2019

Contributor

You don't actually need the Vec<String> type hint here, since it can be inferred from the function's return type.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 25, 2019

Author Member

Will update

.map_err(Error::BadResponseBody)?;
trace!("Body: {:?}", encoded);

let revisions: Vec<OriginKeyIdent> =

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 25, 2019

Contributor

Don't need the type hint here either.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 25, 2019

Author Member

OK

.map_err(Error::BadResponseBody)?;
trace!("Body: {:?}", encoded);

let channels: Vec<String> = serde_json::from_str::<Vec<String>>(&encoded)?.into_iter()

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 25, 2019

Contributor

Don't need the first Vec<String> type hint here either.

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 25, 2019

Contributor

Actually, you can simplify this and many other functions like this:

if resp.status() == StatusCode::OK {
  Ok(resp.json()?)
} else {
  Err(err_from_response(resp))
}

That pattern itself could also be abstracted to a helper function if you like.

You'd forego the ability to trace the body of the response, though (I notice that the body isn't always logged in all the functions that parse JSON, however). If that's really important, though, encapsulating this general pattern into something more reusable would still be a big win.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 25, 2019

Author Member

I did that and ended up reverting that change since the ability to trace the body could be pretty valuable for debugging

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 25, 2019

Author Member

Will look at a helper

if resp.status() != StatusCode::OK {
Err(err_from_response(resp))
} else {
Ok(())

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 25, 2019

Contributor

Actually, this pattern could be extracted in a few ways. One potentially nifty way would be to use an extension trait (which would also be a natural place to add some smart JSON parsing, and other common functionality, too).

This basically lets you add your own custom code to a Response!

trait ResponseExt {
    fn ok_if_status(self, code: reqwest::StatusCode) -> Result<()>;
    // add more functions, of course
}

impl ResponseExt for reqwest::Response {
    fn ok_if_status(self, code: reqwest::StatusCode) -> Result<()> {
        if self.status() == code {
            Ok(())
        } else {
            Err(err_from_response(self))
        }
    }
}

(You can of course add whatever logging you like, too, ensuring that's also consistent everywhere.)

Then, instead of writing this all the time:

if resp.status() != StatusCode::OK {
    Err(err_from_response(resp))
} else {
    Ok(())
}

you get to write:

resp.ok_if_status(StatusCode::OK)

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 25, 2019

Contributor

You could also add something like this to the trait to start abstracting the common "parse the response as a JSON and return the type" pattern:

    fn parse_json_if_status<T: serde::de::DeserializeOwned>(mut self,
                                                            code: reqwest::StatusCode)
                                                            -> Result<T> {
        if self.status() == code {
            Ok(self.json()?)
        } else {
            Err(err_from_response(self))
        }
    }

then just call

resp.parse_json_if_status(StatusCode::OK)

etc.

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 25, 2019

Contributor

(these suggestions can all be modified, of course; I just wanted to provide an example)

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 26, 2019

Author Member

Cool... I like it!

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 27, 2019

Contributor

Since this is the big discussion thread for it, I'll reiterate that I think we should be using reqwest::Response::error_for_status where we can and moving the err_from_response logic into the From<reqwest::Error> for Error code.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 1, 2019

Author Member

Between these two approaches, I'm inclined to go with the extension trait. It solves a couple of different issues - the error handling, as well as abstracting out things like headers in a nicer way. It also lets us return error for specific status codes which the error_for_status does not. Also the Fromreqwest::Error error logic doesn't seem right when moving err_from_response there, since we would have to handle both error from status case as well as a regular reqwest error case.

net::ProxyHttpsConnector,
proxy::{proxy_unless_domain_exempted,
ProxyInfo},
ssl};

This comment has been minimized.

Copy link
@raskchanky

raskchanky Jul 25, 2019

Member

Same comment as previously re: deleting blank lines in the use statements.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 2, 2019

Author Member

OK

@baumanj

This comment has been minimized.

Copy link
Contributor

commented Jul 26, 2019

Wasn't able to get to this today, but will plan to take a look first thing tomorrow.

@@ -399,28 +409,27 @@ impl BuilderAPIProvider for BuilderAPIClient {
token: &str,
promote: bool)
-> Result<()> {
debug!("Promote/demote for group: {}, channel: {}",
group_id, channel);

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

It would be useful to include the concrete operation being performed here (either promote or demote), rather than being ambiguous.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 26, 2019

Author Member

Ha, yes...

let resp = self.add_authz(self.0.post(&url), token)
.json(&body)
.send()?;
debug!("Status: {:?}", resp.status());

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

Going back to my comment above re: structured logging, will logging just the status here, absent any other information about a specific request, be useful?

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 26, 2019

Author Member

It's semi useful atm... I did some work to make the logging a bit cleaner and consistent but that definitely hasn't been the main thrust of this PR...

@@ -473,24 +482,24 @@ impl BuilderAPIProvider for BuilderAPIClient {
key: &str,
secret: &WrappedSealedBox)
-> Result<()> {
debug!("Creating origin secret: {}, {}", origin, key);

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

Seeing "key" in a logging statement (in a function named create_origin_secret, no less) got me concerned about leaking confidential information in logs.

It looks like "key" in this context refers to a hash map key though, so that's fine for logging. However, it might be good to change the argument name from key to something a bit more descriptive (and less concerning!) like name.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 26, 2019

Author Member

OK

@@ -499,15 +508,18 @@ impl BuilderAPIProvider for BuilderAPIClient {
///
/// * Remote Builder is not available
fn delete_origin_secret(&self, origin: &str, token: &str, key: &str) -> Result<()> {
debug!("Deleting origin secret: {}, {}", origin, key);

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

Same suggestion regarding the name "key".

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 2, 2019

Author Member

OK

@@ -902,8 +948,9 @@ impl BuilderAPIProvider for BuilderAPIClient {
let ident = pa.ident()?;
let target = pa.target()?;

let mut file =
File::open(&pa.path).map_err(|e| Error::PackageReadError(pa.path.clone(), e))?;
debug!("Uploading package {}, target {}", ident, target);

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

I am soooooo looking forward to a day when target and ident can be part of a single unified identifier.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 26, 2019

Author Member

That's part of "v1" right? :)

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

It'd be real nice!

///
/// # Errors
///
/// * If system information cannot be obtained via `uname`

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

Why remove this comment... it seems incredibly useful!

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 2, 2019

Author Member

I'll add back in

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 2, 2019

Author Member

Will add back


if let Ok(pkg_install) = PackageInstall::load(&cacerts_ident, fs_root_path) {
let pkg_certs = pkg_install.installed_path().join("ssl/cert.pem");
debug!("Found installed certs: {}", pkg_certs.display());

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

It might be a good idea to swap this line with the one below it, so that we only say we found certs at that location once we've determined that they are actually there. If for some reason they weren't, it could be confusing to say that "we found them!", only to immediately error out because they're not there or are invalid somehow.

It would also be nice to have a debug! message immediately upon entering this if block saying that we're retrieving certs from a core/cacerts package (and specifically which release of that package).

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 3, 2019

Author Member

I'll update the debug to be clearer - that line is to output debug that an installed core/cacerts Hab package was found, which is true if PackageInstall::load succeeds

let cert = cert_from_file(&pkg_certs)?;
certificates.push(cert);
} else {
debug!("No installed cacerts package found");

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

Just for absolute clarity, it would be nice to say something like No installed Habitat core/cacerts package found from which to install certificates. Saying just "cacerts package" could be confusing to newer users, who could reasonably think that refers to their system cert package.

Indicating why we were looking for a cacerts package in the first place would also be useful.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 3, 2019

Author Member

This is not meant to be text for end users.. will look at updating

let cached_certs = cache_ssl_path(fs_root_path).join("cert.pem");
if !cached_certs.exists() {
fs::create_dir_all(cache_ssl_path(fs_root_path))?;
debug!("Creating cached cacert.pem: {}", cached_certs.display());

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

Continuing in the vein of being more descriptive, saying something like "Adding embedded certificate file to Habitat SSL cache path {} as fallback" would be nice.

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

I wonder if these kind of messages should actually be at the info level 🤔

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 3, 2019

Author Member

It would be nice to expose cert info as part of the regular UI output (which is where I think you're going with this) but non-trivial enough that I think its out of scope for this PR

if let Some(ext) = path.extension() {
if path.is_file() && ((ext == "pem") || (ext == "der")) {
debug!("Found cached cert: {}", path.display());
let cert = cert_from_file(&path)?;

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

If one of these files is corrupt or otherwise unreadable, do we want to fail the entire operation, or log the error and skip that file?

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 5, 2019

Author Member

Well thats a great question - we can certainly log the error and move on. That could also "work" for the OSX case where we currently are not able to load the PEM file atm.

This comment has been minimized.

Copy link
@baumanj

baumanj Aug 5, 2019

Contributor

My suggestion (to minimize painting ourselves into a corner) would be to fail the whole operation currently. Later, we can always add a flag (or configuration item) to continue on error, but we'd have a much harder time changing it in the other direction.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 5, 2019

Author Member

Gracefully handling errors and not failing the entire operation allows us to remove the check for PEM/DER entirely, which is a definite simplification, and also better as we can now handle PEM formatted CRT files as well.

This comment has been minimized.

Copy link
@baumanj

baumanj Aug 5, 2019

Contributor

Seems like we could try all the available formats and then, if they all gave errors, fail the operation, no?

let mut buf = Vec::new();
File::open(file_path)?.read_to_end(&mut buf)?;

let ext = file_path.extension().unwrap(); // unwrap Ok

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

This is only "ok" because of how this function is currently being called, because we check the existence and validity of the extension before we call it.

That seems fragile to me, since it could be easy to unwittingly break with a refactor.

It also means we're validating the extension twice, which is a bit of a bummer, since it's a) extra work and b) splits up the logic across multiple sites.

I noodled around with some code, but I'm not quite sure how I feel about it, even though it would address my concerns. It feels a bit heavy, and we'd still need to deal with OsStrs, which is what we actually get from the extension() method.

enum CertFileType {
    PEM,
    DER,
}

impl CertFileType {
    fn parse_fn(&self) -> impl Fn(&[u8]) -> std::result::Result<Certificate, reqwest::Error> {
        match self {
            CertFileType::PEM => Certificate::from_pem,
            CertFileType::DER => Certificate::from_der,
        }
    }
}

impl FromStr for CertFileType {
    type Err = (); // this would maybe be something else in real code.

    fn from_str(s: &str) -> std::result::Result<Self, Self::Err> {
        match s {
            "pem" => Ok(CertFileType::PEM),
            "der" => Ok(CertFileType::DER),
            _ => Err(()),
        }
    }
}

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 5, 2019

Author Member

Happily, needing to look at the extension is no longer needed

Certificate::from_pem(&buf).map_err(Error::ReqwestError)
} else {
Certificate::from_der(&buf).map_err(Error::ReqwestError)
}

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

To reduce the duplication here, you could do something like this:

let fun = if ext == "pem" {
    Certificate::from_pem
} else {
    Certificate::from_der
};

fun(&buf).map_err(Error::ReqwestError)

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 31, 2019

Contributor

Per my earlier suggestion, we could ignore the extension entirely and just do:

    Certificate::from_pem(&buf).or_else(|_| Certificate::from_der(&buf))
                               .map_err(Error::ReqwestError)

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 5, 2019

Author Member

Yep, going with Jon's suggestion on this...

@@ -74,7 +74,7 @@ lazy_static! {

/// Generate a new secret key used for authenticating clients to the `CtlGateway`.
pub fn generate_secret_key(out: &mut String) {
let mut rng = rand::rngs::OsRng::new().unwrap();
let mut rng = rand::rngs::OsRng;

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

Why did this change?

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 26, 2019

Author Member

Updated rand deprecates the old way - the new one seems better as there is no unwrap

.set(jemalloc_ctl::stats::retained::mib().unwrap()
.read()
.unwrap()
.to_i64());

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

Why did all these change?

I'm a bit uneasy with all the unwraps... can we be certain they're safe to do?

At the very least, they should be expect calls, so we can get additional context if one of them fires.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Jul 26, 2019

Author Member

The version of jemalloc got bumped in the dep chain... in general seems fine to pick up an updated version, the only change is that the stats is under a mib namespace and needed another unwrap than what was already there.

let path = entry.path();

if let Some(ext) = path.extension() {
if path.is_file() && ((ext == "pem") || (ext == "der")) {

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 26, 2019

Contributor

What happens if someone has symlinked their certs from somewhere else?

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 31, 2019

Contributor

According to the docs:

This function will traverse symbolic links to query information about the destination file. In case of broken symbolic links this will return false.

Or, are you wondering about the case where foo is a symlink with target bar.pem and we reject it based on the fact that foo lacks the appropriate extension?

I would argue that we'd be better off not even caring about the extension and just trying to process the file first as a PEM and if that fails as a DER. As far as I can tell, there shouldn't ever be ambiguity since one format is ASCII and the other is binary.

This comment has been minimized.

Copy link
@christophermaier

christophermaier Jul 31, 2019

Contributor

I kinda like the idea of ignoring the extension and just trying both conversions in sequence. I think that would simplify quite a bit of code.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 5, 2019

Author Member

Yep, agreed

@chefsalim chefsalim force-pushed the SA/update-reqwest branch from 6651b47 to 50af071 Jul 26, 2019
@baumanj

This comment has been minimized.

Copy link
Contributor

commented Jul 26, 2019

Does this address #5597 and the need for the export SSL_CERT_FILE=/usr/local/etc/openssl/cert.pem macOS workaround generally?

@christophermaier

This comment has been minimized.

Copy link
Contributor

commented Jul 26, 2019

@baumanj That's my suspicion and hope.

Also update SSL cert handling to be additive and allow drop in certs.

Signed-off-by: Salim Alam <salam@chef.io>
@chefsalim chefsalim force-pushed the SA/update-reqwest branch from 50af071 to 5762710 Jul 26, 2019
Copy link
Contributor

left a comment

I'm not 100% through, but wanted to get some initial feedback to you since I'll be out on Monday.

Overall, this is really awesome!

.expect("Content length missing")
.to_str()
.expect("Content length invalid");
let size = cl.parse::<u64>().unwrap_or_else(|_| 0);

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 27, 2019

Contributor
Suggested change
let size = cl.parse::<u64>().unwrap_or_else(|_| 0);
let size = cl.parse().unwrap_or_default();

The turbofish isn't necessary because the type is inferred via the call to progress.size(…).

While you're at it, why not just add this to the end of the chain of functions off res and eliminate the cl binding?

                     let size = res.headers()
                                   .get(CONTENT_LENGTH)
                                   .expect("Content length missing")
                                   .to_str()
                                   .expect("Content length invalid")
                                   .parse()
                                   .unwrap_or_default();

                     progress.size(size);

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 1, 2019

Author Member

Will fix

.body(Body::SizedBody(&mut file, file_size))
.send()
let body = Body::sized(file, file_size);
self.add_authz(self.0.put(&path), token).body(body).send()?

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 27, 2019

Contributor

Since L374 and 377 are identical, why not?

        let body = if let Some(mut progress) = progress {
            progress.size(file_size);
            let reader = TeeReader::new(file, progress);
            Body::sized(reader, file_size)
        } else {
            Body::sized(file, file_size)
        };
        let resp = self.add_authz(self.0.put(&path), token).body(body).send()?;

Similarly elsewhere.

Also, for the sake of consistency, other places in the code use res for result/response. I think it makes sense to do the same here.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 5, 2019

Author Member

Good catch on the duplicated code - will fix.

Re: the result/response - I'm updating so that Result = "res" and Response = "resp" in accordance with the Reqwest samples so we are consistent in the usage.

@@ -514,24 +516,20 @@ impl BuilderAPIProvider for ArtifactoryClient {

let path = self.url_path_for_package(&ident, target);

let mut reader: Box<dyn Read> = if let Some(mut progress) = progress {
let reader: Box<dyn Read + Send> = if let Some(mut progress) = progress {

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 27, 2019

Contributor

Why is this code different from the L370 and L403? I'm fine with either (though adding Box into the mix seems a bit much complexity to avoid duplicating Body::sized(…, file_size) once), but it seems like it should be the same unless this bit of code is doing something different from the other 2 that I'm not seeing.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 5, 2019

Author Member

Yep will fix

if token.is_some() {
rb.header(Authorization(Bearer { token: token.unwrap().to_string(), }))
rb.bearer_auth(token.unwrap().to_string())

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 27, 2019

Contributor

As long as we need to touch this code, we could get rid of the unnecessary unwrap. Seeing that is usually an indication of less robust code, but this usage is perfectly robust. That can be more readily communicated with the more idiomatic approach:

        match token {
            Some(token) => rb.bearer_auth(token),
            None => rb,
        }

Also, the to_string is no longer necessary.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 5, 2019

Author Member

Yes indeed

fn add_authz<'a>(&'a self, rb: RequestBuilder<'a>, token: &str) -> RequestBuilder<'_> {
rb.header(Authorization(Bearer { token: token.to_string(), }))
fn add_authz(&self, rb: RequestBuilder, token: &str) -> RequestBuilder {
rb.bearer_auth(token.to_string())

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 27, 2019

Contributor
Suggested change
rb.bearer_auth(token.to_string())
rb.bearer_auth(token)

No need for to_string here either.

Given how simple this method is now, perhaps it makes sense to get rid of it and put the bearer_auth call inline. Personally, I find chaining:

        let mut resp = self.0
                           .post_with_custom_url(&path, custom)
                           .bearer_auth(token)
                           .send()?;

easier to parse than composition:

        let mut resp = self.add_authz(self.0.post_with_custom_url(&path, custom), token)
                           .send()?;

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 5, 2019

Author Member

Agreed, I like that better as well...

user_agent_header: user_agent(product, version)? })
let endpoint = endpoint.into_url().map_err(Error::ReqwestError)?;

let timeout_in_secs = match env::var("HAB_CLIENT_SOCKET_TIMEOUT") {

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 27, 2019

Contributor
Suggested change
let timeout_in_secs = match env::var("HAB_CLIENT_SOCKET_TIMEOUT") {
let timeout_in_secs = match env::var("HAB_CLIENT_SOCKET_TIMEOUT_SECS") {

As env vars have no type information, I think it's important to be clear about how they'll be interpreted.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 6, 2019

Author Member

For this change, I'd prefer to leave the existing env vars the same in order to preserve as much backward compatibility as possible

}
}
Err(_) => CLIENT_SOCKET_RW_TIMEOUT_SEC,
};

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 27, 2019

Contributor

This whole block (as well as the CLIENT_SOCKET_RW_TIMEOUT_SEC constant), can be replaced by:

        habitat_core::env_config_duration!(ClientSocketTimeout,
                                           HAB_CLIENT_SOCKET_TIMEOUT_SECS => from_secs,
                                           Duration::from_secs(300));

Then, the value (which implements Into<Duration>) can be accessed with ClientSocketTimeout::configured_value(). Making the accesses:

        debug!("Client socket timeout: {:?}",
               Duration::from(ClientSocketTimeout::configured_value()));

and

                                        .timeout(Some(ClientSocketTimeout::configured_value().into()))

The env_config_* macros are documented here.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 6, 2019

Author Member

Thanks, I think I'll tackle env switchover separately and not refactor as part of this PR

};
debug!("Client socket timeout: {} secs", timeout_in_secs);

let skip_cert_verify = env::var("HAB_SSL_CERT_VERIFY_NONE").is_ok();

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 27, 2019

Contributor

For the sake of uniformity, it would be nice to use an env_config macro here as well. That will make it easier to discover all our env var tunables with a simple search. There's no env_config_bool yet, though it wouldn't be too hard to add.

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 31, 2019

Contributor

Additionally, I think getting away from using easily misunderstood env vars (is an empty value true or false here?*) and adding a proper CLI argument for this (with the env var available as well, but its value validated up front) would be worthwhile.

* I know which it is, but it's impossible to tell from just looking at this line since changing use habitat_core::env above so std::env would change that behavior. That's very related to what caused the 0.78 incident (see also #6314), so I think that should be a good enough reason to avoid this pattern.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 6, 2019

Author Member

As mentioned elsewhere, I'll save the env update for a separate PR

debug!("Skip cert verification: {}", skip_cert_verify);

let header_values = vec![(USER_AGENT, user_agent(product, version)?)];
let headers = HeaderMap::from_iter(header_values.into_iter());

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 27, 2019

Contributor

The name header_values isn't quite accurate since it's a Vec of (key, value) tuples. The naming issue could be elided by combining the two statements:

        let headers =
            HeaderMap::from_iter(vec![(USER_AGENT, user_agent(product, version)?)].into_iter());

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 6, 2019

Author Member

Done

.danger_accept_invalid_certs(skip_cert_verify);

let certs = certificates(fs_root_path)?;
for cert in certs {

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 27, 2019

Contributor

The extra certs binding seems unnecessary:

        for cert in certificates(fs_root_path)? {
            client = client.add_root_certificate(cert);
        }

But since this is the builder pattern, I find the loop body a bit non-obvious. To me, fold makes the intent a bit clearer and allows the mut on L83 to be removed:

        let client =
            certificates(fs_root_path)?.into_iter()
                                       .fold(client, |client, cert| {
                                           client.add_root_certificate(cert)
                                       });

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 6, 2019

Author Member

OK

@danielcbright

This comment has been minimized.

Copy link

commented Jul 27, 2019

This looks great to me, minor nag though is I don't see any documentation/readme/examples on how to use your own cacert bundle both in-and-out of studio, am I missing it somewhere?

Copy link
Contributor

left a comment

Migrate http-client and builder-api-client crates from hyper to reqwest

This stuff looks good to me; since it's essentially all slight changes in interface, I'm not too concerned.

Modify the cert handling to be more flexible and sane

I'm really excited about these bits, and most of my suggestions are optional. However, I'm requesting changes because we're adding important functionality here, and there's no test coverage. I think we should use this opportunity to make sure we correct that.

};
debug!("Client socket timeout: {} secs", timeout_in_secs);

let skip_cert_verify = env::var("HAB_SSL_CERT_VERIFY_NONE").is_ok();

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 31, 2019

Contributor

Additionally, I think getting away from using easily misunderstood env vars (is an empty value true or false here?*) and adding a proper CLI argument for this (with the env var available as well, but its value validated up front) would be worthwhile.

* I know which it is, but it's impossible to tell from just looking at this line since changing use habitat_core::env above so std::env would change that behavior. That's very related to what caused the 0.78 incident (see also #6314), so I think that should be a good enough reason to avoid this pattern.

match t.parse::<u64>() {
Ok(n) => n,
Err(_) => CLIENT_SOCKET_RW_TIMEOUT_SEC,
fn client_with_proxy(url: &Url) -> ClientBuilder {

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 31, 2019

Contributor

This will silently do nothing if url.scheme() returns something other than "http" or "https". Certainly we don't expect any other schemes, but changing the logic to do something (even panic) would help illuminate otherwise confusing behavior should that somehow happen.

One thing that might make that more natural, while fitting more naturally into the builder pattern is to have this helper return a Proxy rather a ClientBuilder. That way, instead of calling it like L83 does:

        let mut client = client_with_proxy(&endpoint).default_headers(headers)
                                                      …

it could be

        let mut client = reqwest::Client::builder().proxy(proxy_for(&endpoint)?)
                                                   .default_headers(headers)
                                                   …

with client_with_proxy replaced by

fn proxy_for(url: &Url) -> reqwest::Result<Proxy> {
    trace!("Checking proxy for url: {:?}", url);

    if let Some(proxy_url) = env_proxy::for_url(&url).to_string() {
        match url.scheme() {
            "http" => {
                debug!("Setting http_proxy to {}", proxy_url);
                Proxy::http(&proxy_url)
            }
            "https" => {
                debug!("Setting https proxy to {}", proxy_url);
                Proxy::https(&proxy_url)
            }
            _ => unimplemented!(),
        }
    } else {
        debug!("No proxy configured for url: {:?}", url);
        Ok(Proxy::custom(|_| None::<Url>))
    }
}

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 6, 2019

Author Member

Agreed, will update

let path = entry.path();

if let Some(ext) = path.extension() {
if path.is_file() && ((ext == "pem") || (ext == "der")) {

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 31, 2019

Contributor

According to the docs:

This function will traverse symbolic links to query information about the destination file. In case of broken symbolic links this will return false.

Or, are you wondering about the case where foo is a symlink with target bar.pem and we reject it based on the fact that foo lacks the appropriate extension?

I would argue that we'd be better off not even caring about the extension and just trying to process the file first as a PEM and if that fails as a DER. As far as I can tell, there shouldn't ever be ambiguity since one format is ASCII and the other is binary.


fn cert_from_file(file_path: &PathBuf) -> Result<Certificate> {
let mut buf = Vec::new();
File::open(file_path)?.read_to_end(&mut buf)?;

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 31, 2019

Contributor

These two lines can be simplified and the mut eliminated:

    let buf = fs::read(file_path)?;

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 6, 2019

Author Member

Done

Ok(certificates)
}

fn cert_from_file(file_path: &PathBuf) -> Result<Certificate> {

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 31, 2019

Contributor

&PathBuf is akin to &String. Since we're not mutating the value, &Path is preferred in the same way that &str is preferred since it's more flexible and less indirection.

Suggested change
fn cert_from_file(file_path: &PathBuf) -> Result<Certificate> {
fn cert_from_file(file_path: &Path) -> Result<Certificate> {

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 6, 2019

Author Member

Fixed

let cacerts_ident = PackageIdent::from_str(CACERTS_PKG_IDENT)?;

if let Ok(pkg_install) = PackageInstall::load(&cacerts_ident, fs_root_path) {
let pkg_certs = pkg_install.installed_path().join("ssl/cert.pem");

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 31, 2019

Contributor

This is just one cert (or at least one element in the certificates Vec), right? In that case, I expected it to be called pkg_cert.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 6, 2019

Author Member

We seem to use the plural by default as in the "cacerts" package... leaving it as is for now

let pkg_certs = pkg_install.installed_path().join("ssl/cert.pem");
debug!("Found installed certs: {}", pkg_certs.display());
let cert = cert_from_file(&pkg_certs)?;
certificates.push(cert);

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 31, 2019

Contributor

This reads a little easier to be without the additional binding:

        certificates.push(cert_from_file(&pkg_certs)?);

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 6, 2019

Author Member

Updated

/// 3. Other certs files (for example self-signed certs) that are found in the SSL cache directory
/// will also get loaded into the root certs list. Both PEM and DER formats are supported (the
/// extensions should be '.pem' or '.der' respectively)
fn certificates(fs_root_path: Option<&Path>) -> Result<Vec<Certificate>> {

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 31, 2019

Contributor

Since it's used 3 times, it might be nice to add a binding:

    let cert_cache_dir = cache_ssl_path(fs_root_path);

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 6, 2019

Author Member

Done

fs::create_dir_all(cache_ssl_path(fs_root_path))?;
debug!("Creating cached cacert.pem: {}", cached_certs.display());
let mut file = File::create(&cached_certs)?;
file.write_all(CACERT_PEM.as_bytes())?;

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 31, 2019

Contributor

These two lines can be simplified to:

            fs::write(cached_certs, CACERT_PEM)?;

Or, if you want to logically show the connection between these two:

            fs::create_dir_all(&cert_cache_dir).and_then(|_| fs::write(cached_certs, CACERT_PEM))?;

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 6, 2019

Author Member

Done

/// 3. Other certs files (for example self-signed certs) that are found in the SSL cache directory
/// will also get loaded into the root certs list. Both PEM and DER formats are supported (the
/// extensions should be '.pem' or '.der' respectively)
fn certificates(fs_root_path: Option<&Path>) -> Result<Vec<Certificate>> {

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 31, 2019

Contributor

This function is fairly complex. It would be good to get some test coverage on it. For that, though, it would probably be helpful to break it up into its 3 logical segments described in the doc comment.

This comment has been minimized.

Copy link
@chefsalim

chefsalim Aug 6, 2019

Author Member

Simplified the function. For testing I'll open a separate issue to add cert cases to our e2e tests.