Replace `curl` with `reqwest` #1499

sgrif · 2018-09-24T23:05:21Z

The diff stats may look scary, but the vast majority of the added lines is because reqwest depends on tokio, which caused 448 lines to be added to Cargo.lock. The actual code changes are not particularly large

Why should we do this?

curl::Easy is anything but. Its API is verbose, and consistently misunderstood in our codebase. These commits replace the curl crate with reqwest, which is designed to be a higher level HTTP client. It doesn't have every random nicety we might want (like setting Date on POST/PUT requests), but does have a ton of things we want built in like helpers for JSON request/response bodies that assumes the use of serde, and "turn 4xx/5xx response codes into an error" helper functions.

At best the curl crate required some very verbose code, or confusing APIs to make it possible to work with stack data, but at worst it led to some code that was confusing or even dangerous. A few things that were weird or gave me pause while writing this were:

Setting the Host header manually, which is at best pointless and at worst a major latent bug waiting to strike. I suspect that the first instance of this was in the S3 code, which was done because Host is mentioned in the S3 docs for common headers for some reason, and the other places this was done were due to that code being copied from the S3 code
Specifically calling a CURL function that opted into Expect: 100-Continue, and then manually overriding the Expect header with a comment that it's not clear what it's purpose is
Setting the Date header on requests for no reason (which on requests with no body, is actually a violation of the HTTP/1.1 spec)
Choosing "hello!" as a User-Agent string for endpoints which require it

What changes occur in our HTTP requests?

The http-data files are basically impossible to review, so here are the changes that were made:

All requests have the following headers added:

User-Agent: reqwest/0.9.1
Accept-Encoding: gzip

All requests had the following header removed:

Proxy-Connection: Keep-Alive
- Wikipedia describes this header as "Implemented as a misunderstanding of the HTTP specifications. Common because of mistakes in implementations of early HTTP versions." It has been removed from Firefox and more recent versions of curl.

Some requests had the following header removed:

User-Agent: hello!
- The HTTP specifications states that all user agents SHOULD send this header. All requests that we make are a direct result of user interaction, making us a user agent (and it's pretty standard to expect this header even from non-user-agents like crawlers). Arguably we should be setting this to crates.io, but this is at least marginally useful unlike "hello!" which is not appropriate here.

What notable changes happened in the code?

For the most part, the code is just replacing manual stuff with built-in stuff. I've removed anything that set a Date or User-Agent header manually (unless Date was required, which is the case for S3, since the date is used as part of the auth signature). I've removed all cases that were manually setting a Host header, since if this ever differed from the URL we used to lookup the domain, it would be very bad.

While making this change, a few refactorings occurred.

The S3 code was taking a file_length parameter as a u64. This was expected to be the length of the bytes passed in, but... we have a slice so we can just ask it the length. Due to a combination of the API reqwest wants to expose, and the fact that its backed by tokio, it can only work with static data. This was only relevant for S3 uploads, but changing the relevant functions from taking &[u8] to Vec<u8> was a trivial change.

The functions for interacting with the GitHub API were split into two pieces, one which made the request, and one which processed it. The only reason for this split was so that one consumer could intercept 404 responses and return Ok(false). I've opted to merge all of this into a single function, and return NotFound for 404, which we can check for in that one place. I was expecting to use Any::is there, but due to the way its written, it required duplicating its implementation (see the commit message for that commit for more details)

The only other point of note is that the changes to GhUser::init are not tested, since they only run if some files in this repo are not present, and for that code to run we'd need the passwords for two GitHub users that we don't have access to. I suspect we should probably just delete that code and panic if we don't have the file, but I haven't done so here.

There's some minor differences in behavior here -- We're no longer setting the `Date` header, as clients aren't supposed to send it for requests without bodies (and even for post/put it's optional). We're also not manually setting the host header because there's literally no reason for us to be doing so.

This behavior was split into two functions so that one of the 5 places we're calling it from could intercept a 404 response code. Since the happy path otherwise is just calling `.json`, this felt really awkward to me. Instead I've opted to return `NotFound`, and downcast to it in the error path for that one place. I expected to just be able to call `Any::is`, but it turns out this method is an inherent method not a trait method (which makes sense, otherwise `Any` wouldn't be object safe). However, a side effect of that is that even though `Any` is a supertrait of `CargoError`, we can't call `Any::is` since `dyn CargoError` can't be cast to `dyn Any`. This may change at some point in the future (see rust-lang/rfcs#2035), but we have to duplicate the body of `Any::is` for now. Except we can't even just duplicate the body of `Any::is`, because the only trait method for `Any` is unstable so we have to duplicate that method, too.......... Other notable changes: - We're no longer sending `Proxy-Connection: Keep-Alive`. According to Wikipedia, this is "Implemented as a misunderstanding of the HTTP specifications. Common because of mistakes in implementations of early HTTP versions". Firefox recently removed it, as have recent versions of curl. - We will accept gzipped responses now. This is good. - We send an actual user agent, instead of "hello!"

Notable differences in behavior: - We now send "User-Agent: reqwest/0.9.1" to S3 - We no longer send "Proxy-Connection: Keep-Alive" (which was useless) - We now send "Accept-Encoding: gzip", which has no effect on these requests, as we do not receive any response body We are no longer manually specifying the `Date` header, since this is at best completely pointless, and at worse a latent bug waiting to crop up. We are still setting the `Date` header, as S3 requires either that header or `x-amz-date` when `Authorization` is set, since the date is part of the signature used. This commit has the largest effect from removing curl, as this was the oldest code using it, had accumulated the most cruft, and had an API most affected by its use. Going through this code was a fun jump through history. There were a few notable interesting things about the old code. The return type of `Transfer` appeared to be to work around some lifetime issues from taking a slice. Notably, `reqwest` actually requires a static lifetime on the request body (due to the API it wants to expose, and the fact that it's backed by tokio, it cannot deal with non-static lifetimes). Luckily, there were no callers that actually needed to pass a slice. My favorite part of this code though, was using a curl method that has no purpose other than to opt into `Expect: 100-continue`, and then explicitly overriding the `Expect:` header due to not understanding it's purpose. (For the record, `Expect: 100-continue` is used when sending the request body is "inappropriate or inefficient", and specifies that the server should reply with 100 CONTINUE before the client will send the request body). I'm actually surprised this interfered with the recorder. The HTTP/1.1 specification points out that older servers may not recognize the `Expect` header, and that clients should not wait an indefinite period of time before sending the request body.

And for our final replacement, we fix some code that never actually runs, and would fail if it did because running it requires credentials for two specific github users that none of us have! Hooray!

sgrif · 2018-10-12T16:26:35Z

bors: r+

1499: Replace `curl` with `reqwest` r=sgrif a=sgrif **The diff stats may look scary, but the vast majority of the added lines is because `reqwest` depends on `tokio`, which caused 448 lines to be added to `Cargo.lock`. The actual code changes are not particularly large** Why should we do this? --- `curl::Easy` is anything but. Its API is verbose, and consistently misunderstood in our codebase. These commits replace the `curl` crate with `reqwest`, which is designed to be a higher level HTTP client. It doesn't have every random nicety we might want (like setting `Date` on `POST`/`PUT` requests), but does have a ton of things we want built in like helpers for JSON request/response bodies that assumes the use of serde, and "turn 4xx/5xx response codes into an error" helper functions. At best the curl crate required some very verbose code, or confusing APIs to make it possible to work with stack data, but at worst it led to some code that was confusing or even dangerous. A few things that were weird or gave me pause while writing this were: - Setting the `Host` header manually, which is at best pointless and at worst a major latent bug waiting to strike. I suspect that the first instance of this was in the S3 code, which was done because `Host` is mentioned in the S3 docs for common headers for some reason, and the other places this was done were due to that code being copied from the S3 code - Specifically calling a CURL function that opted into `Expect: 100-Continue`, and then manually overriding the `Expect` header with a comment that it's not clear what it's purpose is - Setting the `Date` header on requests for no reason (which on requests with no body, is actually a violation of the HTTP/1.1 spec) - Choosing "hello!" as a User-Agent string for endpoints which require it What changes occur in our HTTP requests? ---- The `http-data` files are basically impossible to review, so here are the changes that were made: All requests have the following headers added: - `User-Agent: reqwest/0.9.1` - `Accept-Encoding: gzip` All requests had the following header removed: - `Proxy-Connection: Keep-Alive` - Wikipedia describes this header as "Implemented as a misunderstanding of the HTTP specifications. Common because of mistakes in implementations of early HTTP versions." It has been removed from Firefox and more recent versions of curl. Some requests had the following header removed: - `User-Agent: hello!` - The HTTP specifications states that all user agents SHOULD send this header. All requests that we make are a direct result of user interaction, making us a user agent (and it's pretty standard to expect this header even from non-user-agents like crawlers). Arguably we should be setting this to `crates.io`, but this is at least marginally useful unlike "hello!" which is not appropriate here. What notable changes happened in the code? --- For the most part, the code is just replacing manual stuff with built-in stuff. I've removed anything that set a `Date` or `User-Agent` header manually (unless `Date` was required, which is the case for S3, since the date is used as part of the auth signature). I've removed all cases that were manually setting a `Host` header, since if this ever differed from the URL we used to lookup the domain, it would be very bad. While making this change, a few refactorings occurred. The S3 code was taking a `file_length` parameter as a `u64`. This was expected to be the length of the bytes passed in, but... we have a slice so we can just ask it the length. Due to a combination of the API reqwest wants to expose, and the fact that its backed by tokio, it can only work with static data. This was only relevant for S3 uploads, but changing the relevant functions from taking `&[u8]` to `Vec<u8>` was a trivial change. The functions for interacting with the GitHub API were split into two pieces, one which made the request, and one which processed it. The only reason for this split was so that one consumer could intercept 404 responses and return `Ok(false)`. I've opted to merge all of this into a single function, and return `NotFound` for 404, which we can check for in that one place. I was expecting to use `Any::is` there, but due to the way its written, it required duplicating its implementation (see the commit message for that commit for more details) The only other point of note is that the changes to `GhUser::init` are not tested, since they only run if some files in this repo are not present, and for that code to run we'd need the passwords for two GitHub users that we don't have access to. I suspect we should probably just delete that code and panic if we don't have the file, but I haven't done so here. Co-authored-by: Sean Griffin <sean@seantheprogrammer.com>

bors-voyager · 2018-10-12T16:49:50Z

Build succeeded

continuous-integration/travis-ci/push

sgrif added 5 commits September 24, 2018 12:56

Replace curl with reqwest in tests

20b7a11

And for our final replacement, we fix some code that never actually runs, and would fail if it did because running it requires credentials for two specific github users that none of us have! Hooray!

rustfmt

7920080

sgrif mentioned this pull request Sep 24, 2018

Rollup of #1498 and #1499 #1500

Closed

Make clippy happy

0d5ed7a

sgrif requested a review from jtgeibel October 11, 2018 22:05

jtgeibel approved these changes Oct 12, 2018

View reviewed changes

bors-voyager bot merged commit 0d5ed7a into rust-lang:master Oct 12, 2018

sgrif deleted the sg-remove-curl branch October 23, 2018 19:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Replace `curl` with `reqwest` #1499

Replace `curl` with `reqwest` #1499

Uh oh!

sgrif commented Sep 24, 2018

Uh oh!

sgrif commented Oct 12, 2018

Uh oh!

bors-voyager bot commented Oct 12, 2018

Uh oh!

Uh oh!

Replace curl with reqwest #1499

Replace curl with reqwest #1499

Uh oh!

Conversation

sgrif commented Sep 24, 2018

Why should we do this?

What changes occur in our HTTP requests?

What notable changes happened in the code?

Uh oh!

sgrif commented Oct 12, 2018

Uh oh!

bors-voyager bot commented Oct 12, 2018

Build succeeded

Uh oh!

Uh oh!

Replace `curl` with `reqwest` #1499

Replace `curl` with `reqwest` #1499