Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.Sign up
GitHub is where the world builds software
Millions of developers and companies build, ship, and maintain their software on GitHub — the largest and most advanced development platform in the world.
The diff stats may look scary, but the vast majority of the added lines is because
Why should we do this?
At best the curl crate required some very verbose code, or confusing APIs to make it possible to work with stack data, but at worst it led to some code that was confusing or even dangerous. A few things that were weird or gave me pause while writing this were:
What changes occur in our HTTP requests?
All requests have the following headers added:
All requests had the following header removed:
Some requests had the following header removed:
What notable changes happened in the code?
For the most part, the code is just replacing manual stuff with built-in stuff. I've removed anything that set a
While making this change, a few refactorings occurred.
The S3 code was taking a
The functions for interacting with the GitHub API were split into two pieces, one which made the request, and one which processed it. The only reason for this split was so that one consumer could intercept 404 responses and return
The only other point of note is that the changes to
There's some minor differences in behavior here -- We're no longer setting the `Date` header, as clients aren't supposed to send it for requests without bodies (and even for post/put it's optional). We're also not manually setting the host header because there's literally no reason for us to be doing so.
This behavior was split into two functions so that one of the 5 places we're calling it from could intercept a 404 response code. Since the happy path otherwise is just calling `.json`, this felt really awkward to me. Instead I've opted to return `NotFound`, and downcast to it in the error path for that one place. I expected to just be able to call `Any::is`, but it turns out this method is an inherent method not a trait method (which makes sense, otherwise `Any` wouldn't be object safe). However, a side effect of that is that even though `Any` is a supertrait of `CargoError`, we can't call `Any::is` since `dyn CargoError` can't be cast to `dyn Any`. This may change at some point in the future (see rust-lang/rfcs#2035), but we have to duplicate the body of `Any::is` for now. Except we can't even just duplicate the body of `Any::is`, because the only trait method for `Any` is unstable so we have to duplicate that method, too.......... Other notable changes: - We're no longer sending `Proxy-Connection: Keep-Alive`. According to Wikipedia, this is "Implemented as a misunderstanding of the HTTP specifications. Common because of mistakes in implementations of early HTTP versions". Firefox recently removed it, as have recent versions of curl. - We will accept gzipped responses now. This is good. - We send an actual user agent, instead of "hello!"
Notable differences in behavior: - We now send "User-Agent: reqwest/0.9.1" to S3 - We no longer send "Proxy-Connection: Keep-Alive" (which was useless) - We now send "Accept-Encoding: gzip", which has no effect on these requests, as we do not receive any response body We are no longer manually specifying the `Date` header, since this is at best completely pointless, and at worse a latent bug waiting to crop up. We are still setting the `Date` header, as S3 requires either that header or `x-amz-date` when `Authorization` is set, since the date is part of the signature used. This commit has the largest effect from removing curl, as this was the oldest code using it, had accumulated the most cruft, and had an API most affected by its use. Going through this code was a fun jump through history. There were a few notable interesting things about the old code. The return type of `Transfer` appeared to be to work around some lifetime issues from taking a slice. Notably, `reqwest` actually requires a static lifetime on the request body (due to the API it wants to expose, and the fact that it's backed by tokio, it cannot deal with non-static lifetimes). Luckily, there were no callers that actually needed to pass a slice. My favorite part of this code though, was using a curl method that has no purpose other than to opt into `Expect: 100-continue`, and then explicitly overriding the `Expect:` header due to not understanding it's purpose. (For the record, `Expect: 100-continue` is used when sending the request body is "inappropriate or inefficient", and specifies that the server should reply with 100 CONTINUE before the client will send the request body). I'm actually surprised this interfered with the recorder. The HTTP/1.1 specification points out that older servers may not recognize the `Expect` header, and that clients should not wait an indefinite period of time before sending the request body.
1499: Replace `curl` with `reqwest` r=sgrif a=sgrif **The diff stats may look scary, but the vast majority of the added lines is because `reqwest` depends on `tokio`, which caused 448 lines to be added to `Cargo.lock`. The actual code changes are not particularly large** Why should we do this? --- `curl::Easy` is anything but. Its API is verbose, and consistently misunderstood in our codebase. These commits replace the `curl` crate with `reqwest`, which is designed to be a higher level HTTP client. It doesn't have every random nicety we might want (like setting `Date` on `POST`/`PUT` requests), but does have a ton of things we want built in like helpers for JSON request/response bodies that assumes the use of serde, and "turn 4xx/5xx response codes into an error" helper functions. At best the curl crate required some very verbose code, or confusing APIs to make it possible to work with stack data, but at worst it led to some code that was confusing or even dangerous. A few things that were weird or gave me pause while writing this were: - Setting the `Host` header manually, which is at best pointless and at worst a major latent bug waiting to strike. I suspect that the first instance of this was in the S3 code, which was done because `Host` is mentioned in the S3 docs for common headers for some reason, and the other places this was done were due to that code being copied from the S3 code - Specifically calling a CURL function that opted into `Expect: 100-Continue`, and then manually overriding the `Expect` header with a comment that it's not clear what it's purpose is - Setting the `Date` header on requests for no reason (which on requests with no body, is actually a violation of the HTTP/1.1 spec) - Choosing "hello!" as a User-Agent string for endpoints which require it What changes occur in our HTTP requests? ---- The `http-data` files are basically impossible to review, so here are the changes that were made: All requests have the following headers added: - `User-Agent: reqwest/0.9.1` - `Accept-Encoding: gzip` All requests had the following header removed: - `Proxy-Connection: Keep-Alive` - Wikipedia describes this header as "Implemented as a misunderstanding of the HTTP specifications. Common because of mistakes in implementations of early HTTP versions." It has been removed from Firefox and more recent versions of curl. Some requests had the following header removed: - `User-Agent: hello!` - The HTTP specifications states that all user agents SHOULD send this header. All requests that we make are a direct result of user interaction, making us a user agent (and it's pretty standard to expect this header even from non-user-agents like crawlers). Arguably we should be setting this to `crates.io`, but this is at least marginally useful unlike "hello!" which is not appropriate here. What notable changes happened in the code? --- For the most part, the code is just replacing manual stuff with built-in stuff. I've removed anything that set a `Date` or `User-Agent` header manually (unless `Date` was required, which is the case for S3, since the date is used as part of the auth signature). I've removed all cases that were manually setting a `Host` header, since if this ever differed from the URL we used to lookup the domain, it would be very bad. While making this change, a few refactorings occurred. The S3 code was taking a `file_length` parameter as a `u64`. This was expected to be the length of the bytes passed in, but... we have a slice so we can just ask it the length. Due to a combination of the API reqwest wants to expose, and the fact that its backed by tokio, it can only work with static data. This was only relevant for S3 uploads, but changing the relevant functions from taking `&[u8]` to `Vec<u8>` was a trivial change. The functions for interacting with the GitHub API were split into two pieces, one which made the request, and one which processed it. The only reason for this split was so that one consumer could intercept 404 responses and return `Ok(false)`. I've opted to merge all of this into a single function, and return `NotFound` for 404, which we can check for in that one place. I was expecting to use `Any::is` there, but due to the way its written, it required duplicating its implementation (see the commit message for that commit for more details) The only other point of note is that the changes to `GhUser::init` are not tested, since they only run if some files in this repo are not present, and for that code to run we'd need the passwords for two GitHub users that we don't have access to. I suspect we should probably just delete that code and panic if we don't have the file, but I haven't done so here. Co-authored-by: Sean Griffin <email@example.com>