Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider using http::HeaderMap internally #1498

Closed
SergioBenitez opened this issue Dec 29, 2020 · 11 comments
Closed

Consider using http::HeaderMap internally #1498

SergioBenitez opened this issue Dec 29, 2020 · 11 comments
Labels
enhancement A minor feature request help wanted Contributions to this issue are needed

Comments

@SergioBenitez
Copy link
Member

At present, Rocket spends significantly more time than necessary converting an http::HeaderMap and its internal values into a rocket::http::HeaderMap. More importantly, the conversion is directly in the request processing path, so it effects the performance of every request:

https://github.com/SergioBenitez/Rocket/blob/9671115796e42865eaebd020ac27542802027b02/core/lib/src/request/request.rs#L887-L893

Instead of our entirely custom HeaderMap type, we should consider leveraging the work in http by using an http::HeaderMap internally, wrapping it in a custom type to preserve our documentation and interoperability with the rest of Rocket's types. The http::HeaderMap allows customization of the internal value; for hyper, this is http::header::HeaderValue. Unfortunately, unlike Rocket's header map, this does not allow borrowing header values of a particular lifetime; all values are heap-allocated. The same issue occurs with header names. What's more, because values are stored as bytes in HeaderMap (a good thing), retrieving them as strings requires a UTF-8 check (a bad thing). The map also does not preserve case which is known to cause issues downstream.

In summary, there are at least the following road-blocks to using an http::HeaderMap internally:

  • Header names and values cannot borrow.
  • Retrieving a value as a string requires a UTF-8 check every time.
  • The map does not preserve header name casing; it should still be case-insensitive during comparison.
@SergioBenitez SergioBenitez added enhancement A minor feature request help wanted Contributions to this issue are needed labels Dec 29, 2020
@jebrosen
Copy link
Collaborator

jebrosen commented Dec 30, 2020

Header names and values cannot borrow.

We might actually be able to do this already, by having Request::from_hyp take an &'r HeaderMap! This still allocates when String::from_utf8_lossy does.

@SergioBenitez
Copy link
Member Author

Header names and values cannot borrow.

We might actually be able to do this already, by having Request::from_hyp take an &'r HeaderMap! This still allocates when String::from_utf8_lossy does.

I'm not sure I follow. As far as I can tell, there is no way to get HeaderMap to accept borrowed strings for header names, and the only way to do the same for values would be to have a custom HeaderValue type. When converting the hyper values into these new Rocket one's, it would mean at least copying all of the names/values into the new HeaderMap, though it would save the allocation time for each value, and we could optimize the value so that we don't "require a UTF-8 check every time". Is this what you had in mind?

@jebrosen
Copy link
Collaborator

Sorry - I conflated that point with the situation in both 0.4 and master, where the request headers are cloned into a new rocket HeaderMap (with new allocations), instead of taking references to hyper's original HeaderMap or reusing the allocations somehow where possible. Reusing or wrapping the HeaderMap wholesale makes neither of those necessary.

@jespersm
Copy link

I've tried implementing a new HeaderMap which wraps http::HeaderMap so that it can be moved in during request initialisation and response generation. Sure, it works, but there are a ton of doctest which end up looking rather ugly, where header data now are compared as strings.
Exposing the actual HeaderValues would make these tests slightly more tolerable, and push the eventual bytes->string handling onto the user.

As for the issues raised above:

Header names and values cannot borrow.

True, but this is slightly mitigated by taking ownership of Hyper's headers as suggested above.

For header names, it's not much waste anyway, since the internal representation is either an owned string (which will be initialized just once during processing), or a "known" header, which is represented efficiently.

Retrieving a value as a string requires a UTF-8 check every time.

Exposing the actual HeaderValues would make these tests slightly more tolerable, and push the eventual bytes->string handling onto the user. The only other way to deal with this is to ensure that only values stored in the map are always valid UTF-8, but that's not spec compliant.

The map also does not preserve case which is known to cause issues downstream.

We have those issues already on master, since the incoming AND outgoing headers are processed by http::HeaderMap anyway. So no user-chosen case gets in or out.

Finally, how should we insert and lookups by &str values which are not always valid header names? There's an issue in initializing a HeaderName form a bare string, since this can fail. This is fixable by using a declarative macro like header_name!("X-My-Header") which can use a const fn to ensure that the eventual unwrap() is safe, like const_assert does. Would that be too clunky? It's better to provide a compile-time check than subjecting the user to a panic.
Ideally, this should be done in http.

@SergioBenitez
Copy link
Member Author

Thanks for taking a look at this!

I've tried implementing a new HeaderMap which wraps http::HeaderMap so that it can be moved in during request initialisation and response generation. Sure, it works, but there are a ton of doctest which end up looking rather ugly, where header data now are compared as strings.

Are you saying you've done something like:

pub struct HeaderMap<'h> {
    hyper: http::HeaderMap
    headers: IndexMap<Uncased<'h>, Vec<Cow<'h, str>>>
}

Or something else? If you haven't done the former, this is the approach that I think would be a good fit in the interim.

Exposing the actual HeaderValues would make these tests slightly more tolerable, and push the eventual bytes->string handling onto the user.

I don't think the user should have to deal with such a common task themselves. At worst, we should have two methods, one which returns bytes and the other a string, caching the result of the latter for later retrieval.

As for the issues raised above:

Header names and values cannot borrow.

True, but this is slightly mitigated by taking ownership of Hyper's headers as suggested above.

The issue is with adding new headers that borrow from the request. Unless the approach I've suggested above is taken, the issue isn't at all mitigated.

Retrieving a value as a string requires a UTF-8 check every time.

Exposing the actual HeaderValues would make these tests slightly more tolerable, and push the eventual bytes->string handling onto the user. The only other way to deal with this is to ensure that only values stored in the map are always valid UTF-8, but that's not spec compliant.

I think a header get() should something look like:

fn get(&self name: &str) -> impl Iterator<Item = &str> {
    self.headers.get(name)
        .or_else(self.hyper.get(name))
        .filter_map(|h| h.as_str())
}

That is, try to get from self.headers first then from self.hyper. An insert would need to do the appropriate thing.

The map also does not preserve case which is known to cause issues downstream.

We have those issues already on master, since the incoming AND outgoing headers are processed by http::HeaderMap anyway. So no user-chosen case gets in or out.

That's a good point, though ideally this wouldn't be the case.

In all, I really think http::HeaderMap should simply be fixed to address the issues above.

jespersm added a commit to jespersm/Rocket that referenced this issue Feb 18, 2021
@jespersm
Copy link

jespersm commented Feb 18, 2021

Thanks for taking a look at this!

You're welcome. I really like Rocket's approach, and I'd like to help it towards a nice 0.5 release on stable Rust.

I've tried implementing a new HeaderMap which wraps http::HeaderMap so that it can be moved in during request initialisation and response generation. Sure, it works, but there are a ton of doctest which end up looking rather ugly, where header data now are compared as strings.

Are you saying you've done something like:

pub struct HeaderMap<'h> {
    hyper: http::HeaderMap
    headers: IndexMap<Uncased<'h>, Vec<Cow<'h, str>>>
}

No, I have just one map, the http::HeaderMap<> one.
Illegal header names and values are checked on insertion, rather than when committing the response, as now.

Or something else? If you haven't done the former, this is the approach that I think would be a good fit in the interim.

Using two maps would help with the duplication of incoming request headers, but not the overhead around outgoing response headers (nor the problem with converting values and header names to and from strings).
The response headers will ultimately be put into a http::HeaderMap on preparing the response, since hyper only deals with http::HeaderMap. That why I opted for the wholesale reuse of that.

Exposing the actual HeaderValues would make these tests slightly more tolerable, and push the eventual bytes->string handling onto the user.

I don't think the user should have to deal with such a common task themselves. At worst, we should have two methods, one which returns bytes and the other a string, caching the result of the latter for later retrieval.

I don't get this. Would you just "panic" on illegal header names or values, or "sanitize" them by dropping the offending bytes? Perhaps I am focusing too much on edge cases? ;-)

As for the issues raised above:

Header names and values cannot borrow.

True, but this is slightly mitigated by taking ownership of Hyper's headers as suggested above.

The issue is with adding new headers that borrow from the request. Unless the approach I've suggested above is taken, the issue isn't at all mitigated.

But the results are eventually allocated and put into http::HeaderMap before being returned to hyper (in Rocket::make_response. So there no allocations saved on the outgoing header?

Retrieving a value as a string requires a UTF-8 check every time.

Exposing the actual HeaderValues would make these tests slightly more tolerable, and push the eventual bytes->string handling onto the user. The only other way to deal with this is to ensure that only values stored in the map are always valid UTF-8, but that's not spec compliant.

I think a header get() should something look like:

fn get(&self name: &str) -> impl Iterator<Item = &str> {
    self.headers.get(name)
        .or_else(self.hyper.get(name))
        .filter_map(|h| h.as_str())
}

That is, try to get from self.headers first then from self.hyper. An insert would need to do the appropriate thing.

The map also does not preserve case which is known to cause issues downstream.

We have those issues already on master, since the incoming AND outgoing headers are processed by http::HeaderMap anyway. So no user-chosen case gets in or out.

That's a good point, though ideally this wouldn't be the case.

In all, I really think http::HeaderMap should simply be fixed to address the issues above.

Yes, that's probably the right place to address both the case-preservation and the borrowing issue, if fixable.

Anyway, I've thrown my work in progress in a branch on my fork: jespersm@fcb5981 (note: it has multiple issues, TODOs, and the tests don't compile.)

@SergioBenitez
Copy link
Member Author

SergioBenitez commented Feb 19, 2021

Excellent points regarding allocations! I would be in favor of a change to use http::HeaderMap internally as long as the public API does not change and continues to "allow" borrows of request-scoped data.

I don't get this. Would you just "panic" on illegal header names or values, or "sanitize" them by dropping the offending bytes? Perhaps I am focusing too much on edge cases? ;-)

Perhaps the correct surface API to change here is that of Header: should we make Header fail when an invalid header name/value is used? It would be great if we could have a const API that fails at compile-time for statically known strings.

@jespersm
Copy link

Excellent points regarding allocations! I would be in favor of a change to use http::HeaderMap internally as long as the public API does not change and continues to "allow" borrows of request-scoped data.

Right. This does require adding lifetimes for the borrowed values inside the http crate, for structs http::header::HeaderValue, http::header::HeaderMap, http::Response and http::Builder, a new http release, and it requires hyper to pick up that dependency. Perhaps we could also sneak in a way of inserting a complete HeaderMap into the Builder when making the response, that would save a bit of copying. I'll give it a go, and ask the maintainer.

It also requires either adding silent value dropping, or panic'ing on bad values, when getting and setting values.

I don't get this. Would you just "panic" on illegal header names or values, or "sanitize" them by dropping the offending bytes? Perhaps I am focusing too much on edge cases? ;-)

Perhaps the correct surface API to change here is that of Header: should we make Header fail when an invalid header name/value is used? It would be great if we could have a const API that fails at compile-time for statically known strings.

Exactly. I have a PoC of this in my branch already (for HeaderName):

let vals: Vec<_> = req.headers().get(header_name!("My-Header")).collect();

(Changing My-Header to e.g. Rødgrød-Header will cause a compile error, by the power of const fn). See at https://github.com/jespersm/Rocket/blob/wip_1498/examples/hello_world/src/main.rs#L22-L57 for details.

@SergioBenitez
Copy link
Member Author

SergioBenitez commented Feb 26, 2021

Right. This does require adding lifetimes for the borrowed values inside the http crate, for structs http::header::HeaderValue, http::header::HeaderMap, http::Response and http::Builder, a new http release, and it requires hyper to pick up that dependency. Perhaps we could also sneak in a way of inserting a complete HeaderMap into the Builder when making the response, that would save a bit of copying. I'll give it a go, and ask the maintainer.

I'm suggesting "lying": keep Rocket's API (except for Header related safety changes) exactly the same and allocate internally as needed. If we get an &'r str, call .to_string(), and so on. The idea is that we can fix http::HeaderMap later without any effect on Rocket's API.

Exactly. I have a PoC of this in my branch already (for HeaderName):

Very cool!

@jespersm
Copy link

I'm suggesting "lying"

Noted.

There's also the question of "bad" values, such as newlines, in a header value.

Today, the Rocket accepts them but "silently" crashes when pivoting the response. We have the option of either silently sanitizing the value, or dropping it altogether on insertion, if it contains illegal ASCII values. This should probably be logged in either case, so the unsuspecting developer can realize their mistake.

By the way, I've posted a draft PR for http with the borrowing version of HeaderValue, so the maintainers can decide if that's a road they want consider: hyperium/http#467

@SergioBenitez
Copy link
Member Author

Exactly. I have a PoC of this in my branch already (for HeaderName):

Very cool!

Just a note: if/when rust-lang/rust#51999 lands, we can make Header::new() exhibit the same behavior!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A minor feature request help wanted Contributions to this issue are needed
Projects
None yet
Development

No branches or pull requests

3 participants