Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raw byte access for Headers is not ergonomic #50

Closed
reem opened this issue Sep 23, 2014 · 14 comments
Closed

Raw byte access for Headers is not ergonomic #50

reem opened this issue Sep 23, 2014 · 14 comments
Labels
A-client Area: client. A-headers Area: headers. A-server Area: server.

Comments

@reem
Copy link
Contributor

reem commented Sep 23, 2014

We invalidate raw headers, will soon return *const, etc.

The story here is complex. You can treat the Raw representation as just another Typed header, except this one is Raw(&[Vec<u8>]) or so, so the problem devolves to providing safe re-parsing of headers. This is possible through one of three methods:

  • Demand &mut self in every getter method.
  • Use RWLock and return RWLockReadGuard or RWLockWriteGuard, depending.
  • Use RefCell to vastly simplify the code, and return Ref or RefMut, depending.
  • Return Rc or Arc and otherwise make life simpler by introducing refcounting.

The first approach is not ergonomic:

let length = headers.get::<ContentLength>();
let date = headers.get::<Date>(); // illegal

The second is liable to cause deadlocks:

let length = request.headers.get::<ContenLength>();
let crazy = request.headers.get::<CrazyLength>(); // Tries to get a write lock on the ContentLength item, deadlocks.

The third is liable to cause runtime errors in the same way, and the fourth introduces a possibly ridiculous amount of overhead.

@reem reem added A-server Area: server. servo and removed A-server Area: server. labels Sep 23, 2014
@reem reem modified the milestone: servo Sep 23, 2014
@reem
Copy link
Contributor Author

reem commented Sep 23, 2014

The issue is that it is not possible to provide the granular guarantee to the type system/borrow checker that no two headers with the same name are accessed concurrently. The &mut self solution just bans all access to two headers at the same time, and the RWLock/RefCell solution pushes the problem to runtime, but allows it in the case where you access two different representations.

@seanmonstar seanmonstar added the A-client Area: client. label Sep 25, 2014
@seanmonstar
Copy link
Member

Additionally, while for most use-cases, the Typed API should be preferred, Servo has a legitimate reason for accessing all the Raw headers: providing the JS methods xhr.getResponseHeader() and family. In JavaScript, you can pass any string, and Servo cannot possibly know ahead of time (and so use compiled in types) which strings may be used. Also, the result passed to JS must a be a String. So, in those bindings, accessing raw headers (and many of them, thanks to xhr.getAllResponseHeaders()).


Those restraints mean that we cannot do what is currently done: we must keep the raw bytes around even if accessed as a Typed value.

The steps I think we need are:

  1. We go back to struct Item { raw: Vec<Vec<u8>>, typed: Box<Header> }. We provided headers.get_raw(key: &str) -> Option<&[Vec<u8>]. mut isn't needed, since it doesn't require any transforming to get the raw bytes.
  2. The get(&mut self) still takes mut, and the compiler prevents a mutable borrow while there exists immutable borrows.
  3. Request<W>.headers() returns a &mut HeadersView that only provides get, get_ref, and has methods.
  4. Request<Fresh>.headers_mut() returns a &mut Headers, as it does normally.

Alternatives: instead of a &mut HeadersView, we could use a phantom type like with Request, such as &mut Headers<Fresh> and &mut Headers<Frozen>.

@reem
Copy link
Contributor Author

reem commented Sep 25, 2014

I was thinking about allowing access to raw headers through a type which just wraps a regular header and returns the input in parse_header - for instance headers.get::<Raw<ContentLength>>()

However, I didn't consider that this still doesn't allow for dynamic raw header access through strings.

On Wed, Sep 24, 2014 at 11:15 PM, Sean McArthur notifications@github.com
wrote:

Additionally, while for most use-cases, the Typed API should be preferred, Servo has a legitimate reason for accessing all the Raw headers: providing the JS methods xhr.getResponseHeader() and family. In JavaScript, you can pass any string, and Servo cannot possibly know ahead of time (and so use compiled in types) which strings may be used. Also, the result passed to JS must a be a String. So, in those bindings, accessing raw headers (and many of them, thanks to xhr.getAllResponseHeaders()).

Those restraints mean that we cannot do what is currently done: we must keep the raw bytes around even if accessed as a Typed value.
The steps I think we need are:

  1. We go back to struct Item { raw: Vec<Vec<u8>>, typed: Box<Header> }. We provided headers.get_raw(key: &str) -> Option<&[Vec<u8>]. mut isn't needed, since it doesn't require any transforming to get the raw bytes.
  2. The get(&mut self) still takes mut, and the compiler prevents a mutable borrow while there exists immutable borrows.
  3. Request<W>.headers() returns a &mut HeadersView that only provides get, get_ref, and has methods.
  4. Request<Fresh>.headers_mut() returns a &mut Headers, as it does normally.

Alternatives: instead of a &mut HeadersView, we could use a phantom type like with Request, such as &mut Headers<Fresh> and &mut Headers<Frozen>.

Reply to this email directly or view it on GitHub:
#50 (comment)

@reem
Copy link
Contributor Author

reem commented Sep 30, 2014

I think that we can provide both DynamicRaw and TypedRaw<H: Header>, where you can get TypedRaw using the regular methods and DynamicRaw using a special raw method which takes SendStr.

@seanmonstar
Copy link
Member

Any reason to receive a SendStr as an argument instead of a &str?

@reem
Copy link
Contributor Author

reem commented Sep 30, 2014

I just spent some time whiteboarding this issue, and I think that I've found a pretty good solution:

struct Headers { .. }

impl Headers {
   // Typed
   fn get<'a, M: Marker>(&'a self) -> Option<HeaderView<'a, M>>;
   fn set<M: Marker, H: Header<M>>(&mut self, H);

   // Raw
   fn get_raw<'a>(&'a self, &str) -> Option<RawView<'a>>;
   fn set_raw(&self, &str, Vec<Vec<u8>>);
}

struct HeaderView<'a, M: Marker> {
    item: &'a HeaderItem 
}

struct HeaderItem(RWLock<Box<Header<Erased> + Send + Sync>>);

impl<M: Marker> HeaderView<M> {
  fn as<'a, H: Header<M>>(&'a self) -> Option<ReadGuard<'a, H>>
  fn as_mut<'a, H: Header<M>>(&'a self) -> Option<WriteGuard<'a, H>>
}

// A bit of hypothetical Associated Statics syntax. Takes the form header_name in real code today.
struct Erased; impl Marker for Erased { name = "" };
trait Marker { name: &'static str; }
trait Header<M: Marker> {} // current methods, minus name stuff
// RawView works a lot like HeaderView except it parses to 
// ReadGuard<Vec<Vec<u8>]> and WriteGuard<Vec<Vec<u8>>>

This allows multiple headers to be accessed at once with minimum likelihood of stepping on each-others toes. It has some of the same issues as just returning ReadGuard directly, but due to the ergonomics of doing something bad, I think it's not as big of an issue.

If you think this API might be valuable, I can take a shot at implementing it.

@reem
Copy link
Contributor Author

reem commented Oct 1, 2014

Here's a gist with the meat of the above implementation, corrected for reality: https://gist.github.com/27fb134d2554572b21f8

(Untested, uncompiled code)

@reem
Copy link
Contributor Author

reem commented Oct 1, 2014

I've been thinking about the above interface and I'm having second thoughts. The interface is unfortunately not as safe as just taking &mut, because it can cause deadlocks, but it does allow immutable access to multiple headers at once, which is a benefit.

By introducing the marker and separating get and read_as/write_as we reduce the chance of deadlocks but they are still possible.

@seanmonstar
Copy link
Member

Sorry about the late response, work gets in the way some days.

I'd like to take a second to map out exact objectives, so the end goal is clearer. I'll probably update the initial comment with the points.

@seanmonstar
Copy link
Member

Here's a crazy idea. What if we didn't store the typed version after
parsing? How often does someone get the same header multiple times, thus
paying the cost more than once?

Instead, by not storing the typed value, we don't need a mutable borrow to
parse. You can safely access get_raw, since the only way to change it is
with set(), and that takes a mutable borrow.

Getting a different type for the same header would still have required a
new parse, so that's no different, except it wouldn't invalidate any
references.

@reem
Copy link
Contributor Author

reem commented Oct 18, 2014

I think the performance implications of that are too bad. &mut self requires cloning, which I think is likely to be much cheaper than re-parsing.

@seanmonstar
Copy link
Member

Yea, cloning would be faster than reparsing. Still, I can't imagine needing
to get a header more than once in most cases. So by optimizing the small
case, we make it more painful by requiring a mutable borrow.

Anyway, I'm trying it in a branch and we can see benchmarks.

@reem
Copy link
Contributor Author

reem commented Oct 18, 2014

I'm worried that it will be particularly nasty for certain complex headers, but stats are always better than speculation.

@seanmonstar seanmonstar added the A-headers Area: headers. label Oct 18, 2014
@seanmonstar
Copy link
Member

This is mostly dealt with. Specific pain points can become new issues, if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-client Area: client. A-headers Area: headers. A-server Area: server.
Projects
None yet
Development

No branches or pull requests

2 participants