Raw byte access for Headers is not ergonomic #50

reem · 2014-09-23T07:26:28Z

We invalidate raw headers, will soon return *const, etc.

The story here is complex. You can treat the Raw representation as just another Typed header, except this one is Raw(&[Vec<u8>]) or so, so the problem devolves to providing safe re-parsing of headers. This is possible through one of three methods:

Demand &mut self in every getter method.
Use RWLock and return RWLockReadGuard or RWLockWriteGuard, depending.
Use RefCell to vastly simplify the code, and return Ref or RefMut, depending.
Return Rc or Arc and otherwise make life simpler by introducing refcounting.

The first approach is not ergonomic:

let length = headers.get::<ContentLength>();
let date = headers.get::<Date>(); // illegal

The second is liable to cause deadlocks:

let length = request.headers.get::<ContenLength>();
let crazy = request.headers.get::<CrazyLength>(); // Tries to get a write lock on the ContentLength item, deadlocks.

The third is liable to cause runtime errors in the same way, and the fourth introduces a possibly ridiculous amount of overhead.

The text was updated successfully, but these errors were encountered:

reem · 2014-09-23T07:46:13Z

The issue is that it is not possible to provide the granular guarantee to the type system/borrow checker that no two headers with the same name are accessed concurrently. The &mut self solution just bans all access to two headers at the same time, and the RWLock/RefCell solution pushes the problem to runtime, but allows it in the case where you access two different representations.

seanmonstar · 2014-09-25T03:15:06Z

Additionally, while for most use-cases, the Typed API should be preferred, Servo has a legitimate reason for accessing all the Raw headers: providing the JS methods xhr.getResponseHeader() and family. In JavaScript, you can pass any string, and Servo cannot possibly know ahead of time (and so use compiled in types) which strings may be used. Also, the result passed to JS must a be a String. So, in those bindings, accessing raw headers (and many of them, thanks to xhr.getAllResponseHeaders()).

Those restraints mean that we cannot do what is currently done: we must keep the raw bytes around even if accessed as a Typed value.

The steps I think we need are:

We go back to struct Item { raw: Vec<Vec<u8>>, typed: Box<Header> }. We provided headers.get_raw(key: &str) -> Option<&[Vec<u8>]. mut isn't needed, since it doesn't require any transforming to get the raw bytes.
The get(&mut self) still takes mut, and the compiler prevents a mutable borrow while there exists immutable borrows.
Request<W>.headers() returns a &mut HeadersView that only provides get, get_ref, and has methods.
Request<Fresh>.headers_mut() returns a &mut Headers, as it does normally.

Alternatives: instead of a &mut HeadersView, we could use a phantom type like with Request, such as &mut Headers<Fresh> and &mut Headers<Frozen>.

reem · 2014-09-25T03:23:49Z

I was thinking about allowing access to raw headers through a type which just wraps a regular header and returns the input in parse_header - for instance headers.get::<Raw<ContentLength>>().

However, I didn't consider that this still doesn't allow for dynamic raw header access through strings.

On Wed, Sep 24, 2014 at 11:15 PM, Sean McArthur notifications@github.com
wrote:

Additionally, while for most use-cases, the Typed API should be preferred, Servo has a legitimate reason for accessing all the Raw headers: providing the JS methods xhr.getResponseHeader() and family. In JavaScript, you can pass any string, and Servo cannot possibly know ahead of time (and so use compiled in types) which strings may be used. Also, the result passed to JS must a be a String. So, in those bindings, accessing raw headers (and many of them, thanks to xhr.getAllResponseHeaders()).

Those restraints mean that we cannot do what is currently done: we must keep the raw bytes around even if accessed as a Typed value.
The steps I think we need are:

We go back to struct Item { raw: Vec<Vec<u8>>, typed: Box<Header> }. We provided headers.get_raw(key: &str) -> Option<&[Vec<u8>]. mut isn't needed, since it doesn't require any transforming to get the raw bytes.

The get(&mut self) still takes mut, and the compiler prevents a mutable borrow while there exists immutable borrows.

Request<W>.headers() returns a &mut HeadersView that only provides get, get_ref, and has methods.

Request<Fresh>.headers_mut() returns a &mut Headers, as it does normally.

Alternatives: instead of a &mut HeadersView, we could use a phantom type like with Request, such as &mut Headers<Fresh> and &mut Headers<Frozen>.

Reply to this email directly or view it on GitHub:
#50 (comment)

reem · 2014-09-30T03:00:57Z

I think that we can provide both DynamicRaw and TypedRaw<H: Header>, where you can get TypedRaw using the regular methods and DynamicRaw using a special raw method which takes SendStr.

seanmonstar · 2014-09-30T03:56:43Z

Any reason to receive a SendStr as an argument instead of a &str?

reem · 2014-09-30T04:44:49Z

I just spent some time whiteboarding this issue, and I think that I've found a pretty good solution:

struct Headers { .. }

impl Headers {
   // Typed
   fn get<'a, M: Marker>(&'a self) -> Option<HeaderView<'a, M>>;
   fn set<M: Marker, H: Header<M>>(&mut self, H);

   // Raw
   fn get_raw<'a>(&'a self, &str) -> Option<RawView<'a>>;
   fn set_raw(&self, &str, Vec<Vec<u8>>);
}

struct HeaderView<'a, M: Marker> {
    item: &'a HeaderItem 
}

struct HeaderItem(RWLock<Box<Header<Erased> + Send + Sync>>);

impl<M: Marker> HeaderView<M> {
  fn as<'a, H: Header<M>>(&'a self) -> Option<ReadGuard<'a, H>>
  fn as_mut<'a, H: Header<M>>(&'a self) -> Option<WriteGuard<'a, H>>
}

// A bit of hypothetical Associated Statics syntax. Takes the form header_name in real code today.
struct Erased; impl Marker for Erased { name = "" };
trait Marker { name: &'static str; }
trait Header<M: Marker> {} // current methods, minus name stuff
// RawView works a lot like HeaderView except it parses to 
// ReadGuard<Vec<Vec<u8>]> and WriteGuard<Vec<Vec<u8>>>

This allows multiple headers to be accessed at once with minimum likelihood of stepping on each-others toes. It has some of the same issues as just returning ReadGuard directly, but due to the ergonomics of doing something bad, I think it's not as big of an issue.

If you think this API might be valuable, I can take a shot at implementing it.

reem · 2014-10-01T04:20:53Z

Here's a gist with the meat of the above implementation, corrected for reality: https://gist.github.com/27fb134d2554572b21f8

(Untested, uncompiled code)

reem · 2014-10-01T20:12:02Z

I've been thinking about the above interface and I'm having second thoughts. The interface is unfortunately not as safe as just taking &mut, because it can cause deadlocks, but it does allow immutable access to multiple headers at once, which is a benefit.

By introducing the marker and separating get and read_as/write_as we reduce the chance of deadlocks but they are still possible.

seanmonstar · 2014-10-01T22:34:19Z

Sorry about the late response, work gets in the way some days.

I'd like to take a second to map out exact objectives, so the end goal is clearer. I'll probably update the initial comment with the points.

seanmonstar · 2014-10-17T17:38:32Z

Here's a crazy idea. What if we didn't store the typed version after
parsing? How often does someone get the same header multiple times, thus
paying the cost more than once?

Instead, by not storing the typed value, we don't need a mutable borrow to
parse. You can safely access get_raw, since the only way to change it is
with set(), and that takes a mutable borrow.

Getting a different type for the same header would still have required a
new parse, so that's no different, except it wouldn't invalidate any
references.

reem · 2014-10-18T00:41:37Z

I think the performance implications of that are too bad. &mut self requires cloning, which I think is likely to be much cheaper than re-parsing.

seanmonstar · 2014-10-18T01:02:15Z

Yea, cloning would be faster than reparsing. Still, I can't imagine needing
to get a header more than once in most cases. So by optimizing the small
case, we make it more painful by requiring a mutable borrow.

Anyway, I'm trying it in a branch and we can see benchmarks.

reem · 2014-10-18T03:03:22Z

I'm worried that it will be particularly nasty for certain complex headers, but stats are always better than speculation.

seanmonstar · 2014-12-03T20:02:29Z

This is mostly dealt with. Specific pain points can become new issues, if needed.

reem added A-server Area: server. servo and removed A-server Area: server. labels Sep 23, 2014

reem modified the milestone: servo Sep 23, 2014

seanmonstar added the A-client Area: client. label Sep 25, 2014

gtolle mentioned this issue Oct 16, 2014

Creating headers without creating Header structs #81

Closed

seanmonstar added the A-headers Area: headers. label Oct 18, 2014

seanmonstar closed this as completed Dec 3, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Raw byte access for Headers is not ergonomic #50

Raw byte access for Headers is not ergonomic #50

reem commented Sep 23, 2014

reem commented Sep 23, 2014

seanmonstar commented Sep 25, 2014

reem commented Sep 25, 2014

Alternatives: instead of a `&mut HeadersView`, we could use a phantom type like with `Request`, such as `&mut Headers<Fresh>` and `&mut Headers<Frozen>`.

reem commented Sep 30, 2014

seanmonstar commented Sep 30, 2014

reem commented Sep 30, 2014

reem commented Oct 1, 2014

reem commented Oct 1, 2014

seanmonstar commented Oct 1, 2014

seanmonstar commented Oct 17, 2014

reem commented Oct 18, 2014

seanmonstar commented Oct 18, 2014

reem commented Oct 18, 2014

seanmonstar commented Dec 3, 2014

Raw byte access for Headers is not ergonomic #50

Raw byte access for Headers is not ergonomic #50

Comments

reem commented Sep 23, 2014

reem commented Sep 23, 2014

seanmonstar commented Sep 25, 2014

reem commented Sep 25, 2014

Alternatives: instead of a &mut HeadersView, we could use a phantom type like with Request, such as &mut Headers<Fresh> and &mut Headers<Frozen>.

reem commented Sep 30, 2014

seanmonstar commented Sep 30, 2014

reem commented Sep 30, 2014

reem commented Oct 1, 2014

reem commented Oct 1, 2014

seanmonstar commented Oct 1, 2014

seanmonstar commented Oct 17, 2014

reem commented Oct 18, 2014

seanmonstar commented Oct 18, 2014

reem commented Oct 18, 2014

seanmonstar commented Dec 3, 2014

Alternatives: instead of a `&mut HeadersView`, we could use a phantom type like with `Request`, such as `&mut Headers<Fresh>` and `&mut Headers<Frozen>`.