Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quoted pairs #68

Merged
merged 10 commits into from Dec 1, 2017
Merged

Quoted pairs #68

merged 10 commits into from Dec 1, 2017

Conversation

rustonaut
Copy link
Contributor

I originally only wanted to add a as_str_repr, a to_content method and fix equality comparsion wrt. to quoted-pairs but because Name was used for both names and values and names should be usable in
patterns this was not possible (to be usable in patterns derive(PartialEq) is required so no custom eq implementation). So I also had to split it into Name+Value.

@rustonaut
Copy link
Contributor Author

I also ran the benchmarks and there where basically no changes in performance (the new version was a bit faster but only in a degree which can be cause by noise)

@rustonaut
Copy link
Contributor Author

Wups, travis is still testing a quite a old rust version... I will fix this (I my code, not travis)

@rustonaut
Copy link
Contributor Author

rustonaut commented Nov 7, 2017

It might make change to raise the minimal rust version from 1.15.0 to the first stable rust version which supports pub(crate) which then would allow you have multiple modules instead of pseudo modules separated with // Name ==============. I originally had Name/Value in sub modules but
hade to throw them back into lib.

If its fine with you I would go through the crate and clean thinks a bit up after bumping
the minimal rust version, like e.g. splitting up the parsing function into multiple ones or
even better use nom for it, rename some of the fields like e.g. slash to slash_idx
, split thinks into multiple modules for readability, etc. but that would be another pull request.

@seanmonstar
Copy link
Member

Wow, this looks legit. I'll need a moment to review, but wanted to answer your questions: it seems that this is a breaking change, by splitting to a new Value type. In that case, it seems fine to include an increase to the minimum Rust version (preferably not the absolute latest).

For utf-8 vs utf8, would it be a good idea to encode that possibility in the Value type itself, such that UTF_8 will equal either version, without needing 2 constants?

@rustonaut
Copy link
Contributor Author

The rust version I would like to bump it to would be 1.18.0 as it is the first which allows pub(crate) in stable.

Wrt. utf8 I currently consider 3 options:

  1. make Value and enum ( Utf8 / Other { source: &'a str })
  2. move all special handling of utf8 in a method (either has_utf8_charset() on Mime or is_utf8() on Value)
  3. remove all special handling for utf8 from Value but something like Pattern trait + impl<T> PartialEq<T> for Value where T: Pattern and a zero-sized UTF_8 type implementing Pattern (this still needs to be flushed out more)

The benefit of 1. is a faster comparison of utf8 but a very slightly slower comparison for any other param.
(Through not slower then currently, more like the same performance for everything except comparing a utf8 Value to a utf8 Value (not string) which is much faster)

The benefit of 2. is no special handling of utf8 anywhere but in the one method.

The benefit of 3. is that people can create there own Patterns.

@rustonaut
Copy link
Contributor Author

rustonaut commented Nov 10, 2017

I think it's best to go with 1. charset=utf8 is basically the most used mime parameter, so that should be fine.

I also noticed that I made a mistake by assuming the PartialEq implementations for Mime use Name (now Name/Value) to compare parameters, but they don't. So I still have to fix that.

I also noticed that:

  1. the comparsion of two Mimes depends on the order of the parameters, just that in the spec they are order independent
  2. I have to take another look into where you can place whitespaces (i.e. text/plain ;charset=utf8, charset = utf8 etc.)

Through I would leaf this for another pull request (should be combined into a single braking change).

@rustonaut
Copy link
Contributor Author

I think there is a better way to fix all this, which is to "normalize" while parsing.
We do copy the string anyway when parsing, so we could do so and
normalize it:

  • strip linear whitespaces
  • make all case insensitive letters lowercase
  • replacing any unnecessary quoted pair with a unquoted one,
  • remove quoted where possible
  • special handle utf8 to replace any form of utf-8 (UTF8, Utf-8, ...) with utf-8

This would:

  1. this removes any problem with utf-8 and utf8 and case sensitivity of utf8 wrt. converting the mime back to an string / Deref<Target=str>
    • through a Value of utf8 compared to a string still needs special handling
  2. it also removes the problem that the mime crate can accept mimes which are valid but unusual
    and might be rejected by other tools (e.g. r#"param="a\bc""# can be used but r#"param="abc""#
    or better r#"param=abc"# should be preferred)
  3. make some comparisons faster (mainly Value to Value as they can memcmp, but also more values which are not quoted and can use memcmp in a Value to str comparision)

I'm not sure why I didn't do that from the beginning...
I will prototype and benchmark it now, to see if parsing is slower.

@seanmonstar
Copy link
Member

What you say sounds very reasonable! Does that mean you'd like to edit this PR more, or should I merge and you'd like to explore separately?

@rustonaut
Copy link
Contributor Author

I originally wanted to work a bit more on the PR.
While doing so I reread all relevant RFC (parts) and came to the conclusion that long term parser parameters are needed.

So I will now just fix the comparison of Media Types with string to use Name/Value for comparison and conclude this pull request and then continue to work on a rewriting the parser in a way which accepts parameters.


Why parameters are needed:
see new issue: #73

This required to split Name into Name & Value, as Value
needs to custom implement PartialEq but Name needs to
derive partial eq, as it should be usable with match
statements as pattern.
So now quoted-pairs can be used for what they where meant to
escape ".

Some additional tests where added to make sure nothing breakes
with this change.
(rust stability gurantees do not apply to
 crates using deny(warnings) as introducing
 warnings is not seen as a braking change)
Adds newlines to the end of file where nessesary and
removes all `eq_str` function moving their functionality
into a `PartialEq<str>` implementation
@rustonaut
Copy link
Contributor Author

rustonaut commented Nov 17, 2017

I cleaned thinks up a bit, should now be ready to be merged.

Note that I for now removed all special "utf8" (instead of utf-8) handling,
as the best way to making it work completely correct is to normalize while parsing
thinks into the internal buffer (called Source in the crate). Which I moved
to a later pull request due to the thinks in #73.

@seanmonstar
Copy link
Member

Awesome, thank you! (Sorry, I was off the internet completely for over a week).

@seanmonstar seanmonstar merged commit 843636c into hyperium:master Dec 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants