Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Systematically ignore tabs and newlines, per spec change #190

Merged
merged 4 commits into from Apr 26, 2016

Conversation

@SimonSapin
Copy link
Member

SimonSapin commented Apr 25, 2016

The test changes correspond to bugs in rust-url before this change:
the value of pathname/search/hash setters does go through the URL parser
(though with a state override), which includes removing trailing and leading
controls and spaces.
@asajeffrey asajeffrey self-assigned this Apr 25, 2016
@asajeffrey
Copy link
Member

asajeffrey commented Apr 26, 2016

This seems like a lot of extra complexity to handle an uncommon case (URLs with tabs or spaces). It seems like there should be a simpler way to do this, something like making Input implement a peekable and perhaps cloneable iterator interface.

Am I right that one of the goals is to keep this as a one-pass algorithm, which is why the obvious brute force scan of the input for tabs and spacing before parsing (and filtering/collecting it if any are found) isn't acceptable?


Reviewed 6 of 6 files at r1.
Review status: all files reviewed at latest revision, 8 unresolved discussions.


src/lib.rs, line 594 [r1] (raw file):
It's a pity this (and the other functions like it) isn't just input.chars().filter(is_url_char), possibly with some stuff about stripping off leading/trailing characters.


src/parser.rs, line 130 [r1] (raw file):
I'm a bit surprised we need this many helper methods. Does the URL parser need that many methods that aren't in some mix of Peekable<Iterator<char>>? and IntoIterator?


src/parser.rs, line 326 [r1] (raw file):
Why this change?


src/parser.rs, line 340 [r1] (raw file):
This is doing a lot more allocation than the original.


src/parser.rs, line 351 [r1] (raw file):
The code seems more complex now, is there a way to keep the simplicity of the original?


src/parser.rs, line 621 [r1] (raw file):
More allocation.


src/parser.rs, line 678 [r1] (raw file):
The new code is more complex.


src/parser.rs, line 745 [r1] (raw file):
And more complexity.


Comments from Reviewable

@SimonSapin SimonSapin force-pushed the input-iterator branch from e55d437 to 22e8027 Apr 26, 2016
@SimonSapin
Copy link
Member Author

SimonSapin commented Apr 26, 2016

The complexity was already there, just (inconsistently) spread in various parts of the parser that used to each have code to ignore these characters. But with this spec change I think it would be unreasonable complex to ignore them manually everywhere, for example between the slashes of http://.

something like making Input implement a peekable and perhaps cloneable iterator interface.

That’s kinda what this PR is all about? (iterator.clone().next() is the same as peek when clone is cheap.)

Am I right that one of the goals is to keep this as a one-pass algorithm,

Yes. Or maybe filtering and collecting into a String is acceptable, but it seemed undesirable. I don’t know.


Review status: 5 of 6 files reviewed at latest revision, 8 unresolved discussions.


src/lib.rs, line 594 [r1] (raw file):
That’s… exactly what Input is?


src/parser.rs, line 130 [r1] (raw file):
Sure, you can always manually inline all these helpers instead of having them in methods. But that’s overall more code, mostly duplicated?


src/parser.rs, line 326 [r1] (raw file):
str::find returns a byte index that can be used for indexing. Input supports neither indexing nor .len().


src/parser.rs, line 340 [r1] (raw file):
Only for users that want the full details of "syntax violations", which so far no-one has ever used as far as I know.


src/parser.rs, line 351 [r1] (raw file):
If this comment is about this specific line, I don’t see how it is more complex. If it is about this PR in general… do you have a more concrete suggestion?


src/parser.rs, line 621 [r1] (raw file):
Still only when logging "syntax violations".


src/parser.rs, line 678 [r1] (raw file):
¯_(ツ)_/¯

This PR is not just an internal refactoring. It’s about implementing a change (linked from the PR message) in the specification’s requirements.

If you look at individual commits, the first one "Parse based on iterators" is a refactoring that doesn’t change the behavior, but it only exists to enable the behavior change in the second commit "Systematically ignore tabs and newlines".


src/parser.rs, line 745 [r1] (raw file):
See previous comment.


Comments from Reviewable

@SimonSapin SimonSapin force-pushed the input-iterator branch from 22e8027 to cd7d83b Apr 26, 2016
@asajeffrey
Copy link
Member

asajeffrey commented Apr 26, 2016

@asajeffrey
Copy link
Member

asajeffrey commented Apr 26, 2016

I think at this point we're into style differences. You can r=me. Squash the commits?


Reviewed 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


src/parser.rs, line 107 [r2] (raw file):
Much nicer!


Comments from Reviewable

@SimonSapin
Copy link
Member Author

SimonSapin commented Apr 26, 2016

Commits are as intended.

@bors-servo r+

@bors-servo
Copy link
Contributor

bors-servo commented Apr 26, 2016

📌 Commit cd7d83b has been approved by SimonSapin

@bors-servo
Copy link
Contributor

bors-servo commented Apr 26, 2016

Testing commit cd7d83b with merge 28d691c...

bors-servo added a commit that referenced this pull request Apr 26, 2016
Systematically ignore tabs and newlines, per spec change

whatwg/url@7b40216
whatwg/url#101

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-url/190)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Apr 26, 2016

☀️ Test successful - travis

@bors-servo bors-servo merged commit cd7d83b into master Apr 26, 2016
5 of 6 checks passed
5 of 6 checks passed
code-review/reviewable Review in progress: all files reviewed, 1 unresolved discussion
Details
continuous-integration/appveyor/branch AppVeyor build succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
homu Test successful
Details
@SimonSapin SimonSapin deleted the input-iterator branch Apr 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.