Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make tokenizer not own the input stream #226

Merged
merged 5 commits into from Oct 26, 2016
Merged

Make tokenizer not own the input stream #226

merged 5 commits into from Oct 26, 2016

Conversation

@nox
Copy link
Member

nox commented Oct 26, 2016

This change is Reviewable

@SimonSapin
Copy link
Member

SimonSapin commented Oct 26, 2016

Reviewed 2 of 2 files at r1, 6 of 6 files at r2, 1 of 1 files at r3, 9 of 9 files at r4, 1 of 1 files at r5.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


Cargo.toml, line 27 at r4 (raw file):

[[test]]
name = "serializer"

Why disable this test?


src/tokenizer/buffer_queue.rs, line 121 at r4 (raw file):

    // If so, consume them and return Ok(true).
    // If they do not match, return Ok(false) and don't consume anything.
    // If a partial match is found, return Err(length).

It looks like lenght in the Err case is counted in UTF-8 bytes. Please document it so.

Nit: in the previous doc-comment "not enough characters are available to know" says this happens when the end of the BufferQueue is reached (and more input may be fed later). Here "partial match" sounds like it includes input that has a any common prefix with pat. Maybe rephrase this sentence?


src/tokenizer/mod.rs, line 320 at r4 (raw file):

            Err(length) => {
                for _ in 0..length {
                    self.temp_buf.push_char(input.next().unwrap());

This is using next() (which consumes a code point) and push_char, while length is counted in UTF-8 bytes. Is this a bug in non-ASCII cases?


Comments from Reviewable

@nox nox force-pushed the nox:write branch from 0dfd054 to 4aeea9c Oct 26, 2016
@nox
Copy link
Member Author

nox commented Oct 26, 2016

Review status: 13 of 15 files reviewed at latest revision, 3 unresolved discussions.


Cargo.toml, line 27 at r4 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Why disable this test?

It's not part of my changes AFAIK. And it's not really disabled, I don't know what that harness thing is.

src/tokenizer/buffer_queue.rs, line 121 at r4 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

It looks like lenght in the Err case is counted in UTF-8 bytes. Please document it so.

Nit: in the previous doc-comment "not enough characters are available to know" says this happens when the end of the BufferQueue is reached (and more input may be fed later). Here "partial match" sounds like it includes input that has a any common prefix with pat. Maybe rephrase this sentence?

Done.

src/tokenizer/mod.rs, line 320 at r4 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

This is using next() (which consumes a code point) and push_char, while length is counted in UTF-8 bytes. Is this a bug in non-ASCII cases?

Nope, BufferQueue::eat panics in case of non-ASCII chars. I added a comment there.

Comments from Reviewable

@nox nox force-pushed the nox:write branch from 4aeea9c to 3adcddd Oct 26, 2016
@SimonSapin
Copy link
Member

SimonSapin commented Oct 26, 2016

Reviewed 1 of 2 files at r6, 1 of 1 files at r7.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


src/tokenizer/buffer_queue.rs, line 121 at r4 (raw file):

Previously, nox (Anthony Ramine) wrote…

Done.

I meant that input = "foobar" is a partial match for pat = "foofoo", but not "not enough characters available to know".

Comments from Reviewable

@nox nox force-pushed the nox:write branch from 3adcddd to 6f6338f Oct 26, 2016
nox added 2 commits Oct 26, 2016
This is the first step towards supporting document.write.
@nox nox force-pushed the nox:write branch from 6f6338f to ceb1bd3 Oct 26, 2016
@SimonSapin
Copy link
Member

SimonSapin commented Oct 26, 2016

@bors-servo r+


Reviewed 2 of 3 files at r8, 1 of 1 files at r9.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


Comments from Reviewable

@bors-servo
Copy link
Contributor

bors-servo commented Oct 26, 2016

📌 Commit ceb1bd3 has been approved by SimonSapin

@bors-servo
Copy link
Contributor

bors-servo commented Oct 26, 2016

Testing commit ceb1bd3 with merge c56d8e5...

bors-servo added a commit that referenced this pull request Oct 26, 2016
Make tokenizer not own the input stream

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/html5ever/226)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Oct 26, 2016

☀️ Test successful - status-travis

@bors-servo bors-servo merged commit ceb1bd3 into servo:master Oct 26, 2016
2 of 3 checks passed
2 of 3 checks passed
code-review/reviewable 1 discussion left (nox, SimonSapin)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
@nox nox deleted the nox:write branch Oct 26, 2016
Ygg01 added a commit to Ygg01/html5ever that referenced this pull request Apr 29, 2017
- Moves `TreeSink` into markup5ever, and applies to xhtml5ever.
- Moves `TokenSink` into markup5ever, but doesn't apply, see discussion
in servo#226
- Renames `QName` from xml5ever to `QualName`.
- Adds `prefix` field to `QualName`.
- Moves `Attributes` into markup5ever
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.