Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make tokenizer not own the input stream #226

Merged
merged 5 commits into from
Oct 26, 2016
Merged

Make tokenizer not own the input stream #226

merged 5 commits into from
Oct 26, 2016

Conversation

nox
Copy link
Contributor

@nox nox commented Oct 26, 2016

This change is Reviewable

@SimonSapin
Copy link
Member

Reviewed 2 of 2 files at r1, 6 of 6 files at r2, 1 of 1 files at r3, 9 of 9 files at r4, 1 of 1 files at r5.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


Cargo.toml, line 27 at r4 (raw file):

[[test]]
name = "serializer"

Why disable this test?


src/tokenizer/buffer_queue.rs, line 121 at r4 (raw file):

    // If so, consume them and return Ok(true).
    // If they do not match, return Ok(false) and don't consume anything.
    // If a partial match is found, return Err(length).

It looks like lenght in the Err case is counted in UTF-8 bytes. Please document it so.

Nit: in the previous doc-comment "not enough characters are available to know" says this happens when the end of the BufferQueue is reached (and more input may be fed later). Here "partial match" sounds like it includes input that has a any common prefix with pat. Maybe rephrase this sentence?


src/tokenizer/mod.rs, line 320 at r4 (raw file):

            Err(length) => {
                for _ in 0..length {
                    self.temp_buf.push_char(input.next().unwrap());

This is using next() (which consumes a code point) and push_char, while length is counted in UTF-8 bytes. Is this a bug in non-ASCII cases?


Comments from Reviewable

@nox
Copy link
Contributor Author

nox commented Oct 26, 2016

Review status: 13 of 15 files reviewed at latest revision, 3 unresolved discussions.


Cargo.toml, line 27 at r4 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Why disable this test?

It's not part of my changes AFAIK. And it's not really disabled, I don't know what that harness thing is.

src/tokenizer/buffer_queue.rs, line 121 at r4 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

It looks like lenght in the Err case is counted in UTF-8 bytes. Please document it so.

Nit: in the previous doc-comment "not enough characters are available to know" says this happens when the end of the BufferQueue is reached (and more input may be fed later). Here "partial match" sounds like it includes input that has a any common prefix with pat. Maybe rephrase this sentence?

Done.

src/tokenizer/mod.rs, line 320 at r4 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

This is using next() (which consumes a code point) and push_char, while length is counted in UTF-8 bytes. Is this a bug in non-ASCII cases?

Nope, BufferQueue::eat panics in case of non-ASCII chars. I added a comment there.

Comments from Reviewable

@SimonSapin
Copy link
Member

Reviewed 1 of 2 files at r6, 1 of 1 files at r7.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


src/tokenizer/buffer_queue.rs, line 121 at r4 (raw file):

Previously, nox (Anthony Ramine) wrote…

Done.

I meant that input = "foobar" is a partial match for pat = "foofoo", but not "not enough characters available to know".

Comments from Reviewable

This is the first step towards supporting document.write.
@SimonSapin
Copy link
Member

@bors-servo r+


Reviewed 2 of 3 files at r8, 1 of 1 files at r9.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


Comments from Reviewable

@bors-servo
Copy link
Contributor

📌 Commit ceb1bd3 has been approved by SimonSapin

@bors-servo
Copy link
Contributor

⌛ Testing commit ceb1bd3 with merge c56d8e5...

bors-servo pushed a commit that referenced this pull request Oct 26, 2016
Make tokenizer not own the input stream

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/html5ever/226)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

☀️ Test successful - status-travis

@bors-servo bors-servo merged commit ceb1bd3 into servo:master Oct 26, 2016
@nox nox deleted the write branch October 26, 2016 10:53
Ygg01 added a commit to Ygg01/html5ever that referenced this pull request Apr 29, 2017
- Moves `TreeSink` into markup5ever, and applies to xhtml5ever.
- Moves `TokenSink` into markup5ever, but doesn't apply, see discussion
in servo#226
- Renames `QName` from xml5ever to `QualName`.
- Adds `prefix` field to `QualName`.
- Moves `Attributes` into markup5ever
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants