Make tokenizer not own the input stream #226

nox · 2016-10-26T08:21:29Z

This change is [Reviewable](https://reviewable.io/reviews/servo/html5ever/226)

SimonSapin · 2016-10-26T09:00:48Z

Reviewed 2 of 2 files at r1, 6 of 6 files at r2, 1 of 1 files at r3, 9 of 9 files at r4, 1 of 1 files at r5.
Review status: all files reviewed at latest revision, 3 unresolved discussions.

Cargo.toml, line 27 at r4 (raw file):

[[test]]
name = "serializer"

Why disable this test?

src/tokenizer/buffer_queue.rs, line 121 at r4 (raw file):

    // If so, consume them and return Ok(true).
    // If they do not match, return Ok(false) and don't consume anything.
    // If a partial match is found, return Err(length).

It looks like length in the Err case is counted in UTF-8 bytes. Please document it so.

Nit: in the previous doc-comment "not enough characters are available to know" says this happens when the end of the BufferQueue is reached (and more input may be fed later). Here "partial match" sounds like it includes input that has any common prefix with pat. Maybe rephrase this sentence?

src/tokenizer/mod.rs, line 320 at r4 (raw file):

            Err(length) => {
                for _ in 0..length {
                    self.temp_buf.push_char(input.next().unwrap());

This is using next() (which consumes a code point) and push_char, while length is counted in UTF-8 bytes. Is this a bug in non-ASCII cases?
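To illustrate the concern raised here, the following is a minimal sketch (not html5ever code) of what happens when a UTF-8 byte length is used as a count of `next()` calls on a `char` iterator. The helper `take_n_chars` is hypothetical and only mimics the shape of the loop under review.

```rust
/// Hypothetical helper: pull `n` items from a char iterator,
/// the way the reviewed loop calls `next()` `length` times.
fn take_n_chars(input: &mut std::str::Chars<'_>, n: usize) -> String {
    input.by_ref().take(n).collect()
}

fn main() {
    // "é!" is two code points but three UTF-8 bytes.
    let prefix = "\u{e9}!";
    assert_eq!(prefix.len(), 3);
    assert_eq!(prefix.chars().count(), 2);

    // Using the byte length as a code point count consumes one char too many.
    let mut input = "\u{e9}!xyz".chars();
    let taken = take_n_chars(&mut input, prefix.len());
    assert_eq!(taken, "\u{e9}!x");
    assert_eq!(input.as_str(), "yz");
}
```

For pure ASCII input the two counts coincide, so the mismatch only shows up once a multi-byte character is involved.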

Comments from Reviewable

nox · 2016-10-26T09:50:19Z

Review status: 13 of 15 files reviewed at latest revision, 3 unresolved discussions.

Cargo.toml, line 27 at r4 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Why disable this test?

It's not part of my changes AFAIK. And it's not really disabled, I don't know what that harness thing is.

src/tokenizer/buffer_queue.rs, line 121 at r4 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

It looks like length in the Err case is counted in UTF-8 bytes. Please document it so.

Nit: in the previous doc-comment "not enough characters are available to know" says this happens when the end of the BufferQueue is reached (and more input may be fed later). Here "partial match" sounds like it includes input that has any common prefix with pat. Maybe rephrase this sentence?

Done.

src/tokenizer/mod.rs, line 320 at r4 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

This is using next() (which consumes a code point) and push_char, while length is counted in UTF-8 bytes. Is this a bug in non-ASCII cases?

Nope, BufferQueue::eat panics in case of non-ASCII chars. I added a comment there.
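The reasoning in this reply can be sketched as follows: if non-ASCII input is rejected up front, a byte length and a code point count are always equal, so mixing the two is harmless. The function `ascii_len` is a hypothetical stand-in for such a guard; it is not the check that `BufferQueue::eat` actually performs.

```rust
/// Hypothetical guard (an assumption, not html5ever's implementation):
/// panics on non-ASCII input, so the returned byte length is also
/// the number of code points.
fn ascii_len(s: &str) -> usize {
    assert!(s.is_ascii(), "non-ASCII input is not supported here");
    s.len()
}

fn main() {
    let pat = "DOCTYPE";
    // For ASCII text, bytes and code points are counted identically.
    assert_eq!(ascii_len(pat), pat.chars().count());
}
```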

SimonSapin · 2016-10-26T09:57:01Z

Reviewed 1 of 2 files at r6, 1 of 1 files at r7.
Review status: all files reviewed at latest revision, 1 unresolved discussion.

src/tokenizer/buffer_queue.rs, line 121 at r4 (raw file):

Previously, nox (Anthony Ramine) wrote…

Done.

I meant that input = "foobar" is a partial match for pat = "foofoo", but not "not enough characters available to know".
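The distinction drawn here can be sketched in code. The function below is a simplified, hypothetical stand-in for the method under discussion: it works on a plain `&str` instead of a queue of buffers, compares case-sensitively, and only reports the decision without consuming anything. The `Result<bool, usize>` shape is taken from the quoted doc-comment; everything else is an assumption.

```rust
/// Simplified sketch, not html5ever's `BufferQueue::eat`.
/// Ok(true):  `available` starts with `pat`.
/// Ok(false): `available` definitely does not start with `pat`.
/// Err(n):    the `n` available bytes all match, but input ran out
///            before the whole pattern could be compared.
fn eat_sketch(available: &str, pat: &str) -> Result<bool, usize> {
    // ASCII-only pattern, so byte counts equal code point counts.
    assert!(pat.is_ascii());
    if available.len() >= pat.len() {
        // Enough input to decide either way.
        return Ok(available.starts_with(pat));
    }
    if pat.starts_with(available) {
        // Still matching when the input ended: cannot decide yet.
        Err(available.len())
    } else {
        // A mismatch was found before the input ended.
        Ok(false)
    }
}

fn main() {
    // Shares the prefix "foo" with the pattern, but is a definite mismatch.
    assert_eq!(eat_sketch("foobar", "foofoo"), Ok(false));
    // Matches so far, but more input is needed to know.
    assert_eq!(eat_sketch("foo", "foofoo"), Err(3));
    assert_eq!(eat_sketch("foofoobar", "foofoo"), Ok(true));
}
```

Under this reading, only the second case corresponds to "not enough characters available to know"; a shared prefix followed by a mismatch is an ordinary `Ok(false)`.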

This is the first step towards supporting document.write.

SimonSapin · 2016-10-26T10:10:04Z

@bors-servo r+

Reviewed 2 of 3 files at r8, 1 of 1 files at r9.
Review status: all files reviewed at latest revision, 1 unresolved discussion.

bors-servo · 2016-10-26T10:10:05Z

📌 Commit ceb1bd3 has been approved by SimonSapin

bors-servo · 2016-10-26T10:10:07Z

⌛ Testing commit ceb1bd3 with merge c56d8e5...

Make tokenizer not own the input stream

This change is [Reviewable](https://reviewable.io/reviews/servo/html5ever/226)

bors-servo · 2016-10-26T10:17:18Z

☀️ Test successful - status-travis

- Moves `TreeSink` into markup5ever, and applies to xhtml5ever.
- Moves `TokenSink` into markup5ever, but doesn't apply, see discussion in servo#226
- Renames `QName` from xml5ever to `QualName`.
- Adds `prefix` field to `QualName`.
- Moves `Attributes` into markup5ever

nox added 3 commits October 24, 2016 10:08

Silence two warnings in examples

43f0984

Use a saner, smaller enum for query_state_change

b502a3e

Remove a useless arm in BeforeAttributeValue

856fcb2

nox force-pushed the write branch from 0dfd054 to 4aeea9c Compare October 26, 2016 09:50

nox force-pushed the write branch from 4aeea9c to 3adcddd Compare October 26, 2016 09:53

nox force-pushed the write branch from 3adcddd to 6f6338f Compare October 26, 2016 10:04

nox added 2 commits October 26, 2016 12:07

Make tokenizer not own the input stream

7fec1da

This is the first step towards supporting document.write.

Bump version to 0.8.0

ceb1bd3

nox force-pushed the write branch from 6f6338f to ceb1bd3 Compare October 26, 2016 10:07

bors-servo merged commit ceb1bd3 into servo:master Oct 26, 2016

nox deleted the write branch October 26, 2016 10:53
