Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upAdded current_line variable to Tokenizer and test to check that curre… #240
Conversation
|
Great work! I've noted some code style nits, as well as some changes that should improve correctness. |
| @@ -200,6 +203,7 @@ impl<Sink: TokenSink> Tokenizer<Sink> { | |||
| temp_buf: StrTendril::new(), | |||
| state_profile: BTreeMap::new(), | |||
| time_in_sink: 0, | |||
| current_line: 1, | |||
This comment has been minimized.
This comment has been minimized.
|
|
||
| pub use super::interface::{Token, DoctypeToken, TagToken, CommentToken}; | ||
| pub use super::interface::{CharacterTokens, NullCharacterToken, EOFToken, ParseError}; | ||
| pub use super::interface::{Doctype, Attribute, TagKind, StartTag, EndTag, Tag}; |
This comment has been minimized.
This comment has been minimized.
| struct TokenMatch { | ||
| tokens: Vec<Token>, | ||
| current_str: StrTendril, | ||
| exact_errors: bool, |
This comment has been minimized.
This comment has been minimized.
| // TokenMatch implements the TokenSink trait. It is used for testing to see | ||
| // if current_line is being updated when process_token is called. The lines | ||
| // vector is a collection of the line numbers that each token is on. | ||
| struct TokenMatch { |
This comment has been minimized.
This comment has been minimized.
| #[test] | ||
| fn check_lines() { | ||
|
|
||
| let opts = TokenizerOpts{exact_errors: false, discard_bom: true, profile: false, |
This comment has been minimized.
This comment has been minimized.
| fn check_lines() { | ||
|
|
||
| let opts = TokenizerOpts{exact_errors: false, discard_bom: true, profile: false, | ||
| initial_state: None, last_start_tag_name: None,}; |
This comment has been minimized.
This comment has been minimized.
| @@ -270,6 +274,7 @@ impl<Sink: TokenSink> Tokenizer<Sink> { | |||
| if c == '\r' { | |||
| self.ignore_lf = true; | |||
| c = '\n'; | |||
| self.current_line += 1; | |||
This comment has been minimized.
This comment has been minimized.
jdm
Dec 9, 2016
Member
Let's make a separate block after this instead:
if c == '\r' {
self.ignore_lf = true;
c = '\n';
}
if c == '\n' {
self.current_line += 1;
}This means that input that only contains \n tokens will correctly count lines, and code that contains \r\n tokens (which is what the ignore_lf checks are for) will also count lines correctly.
| initial_state: None, last_start_tag_name: None,}; | ||
|
|
||
| let vector = vec![StrTendril::from("<a>\r"), StrTendril::from("<b>\r"), | ||
| StrTendril::from("</b>\r"), StrTendril::from("</a>\r")]; |
This comment has been minimized.
This comment has been minimized.
jdm
Dec 9, 2016
Member
We should run this test twice - once using \n separators, and once using \r\n.
|
|
||
| let vector = vec![StrTendril::from("<a>\r"), StrTendril::from("<b>\r"), | ||
| StrTendril::from("</b>\r"), StrTendril::from("</a>\r")]; | ||
| let expected = vec![1, 2, 2, 3, 3, 4, 4, 5, 5]; |
This comment has been minimized.
This comment has been minimized.
jdm
Dec 9, 2016
Member
We'll need to update this to contain pairs of tokens and the corresponding line number.
|
I've mostly requested small changes; we'll just need to make sure that the tests are verifying what we want, and that we're processing \n characters as expected. |
| @@ -29,7 +29,7 @@ struct Sink; | |||
| impl TokenSink for Sink { | |||
| type Handle = (); | |||
|
|
|||
| fn process_token(&mut self, token: Token) -> TokenSinkResult<()> { | |||
| fn process_token(&mut self, token: Token, line_number: u64) -> TokenSinkResult<()> { | |||
This comment has been minimized.
This comment has been minimized.
jdm
Dec 22, 2016
Member
Let's use _line_number here and elsewhere to avoid warnings about unused variables in code that never makes use of it.
| @@ -161,6 +161,9 @@ pub struct RcDom { | |||
|
|
|||
| /// The document's quirks mode. | |||
| pub quirks_mode: QuirksMode, | |||
|
|
|||
| /// The current line being parsed | |||
| pub current_line: u64, | |||
This comment has been minimized.
This comment has been minimized.
| @@ -326,6 +329,10 @@ impl TreeSink for RcDom { | |||
| _ => unreachable!(), | |||
| } | |||
| } | |||
|
|
|||
| fn set_current_line(&mut self, line_number: u64) { | |||
| self.current_line = line_number; | |||
This comment has been minimized.
This comment has been minimized.
jdm
Dec 22, 2016
Member
This can be an empty method instead (and use _line_number to avoid warnings).
|
|
||
| fn push(&mut self, token: Token, line_number: u64) { | ||
| self.finish_str(); | ||
| // self.tokens.push(token); |
This comment has been minimized.
This comment has been minimized.
|
|
||
| use super::buffer_queue::{BufferQueue}; | ||
| use std::mem::replace; | ||
| use std::borrow::Cow::{self, Borrowed}; |
This comment has been minimized.
This comment has been minimized.
| pub struct LineCountingDOM { | ||
| pub line_vec: Vec<(QualName, u64)>, | ||
| pub rcdom: RcDom, | ||
| } |
This comment has been minimized.
This comment has been minimized.
| } | ||
| } | ||
|
|
||
| impl Default for LineCountingDOM { |
This comment has been minimized.
This comment has been minimized.
jdm
Dec 22, 2016
Member
We don't need this implementation; we can write out the initializer by hand instead of using LineCountingDOM::default().
| resultTok.process(StrTendril::from("<a>")); | ||
| resultTok.process(StrTendril::from("</a>")); | ||
| resultTok.process(StrTendril::from("<b>")); | ||
| resultTok.process(StrTendril::from("</b>")); |
This comment has been minimized.
This comment has been minimized.
jdm
Dec 22, 2016
Member
We're going to want some newlines in this test, or it's not actually doing much good ;)
| (qualname!(html, "head"), 1), | ||
| (qualname!(html, "body"), 1), | ||
| (qualname!(html, "a"), 1), | ||
| (qualname!(html, "b"), 1)]; |
This comment has been minimized.
This comment has been minimized.
jdm
Dec 22, 2016
Member
Our expected output will want to ensure that at least some elements are associated with different lines.
| (qualname!(html, "body"), 1), | ||
| (qualname!(html, "a"), 1), | ||
| (qualname!(html, "b"), 1)]; | ||
| let result = actual.line_vec.clone(); |
This comment has been minimized.
This comment has been minimized.
…all in step function to account to make sure current_line is updated.
Track current line when parsing Rebased from #240. <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/html5ever/243) <!-- Reviewable:end -->
|
Everything here looked good to me, so I squashed the commits in #243 and merged them there. Thanks for working on this! |
karenher commentedDec 1, 2016
•
edited by larsbergstrom
…nt_line is correctly updated
This change is