Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added current_line variable to Tokenizer and test to check that curre… #240

Closed
wants to merge 5 commits into from

Conversation

@karenher
Copy link
Contributor

karenher commented Dec 1, 2016

…nt_line is correctly updated


This change is Reviewable

karenher added 2 commits Dec 1, 2016
…nt_line is correctly updated
Copy link
Member

jdm left a comment

Great work! I've noted some code style nits, as well as some changes that should improve correctness.

@@ -200,6 +203,7 @@ impl<Sink: TokenSink> Tokenizer<Sink> {
temp_buf: StrTendril::new(),
state_profile: BTreeMap::new(),
time_in_sink: 0,
current_line: 1,

This comment has been minimized.

@jdm

jdm Dec 9, 2016

Member

nit: replace the tabs in this line with spaces.





This comment has been minimized.

@jdm

jdm Dec 9, 2016

Member

nit: remove these new blank lines.


pub use super::interface::{Token, DoctypeToken, TagToken, CommentToken};
pub use super::interface::{CharacterTokens, NullCharacterToken, EOFToken, ParseError};
pub use super::interface::{Doctype, Attribute, TagKind, StartTag, EndTag, Tag};

This comment has been minimized.

@jdm

jdm Dec 9, 2016

Member

The pub in these lines isn't necessary.

struct TokenMatch {
tokens: Vec<Token>,
current_str: StrTendril,
exact_errors: bool,

This comment has been minimized.

@jdm

jdm Dec 9, 2016

Member

We don't need the tokens or exact_errors members for this test.

// TokenMatch implements the TokenSink trait. It is used for testing to see
// if current_line is being updated when process_token is called. The lines
// vector is a collection of the line numbers that each token is on.
struct TokenMatch {

This comment has been minimized.

@jdm

jdm Dec 9, 2016

Member

Let's call this LinesMatch instead, since that's what it's testing.

#[test]
fn check_lines() {

let opts = TokenizerOpts{exact_errors: false, discard_bom: true, profile: false,

This comment has been minimized.

@jdm

jdm Dec 9, 2016

Member

nit: space before and after {

fn check_lines() {

let opts = TokenizerOpts{exact_errors: false, discard_bom: true, profile: false,
initial_state: None, last_start_tag_name: None,};

This comment has been minimized.

@jdm

jdm Dec 9, 2016

Member

nit: move }; to the next line, like:

let opts = TokenizerOpts { ...,
    ...,
};
@@ -270,6 +274,7 @@ impl<Sink: TokenSink> Tokenizer<Sink> {
if c == '\r' {
self.ignore_lf = true;
c = '\n';
self.current_line += 1;

This comment has been minimized.

@jdm

jdm Dec 9, 2016

Member

Let's make a separate block after this instead:

if c == '\r' {
    self.ignore_lf = true;
    c = '\n';
}

if c == '\n' {
    self.current_line += 1;
}

This means that input that only contains \n tokens will correctly count lines, and code that contains \r\n tokens (which is what the ignore_lf checks are for) will also count lines correctly.

initial_state: None, last_start_tag_name: None,};

let vector = vec![StrTendril::from("<a>\r"), StrTendril::from("<b>\r"),
StrTendril::from("</b>\r"), StrTendril::from("</a>\r")];

This comment has been minimized.

@jdm

jdm Dec 9, 2016

Member

We should run this test twice - once using \n separators, and once using \r\n.


let vector = vec![StrTendril::from("<a>\r"), StrTendril::from("<b>\r"),
StrTendril::from("</b>\r"), StrTendril::from("</a>\r")];
let expected = vec![1, 2, 2, 3, 3, 4, 4, 5, 5];

This comment has been minimized.

@jdm

jdm Dec 9, 2016

Member

We'll need to update this to contain pairs of tokens and the corresponding line number.

Copy link
Member

jdm left a comment

I've mostly requested small changes; we'll just need to make sure that the tests are verifying what we want, and that we're processing \n characters as expected.

@@ -29,7 +29,7 @@ struct Sink;
impl TokenSink for Sink {
type Handle = ();

fn process_token(&mut self, token: Token) -> TokenSinkResult<()> {
fn process_token(&mut self, token: Token, line_number: u64) -> TokenSinkResult<()> {

This comment has been minimized.

@jdm

jdm Dec 22, 2016

Member

Let's use _line_number here and elsewhere to avoid warnings about unused variables in code that never makes use of it.

@@ -161,6 +161,9 @@ pub struct RcDom {

/// The document's quirks mode.
pub quirks_mode: QuirksMode,

/// The current line being parsed
pub current_line: u64,

This comment has been minimized.

@jdm

jdm Dec 22, 2016

Member

Let's remove this, since no RcDom code actually uses it.

@@ -326,6 +329,10 @@ impl TreeSink for RcDom {
_ => unreachable!(),
}
}

fn set_current_line(&mut self, line_number: u64) {
self.current_line = line_number;

This comment has been minimized.

@jdm

jdm Dec 22, 2016

Member

This can be an empty method instead (and use _line_number to avoid warnings).


fn push(&mut self, token: Token, line_number: u64) {
self.finish_str();
// self.tokens.push(token);

This comment has been minimized.

@jdm

jdm Dec 22, 2016

Member

Let's remove this.


use super::buffer_queue::{BufferQueue};
use std::mem::replace;
use std::borrow::Cow::{self, Borrowed};

This comment has been minimized.

@jdm

jdm Dec 22, 2016

Member

We don't use Cow anywhere, so this import can be removed.

pub struct LineCountingDOM {
pub line_vec: Vec<(QualName, u64)>,
pub rcdom: RcDom,
}

This comment has been minimized.

@jdm

jdm Dec 22, 2016

Member

Let's store the current line in this structure, rather than RcDom.

}
}

impl Default for LineCountingDOM {

This comment has been minimized.

@jdm

jdm Dec 22, 2016

Member

We don't need this implementation; we can write out the initializer by hand instead of using LineCountingDOM::default().

resultTok.process(StrTendril::from("<a>"));
resultTok.process(StrTendril::from("</a>"));
resultTok.process(StrTendril::from("<b>"));
resultTok.process(StrTendril::from("</b>"));

This comment has been minimized.

@jdm

jdm Dec 22, 2016

Member

We're going to want some newlines in this test, or it's not actually doing much good ;)

(qualname!(html, "head"), 1),
(qualname!(html, "body"), 1),
(qualname!(html, "a"), 1),
(qualname!(html, "b"), 1)];

This comment has been minimized.

@jdm

jdm Dec 22, 2016

Member

Our expected output will want to ensure that at least some elements are associated with different lines.

(qualname!(html, "body"), 1),
(qualname!(html, "a"), 1),
(qualname!(html, "b"), 1)];
let result = actual.line_vec.clone();

This comment has been minimized.

@jdm

jdm Dec 22, 2016

Member

Why is this necessary?

…all in step function to account to make sure current_line is updated.
bors-servo added a commit that referenced this pull request Dec 28, 2016
Track current line when parsing

Rebased from #240.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/html5ever/243)
<!-- Reviewable:end -->
@jdm
Copy link
Member

jdm commented Dec 28, 2016

Everything here looked good to me, so I squashed the commits in #243 and merged them there. Thanks for working on this!

@jdm jdm closed this Dec 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.