Skip to content

html: Duplicate input provided by Tokenizer.Raw when tokenization errors occur #7029

@CSEMike

Description

@CSEMike
Tokenizer.Raw is intended to provide the unmodified text of the current token --
http://godoc.org/code.google.com/p/go.net/html#Tokenizer.Raw

But, when a tokenization error occurs, the raw text may be duplicated. For example:

    z := NewTokenizer(strings.NewReader("foo  bar"))
    tt := z.Next()
    fmt.Printf("%v: %q\n", tt, string(z.Raw()))
    tt = z.Next()
    fmt.Printf("%v: %q\n", tt, string(z.Raw()))

duplicates "foo  bar" in both a text and error token:

    Text: "foo  bar"
    Error: "foo  bar"

The concatenated results of z.Raw() should reproduce the original input without
duplication.

The fix is adjust the way Next updates the raw and data spans. I'll send a change for
this shortly.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions