proposal: golang.org/x/net/html: Add Tokenizer as Option to html.ParseWithOptions() #68871

Open
typomania opened this issue Aug 14, 2024 · 2 comments

@typomania

Proposal Details

Issue

While parsing HTML data, I ran into a problem. When the input contains something like
<![CDATA[ .... ]]>
the default html.Parse() function parses it as an HTML comment. Calling html.Render() on the result yields:
<!--[CDATA[ .... ]]-->
which is a comment that isn't useful.
String manipulation can work around this, but html.Tokenizer already has the method:
func (z *Tokenizer) AllowCDATA(allowCDATA bool)
which makes the tokenizer recognize a CDATA section as text rather than as a bogus comment.

Proposed Solution

I propose letting the user set the html.Tokenizer used by the html.Parse()/html.ParseWithOptions() functions. Under the hood, ParseWithOptions() creates a new html.Tokenizer from the io.Reader passed to it. If the user could instead supply the Tokenizer the parser uses, they could configure it (for example with AllowCDATA) as needed, avoiding the problem above.

This could be done by adding a new html.ParseOption, namely:

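// ParseOptionWithTokenizer returns a ParseOption that makes the parser use
// the given Tokenizer instead of the one it creates from the io.Reader.
func ParseOptionWithTokenizer(tokenizer *Tokenizer) ParseOption {
	return func(p *parser) {
		p.tokenizer = tokenizer
	}
}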

It would be called like this:

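tokenizer := html.NewTokenizer(myReader)
tokenizer.AllowCDATA(true)

// The parser would use the supplied tokenizer rather than building its own.
doc, err := html.ParseWithOptions(myReader, html.ParseOptionWithTokenizer(tokenizer))
if err != nil {
	log.Fatal(err)
}
html.Render(os.Stdout, doc)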

The function name can be changed to whatever makes more sense.

@gopherbot gopherbot added this to the Proposal milestone Aug 14, 2024
@gabyhelp

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@ianlancetaylor
Contributor

CC @neild @bradfitz
