Charrefs should not be interpreted for url attributes like src, href, action. #84
When parsing HTML like this:
The charref is interpreted automatically, but that is not what you want for URL attributes. It makes it impossible to serialize the tree back to HTML correctly, because all ampersands would be escaped on output, so the resulting URL is wrong. (Note that parsing a correct href with a query string also results in a broken URL.)
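To make the round-trip problem concrete, here is a minimal sketch using Python's standard `html` module as a stand-in for the parser. This is not this library's code, and its decoding rules may differ in detail. The URL and the helper name `round_trip` are hypothetical, chosen only because `&copy` happens to look like a charref.

```python
import html


def round_trip(raw_value: str) -> tuple[str, str]:
    """Decode charrefs the way a naive parser would, then re-escape on output.

    Hypothetical helper: it imitates "interpret on parse, escape on
    serialize", which is the behavior described above.
    """
    decoded = html.unescape(raw_value)               # parse step
    reserialized = html.escape(decoded, quote=True)  # serialize step
    return decoded, reserialized


# Hypothetical href with an unescaped query string.
raw_href = "/search?a=1&copy=2"
decoded, reserialized = round_trip(raw_href)

print(decoded)                   # the "&copy" parameter became a copyright sign
print(reserialized == raw_href)  # False: the original URL cannot be recovered
```

Once `&copy` has been decoded, nothing in the tree records that the attribute originally contained an ampersand, so no escaping strategy on output can restore it.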
I think the way to fix this bug is to not interpret charrefs in URL attributes and to read them as is. When the tree is converted back into HTML, they should not be escaped but written as is.
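A rough sketch of that proposal, written in Python for illustration only: the names `URL_ATTRIBUTES`, `read_attribute`, and `write_attribute` are hypothetical and do not correspond to anything in this library, and the attribute list is just the three names from the title.

```python
import html

# Assumption: the attributes treated as URLs are the ones named in this issue.
URL_ATTRIBUTES = frozenset({"src", "href", "action"})


def read_attribute(name: str, raw_value: str) -> str:
    """Parse step: keep URL attributes verbatim, decode charrefs elsewhere."""
    if name.lower() in URL_ATTRIBUTES:
        return raw_value
    return html.unescape(raw_value)


def write_attribute(name: str, value: str) -> str:
    """Serialize step: write URL attributes verbatim, escape everything else."""
    if name.lower() in URL_ATTRIBUTES:
        return f'{name}="{value}"'
    return f'{name}="{html.escape(value, quote=True)}"'


# The hypothetical URL survives a parse/serialize round trip unchanged.
href = read_attribute("href", "/search?a=1&copy=2")
print(write_attribute("href", href))

# Other attributes are still decoded on input and escaped on output.
title = read_attribute("title", "a &amp; b")
print(write_attribute("title", title))
```

One open question with this approach is a URL value that contains a literal double quote (possible if the source used single-quoted attributes). Writing it verbatim would break the markup, so a real implementation would probably still need to escape quotes.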
What do you think? I would gladly provide a pull request with tests if you agree.