When 2 tags with only a space between them, space is removed #14

Closed
leighhalliday opened this issue Jun 23, 2015 · 1 comment

Comments

@leighhalliday

Example HTML:
<p>I want <a href="http://www.google.com">link to</a> <strong>article</strong></p>

When run through Floki:
[{"p", [],
  ["I want ", {"a", [{"href", "http://www.google.com"}], ["link to"]},
   {"strong", [], ["article"]}]}]

Maybe this is the intended result, which is perfectly fine. I was hoping to use Floki for an unusual use case: run HTML through Floki, walk the parsed tree, remove tags and attributes that aren't allowed, and then re-create the HTML. Essentially, I'd be using Floki to sanitize user-submitted HTML by stripping disallowed tags and malicious JavaScript attributes (XSS attacks).

Because of how I want to use it, the space between </a> and <strong> is actually important; without it, "link to" and "article" end up touching each other when the HTML is re-created.
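
To make the use case concrete, here is a minimal sketch of the sanitize-and-re-render step, working directly on the tuple tree shown above. It does not call Floki at all. The module name `SanitizeSketch`, the allowlists, and the choice to drop a disallowed element together with its children are all assumptions for illustration, not something the issue or Floki specifies. It also does not validate `href` values (for example `javascript:` URLs), which a real sanitizer would have to do.

```elixir
defmodule SanitizeSketch do
  # Hypothetical allowlists; the issue does not say which tags or attributes are permitted.
  @allowed_tags ~w(p a strong)
  @allowed_attrs ~w(href)

  # Sanitize a list of nodes (the shape shown in the parsed output above).
  def sanitize_tree(nodes), do: Enum.flat_map(nodes, &sanitize/1)

  # Text nodes are kept as-is, including whitespace-only ones.
  defp sanitize(text) when is_binary(text), do: [text]

  # Allowed element: filter its attributes and recurse into its children.
  defp sanitize({tag, attrs, children}) when tag in @allowed_tags do
    kept = Enum.filter(attrs, fn {name, _value} -> name in @allowed_attrs end)
    [{tag, kept, sanitize_tree(children)}]
  end

  # Anything else (disallowed elements, comments, ...) is dropped entirely.
  # This is an assumption; unwrapping and keeping the children is another option.
  defp sanitize(_other), do: []

  # Re-create HTML from the tree. Text and attribute values are escaped,
  # assuming the parser handed us already-decoded text.
  def render(nodes) when is_list(nodes), do: Enum.map_join(nodes, &render/1)
  def render(text) when is_binary(text), do: escape(text)

  def render({tag, attrs, children}) do
    attr_str = Enum.map_join(attrs, fn {k, v} -> ~s( #{k}="#{escape(v)}") end)
    "<#{tag}#{attr_str}>#{render(children)}</#{tag}>"
  end

  defp escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end
end

link = {"a", [{"href", "http://www.google.com"}], ["link to"]}
strong = {"strong", [], ["article"]}

# The tree as reported in this issue: no text node between the two elements.
parsed = [{"p", [], ["I want ", link, strong]}]

# The tree I was expecting: a " " text node between </a> and <strong>.
expected = [{"p", [], ["I want ", link, " ", strong]}]

IO.puts(SanitizeSketch.render(SanitizeSketch.sanitize_tree(parsed)))
# <p>I want <a href="http://www.google.com">link to</a><strong>article</strong></p>

IO.puts(SanitizeSketch.render(SanitizeSketch.sanitize_tree(expected)))
# <p>I want <a href="http://www.google.com">link to</a> <strong>article</strong></p>
```

The sanitizing step itself never touches text nodes, so the space is only lost because it is missing from the parsed tree in the first place.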

Feel free to close the issue if it was intended.

@philss
Owner

philss commented Jun 23, 2015

Hi @leighhalliday, great to know that you are using Floki! :)

This is an important issue, but it's not easy to solve. Floki uses the mochiweb library to parse HTML, so the problem is in that library. It's a great parser, but it has some issues like this one. Unfortunately, I don't have much control over the parser.

I'm planning to write and release a completely new HTML parser in Elixir at some point this year. But for now I can't help you :/.

I'm closing the issue for now, but I will revisit it in the future when I have that parser.
Thanks!

@philss philss closed this as completed Jun 23, 2015
@philss philss mentioned this issue Oct 4, 2015