Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The sanitization method changes the tag structure if there is a <table> tag inside an <a> tag. #155

Closed
naitoh opened this issue May 9, 2023 · 3 comments

Comments

@naitoh
Copy link

naitoh commented May 9, 2023

Description

In the sanitize method, if there is <table> tag inside <a> tag, the result will be different than expected.

Steps to Reproduce

$ ruby -v
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
$ gem list rails-html-sanitizer loofah nokogiri crass

*** LOCAL GEMS ***

rails-html-sanitizer (1.5.0)
loofah (2.20.0)
nokogiri (1.14.2 arm64-darwin
crass (1.0.6)

No problem case

> Rails::Html::SafeListSanitizer.new.sanitize('<a href="https://example.com"><li>test</li></a>', tags: %w(a li), attributes: %w(href))
=> "<a href=\"https://example.com\"><li>test</li></a>"
> Rails::Html::SafeListSanitizer.new.sanitize('<a href="https://example.com"><dummy>test</dummy></a>', tags: %w(a dummy), attributes: %w(href))
=> "<a href=\"https://example.com\"><dummy>test</dummy></a>"

Problem case

> Rails::Html::SafeListSanitizer.new.sanitize('<a href="https://example.com"><table>test</table></a>', tags: %w(a table), attributes: %w(href))
=> "<a href=\"https://example.com\"></a><table>test</table>"

I would expect <a href=\"https://example.com\"><table>test</table></a> response.

But it may be a problem with the behavior of libxml2 (Nokogiri's HTML4 parser)....

> puts Nokogiri::HTML4('<a href="https://example.com"><table>test</table></a>').to_html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<a href="https://example.com"></a><table>test</table>
</body></html>

🙅

> puts Nokogiri::HTML5('<a href="https://example.com"><table>test</table></a>').to_html
<html><head></head><body><a href="https://example.com">test<table></table></a></body></html>

👌

@flavorjones
Copy link
Member

Hi, thanks for asking this question. As you diagnosed, you're seeing the behavior of the HTML4 parser used by Nokogiri (libxml2).

Nokogiri::HTML4::DocumentFragment.parse('<a href="https://example.com"><table>test</table></a>').to_html
# => "<a href=\"https://example.com\"></a><table>test</table>"
Nokogiri::HTML5::DocumentFragment.parse('<a href="https://example.com"><table>test</table></a>').to_html
# => "<a href=\"https://example.com\">test<table></table></a>"

Nokogiri just wraps the parser, and so there's nothing we can easily do to change this behavior.

Upgrading the full stack of rails-html-sanitizer, Loofah, and Nokogiri to support HTML5 has been a long road. Loofah was just this week released with HTML5 support, and now I'm working on updating rails-html-sanitizer. This is what I've been working on for the last few days.

I just closed (this morning) the previous PR #133 which was an exploration of behavioral differences, because I'm very close to shipping a new PR with the necessary API and code changes. Hang tight!

@flavorjones flavorjones closed this as not planned Won't fix, can't repro, duplicate, stale May 11, 2023
@flavorjones
Copy link
Member

flavorjones commented May 12, 2023

See #158 for the latest on HTML5 support. On that branch:

Rails::Html::SafeListSanitizer.new.sanitize('<a href="https://example.com"><table>test</table></a>', tags: %w(a tab
le), attributes: %w(href))
# => "<a href=\"https://example.com\"></a><table>test</table>"
Rails::HTML5::SafeListSanitizer.new.sanitize('<a href="https://example.com"><table>test</table></a>', tags: %w(a ta
ble), attributes: %w(href))
# => "<a href=\"https://example.com\">test<table></table></a>"

@naitoh
Copy link
Author

naitoh commented May 14, 2023

Thank you for your comment.
I will wait for the release of a version where Rails::HTML5::SafeListSanitizer can be used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants