Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support html5 parsing #158

Merged
merged 13 commits into from
May 12, 2023
Merged

support html5 parsing #158

merged 13 commits into from
May 12, 2023

Conversation

flavorjones
Copy link
Member

@flavorjones flavorjones commented May 12, 2023

This PR introduces new variations of LinkSanitizer, FullSanitizer, and SafeListSanitizer that use Loofah's (Nokogiri's) HTML5 parsing functionality.

  • Rails::HTML5::FullSanitizer
  • Rails::HTML5::LinkSanitizer
  • Rails::HTML5::SafeListSanitizer

It also introduces a Rails::HTML4 module which is the new canonical home for the existing sanitizers (which implicity always used the HTML4 parser). Finally, the Rails::Html module has been renamed to Rails::HTML.

The following aliases are maintained for backwards compatibility:

  • Rails::Html points to Rails::HTML
  • Rails::HTML::FullSanitizer points to Rails::HTML4::FullSanitizer
  • Rails::HTML::LinkSanitizer points to Rails::HTML4::LinkSanitizer
  • Rails::HTML::SafeListSanitizer points to Rails::HTML4::SafeListSanitizer

Miscellaneous other changes:

  • documentation cleanup
  • added an .rdoc_options file and have added or removed :nodoc: as appropriate
  • added a rubocop job to CI
  • updated tests to use assert_equal instead of assert_dom_equal to avoid obfuscation
  • removed the dev dependency on rails-dom-testing
  • added test coverage for the API used by Rails

Notably this PR does not include integration with Rails. It offers up new Sanitizer classes that provide HTML5 parsing suitable for use by Rails, but making Rails choose the sanitizer flavor will be in a subsequent PR.

Usage was removed in #156
These are the methods used by Rails. Also move these methods into the
Sanitizer class definition.
because we're preparing to extract fragment parsing
@flavorjones flavorjones force-pushed the flavorjones-support-html5-parsing branch from 2774a3d to d8e3251 Compare May 12, 2023 16:47
lib/rails/html/sanitizer.rb Show resolved Hide resolved
- parse_fragment
- scrub
- serialize

These are composed in each sanitizer's `#sanitize` method.

We're preparing to make html4 and html5 variations and I'd like to use
mixins for code reuse.
The three concerns are:

- fragment parsing
- scrubbing
- serialization

These are combined in a fourth concern which implements a `#sanitize`
method that composes the other concerns like:

  serialize(scrub(parse_fragment(html)))

This should enable us to easily add HTML5 fragment parsing in a
subsequent commit.
but Rails::Html is an alias for backwards compatibility
and test that the sanitizer class names are HTML4 variations
create a new test class for each sanitizer
which use Loofah.html5_fragment.

Note that we repeat the sanitizer tests for both variations using a
module that's mixed into two test classes.
This feels pretty good, not gonna lie. The majority of tests that
needed to change were the ones related to the CDATA node issues:

https://github.com/flavorjones/loofah/blob/main/docs/2022-10-decision-on-cdata-nodes.md

and I'm happy to see everything working as expected.
@flavorjones flavorjones force-pushed the flavorjones-support-html5-parsing branch from d8e3251 to 53e9aa8 Compare May 12, 2023 19:53
@flavorjones flavorjones merged commit 4122362 into main May 12, 2023
@flavorjones flavorjones deleted the flavorjones-support-html5-parsing branch May 12, 2023 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants