Remove erb_parser and add deface for transforming HTML+ERB #60

Merged 5 commits on Jan 22, 2023

Conversation

@marcoroth (Owner) commented Jan 22, 2023

This pull request removes the erb_parser dependency and adds deface for turning HTML+ERB into HTML with <erb> tags.

Here's a summary of what changed:

  • erb_parser was used in Phlexing::ErbTransformer to transform input such as <div><%= method %></div> into <div><erb loud> method </erb></div>. This was replaced with Deface::Parser.

  • erb_parser was also used in Phlexing::RubyAnalyzer to extract the Ruby code from ERB templates. This was replaced with our Phlexing::Parser class.

  • most html variables were renamed to source

  • a bunch of new tests, most notably for Phlexing::Minifier, Phlexing::Parser and Phlexing::ErbTransformer

The most important reason for switching from erb_parser to deface is how deface transforms the HTML+ERB source into HTML.

Example

Input:

<div class="<%= something? "class-1" : "class-2" %>"><%= some_method %></div>

erb_parser transform output:

<div class=\"<erb interpolated=\"true\"> something? &quot;class-1&quot; : &quot;class-2&quot; </erb>\"><erb interpolated=\"true\"> some_method </erb></div>

Parsed with Nokogiri, this ends up as the following broken document, because <erb> tags (or any tags, for that matter) aren't allowed inside attribute values. The class value comes back empty and the child <erb> element is lost:

#(Document:0x5f104 { 
  name = "document", 
  children = [ 
    #(Element:0x5f230 { name = "div", attributes = [ #(Attr:0x5f35c { name = "class", value = "" })] })
  ] 
})

deface transform output:
Deface handles this by prefixing the attribute name with data-erb- and escaping the value. With that we are able to detect which HTML attributes contain ERB, so we can process the value of those attributes manually:

<div data-erb-class='&lt;%= something? \"class-1\" : \"class-2\" %&gt;'><erb loud> some_method </erb></div>
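To make the two rewrites concrete, here is a minimal sketch in plain Ruby of the kind of transformation described above. It is not Deface's implementation: `erb_to_html`, `ERB_ATTRIBUTE` and `ERB_TAG` are hypothetical names, the regexes only handle double-quoted attributes, and marking every non-`<%=` tag as `silent` is an assumption (only `loud` appears in this PR).

```ruby
# Hypothetical sketch, NOT Deface's actual code.

# Any ERB tag; group 1 is "=" for output tags, group 2 is the Ruby code.
ERB_TAG = /<%(=?)(.*?)%>/m

# A double-quoted attribute whose value is made of plain characters
# and/or whole ERB tags (so quotes inside an ERB tag do not end the value).
ERB_ATTRIBUTE = /([\w-]+)="((?:[^"<]|<%.*?%>)*)"/m

def erb_to_html(source)
  # Step 1: attributes that contain ERB get a data-erb- prefix and an
  # escaped value, so an HTML parser keeps the ERB source intact.
  escaped = source.gsub(ERB_ATTRIBUTE) do
    match = Regexp.last_match
    name = match[1]
    value = match[2]

    if value.include?("<%")
      "data-erb-#{name}='#{value.gsub('<', '&lt;').gsub('>', '&gt;')}'"
    else
      match[0] # plain attribute, leave untouched
    end
  end

  # Step 2: every ERB tag that is still present sits outside an attribute
  # and becomes an <erb> element. "silent" is an assumed marker name.
  escaped.gsub(ERB_TAG) do
    match = Regexp.last_match
    kind = match[1] == "=" ? "loud" : "silent"
    "<erb #{kind}>#{match[2]}</erb>"
  end
end

puts erb_to_html('<div class="<%= something? "class-1" : "class-2" %>"><%= some_method %></div>')
```

For the input from the example, this prints the same markup as the deface output shown above. Values containing single quotes, unquoted attributes and ERB comment tags are deliberately out of scope for the sketch.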

Parsing this with Nokogiri gives us what we are looking for. The ERB source survives in the attribute value and the <erb> element is kept as a child:

#(Document:0x1eb504 {
  name = "document",
  children = [
    #(Element:0x1eb658 {
      name = "div",
      attributes = [ #(Attr:0x1eb7ac { name = "data-erb-class", value = "<%= something? \"class-1\" : \"class-2\" %>" })],
      children = [ #(Element:0x1eba2c { name = "erb", children = [ #(Text " some_method ")] })]
    })
  ]
})
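With the attribute preserved like this, picking out the Ruby expression is straightforward. A rough sketch, assuming attribute names and values arrive as plain strings (as in the Nokogiri dump above). `interpolated_attribute` is a hypothetical helper, not part of Phlexing, and it only handles values that consist of exactly one `<%= ... %>` tag:

```ruby
# Hypothetical sketch of reading a data-erb-* attribute back.
ERB_PREFIX = "data-erb-"

# The whole value must be one output tag; group 1 is the Ruby code.
LOUD_ERB = /\A<%=(.*)%>\z/m

# Returns [original_attribute_name, ruby_code] for an ERB attribute,
# or nil for plain attributes and for values this sketch cannot handle
# (for example static text mixed with ERB).
def interpolated_attribute(name, value)
  return nil unless name.start_with?(ERB_PREFIX)

  match = LOUD_ERB.match(value.strip)
  return nil unless match

  [name.delete_prefix(ERB_PREFIX), match[1].strip]
end

p interpolated_attribute("data-erb-class", '<%= something? "class-1" : "class-2" %>')
p interpolated_attribute("class", "box")
```

The first call returns the attribute name `class` together with the raw ternary expression; the second returns `nil` because the attribute carries no prefix.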

TL;DR:

This allows us to solve issues like #48.

@marcoroth added the enhancement (New feature or request) label Jan 22, 2023

if source =~ html_tag
  Nokogiri::HTML::Document.parse(source)
elsif initial =~ head_tag && source =~ body_tag

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data

This regular expression that depends on a user-provided value may run slow on strings starting with '<head' and with many repetitions of '<head'.

  Nokogiri::HTML::Document.parse(source)
elsif initial =~ head_tag && source =~ body_tag
  Nokogiri::HTML::Document.parse(source).css("html").first
elsif initial =~ head_tag

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data

This regular expression that depends on a user-provided value may run slow on strings starting with '<head' and with many repetitions of '<head'.
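One possible way to avoid this class of alert is to keep each tag check bounded. The sketch below is an illustration only: the real `html_tag`, `head_tag` and `body_tag` patterns and the difference between `initial` and `source` are not shown in this diff, so `opens_tag?` and `detect_structure` are hypothetical names working on a single string, and the `:fragment` fallback is an assumption.

```ruby
# Hypothetical helper: true when `source` contains an opening "<tag"
# followed by whitespace, ">" or "/". Each match attempt inspects a fixed
# number of characters, so the scan stays linear in the input length.
def opens_tag?(source, tag)
  source.match?(/<#{Regexp.escape(tag)}(?=[\s>\/])/i)
end

# Mirrors the shape of the if/elsif chain above, returning a symbol
# instead of parsing with Nokogiri.
def detect_structure(source)
  if opens_tag?(source, "html")
    :html
  elsif opens_tag?(source, "head") && opens_tag?(source, "body")
    :head_and_body
  elsif opens_tag?(source, "head")
    :head
  else
    :fragment # assumption: the fallback branch is not shown in the diff
  end
end

p detect_structure("<head><title>x</title></head><body><p>y</p></body>")
p detect_structure("<head" * 50_000)
```

The second call is the kind of input the alert describes (many repetitions of '<head'); here it simply falls through to the fallback because no repetition is followed by whitespace, ">" or "/".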
@marcoroth merged commit 985cb6c into main on Jan 22, 2023
@marcoroth deleted the swap-html-erb-transformer branch on January 22, 2023 01:59