Markdown with HTML::Pipeline & Rails #19

JuanitoFatas opened this issue Jun 30, 2016 · 0 comments
Markdown

Markdown is a lightweight, easy-to-use syntax for styling all forms of writing on modern web platforms. Check out this excellent guide by GitHub to learn everything about Markdown.

HTML::Pipeline intro

HTML::Pipeline is a collection of HTML processing filters and utilities. It includes a small framework for defining DOM-based content filters and applying them to user-provided content. Read an introduction to HTML::Pipeline in this blog post. GitHub uses HTML::Pipeline to implement Markdown.

Implementing Markdown

[ Markdown Content ] -> [ RenderMarkdown ] -> [ HTML ]

Content goes into our pipeline and HTML comes out. As simple as that!

Let's implement RenderMarkdown.

Install HTML::Pipeline & dependency for Markdown

First we'll need to install HTML::Pipeline and associated dependencies for each feature:

# Gemfile
gem "github-markdown"
gem "html-pipeline"

1-min HTML::Pipeline tutorial

require "html/pipeline"
filter = HTML::Pipeline::MarkdownFilter.new("Hi **world**!")
filter.call

Filters can be combined into a pipeline:

pipeline = HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  # more filter ...
]
result = pipeline.call "Hi **world**!"
result[:output].to_s

Each filter hands its output to the next filter as input:

--------------- Pipeline ----------------------
|                                             |
| [Filter 1] -> [Filter 2] ... -> [Filter N]  |
|                                             |
-----------------------------------------------
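
To make the hand-off concrete, here is a minimal plain-Ruby sketch of the same idea. TinyPipeline and the three lambdas are hypothetical names made up for this illustration; this is not how HTML::Pipeline is implemented (its filters are classes and its result is a hash), it only shows one filter's output becoming the next filter's input:

```ruby
# Hypothetical TinyPipeline: a filter is any object that responds to #call.
class TinyPipeline
  def initialize(filters)
    @filters = filters
  end

  # Feed the input to the first filter, then pass each result to the next one.
  def call(input)
    @filters.reduce(input) { |value, filter| filter.call(value) }
  end
end

strip = ->(text) { text.strip }
shout = ->(text) { text.upcase }
wrap  = ->(text) { "<p>#{text}</p>" }

pipeline = TinyPipeline.new([strip, shout, wrap])
result = pipeline.call("  hi world  ")
puts result
```

Order matters: swapping wrap and shout would upcase the tag name too, which is why the position of each filter in the array is part of the design.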

RenderMarkdown

We can then implement the RenderMarkdown class by leveraging HTML::Pipeline:

class RenderMarkdown
  def initialize(content)
    @content = content
  end

  def call
    pipeline = HTML::Pipeline.new [
      HTML::Pipeline::MarkdownFilter
    ]
    pipeline.call(content)[:output].to_s
  end

  private

    attr_reader :content
end

To use it:

RenderMarkdown.new("Hello, **world**!").call
=> "<p>Hello, <strong>world</strong>!</p>"

It works and it is very easy!

Avoid HTML markup

Sometimes users may be tempted to try something like:

<img src='' onerror='alert(1)' />

which is a common trick to create a popup box on the page. We don't want every user to see that popup.

Due to the nature of Markdown, HTML is allowed. You can use HTML::Pipeline's built-in SanitizationFilter to sanitize it.

But the problem with SanitizationFilter is that disallowed tags are discarded. That is fine for the regular "HTML sanitization" use case, where we want to let users enter some HTML. But we never want HTML at all. Any HTML entered should be displayed as-is.

For example, writing:

hello <script>i am sam</script>

should not result in the usual sanitized output (GitHub's behavior):

hello

Instead, it should output the escaped HTML, so the reader sees the markup as typed:

hello <script>i am sam</script>

So here we take a different approach:

We can add a NoHtmlFilter that simply replaces < with &lt;:

class NoHtmlFilter < HTML::Pipeline::TextFilter
  def call
    @text.gsub('<', '&lt;')
    # keep `>` since markdown needs that for blockquotes
  end
end

Put this NoHtmlFilter before our Markdown filter:

class NoHtmlFilter < HTML::Pipeline::TextFilter
  def call
    @text.gsub('<', '&lt;')
  end
end

class RenderMarkdown
  def initialize(content)
    @content = content
  end

  def call
    pipeline = HTML::Pipeline.new [
      NoHtmlFilter,
      HTML::Pipeline::MarkdownFilter,
    ]
    pipeline.call(content)[:output].to_s
  end

  private

    attr_reader :content
end

We keep > since Markdown needs it for blockquotes. Let's try this:

RenderMarkdown.new("<img src='' onerror='alert(1)' />").call
=> "<p>&lt;img src=&#39;&#39; onerror=&#39;alert(1)&#39; /&gt;</p>"

Although < and > got escaped, the text still looks the same from the user's perspective.

But what if we want to talk about some HTML inside a code tag?

> content = <<~CONTENT
  > quoted text

  123`<img src='' onerror='alert(1)' />`45678
CONTENT

> RenderMarkdown.new(content).call
=> "<blockquote>\n<p>quoted text</p>\n</blockquote>\n\n<p>123<code>&amp;lt;img src=&#39;&#39; onerror=&#39;alert(1)&#39; /&gt;</code>45678</p>"
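
The &amp;lt; in that output is a double escape: NoHtmlFilter has already turned < into &lt;, and the renderer then escapes the & again inside the code tag. A minimal plain-Ruby sketch of that interaction follows; escape_html is a hypothetical stand-in for the renderer's code escaping, not an HTML::Pipeline API:

```ruby
# Hypothetical stand-in for the escaping a Markdown renderer applies inside
# code spans; this is an assumption for illustration, not a library call.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", "'" => "&#39;" }.freeze

def escape_html(text)
  text.gsub(/[&<>']/, HTML_ESCAPES)
end

source      = "<img src='' />"
pre_escaped = source.gsub("<", "&lt;") # what NoHtmlFilter does first
in_code     = escape_html(pre_escaped) # what the renderer then does in a code span

puts in_code
```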

The & in the code tag also got escaped, and we don't want that. Let's fix this:

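require "securerandom"

class NohtmlMarkdownFilter < HTML::Pipeline::MarkdownFilter
  def call
    # Pick a random token that does not already occur in the text
    while @text.index(unique = SecureRandom.hex); end
    # Hide every < behind the token so the renderer never sees HTML
    @text.gsub!("<", unique)
    # Render Markdown, then turn the token back into an escaped <
    super.gsub(unique, "&lt;")
  end
end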

class RenderMarkdown
  def initialize(content)
    @content = content
  end

  def call
    pipeline = HTML::Pipeline.new [
      # NohtmlMarkdownFilter already renders Markdown, so it replaces MarkdownFilter
      NohtmlMarkdownFilter,
    ]
    pipeline.call(content)[:output].to_s
  end

  private

    attr_reader :content
end

> RenderMarkdown.new(content).call
=> "<blockquote>\n<p>quoted text</p>\n</blockquote>\n\n<p>123<code>&lt;img src=&#39;&#39; onerror=&#39;alert(1)&#39; /&gt;</code>45678</p>"

This is awesome, but here comes another bug report: autolink does not work anymore.

content = "hey Juanito <juanito@example.com>"

> RenderMarkdown.new(content).call
=> "<p>hey Juanito <a href=\"mailto:&lt;juanito@example.com\">&lt;juanito@example.com</a>&gt;</p>"

The fix is to add a space after our unique string when replacing the <:

class NohtmlMarkdownFilter < HTML::Pipeline::MarkdownFilter
  def call
    while @text.index(unique = "#{SecureRandom.hex} "); end
    @text.gsub!("<", unique)
    super.gsub(unique, "&lt;")
  end
end

class RenderMarkdown
  def initialize(content)
    @content = content
  end

  def call
    pipeline = HTML::Pipeline.new [
      # NohtmlMarkdownFilter already renders Markdown, so it replaces MarkdownFilter
      NohtmlMarkdownFilter,
    ]
    pipeline.call(content)[:output].to_s
  end

  private

    attr_reader :content
end

Now autolink works as usual:

content = "hey Juanito <juanito@example.com>"

> RenderMarkdown.new(content).call
=> "<p>hey Juanito &lt;<a href=\"mailto:juanito@example.com\">juanito@example.com</a>&gt;</p>"
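
Why does one space make the difference? Without it, the token is glued to the address and the autolinker treats it as part of the email. Here is a small plain-Ruby sketch of that effect. toy_autolink is a hypothetical, deliberately naive autolinker written for this illustration (the real renderer's rules are more involved), and the fixed token stands in for SecureRandom.hex:

```ruby
# Hypothetical toy autolinker: links anything that looks like an email address.
def toy_autolink(text)
  text.gsub(/[^\s<>]+@[^\s<>]+/) { |m| %(<a href="mailto:#{m}">#{m}</a>) }
end

token = "deadbeef" # fixed stand-in for SecureRandom.hex

# Token glued to the address: it ends up inside the link.
glued = toy_autolink("hey #{token}juanito@example.com>").gsub(token, "&lt;")

# Token followed by a space: the link only covers the address.
spaced = toy_autolink("hey #{token} juanito@example.com>").gsub("#{token} ", "&lt;")

puts glued
puts spaced
```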

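But other edge cases came up. The final version inserts the token followed by a space and, when restoring, matches the token with an optional trailing whitespace character: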

class NohtmlMarkdownFilter < HTML::Pipeline::MarkdownFilter
  def call
    while @text.index(unique = SecureRandom.hex); end
    @text.gsub!("<", "#{unique} ")
    super.gsub(Regexp.new("#{unique}\\s?"), "&lt;")
  end
end
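
The placeholder trick can be tried without any gems. The sketch below is only an approximation under stated assumptions: FAKE_RENDERER is a hypothetical stand-in for the Markdown renderer (it escapes its input and wraps it in a code tag), and render_without_html is a made-up helper name that mirrors the steps of the final filter above:

```ruby
require "securerandom"

ESCAPE_TABLE = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", "'" => "&#39;" }.freeze

# Hypothetical stand-in for a Markdown renderer handling a code span.
FAKE_RENDERER = lambda do |text|
  "<code>#{text.gsub(/[&<>']/, ESCAPE_TABLE)}</code>"
end

def render_without_html(text, renderer)
  # Pick a random token that does not occur in the input.
  token = SecureRandom.hex
  token = SecureRandom.hex while text.include?(token)

  # Hide every < behind "token + space", render, then restore it as &lt;.
  rendered = renderer.call(text.gsub("<", "#{token} "))
  rendered.gsub(/#{token}\s?/, "&lt;")
end

html = render_without_html("<img src='' />", FAKE_RENDERER)
puts html
```

Because the renderer never sees a < or an &lt;, it has nothing to strip and nothing to double-escape; the only escaping of < happens once, after rendering.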

Sanitization

Even though we now display HTML escaped, we still need sanitization.

Add SanitizationFilter after our Markdown has been translated into HTML:

# Gemfile
gem "sanitize"

# RenderMarkdown
class RenderMarkdown
  ...


  def call
    pipeline = HTML::Pipeline.new [
      NohtmlMarkdownFilter,
      HTML::Pipeline::SanitizationFilter,
    ]

    ...
  end

  ...

end

Now our HTML is safe!

Nice to have

Syntax Highlight with Rouge

No more Pygments dependency: syntax highlighting is done with Rouge.

# Gemfile
gem "html-pipeline-rouge_filter"

# RenderMarkdown
class RenderMarkdown
  ...


  def call
    pipeline = HTML::Pipeline.new [
      NohtmlMarkdownFilter,
      HTML::Pipeline::SanitizationFilter,
      HTML::Pipeline::RougeFilter
    ]

    ...
  end

  ...

end

Twemoji instead of gemoji (more emojis)

While HTML::Pipeline originally came with an EmojiFilter, which uses gemoji under the hood, there is an alternative solution, twemoji.

# Gemfile
gem "twemoji"

# new file
class EmojiFilter < HTML::Pipeline::Filter
  def call
    Twemoji.parse(doc,
      file_ext: context[:file_ext] || "svg",
      class_name: context[:class_name] || "emoji",
      img_attrs:  context[:img_attrs] || {},
    )
  end
end

# RenderMarkdown
class RenderMarkdown
  ...


  def call
    pipeline = HTML::Pipeline.new [
      NohtmlMarkdownFilter,
      HTML::Pipeline::SanitizationFilter,
      EmojiFilter,
      HTML::Pipeline::RougeFilter
    ]

    ...
  end

  ...

end

Wrap Up

We now have a Markdown renderer that:

  • Outputs escaped HTML
  • Highlights syntax with Ruby's Rouge
  • Has better emoji support via Twemoji

See JuanitoFatas/markdown@eb7f434...377125 for full implementation!
