A High level HTML fragment sanitizer.

CrazyHarry


CrazyHarry is a high-level HTML fragment sanitiser/cleaner in use at Lonely Planet. It is based on Flavour Jones's Loofah Gem.

Loofah is a great tool and we've been using it in a number of different projects. However, we found that we were repeating the same kinds of cleaning jobs in multiple places.

CrazyHarry wraps up a number of these tasks in a simple DSL, while adding commands for a few edge cases that are not straightforward with Loofah.

Installation

gem 'crazy_harry' 

bundle install 

Usage

object_with_description.each do |obj|

  # Only touch objects that have a description to clean.
  if descriptions[obj.external_id]
    sanitised_fragment = CrazyHarry.fragment(descriptions[obj.external_id])
      .redact!( unsafe: true,     tags: 'img')
      .change!( from: 'b',        to: 'h3' )
      .change!( from: 'strong',   to: 'h3' )

    obj.update_column(:description, sanitised_fragment.to_s)
  end
end

Default Actions

By default, CrazyHarry removes blank tags, converts <br /> tags to wrapped paragraphs and de-dupes content.

Chaining

As in the previous example, all calls except .strip! (which removes all markup) may be chained. .strip! may only appear as the last call in a chain (see below).
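The chaining described above relies on a common Ruby idiom: each command returns the receiver, so the next call has something to act on. The sketch below illustrates that idiom only. MiniCleaner and its methods are hypothetical names invented for this example and say nothing about how CrazyHarry is implemented internally.

```ruby
# Hypothetical toy class; it is NOT CrazyHarry's implementation.
class MiniCleaner
  def initialize(text)
    @text = text.dup
  end

  # Each bang method updates internal state and returns self,
  # which is what lets the next call in the chain run.
  def swap!(from:, to:)
    @text = @text.gsub(from, to)
    self
  end

  def squeeze_spaces!
    @text = @text.squeeze(' ')
    self
  end

  # Terminal call: returns a plain String, so the chain ends here.
  def to_s
    @text
  end
end

result = MiniCleaner.new('Hot  hotels in  Saigon')
  .swap!(from: 'Hot', to: 'Cool')
  .squeeze_spaces!
  .to_s
```

In this toy, any method that returns something other than self has to come last, which is one plausible reason a command would be restricted to the end of a chain.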

Scoping and Targeting by Content

All commands except .strip! accept scope: and text: attributes:

CrazyHarry.fragment( '<div><b>Hotels</b></div><p><b>Hotels</b></p><b>Tents</b>' ).change!( from: 'b', to: 'em', scope: 'p' ).to_s

will produce:

<div><b>Hotels</b></div><p><em>Hotels</em></p><b>Tents</b>

while:

CrazyHarry.fragment( 'Hot <b>hotels</b> in <b>Saigon</b>' ).change!( from: 'b', to: 'em', text: 'hotels' ).to_s

will produce:

Hot <em>hotels</em> in <b>Saigon</b>

Adding Attributes

Use the .translate! command to do this:

CrazyHarry.fragment( '<b>Header</b><p>Content</p>' ).translate!( add_attributes: { class: 'partner'} ).to_s

will return:

<b class="partner">Header</b><p class="partner">Content</p>

If a tag already has the attribute, the new value is appended to the existing one. For example, a b tag that already carries class="bright-red" becomes:

<b class="bright-red partner">Header</b><p class="partner">Content</p>
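The append behaviour can be pictured as a hash merge in which clashing values are joined with a space. This is a minimal sketch of that idea, not CrazyHarry's code; append_attributes is a hypothetical helper name, and it works on plain hashes rather than on parsed HTML.

```ruby
# Hypothetical helper, not part of CrazyHarry: merge new attributes into
# existing ones, appending to any value that is already present.
def append_attributes(existing, additions)
  existing.merge(additions) { |_name, old, added| "#{old} #{added}" }
end

with_class    = append_attributes({ class: 'bright-red' }, { class: 'partner' })
without_class = append_attributes({}, { class: 'partner' })
```

Here with_class holds the combined value 'bright-red partner', while without_class simply gains the new attribute.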

Stripping

Specific Tags

Use the .redact! command. It does not strip unsafe tags by default. To strip them as well, pass the unsafe: true option.

All Tags

Use the .strip! command. It can be used as the last call in a chain (after .translate!( from_text: <some text>, to_text: <other text> ), for instance), but should generally be the only call you make.

Text Translation

The .translate! command can change tag content, preserving case:

CrazyHarry.fragment( '<h3>Lodging</h3> lodging' ).translate!( from_text: 'lodging', to_text: 'hotel' ).to_s

will return:

<h3>Hotel</h3> lodging
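To make "preserving case" concrete, here is a rough sketch of a case-preserving replacement on a plain string. It is an illustration under stated assumptions, not CrazyHarry's implementation: translate_preserving_case is a hypothetical helper, it only distinguishes a capitalised first letter from a lower-case one, and unlike the example above it replaces every occurrence rather than only text inside tags.

```ruby
# Hypothetical helper, not part of CrazyHarry. Replaces every
# case-insensitive match of `from` with `to`, capitalising the
# replacement when the matched text starts with an upper-case letter.
def translate_preserving_case(text, from, to)
  text.gsub(/#{Regexp.escape(from)}/i) do |match|
    match[0].match?(/[[:upper:]]/) ? to.capitalize : to
  end
end

translated = translate_preserving_case('Lodging and lodging', 'lodging', 'hotel')
```

With this input, translated is 'Hotel and hotel'.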

Fostering Orphaned Tags

The .foster! command will wrap orphaned li tags in a ul tag. For now, this only works for li tags.

CrazyHarry.fragment('<li>Flying list tag</li>').foster!.to_s

will return:

<ul><li>Flying list tag</li></ul>

Truncating

The .truncate! command will truncate input to the given number of words, preserving HTML tags.

CrazyHarry.fragment( '<p>Long <b>text goes here</b></p>' ).truncate!(3).to_s

will return:

<p>Long <b>text goes</b>…</p>

It accepts a hash of options, which it passes on to HTML_Truncator. It returns self and can be chained with other commands.
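For readers curious what tag-preserving truncation involves, the sketch below counts words while tracking open tags, then closes whatever is still open once the limit is reached. It is a simplified illustration, not HTML_Truncator's algorithm: truncate_html_words is a hypothetical helper, it assumes well-formed input with no void elements such as a bare <br>, and it places the ellipsis before the closing tags, which differs from the output shown above.

```ruby
# Hypothetical helper, not part of CrazyHarry or HTML_Truncator.
# Keeps at most max_words words and closes any tags left open.
def truncate_html_words(html, max_words, ellipsis: '…')
  out = +''
  open_tags = []
  words = 0

  # Walk the input as alternating tag tokens and text tokens.
  html.scan(/<[^>]+>|[^<]+/) do |token|
    if token.start_with?('</')
      open_tags.pop
      out << token
    elsif token.start_with?('<')
      # Remember the tag name unless the tag is self-closing.
      open_tags.push(token[/\A<([a-zA-Z0-9]+)/, 1]) unless token.end_with?('/>')
      out << token
    else
      # Split text into runs of whitespace and words.
      token.scan(/\s+|\S+/) do |piece|
        if piece.match?(/\S/)
          if words == max_words
            out.rstrip!
            out << ellipsis
            open_tags.reverse_each { |tag| out << "</#{tag}>" }
            return out
          end
          words += 1
        end
        out << piece
      end
    end
  end

  out
end

short = truncate_html_words('<p>Long <b>text goes here</b></p>', 3)
```

Under these assumptions, short is '<p>Long <b>text goes…</b></p>', and input that is already within the limit comes back unchanged.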

Known Issues/TODO

  • De-duping does not take account of whitespace. So, <p>Some Content</p> and <p>Some Content </p> will not be treated as duplicates.
  • Be able to turn off default actions.
  • It should be able to work on documents as well as fragments.
  • Merge .translate! with .change!
  • Foster other orphaned tags besides just li.
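
One possible way to approach the whitespace issue in the first item is to compare blocks by a normalised key instead of by their raw text. The sketch below shows that idea on an array of strings. dedupe_ignoring_whitespace is a hypothetical helper and is not how CrazyHarry currently de-dupes; the key also ignores whitespace next to tags, which is only safe because it is used for comparison and never written back.

```ruby
# Hypothetical helper, not part of CrazyHarry. Keeps the first of any
# blocks that differ only in whitespace.
def dedupe_ignoring_whitespace(blocks)
  blocks.uniq do |block|
    block.gsub(/\s+/, ' ')                  # collapse runs of whitespace
         .gsub(/\s*(<[^>]+>)\s*/, '\1')     # drop whitespace around tags
         .strip
  end
end

unique = dedupe_ignoring_whitespace(
  ['<p>Some Content</p>', '<p>Some Content </p>', '<p>Other</p>']
)
```

Here unique keeps '<p>Some Content</p>' and '<p>Other</p>', in their original form and order.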

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Added some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request