A high-level HTML fragment sanitizer.


CrazyHarry is a high-level HTML fragment sanitiser/cleaner in use at Lonely Planet. It is based on Flavour Jones's Loofah Gem.

Loofah is a great tool and we've been using it in a number of different projects. However, we found that we were repeating the same types of cleaning job in multiple places.

CrazyHarry wraps up a number of these tasks in a simple DSL, while adding commands for a few edge cases that are not straightforward with Loofah.


Installation

Add the gem to your Gemfile:

gem 'crazy_harry'

Then run:

bundle install


Usage

object_with_description.each do |obj|
  # Only touch objects that have a replacement description.
  if descriptions[obj.external_id]
    sanitised_fragment = CrazyHarry.fragment(descriptions[obj.external_id])
      .redact!(unsafe: true, tags: 'img')
      .change!(from: 'b', to: 'h3')
      .change!(from: 'strong', to: 'h3')

    obj.update_column(:description, sanitised_fragment.to_s)
  end
end

Default Actions

By default, CrazyHarry removes blank tags, converts <br /> tags to wrapped paragraphs and de-dupes content.


As in the previous example, all calls except .strip! (which removes all markup) may be chained. .strip! can only be the last element in a chain (see All Tags below).
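Chaining of this kind usually relies on each command returning the receiver. The sketch below shows that pattern with a toy class. TinyCleaner and its regex-based change! are hypothetical stand-ins written for illustration, not CrazyHarry's implementation.

```ruby
# Hypothetical stand-in for a sanitiser object; not CrazyHarry's code.
class TinyCleaner
  def initialize(html)
    @html = html.dup
  end

  # Renames a bare tag (no attributes) and returns self so another call can follow.
  def change!(from:, to:)
    @html.gsub!(%r{<(/?)#{Regexp.escape(from)}>}, "<\\1#{to}>")
    self
  end

  # Ends the chain by handing back the cleaned markup.
  def to_s
    @html
  end
end

cleaned = TinyCleaner.new('<b>Hot</b> <strong>deal</strong>')
                     .change!(from: 'b', to: 'h3')
                     .change!(from: 'strong', to: 'h3')
                     .to_s
```

Because change! returns self rather than a String, any number of commands can be stacked before the final to_s.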

Scoping and Targeting by Content

All commands except .strip! accept scope: and text: attributes:

CrazyHarry.fragment( '<div><b>Hotels</b></div><p><b>Hotels</b></p><b>Tents</b>' ).change!( from: 'b', to: 'em', scope: 'p' ).to_s

will produce:

<div><b>Hotels</b></div><p><em>Hotels</em></p><b>Tents</b>

CrazyHarry.fragment( 'Hot <b>hotels</b> in <b>Saigon</b>' ).change!( from: 'b', to: 'em', text: 'hotels' ).to_s

will produce:

Hot <em>hotels</em> in <b>Saigon</b>

Adding Attributes

Use the .translate! command to do this:

CrazyHarry.fragment( '<b>Header</b><p>Content</p>' ).translate!( add_attributes: { class: 'partner'} ).to_s

will return:

<b class="partner">Header</b><p class="partner">Content</p>

If a tag already has the attribute, the new value is appended to the existing one. For instance, a b tag that already carries class="bright-red" comes out as:

<b class="bright-red partner">Header</b><p class="partner">Content</p>
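The append behaviour can be pictured as a merge on a hash of attributes. This is a minimal sketch under that assumption. The append_attribute helper is hypothetical, and skipping a value that is already present is a choice made here, not something the README states.

```ruby
# Hypothetical helper: adds a value to a space-separated attribute,
# keeping whatever the tag already had.
def append_attribute(attrs, name, value)
  values = attrs.fetch(name, "").split
  values << value unless values.include?(value) # assumption: no duplicates
  attrs.merge(name => values.join(" "))
end

with_existing = append_attribute({ "class" => "bright-red" }, "class", "partner")
without_any   = append_attribute({}, "class", "partner")
```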


Specific Tags

To remove specific tags, use the .redact! command. It does not strip unsafe tags by default; pass the unsafe: true option to do so.

All Tags

To remove all tags, use the .strip! command. It can be used as the last call in a chain (after .translate!( from_text: <some text>, to_text: <other text> ), for instance), but should generally be the only call you make.

Text Translation

The .translate! command can change tag content, preserving case:

CrazyHarry.fragment( '<h3>Lodging</h3> lodging' ).translate!( from_text: 'lodging', to_text: 'hotel' ).to_s

will return:

<h3>Hotel</h3> lodging
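Case preservation of this sort can be sketched on a single piece of text. The translate_text helper below is hypothetical and works on a plain string only. It does not reproduce how CrazyHarry decides which text in a fragment to touch.

```ruby
# Hypothetical helper: replaces `from` with `to`, capitalising the
# replacement whenever the matched text started with a capital letter.
def translate_text(text, from:, to:)
  text.gsub(/#{Regexp.escape(from)}/i) do |match|
    if match[0] == match[0].upcase
      to[0].upcase + to[1..-1]
    else
      to
    end
  end
end

heading = translate_text('Lodging', from: 'lodging', to: 'hotel')
body    = translate_text('cheap lodging', from: 'lodging', to: 'hotel')
```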

Fostering Orphaned Tags

The .foster! command will wrap orphaned li tags in a ul tag. This only works for li tags for now.

CrazyHarry.fragment('<li>Flying list tag</li>').foster!.to_s

will return:

<ul><li>Flying list tag</li></ul>


The .truncate! command truncates input to the given number of words, preserving HTML tags.

CrazyHarry.fragment( '<p>Long <b>text goes here</b></p>' ).truncate!(3).to_s

will return:

<p>Long <b>text goes</b>…</p>

It accepts a hash of options, which it passes on to HTML_Truncator. It returns self, so it can be chained with other commands.
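To make the idea of tag-preserving truncation concrete, here is a rough sketch that counts words and closes whatever tags are still open. It is not HTML_Truncator's code. The truncate_words function, the BLOCK_TAGS and VOID_TAGS lists, and the regex tokeniser are all assumptions made for illustration and will not cope with arbitrary HTML.

```ruby
# Tags that may hold the ellipsis directly (an assumption for this sketch).
BLOCK_TAGS = %w[p div li ul ol].freeze
# Tags that never have a closing counterpart.
VOID_TAGS = %w[br hr img].freeze

# Keeps at most max_words words of text and closes any tags left open.
def truncate_words(html, max_words, ellipsis: "…")
  out = +""
  open_tags = []
  remaining = max_words
  truncated = false

  html.scan(/<[^>]+>|[^<]+/).each do |token|
    break if truncated

    if token.start_with?("</")
      open_tags.pop
      out << token
    elsif token.start_with?("<")
      name = token[/\A<\s*(\w+)/, 1].to_s.downcase
      open_tags.push(name) unless VOID_TAGS.include?(name) || token.end_with?("/>")
      out << token
    else
      # Split on whitespace but keep the separators so spacing survives.
      token.split(/(\s+)/).each do |piece|
        next if piece.empty?
        if piece.match?(/\A\s+\z/)
          out << piece
        elsif remaining > 0
          out << piece
          remaining -= 1
        else
          truncated = true
          break
        end
      end
    end
  end

  return out unless truncated

  out.rstrip!
  # Close inline tags first so the ellipsis lands inside the nearest block tag.
  out << "</#{open_tags.pop}>" until open_tags.empty? || BLOCK_TAGS.include?(open_tags.last)
  out << ellipsis
  out << "</#{open_tags.pop}>" until open_tags.empty?
  out
end

short = truncate_words('<p>Long <b>text goes here</b></p>', 3)
```

Tracking open tags on a stack is what keeps the output well-formed after the cut.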

Known Issues/TODO

  • De-duping does not account for whitespace, so <p>Some Content</p> and <p>Some Content </p> will not be treated as duplicates.
  • Be able to turn off default actions.
  • It should be able to work on documents as well as fragments.
  • Merge .translate! with .change!
  • Foster other orphaned tags besides just li.
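One possible approach to the whitespace issue in the first item is to compare fragments by a normalised key rather than by their raw markup. This is only a sketch of that idea. The dedupe_fragments helper is hypothetical and is not part of CrazyHarry.

```ruby
# Hypothetical helper: drops fragments that differ only in whitespace.
# The normalised string is used for comparison only; the first original
# fragment of each group is what gets returned.
def dedupe_fragments(fragments)
  fragments.uniq do |html|
    html.gsub(/\s+/, " ")                  # collapse runs of whitespace
        .gsub(/\s*(<[^>]+>)\s*/, '\1')     # ignore whitespace next to tags
        .strip
  end
end

unique = dedupe_fragments(['<p>Some Content</p>', '<p>Some Content </p>', '<p>Other</p>'])
```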


Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Added some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request