ExSponge

A real-time HTML filtering and WYSIWYG / Microsoft Word / Rich Text editor cleanup plugin for ExpressionEngine v2

Info

This plugin cleans up the mess your clients (and other filters) leave behind!

Whether your markup was originally entered via WYSIWYG (Rich Text) editors (such as TinyMCE, CKEditor, FCKEditor, Expresso, Wyvern, Wygwam, Blogger’s online editor, or ExpressionEngine’s own built-in Rich Text Editor), pasted in from Microsoft Word or Adobe InDesign, or bulk-imported from XML via WordPress or Blogger or another CMS, ExSponge leaves it properly formatted and free of layout-breaking cruft.

It can also remove all tags or keep only the tags you want, limit which tag parameters are kept, and even trim the fully filtered, cruft-free content down to a specified number of paragraphs.

This plugin is for developers who want neatly formatted paragraphs with minimal, semantic styling, and who do not want the proprietary tags and unnecessary parameters inserted by word processors (or the “tag soup” unwittingly generated by clients) compromising their layout.

Although undoubtedly less comprehensive than HTML Tidy or HTML Purifier, ExSponge is more efficient, easier to set up, and focused on the specific problems you are likely to encounter when you give clients a WYSIWYG field for editing their channel entries, especially if they compose in Word and paste the content in. In my worst-case scenario (a Microsoft Word document exported to HTML and pasted into an EE Rich Text field), ExSponge reduced the data size by 97% without any loss of content.

ExSponge is not just a real-time (inline) cleaner for text markup. Used in your import routine, it can clean up markup exported from Blogger or WordPress. With a little Ajax, it can also clean markup entered in your SafeCracker forms before it reaches your database: set up a simple template that passes your text through the ExSponge filter, and call that template via Ajax before the form is submitted. (More details on this will be added here soon.)

Some of what is removed by default:

  • Word document garbage (including comments, proprietary styles, useless XML tags, “smart” tags, etc.)
  • Empty tags (including empty paragraphs, unnecessary tag pairs like <strong></strong>, etc.)
  • Purposefully empty paragraphs that WYSIWYG editors are so fond of (<p> </p>, etc.)
  • Out-of-scope sections (head, title, style, form, script, object, applet, xml)
  • Unnecessary or layout-breaking tags (html, head, iframe, object, center, etc.)
  • Unnecessary parameters within tags (unless otherwise specified)
  • Inline styling (unless otherwise specified)
  • JavaScript (including malicious code)
  • Non-printing and control characters
  • Newlines (\n) and carriage returns (\r)
  • Images with no source
  • Extra whitespace
  • Zero-width spaces
  • Empty lines
  • PHP
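To make a few of these removals concrete, here is a minimal Python sketch of regex passes in the same spirit. It is an illustration under stated assumptions, not ExSponge's PHP implementation: the function name sponge_basic and the exact patterns are hypothetical, and it covers only a subset of the list (comments, PHP, out-of-scope sections, zero-width and control characters, source-less images, newlines and extra whitespace, and empty tag pairs).

```python
import re

# Illustrative only: a few regex passes in the spirit of ExSponge's default
# removals. This is a hypothetical Python sketch, not the plugin's PHP code.

# Sections whose whole contents are out of scope for body copy.
OUT_OF_SCOPE = ("head", "title", "style", "form", "script", "object", "applet", "xml")
SECTION_RE = re.compile(
    r"<(%s)\b[^>]*>.*?</\1\s*>" % "|".join(OUT_OF_SCOPE),
    re.IGNORECASE | re.DOTALL,
)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)  # includes Word's conditional comments
PHP_RE = re.compile(r"<\?(?:php)?.*?\?>", re.IGNORECASE | re.DOTALL)  # also catches <?xml ... ?>
ZERO_WIDTH_RE = re.compile("[\u200b\u200c\u200d\ufeff]")
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")  # keeps \t, \n, \r for the next step
# <img> tags with no usable src attribute.
IMG_NO_SRC_RE = re.compile(r"<img\b(?![^>]*\bsrc\s*=\s*[\"']?[^\"'\s>])[^>]*>", re.IGNORECASE)
# Empty pairs such as <strong></strong>, <p> </p> or <p>&nbsp;</p>.
EMPTY_PAIR_RE = re.compile(r"<([a-z][a-z0-9]*)\b[^>]*>(?:\s|&nbsp;)*</\1\s*>", re.IGNORECASE)


def sponge_basic(html: str) -> str:
    """Apply the removal passes in order and return the cleaned markup."""
    for pattern in (COMMENT_RE, PHP_RE, SECTION_RE, ZERO_WIDTH_RE, CONTROL_RE, IMG_NO_SRC_RE):
        html = pattern.sub("", html)
    # Turn \r, \n, tabs and runs of spaces into single spaces.
    html = re.sub(r"\s+", " ", html)
    # Repeat until stable so nested empties like <p><b></b></p> disappear too.
    previous = None
    while previous != html:
        previous = html
        html = EMPTY_PAIR_RE.sub("", html)
    return html.strip()
```

For example, sponge_basic('<p>Hello\r\n<strong></strong>world</p><script>alert(1)</script><p>&nbsp;</p>') yields '<p>Hello world</p>'. The empty-pair pass loops until nothing changes because removing an inner empty tag can leave its parent empty.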

In addition, ExSponge will:

  • Convert oddball characters and entities to the appropriate web-safe ASCII equivalent or entity
  • Convert ampersands to entities where appropriate (including inside URLs)
  • Convert smart quotes (curly quotes) to normal quotes
  • Close unterminated tags and quotes
  • Convert non-breaking spaces (&nbsp;) to normal spaces
  • Normalize all tags to lowercase
  • Reformat table text to be readable (if table tags are to be removed)
  • Give special attention to paragraph formatting, and insert missing paragraph start and end tags
  • Prettify the output (with newlines and tabs)
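Several of these conversions are simple character and pattern substitutions. The Python sketch below shows one plausible way to do four of them (smart quotes, &nbsp;, bare ampersands including those inside URLs, and lowercase tag names). The name convert_chars and the specific character map are assumptions for illustration; ExSponge's own PHP rules may differ in detail.

```python
import re

# Illustrative only: hypothetical Python versions of a few of the conversions
# listed above. ExSponge's own (PHP) rules may differ in detail.

CHAR_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # curly single quotes / apostrophes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u00a0": " ",                 # literal non-breaking space character
})
# An ampersand that does not already begin a named or numeric entity.
BARE_AMP_RE = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")
# Opening or closing tag names (attributes are not touched).
TAG_NAME_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")


def convert_chars(html: str) -> str:
    html = html.translate(CHAR_MAP)
    html = re.sub(r"&nbsp;", " ", html, flags=re.IGNORECASE)  # &nbsp; -> normal space
    html = BARE_AMP_RE.sub("&amp;", html)                      # also fixes & inside URLs
    return TAG_NAME_RE.sub(lambda m: "<" + m.group(1) + m.group(2).lower(), html)
```

Note the ordering: &nbsp; is replaced before ampersands are escaped, and existing entities such as &amp; or &#8217; are left alone by the negative lookahead.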

The final output will be compact, tidy, and ready to use in your layout.

Demo

A live demonstration of ExSponge is available here:

http://fcgrx.com/sponge

Installation

Place the `ex_sponge` folder in your `system/expressionengine/third_party` folder.

Parameters

All parameters are optional:

allow_tags (default: “safe”)
Controls which HTML tags are kept. “no” removes all HTML tags and leaves only raw, unformatted text. “safe” strips most tags but keeps the most useful and safe ones; it is equivalent to “<p><br><b><a><i><em><strong><del><ins><u><ul><ol><li><img><h1><h2><h3><h4><h5><h6><blockquote><q><sup><sub><dl><dt><dd><cite><table><tr><td><th><thead><tbody><tfoot>”. “minimal” keeps only a smaller core set; it is equivalent to “<p><br><b><a><i><em><strong><del><ins><u><ul><ol><li><img><h1><h2><h3><h4><h5><h6><blockquote><q><sup><sub>”. Any other value is read as the list of tags to keep, and all other tags are stripped. Tip: if you set this parameter to “<p>”, text will be reduced to paragraphs only. Note that out-of-scope tags (html, head, link, header, footer, etc.) will be removed regardless.

allow_breaks (default: “no”)
“yes” leaves <br> tags as-is; “single” converts only double breaks (<br><br>) to paragraphs and leaves single breaks alone; “no” consolidates all breaks into paragraphs.

allow_parameters (default: “no”)
“yes” leaves tag parameters in place; “no” strips all but the most necessary (equivalent to “href|src|height|width|alt|title|name|cite|colspan”). Alternatively, list the parameters to keep in the same format, and all others will be stripped.

convert_tags (default: “yes”)
“yes” converts the presentational tags <i>, <b>, <s>, and <strike> to the semantic <em>, <strong>, <del>, and <ins>; “no” leaves them as-is.

paragraphs (default: “-1”)
Clips the text after the specified number of paragraphs. Any positive number (“1”, “4”, “9999”) trims the text; “-1” disables clipping.
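The whitelist parameters work together: a tag survives only if its name is allowed, and a surviving tag keeps only the allowed parameters. The Python sketch below illustrates that idea using the “safe” tag list and the “no” parameter list quoted in the descriptions above. It is a hypothetical illustration, not ExSponge's PHP code: filter_tags is an invented name, it handles only the explicit-list forms (not the “no” and “yes” keywords), and for convert_tags it renames only <b> and <i>.

```python
import re

# Hypothetical sketch of allow_tags / allow_parameters / convert_tags filtering,
# using the "safe" and "no" lists quoted in the parameter descriptions above.
# Not ExSponge's actual PHP code.

SAFE_TAGS = (
    "<p><br><b><a><i><em><strong><del><ins><u><ul><ol><li><img>"
    "<h1><h2><h3><h4><h5><h6><blockquote><q><sup><sub><dl><dt><dd><cite>"
    "<table><tr><td><th><thead><tbody><tfoot>"
)
NECESSARY_PARAMS = "href|src|height|width|alt|title|name|cite|colspan"

TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")
ATTR_RE = re.compile(r"""([A-Za-z_:][-\w:.]*)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""")


def filter_tags(html, allow_tags=SAFE_TAGS, allow_parameters=NECESSARY_PARAMS, convert_tags=True):
    allowed_tags = {t.lower() for t in re.findall(r"<([A-Za-z][A-Za-z0-9]*)>", allow_tags)}
    allowed_params = {p.lower() for p in allow_parameters.split("|") if p}
    renames = {"b": "strong", "i": "em"} if convert_tags else {}

    def rewrite(match):
        closing, name, rest = match.group(1), match.group(2).lower(), match.group(3)
        name = renames.get(name, name)
        if name not in allowed_tags:
            return ""  # drop the tag but keep the text it wraps
        if closing:
            return "</" + name + ">"
        kept = [key.lower() + "=" + value for key, value in ATTR_RE.findall(rest)
                if key.lower() in allowed_params]
        return "<" + " ".join([name] + kept) + ">"

    return TAG_RE.sub(rewrite, html)
```

Renaming happens before the whitelist check, so with convert_tags enabled a <b> tag is kept (as <strong>) whenever <strong> is allowed. Setting allow_tags="<p>" reduces the text to bare paragraphs, matching the tip above.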

NOTE: The allow_styles parameter was removed as of v0.9; it became redundant with the addition of the more flexible allow_parameters parameter.
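The three allow_breaks modes can be pictured as progressively more aggressive break-to-paragraph conversion. This Python sketch assumes the text is already wrapped in <p>...</p>; apply_allow_breaks is a hypothetical name, and the plugin's actual handling of edge cases may differ.

```python
import re

# Hypothetical sketch of the three allow_breaks modes; assumes the text is
# already wrapped in <p>...</p>. Not the plugin's PHP implementation.

DOUBLE_BR_RE = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)
SINGLE_BR_RE = re.compile(r"<br\s*/?>\s*", re.IGNORECASE)
EMPTY_P_RE = re.compile(r"<p>\s*</p>")


def apply_allow_breaks(html: str, allow_breaks: str = "no") -> str:
    if allow_breaks == "yes":
        return html  # leave every <br> as-is
    # "single" and "no": a run of two or more breaks becomes a paragraph boundary.
    html = DOUBLE_BR_RE.sub("</p><p>", html)
    if allow_breaks == "no":
        # "no": the remaining single breaks become paragraph boundaries too.
        html = SINGLE_BR_RE.sub("</p><p>", html)
    # Drop any empty paragraphs the splitting produced.
    return EMPTY_P_RE.sub("", html)
```

With "<p>One<br>Two<br /><br>Three</p>", the “single” mode gives "<p>One<br>Two</p><p>Three</p>", while “no” gives "<p>One</p><p>Two</p><p>Three</p>".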

Usage

To use this plugin, wrap the text you want processed in this tag pair:

{exp:ex_sponge}
     ( your mess goes here )
{/exp:ex_sponge}

In my templates, I typically wrap the above tag (with no parameters) around the output of any Rich Text or WYSIWYG field the client is allowed to edit.

A more complex example, which reduces the markup down to the basics, keeps only the first four paragraphs, and takes advantage of EE’s built-in tag caching:

{exp:ex_sponge allow_tags="<p><strong><em><ul><li>"
 allow_parameters="href|src|alt|title" paragraphs="4" cache="yes" refresh="1440"}
     ( your mess goes here )
{/exp:ex_sponge}
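The paragraphs="4" setting in the example above keeps only the first four paragraphs of the cleaned output. A minimal Python sketch of that kind of clipping, assuming the markup has already been filtered into <p>...</p> blocks; clip_paragraphs is a hypothetical name, and treating 0 like -1 is an assumption, since the parameter description only covers positive numbers and -1.

```python
import re

# Hypothetical sketch of paragraph clipping; not ExSponge's PHP code.
PARA_RE = re.compile(r"<p\b[^>]*>.*?</p\s*>", re.IGNORECASE | re.DOTALL)


def clip_paragraphs(html: str, paragraphs: int = -1) -> str:
    """Keep everything up to the end of the Nth <p>...</p> block."""
    if paragraphs < 1:
        return html  # -1 (the default) means "do not clip"; 0 is treated the same here
    matches = list(PARA_RE.finditer(html))
    if len(matches) <= paragraphs:
        return html  # fewer paragraphs than the limit: nothing to trim
    return html[: matches[paragraphs - 1].end()]
```

Clipping after filtering (rather than before) means stray empty paragraphs from the original markup do not count toward the limit.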

License

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License

http://creativecommons.org/licenses/by-nc-sa/3.0/

Contact

Support / Feature Requests

This project is an active part of all my ExpressionEngine installations, and I’d like to keep it as fast, full-featured and bulletproof as possible.

Have a bug? Feature request? Please create an issue on GitHub at https://github.com/fcgrx/ex_sponge/issues

Changelog

  • v0.9.1 – Speed improvements. allow_tags can now override the default purging of out-of-scope tags. Added support for tables. Many small refinements.
  • v0.9.0 – Added “minimal” option to allow_tags parameter. Further refinements to the allow_parameters parameter. Many optimizations to filters. Additional filtering for malformed HTML. Support for tables. Removed allow_styles parameter (made redundant by allow_parameters).
  • v0.8.9 – Added “single” option to allow_breaks.
  • v0.8.8 – Added height and width to the attribute whitelist.
  • v0.8.7 – Refined allow_parameters.
  • v0.8.6 – Added allow_parameters argument, which allows control over how tag parameters are filtered. Slight rearrangement of filter order. Removed redundant filters and made minor refinements to others.
  • v0.8.5 – Changed the allow_tags argument to default to new “safe” value, which includes only critical tags. Expanded the MS Word filters. Rearranged filter order for better interaction. Removed redundant filters. Fixed an issue with lost spaces. Made some searches case-insensitive. Made output HTML a little prettier.
  • v0.8.4 – Minor additions to MS Word filters.
  • v0.8.3 – Initial release.