Hopper's fixes for HtmlCleaner (based on amplafi htmlcleaner fork)


* omitHtmlEnvelope behavior change:
 * output all the HTML contained in the body, not just the contents of the first TagNode (useful for cleaning HTML fragments); a new blank TagNode is created to hold the nodes to be output
 * omitHtmlEnvelope also triggers omitDoctype

* TagNodes that can be reopened after their parent is closed (e.g. <b><i></b> would result in <b><i></i></b><i>): if the reopened tag (<i> in this example) is immediately closed, the reopened tag is pruned. This is accomplished by checking the autoGenerated boolean on TagNode.
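
The pruning rule above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the fork's actual code: `SketchNode` and `pruneEmptyAutoGenerated` are hypothetical names, and only the `autoGenerated` flag is taken from the description.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a tag node; not the real HtmlCleaner TagNode. */
final class SketchNode {
    final String name;
    final boolean autoGenerated; // true when the cleaner reopened this tag itself
    final List<Object> children = new ArrayList<>(); // SketchNode or String (text)

    SketchNode(String name, boolean autoGenerated) {
        this.name = name;
        this.autoGenerated = autoGenerated;
    }

    SketchNode add(Object child) {
        children.add(child);
        return this;
    }

    /** Recursively removes auto-generated descendants that ended up empty. */
    void pruneEmptyAutoGenerated() {
        for (Iterator<Object> it = children.iterator(); it.hasNext();) {
            Object child = it.next();
            if (child instanceof SketchNode) {
                SketchNode node = (SketchNode) child;
                node.pruneEmptyAutoGenerated();
                if (node.autoGenerated && node.children.isEmpty()) {
                    it.remove();
                }
            }
        }
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("<" + name + ">");
        for (Object child : children) {
            sb.append(child);
        }
        return sb.append("</").append(name).append(">").toString();
    }
}
```

Tags written by the author (autoGenerated == false) are never pruned in this sketch, even when empty; only tags the cleaner reopened on its own are candidates for removal.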

* refactoring template methods from Utils to TagTransformer.

* CleanerTransformations changes:
 * Utils.updateTagTransformations is now a member function.
 * Handles the transformation work so that multiple TagTransformations can be applied to a given tag (sets up for regular-expression matching).
 * now owns responsibility for determining the transformed tag name.
 * concept of global AttributeTransformations, used for example to strip all attributes that start with "on" (e.g. "onclick", "onblur")
 * plus added regular-expression matching on attribute names and values
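
As an illustration of the global attribute transformation idea, the sketch below drops every attribute whose name matches a regular expression. It is standalone Java and does not use the CleanerTransformations API; `GlobalAttributeFilter` and `apply` are hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: removes attributes whose name matches a regex. */
final class GlobalAttributeFilter {
    private final Pattern namePattern;

    GlobalAttributeFilter(String nameRegex) {
        // Attribute names in HTML are case-insensitive, so match accordingly.
        this.namePattern = Pattern.compile(nameRegex, Pattern.CASE_INSENSITIVE);
    }

    /** Returns a copy of the attributes without the matching names, order preserved. */
    Map<String, String> apply(Map<String, String> attributes) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            if (!namePattern.matcher(entry.getKey()).find()) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

With the pattern `^on`, names such as "onclick" and "ONBLUR" are removed while "href" and "content" are kept, because the match is anchored at the start of the name.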

* XmlSerializer/HtmlCleaner -- no longer throw IOException when reading from strings.

* work on spotting "tricky" encoding -- unencode normal ASCII characters.

 * get default output charset from CleanerProperties

 * handle badly encoded numbers better; for example, &x0fx and &0A; were parsed badly before

 * added a bunch of HTML special entities

 * convert &apos; in an HTML context to &#39;
 * added regex attribute/value matching

 * random spelling corrections
 * additional documentation
* added Greek and math symbols
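
One possible reading of "unencode normal ASCII characters" is sketched below: numeric character references that merely spell out ASCII letters or digits (for example `&#106;` for `j`) are decoded, and everything else is left untouched. This is an assumption about the intent, not the fork's implementation; `AsciiEntityDecoder` and `decodeAsciiAlphanumerics` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decodes numeric entities that hide plain ASCII letters/digits. */
final class AsciiEntityDecoder {
    // Decimal (&#106;) or hexadecimal (&#x6A;) numeric character references.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decodeAsciiAlphanumerics(String text) {
        Matcher m = NUMERIC_ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            boolean plain = (codePoint >= '0' && codePoint <= '9')
                    || (codePoint >= 'A' && codePoint <= 'Z')
                    || (codePoint >= 'a' && codePoint <= 'z');
            // Keep the original entity unless it encodes an ASCII letter or digit.
            String replacement = plain ? String.valueOf((char) codePoint) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Markup-significant characters such as `<` or `:` stay encoded in this sketch, so decoding cannot introduce new tags; only the needlessly encoded letters and digits become readable, which makes strings like `&#106;avascript` easier to spot.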

* cleanup change - if a tag was closed due to an improperly placed child, it will be reopened after the child.
  See ClosedTagReopenTest.java for examples
* added audit code - it is now possible to hook in code that will be notified about the changes that HtmlCleaner makes.
  See CleanerProperties#addHtmlModificationListener.
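
The audit hook follows the usual listener pattern. The sketch below shows that pattern in isolation; it does not reproduce the real listener interface, whose method names and parameters may differ. `ModificationListener`, `AuditingCleaner`, and the script-removal rule are all hypothetical and exist only to make the example runnable.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback; the real listener interface may differ. */
interface ModificationListener {
    void onModification(String tagName, String reason);
}

/** Hypothetical cleaner that reports every change it makes to registered listeners. */
final class AuditingCleaner {
    private final List<ModificationListener> listeners = new ArrayList<>();

    void addListener(ModificationListener listener) {
        listeners.add(listener);
    }

    /** Drops "script" entries from a flat list of tag names and reports each removal. */
    List<String> clean(List<String> tagNames) {
        List<String> kept = new ArrayList<>();
        for (String tag : tagNames) {
            if (tag.equalsIgnoreCase("script")) {
                for (ModificationListener listener : listeners) {
                    listener.onModification(tag, "removed disallowed tag");
                }
            } else {
                kept.add(tag);
            }
        }
        return kept;
    }
}
```

A caller could register a lambda such as `(tag, reason) -> log.add(tag + ": " + reason)` to collect an audit trail without changing the cleaning logic itself.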