Lexentity: A context-aware, medium-neutral entity maker

by Sean Coates

Let's face it--this sentence is much "uglier" than the one below it.
Let’s face it–this sentence is much “prettier” than the one above it.

Lexentity is a simple piece of software that takes HTML as input and outputs a context-aware, medium-neutral representation of that HTML, with apostrophes, quotes, em dashes, ellipses, accents, etc., replaced with their respective numeric XML/Unicode entities.

Context-aware

Context is important. It is especially important when considering a piece of HTML like this:

<p>…and here's the example code:</p>
<pre><code>echo "watermelon!\n";</code></pre>

Contextually, you'd want here's to become here’s, but you certainly don't want the code to read echo “watermelon!\n”;.

A fancy/smart/curly apostrophe is appropriate in the prose, but curly quotes in the code are likely to cause a parse error.

Lexentity understands its context and acts appropriately by means of lexical analysis, turning tokens into text, rather than through a mostly naive and overly complicated regular expression.
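
To make the idea concrete, here is a minimal sketch in Python of context-aware replacement. It is not Lexentity's actual implementation: the `CurlyApostrophes` class, the `curl` function, and the `VERBATIM_TAGS` set are all hypothetical names chosen for illustration, and the sketch only handles apostrophes (every `'` in prose becomes `&#8217;`, which is a deliberate simplification). It uses the standard library's `html.parser` to tokenize the input and tracks whether the current text sits inside a tag whose content should be left alone.

```python
from html.parser import HTMLParser

# Assumption for this sketch: text inside these tags is left untouched.
VERBATIM_TAGS = {"pre", "code", "script", "style"}


class CurlyApostrophes(HTMLParser):
    """Re-emits HTML token by token, curling apostrophes only in prose."""

    def __init__(self):
        # Keep entities as separate tokens so they pass through unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.verbatim_depth = 0  # > 0 while inside a verbatim tag

    def handle_starttag(self, tag, attrs):
        if tag in VERBATIM_TAGS:
            self.verbatim_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags never change the context.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VERBATIM_TAGS and self.verbatim_depth > 0:
            self.verbatim_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.verbatim_depth == 0:
            # Simplification: treat every straight single quote as an apostrophe.
            data = data.replace("'", "&#8217;")
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def curl(html):
    """Return html with apostrophes curled outside of verbatim tags."""
    parser = CurlyApostrophes()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `curl("<p>here's code:</p><pre><code>echo 'hi';</code></pre>")` rewrites the apostrophe in the paragraph but leaves the quotes inside the code block as they were. A real tool would also need to distinguish opening and closing quotes, which this sketch does not attempt.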

Medium-neutral

My friend and colleague Jon Gibbins said it best in http://dotjay.co.uk/2006/sep/named-html-entities-in-rss. In modern systems, you can't count on your HTML to always be represented as HTML. It's often (poorly) embedded in RSS or other HTML-like media, as XML.

Therefore, it is important to avoid HTML-specific named entities like &hellip; and &rsquo;, and instead use their Unicode code points to form numeric entities such as &#8230;. This ensures proper display on any terminal that can properly render Unicode XML, and avoids missing-entity errors.
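
As a rough illustration of the medium-neutral idea, the following Python sketch converts named HTML entities and literal non-ASCII characters into numeric entities. It is an assumption-laden example rather than Lexentity's own code: the function name `to_numeric_entities` and the `XML_PREDEFINED` set are hypothetical, and the lookup table comes from Python's standard `html.entities.name2codepoint`. The entities that XML itself predefines are left alone, since they are safe in any XML document.

```python
import re
from html.entities import name2codepoint

# Entities predefined by XML itself; safe to keep by name (sketch assumption).
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(text):
    """Replace HTML-only named entities and non-ASCII characters
    with numeric entities such as &#8230;."""

    def replace_named(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML-safe or unknown names untouched
        return f"&#{name2codepoint[name]};"

    text = _NAMED_ENTITY.sub(replace_named, text)
    # Literal non-ASCII characters become numeric entities as well.
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)
```

Here `&hellip;` would become `&#8230;` while `&amp;` stays as it is. Note that this sketch operates on a whole string; combining it with context awareness would mean applying it only to text tokens.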

Demo

Try a demo at http://files.seancoates.com/lexentity/.
