Tipi performs typographic replacements in HTML.
- Input is HTML code, output is the same HTML code with changes in typography (entities, spaces, quotes, etc.).
- You can't parse HTML with regex.
- The best existing HTML parser and tokenizer for Python is lxml.
- There are more languages than English in the world. Each of them has different typographic rules.
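The points above about regex and HTML parsing can be illustrated with a small sketch. It is not tipi's implementation: it swaps in the standard library's html.parser in place of lxml to stay dependency-free, it is written for Python 3, and the names TextOnlyReplacer and replace_ellipsis are hypothetical. It applies a single replacement (three dots to an ellipsis) to text nodes only and copies tags through unchanged.

```python
from html.parser import HTMLParser


class TextOnlyReplacer(HTMLParser):
    """Rebuild the input, applying a replacement to text nodes only."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as it appeared in the source
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        # Only text between tags is touched
        self.out.append(data.replace('...', '\u2026'))

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)


def replace_ellipsis(html):
    parser = TextOnlyReplacer()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


source = '<p title="a...b">wait...</p>'
naive = source.replace('...', '\u2026')   # also rewrites the attribute value
careful = replace_ellipsis(source)         # leaves the attribute value alone
```

The naive string replacement changes the title attribute as well as the text, while the parser-based version changes only the text inside the paragraph.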
Installation is easy:
$ pip install tipi
Usage of tipi is very straightforward:
>>> from tipi import tipi
>>> html = '<p>"Zavolej mi na číslo <strong class="tel">765-876-888</strong>," řekla, a zmizela...</p>'
>>> html = tipi(html, lang='cs')
>>> html
'<p>\u201eZavolej mi na \u010d\xed\xadslo <strong class="tel">765\u2013876\u2013888</strong>,\u201c \u0159ekla, a\xa0zmizela\u2026</p>'
>>> print html
<p>„Zavolej mi na číslo <strong class="tel">765–876–888</strong>,“ řekla, a zmizela…</p>
Remember that tipi is designed to work with HTML. In case you need to perform replacements on plaintext, escape it first:
>>> from tipi import tipi
>>> tipi('b -> c') # this works only by coincidence!
u'b → c'
>>> tipi('a <- b -> c')
u'a c'
>>> import cgi
>>> html = cgi.escape(u'a <- b -> c')
>>> html
u'a &lt;- b -&gt; c'
>>> tipi(html)
u'a ← b → c'
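The session above uses cgi.escape, which was removed in Python 3.8. A minimal sketch of the same escaping step on Python 3 uses html.escape from the standard library. Passing quote=False matches the default behaviour of cgi.escape (only &, < and > are escaped). The variable names are illustrative, and the sketch stops before calling tipi, since whether tipi itself runs on Python 3 is not stated here.

```python
import html

plaintext = 'a <- b -> c'

# Escape &, < and > so the arrows are not mistaken for markup
escaped = html.escape(plaintext, quote=False)
print(escaped)
```

The escaped string is what would then be passed to tipi in place of the raw plaintext.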
- Support for multiple languages.
- Language-sensitive replacements for single quotes and double quotes.
- Ellipses, dashes, non-breaking spaces, ...
- Arrows (--> turned into →), dimensions (12 × 30).
- Symbols (trademark, registered, copyright, EUR, ...)
- Typogrify - English only, adds markup for styling, on top of smartypants
- cstypo - Czech only, not working well with HTML
- Inspiration from Typogrify?
- Get some inspiration from Dero's and Typomil's typography guides.
- Get some inspiration from Liteera.cz (source).
- Maybe also some inspiration from here.
© 2013-2014 Jan Javorek <mail@honzajavorek.cz>
This work is licensed under the MIT license.