
Tipi

Tipi is for typographic replacements in HTML.


Ideas behind this project

  • Input is HTML code, output is the same HTML code with changes in typography (entities, spaces, quotes, etc.).
  • You can't parse HTML with regex.
  • The best existing HTML parser and tokenizer for Python is lxml.
  • There are more languages than English in the world. Each of them has different typographic rules.
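To illustrate why the replacements have to go through a parser rather than a regex over the raw string, here is a minimal Python 3 sketch. It is not how tipi is implemented (tipi builds on lxml); it uses the standard library's ``html.parser`` instead, and ``TextOnlyReplacer`` and ``replace_ellipsis`` are hypothetical names made up for this example. Only text nodes are rewritten, attribute values stay untouched. Comments and doctype declarations are dropped by this sketch.

```python
from html.parser import HTMLParser


class TextOnlyReplacer(HTMLParser):
    """Hypothetical helper: rewrites text nodes, leaves tags as written."""

    def __init__(self):
        # Keep entities as separate events so they are emitted unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Emit the start tag exactly as it appeared, attributes included.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        # The only place where a typographic replacement happens.
        self.out.append(data.replace('...', '\u2026'))

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)


def replace_ellipsis(html):
    parser = TextOnlyReplacer()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


source = '<p title="a...b">wait...</p>'
naive = source.replace('...', '\u2026')  # also rewrites the attribute value
safe = replace_ellipsis(source)          # rewrites the text node only
```

The naive string replacement changes the ``title`` attribute as well, while the parser-based version changes only the visible text.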

Installation

Easy:

$ pip install tipi

Quickstart

Usage of tipi is very straightforward:

>>> from tipi import tipi
>>> html = '<p>"Zavolej mi na číslo <strong class="tel">765-876-888</strong>," řekla, a zmizela...</p>'
>>> html = tipi(html, lang='cs')
>>> html
'<p>\u201eZavolej mi na \u010d\xed\xadslo <strong class="tel">765\u2013876\u2013888</strong>,\u201c \u0159ekla, a\xa0zmizela\u2026</p>'
>>> print html
<p>„Zavolej mi na čí­slo <strong class="tel">765–876–888</strong>,“ řekla, a zmizela…</p>

Remember that tipi is designed to work with HTML. In case you need to perform replacements on plaintext, escape it first:

>>> from tipi import tipi
>>> tipi('b -> c')  # this works only by coincidence!
u'b → c'
>>> tipi('a <- b -> c')
u'a  c'
>>> import cgi
>>> html = cgi.escape(u'a <- b -> c')
>>> html
u'a &lt;- b -&gt; c'
>>> tipi(html)
u'a ← b → c'
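The session above is Python 2. On Python 3, ``cgi.escape`` was removed in version 3.8, and the standard library's ``html.escape`` does the same job. A minimal sketch, where ``escape_plaintext`` is a hypothetical helper name and ``quote=False`` is chosen to match the old ``cgi.escape`` default:

```python
import html


def escape_plaintext(text):
    """Hypothetical helper: make plain text safe to treat as HTML."""
    # quote=False leaves " and ' alone; only &, < and > are escaped.
    return html.escape(text, quote=False)


escaped = escape_plaintext('a <- b -> c')
```

The escaped string can then be handed to ``tipi`` in the same way as in the session above.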

Features

  • Support for multiple languages.
  • Language-sensitive replacements for single quotes and double quotes.
  • Ellipses, dashes, non-breaking spaces, and more.
  • Arrows (--> turned into →), dimensions (12 × 30).
  • Symbols (trademark, registered, copyright, EUR, ...).
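As a rough illustration of what "language-sensitive" means for quotes, here is a small Python 3 sketch. It is not tipi's code or API: ``QUOTES`` and ``curl_double_quotes`` are hypothetical names, the function works on plain text only, and it handles only simple, non-nested pairs of double quotes. The Czech pair matches the Quickstart output above; the English pair is an assumption about conventional English typography.

```python
import re

# Assumed opening/closing double quotes per language code.
QUOTES = {
    'en': ('\u201c', '\u201d'),  # English: “ ”
    'cs': ('\u201e', '\u201c'),  # Czech: „ “
}


def curl_double_quotes(text, lang='en'):
    """Hypothetical helper: replace straight double quotes pairwise."""
    opening, closing = QUOTES[lang]
    # Match one "..." pair at a time and wrap its content in the new quotes.
    return re.sub(
        r'"([^"]*)"',
        lambda match: opening + match.group(1) + closing,
        text,
    )


czech = curl_double_quotes('"Ahoj," řekla.', lang='cs')
english = curl_double_quotes('"Hello," she said.', lang='en')
```

The same input pattern yields different characters depending on ``lang``, which is the reason the language has to be passed in explicitly.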

Alternatives

Plans

License: MIT

© 2013-2014 Jan Javorek <mail@honzajavorek.cz>

This work is licensed under the MIT license.
