glyphIgo is a Swiss Army knife for dealing with fonts and EPUB eBooks.
- Version: 3.0.3
- Date: 2015-06-07
- Developer: Alberto Pettarin (contact)
- License: the MIT License (MIT), see LICENSE.md
There are seven main usage scenarios:
- check whether a given font file contains all the glyphs needed to properly display the given EPUB or plain text file,
- convert a font file from/to TTF/OTF/WOFF format,
- count the number of characters in an EPUB file or a plain text UTF-8 file,
- list all Unicode characters used in an EPUB file or a plain text UTF-8 file or all Unicode glyphs present in a TTF/OTF/WOFF font file,
- look up information about a given Unicode character, including heuristic name matching,
- (de)obfuscate a font, with either the IDPF or the Adobe algorithm, and
- subset a given font file, that is, create a new font file containing only the glyphs of the given font that appear in an EPUB or plain text file.
Optionally, you can export a list of Unicode glyphs/characters, produced by the above commands, as an EPUB file for quick testing on an eReader.
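The first scenario (font checking) boils down to a set difference between the characters used in a text and the codepoints covered by a font. The following is a minimal standalone Python 3 sketch of that idea, not glyphIgo's actual code; the function names `missing_characters` and `describe` are hypothetical, and the "font" is just an assumed range of codepoints.

```python
import unicodedata


def missing_characters(text, font_codepoints):
    """Return the sorted codepoints used in text that the font does not cover."""
    used = {ord(ch) for ch in text}
    return sorted(used - set(font_codepoints))


def describe(codepoint):
    """Format a codepoint as 'U+XXXX NAME' for a readable report."""
    name = unicodedata.name(chr(codepoint), "<unnamed>")
    return "U+%04X %s" % (codepoint, name)


# Hypothetical font covering only printable Basic Latin (0x20-0x7E).
font = range(0x20, 0x7F)
for cp in missing_characters("What\u203d", font):
    print(describe(cp))  # prints "U+203D INTERROBANG"
```

A real check would read the codepoints from the font file (or from a glyph list file) instead of hard-coding a range.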
2016-06-30 I planned to deeply restructure glyphIgo during the summer of 2016: in particular, to update it to use the new fontforge and fonttools libraries, and to add better documentation. Thank you for your patience.
2016-10-05 ... and of course I have not had time to work on glyphIgo, since my plans for the 2016 summer went out of the window. However, I have some more time now (October 2016), and I would like to address the following issues:
- fontforge as the core font library;
- restructure the code as a library, usable in third-party code;
- fix the command line parsing (
- release on PyPI;
- better documentation.
Stay tuned (and/or have a look at the
2016-12-22 ... and of course I have not worked on glyphIgo. I am sorry about that. I am not sure when I will have the time for it.
    $ ./glyphIgo.py check|convert|count|list|lookup|obfuscate|subset [options]

    optional arguments:
      -h, --help            show this help message and exit
      --version             print version and exit
      -c CHARACTER, --character CHARACTER
                            lookup CHARACTER, specified as name, partial name,
                            dec/hex codepoint, or Unicode character
      -d DECODE, --decode DECODE
                            use DECODE encoding to decode the input EBOOK or
                            PLAIN file
      -e EBOOK, --ebook EBOOK
                            ebook file, in EPUB/ZIP format
      -f FONT, --font FONT  font file, in TTF/OTF/WOFF format
      -g GLYPHS, --glyphs GLYPHS
                            font file, specified as a list of decimal Unicode
                            codepoints contained in plain text file GLYPHS, one
                            codepoint per line
      -i ID, --id ID        (de)obfuscate FONT using ID to compute the
                            obfuscation key
      -o OUTPUT, --output OUTPUT
                            create OUTPUT file
      -p PLAIN, --plain PLAIN
                            ebook file, in plain text format
      -r RANGE, --range RANGE
                            range, in '0x????-0x????' or '????-????' format
      -q, --quiet           quiet output
      -s, --sort            sort output by character count instead of character
                            codepoint
      -u, --epub            output an EPUB file containing the Unicode characters
                            in the input file(s)
      -v, --verbose         verbose output
      -w, --nohumanreadable
                            verbose output without human readable messages
      --adobe               use Adobe obfuscation algorithm
      --blocks              print range and name of Unicode blocks
      --compact             compact lookup output (Unicode character, name, and
                            codepoint only)
      --exact               use exact Unicode lookup (default)
      --exclude             exclude the characters in EBOOK or PLAIN from the
                            output
      --full                full lookup output (default)
      --heuristic           use heuristic Unicode lookup
      --idpf                use IDPF obfuscation algorithm (default)
      --preserve            preserve X(HT)ML tags instead of stripping them away

    exit codes:
      0 = no error
      1 = RESERVED
      2 = invalid command line argument(s)
      4 = missing glyphs in the font file to correctly display the given ebook or file
      8 = failure while executing the requested command
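When glyphIgo is driven from another script, the exit codes listed in the usage message are the natural way to tell "missing glyphs" apart from a real failure. The sketch below is one possible Python 3 wrapper, under the assumption that `./glyphIgo.py` is executable from the current directory; `explain_exit_code` and `run_check` are hypothetical helper names, not part of glyphIgo.

```python
import subprocess

# Meanings copied from the "exit codes" section of the usage message.
EXIT_CODES = {
    0: "no error",
    1: "RESERVED",
    2: "invalid command line argument(s)",
    4: "missing glyphs in the font file to correctly display the given ebook or file",
    8: "failure while executing the requested command",
}


def explain_exit_code(code):
    """Map a glyphIgo exit code to its documented meaning."""
    return EXIT_CODES.get(code, "unknown exit code %d" % code)


def run_check(font_path, ebook_path, script="./glyphIgo.py"):
    """Run 'glyphIgo.py check -f FONT -e EBOOK'; return (exit code, meaning).

    The script path is an assumption; adjust it to your installation.
    """
    result = subprocess.run([script, "check", "-f", font_path, "-e", ebook_path])
    return result.returncode, explain_exit_code(result.returncode)
```

For example, `run_check("font.ttf", "ebook.epub")` would return a pair whose first element is 4 when the font lacks some glyphs needed by the ebook.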
1. Print this usage message

        $ ./glyphIgo.py -h

2. Check whether all the characters in ebook.epub can be displayed by font.ttf

        $ ./glyphIgo.py check -f font.ttf -e ebook.epub

3. As above, but use font_glyph_list.txt containing a list of decimal codepoints for the font glyphs

        $ ./glyphIgo.py check -g font_glyph_list.txt -e ebook.epub

4. As above, but sort missing characters (if any) by their count (in ebook.epub) instead of by Unicode codepoint

        $ ./glyphIgo.py check -f font.ttf -e ebook.epub -s

5. As above, but also create missing.epub containing the list of missing Unicode characters

        $ ./glyphIgo.py check -f font.ttf -e ebook.epub -u -o missing.epub

6. Convert font.ttf (TTF) into font.otf (OTF)

        $ ./glyphIgo.py convert -f font.ttf -o font.otf

7. Count the number of characters in ebook.epub

        $ ./glyphIgo.py count -e ebook.epub

8. As above, but preserve tags

        $ ./glyphIgo.py count -e ebook.epub --preserve

9. Print the list of glyphs in font.ttf

        $ ./glyphIgo.py list -f font.ttf

10. As above, but just output the decimal codepoints

        $ ./glyphIgo.py list -f font.ttf -q

11. Print the list of characters in ebook.epub

        $ ./glyphIgo.py list -e ebook.epub

12. As above, but also create list.epub containing the list of Unicode characters

        $ ./glyphIgo.py list -e ebook.epub -u -o list.epub

13. Print the list of characters in page.xhtml

        $ ./glyphIgo.py list -p page.xhtml

14. Print the list of characters in the range 0x2200-0x22ff (Mathematical Operators)

        $ ./glyphIgo.py list -r 0x2200-0x22ff
        $ ./glyphIgo.py list -r "Mathematical Operators"

15. Print the range and name of Unicode blocks

        $ ./glyphIgo.py list --blocks

16. Lookup for information for Unicode character

        $ ./glyphIgo.py lookup -c 8253
        $ ./glyphIgo.py lookup -c 0x203d
        $ ./glyphIgo.py lookup -c ‽
        $ ./glyphIgo.py lookup -c "INTERROBANG"

17. As above, but print compact output

        $ ./glyphIgo.py lookup --compact -c 8253
        $ ./glyphIgo.py lookup --compact -c 0x203d
        $ ./glyphIgo.py lookup --compact -c ‽
        $ ./glyphIgo.py lookup --compact -c "INTERROBANG"

18. Heuristic lookup for information for Unicode characters which are Greek omega letters with oxia

        $ ./glyphIgo.py lookup --heuristic -c "GREEK OMEGA OXIA"

19. (De)obfuscate font.otf into obf.font.otf using the given id and the IDPF algorithm

        $ ./glyphIgo.py obfuscate -f font.otf -i "urn:uuid:9a0ca9ab-9e33-4181-b2a3-e7f2ceb8e9bd" -o obf.font.otf

20. As above, but use Adobe algorithm

        $ ./glyphIgo.py obfuscate -f font.otf -i "urn:uuid:9a0ca9ab-9e33-4181-b2a3-e7f2ceb8e9bd" -o obf.font.otf --adobe

21. Subset font.ttf into min.font.otf by copying only the glyphs appearing in ebook.epub

        $ ./glyphIgo.py subset -f font.ttf -e ebook.epub -o min.font.otf

22. Subset font.ttf into rem.font.ttf by removing the glyphs appearing in list.txt

        $ glyphIgo.py subset -f font.ttf -p list.txt -o rem.font.ttf --exclude
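To make the (de)obfuscation examples less opaque, here is a minimal Python 3 sketch of the IDPF font obfuscation scheme as described in the EPUB specification: the key is the SHA-1 digest of the identifier with whitespace removed, and the first 1040 bytes of the font are XORed with it. Whether glyphIgo implements exactly these steps is an assumption on my part; the function names are hypothetical, and the Adobe variant is not covered.

```python
import hashlib

OBFUSCATED_LENGTH = 1040  # number of leading bytes affected by the IDPF scheme


def idpf_key(identifier):
    """Derive the 20-byte key: SHA-1 of the identifier without whitespace."""
    cleaned = "".join(ch for ch in identifier if ch not in " \t\r\n")
    return hashlib.sha1(cleaned.encode("utf-8")).digest()


def idpf_obfuscate(data, identifier):
    """XOR the first 1040 bytes of data with the repeating key.

    XOR is its own inverse, so the same call also deobfuscates.
    """
    key = idpf_key(identifier)
    head = bytes(b ^ key[i % len(key)] for i, b in enumerate(data[:OBFUSCATED_LENGTH]))
    return head + data[OBFUSCATED_LENGTH:]


# Round trip on dummy bytes standing in for a font file.
uid = "urn:uuid:9a0ca9ab-9e33-4181-b2a3-e7f2ceb8e9bd"
font_bytes = bytes(range(256)) * 5
assert idpf_obfuscate(idpf_obfuscate(font_bytes, uid), uid) == font_bytes
```

Because the operation is symmetric, a single command can serve for both obfuscation and deobfuscation, which matches the "(de)obfuscate" wording above.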
Please see OUTPUT.md for usage examples with their actual output.
glyphIgo is released under the MIT License since version 2.0.0 (2014-03-07).
Previous versions, hosted in a Google Code repo, were released under the GNU GPL 3 License.
glyphIgo supports argcomplete for autocompleting options/filenames.
Please refer to the argcomplete documentation for directions on how to enable it.
glyphIgo requires Python 2.7 (or later Python 2.x), and the Python module python-fontforge.
On Ubuntu/Debian, you can install the latter with apt-get install python-fontforge.
On other OSes... I do not know, I use it on Debian only. Feel free to let me know, I will add your installation notes here.
For the sake of speed and code clarity, the given EPUB is not "fully parsed". In particular:
- the list of Unicode characters is extracted by inspecting all files inside the ZIP archive whose lowercased name ends in xml (except those in META-INF/, which are skipped), and
- the book pages are not parsed (e.g., a Unicode character appearing inside a comment will be accounted for).
Please observe that these approximations err on the "conservative" side, possibly generating "false-positives" but never generating "false-negatives".
You can also pass a ZIP archive, containing several XHTML/HTML/XML pages, using the -e (--ebook) switch.
By default, glyphIgo assumes that all files are encoded in UTF-8.
You can change the encoding used while decoding plain text files
by specifying the -d (--decode) option.
Conversion from entity (named or not) to Unicode codepoint is supported.
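As an illustration of the two steps just described (decoding with a chosen encoding, then converting entities to Unicode characters), here is a small Python 3 sketch based on the standard library's `html.unescape`. It is not how glyphIgo itself does it (glyphIgo runs on Python 2), and `decode_text` is a hypothetical helper name.

```python
import html


def decode_text(raw, encoding="utf-8"):
    """Decode raw bytes with the given encoding, then replace named and
    numeric (decimal or hexadecimal) entities with the characters they denote."""
    return html.unescape(raw.decode(encoding))


# Named, decimal, and hexadecimal entities all resolve to plain characters.
text = decode_text(b"caf&eacute; &#8253; &#x203d;")
assert text == "caf\u00e9 \u203d \u203d"

# A non-UTF-8 input needs an explicit encoding, much like the decode option.
assert decode_text(b"caf\xe9", encoding="latin-1") == "caf\u00e9"
```

Resolving entities before counting matters because otherwise `&#8253;` would be tallied as seven ASCII characters instead of one interrobang.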
Unfortunately, there is no python-fontforge module for Python 3 in the stable Debian repo (as of 2014-03-07), so you must use Python 2.7 (or later Python 2.x) to run glyphIgo.
If you use the --epub switch, you also need to download genEPUB.py and put it into the same directory as glyphIgo.py.
Limitations and Missing Features
- Support for Unicode modifiers
- Full EPUB parsing
- Font obfuscation: parse the uid directly from a given EPUB
- Support for autocompleting via
- Shortcuts (e.g., "-C" == "count -e")
What does "glyphIgo" mean?
Most people think that glyphIgo = "glyph I go".
Instead, the name comes from glyph + figo (Italian slang for "cool").
Why did you code glyphIgo?
I needed to perform the "font checking" on nearly 100,000 EPUB files at once, for a large project. Then, I felt bad having this little piece of code sitting idly, so I decided to publish it on Google Code. In March 2014, I moved it to GitHub.