alphabet-detector

A library to detect what alphabet a string is written in. Works on Python 2.7+ and 3.3+.

Author

Eli Finkelshteyn (founder of constructor.io)

Installation

pip install alphabet-detector

Usage

To instantiate an AlphabetDetector (the object is used for speed optimization):

from alphabet_detector import AlphabetDetector
ad = AlphabetDetector()
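
The README does not say how the object speeds things up. One plausible reading, which is an assumption here, is that the instance remembers per-character results so repeated characters are only looked up once. A minimal sketch of that idea using only the standard library (the class name `CachingAlphabetChecker` and its method are hypothetical, not part of alphabet-detector):

```python
import unicodedata


class CachingAlphabetChecker(object):
    """Hypothetical illustration; not the library's actual class."""

    def __init__(self):
        # Maps (character, alphabet) -> bool so repeated characters are cheap.
        self._cache = {}

    def is_in_alphabet(self, char, alphabet):
        key = (char, alphabet)
        if key not in self._cache:
            # unicodedata.name returns e.g. "GREEK SMALL LETTER ALPHA";
            # the "" default covers code points that have no name.
            self._cache[key] = alphabet in unicodedata.name(char, "")
        return self._cache[key]


checker = CachingAlphabetChecker()
first = checker.is_in_alphabet(u"α", "GREEK")
second = checker.is_in_alphabet(u"α", "GREEK")  # answered from the cache
```

Reusing one instance across many calls is what would make such a cache pay off, which may be why the README suggests instantiating the detector once.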

In general, you can just use the only_alphabet_chars(unicode_str, alphabet) method and expect a boolean response:

ad.only_alphabet_chars(u"ελληνικά means greek", "LATIN") #False
ad.only_alphabet_chars(u"ελληνικά", "GREEK") #True
ad.only_alphabet_chars(u'سماوي يدور', 'ARABIC') #True
ad.only_alphabet_chars(u'שלום', 'HEBREW') #True
ad.only_alphabet_chars(u"frappé", "LATIN") #True
ad.only_alphabet_chars(u"hôtel lœwe 67", "LATIN") #True
ad.only_alphabet_chars(u"det forårsaker første", "LATIN") #True
ad.only_alphabet_chars(u"Cyrillic and кириллический", "LATIN") #False
ad.only_alphabet_chars(u"кириллический", "CYRILLIC") #True
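
The examples above suggest that digits, spaces, and punctuation do not count against a match (for instance, "hôtel lœwe 67" is reported as LATIN). A rough standard-library sketch of a check with that behavior, assuming the alphabet name appears in each letter's Unicode character name (the function `only_alphabet_chars_sketch` is hypothetical and may differ from the library's own logic):

```python
import unicodedata


def only_alphabet_chars_sketch(text, alphabet):
    """Return True if every alphabetic character in text belongs to alphabet.

    Non-alphabetic characters (digits, spaces, punctuation) are skipped.
    """
    return all(
        alphabet in unicodedata.name(char, "")
        for char in text
        if char.isalpha()
    )


latin_with_digits = only_alphabet_chars_sketch(u"hôtel lœwe 67", "LATIN")
mixed = only_alphabet_chars_sketch(u"ελληνικά means greek", "LATIN")
```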

You can also request free-style detection of any unicode string:

ad.detect_alphabet(u'Cyrillic and кириллический') #{'CYRILLIC', 'LATIN'}
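
To give a feel for how free-style detection could work, here is a minimal sketch that takes the first word of each letter's Unicode character name. This is an assumption about the approach, not a description of the library's internals, and `detect_alphabet_sketch` is a hypothetical name:

```python
import unicodedata


def detect_alphabet_sketch(text):
    """Collect the leading word of the Unicode name of each alphabetic character."""
    found = set()
    for char in text:
        if not char.isalpha():
            continue  # skip spaces, digits, and punctuation
        name = unicodedata.name(char, "")
        if name:
            # e.g. "CYRILLIC SMALL LETTER KA" -> "CYRILLIC"
            found.add(name.split(" ")[0])
    return found


scripts = detect_alphabet_sketch(u"Cyrillic and кириллический")
```

For the string above, the sketch yields the same two names as the README example, CYRILLIC and LATIN.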

Convenience methods are also provided for some major alphabets:

ad.is_cyrillic(u"Привет") #True  
ad.is_latin(u"howdy") #True
# NOTE: this only detects Chinese script characters (Hanzi/Kanji/Hanja).
# It does not detect other CJK script characters such as Hangul or Katakana.
ad.is_cjk(u"hi") #False
ad.is_cjk(u'汉字') #True

NOTE: all strings are expected to be unicode to keep things consistent. The library never converts input for you, and an error is raised when a string is not unicode.
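
Because conversion is left to the caller, byte strings (for example, data read from a file or socket) need to be decoded before they are passed in. A small sketch of one way to do that, assuming the bytes are UTF-8 encoded (the helper `ensure_text` is hypothetical and not part of alphabet-detector):

```python
def ensure_text(value, encoding="utf-8"):
    """Return a unicode string, decoding byte strings with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already unicode; returned unchanged


raw = u"Привет".encode("utf-8")  # bytes, as they might arrive from I/O
text = ensure_text(raw)          # unicode, safe to hand to the detector
```

If the source encoding is not UTF-8, pass the correct one explicitly rather than guessing.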
