# The `langdetect` and `langid` Libraries

## 1. `langdetect`

[`langdetect`](https://pypi.org/project/langdetect/) supports 55 languages out of the box ([ISO 639-1 codes](https://www.wikiwand.com/en/List_of_ISO_639-1_codes)):

> af, ar, bg, bn, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gu, he,
hi, hr, hu, id, it, ja, kn, ko, lt, lv, mk, ml, mr, ne, nl, no, pa, pl,
pt, ro, ru, sk, sl, so, sq, sv, sw, ta, te, th, tl, tr, uk, ur, vi, zh-cn, zh-tw

```python
from langdetect import detect, detect_langs
```

```python
detect("You are a noob.")
```

```
'fr'
```

```python
detect("Eisai enas ilithios re.")
```

```
'lt'
```

```python
detect("Είσαι")
```

```
'el'
```

```python
detect_langs("What's up man?")
```

```
[en:0.7142847174151787, tl:0.1428576551437105, id:0.14285762744101516]
```
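The candidates returned by `detect_langs` are `Language` objects that expose `.lang` and `.prob` attributes. One way to act on them is to accept the top language only when its probability clears a threshold. The helper below is a hypothetical sketch, not part of `langdetect`. It takes plain `(code, probability)` pairs so it can be tested without the library, and its 0.9 default threshold is an arbitrary assumption:

```python
from typing import Iterable, Optional, Tuple


def confident_language(
    candidates: Iterable[Tuple[str, float]],
    min_prob: float = 0.9,
) -> Optional[str]:
    """Return the most probable language code, or None when its
    probability is below min_prob (hypothetical helper)."""
    best = max(candidates, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < min_prob:
        return None
    return best[0]


# Candidates copied from the detect_langs output above.
scores = [
    ("en", 0.7142847174151787),
    ("tl", 0.1428576551437105),
    ("id", 0.14285762744101516),
]
print(confident_language(scores))                # rejected: 0.71 < 0.9
print(confident_language(scores, min_prob=0.5))  # accepted
```

With the library installed, the pairs can be built as `[(c.lang, c.prob) for c in detect_langs(text)]`.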

## 2. `langid`

[`langid`](https://github.com/saffsd/langid.py) comes pre-trained on 97 languages (ISO 639-1 codes):

af, am, an, ar, as, az, be, bg, bn, br, bs, ca, cs, cy, da, de, dz, el, en, eo, es, et, eu, fa, fi, fo, fr, ga, gl, gu, he, hi, hr, ht, hu, hy, id, is, it, ja, jv, ka, kk, km, kn, ko, ku, ky, la, lb, lo, lt, lv, mg, mk, ml, mn, mr, ms, mt, nb, ne, nl, nn, no, oc, or, pa, pl, ps, pt, qu, ro, ru, rw, se, si, sk, sl, sq, sr, sv, sw, ta, te, th, tl, tr, ug, uk, ur, vi, vo, wa, xh, zh, zu

```python
import langid
```

```python
langid.classify("You are a noob.")[0]
```

```
'en'
```

```python
langid.classify("Eisai enas ilithios re.")[0]
```

```
'lt'
```

```python
langid.classify("What's up man?")
```

```
('en', -25.157379627227783)
```

Note that the second value above is not a probability but an unnormalized log-probability score. `langid` performs its calculations in log-probability space, and picking the most probable language among the candidates only requires comparing these scores, so skipping normalization saves work.

If a confidence score in the 0-1 range is needed, create a `LanguageIdentifier` with `norm_probs=True`:

```python
from langid.langid import LanguageIdentifier, model

# Build an identifier whose scores are renormalized into the 0-1 range.
identifier = LanguageIdentifier.from_modelstring(model, norm_probs=True)
identifier.classify("What's up man?")
```

```
('en', 0.9999887931088783)
```
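Conceptually, turning per-language log scores into a confidence in the 0-1 range amounts to a softmax: exponentiate each score and divide by the sum. The sketch below is a stand-alone illustration in plain Python, not `langid`'s own code. Apart from the `en` value, which is copied from the unnormalized output earlier, the scores are made up. It also shows why normalization is optional for `classify`: it never changes which language ranks first.

```python
import math
from typing import Dict


def normalize_log_scores(log_scores: Dict[str, float]) -> Dict[str, float]:
    """Convert unnormalized per-language log-probabilities into
    probabilities that sum to 1 (a numerically stable softmax)."""
    # Shift every score by the maximum before exponentiating
    # (log-sum-exp trick) so the largest term becomes exp(0) == 1.
    peak = max(log_scores.values())
    weights = {lang: math.exp(s - peak) for lang, s in log_scores.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}


# Hypothetical scores: only the "en" value comes from the output above.
scores = {"en": -25.157379627227783, "de": -36.5, "fr": -38.2}
probs = normalize_log_scores(scores)
best = max(probs, key=probs.get)
print(best, probs[best])
```

The shift by the largest score matters for longer texts, whose log scores can be so negative that `math.exp` would underflow to zero for every language.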

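It is also possible to constrain the set of candidate languages. The returned probability is then normalized over that subset only, which is why an English sentence is labeled German with probability 0.62 once English is not allowed: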

```python
# Only German, French and Italian are considered from now on.
identifier.set_languages(['de', 'fr', 'it'])
identifier.classify("What's up man?")
```

```
('de', 0.6197852890767914)
```
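The effect of restricting the language set on the score can be mimicked by dropping the excluded languages before normalizing. In this hypothetical helper (not a `langid` function), using made-up log scores, English wins when it is allowed. Once it is excluded, German takes the top spot with a probability that looks moderately confident even though the text is not German:

```python
import math
from typing import Dict, Iterable, Tuple


def classify_restricted(
    log_scores: Dict[str, float], allowed: Iterable[str]
) -> Tuple[str, float]:
    """Pick the best language among `allowed` and return its probability
    renormalized over that subset only (hypothetical helper)."""
    allowed_set = set(allowed)
    subset = {lang: s for lang, s in log_scores.items() if lang in allowed_set}
    if not subset:
        raise ValueError("none of the allowed languages has a score")
    # Softmax over the subset, shifted by the maximum for numerical stability.
    peak = max(subset.values())
    weights = {lang: math.exp(s - peak) for lang, s in subset.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


# Hypothetical log scores for an English sentence.
scores = {"en": -25.2, "de": -30.0, "fr": -31.0, "it": -31.5}
print(classify_restricted(scores, scores.keys()))       # English allowed
print(classify_restricted(scores, ["de", "fr", "it"]))  # English excluded
```

A high normalized probability after restricting the language set therefore only says that one allowed language fits better than the other allowed ones.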