omarmhaimdat/whatlang-pyo3

WhatLang ⚡

Python Binding for WhatLang, a blazing fast language detection library

Showcase

WhatLang is a Python library for detecting the language of a text. It is based on the WhatLang Rust library.

Installation

pip install whatlang-pyo3

Usage

Detect

>>> from whatlang import detect
>>> info = detect("This is written in English")
>>> info
"Language: eng - Script: Latin - Confidence: 0.11450955767632877 - Is reliable: false"
>>> info.lang
"eng"
>>> info.script
"Latin"
>>> info.confidence
0.11450955767632877
>>> info.is_reliable
False
>>> info.to_iso()
"Language: en - Script: Latin - Confidence: 0.11450955767632877 - Is reliable: false"
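
Short inputs can yield a low confidence, as in the example above where is_reliable is False. The following is a minimal sketch of one way to handle that case. It assumes only the lang and is_reliable attributes shown above. The helper name lang_or_default and the "und" fallback are illustrative and not part of the library, and a stand-in object replaces the real detect result so the snippet runs on its own.

```python
from types import SimpleNamespace


def lang_or_default(info, default="und"):
    """Return info.lang when the detection is flagged reliable, else `default`.

    `info` is any object exposing `lang` and `is_reliable`, such as the
    value returned by detect in the example above. The "und" fallback
    (the ISO 639 code for "undetermined") is an arbitrary choice.
    """
    if not info.is_reliable:
        return default
    return info.lang


# Stand-in with the attribute values from the example above (hypothetical
# object, used here so the sketch does not require whatlang to be installed).
example = SimpleNamespace(lang="eng", is_reliable=False)
print(lang_or_default(example))  # und
```

With the library installed, the same helper could be called as lang_or_default(detect(text)).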

Detect language

To get only the language of a text, without the script or the confidence, use detect_lang.

>>> from whatlang import detect_lang
>>> detect_lang("This is written in English")
"eng"
>>> detect_lang("Ceci est écrit en français")
"fra"
>>> detect_lang("Ceci est écrit en français").iso
"fr"

Detect script

To get only the script of a text, without the language or the confidence, use detect_script.

>>> from whatlang import detect_script
>>> script = detect_script("This is written in English")
>>> script
Name: Latin - Languages: spa, eng, por, ind, fra, deu, jav, vie, ita, tur, pol, ron, hrv, nld, uzb, hun, aze, ces, zul, swe, aka, sna, afr, fin, slk, tgl, tuk, dan, nob, cat, lit, slv, epo, lav, est, lat
>>> script.name
"Latin"
>>> script.langs
['spa', 'eng', 'por', 'ind', 'fra', 'deu', 'jav', 'vie', 'ita', 'tur', 'pol', 'ron', 'hrv', 'nld', 'uzb', 'hun', 'aze', 'ces', 'zul', 'swe', 'aka', 'sna', 'afr', 'fin', 'slk', 'tgl', 'tuk', 'dan', 'nob', 'cat', 'lit', 'slv', 'epo', 'lav', 'est', 'lat']

Batch detection

You can also detect the language of a list of texts in one call and use the n_jobs parameter to run the detection in parallel.

>>> from whatlang import batch_detect
>>> batch_detect(["This is written in English", "Ceci est écrit en français"], n_jobs=-1)
[LanguageInfo(lang='eng', script='Latin', confidence=0.11450955767632877, is_reliable=False), LanguageInfo(lang='fra', script='Latin', confidence=0.11450955767632877, is_reliable=False)]
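
A common next step is to pair each result with its input text. The sketch below groups texts by detected language. It assumes that results come back in the same order as the inputs, as the output above suggests, and that each result exposes a lang attribute. The helper name group_by_language is illustrative, and stand-in objects replace the real LanguageInfo values so the snippet runs on its own.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_language(texts, infos):
    """Map each detected language code to the list of texts assigned to it.

    Assumes `infos[i]` is the detection result for `texts[i]`.
    """
    groups = defaultdict(list)
    for text, info in zip(texts, infos):
        groups[info.lang].append(text)
    return dict(groups)


texts = ["This is written in English", "Ceci est écrit en français"]
# Stand-ins mirroring the `lang` values in the output above (hypothetical
# objects, not real LanguageInfo instances).
infos = [SimpleNamespace(lang="eng"), SimpleNamespace(lang="fra")]
print(group_by_language(texts, infos))
```

With the library installed, infos would instead be the list returned by batch_detect(texts, n_jobs=-1).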

Batch detection can be up to 5 times faster than detecting each text individually.
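
The actual speedup depends on the machine and the input, so it is worth measuring on your own data. Below is a minimal timing sketch that compares a per-text loop with a single batch call. The function name compare_timings and its parameters are illustrative. It takes the two detectors as arguments, so it makes no assumption about the library beyond the calls shown above.

```python
import time


def compare_timings(detect_one, detect_many, texts, repeats=3):
    """Time a per-text loop against a single batch call.

    detect_one:  callable taking one string (for example, detect)
    detect_many: callable taking a list of strings (for example, a
                 wrapper around batch_detect)
    Returns (loop_seconds, batch_seconds), each the best of `repeats` runs.
    """
    def best_of(fn):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        return best

    loop_seconds = best_of(lambda: [detect_one(t) for t in texts])
    batch_seconds = best_of(lambda: detect_many(texts))
    return loop_seconds, batch_seconds
```

With the library installed, a call such as compare_timings(detect, lambda ts: batch_detect(ts, n_jobs=-1), texts) on a few thousand texts would give the two timings to compare.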

Contributing

Contributions are welcome! Please open an issue or a pull request.

Install maturin

We use maturin for the development of this library.

pip install maturin

Build

maturin build --release

License

This project is licensed under the MIT License. See the LICENSE file for details.