A Language Detection (LID) library written in Python, able to detect language from text, image or audio content.
Back-end: Python (numpy, pandas, etc.)
Server: Flask
Models: scikit-learn (sklearn), TensorFlow, Keras
Front-end: HTML, CSS (Bootstrap), vanilla JavaScript
Install LingoDect with pip
Clone this repository, and from the repository root run the following command:
pip install -e .
You should now have lingodect installed as a library and can use it for development and testing.
Clone the project
git clone https://github.com/hadarsharon/lingodect
Go to the project directory
cd lingodect
Install dependencies
pip install -r requirements.txt
Start the server
python app.py
Alternatively, you can run commands via the CLI (run with the -h flag for help and information about available commands)
python cli.py -h
To run tests, run the following command from the project root
pytest
The library currently supports any textual input entered directly (as a string via the CLI or in a text box in the web application GUI), as well as plaintext files such as .txt files.
Audio (speech) and image (handwriting) inputs are supported in most common input formats (.wav, .flac, .jpg, .png, etc.).
If your input file format is not supported, .wav and .png are a safe bet, so you should convert your audio or image file to them, respectively.
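The conversion advice above can be sketched in Python. This is a minimal illustration, not part of LingoDect: the function names `fallback_path` and `convert_image_to_png` and the two extension sets are hypothetical, and the image step assumes the third-party Pillow package is installed.

```python
from pathlib import Path

# Illustrative extension groups (an assumption, not LingoDect's actual list).
AUDIO_EXTENSIONS = {".mp3", ".ogg", ".m4a", ".flac", ".wav"}
IMAGE_EXTENSIONS = {".bmp", ".gif", ".tiff", ".webp", ".jpg", ".jpeg", ".png"}


def fallback_path(source: str) -> Path:
    """Return the path a converted file would be written to.

    Audio files map to .wav and image files map to .png, the two
    formats the README describes as a safe bet.
    """
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return path.with_suffix(".wav")
    if suffix in IMAGE_EXTENSIONS:
        return path.with_suffix(".png")
    raise ValueError(f"Not a recognised audio or image file: {source}")


def convert_image_to_png(source: str) -> Path:
    """Re-encode an image as .png using Pillow (assumed to be installed)."""
    # Imported lazily so that fallback_path needs only the standard library.
    from PIL import Image

    target = fallback_path(source)
    if target.suffix != ".png":
        raise ValueError(f"Not an image file: {source}")
    with Image.open(source) as image:
        image.save(target, format="PNG")
    return target
```

For example, `convert_image_to_png("scan.bmp")` would write `scan.png` next to the original file. Audio conversion is not covered by this sketch; an external tool such as ffmpeg can produce a .wav file from most audio formats.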
The library models are currently trained on over 100 languages, so there is a good chance that whatever language you want to predict is among the supported languages.