This repository has been archived by the owner on Apr 12, 2024. It is now read-only.

pbgnz/automatic-language-identification

Automatic-Language-Identification

A probabilistic language identification system that identifies the language of a sentence.

Visit demo website.

Requirements

  1. Python 2.7.15
  2. Python 3.7.0
  3. pip

Installation

pip install -r requirements.txt

Detailed Usage

CLI

ali is a probabilistic language identification system that identifies the language of a sentence.

usage: ali [-v] (-c TRAIN-CORPUS)* [-t TEST-FILE]
    -v Prints debugging messages.
    -c Specifies the training text(s) for the language.
    -t Specifies the test set for the model.
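
For readers who want to see how this option layout maps onto code, here is a minimal sketch using Python's standard `argparse` module. It is not taken from `ali.py`; the function name `build_parser` and the destination names `verbose`, `corpora`, and `test_file` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage text (hypothetical, not from ali.py)."""
    parser = argparse.ArgumentParser(
        prog="ali",
        description="Probabilistic language identification of sentences.",
    )
    # -v is a simple on/off flag for debugging messages.
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="Prints debugging messages.")
    # -c may be repeated; each occurrence is appended to a list of corpora.
    parser.add_argument("-c", dest="corpora", action="append", default=[],
                        metavar="TRAIN-CORPUS",
                        help="Specifies the training text(s) for the language.")
    # -t is optional and names a single test file.
    parser.add_argument("-t", dest="test_file", metavar="TEST-FILE",
                        help="Specifies the test set for the model.")
    return parser
```

Calling `build_parser().parse_args()` with two `-c` options and one `-t` option would yield a list of two corpus paths and a single test file path.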

Examples

Generate a unigram and a bigram model for each corpus, then predict the language of the test sentences using the latter.

python ali.py -c "data/en.txt" -c "data/sp.txt" -c "data/fr.txt" -t "data/first10TestSentences.txt"

Output: see output/output.md
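
To illustrate the general idea behind bigram-based language identification, the sketch below trains one character bigram model per language and picks the language whose model gives a sentence the highest log-probability. This is an illustrative sketch, not the code in `ali.py`: the character-level granularity, the 27-symbol alphabet (a to z plus space, so accented letters are dropped), the add-delta smoothing with `delta=0.5`, and all function names are assumptions.

```python
import math
import string
from collections import Counter

# Assumed alphabet: lowercase ASCII letters plus space (27 symbols).
ALPHABET = string.ascii_lowercase + " "


def normalize(text):
    """Lowercase the text and keep only characters in ALPHABET."""
    return [c for c in text.lower() if c in ALPHABET]


def train_bigram(text):
    """Return (context counts, bigram counts) for a training text."""
    chars = normalize(text)
    contexts = Counter(chars[:-1])            # characters that start a bigram
    bigrams = Counter(zip(chars, chars[1:]))  # adjacent character pairs
    return contexts, bigrams


def log_prob(sentence, model, delta=0.5):
    """Sum of log10 smoothed bigram probabilities of the sentence."""
    contexts, bigrams = model
    chars = normalize(sentence)
    total = 0.0
    for prev, cur in zip(chars, chars[1:]):
        # Add-delta smoothing keeps unseen bigrams from scoring zero.
        num = bigrams[(prev, cur)] + delta
        den = contexts[prev] + delta * len(ALPHABET)
        total += math.log10(num / den)
    return total


def identify(sentence, models):
    """Return the language whose model scores the sentence highest."""
    return max(models, key=lambda lang: log_prob(sentence, models[lang]))


# Toy corpora for demonstration only; real corpora would be far larger.
models = {
    "en": train_bigram("the quick brown fox jumps over the lazy dog and the cat"),
    "fr": train_bigram("le renard brun saute par dessus le chien paresseux et le chat"),
}
```

With these toy corpora, `identify("the dog", models)` selects `"en"` and `identify("le chat", models)` selects `"fr"`. Summing logarithms rather than multiplying probabilities avoids numeric underflow on longer sentences.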

Web App

Train the models

python train.py

Run the server

python server.py
# or
gunicorn server:app