GitHub - goru001/indian-language-classifier: Classifier to distinguish which Indian Language a given text contains

Indian Languages' Classifier

This repository contains a classifier that identifies which Indian language a given text is written in.

The following languages are supported:

Language
Hindi
Punjabi
Sanskrit
Gujarati
Kannada
Malayalam
Nepali
Odia
Marathi
Bengali
Tamil
Urdu
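Several of the supported languages share a script, so the writing system alone cannot tell them apart: Hindi, Sanskrit, Nepali and Marathi are all written in Devanagari. The sketch below is not part of this repository. It counts characters per standard Unicode block and returns the dominant script. It shows why a trained classifier is needed for the Devanagari languages, while the other scripts map almost one-to-one to a single language in this list. The helper `script_of` and the `SCRIPT_BLOCKS` table are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Optional

# Standard Unicode block ranges for the scripts used by the supported languages.
# Illustrative only: this helper is hypothetical and not part of the repository.
SCRIPT_BLOCKS = {
    "Devanagari": (0x0900, 0x097F),  # Hindi, Sanskrit, Nepali, Marathi
    "Bengali": (0x0980, 0x09FF),     # Bengali
    "Gurmukhi": (0x0A00, 0x0A7F),    # Punjabi
    "Gujarati": (0x0A80, 0x0AFF),    # Gujarati
    "Oriya": (0x0B00, 0x0B7F),       # Odia
    "Tamil": (0x0B80, 0x0BFF),       # Tamil
    "Kannada": (0x0C80, 0x0CFF),     # Kannada
    "Malayalam": (0x0D00, 0x0D7F),   # Malayalam
    "Arabic": (0x0600, 0x06FF),      # Urdu
}


def script_of(text: str) -> Optional[str]:
    """Return the Unicode block covering most characters of `text`, or None."""
    counts = Counter()
    for ch in text:
        cp = ord(ch)
        for name, (lo, hi) in SCRIPT_BLOCKS.items():
            if lo <= cp <= hi:
                counts[name] += 1
                break  # each character belongs to at most one block
    return counts.most_common(1)[0][0] if counts else None


# "namaste" in Devanagari: could be Hindi, Marathi, Nepali or Sanskrit.
print(script_of("\u0928\u092e\u0938\u094d\u0924\u0947"))  # Devanagari
```

A script check like this can only narrow Devanagari input down to four candidates; separating those four is the part that needs a learned model.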

Dataset

You can download the dataset the model was trained on from here.

Scripts to prepare the dataset are in the dataset-preparation folder.

Results

The classifier achieves 99% accuracy.
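The README does not say which split the 99% figure was measured on or how it breaks down by language. Accuracy is presumably the fraction of texts whose predicted language matches the label. With twelve languages, four of which share Devanagari, a per-language breakdown shows where errors concentrate. The following is a minimal sketch, assuming you have parallel lists of true labels and predictions; `accuracy_report` is a hypothetical helper, and the toy labels below are illustrative, not results from this model.

```python
from collections import defaultdict


def accuracy_report(labels, preds):
    """Return overall accuracy and per-language accuracy (recall per class)."""
    if not labels or len(labels) != len(preds):
        raise ValueError("labels and preds must be non-empty and of equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += int(y == p)
    overall = sum(correct.values()) / len(labels)
    per_lang = {lang: correct[lang] / total[lang] for lang in total}
    return overall, per_lang


# Toy data only: one Marathi text misclassified as Hindi.
labels = ["hindi", "marathi", "tamil", "hindi"]
preds = ["hindi", "hindi", "tamil", "hindi"]
overall, per_lang = accuracy_report(labels, preds)
print(overall)   # 0.75
print(per_lang)  # {'hindi': 1.0, 'marathi': 0.0, 'tamil': 1.0}
```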

Trained Classifier

You can download the trained classifier from here.

Tokenizer and Vocabulary

You can download the tokenizer and vocabulary from here.
