Classifier to distinguish which Indian Language a given text contains

goru001/indian-language-classifier

Indian Languages' Classifier

This repository contains a classifier that identifies which Indian language a given text is written in.

The following languages are supported:

Language
Hindi
Punjabi
Sanskrit
Gujarati
Kannada
Malayalam
Nepali
Odia
Marathi
Bengali
Tamil
Urdu
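
To illustrate why a trained classifier is useful for this set of languages, here is a minimal Python sketch of a script-based heuristic. It is not the method used in this repository. It assumes each language is written in its most common script (for example, Punjabi in Gurmukhi and Urdu in the Arabic script), and the names `SCRIPT_TO_LANGUAGES` and `candidate_languages` are hypothetical. The heuristic can narrow a text down to one language for most scripts, but Hindi, Marathi, Sanskrit and Nepali all share Devanagari, so script alone cannot separate them.

```python
import unicodedata
from collections import Counter

# Assumption: each language is written in its most common script.
# Keys are the script prefixes used in Unicode character names
# (Odia characters are named "ORIYA ..." in Unicode).
SCRIPT_TO_LANGUAGES = {
    "DEVANAGARI": ["Hindi", "Marathi", "Sanskrit", "Nepali"],
    "GURMUKHI": ["Punjabi"],
    "GUJARATI": ["Gujarati"],
    "KANNADA": ["Kannada"],
    "MALAYALAM": ["Malayalam"],
    "ORIYA": ["Odia"],
    "BENGALI": ["Bengali"],
    "TAMIL": ["Tamil"],
    "ARABIC": ["Urdu"],
}


def candidate_languages(text):
    """Return the supported languages that use the dominant script in `text`."""
    counts = Counter()
    for ch in text:
        # The first word of a character's Unicode name is its script,
        # e.g. "DEVANAGARI LETTER NA" -> "DEVANAGARI".
        name = unicodedata.name(ch, "")
        script = name.split(" ", 1)[0]
        if script in SCRIPT_TO_LANGUAGES:
            counts[script] += 1
    if not counts:
        return []  # no character from a supported script was found
    dominant, _ = counts.most_common(1)[0]
    return SCRIPT_TO_LANGUAGES[dominant]


if __name__ == "__main__":
    print(candidate_languages("தமிழ்"))    # a single candidate
    print(candidate_languages("नमस्ते"))   # four candidates share Devanagari
```

For Devanagari input the heuristic returns four candidates, which is the case where a model trained on the text itself is needed.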

Dataset

You can download the dataset on which the model was trained from here.

Scripts to prepare the dataset can be found in the dataset-preparation folder.

Results

The classifier has 99% accuracy!
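
As a reminder of what an accuracy figure measures, the following Python sketch computes overall and per-language accuracy, assuming accuracy means the fraction of texts whose predicted language matches the true one. The function names and the example labels are hypothetical illustrations; they are not this repository's evaluation code or its results.

```python
from collections import defaultdict


def accuracy(true_labels, predicted_labels):
    """Fraction of examples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one example is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_language_accuracy(true_labels, predicted_labels):
    """Accuracy computed separately for each true language."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        totals[t] += 1
        hits[t] += int(t == p)
    return {lang: hits[lang] / totals[lang] for lang in totals}


if __name__ == "__main__":
    # Made-up labels for illustration only.
    true = ["Hindi", "Hindi", "Marathi", "Tamil"]
    pred = ["Hindi", "Marathi", "Marathi", "Tamil"]
    print(accuracy(true, pred))               # 0.75
    print(per_language_accuracy(true, pred))  # Hindi 0.5, Marathi 1.0, Tamil 1.0
```

A per-language breakdown is worth checking alongside the overall number, since a high overall accuracy can hide confusion between languages that share a script.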

Trained Classifier

You can download the trained classifier from here.

Tokenizer and Vocabulary

You can download the tokenizer and vocabulary from here.
