This repository contains a classifier that identifies which Indian language a given text is written in.
The following languages are supported:
Language |
---|
Hindi |
Punjabi |
Sanskrit |
Gujarati |
Kannada |
Malayalam |
Nepali |
Odia |
Marathi |
Bengali |
Tamil |
Urdu |
You can download the dataset on which the model was trained from here.
Scripts to prepare the dataset can be found in the `dataset-preparation` folder.
The classifier has 99% accuracy!
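For readers who want to check an accuracy figure like this on their own held-out data, here is a minimal sketch of how overall and per-language accuracy are usually computed. It does not use this repository's code: the function names `accuracy` and `per_language_accuracy` are hypothetical, and the labels below are toy values for illustration only, not results from this classifier.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_language_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true language label."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {lang: correct[lang] / total[lang] for lang in total}


# Toy labels for illustration only (NOT output of this repository's model).
y_true = ["Hindi", "Hindi", "Urdu", "Tamil"]
y_pred = ["Hindi", "Urdu", "Urdu", "Tamil"]

overall = accuracy(y_true, y_pred)
by_language = per_language_accuracy(y_true, y_pred)
```

Per-language accuracy is worth looking at alongside the overall number, since closely related languages can be confused with each other even when the overall figure is high.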
You can download the trained classifier from here.
You can download the tokenizer and vocab from here.