This repository contains the code for the results obtained using mBERT, XLM-RoBERTa and IndicBERT on the chaii dataset.
The chaii dataset was used directly for fine-tuning; in addition, we used models that had been pre-trained on XQuAD, SQuAD 2.0 and mergedQUAD.
The chaii dataset consists of:

- `id`: a unique id for each input
- `context`: a paragraph based on which the question has to be answered
- `question`: the question that has to be answered
- `answer_start`: the index at which the answer starts (train only)
- `answer_text`: the answer as a string (train only)
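As a quick sanity check, the training split can be loaded and inspected with pandas. This is a minimal sketch; the file name `train.csv` and its location are assumptions based on the standard Kaggle competition layout.

```python
import pandas as pd

# Assumed path: the competition's train.csv in the working directory.
train = pd.read_csv("train.csv")

# Should list the fields described above: id, context, question,
# answer_start, answer_text (plus any extra columns the competition ships).
print(train.columns.tolist())
print(train[["question", "answer_text"]].head())
```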
The models used are mBERT (multilingual BERT), XLM-RoBERTa and IndicBERT.
To train, run the notebooks after setting the correct paths for the tokenizer and the model.
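With the Hugging Face `transformers` library, setting those paths looks roughly like this. The checkpoint name below is only an illustration; replace it with the local Kaggle input path or model id you want to fine-tune.

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Illustrative checkpoint — swap in a local path such as a Kaggle input
# directory, or another Hugging Face model id (e.g. for mBERT or IndicBERT).
MODEL_PATH = "xlm-roberta-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_PATH)
```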
The Kaggle datasets used are:

- huggingface-question-answering-models
- ai4bharat-indic-bert
- chaii-hindi-and-tamil-question-answering
For testing, we prepare a submission.csv file in the format required by Kaggle. After running a notebook on Kaggle, select the CSV file from its outputs and submit it.
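A minimal sketch of writing the submission file. The column names (`id`, `PredictionString`) follow the competition's sample submission, but verify them against the sample file; the ids and answer strings below are placeholders for the notebook's actual outputs.

```python
import pandas as pd

# Hypothetical ids and predicted answer strings — replace with the
# notebook's real predictions.
test_ids = ["id_001", "id_002"]
predicted_answers = ["उत्तर का पाठ", "விடை உரை"]

submission = pd.DataFrame({"id": test_ids, "PredictionString": predicted_answers})
submission.to_csv("submission.csv", index=False)
```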
| Model | Jaccard score |
| --- | --- |
| mBERT | 0.528 |
| XLM-RoBERTa | 0.567 |
| XLM-RoBERTa (fine-tuned on chaii) | 0.586 |
| XLM-RoBERTa (fine-tuned on MLQA, XQuAD, chaii) | 0.626 |
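For reference, the word-level Jaccard score reported above can be computed as follows. This sketch follows the formulation commonly used for this competition; adapt it if the notebooks define the metric differently.

```python
def jaccard(pred: str, truth: str) -> float:
    """Word-level Jaccard similarity between a prediction and the ground truth."""
    a = set(pred.lower().split())
    b = set(truth.lower().split())
    if not a and not b:  # both empty: treat as a perfect match
        return 1.0
    c = a & b
    return len(c) / (len(a) + len(b) - len(c))

print(jaccard("13 किलोमीटर", "किलोमीटर"))  # 0.5
```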