# Russian Jeopardy! QA System Evaluation

To evaluate the answers of your Russian Jeopardy! QA system, we suggest the following steps:
1. Use the extended dataset, which contains QuestionNumber, i.e. the position within its Topic of the question being answered. This position serves as the rank.
2. Normalize the predicted and ground-truth answers, for example with spaCy.
3. Calculate one of the metrics: cosine similarity, Damerau-Levenshtein edit distance, Jaccard distance, or METEOR.
4. Check the resulting coefficient against a threshold: the answer counts as correct if the coefficient is above the threshold for similarity metrics (cosine similarity, METEOR) or below it for distance metrics (Damerau-Levenshtein edit distance, Jaccard distance); otherwise it counts as incorrect. For METEOR we suggest a threshold of 0.227743.
5. For every answer your system gives, add the rank of the question to the score if the answer is correct, and subtract the rank otherwise. If the system can abstain and does so, the score stays the same.
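
Steps 3 and 4 can be sketched without external dependencies, for example for cosine similarity and Jaccard distance over already-normalized token lists. This is a minimal illustration, not the notebook's own implementation: the function names, the `SIMILARITY_METRICS` set, the sample tokens, and the 0.5 thresholds are all assumptions made for the example. Only the METEOR threshold of 0.227743 comes from the steps above.

```python
import math
from collections import Counter

# Metrics where a larger value means a closer match; all others are distances.
SIMILARITY_METRICS = {"cosine", "meteor"}


def jaccard_distance(a, b):
    """Jaccard distance between two token lists, each treated as a set."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty answers count as identical
    return 1.0 - len(set_a & set_b) / len(union)


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors of two token lists."""
    counts_a, counts_b = Counter(a), Counter(b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_correct(value, metric, threshold):
    """Apply the threshold in the direction that fits the metric."""
    if metric in SIMILARITY_METRICS:
        return value > threshold
    return value < threshold


# Hypothetical lemmatized answers, as step 2 would produce them.
predicted = ["александр", "пушкин"]
ground_truth = ["пушкин"]

# The 0.5 thresholds are placeholders, not values suggested in this notebook.
print(is_correct(cosine_similarity(predicted, ground_truth), "cosine", 0.5))
print(is_correct(jaccard_distance(predicted, ground_truth), "jaccard", 0.5))
```

Keeping the direction of the comparison in one place makes it harder to apply a similarity threshold to a distance by mistake when switching metrics.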

In [None]:
!pip install spacy==3.2

In [None]:
import spacy
!python -m spacy download ru_core_news_lg
nlp = spacy.load("ru_core_news_lg")

In [None]:
!pip3 install -U nltk

In [None]:
import nltk
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('omw-1.4')
from nltk.metrics import edit_distance, jaccard_distance
from nltk.translate import meteor_score
from nltk import word_tokenize

In [3]:
import pandas as pd

In [4]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [5]:
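# The extended dataset is read as tab-separated, despite the .csv extension.
QA_file_path = 'YOUR PATH TO FILE Russian_QA_Jeopardy_dataset_extended.csv'
QA_dataframe = pd.read_csv(QA_file_path, sep='\t')
# The predictions file has no header row; the evaluation loop reads answers from column 1.
deeppavlov_100_answers_file = 'YOUR PATH TO FILE deeppavlov_0_99_answers.csv'
deeppavlov_dataframe = pd.read_csv(deeppavlov_100_answers_file, header=None)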

In [16]:
SCORE=0

In [None]:
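for i in range(100):
  predicted_answer = deeppavlov_dataframe.loc[i, 1]
  ground_truth = QA_dataframe.loc[i, 'Answer']
  # The question's position in its topic is used as the rank.
  rank = int(QA_dataframe.loc[i, 'QuestionNumber'])
  # Normalize both answers: lemmatize with spaCy and drop punctuation.
  clean_predicted = [t.lemma_ for t in nlp(predicted_answer) if not t.is_punct]
  clean_ground_truth = [t.lemma_ for t in nlp(ground_truth) if not t.is_punct]
  # NLTK's signature is meteor_score(references, hypothesis); this call passes
  # the prediction as the single reference and the ground truth as the hypothesis.
  meteor_coef = meteor_score.meteor_score([clean_predicted], clean_ground_truth)
  if meteor_coef > 0.227743:
    SCORE += rank
  else:
    SCORE -= rank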
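
The loop above treats every row as an answered question. If your system can abstain, as step 5 allows, the score update could be factored into a small helper like the one below. This is a sketch under stated assumptions: `update_score` is a hypothetical name, abstention is encoded as `None`, and the ranks and outcomes in the usage example are made up.

```python
def update_score(score, rank, is_correct):
    """Return the updated score; is_correct=None means the system abstained."""
    if is_correct is None:
        return score  # abstaining leaves the score unchanged
    return score + rank if is_correct else score - rank


score = 0
# Made-up (rank, outcome) pairs: correct, incorrect, abstained, correct.
for rank, outcome in [(1, True), (2, False), (3, None), (4, True)]:
    score = update_score(score, rank, outcome)
print(score)  # 1 - 2 + 0 + 4 = 3
```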

In [20]:
SCORE

-290