<a href="https://colab.research.google.com/github/ris27hav/devrevs_domain_specific_qa/blob/main/DevRev_Experimentation_GPU.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Instructions To Experiment

1. Connect to a GPU runtime.
2. Run every cell in the Implementation section of the notebook.
3. Select the parameters in the Execution section, then run that cell.
4. Run the remaining cells to get the metrics for the selected parameters.

## Implementation

### Load Data

Load the validation data used for testing. It is drawn from SQuAD 2.0 data that is missing from the training data. Round 1 data contains themes that are not present in the training data, while round 2 data contains themes that are present in it.

In [None]:
!pip install --upgrade --no-cache-dir gdown

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting gdown
  Downloading gdown-4.6.0-py3-none-any.whl (14 kB)
Installing collected packages: gdown
  Attempting uninstall: gdown
    Found existing installation: gdown 4.4.0
    Uninstalling gdown-4.4.0:
      Successfully uninstalled gdown-4.4.0
Successfully installed gdown-4.6.0


In [None]:
import gdown

def download_validation_data(round = 1):
    """Download the validation data (4 csv files)"""
    assert round in [1,2], "round can be 1 or 2"
    ids = [
        [
            "15WPYOD3ZLShFq_NRtiBHbpz3RTvc8ZWR",
            "15yxIF27NvEa3l12yNy6F5h8lGCJ2n7rf",
            "1Ilpxyj_0T-1KzQMdVSEbSmc1ybxOv69G",
            "1nkEDQZJY6_cAEVw3JlaKCgz0C6mDSYiv"
        ],
        [
            "1-3fMldkBVsTAX3W5JewdAdlUG_agexG0",
            "1-59pQe8TH7UaORF1RSqzFWybMJShdf1U",
            "1-AbnJRRHQiTU5zyUdDC2gUwbIGkEF5l6",
            "1-Px6FFj043L7lbAEBOAMSy2bdoPiVNhy"
        ]
    ]
    for id in ids[round-1]:
        url = f"https://drive.google.com/u/1/uc?id={id}&export=download"
        gdown.download(url, quiet=True)

### Generate Embeddings

For a given theme, split each paragraph into sentences and record the paragraph index of every sentence. Then load the sentence encoder and compute embeddings for both the paragraph sentences and the queries.

In [None]:
!pip install -U sentence-transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentence-transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting transformers<5.0.0,>=4.6.0
  Downloading transformers-4.26.0-py3-none-any.whl (6.3 MB)
Collecting sentencepiece
  Downloading sentencepiece-0.1.97-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting huggingface-hub>=0.4.0
  Downloading huggingface_hub-0.12.0-py3-none-any.whl (190 kB)

In [None]:
import nltk
nltk.download('punkt')

def para_to_sentences(para):
    """Splits a paragraph into sentences."""
    para = para.replace('\n', ' ').replace('\t', ' ').replace('\x00', ' ')
    return nltk.sent_tokenize(para)

def load_sents_from_para(paras):
    """Splits a list of paragraphs into sentences and returns the sentences
    and the index of the paragraph each sentence came from"""
    sents = []
    para_id = []
    for i,p in enumerate(paras):
        new_sents = para_to_sentences(p['paragraph'])
        sents += new_sents
        para_id += [i]*len(new_sents)
    return sents, para_id

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
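
To make the bookkeeping concrete, the sketch below mirrors `load_sents_from_para` on two toy paragraphs. It uses a naive regular-expression splitter as a stand-in for `nltk.sent_tokenize` (an assumption, made only so the example runs without downloading `punkt`), and the paragraph texts and the names `naive_sent_split` and `split_with_para_index` are invented for illustration.

```python
import re

def naive_sent_split(text):
    """Hypothetical stand-in for nltk.sent_tokenize: split after '.', '!' or '?'."""
    text = text.replace('\n', ' ').replace('\t', ' ')
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def split_with_para_index(paras):
    """Return all sentences plus, for each one, the index of its paragraph."""
    sents, para_index = [], []
    for i, p in enumerate(paras):
        new_sents = naive_sent_split(p['paragraph'])
        sents += new_sents
        para_index += [i] * len(new_sents)
    return sents, para_index

# Toy data (invented): two paragraphs, three sentences in total
toy_paras = [
    {'id': 101, 'paragraph': 'Faiss builds the index. It is fast.'},
    {'id': 102, 'paragraph': 'The reader extracts an answer span.'},
]
toy_sents, toy_para_index = split_with_para_index(toy_paras)
print(toy_sents)
print(toy_para_index)   # [0, 0, 1]
```

The second list is what lets a retrieved sentence be traced back to its source paragraph later in the pipeline.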


In [None]:
!gdown https://drive.google.com/file/d/137tZvp-iTMR2xIogasglSH4jTTLW4_Sf/view --fuzzy --no-cookies

Downloading...
From: https://drive.google.com/uc?id=137tZvp-iTMR2xIogasglSH4jTTLW4_Sf
To: /content/finetuned_mpnet_triplet.zip
100% 405M/405M [00:03<00:00, 112MB/s]


In [None]:
!unzip /content/finetuned_mpnet_triplet.zip

Archive:  /content/finetuned_mpnet_triplet.zip
   creating: kaggle/working/finetuned_mpnet_triplet/
  inflating: kaggle/working/finetuned_mpnet_triplet/config.json  
   creating: kaggle/working/finetuned_mpnet_triplet/eval/
  inflating: kaggle/working/finetuned_mpnet_triplet/vocab.txt  
  inflating: kaggle/working/finetuned_mpnet_triplet/special_tokens_map.json  
  inflating: kaggle/working/finetuned_mpnet_triplet/pytorch_model.bin  
  inflating: kaggle/working/finetuned_mpnet_triplet/modules.json  
  inflating: kaggle/working/finetuned_mpnet_triplet/tokenizer_config.json  
   creating: kaggle/working/finetuned_mpnet_triplet/1_Pooling/
  inflating: kaggle/working/finetuned_mpnet_triplet/1_Pooling/config.json  
  inflating: kaggle/working/finetuned_mpnet_triplet/tokenizer.json  
  inflating: kaggle/working/finetuned_mpnet_triplet/config_sentence_transformers.json  
  inflating: kaggle/working/finetuned_mpnet_triplet/README.md  
   creating: kaggle/working/finetuned_mpnet_triplet/2_Norma

In [None]:
import tensorflow as tf
import tensorflow_hub as hub
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModel

def load_encoder(encoder="universal-sentence-encoder-qa-v3"):
    """Load the sentence encoder selected by name"""
    if encoder == "universal-sentence-encoder-qa-v3":
        module_url = "https://tfhub.dev/google/universal-sentence-encoder-qa/3"
        model = hub.load(module_url)
    elif encoder == "mpnet-base-v2":
        model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
        # model = SentenceTransformer('/content/kaggle/working/finetuned_mpnet_triplet')
    elif encoder == "mpnet-base-v2-fine-tuned":
        model = SentenceTransformer('/content/kaggle/working/finetuned_mpnet_triplet')
    elif encoder == "distilroberta-v1":
        model = SentenceTransformer('sentence-transformers/all-distilroberta-v1')
    elif encoder == "minilm-l12-v2":
        model = SentenceTransformer('sentence-transformers/all-MiniLM-L12-v2')
    elif encoder == "SimCSE":
        tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/sup-simcse-bert-base-uncased")
        model = AutoModel.from_pretrained("princeton-nlp/sup-simcse-bert-base-uncased")
        device = torch.device('cuda:0')
        model.to(device)
        model = (tokenizer, model)
    else:
        # Raising a plain string is a TypeError in Python 3
        raise ValueError(f"Unknown sentence encoder: {encoder}")
    return model

def get_embeddings_guse(sents, paras, para_id, model, sents_type="Context"):
    """Calculate embeddings for the given list of sentences based on their
    type, i.e. whether they are Questions or Contexts"""
    if sents_type == "Question":
        return model.signatures['question_encoder'](
            tf.constant(sents)
        )['outputs']
    else:
        contexts = [
            paras[para_id[i]]['paragraph'] for i in range(len(sents))
        ]
        return model.signatures['response_encoder'](
            input = tf.constant(sents),
            context = tf.constant(contexts)             # can play with this
        )['outputs']

def get_embeddings_st(sents, model):
    return model.encode(sents)

def get_embeddings_simcse(sents, bundle):
    tokenizer, model = bundle
    tokens = tokenizer(sents, padding=True, truncation=True, return_tensors="pt")
    device = torch.device('cuda:0')
    tokens = tokens.to(device)
    with torch.no_grad():
        embeds = model(**tokens, output_hidden_states=True, return_dict=True).pooler_output
    return embeds.cpu()

def get_embeddings(encoder_name, sents, paras, para_id, model, sents_type="Context"):
    if encoder_name == "universal-sentence-encoder-qa-v3":
        return get_embeddings_guse(sents, paras, para_id, model, sents_type)
    elif encoder_name in ["mpnet-base-v2", "distilroberta-v1", "minilm-l12-v2", "mpnet-base-v2-fine-tuned"]:
        return get_embeddings_st(sents, model)
    elif encoder_name == "SimCSE":
        return get_embeddings_simcse(sents, model)
    else:
        # Raising a plain string is a TypeError in Python 3
        raise ValueError(f"Unknown sentence encoder: {encoder_name}")

### Nearest Neighbour Search

Index the sentence embeddings by L2 distance, then run a nearest-neighbour search to get the top k closest sentences for each query.

In [None]:
!pip install -U transformers faiss-gpu

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting faiss-gpu
  Downloading faiss_gpu-1.7.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (85.5 MB)
Installing collected packages: faiss-gpu
Successfully installed faiss-gpu-1.7.2


In [None]:
import faiss
import numpy as np

def get_k_nearest_neighbours(source_embeds, target_embeds, k = 10):
    """Returns k nearest neighbours of target_embeds in source_embeds"""
    index = faiss.IndexFlatL2(source_embeds.shape[1])
    index.add(np.array(source_embeds))
    return index.search(np.array(target_embeds), k)
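
For intuition, `faiss.IndexFlatL2` performs an exact, exhaustive search and reports squared L2 distances. The pure-Python sketch below reproduces that behaviour on toy vectors; it is an illustration only, not a replacement for FAISS, and the function name `brute_force_knn` and the vectors are made up for the example.

```python
def brute_force_knn(source_embeds, target_embeds, k=2):
    """Exhaustive k-nearest-neighbour search under squared L2 distance.

    Returns (distances, indices), one row per target vector, sorted from
    closest to farthest.
    """
    all_dists, all_ids = [], []
    for t in target_embeds:
        # Squared Euclidean distance from this query to every source vector
        dists = [sum((a - b) ** 2 for a, b in zip(s, t)) for s in source_embeds]
        # Indices of the k smallest distances, closest first
        order = sorted(range(len(dists)), key=dists.__getitem__)[:k]
        all_ids.append(order)
        all_dists.append([dists[i] for i in order])
    return all_dists, all_ids

sentence_vecs = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]   # toy sentence embeddings
query_vecs = [[0.9, 0.1]]                               # toy query embedding
toy_D, toy_I = brute_force_knn(sentence_vecs, query_vecs, k=2)
print(toy_I)   # [[1, 0]]
```

As in `get_k_nearest_neighbours`, row `i` of the index output lists the sentence positions closest to query `i`.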

### Check previously answered queries

### Context Generation

Builds the context for a query from its nearest-neighbour sentences. Also provides a helper that maps the start index of a predicted answer back to the id of the paragraph containing it.

In [None]:
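def get_context(query_id, query, sents, paras, para_ids, nearest_neighbours, distances, option, m, ctx_threshold, distance_threshold):
    """Generate the context for a given query and store the para_id for
    each sentence"""
    # Initialise the outputs up front so that an unsupported option returns
    # an empty context instead of raising a NameError.
    context = ""
    context_para_ids, sent_length = [], []
    if option in [1, 2]:
        if option == 1:
            m = 0   # option 1 keeps only the retrieved sentences themselves
        for sent_id, dist in zip(nearest_neighbours, distances):
            # if dist > distance_threshold*distances[0]:
            #     break
            # Add the retrieved sentence and up to m sentences on either side,
            # as long as they belong to the same paragraph.
            for j in range(-m, m+1):
                cur_id = sent_id + j
                if cur_id >= 0 and cur_id < len(para_ids) and para_ids[sent_id] == para_ids[cur_id]:
                    context += sents[cur_id] + ' '
                    context_para_ids.append(paras[para_ids[cur_id]]['id'])
                    sent_length.append(len(sents[cur_id]))
            # Stop once the context has reached the word budget
            if len(context.split()) >= ctx_threshold:
                break

    # Turn per-sentence lengths into cumulative end offsets within the context
    # (renamed from `sum` to avoid shadowing the built-in).
    end = -1
    for i in range(len(sent_length)):
        end += sent_length[i] + 1
        sent_length[i] = end
    return context.strip(), context_para_ids, sent_length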

def para_id_retriever(start_idx, sent_length, context_para_ids):
    """Given start index of the answer, return the id of the paragraph
    in which the answer belongs"""
    if start_idx == -1:
        return -1
    for j in range(len(sent_length)):
        if start_idx <= sent_length[j]:
            return context_para_ids[j]
    return context_para_ids[-1]
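
The offset bookkeeping is easier to follow on a small case. The sketch below recomputes the cumulative end offsets for three toy sentences joined by single spaces and then looks up the paragraph for an answer start index, in the same way as `para_id_retriever`. The sentences, paragraph ids and helper names (`end_offsets`, `lookup_paragraph`) are invented for illustration.

```python
def end_offsets(sentences):
    """For each sentence, return the index of the separator that follows it
    when the sentences are joined with single spaces."""
    offsets, end = [], -1
    for s in sentences:
        end += len(s) + 1
        offsets.append(end)
    return offsets

def lookup_paragraph(start_idx, offsets, para_ids):
    """Return the paragraph id of the sentence that contains start_idx."""
    if start_idx == -1:
        return -1
    for off, pid in zip(offsets, para_ids):
        if start_idx <= off:
            return pid
    return para_ids[-1]

toy_sentences = ["A cat sat.", "It purred.", "Dogs bark."]
toy_ids = [7, 7, 9]                      # paragraph id of each sentence
toy_context = " ".join(toy_sentences)
toy_offsets = end_offsets(toy_sentences)
print(toy_offsets)                       # [10, 21, 32]

answer_start = toy_context.index("Dogs")
print(lookup_paragraph(answer_start, toy_offsets, toy_ids))   # 9
```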

### Load QA model

Given a theme, download the corresponding fine-tuned QA model and load it into a question-answering pipeline.

In [None]:
!pip install transformers sentencepiece
# !pip install optimum[onnxruntime]
!pip install optimum[onnxruntime-gpu]

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting optimum[onnxruntime-gpu]
  Downloading optimum-1.6.3-py3-none-any.whl (227 kB)
Collecting coloredlogs
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Collecting evaluate
  Downloading evaluate-0.4.0-py3-none-any.whl (81 kB)
Collecting datasets>=1.2.1
  Downloading datasets-2.9.0-py3-none-any.whl (462 kB)

In [None]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
from optimum.onnxruntime import ORTModelForQuestionAnswering, ORTOptimizer, ORTQuantizer
from optimum.onnxruntime.configuration import OptimizationConfig, AutoQuantizationConfig
from optimum.pipelines import pipeline


# load vanilla transformers and convert to onnx
def load_optimized_model_pipeline(model_id, save_path, use_onnx, optimize, quantize):
    task = 'question-answering'
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    if use_onnx:
        # model = ORTModelForQuestionAnswering.from_pretrained(model_id, from_transformers=True)
        model = ORTModelForQuestionAnswering.from_pretrained(model_id, from_transformers=True, provider="CUDAExecutionProvider")
    else:
        model = AutoModelForQuestionAnswering.from_pretrained(model_id)

    if optimize:
        optimizer = ORTOptimizer.from_pretrained(model)
        optimization_config = OptimizationConfig(optimization_level=99) # enable all optimizations
        optimizer.optimize(save_dir=save_path, optimization_config=optimization_config)
        # model = ORTModelForQuestionAnswering.from_pretrained(save_path, file_name='model_optimized.onnx')
        model = ORTModelForQuestionAnswering.from_pretrained(save_path, file_name='model_optimized.onnx', provider="CUDAExecutionProvider")

    if quantize:
        quantizer = ORTQuantizer.from_pretrained(model)
        qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=True)
        quantizer.quantize(save_dir=save_path, quantization_config=qconfig)
        # model = ORTModelForQuestionAnswering.from_pretrained(save_path, file_name="model_optimized_quantized.onnx")
        model = ORTModelForQuestionAnswering.from_pretrained(save_path, file_name="model_optimized_quantized.onnx", provider="CUDAExecutionProvider")

    # build the question-answering pipeline around the selected model
    optimum_qa = pipeline(
        task, model=model, tokenizer=tokenizer, handle_impossible_answer=True
    )

    return optimum_qa

In [None]:
import gdown
import json


def download_qa_models():
    file_url = "https://drive.google.com/file/d/1912D_F3GAkUmFOGQqf5boZaaz5g5-1RC/view?usp=sharing"
    gdown.download(url=file_url, output='model_urls.json', quiet=False, fuzzy=True)
    with open('model_urls.json') as fo:
        model_urls = json.load(fo)
    model_urls = model_urls["urls"]
    for url in model_urls:
        gdown.download_folder(url, quiet=True, use_cookies=False)
    print("All models downloaded successfully.")

def create_mapping():
    file_url = "https://drive.google.com/file/d/1P6dp7f2m67-iPaUbaNZiDYTmTH7Mw9ec/view?usp=share_link"
    gdown.download(url=file_url, output='clusters.json', quiet=False, fuzzy=True)
    mapping = {}
    with open('clusters.json') as fo:
        clusters = json.load(fo)   # renamed from `map` to avoid shadowing the built-in
    for cluster_dict in clusters.values():
        for cluster, themes in cluster_dict.items():
            cluster = int(cluster)
            for theme in themes:
                mapping[theme] = cluster
    return mapping

def load_fine_tuned_model(theme, mapping):
    cluster = mapping[theme]
    task = 'question-answering'
    model_id = f'/content/qamodels/electra-base-squad2-finetuned-squad-{cluster}'

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = ORTModelForQuestionAnswering.from_pretrained(model_id, from_transformers=True, provider="CUDAExecutionProvider")

    optimum_qa = pipeline(
        task, model=model, tokenizer=tokenizer, handle_impossible_answer=True
    )
    return optimum_qa


In [None]:
import gdown
import json
from zipfile import ZipFile

def load_model_links(json_link):
    """Downloads the JSON that contains the links to models and tokenizer"""
    gdown.download(json_link, quiet=True)
    with open('theme_wise_models.json') as f:
        model_links = json.load(f)
    return model_links

def load_theme_model_pipeline(theme, tokenizer_link, model_links, model_id=""):
    """Given a theme, loads the corresponding QA model"""
    task = "question-answering"
    if not model_id:
        gdown.download(tokenizer_link, "tokenizer.zip", quiet=True)
        gdown.download(model_links[theme]['link'], "model.zip", quiet=True)
        with ZipFile("tokenizer.zip", 'r') as zObject:
            zObject.extractall()
        with ZipFile("model.zip", 'r') as zObject:
            zObject.extractall()
        tokenizer_path = "tokenizer"
        model_path = model_links[theme]['path']
        # Vanilla model (disabled: it was loaded and then immediately overwritten below)
        # model = ORTModelForQuestionAnswering.from_pretrained(model_path)
        model = ORTModelForQuestionAnswering.from_pretrained(
            model_path, file_name="model_optimized_quantized.onnx", provider="CUDAExecutionProvider"
        )
        # model = ORTModelForQuestionAnswering.from_pretrained(
        #     model_path, file_name="model_optimized_quantized.onnx"
        # )                                        # Optimized and Quantized Model
        tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
    else:
        model = ORTModelForQuestionAnswering.from_pretrained(
            model_id, from_transformers=True, provider="CUDAExecutionProvider"
        )
        # model = ORTModelForQuestionAnswering.from_pretrained(
        #     model_id, from_transformers=True
        # )
        tokenizer = AutoTokenizer.from_pretrained(model_id)
    optimum_qa = pipeline(
        task, model=model, tokenizer=tokenizer, handle_impossible_answer=True
    )
    return optimum_qa

### Run QA pipeline

Predicts the answer for a query from its context and returns it in the required output format.

In [None]:
def predict(query_id, query, context, qa_model, pred_paras, sent_length, context_para_ids):
    """Predict the answer given a query and a context"""
    prediction = qa_model(question=query, context=context)
    ans = {
        "question_id": query_id,
        "answers": [prediction['answer']],
        "paragraph_id": -1,
        "context": context                # Extra info
    }
    if prediction['answer'] != "":
        ans["paragraph_id"] = para_id_retriever(
            prediction['start'], sent_length, context_para_ids
        )
    return ans

In [None]:
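def divide_context(passes, context, sent_length):
    """Split the context into at most `passes` chunks of whole sentences"""
    context_list = []
    rem_sents = len(sent_length)
    passes = min(passes, rem_sents)
    passes_left = passes
    i, end = -1, -1
    for j in range(passes):
        # Advance by an integer number of sentences and subtract the same
        # amount, so that the last chunk always ends on the final sentence.
        step = rem_sents // passes_left
        i += step
        context_list.append(context[end+1:sent_length[i]])
        end = sent_length[i]
        rem_sents -= step
        passes_left -= 1
    return context_list

def get_best_prediction(query, context_list, qa_model):
    """Run the QA model on every chunk and keep the highest-scoring answer"""
    best_prediction = {'answer': ""}
    text_start = 0
    for context in context_list:
        prediction = qa_model(question=query, context=context)
        if prediction['answer'] != "" and (best_prediction['answer'] == "" or prediction['score'] > best_prediction['score']):
            # Shift the start index from chunk coordinates to full-context coordinates
            prediction['start'] += text_start
            best_prediction = prediction
        # +1 accounts for the separator dropped between consecutive chunks
        text_start += len(context) + 1
    return best_prediction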

def multiple_pass_prediction(passes, query_id, query, context, qa_model, pred_paras, sent_length, context_para_ids):
    context_list = divide_context(passes, context, sent_length)
    prediction = get_best_prediction(query, context_list, qa_model)
    ans = {
        "question_id": query_id,
        "answers": [prediction['answer']],
        "paragraph_id": -1,
        "context": context                # Extra info
    }
    if prediction['answer'] != "":
        ans["paragraph_id"] = para_id_retriever(
            prediction['start'], sent_length, context_para_ids
        )
    return ans
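
The multi-pass idea can be tried without loading a model. The sketch below uses a hypothetical stub, `fake_qa_model`, in place of the real QA pipeline and assumes that consecutive chunks are separated by exactly one space in the full context. It shows how the best answer is selected and how its start index is mapped back to the full context.

```python
def fake_qa_model(question, context):
    """Hypothetical stand-in for the QA pipeline: it 'finds' the word Paris."""
    idx = context.find('Paris')
    if idx == -1:
        return {'answer': '', 'score': 0.5, 'start': 0}
    return {'answer': 'Paris', 'score': 0.9, 'start': idx}

def best_over_chunks(question, chunks, qa_model):
    """Query each chunk separately and keep the highest-scoring non-empty
    answer, shifting its start index back into the joined context."""
    best = {'answer': ''}
    chunk_start = 0
    for chunk in chunks:
        pred = qa_model(question=question, context=chunk)
        if pred['answer'] != '' and (best['answer'] == '' or pred['score'] > best['score']):
            best = dict(pred, start=pred['start'] + chunk_start)
        chunk_start += len(chunk) + 1   # +1 for the space between chunks
    return best

toy_chunks = ['Rome is in Italy.', 'Paris is in France.']
toy_full_context = ' '.join(toy_chunks)
toy_best = best_over_chunks('Which city is in France?', toy_chunks, fake_qa_model)
start = toy_best['start']
print(toy_best['answer'], toy_full_context[start:start + 5])   # Paris Paris
```

Splitting the context trades one long model call for several shorter ones, so the shifted start index is what keeps the paragraph lookup valid afterwards.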

In [None]:
import time
from tqdm import tqdm

def predict_theme_wise(paras, ques, pred_out, encoder_name, sents_encoder, qa_pipeline, ctx_option, k, m, ctx_threshold, distance_threshold, qa_passes):
    """Predicts the answers for all queries of a particular theme"""
    ann_inference_time, qna_inference_time = 0., 0.
    theme = ques[0]["theme"]
    print(f'Theme: {theme}')

    # Preprocessing of contexts
    sents, para_id = load_sents_from_para(paras)
    sents_embed = get_embeddings(
        encoder_name, sents, paras, para_id, sents_encoder, sents_type="Context"
    )

    # Nearest Neighbour Search
    start_time = time.time()
    ques_list = [q['question'] for q in ques]
    ques_embed = get_embeddings(
        encoder_name, ques_list, None, None, sents_encoder, sents_type="Question"
    )
    D, I = get_k_nearest_neighbours(sents_embed, ques_embed, k)
    ann_inference_time = (time.time() - start_time)*1000.

    pred_paras = [
        [paras[para_id[sent_idx]]['id'] for sent_idx in I[i]]
        for i in range(len(I))
    ]

    start_time = time.time()
    for i in tqdm(range(len(ques))):
        q = ques[i]

        # Context Generation
        context, context_para_ids, sent_length = get_context(
            q["id"], q['question'], sents, paras, para_id, I[i], D[i],
            ctx_option, m, ctx_threshold, distance_threshold
        )

        # Answer Prediction and Paragraph Retrieval
        ans = multiple_pass_prediction(
            qa_passes, q["id"], q['question'], context, qa_pipeline,
            pred_paras[i], sent_length, context_para_ids
        )
        pred_out.append(ans)

    # Print Inference Time
    qna_inference_time = (time.time() - start_time)*1000.
    print(
        f'Avg. ANN IT = {round(ann_inference_time/len(ques), 2)} ms, ' +
        f'Avg. QnA IT = {round(qna_inference_time/len(ques),2)} ms\n'
    )
    return (ann_inference_time, qna_inference_time)

In [None]:
import pandas as pd

def predict_multiple_themes(params):
    """Predicts the answers for queries from multiple (num_themes) themes"""
    # Load paras and queries
    paragraphs = json.loads(pd.read_csv("input_paragraph.csv").to_json(orient="records"))
    questions = json.loads(pd.read_csv("input_question.csv").to_json(orient="records"))
    theme_intervals = json.loads(pd.read_csv("theme_interval.csv").to_json(orient="records"))
    pred_out = []
    theme_inf_time = {}

    # Number of themes for prediction
    if params['num_themes'] == -1 or params['num_themes'] > len(theme_intervals):
        params['num_themes'] = len(theme_intervals)

    # if using pretrained model
    if params['use_pretrained']:
        qa_pipeline = load_optimized_model_pipeline(
            params['model_id'], '/content/model.onnx', params['use_onnx'],
            params['optimize'], params['quantize']
        )

    # Predict for each theme
    themes = [
        'IPod', 'Wayback_Machine', 'Web_browser', 'DevRev',
        '2008_Sichuan_earthquake', 'Nanjing', 'Canadian_Armed_Forces',
        'Cardinal_(Catholicism)', 'Heresy', 'Mary_(mother_of_Jesus)',
        'Human_Development_Index', 'Warsaw_Pact', 'Materialism',
        'Pub', 'Southampton', 'Catalan_language', 'Dialect', 'Paper',
        'Adult_contemporary_music', 'Hard_rock', 'The_Times',
        'United_States_dollar', 'Immunology', 'Imamah_(Shia_doctrine)',
        'Grape', 'Everton_F.C.', 'Great_Plains', 'Biodiversity',
        'Federal_Bureau_of_Investigation', 'Unknown'
    ]
    for theme_interval in theme_intervals[:params['num_themes']]:
        theme = theme_interval["theme"]
        if theme not in themes:
            continue
        if not params['use_pretrained']:
            qa_pipeline = load_theme_model_pipeline(
                theme, params['tok_link'], params['qam_links'],
            )
        theme_ques = questions[int(theme_interval["start"]) - 1: int(theme_interval["end"])]
        theme_paras = [p for p in paragraphs if p["theme"] == theme]
        execution_time = predict_theme_wise(
            theme_paras, theme_ques, pred_out, params['encoder_name'],
            params['encoder'], qa_pipeline, params['ctx_option'], params['k'],
            params['m'], params['ctx_threshold'], params['distance_threshold'],
            params['qa_passes']
        )
        theme_inf_time[theme] = execution_time

    # Export predictions
    pred_df = pd.DataFrame.from_records(pred_out)
    pred_df.to_csv('output_prediction.csv', index=False)

    return theme_inf_time

### Evaluation

Evaluates the predictions produced by the pipeline and prints summary statistics. Metrics include the F1 Score, Paragraph Accuracy, Mean Rank of the gold paragraph, performance on true positives and negatives, inference times, etc.

In [None]:
import string, re
from collections import Counter

def normalize_answer(s):
    """Lower text and remove punctuation, articles and extra whitespace."""
    def remove_articles(text):
        regex = re.compile(r'\b(a|an|the)\b', re.UNICODE)
        return re.sub(regex, ' ', text)

    def white_space_fix(text):
        return ' '.join(text.split())

    def remove_punc(text):
        exclude = set(string.punctuation)
        return ''.join(ch for ch in text if ch not in exclude)

    def lower(text):
        return text.lower()

    return white_space_fix(remove_articles(remove_punc(lower(s))))


def get_tokens(s):
    if not s:
        return []
    return normalize_answer(s).split()


def calc_f1(a_gold, a_pred):
    """Calculates F1 score, given prediction and a gold answer"""
    gold_toks = get_tokens(a_gold)
    pred_toks = get_tokens(a_pred)
    common = Counter(gold_toks) & Counter(pred_toks)
    num_same = sum(common.values())

    if len(gold_toks) == 0 or len(pred_toks) == 0:
        # If either is no-answer, then F1 is 1 if they agree, 0 otherwise
        return int(gold_toks == pred_toks)

    if num_same == 0:
        return 0

    precision = 1.0 * num_same / len(pred_toks)
    recall = 1.0 * num_same / len(gold_toks)
    f1 = (2 * precision * recall) / (precision + recall)
    return f1


def calc_max_f1(predicted, ground_truths):
    """Calculates the max F1 score, given prediction and the gold answers"""
    max_f1 = 0
    for ground_truth in ground_truths:
        f1 = calc_f1(str(predicted), str(ground_truth))
        max_f1 = max(max_f1, f1)
    return max_f1
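
As a worked example of the token-level F1 above, the compact sketch below applies the same normalisation steps (lowercasing, removing punctuation and the articles a/an/the) and the same overlap formula to one invented gold/prediction pair. The helper names `tokens` and `token_f1` are illustrative and not part of the notebook.

```python
import re
import string
from collections import Counter

def tokens(text):
    """Lowercase, drop punctuation and the articles a/an/the, then split."""
    text = ''.join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return text.split()

def token_f1(gold, pred):
    """Harmonic mean of token precision and recall between gold and pred."""
    gold_toks, pred_toks = tokens(gold), tokens(pred)
    if not gold_toks or not pred_toks:
        # No-answer case: 1 if both are empty, 0 otherwise
        return float(gold_toks == pred_toks)
    overlap = sum((Counter(gold_toks) & Counter(pred_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)   # 2 / 4 in the example below
    recall = overlap / len(gold_toks)      # 2 / 2 in the example below
    return 2 * precision * recall / (precision + recall)

toy_f1 = token_f1('the Eiffel Tower', 'Eiffel Tower in Paris')
print(round(toy_f1, 4))   # 0.6667
```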

In [None]:
from ast import literal_eval
import csv
import pandas as pd

def evaluate_metrics():
    """Calculate metrics using the predictions and the ground truths"""
    metrics = {}
    # Load questions, prediction and ground_truth csv
    questions = pd.read_csv("input_question.csv")
    pred = pd.read_csv("output_prediction.csv")
    truth = pd.read_csv("ground_truth.csv")

    # String to list and numbers conversion
    truth.paragraph_id = truth.paragraph_id.apply(literal_eval)
    truth.answers = truth.answers.apply(literal_eval)
    pred.answers = pred.answers.apply(literal_eval)

    preds_eval = [
        ['theme', 'question', 'gold_para', 'gold_ans', 'context_used',
         'pred_para', 'pred_ans', 'ans_in_context', 'para_acc', 'f1']
    ]

    # Go through each prediction and update the metrics
    for idx in pred.index:
        q_id = pred["question_id"][idx]
        q_rows = questions.loc[questions['id'] == q_id].iloc[-1]
        theme = q_rows["theme"]
        truth_row = truth.loc[truth['question_id'] == q_id].iloc[-1]
        truth_paragraph_id = [ int(i) for i in truth_row["paragraph_id"] ]
        predicted_paragraph = pred["paragraph_id"][idx]
        predicted_ans = pred["answers"][idx][0]

        cur_pred = [
            theme, q_rows['question'], truth_paragraph_id, truth_row["answers"],
            pred['context'][idx], predicted_paragraph, predicted_ans, 0, 0, 0.
        ]

        if theme not in metrics.keys():
            metrics[theme] = {
                "total_positive": 0,
                "total_negative": 0,
                "true_positive": 0,
                "true_negative": 0,
                'ansInCtx': 0,
                "total_predictions": 0,
                "total_ctx_len": 0,
                "f1_sum": 0
            }

        if truth_paragraph_id == []:
            metrics[theme]["total_negative"] += 1
        else:
            metrics[theme]["total_positive"] += 1
            for ans in truth_row["answers"]:
                if ans in pred['context'][idx]:
                    metrics[theme]["ansInCtx"] += 1
                    cur_pred[7] = 1
                    break

        metrics[theme]["total_ctx_len"] += len(pred['context'][idx].split())

        if predicted_paragraph in truth_paragraph_id:
            # Increase TP for that theme.
            metrics[theme]["true_positive"] = metrics[theme]["true_positive"] + 1
            cur_pred[8] = 1

        # -1 prediction in case there is no paragraph which can answer the query.
        if predicted_paragraph == -1 and truth_paragraph_id == []:
            # Increase TN.
            metrics[theme]["true_negative"] = metrics[theme]["true_negative"] + 1
            cur_pred[8] = 1

        # Increase total predictions for that theme.
        metrics[theme]["total_predictions"] = metrics[theme]["total_predictions"] + 1
        if truth_row["answers"] == []:
            truth_row["answers"] = [""]
        f1 = calc_max_f1(predicted_ans, truth_row["answers"])
        metrics[theme]["f1_sum"] = metrics[theme]["f1_sum"] + f1

        cur_pred[9] = f1
        preds_eval.append(cur_pred)

    # newline='' is what the csv module expects, to avoid blank rows on some platforms
    with open('predictions_evaluation.csv', 'w', newline='') as f:
        write = csv.writer(f)
        write.writerows(preds_eval)

    return metrics

In [None]:
def show(val, sz, dec = 0):
    """Returns the rounded value as a string, left-padded with spaces to at least sz characters"""
    val_str = str(round(val, dec))
    return ' '*(max(0, sz - len(val_str))) + val_str


def calculate_score(theme_inf_time, inf_time_threshold = 1000.0):
    """Calculates and prints theme-wise as well as aggregated metrics score"""
    metrics = evaluate_metrics()
    final_para_score = 0.0
    final_qa_score = 0.0
    q, aic, tait, tqit, totp, totn, tp, tn, tf1 = 0, 0, 0., 0., 0, 0, 0, 0, 0.
    total_ctx_len = 0.0

    print('Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx'
        ' % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size'
    )
    print('------------------|---------|-----------------------------|-------'
        '-----|-------------|-------------|----------|----------|-------------'
    )

    # Print theme wise metrics score
    for theme in metrics:
        inf_time_score = 1.0
        metric = metrics[theme]
        para_score = (metric["true_positive"] + metric["true_negative"]) / metric["total_predictions"]
        qa_score = metric["f1_sum"] / metric["total_predictions"]
        avg_ann_inf_time = theme_inf_time[theme][0] / metric["total_predictions"]
        avg_qna_inf_time = theme_inf_time[theme][1] / metric["total_predictions"]
        ctx_length = metric["total_ctx_len"] * 1. / metric["total_predictions"]

        avg_inf_time = avg_ann_inf_time + avg_qna_inf_time
        if avg_inf_time > inf_time_threshold:
            inf_time_score = inf_time_threshold / avg_inf_time

        q += metric["total_predictions"]
        aic += metric["ansInCtx"]
        tait += theme_inf_time[theme][0]
        tqit += theme_inf_time[theme][1]
        totp += metric["total_positive"]
        totn += metric["total_negative"]
        tp += metric["true_positive"]
        tn += metric["true_negative"]
        tf1 += metric["f1_sum"]
        total_ctx_len += metric["total_ctx_len"]
        final_qa_score += inf_time_score * qa_score
        final_para_score += inf_time_score * para_score

        print(f'{(theme + " "*17)[:17]} | '
            f'{show(metric["total_predictions"],7)} | '
            f'{show(avg_ann_inf_time,6,2)} + {show(avg_qna_inf_time,6,2)} = '
            f'{show(avg_inf_time,6,2)} ms | '
            f'{show(metric["ansInCtx"]*100./max(1,metric["total_positive"]),8,2)} '
            f'% | {show(int(metric["true_positive"]*100./max(1,metric["total_positive"])),3)}% '
            f'({show(metric["total_positive"],4)}) | '
            f'{show(int(metric["true_negative"]*100./max(1,metric["total_negative"])),3)}% '
            f'({show(metric["total_negative"],4)}) | '
            f'{show(para_score,8,4)} | {show(qa_score,8,4)} | {show(ctx_length,7,1)}')

    final_qa_score /= len(metrics)
    final_para_score /= len(metrics)
    # Print Aggregated Metrics Score
    print(f'------------------|---------|-----------------------------|'
        f'------------|-------------|-------------|----------|----------'
        f'|--------------')
    print(f'Grand Total       | {show(q,7)} | {show(tait/max(1,q),6,2)} + '
        f'{show(tqit/max(1,q),6,2)} = {show((tait+tqit)/max(1,q),6,2)} ms |'
        f'{show(aic*100./max(1,totp),9,2)} % | {show(int(tp*100./max(1,totp)),3)}% '
        f'({show(totp,4)}) | {show(int(tn*100./max(1,totn)),3)}% '
        f'({show(totn,4)}) | {show((tp+tn)/max(1,q),8,4)} | '
        f'{show(tf1/max(1,q),8,4)} | {show(total_ctx_len/max(1,q),7,1)}')
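
The loop above scales each theme's paragraph accuracy and F1 by `inf_time_threshold / avg_inf_time` whenever the average inference time exceeds the threshold, then averages the scaled scores over themes. The sketch below isolates that step. The function names `time_penalty` and `penalized_macro_average` are hypothetical and are not defined elsewhere in this notebook.

```python
def time_penalty(avg_inf_time_ms, threshold_ms):
    """Return 1.0 when within the time budget, else threshold / time."""
    if avg_inf_time_ms > threshold_ms:
        return threshold_ms / avg_inf_time_ms
    return 1.0


def penalized_macro_average(theme_scores, theme_times_ms, threshold_ms):
    """Average per-theme scores after applying the time penalty to each.

    theme_scores:   dict mapping theme name -> score (e.g. F1 or Para Acc)
    theme_times_ms: dict mapping theme name -> average inference time in ms
    """
    total = 0.0
    for theme, score in theme_scores.items():
        total += time_penalty(theme_times_ms[theme], threshold_ms) * score
    # Guard against an empty dict, mirroring the max(1, ...) pattern above.
    return total / max(1, len(theme_scores))
```

With a 1000 ms threshold, a theme that averages 1039.2 ms keeps roughly 96% of its raw score, while any theme under the threshold is left unchanged.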


## Execution

In [None]:
# Download validation data
# data 1 contains queries for new themes, while data 2 contains queries for old themes
validation_data = 1 #@param ["1", "2"] {type:"raw"}
# Choose -1 to test on all themes
num_themes_to_test = 10 #@param {type:"integer"}

download_validation_data(round = validation_data)

# Download sentence encoder and qa models
sentence_encoder = "SimCSE" #@param ["universal-sentence-encoder-qa-v3", "mpnet-base-v2", "distilroberta-v1", "minilm-l12-v2", "SimCSE", "mpnet-base-v2-fine-tuned"]
sents_encoder = load_encoder(sentence_encoder)
# %cd '/content/qamodels'
# download_qa_models()
# %cd '..'
# models_mapping = create_mapping()

In [None]:
indexing_library = "faiss" #@param ["faiss"]
search_previously_answered_queries = False #@param {type:"boolean"}
context_generation = "top-k nearest sentences" #@param ["top-k nearest sentences", "top-k nearest sentences with window of m sentences", "paragraphs of top k sentences"]
context_option = 1
if context_generation == "top-k nearest sentences with window of m sentences":
    context_option = 2
elif context_generation == "paragraphs of top k sentences":
    context_option = 3
k = 10 #@param {type:"slider", min:1, max:15, step:1}
m = 1 #@param {type:"slider", min:1, max:3, step:1}
context_length_threshold = 205 #@param {type:"slider", min:100, max:250, step:5}
distance_threshold = 1.8 #@param {type:"slider", min:1.0, max:1.8, step:0.05}

use_pretrained_model_for_QA = True #@param {type:"boolean"}
model_id = "PremalMatalia/electra-base-best-squad2" #@param ["twmkn9/albert-base-v2-squad2", "deepset/minilm-uncased-squad2", "deepset/electra-base-squad2", "deepset/roberta-base-squad2", "deepset/deberta-v3-base-squad2", "deepset/roberta-base-squad2-distilled", "PremalMatalia/electra-base-best-squad2", "PremalMatalia/roberta-base-best-squad2"]
use_onnx = True #@param {type:"boolean"}
optimize_model = True #@param {type:"boolean"}
quantize_model = False #@param {type:"boolean"}
tokenizer_link = "https://drive.google.com/u/1/uc?id=1Rq9kXnOpbY1FsDBjHtlx4i7scrnk_0A9&export=download" #@param {type:"string"}
model_links_json = "https://drive.google.com/u/1/uc?id=1usU8GcPTzIakelkJd7ChvQGxqwlEJxoz&export=download" #@param {type:"string"}
num_qa_passes = 1 #@param {type:"slider", min:1, max:5, step:1}

params = {
    'encoder_name': sentence_encoder,
    'encoder': sents_encoder,
    # 'models_mapping': models_mapping,
    'tok_link': tokenizer_link,
    'ctx_option': context_option,
    'k': k,
    'm': m,
    'ctx_threshold': context_length_threshold,
    'distance_threshold': distance_threshold,
    'use_pretrained': use_pretrained_model_for_QA,
    'model_id': model_id,
    'use_onnx': use_onnx,
    'optimize': optimize_model,
    'quantize': quantize_model,
    'num_themes': num_themes_to_test,
    'qa_passes': num_qa_passes
}
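
The retrieval code that consumes `k`, `distance_threshold` and `ctx_threshold` is defined earlier in the notebook. As a rough illustration only, the sketch below shows one way these three parameters could interact for the "top-k nearest sentences" option. It assumes that candidates arrive as `(distance, sentence)` pairs, that the distance threshold is an upper bound, and that the context threshold is a word budget checked before each sentence is added. The function `build_context` is hypothetical and is not the notebook's implementation.

```python
def build_context(candidates, k, distance_threshold, ctx_threshold):
    """Join the nearest sentences into one context string.

    candidates: list of (distance, sentence) pairs from a nearest-neighbour search
    """
    # Keep only the k closest candidates, nearest first.
    nearest = sorted(candidates, key=lambda pair: pair[0])[:k]
    picked, num_words = [], 0
    for distance, sentence in nearest:
        if distance > distance_threshold:
            break  # every remaining candidate is at least this far away
        if num_words >= ctx_threshold:
            break  # word budget already reached
        picked.append(sentence)
        num_words += len(sentence.split())
    return " ".join(picked)
```

Because the budget is checked before a sentence is appended, the last sentence can push the context slightly past the threshold.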

In [None]:
theme_inf_time = predict_multiple_themes(params)



Theme: IPod


100%|██████████| 326/326 [00:07<00:00, 42.86it/s]


Avg. ANN IT = 1.57 ms, Avg. QnA IT = 23.36 ms

Theme: 2008_Sichuan_earthquake


100%|██████████| 521/521 [00:11<00:00, 45.08it/s]


Avg. ANN IT = 1.16 ms, Avg. QnA IT = 22.2 ms

Theme: Wayback_Machine


100%|██████████| 208/208 [00:04<00:00, 49.62it/s]


Avg. ANN IT = 1.82 ms, Avg. QnA IT = 20.18 ms

Theme: Canadian_Armed_Forces


100%|██████████| 396/396 [00:08<00:00, 47.00it/s]


Avg. ANN IT = 1.15 ms, Avg. QnA IT = 21.29 ms

Theme: Cardinal_(Catholicism)


100%|██████████| 322/322 [00:06<00:00, 46.58it/s]


Avg. ANN IT = 1.87 ms, Avg. QnA IT = 21.48 ms

Theme: Human_Development_Index


100%|██████████| 168/168 [00:03<00:00, 42.45it/s]


Avg. ANN IT = 1.49 ms, Avg. QnA IT = 23.6 ms

Theme: Heresy


100%|██████████| 204/204 [00:04<00:00, 46.15it/s]


Avg. ANN IT = 1.57 ms, Avg. QnA IT = 21.69 ms

Theme: Warsaw_Pact


100%|██████████| 146/146 [00:03<00:00, 41.85it/s]


Avg. ANN IT = 1.41 ms, Avg. QnA IT = 23.94 ms

Theme: Materialism


100%|██████████| 203/203 [00:04<00:00, 45.69it/s]


Avg. ANN IT = 1.6 ms, Avg. QnA IT = 21.91 ms

Theme: Pub


100%|██████████| 356/356 [00:07<00:00, 48.58it/s]


Avg. ANN IT = 1.86 ms, Avg. QnA IT = 20.61 ms



In [None]:
# Key Metrics
#
# AIT - Average Inference Time (in ms)
#   ANN: AIT for Approximate Nearest Neighbour search (including time for encoding queries)
#   QnA: AIT for Question Answering on the generated context
# ansInCtx - Percentage of "answerable" queries whose answer exists in the generated context
# TP - True Positives
# TN - True Negatives
# TotP - Total Positives
# TotN - Total Negatives
# Para Acc - Paragraph Accuracy for the retrieved paragraph id
# F1 Score - F1 Score of the predicted answer
# Avg Ctx Size - Average length of the generated context (number of words)
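
For reference, the two headline metrics can be sketched as follows. `paragraph_accuracy` mirrors the `(true_positive + true_negative) / total_predictions` expression used in `calculate_score`. `token_f1` is a SQuAD-style token-overlap F1 and is only an assumption about how the per-query F1 is computed, since that code is not shown in this section. Both function names are hypothetical.

```python
from collections import Counter


def token_f1(prediction, truth):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    true_tokens = truth.lower().split()
    # If either side is empty (unanswerable), score 1 only when both are empty.
    if not pred_tokens or not true_tokens:
        return float(pred_tokens == true_tokens)
    common = Counter(pred_tokens) & Counter(true_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


def paragraph_accuracy(true_positive, true_negative, total_predictions):
    """Share of queries whose paragraph decision was correct."""
    return (true_positive + true_negative) / max(1, total_predictions)
```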

In [None]:
calculate_score(theme_inf_time, inf_time_threshold = 1000.0) # with simcse and electra-best (optimized)

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|-------------
IPod              |     326 |   1.46 +  20.07 =  21.53 ms |    79.45 % |  76% ( 326) |   0% (   0) |   0.7638 |   0.7091 |   202.3
2008_Sichuan_eart |     521 |   1.21 +  22.36 =  23.57 ms |    81.19 % |  77% ( 521) |   0% (   0) |   0.7735 |   0.7462 |   212.4
Wayback_Machine   |     208 |   1.91 +  22.31 =  24.21 ms |    81.01 % |  64% (  79) |  89% ( 129) |   0.7981 |   0.8011 |   205.5
Canadian_Armed_Fo |     396 |   1.19 +  21.55 =  22.73 ms |    74.86 % |  71% ( 179) |  91% ( 217) |   0.8258 |   0.8018 |   222.0
Cardinal_(Catholi |     322 |    2.0 +  23.33 =  25.33 ms |     73.5 % |  66% ( 117) |  92% ( 205) |   0.8323 |    0.843 |   216.4
Human_Development |     168 |   1.53 +  22.56 =   24.1 ms |    74.67 % | 

## Results

### Experimentation

#### Pretrained Question Answering Models
electra vs **electra-best** vs roberta-large vs roberta-distilled vs deberta vs albert vs minilm vs roberta

In [None]:
# with mpnet and electra-base (optimized)

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | RPinTopK % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   1.07 +  14.46 =  15.53 ms |    98.16 % |  82% ( 326) |   0% (   0) |   0.8221 |  0.82209 |   0.8183 |  0.81834
2008_Sichuan_eart |     521 |   0.91 +  17.42 =  18.33 ms |    94.43 % |  80% ( 521) |   0% (   0) |   0.8061 |  0.80614 |   0.7967 |  0.79671
Wayback_Machine   |     208 |    1.1 +  17.33 =  18.44 ms |    98.08 % |  69% (  79) |  93% ( 129) |   0.8462 |  0.84615 |   0.8526 |  0.85256
Canadian_Armed_Fo |     396 |   0.89 +  21.18 =  22.07 ms |    98.23 % |  82% ( 179) |  94% ( 217) |   0.8889 |  0.88889 |   0.8779 |  0.87788
Cardinal_(Catholi |     322 |   1.12 +  18.03 =  19.15 ms |     97.2 % |  71% ( 117) |  95% ( 205) |   0.8696 |  0.86957 |   0.8704 |  0.87045

In [None]:
# with mpnet and electra-best (optimized)

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   1.14 +  15.67 =   16.8 ms |    93.87 % |  87% ( 326) |   0% (   0) |   0.8742 |  0.87423 |   0.8597 |  0.85967
2008_Sichuan_eart |     521 |   0.97 +  18.65 =  19.62 ms |    93.09 % |  86% ( 521) |   0% (   0) |   0.8656 |  0.86564 |   0.8561 |  0.85607
Wayback_Machine   |     208 |   1.18 +  18.68 =  19.86 ms |    88.61 % |  73% (  79) |  93% ( 129) |   0.8558 |  0.85577 |   0.8619 |  0.86194
Canadian_Armed_Fo |     396 |   0.94 +  21.54 =  22.47 ms |    93.85 % |  88% ( 179) |  92% ( 217) |   0.9066 |  0.90657 |   0.8851 |  0.88513
Cardinal_(Catholi |     322 |   1.22 +  22.73 =  23.95 ms |    88.89 % |  79% ( 117) |  93% ( 205) |    0.882 |  0.88199 |   0.8752 |  0.87523

In [None]:
# with mpnet and roberta-large (optimized)

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   1.28 +  45.97 =  47.25 ms |    93.87 % |  88% ( 326) |   0% (   0) |   0.8834 |  0.88344 |   0.8505 |   0.8505
2008_Sichuan_eart |     521 |   0.97 +  57.54 =  58.51 ms |    93.09 % |  85% ( 521) |   0% (   0) |   0.8599 |  0.85988 |   0.8329 |  0.83294
Wayback_Machine   |     208 |    1.2 +  55.56 =  56.76 ms |    88.61 % |  77% (  79) |  95% ( 129) |   0.8846 |  0.88462 |   0.8802 |   0.8802
Canadian_Armed_Fo |     396 |   0.99 +  70.48 =  71.47 ms |    93.85 % |  82% ( 179) |  95% ( 217) |   0.8939 |  0.89394 |   0.8676 |  0.86757
Cardinal_(Catholi |     322 |   1.29 +  61.88 =  63.17 ms |    88.89 % |  82% ( 117) |  98% ( 205) |   0.9255 |  0.92547 |   0.9199 |  0.91987

In [None]:
# with mpnet and roberta-distilled (optimized)

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | RPinTopK % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |    1.2 +  15.33 =  16.53 ms |    98.16 % |  87% ( 326) |   0% (   0) |   0.8773 |   0.8773 |   0.8228 |  0.82285
2008_Sichuan_eart |     521 |   0.88 +   21.6 =  22.47 ms |    94.43 % |  85% ( 521) |   0% (   0) |   0.8599 |  0.85988 |   0.8014 |  0.80144
Wayback_Machine   |     208 |   1.17 +  18.54 =  19.71 ms |    98.08 % |  67% (  79) |  90% ( 129) |   0.8173 |  0.81731 |   0.8096 |  0.80964
Canadian_Armed_Fo |     396 |   0.86 +  21.46 =  22.32 ms |    98.23 % |  83% ( 179) |  93% ( 217) |   0.8889 |  0.88889 |   0.8337 |  0.83366
Cardinal_(Catholi |     322 |   1.14 +   18.9 =  20.04 ms |     97.2 % |  79% ( 117) |  94% ( 205) |   0.8882 |   0.8882 |   0.8718 |  0.87177

In [None]:
# with mpnet and deberta base (no onnx support)

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | RPinTopK % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   1.06 + 888.17 = 889.23 ms |    98.16 % |  86% ( 326) |   0% (   0) |    0.865 |  0.86503 |   0.7159 |   0.7159
2008_Sichuan_eart |     521 |   0.88 + 1038.32 = 1039.2 ms |    94.43 % |  86% ( 521) |   0% (   0) |   0.8618 |  0.82929 |   0.7841 |  0.75456
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
Grand Total       |     847 |   0.95 + 980.53 = 981.48 ms |    95.87 % |  86% ( 847) |   0% (   0) |    0.863 |  0.84716 |   0.7579 |  0.73523


In [None]:
# with mpnet and albert_base (optimized)

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | RPinTopK % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   1.06 +  15.24 =   16.3 ms |    98.16 % |  84% ( 326) |   0% (   0) |   0.8466 |  0.84663 |   0.7069 |  0.70693
2008_Sichuan_eart |     521 |   0.92 +  17.27 =  18.19 ms |    94.43 % |  86% ( 521) |   0% (   0) |   0.8618 |   0.8618 |   0.7469 |   0.7469
Wayback_Machine   |     208 |    1.1 +  17.62 =  18.72 ms |    98.08 % |  63% (  79) |  93% ( 129) |   0.8173 |  0.81731 |   0.8022 |   0.8022
Canadian_Armed_Fo |     396 |   0.87 +  22.36 =  23.22 ms |    98.23 % |  74% ( 179) |  92% ( 217) |   0.8434 |  0.84343 |   0.7722 |  0.77218
Cardinal_(Catholi |     322 |    1.5 +  20.78 =  22.28 ms |     97.2 % |  77% ( 117) |  94% ( 205) |    0.882 |  0.88199 |   0.8345 |  0.83449

In [None]:
# with mpnet and minilm (optimized)

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | RPinTopK % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   1.06 +   6.77 =   7.83 ms |    98.16 % |  84% ( 326) |   0% (   0) |   0.8466 |  0.84663 |   0.7855 |  0.78547
2008_Sichuan_eart |     521 |   0.89 +   8.09 =   8.99 ms |    94.43 % |  85% ( 521) |   0% (   0) |   0.8503 |  0.85029 |    0.775 |  0.77496
Wayback_Machine   |     208 |   1.12 +   8.01 =   9.13 ms |    98.08 % |  69% (  79) |  80% ( 129) |   0.7644 |  0.76442 |   0.7456 |  0.74559
Canadian_Armed_Fo |     396 |   0.87 +   9.33 =   10.2 ms |    98.23 % |  78% ( 179) |  85% ( 217) |   0.8207 |  0.82071 |   0.7772 |  0.77722
Cardinal_(Catholi |     322 |   1.07 +   8.19 =   9.26 ms |     97.2 % |  78% ( 117) |  89% ( 205) |    0.854 |  0.85404 |   0.8324 |   0.8324

In [None]:
# with mpnet and roberta (optimized)

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | RPinTopK % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   1.25 +  19.26 =  20.51 ms |    98.16 % |  85% ( 326) |   0% (   0) |   0.8589 |   0.8589 |   0.7811 |  0.78113
2008_Sichuan_eart |     521 |   1.02 +  20.31 =  21.34 ms |    94.43 % |  84% ( 521) |   0% (   0) |   0.8484 |  0.84837 |   0.7828 |  0.78281
Wayback_Machine   |     208 |   1.37 +  18.11 =  19.48 ms |    98.08 % |  62% (  79) |  85% ( 129) |   0.7644 |  0.76442 |   0.7542 |  0.75419
Canadian_Armed_Fo |     396 |   0.99 +  21.72 =  22.72 ms |    98.23 % |  71% ( 179) |  89% ( 217) |   0.8131 |  0.81313 |    0.774 |  0.77395
Cardinal_(Catholi |     322 |   1.29 +  20.16 =  21.45 ms |     97.2 % |  77% ( 117) |  91% ( 205) |   0.8665 |  0.86646 |   0.8467 |  0.84675

#### Sentence Encoders
**mpnet** vs distilroberta vs minilm vs universal sentence encoder

In [None]:
# with mpnet and electra

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | RPinTopK % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   1.07 +  14.46 =  15.53 ms |    98.16 % |  82% ( 326) |   0% (   0) |   0.8221 |  0.82209 |   0.8183 |  0.81834
2008_Sichuan_eart |     521 |   0.91 +  17.42 =  18.33 ms |    94.43 % |  80% ( 521) |   0% (   0) |   0.8061 |  0.80614 |   0.7967 |  0.79671
Wayback_Machine   |     208 |    1.1 +  17.33 =  18.44 ms |    98.08 % |  69% (  79) |  93% ( 129) |   0.8462 |  0.84615 |   0.8526 |  0.85256
Canadian_Armed_Fo |     396 |   0.89 +  21.18 =  22.07 ms |    98.23 % |  82% ( 179) |  94% ( 217) |   0.8889 |  0.88889 |   0.8779 |  0.87788
Cardinal_(Catholi |     322 |   1.12 +  18.03 =  19.15 ms |     97.2 % |  71% ( 117) |  95% ( 205) |   0.8696 |  0.86957 |   0.8704 |  0.87045

In [None]:
# with distilroberta and electra

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | RPinTopK % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   0.56 +  13.61 =  14.16 ms |    95.71 % |  76% ( 326) |   0% (   0) |   0.7669 |  0.76687 |   0.7529 |  0.75287
2008_Sichuan_eart |     521 |   0.46 +   17.3 =  17.76 ms |    94.43 % |  80% ( 521) |   0% (   0) |     0.81 |  0.80998 |      0.8 |  0.80002
Wayback_Machine   |     208 |   0.59 +  16.71 =  17.31 ms |    98.08 % |  79% (  79) |  91% ( 129) |   0.8702 |  0.87019 |    0.867 |  0.86699
Canadian_Armed_Fo |     396 |   0.48 +  18.67 =  19.15 ms |    97.47 % |  79% ( 179) |  95% ( 217) |   0.8838 |  0.88384 |   0.8678 |  0.86782
Cardinal_(Catholi |     322 |   0.56 +  17.29 =  17.84 ms |    96.89 % |  70% ( 117) |  93% ( 205) |    0.854 |  0.85404 |   0.8566 |   0.8566

In [None]:
# with minilm and electra

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | RPinTopK % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   0.48 +  14.62 =   15.1 ms |    96.93 % |  80% ( 326) |   0% (   0) |   0.8067 |  0.80675 |   0.7684 |  0.76841
2008_Sichuan_eart |     521 |   0.37 +  18.15 =  18.52 ms |    93.47 % |  79% ( 521) |   0% (   0) |   0.7965 |  0.79655 |    0.776 |  0.77597
Wayback_Machine   |     208 |   0.44 +  16.59 =  17.03 ms |    96.15 % |  65% (  79) |  93% ( 129) |   0.8317 |  0.83173 |   0.8253 |  0.82532
Canadian_Armed_Fo |     396 |   0.39 +  19.68 =  20.07 ms |    97.73 % |  79% ( 179) |  92% ( 217) |   0.8687 |  0.86869 |   0.8533 |  0.85333
Cardinal_(Catholi |     322 |   0.44 +  17.76 =   18.2 ms |    96.27 % |  66% ( 117) |  96% ( 205) |   0.8571 |  0.85714 |   0.8547 |   0.8547

In [None]:
# with universal sentence encoder and electra

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | RPinTopK % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   7.89 +  14.91 =   22.8 ms |    96.32 % |  77% ( 326) |   0% (   0) |   0.7791 |  0.77914 |   0.7541 |  0.75406
2008_Sichuan_eart |     521 |   0.23 +  16.79 =  17.02 ms |    93.28 % |  79% ( 521) |   0% (   0) |   0.7927 |  0.79271 |   0.7884 |  0.78839
Wayback_Machine   |     208 |   0.36 +  17.08 =  17.44 ms |    95.19 % |  62% (  79) |  93% ( 129) |   0.8173 |  0.81731 |   0.8113 |  0.81127
Canadian_Armed_Fo |     396 |   0.25 +  19.19 =  19.44 ms |    94.95 % |  70% ( 179) |  94% ( 217) |   0.8409 |  0.84091 |   0.8325 |  0.83255
Cardinal_(Catholi |     322 |   0.31 +  16.54 =  16.85 ms |    95.65 % |  61% ( 117) |  96% ( 205) |   0.8354 |   0.8354 |   0.8423 |  0.84232

In [None]:
# with simcse and electra-best (optimized)

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|-------------
IPod              |     326 |   1.46 +  20.07 =  21.53 ms |    79.45 % |  76% ( 326) |   0% (   0) |   0.7638 |   0.7091 |   202.3
2008_Sichuan_eart |     521 |   1.21 +  22.36 =  23.57 ms |    81.19 % |  77% ( 521) |   0% (   0) |   0.7735 |   0.7462 |   212.4
Wayback_Machine   |     208 |   1.91 +  22.31 =  24.21 ms |    81.01 % |  64% (  79) |  89% ( 129) |   0.7981 |   0.8011 |   205.5
Canadian_Armed_Fo |     396 |   1.19 +  21.55 =  22.73 ms |    74.86 % |  71% ( 179) |  91% ( 217) |   0.8258 |   0.8018 |   222.0
Cardinal_(Catholi |     322 |    2.0 +  23.33 =  25.33 ms |     73.5 % |  66% ( 117) |  92% ( 205) |   0.8323 |    0.843 |   216.4
Human_Development |     168 |   1.53 +  22.56 =   24.1 ms |    74.67 % | 

#### Variation of k
5 | 6 | **7** | 8 | 9 | 10 | 14 (2 passes)

In [None]:
# with mpnet and electra-best (optimized) k = 7

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   1.14 +  15.67 =   16.8 ms |    93.87 % |  87% ( 326) |   0% (   0) |   0.8742 |  0.87423 |   0.8597 |  0.85967
2008_Sichuan_eart |     521 |   0.97 +  18.65 =  19.62 ms |    93.09 % |  86% ( 521) |   0% (   0) |   0.8656 |  0.86564 |   0.8561 |  0.85607
Wayback_Machine   |     208 |   1.18 +  18.68 =  19.86 ms |    88.61 % |  73% (  79) |  93% ( 129) |   0.8558 |  0.85577 |   0.8619 |  0.86194
Canadian_Armed_Fo |     396 |   0.94 +  21.54 =  22.47 ms |    93.85 % |  88% ( 179) |  92% ( 217) |   0.9066 |  0.90657 |   0.8851 |  0.88513
Cardinal_(Catholi |     322 |   1.22 +  22.73 =  23.95 ms |    88.89 % |  79% ( 117) |  93% ( 205) |    0.882 |  0.88199 |   0.8752 |  0.87523

In [None]:
# with mpnet and electra-best (optimized) k = 14 num_passes = 2

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   1.13 +   35.9 =  37.02 ms |    96.01 % |  82% ( 326) |   0% (   0) |   0.8252 |  0.82515 |   0.8054 |  0.80544
2008_Sichuan_eart |     521 |    1.1 +  46.34 =  47.44 ms |    96.35 % |  84% ( 521) |   0% (   0) |   0.8484 |  0.84837 |   0.8314 |  0.83139
Wayback_Machine   |     208 |   1.22 +  47.81 =  49.02 ms |    93.67 % |  72% (  79) |  86% ( 129) |   0.8077 |  0.80769 |   0.8096 |  0.80959
Canadian_Armed_Fo |     396 |   0.94 +  49.38 =  50.33 ms |    95.53 % |  82% ( 179) |  87% ( 217) |    0.851 |  0.85101 |   0.8351 |  0.83507
Cardinal_(Catholi |     322 |   1.18 +  47.67 =  48.85 ms |    93.16 % |  80% ( 117) |  88% ( 205) |   0.8571 |  0.85714 |   0.8619 |  0.86194

In [None]:
# with mpnet and electra-best (optimized) k = 10

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   1.18 +  21.98 =  23.16 ms |    95.71 % |  86% ( 326) |   0% (   0) |    0.862 |  0.86196 |   0.8531 |   0.8531
2008_Sichuan_eart |     521 |   0.97 +  28.78 =  29.74 ms |    95.01 % |  85% ( 521) |   0% (   0) |    0.856 |  0.85605 |   0.8454 |  0.84537
Wayback_Machine   |     208 |    1.2 +  30.09 =  31.29 ms |    91.14 % |  70% (  79) |  89% ( 129) |   0.8269 |  0.82692 |   0.8368 |  0.83683
Canadian_Armed_Fo |     396 |   0.99 +  34.63 =  35.61 ms |    94.97 % |  83% ( 179) |  88% ( 217) |   0.8662 |  0.86616 |   0.8484 |  0.84841
Cardinal_(Catholi |     322 |   1.17 +  29.17 =  30.34 ms |     90.6 % |  79% ( 117) |  92% ( 205) |   0.8789 |  0.87888 |   0.8796 |  0.87963

In [None]:
# with mpnet and electra-best (optimized) k = 9

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   1.13 +  19.35 =  20.48 ms |    95.71 % |  85% ( 326) |   0% (   0) |   0.8589 |   0.8589 |   0.8551 |  0.85507
2008_Sichuan_eart |     521 |   0.98 +  26.15 =  27.13 ms |    94.82 % |  86% ( 521) |   0% (   0) |   0.8637 |  0.86372 |   0.8548 |  0.85484
Wayback_Machine   |     208 |   1.24 +  27.48 =  28.72 ms |    91.14 % |  70% (  79) |  90% ( 129) |   0.8317 |  0.83173 |   0.8432 |  0.84324
Canadian_Armed_Fo |     396 |    1.0 +  30.37 =  31.37 ms |    94.41 % |  85% ( 179) |  90% ( 217) |   0.8813 |  0.88131 |   0.8651 |  0.86512
Cardinal_(Catholi |     322 |   1.22 +  25.36 =  26.58 ms |    89.74 % |  78% ( 117) |  93% ( 205) |    0.882 |  0.88199 |   0.8843 |  0.88428

In [None]:
# with mpnet and electra-best (optimized) k = 8

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |   1.11 +  17.16 =  18.27 ms |    95.09 % |  86% ( 326) |   0% (   0) |    0.865 |  0.86503 |   0.8661 |  0.86606
2008_Sichuan_eart |     521 |   0.96 +  21.43 =  22.39 ms |    93.86 % |  85% ( 521) |   0% (   0) |    0.856 |  0.85605 |   0.8451 |   0.8451
Wayback_Machine   |     208 |    1.2 +  22.25 =  23.46 ms |    89.87 % |  73% (  79) |  90% ( 129) |   0.8413 |  0.84135 |   0.8432 |  0.84324
Canadian_Armed_Fo |     396 |   1.14 +  28.23 =  29.37 ms |    93.85 % |  88% ( 179) |  91% ( 217) |    0.899 |  0.89899 |   0.8781 |  0.87811
Cardinal_(Catholi |     322 |    1.2 +  22.33 =  23.53 ms |    89.74 % |  78% ( 117) |  93% ( 205) |   0.8789 |  0.87888 |   0.8778 |  0.87779

In [None]:
# with mpnet and electra-best (optimized) k = 6

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |    1.2 +  15.17 =  16.37 ms |    90.49 % |  85% ( 326) |   0% (   0) |   0.8528 |  0.85276 |   0.8311 |  0.83106
2008_Sichuan_eart |     521 |   0.97 +  15.97 =  16.93 ms |    91.75 % |  86% ( 521) |   0% (   0) |   0.8656 |  0.86564 |    0.844 |  0.84402
Wayback_Machine   |     208 |   1.17 +   15.9 =  17.07 ms |    86.08 % |  73% (  79) |  91% ( 129) |   0.8462 |  0.84615 |   0.8523 |  0.85232
Canadian_Armed_Fo |     396 |   0.93 +  18.49 =  19.41 ms |     93.3 % |  89% ( 179) |  92% ( 217) |   0.9116 |  0.91162 |   0.8865 |  0.88646
Cardinal_(Catholi |     322 |   1.18 +  16.75 =  17.93 ms |    86.32 % |  77% ( 117) |  94% ( 205) |   0.8851 |  0.88509 |   0.8814 |  0.88144

In [None]:
# with mpnet and electra-best (optimized) k = 5

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | Final PA | F1 Score | Final F1
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|----------|---------
IPod              |     326 |    1.1 +  13.75 =  14.85 ms |    88.34 % |  84% ( 326) |   0% (   0) |   0.8436 |  0.84356 |   0.8207 |  0.82069
2008_Sichuan_eart |     521 |   1.15 +  14.81 =  15.96 ms |     90.4 % |  85% ( 521) |   0% (   0) |   0.8599 |  0.85988 |   0.8352 |  0.83515
Wayback_Machine   |     208 |   1.17 +  14.52 =  15.69 ms |    86.08 % |  75% (  79) |  89% ( 129) |   0.8413 |  0.84135 |   0.8496 |  0.84963
Canadian_Armed_Fo |     396 |   0.98 +  16.19 =  17.17 ms |    92.74 % |  88% ( 179) |  92% ( 217) |    0.904 |  0.90404 |   0.8818 |  0.88177
Cardinal_(Catholi |     322 |   1.15 +  13.84 =  14.99 ms |    82.91 % |  76% ( 117) |  94% ( 205) |   0.8789 |  0.87888 |   0.8747 |  0.87469
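
To compare runs such as the k values above without reading every row, the printed tables can be parsed back into records. The sketch below is a hypothetical helper, not part of the notebook. It assumes the pipe-delimited layout shown in these outputs, skips separator lines and truncated rows, and weights each theme by its query count, which matches how the `Grand Total` row is aggregated in `calculate_score`.

```python
def parse_table(text):
    """Parse a printed metrics table into a list of dicts keyed by column header."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        if set(line.strip()) <= set("-|"):
            continue  # separator line made of dashes and pipes
        cells = [c.strip() for c in line.split("|")]
        if len(cells) != len(header):
            continue  # truncated or malformed row
        rows.append(dict(zip(header, cells)))
    return rows


def weighted_mean(rows, column):
    """Query-weighted mean of a numeric column such as 'F1 Score'."""
    total_queries = sum(int(r["Queries"]) for r in rows)
    weighted = sum(float(r[column]) * int(r["Queries"]) for r in rows)
    return weighted / max(1, total_queries)
```

Calling `weighted_mean(parse_table(table_text), "F1 Score")` on each table gives a single number per configuration that can be compared directly.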

#### Context Length Threshold
170 | 190 | 200 | **205** | 210 | 220

In [None]:
# with mpnet and electra-best (optimized) context_length_threshold = 170

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|-------------
IPod              |     326 |   1.25 +  20.06 =  21.32 ms |     95.4 % |  88% ( 326) |   0% (   0) |   0.8804 |   0.8638 |   181.4
2008_Sichuan_eart |     521 |   1.11 +   21.0 =  22.12 ms |    93.28 % |  87% ( 521) |   0% (   0) |   0.8714 |     0.85 |   184.6
Wayback_Machine   |     208 |   1.34 +   20.5 =  21.85 ms |    88.61 % |  72% (  79) |  92% ( 129) |   0.8462 |   0.8608 |   184.9
Canadian_Armed_Fo |     396 |   1.05 +  20.28 =  21.33 ms |     93.3 % |  88% ( 179) |  93% ( 217) |   0.9091 |   0.8879 |   188.8
Cardinal_(Catholi |     322 |   1.32 +  20.63 =  21.95 ms |    86.32 % |  78% ( 117) |  93% ( 205) |    0.882 |   0.8812 |   189.4
Human_Development |     168 |    1.4 +  20.14 =  21.55 ms |    86.67 % | 

In [None]:
# with mpnet and electra-best (optimized) context_length_threshold = 190

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|-------------
IPod              |     326 |   1.15 +  19.83 =  20.98 ms |    95.71 % |  88% ( 326) |   0% (   0) |   0.8804 |   0.8584 |   197.0
2008_Sichuan_eart |     521 |    1.0 +  21.42 =  22.42 ms |    93.67 % |  87% ( 521) |   0% (   0) |   0.8733 |   0.8545 |   204.1
Wayback_Machine   |     208 |   1.28 +  20.88 =  22.16 ms |    88.61 % |  72% (  79) |  90% ( 129) |   0.8365 |   0.8443 |   203.9
Canadian_Armed_Fo |     396 |   1.05 +   21.0 =  22.05 ms |    93.85 % |  87% ( 179) |  92% ( 217) |    0.904 |   0.8846 |   208.7
Cardinal_(Catholi |     322 |   1.31 +  21.25 =  22.56 ms |    88.03 % |  79% ( 117) |  93% ( 205) |   0.8851 |   0.8847 |   208.6
Human_Development |     168 |    1.4 +  20.69 =  22.09 ms |     88.0 % | 

In [None]:
# with mpnet and electra-best (optimized) context_length_threshold = 200

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|-------------
IPod              |     326 |   1.16 +  20.08 =  21.24 ms |    95.71 % |  88% ( 326) |   0% (   0) |   0.8834 |   0.8607 |   202.9
2008_Sichuan_eart |     521 |    1.0 +  21.62 =  22.62 ms |    93.67 % |  87% ( 521) |   0% (   0) |   0.8733 |   0.8552 |   211.9
Wayback_Machine   |     208 |   1.26 +  21.68 =  22.93 ms |    88.61 % |  72% (  79) |  91% ( 129) |   0.8413 |   0.8491 |   212.4
Canadian_Armed_Fo |     396 |   1.03 +  21.72 =  22.75 ms |    93.85 % |  87% ( 179) |  92% ( 217) |    0.904 |   0.8843 |   217.2
Cardinal_(Catholi |     322 |   1.33 +  21.94 =  23.27 ms |    88.03 % |  79% ( 117) |  93% ( 205) |   0.8851 |   0.8847 |   216.1
Human_Development |     168 |   1.41 +  21.57 =  22.98 ms |     88.0 % | 

In [None]:
# with mpnet and electra-best (optimized) - context_length_threshold = 205

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|-------------
IPod              |     326 |   1.23 +   22.2 =  23.43 ms |    95.71 % |  88% ( 326) |   0% (   0) |   0.8834 |   0.8607 |   205.4
2008_Sichuan_eart |     521 |   1.05 +  22.56 =  23.61 ms |    93.67 % |  87% ( 521) |   0% (   0) |   0.8733 |   0.8538 |   216.9
Wayback_Machine   |     208 |   1.34 +  22.44 =  23.78 ms |    89.87 % |  72% (  79) |  91% ( 129) |   0.8413 |   0.8491 |   216.5
Canadian_Armed_Fo |     396 |   1.02 +  23.17 =   24.2 ms |    93.85 % |  87% ( 179) |  92% ( 217) |    0.904 |   0.8868 |   221.9
Cardinal_(Catholi |     322 |   1.36 +  26.46 =  27.82 ms |    88.89 % |  80% ( 117) |  93% ( 205) |   0.8882 |   0.8878 |   220.6
Human_Development |     168 |   1.48 +  23.63 =   25.1 ms |    89.33 % | 

In [None]:
# with mpnet and electra-best (optimized) - context_length_threshold = 210

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|-------------
IPod              |     326 |   1.18 +  20.43 =  21.62 ms |    95.71 % |  88% ( 326) |   0% (   0) |   0.8834 |   0.8607 |   207.1
2008_Sichuan_eart |     521 |   1.01 +  22.84 =  23.85 ms |    93.86 % |  86% ( 521) |   0% (   0) |   0.8695 |   0.8507 |   221.2
Wayback_Machine   |     208 |   1.27 +   22.6 =  23.87 ms |    89.87 % |  69% (  79) |  91% ( 129) |   0.8317 |   0.8395 |   221.5
Canadian_Armed_Fo |     396 |   1.04 +  22.83 =  23.87 ms |    93.85 % |  88% ( 179) |  92% ( 217) |   0.9066 |   0.8874 |   227.1
Cardinal_(Catholi |     322 |   1.28 +  22.64 =  23.92 ms |    88.89 % |  80% ( 117) |  93% ( 205) |   0.8882 |   0.8878 |   224.8
Human_Development |     168 |   1.41 +  21.98 =  23.39 ms |    89.33 % | 

In [None]:
# with mpnet and electra-best (optimized) - context_length_threshold = 220

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|-------------
IPod              |     326 |   1.23 +  24.66 =  25.88 ms |    95.71 % |  87% ( 326) |   0% (   0) |   0.8773 |   0.8561 |   211.3
2008_Sichuan_eart |     521 |   1.54 +  28.74 =  30.28 ms |    94.24 % |  87% ( 521) |   0% (   0) |   0.8714 |   0.8539 |   228.1
Wayback_Machine   |     208 |   1.36 +  24.82 =  26.18 ms |    89.87 % |  70% (  79) |  90% ( 129) |   0.8317 |   0.8384 |   230.9
Canadian_Armed_Fo |     396 |   1.05 +  23.57 =  24.62 ms |    93.85 % |  88% ( 179) |  92% ( 217) |    0.904 |   0.8852 |   238.3
Cardinal_(Catholi |     322 |   1.24 +  22.86 =   24.1 ms |    89.74 % |  80% ( 117) |  93% ( 205) |   0.8882 |   0.8847 |   231.2
Human_Development |     168 |    1.4 +  22.39 =   23.8 ms |    89.33 % | 
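
The threshold sweep above is easier to compare with one number per run. The following is a minimal sketch, assuming the printed tables are available as plain text: it parses the pipe-delimited rows and computes a query-weighted mean F1 over the rows that are complete. `parse_rows` and `weighted_f1` are hypothetical helper names that are not part of the notebook, and the sample text is the `context_length_threshold = 200` table copied from above. The truncated `Human_Development` row is skipped rather than guessed.

```python
# Hypothetical helpers for summarizing the printed result tables (not from the notebook).

TABLE_200 = """
Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|-------------
IPod              |     326 |   1.16 +  20.08 =  21.24 ms |    95.71 % |  88% ( 326) |   0% (   0) |   0.8834 |   0.8607 |   202.9
2008_Sichuan_eart |     521 |    1.0 +  21.62 =  22.62 ms |    93.67 % |  87% ( 521) |   0% (   0) |   0.8733 |   0.8552 |   211.9
Wayback_Machine   |     208 |   1.26 +  21.68 =  22.93 ms |    88.61 % |  72% (  79) |  91% ( 129) |   0.8413 |   0.8491 |   212.4
Canadian_Armed_Fo |     396 |   1.03 +  21.72 =  22.75 ms |    93.85 % |  87% ( 179) |  92% ( 217) |    0.904 |   0.8843 |   217.2
Cardinal_(Catholi |     322 |   1.33 +  21.94 =  23.27 ms |    88.03 % |  79% ( 117) |  93% ( 205) |   0.8851 |   0.8847 |   216.1
Human_Development |     168 |   1.41 +  21.57 =  22.98 ms |     88.0 % |
"""


def parse_rows(table_text):
    """Parse complete data rows; skip the header, the separator, and truncated rows."""
    rows = []
    for line in table_text.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # A complete row has 9 non-empty cells; truncated rows have fewer.
        if len(cells) != 9 or not all(cells):
            continue
        # Skip the header row and the dashed separator row.
        if cells[0] == "Theme" or set(cells[0]) == {"-"}:
            continue
        rows.append({
            "theme": cells[0],
            "queries": int(cells[1]),
            "para_acc": float(cells[6]),
            "f1": float(cells[7]),
        })
    return rows


def weighted_f1(rows):
    """Mean F1 weighted by the number of queries per theme."""
    total_queries = sum(row["queries"] for row in rows)
    return sum(row["queries"] * row["f1"] for row in rows) / total_queries


rows = parse_rows(TABLE_200)
print(f"{len(rows)} complete rows, weighted F1 = {weighted_f1(rows):.4f}")
```

Weighting by query count keeps a small theme such as `Wayback_Machine` from moving the summary as much as a large one; an unweighted mean would be an equally reasonable choice if every theme should count the same.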

### Current Best Performance

In [None]:
# with mpnet - finetuned and electra-best (optimized) - k = 10, context_length_threshold = 205 - new themes

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|-------------
IPod              |     326 |   1.15 +  21.24 =  22.39 ms |    96.93 % |  88% ( 326) |   0% (   0) |   0.8804 |   0.8562 |   210.0
2008_Sichuan_eart |     521 |   1.01 +  22.77 =  23.78 ms |    95.39 % |  89% ( 521) |   0% (   0) |   0.8944 |   0.8743 |   217.8
Wayback_Machine   |     208 |   1.29 +  21.89 =  23.18 ms |    92.41 % |  74% (  79) |  92% ( 129) |   0.8558 |   0.8528 |   217.7
Canadian_Armed_Fo |     396 |   1.02 +  22.63 =  23.64 ms |    95.53 % |  89% ( 179) |  90% ( 217) |   0.9015 |   0.8809 |   224.0
Cardinal_(Catholi |     322 |   1.23 +  21.77 =   23.0 ms |     90.6 % |  82% ( 117) |  94% ( 205) |   0.8975 |   0.8956 |   224.0
Human_Development |     168 |   1.35 +  22.26 =  23.62 ms |    89.33 % | 

In [None]:
# with mpnet and electra-best (optimized) - k = 10, context_length_threshold = 205 - new themes

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|-------------
IPod              |     326 |   1.67 +  23.22 =  24.89 ms |    95.71 % |  88% ( 326) |   0% (   0) |   0.8834 |   0.8607 |   205.4
2008_Sichuan_eart |     521 |   1.14 +  24.29 =  25.43 ms |    93.67 % |  87% ( 521) |   0% (   0) |   0.8733 |   0.8538 |   216.9
Wayback_Machine   |     208 |   1.31 +  23.79 =  25.11 ms |    89.87 % |  72% (  79) |  91% ( 129) |   0.8413 |   0.8491 |   216.5
Canadian_Armed_Fo |     396 |   1.06 +  24.08 =  25.15 ms |    93.85 % |  87% ( 179) |  92% ( 217) |    0.904 |   0.8868 |   221.9
Cardinal_(Catholi |     322 |   1.32 +  23.71 =  25.03 ms |    88.89 % |  80% ( 117) |  93% ( 205) |   0.8882 |   0.8878 |   220.6
Human_Development |     168 |   1.35 +  21.69 =  23.04 ms |    89.33 % | 

In [None]:
# with mpnet - finetuned and electra-best (optimized) - k = 10, context_length_threshold = 205 - old themes

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|-------------
Beyoncé           |     228 |   1.25 +   22.4 =  23.65 ms |    95.61 % |  85% ( 228) |   0% (   0) |   0.8596 |   0.8765 |   220.2
Frédéric_Chopin   |     203 |   1.23 +  22.49 =  23.72 ms |    88.18 % |  85% ( 203) |   0% (   0) |   0.8522 |    0.836 |   214.5
Sino-Tibetan_rela |      98 |   1.34 +  24.68 =  26.02 ms |    92.86 % |  85% (  98) |   0% (   0) |   0.8571 |   0.8492 |   224.3
The_Legend_of_Zel |     125 |   1.32 +  22.82 =  24.14 ms |    96.05 % |  82% (  76) |  91% (  49) |    0.864 |   0.8541 |   214.4
Spectre_(2015_fil |     142 |   1.25 +  23.15 =   24.4 ms |     95.4 % |  82% (  87) |  87% (  55) |   0.8451 |   0.8627 |   217.2
New_York_City     |     253 |   1.28 +  21.95 =  23.23 ms |     99.6 % | 

In [None]:
# with mpnet and electra-best (optimized) - context_length_threshold = 205 - old themes

Theme             | Queries |  AIT: (ANN) + (QnA) = Total | ansInCtx % | TP % (TotP) | TN % (TotN) | Para Acc | F1 Score | Avg Ctx Size
------------------|---------|-----------------------------|------------|-------------|-------------|----------|----------|-------------
Beyoncé           |     228 |    1.2 +  20.84 =  22.04 ms |     94.3 % |  88% ( 228) |   0% (   0) |    0.886 |   0.8858 |   216.2
Frédéric_Chopin   |     203 |    1.2 +  21.97 =  23.17 ms |     80.3 % |  75% ( 203) |   0% (   0) |   0.7537 |   0.7463 |   210.2
Sino-Tibetan_rela |      98 |   1.29 +  23.09 =  24.38 ms |     89.8 % |  83% (  98) |   0% (   0) |   0.8367 |   0.8378 |   220.7
The_Legend_of_Zel |     125 |   1.25 +  20.76 =  22.01 ms |    94.74 % |  85% (  76) |  91% (  49) |     0.88 |   0.8594 |   213.4
Spectre_(2015_fil |     142 |   1.07 +  21.54 =  22.62 ms |    94.25 % |  81% (  87) |  90% (  55) |   0.8521 |    0.864 |   214.5
New_York_City     |     253 |   1.17 +  21.32 =  22.49 ms |    99.21 % | 
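
To see where the fine-tuned mpnet run differs from the base run on the old themes, the two tables above can be compared theme by theme. The following is a minimal sketch that uses the F1 scores of the five complete rows from each table. `metric_deltas` is a hypothetical helper name that is not part of the notebook, and the truncated `New_York_City` rows are left out.

```python
# F1 scores copied from the two "old themes" tables above (complete rows only).
F1_FINETUNED = {
    "Beyoncé": 0.8765,
    "Frédéric_Chopin": 0.836,
    "Sino-Tibetan_rela": 0.8492,
    "The_Legend_of_Zel": 0.8541,
    "Spectre_(2015_fil": 0.8627,
}
F1_BASE = {
    "Beyoncé": 0.8858,
    "Frédéric_Chopin": 0.7463,
    "Sino-Tibetan_rela": 0.8378,
    "The_Legend_of_Zel": 0.8594,
    "Spectre_(2015_fil": 0.864,
}


def metric_deltas(finetuned, base):
    """Return (finetuned - base) for every theme present in both tables."""
    return {
        theme: round(finetuned[theme] - base[theme], 4)
        for theme in finetuned
        if theme in base
    }


deltas = metric_deltas(F1_FINETUNED, F1_BASE)
# Print themes from largest gain to largest loss.
for theme, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{theme:<18} {delta:+.4f}")
```

A positive value means the fine-tuned run scored higher. On these rows the largest change is for `Frédéric_Chopin` (0.7463 to 0.836), while `Beyoncé`, `The_Legend_of_Zel`, and `Spectre_(2015_fil` are slightly lower.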