## Question Answering Interface Notebook Contents
- [What do I need to import?](#What-do-I-need-to-import?)
- [Helper functions](#Helper-functions)
- [What are all the steps I need to take to prepare the QAInterface?](#What-are-all-the-steps-I-need-to-take-to-prepare-the-QAInterface?)
- [How do I ask?](#How-do-I-ask?)

### What do I need to import?

You need the `QAInterface` and everything it connects to:

- The Answer Detector (with PyTorch)
- The Search Engines
- The `Database` that stores the search indexes

`pprint` and `time` aren't required; they only help us time the call and display the results.

In [1]:
# bot modules
from bot.brain import QAInterface
from bot.searcher.base import SearchEngine
from bot.searcher.question import QuestionSearchEngine
from bot.searcher.faq import FAQSearchEngine
from bot.answer.detector import AnswerDetector
from bot.database.sqlite import Database

# general python
import torch
import time
import pprint

### Helper functions
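
The notebook later calls `print_answers(answers)`, but its definition is not shown in this section. Below is a minimal sketch of what such a helper might look like, assuming each answer is a plain object exposing `answer`, `confidence`, `extended_answer` and `metadata` attributes (the fields printed further down). The `SimpleNamespace` stand-in and the `FIELDS` constant are hypothetical and only serve the illustration.

```python
import pprint
from types import SimpleNamespace

# Attributes worth showing; assumed from the keys printed later in this notebook.
FIELDS = ("answer", "confidence", "extended_answer", "metadata")


def print_answers(answers, fields=FIELDS):
    """Pretty-print the selected attributes of each answer, in the order given."""
    pp = pprint.PrettyPrinter(indent=2)
    for rank, answer in enumerate(answers, start=1):
        print()
        print(f"number {rank} answer (by confidence)")
        pp.pprint({k: v for k, v in vars(answer).items() if k in fields})


# Stand-in object for illustration only; real answers come from get_answers().
demo = [
    SimpleNamespace(
        answer="example",
        confidence=0.5,
        extended_answer="an example",
        metadata={},
        extra="hidden",
    )
]
print_answers(demo)
```

The helper does not sort anything; it relies on the caller passing the answers already ordered by confidence.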

### What are all the steps I need to take to prepare the QAInterface?

#### Open connection to Data Storage

In [2]:
# prepare data_storage
data_storage = Database("data_storage.db")

#### Load Answer Detector

In [3]:
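# name of the pretrained question-answering model
model = 'distilbert-base-cased-distilled-squad'
# device index: 0 when a CUDA GPU is available, otherwise -1 (CPU)
gpu = 0 if torch.cuda.is_available() else -1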

In [4]:
# load answer detector
answer_detector = AnswerDetector(model=model, device=gpu)

#### Load Search Engines

In [5]:
def setup_search_engines(db: Database):
    """Load the docs, question and FAQ search-engine indexes from the database."""
    print("Loading SearchEngines...")
    # Rucio documentation index
    docs_se = SearchEngine()
    docs_se.load_index(db=db, table_name="rucio_doc_term_matrix")
    # questions index
    question_se = QuestionSearchEngine()
    question_se.load_index(db=db, table_name="questions_doc_term_matrix")
    # FAQ index
    faq_se = FAQSearchEngine()
    faq_se.load_index(db=db, table_name="faq_doc_term_matrix")
    return faq_se, docs_se, question_se

In [6]:
# load search engines
faq_se, docs_se, question_se = setup_search_engines(db=data_storage)

Loading SearchEngines...


#### Load QA Interface

In [7]:
# load interface
qa_interface = QAInterface(
    detector=answer_detector,
    question_engine=question_se,
    faq_engine=faq_se,
    docs_engine=docs_se,
)

### How do I ask?

Pick a query and the number of answers you want returned.

In [8]:
query = """When does a touch happen in the system?"""
top_k = 3

Then call the `.get_answers()` method, also choosing how many FAQs (`num_faqs`), questions (`num_questions`) and docs (`num_docs`) to retrieve for the query.

In [9]:
start_time = time.time()
answers = qa_interface.get_answers(query, top_k=top_k, num_faqs=1, num_questions=1, num_docs=1)
print(f"Total inference time: {round(time.time() - start_time, 2)} seconds")

  0%|                                                                                            | 0/1 [00:00<?, ?it/s]

Predicting answers from 1 document(s)...


100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:08<00:00,  8.67s/it]
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]

Predicting answers from 1 document(s)...


100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.24s/it]

No answer was predicted for this document!
Total inference time: 13.15 seconds
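
The timing above is done by hand around a single call. If several calls need timing, a small context manager may be convenient. This is only a sketch: `timed` is a hypothetical helper that is not part of the bot, and it uses `time.perf_counter()`, which is generally better suited to measuring elapsed time than `time.time()`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label="Total inference time"):
    """Print the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.2f} seconds")


# Stand-in workload; in this notebook the body would be the
# qa_interface.get_answers(...) call.
with timed():
    sum(range(100_000))
```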





The next cells only display the answers.

In [10]:
pp = pprint.PrettyPrinter(indent=2)  # only used for printing

In [11]:
print("Question : ", query)
# answers are numbered in the order returned (highest confidence first)
for i, answer in enumerate(answers, start=1):
    print()
    print(f"number {i} answer (by confidence)")
    pp.pprint([{k: v for k, v in answer.__dict__.items() if k in ['answer', 'confidence', 'extended_answer', 'metadata']}])

Question :  When does a touch happen in the system?

number 1 answer (by confidence)
[ { 'answer': 'when the dataset is used as input for a panda task or when '
              'rucio download is used to access the data.',
    'confidence': 0.5806299379923985,
    'extended_answer': 'Hi fac8a3, A "touch" occurs when the dataset is used '
                       'as input for a panda task or when rucio download is '
                       "used to access the data. I don't see any tasks defined",
    'metadata': { 'bm25_score': 10.277384263879775,
                  'comment_id': nan,
                  'email_id': 4959.0,
                  'end': '413',
                  'issue_id': nan,
                  'most_similar_question': 'Can you clarify what constitutes '
                                           '"touched" by the system, so that '
                                           "in the future we can be sure we're "
                                           'protecting these from dele

print_answers(answers)

More details are available, such as:
- What was retrieved
- What was detected for each question

In [12]:
# retrieved FAQs without extra columns
qa_interface.retrieved_faqs.drop(columns=['faq_id','created_at','query'])

| Unnamed: 0 | question | answer | author | keywords | bm25_score | context |
|---|---|---|---|---|---|---|


In [13]:
# retrieved Questions without extra columns
qa_interface.retrieved_questions.drop(columns=['question_id','query'])

| Unnamed: 0 | question | start | end | context | email_id | issue_id | comment_id | bm25_score |
|---|---|---|---|---|---|---|---|---|
| 0 | Can you clarify what constitutes "touched" by ... | 281 | 413 | Hi fac8a3, A "touch" occurs when the dataset i... | 4959.0 | | | 10.277384 |


In [14]:
# retrieved Rucio documentation without extra columns
qa_interface.retrieved_docs.drop(columns=['doc_id','url','body','query'])

| Unnamed: 0 | name | doc_type | bm25_score | context |
|---|---|---|---|---|
| 0 | api.rst | general | 3.80881 | general The Client API Reference\n============... |
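
Each retrieved row carries a `bm25_score`. How the bot's search engines compute it is not shown in this notebook, so the following is only a rough sketch of the widely used Okapi BM25 formula to give the numbers some intuition. The function name `bm25_scores`, the whitespace tokenization, the smoothed IDF variant, and the defaults `k1=1.5` and `b=0.75` are all assumptions for illustration, not the bot's actual settings.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25).

    Assumes `docs` is a non-empty list of non-empty token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF, always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus, tokenized by whitespace for illustration only.
docs = [
    "a touch occurs when the dataset is used".split(),
    "the client api reference".split(),
]
toy_query = "when does a touch happen".split()
scores = bm25_scores(toy_query, docs)
print(scores)
```

In this toy corpus the first document shares terms with the query and gets a positive score, while the second shares none and scores zero. That matches the pattern above, where the matching question scores well above the documentation page.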
