# Open-retrieval Conversational Question Answering
Based on the paper _Open-Retrieval Conversational Question Answering_ by _Qu et al._

ConverSE is built on Haystack, so this notebook closely follows Haystack's original notebook on Dense Passage Retrieval: https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial6_Better_Retrieval_via_DPR.ipynb#scrollTo=kFwiPP60A6N7

## Prepare environment

In [1]:
# Make sure you have a GPU running
!nvidia-smi

# For local runs: the notebook opens in the example directory, but it should run from the main directory
import os
os.chdir("..")

!pip install git+https://github.com/deepset-ai/haystack.git # Install the latest master of Haystack
!pip install git+https://github.com/giguru/converse.git  # Install the latest master of Converse

Wed Oct  7 20:47:39 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   48C    P8    11W / 125W |    761MiB /  5943MiB |     18%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage    

In [2]:
from haystack import Finder
from haystack.preprocessor.cleaning import clean_wiki_text
from haystack.preprocessor.utils import convert_files_to_dicts, fetch_archive_from_http
from haystack.utils import print_answers

# The ConverSE readers are used here; importing the Haystack readers of the same name first would only be shadowed
from converse.src.reader.farm import FARMReader
from converse.src.reader.transformers import TransformersReader
from converse.src.retriever.dense_passage_retriever import DensePassageRetriever
from converse.src.converse import Converse

## Indexer and data

In [3]:
# Add the document collection to a DocumentStore. The original text is indexed here;
# conversion into embeddings is done below.
from haystack.document_store.faiss import FAISSDocumentStore
document_store = FAISSDocumentStore()


10/07/2020 20:47:50 - INFO - faiss -   Loading faiss.
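
The store created above is still empty. As a minimal sketch of how a collection could be prepared for it, the snippet below splits one article into fixed-size passages and wraps each in a dict. The helper `split_into_passages`, the 100-word passage size, and the `{"text": ..., "meta": {"name": ...}}` layout are assumptions for illustration, not something this notebook defines.

```python
from typing import Dict, List


def split_into_passages(title: str, text: str, max_words: int = 100) -> List[Dict]:
    """Split `text` into chunks of at most `max_words` words (hypothetical helper)."""
    words = text.split()
    passages = []
    for start in range(0, len(words), max_words):
        chunk = " ".join(words[start:start + max_words])
        # Assumed layout: passage text plus the article title under meta["name"]
        passages.append({"text": chunk, "meta": {"name": title}})
    return passages


# Toy article of 250 identical words, made up for illustration
article = "word " * 250
dicts = split_into_passages("Example article", article, max_words=100)
print(len(dicts), [len(d["text"].split()) for d in dicts])
```

Dicts of this shape could then be handed to `document_store.write_documents(dicts)`, assuming the store accepts that layout.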


In [4]:
from converse.src.retriever.dense_passage_retriever import DensePassageRetriever
retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",  # TODO replace with ORConvQA model
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",  # TODO replace with ORConvQA model
    use_gpu=True,
    embed_title=True,
    max_seq_len=256,
    batch_size=16,
    remove_sep_tok_from_untitled_passages=True
)

Some weights of DPRQuestionEncoder were not initialized from the model checkpoint at facebook/dpr-question_encoder-single-nq-base and are newly initialized: ['question_encoder.bert_model.embeddings.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of DPRContextEncoder were not initialized from the model checkpoint at facebook/dpr-ctx_encoder-single-nq-base and are newly initialized: ['ctx_encoder.bert_model.embeddings.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Embed passages
Since retrieval is done on the embeddings, the embedding representation of the documents needs to be computed.
This only needs to be done once.
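
To illustrate why the passage embeddings can be computed once up front: at query time only the question has to be encoded, and it is then scored against the stored passage vectors. The sketch below ranks passages by inner product, the similarity DPR uses. The vectors are toy values and the function names are hypothetical; in the notebook, the FAISS document store carries out this search.

```python
from typing import List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query_emb: List[float],
             passage_embs: List[List[float]],
             top_k: int = 2) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs for the top_k highest inner products."""
    scored = [(i, dot(query_emb, p)) for i, p in enumerate(passage_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional embeddings, made up for illustration
passage_embs = [
    [0.1, 0.9, 0.0],
    [0.8, 0.1, 0.1],
    [0.4, 0.4, 0.2],
]
query_emb = [1.0, 0.0, 0.0]

top = retrieve(query_emb, passage_embs, top_k=2)
print(top)
```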

In [5]:
# document_store.update_embeddings(retriever)

In [6]:
# Load a local model or any of the QA models on Hugging Face's model hub (https://huggingface.co/models)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

10/07/2020 20:48:23 - INFO - farm.utils -   device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: None
10/07/2020 20:48:23 - INFO - farm.infer -   Could not find `deepset/roberta-base-squad2` locally. Try to download from model hub ...
	 We guess it's an *ENGLISH* model ... 
	 If not: Init the language model by supplying the 'language' param.
10/07/2020 20:48:58 - INFO - farm.utils -   device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: None
10/07/2020 20:48:58 - INFO - farm.infer -   Got ya 7 parallel workers to do inference ...
10/07/2020 20:48:58 - INFO - farm.infer -    0    0    0    0    0    0    0 
10/07/2020 20:48:58 - INFO - farm.infer -   /w\  /w\  /w\  /w\  /w\  /w\  /w\
10/07/2020 20:48:58 - INFO - farm.infer -   /'\  / \  /'\  /'\  / \  / \  /'\
10/07/2020 20:48:58 - INFO - farm.infer -               
Process ForkPoolWorker-7:
Process ForkPoolWorker-1:
Process ForkPoolWorker-5:
Process ForkPoolWorker-2:

In [7]:
finder = Converse(reader, [retriever])

## Evaluate pipeline

In [8]:
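# Evaluate the combination of Reader and Retriever through Finder.
# eval() also requires `label_index` and `doc_index`; calling it without them raises the TypeError shown below.
# The index names here are placeholders (assumption): use the indices that hold your evaluation labels and documents.
label_index = "label"
doc_index = "document"
finder_eval_results = finder.eval(label_index=label_index, doc_index=doc_index, top_k_retriever=1, top_k_reader=10)
finder.print_eval_results(finder_eval_results)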

TypeError: eval() missing 2 required positional arguments: 'label_index' and 'doc_index'
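
The evaluation above did not run, so no metrics are shown. As a rough illustration of what a combined retriever and reader evaluation commonly measures, the sketch below computes retriever recall and reader exact match on toy data. The function names and all data are made up for illustration; this is not the implementation behind `finder.eval`.

```python
from typing import List


def retriever_recall(retrieved_ids: List[List[str]], gold_ids: List[str]) -> float:
    """Fraction of questions whose gold passage is among the retrieved passages."""
    hits = sum(1 for ids, gold in zip(retrieved_ids, gold_ids) if gold in ids)
    return hits / len(gold_ids)


def normalise(answer: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(answer.lower().split())


def exact_match(predictions: List[str], gold_answers: List[str]) -> float:
    """Fraction of predictions equal to the gold answer after normalisation."""
    hits = sum(1 for p, g in zip(predictions, gold_answers) if normalise(p) == normalise(g))
    return hits / len(gold_answers)


# Toy data for three questions, made up for illustration
retrieved = [["p1", "p7"], ["p3", "p4"], ["p9", "p2"]]
gold_passages = ["p7", "p5", "p9"]
predicted = ["Amsterdam", "in 1999", "the  Netherlands"]
gold_answers = ["amsterdam", "2001", "The Netherlands"]

recall = retriever_recall(retrieved, gold_passages)
em = exact_match(predicted, gold_answers)
print(f"recall: {recall:.2f}, exact match: {em:.2f}")
```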