# Tutorial: Retrieval of documents from a corpus using Neural Information Retrieval (IR)

In this tutorial, you'll learn how to use DPR (Dense Passage Retrieval) [Karpukhin2020], a popular neural IR system.


## Step 0: Install the required packages

In [None]:
! pip install primeqa

## Step 1: Init -- initialize your model

To search through your corpus, PrimeQA uses a class called `SearchableCorpus`.

For DPR, you need to point it to a question encoder model and a context encoder model, both available via the Hugging Face model hub.

In [None]:
from primeqa.components.retriever.searchable_corpus import SearchableCorpus
collection = SearchableCorpus(model_name="PrimeQA/XOR-TyDi_monolingual_DPR_ctx_encoder", 
                              query_encoder_model_name_or_path="PrimeQA/XOR-TyDi_monolingual_DPR_qry_encoder", 
                              batch_size=64, top_k=10)
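
To make the role of the two encoders concrete, here is a toy sketch of how a dual-encoder retriever such as DPR ranks passages: the context encoder maps each passage to a vector, the question encoder maps the query to a vector, and passages are ordered by inner product. The vectors and the names `dot`, `rank_passages`, `toy_passage_vecs`, and `toy_query_vec` are made up for illustration and are not part of PrimeQA; real encoders produce much higher-dimensional vectors.

```python
# Toy illustration of dual-encoder ranking with hand-written vectors.
# A real DPR encoder would compute these vectors from text.
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(
    query_vec: List[float],
    passage_vecs: Dict[str, List[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Return the top_k (passage_id, score) pairs, highest score first."""
    scored = [(pid, dot(query_vec, vec)) for pid, vec in passage_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical passage embeddings (what a context encoder would output).
toy_passage_vecs = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.0, 1.0, 0.0],
    "p3": [0.5, 0.5, 0.0],
}
# Hypothetical query embedding (what a question encoder would output).
toy_query_vec = [0.9, 0.1, 0.0]

top = rank_passages(toy_query_vec, toy_passage_vecs, top_k=2)
print(top)
```

The `top_k` argument here plays the same role as `top_k=10` in the cell above: it limits how many passages are returned per query.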

## Step 2: Add -- add your documents into the searchable corpus.

In this step, you create a TSV file with the following format:
`id \t text \t title_of_document`
Note: DPR is based on an encoder language model whose maximum sequence length is typically 512 sub-word tokens, so make sure your documents are split into passages of about 220 words each.
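
One possible way to prepare such a file is sketched below. It splits each document on whitespace into chunks of at most 220 words and writes them with an `id`, `text`, `title` header row. The helper names `split_into_passages` and `write_collection_tsv`, the sample document, and the temporary output path are all assumptions made for this example; word-based splitting is only a rough proxy for the sub-word token limit.

```python
# Sketch: split documents into ~220-word passages and write a TSV file.
# Helper names and sample data are hypothetical, not part of PrimeQA.
import os
import tempfile
from typing import Iterable, List, Tuple


def split_into_passages(text: str, max_words: int = 220) -> List[str]:
    """Split text on whitespace into chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def write_collection_tsv(
    docs: Iterable[Tuple[str, str]],
    out_path: str,
    max_words: int = 220,
) -> int:
    """Write (title, text) pairs as id/text/title rows; return the row count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("id\ttext\ttitle\n")
        for title, text in docs:
            # Collapse whitespace so a stray tab or newline cannot break a row.
            clean_title = " ".join(title.split())
            for passage in split_into_passages(text, max_words):
                count += 1
                f.write(f"{count}\t{passage}\t{clean_title}\n")
    return count


# Placeholder document: 500 repeated words under a made-up title.
sample_docs = [("Hypothetical title", "word " * 500)]
out_path = os.path.join(tempfile.mkdtemp(), "sample.tsv")
n_passages = write_collection_tsv(sample_docs, out_path)
print(n_passages, "passages written to", out_path)
```

Because `str.split()` with no arguments splits on any whitespace, the passages themselves never contain tabs or newlines, which keeps each row on one line with exactly three columns.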

In [None]:
# Update this path to point to your collection TSV file (columns: id, text, title; the file must include these headers)
path_to_collection_tsv_file="../path_to_tsv/sample.tsv"
collection.add_documents(path_to_collection_tsv_file)

## Step 3: Search -- start asking questions.

You can pass several queries at once as a list. You can also retrieve the scores of the retrieved documents.

In [None]:
import json

queries = ['When was Idaho split in two?', 'Who was Danny Nozel']
retrieved_doc_ids, passages = collection.search(queries)
# res, scores = collection.search2(queries)
print(json.dumps(passages, indent=4))
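
When you send several queries at once, it can help to group the results by query before inspecting them. The sketch below assumes that the returned IDs form one list per query, in the same order as the queries; that shape is an assumption and should be checked against your actual output. The helper `group_by_query` and the placeholder IDs are hypothetical and are not part of PrimeQA.

```python
# Sketch: pair each query with its list of retrieved document IDs.
# Assumes one result list per query, in query order (verify on real output).
from typing import Dict, List, Sequence


def group_by_query(
    queries: Sequence[str],
    per_query_ids: Sequence[Sequence[str]],
) -> Dict[str, List[str]]:
    """Map each query string to the IDs retrieved for it."""
    if len(queries) != len(per_query_ids):
        raise ValueError(
            f"Got {len(queries)} queries but {len(per_query_ids)} result lists"
        )
    return {query: list(ids) for query, ids in zip(queries, per_query_ids)}


# Placeholder data standing in for real search output.
example_queries = ["When was Idaho split in two?", "Who was Danny Nozel"]
example_ids = [["id-a", "id-b"], ["id-c"]]

grouped = group_by_query(example_queries, example_ids)
for query, ids in grouped.items():
    print(query, "->", ids)
```

The length check fails early if the two sequences do not line up, which is a cheap way to notice that the assumed output shape does not hold.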