# Tutorial: Retrieval of documents from a corpus using Neural Information Retrieval (IR)

In this tutorial you'll learn how to use DPR (Dense Passage Retrieval) [Karpukhin2020], a popular neural IR system.

## Step 1: Init -- initialize your model

In PrimeQA, searching through your corpus is done with a class called `SearchableCorpus`.

For DPR, you need to point to two models available on the Hugging Face model hub. The context encoder is passed as `model_name` and the question encoder as `query_encoder_model_name_or_path`.

```python
from primeqa.util import SearchableCorpus

collection = SearchableCorpus(model_name="PrimeQA/XOR-TyDi_monolingual_DPR_ctx_encoder",  # context (passage) encoder
                              query_encoder_model_name_or_path="PrimeQA/XOR-TyDi_monolingual_DPR_qry_encoder",  # question encoder
                              batch_size=64, top_k=10)
```
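Conceptually, DPR is a dual encoder. The context encoder (`model_name` above) maps every passage to a dense vector, and the question encoder maps each query into the same vector space. Passages are then ranked by their inner product with the query vector. Judging from the output in Step 3, `top_k=10` returns ten passages per query.

The toy sketch below illustrates only that ranking step. Hand-picked 3-dimensional vectors stand in for real encoder outputs. `rank_passages` is a hypothetical helper, not part of PrimeQA, and it says nothing about how `SearchableCorpus` stores or searches vectors internally.

```python
import heapq
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product: the query-passage similarity used by DPR."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(query_vec: Sequence[float],
                  passage_vecs: List[Sequence[float]],
                  top_k: int) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs for the top_k highest-scoring passages, best first."""
    scored = ((i, dot(query_vec, p)) for i, p in enumerate(passage_vecs))
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


# Hypothetical vectors standing in for context-encoder outputs.
passage_vecs = [
    [0.9, 0.1, 0.0],  # passage 0
    [0.1, 0.8, 0.1],  # passage 1
    [0.5, 0.5, 0.0],  # passage 2
]
# Hypothetical vector standing in for a question-encoder output.
query_vec = [1.0, 0.2, 0.0]

top = rank_passages(query_vec, passage_vecs, top_k=2)
print([i for i, _ in top])  # → [0, 2]
```

Scoring every passage this way costs time linear in the corpus size. Large collections are therefore usually searched through a vector index instead.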

## Step 2: Add -- add your documents into the searchable corpus.

In this step, create a TSV file in which each line holds three tab-separated fields:
`id \t text \t title_of_document`

Note: DPR is built on an encoder language model, whose maximum input length is typically 512 sub-word tokens. Make sure your documents are split into passages of roughly 220 words.
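One way to prepare such a file is to split each document into consecutive chunks of about 220 words and write one row per chunk. The sketch below uses only the Python standard library. The following are hypothetical choices made for illustration:

- the helper names `split_into_passages` and `write_passages_tsv`
- the output path
- the running-integer ID scheme

The sketch writes no header row. It is an assumption that `add_documents` accepts a header-less file.

```python
import os
import tempfile
from typing import Iterable, List, Tuple


def split_into_passages(text: str, max_words: int = 220) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    # str.split() also drops tabs and newlines, which would otherwise break the TSV rows.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def write_passages_tsv(docs: Iterable[Tuple[str, str]], path: str,
                       max_words: int = 220) -> int:
    """Write (title, text) documents as tab-separated id/text/title rows.

    IDs are a running integer starting at 1 (a hypothetical scheme).
    Returns the number of rows written. No header row is written.
    """
    n_rows = 0
    with open(path, "w", encoding="utf-8") as f:
        for title, text in docs:
            clean_title = " ".join(title.split())
            for passage in split_into_passages(text, max_words):
                n_rows += 1
                f.write(f"{n_rows}\t{passage}\t{clean_title}\n")
    return n_rows


# Hypothetical input: (title, full text) pairs.
docs = [
    ("Example document", "word " * 500),       # 500 words -> 3 passages
    ("Short note", "Just a few words here."),  # -> 1 passage
]
out_path = os.path.join(tempfile.gettempdir(), "my_passages.tsv")  # hypothetical location
print(write_passages_tsv(docs, out_path))  # → 4
```

Splitting on a fixed word count can cut sentences in half. Overlapping windows or sentence-aware splitting are common alternatives.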

```python
# an example subset of Natural Questions (NQ) passages; replace the path with your own TSV file
collection.add_documents("/dccstor/avi8/vector_db/sample.psgs.tsv")
```

49it [00:00, 3919.01it/s]


49



## Step 3: Search -- start asking questions.

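Your queries can be passed to `search` as a list. It returns the IDs of the retrieved documents together with the retrieved passages. You can also retrieve the scores of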
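The printed output of this step suggests that `passages` holds one entry per query. Each entry appears to be a dict with parallel `titles` and `texts` lists. Assuming that shape, the hypothetical helper `pair_results` below regroups each query's results into (rank, title, snippet) rows for easier reading.

Rank simply follows list order, which is assumed, not confirmed, to be best-first. The sample data is placeholder text that only mirrors the structure.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, str]  # (rank, title, text snippet)


def pair_results(queries: List[str],
                 passages: List[Dict[str, List[str]]],
                 snippet_chars: int = 60) -> List[Tuple[str, List[Row]]]:
    """Pair each query with its (rank, title, snippet) rows.

    Assumes passages[i] == {"titles": [...], "texts": [...]} belongs to queries[i].
    """
    grouped = []
    for query, hits in zip(queries, passages):
        rows = [(rank, title, text[:snippet_chars])
                for rank, (title, text) in enumerate(zip(hits["titles"], hits["texts"]), start=1)]
        grouped.append((query, rows))
    return grouped


# Placeholder data shaped like the printed output (not real retrieval results).
sample_queries = ["When was Idaho split in two?"]
sample_passages = [{"titles": ["Title A", "Title B"],
                    "texts": ["passage text A", "passage text B"]}]

for query, rows in pair_results(sample_queries, sample_passages):
    print(query)
    for rank, title, snippet in rows:
        print(f"  {rank}. {title}: {snippet}")
```

With real results, the same helper would be called as `pair_results(queries, passages)` after the `search` call.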

```python
import json

queries = ['When was Idaho split in two?', 'Who was Danny Nozel']
retrieved_doc_ids, passages = collection.search(queries)
# res, scores = collection.search2(queries)
print(json.dumps(passages, indent=4))
```

[
    {
        "titles": [
            "I Will Always Love You",
            "History of Idaho",
            "I Will Always Love You",
            "I Will Always Love You",
            "History of Idaho",
            "I Will Always Love You",
            "I Will Always Love You",
            "I Will Always Love You",
            "I Will Always Love You",
            "History of Idaho"
        ],
        "texts": [
            "- end review two years in a row . In Australia , it was the number 17 single of 1992 and the number two song of 1993 . Houston 's single was also an international success , peaking at number one of the singles charts in almost all countries , including the Eurochart Hot 100 Singles , where it spent 13 weeks at the top . The single ruled the summit position for ten weeks in Australia , five weeks in Austria , seven weeks for Belgium , eight weeks in France , six weeks in Germany , eight weeks in Ireland , two weeks in Italy , six weeks in Netherlands , 11 weeks i