# Sparse Retrieval

In this notebook, we run BM25 search with Pyserini over a small sample index of Wikipedia passages.

In [80]:
# Pyserini requires Java 11.
# install-jdk is a Python utility that downloads and installs a given Java version.
!pip install install-jdk



In [81]:
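import os

import jdk

# Remove any previous JDK 11 install, then install Java JDK 11 into $HOME/.jdk/<VERSION>
jdk.uninstall('11')
jdk_dir = jdk.install('11')

# Point JAVA_HOME at the newly installed JDK so Pyserini can find it
os.environ["JAVA_HOME"] = jdk_dir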

In [82]:
from oneqa.ir.sparse.retriever import PyseriniRetriever
import pandas as pd

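Set some parameters:
- index_path: path to the Pyserini index to search
- queries: example queries
- top_k: number of hits to return per query
- k1: BM25 parameter that controls term-frequency saturation (how much repeated occurrences of a query term add to the score)
- b: BM25 parameter that controls document-length normalization (0 = no normalization, 1 = full normalization)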
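The roles of `k1` and `b` are easiest to see in the per-term BM25 weight. The sketch below uses the textbook Okapi BM25 formula with the smoothed IDF that Lucene uses; the function name `bm25_term_weight` and all collection statistics are illustrative assumptions. Lucene's own implementation differs in small details (for example, it stores document lengths lossily), so this will not reproduce the scores printed in this notebook exactly.

```python
import math


def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=0.9, b=0.4):
    """Okapi BM25 contribution of one query term to one document's score (sketch)."""
    # Smoothed IDF: rarer terms get larger weights, and the value is never negative
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b = 0 ignores length, b = 1 fully scales by doc_len / avg_doc_len
    norm = 1 - b + b * doc_len / avg_doc_len
    # Term-frequency saturation: the weight grows with tf but never exceeds idf * (k1 + 1)
    return idf * tf * (k1 + 1) / (tf + k1 * norm)


if __name__ == "__main__":
    # Made-up collection statistics: 1,000 documents, the term appears in 10 of them
    for tf in (1, 2, 5, 10):
        short_w = bm25_term_weight(tf, doc_len=50, avg_doc_len=100, n_docs=1000, doc_freq=10)
        long_w = bm25_term_weight(tf, doc_len=200, avg_doc_len=100, n_docs=1000, doc_freq=10)
        print(f"tf={tf:2d}  short doc: {short_w:.3f}  long doc: {long_w:.3f}")
```

Each extra occurrence of a term adds less than the previous one (a larger `k1` delays this saturation), and with `b = 0.4` the same term count is worth more in a shorter-than-average document than in a longer one.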

In [83]:
index_path = '../../../tests/resources/sample_wiki_psgs_w100_index'
queries = [
    'who designed the South African 1961 one-cent postage stamp',
    'vitamin e deficiency',
    'where is the Presanella located'
]
top_k = 5
k1 = 0.9  # BM25 term-frequency saturation
b = 0.4   # BM25 document-length normalization

In [84]:
# Instantiate a BM25 retriever over the index, using the k1 and b set above
searcher = PyseriniRetriever(index_path, use_bm25=True, k1=k1, b=b)
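To make the retrieval step less of a black box, here is a minimal in-memory BM25 ranker in plain Python. It is a sketch, not how `PyseriniRetriever` works internally (scoring there is done by Lucene): it assumes a naive lowercase whitespace tokenizer (Lucene's analyzers also stem and drop stopwords), indexes only the passage text, and `bm25_search` and the toy corpus are hypothetical. It returns records in the same `(rank, score, doc_id, title, text)` order that the next cell passes to `pd.DataFrame.from_records`, with 0-based ranks as in the output below.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace
    return text.lower().split()


def bm25_search(query, docs, top_k=5, k1=0.9, b=0.4):
    """Hypothetical toy BM25 ranker.

    docs maps doc_id -> (title, text); only the text is indexed.
    Returns up to top_k records (rank, score, doc_id, title, text) with 0-based ranks.
    """
    if not docs:
        return []
    tokenized = {doc_id: tokenize(text) for doc_id, (_, text) in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for toks in tokenized.values() for term in set(toks))
    query_terms = set(tokenize(query))  # repeated query terms are counted once here

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        norm = 1 - b + b * len(toks) / avg_len  # document-length normalization
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        if score > 0:  # keep only documents matching at least one query term
            scores[doc_id] = score

    # Highest score first; ties broken by doc_id for a stable order
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return [
        (rank, score, doc_id, docs[doc_id][0], docs[doc_id][1])
        for rank, (doc_id, score) in enumerate(ranked)
    ]


if __name__ == "__main__":
    # Made-up toy corpus, for illustration only
    toy_docs = {
        "d1": ("Stamp", "the one-cent postage stamp from 1961"),
        "d2": ("Vitamin E", "vitamin e deficiency is a medical condition"),
        "d3": ("Presanella", "the presanella is a mountain"),
    }
    for record in bm25_search("vitamin e deficiency", toy_docs, top_k=2):
        print(record)
```

Only documents that share at least one term with the query are returned, which is why a toy query can yield fewer than `top_k` hits.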

In [85]:
# Run each query and print its top_k hits as a DataFrame
for query in queries:
    hits = searcher.retrieve(query, top_k)
    # One row per hit, with columns rank, score, doc_id, title, text
    df = pd.DataFrame.from_records(hits, columns=['rank', 'score', 'doc_id', 'title', 'text'])
    print('\n************************')
    print(f'QUERY:{query}')
    print(df)



************************
QUERY:who designed the South African 1961 one-cent postage stamp
   rank      score    doc_id                                      title  \
0     0  17.771099  20076582                             Nerine Desmond   
1     1   2.901800  19750546  SOS-Hermann Gmeiner International College   
2     2   2.230800  14213975                 Dana (South Korean singer)   
3     3   2.161900  14796077                      A Cottage on Dartmoor   
4     4   2.125000   9503472                            Nayaks of Kandy   

                                                text  
0  A South African 1961 one-cent postage stamp ca...  
1  SOS-Hermann Gmeiner International College SOS-...  
2  Dana (South Korean singer) Hong Sung-mi (born ...  
3  date; however the evening turns out awkwardly ...  
4  kith and kin. Narenappa Nayaka was destined to...  

************************
QUERY:vitamin e deficiency
   rank   score    doc_id                                    title  \
0    