# Experimentation: Testing models that do not require training a fusion model (no learning to rank)

### Two things to do before we start:
1. Point the environment variable `COLLECT_ROOT` to the collection root.
2. Change the current directory to the location of the installed scripts/binaries.
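The notebook does both steps with IPython magics (`%env` and `cd`). In a plain Python session, the equivalent is to set `os.environ` and call `os.chdir`. Commands started later with `!` or `subprocess` inherit both settings. Here is a minimal sketch. The helper name `prepare_environment` is hypothetical. The paths in the usage note are the ones from this notebook.

```python
import os
from pathlib import Path


def prepare_environment(collect_root: str, scripts_dir: str) -> None:
    """Set COLLECT_ROOT and switch to the scripts directory (like %env and cd)."""
    root = Path(collect_root)
    scripts = Path(scripts_dir)
    # Fail early if either directory is missing, before changing any state.
    if not root.is_dir():
        raise FileNotFoundError(f"collection root not found: {root}")
    if not scripts.is_dir():
        raise FileNotFoundError(f"scripts directory not found: {scripts}")
    # Child processes (e.g. ./exper/run_experiments.sh) inherit os.environ.
    os.environ["COLLECT_ROOT"] = str(root)
    # Relative script paths such as ./exper/... resolve against this directory.
    os.chdir(scripts)
```

Usage: `prepare_environment("/home/leo/flexneuart_collections", "/home/leo/flexneuart_scripts")`.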

In [2]:
%env COLLECT_ROOT=/home/leo/flexneuart_collections

env: COLLECT_ROOT=/home/leo/flexneuart_collections


In [3]:
cd /home/leo/flexneuart_scripts/

/home/leo/flexneuart_scripts


## Testing BM25
We use the optimal BM25 parameters obtained during tuning:

In [4]:
!./exper/run_experiments.sh \
  wikipedia_dpr_nq_sample \
  exper_desc.best/bm25.json \
  -test_part dev

Using collection root: /home/leo/flexneuart_collections
The number of CPU cores:      8
The number of || experiments: 1
The number of threads:        8
Experiment descriptor file:                                 /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/exper_desc.best/bm25.json
Default test set:                                           dev
Number of parallel experiments:                             1
Number of threads in feature extractors/query applications: 8
Parsed experiment parameters:
experSubdir:final_exper/bm25
candProvAddConf:exper_desc.best/lucene.json
extrTypeFinal:exper_desc.best/extractors/bm25.json
modelFinal:exper_desc.best/models/one_feat.model
testOnly:1
Started a process 8306, working dir: /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/results/dev/final_exper/bm25
Process log file: /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/results/dev/final_exper/bm25/exper.log
Waiting for 1 child processes
Process with pid=8306 finished succe
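You can also launch the same experiment from Python rather than through a `!` shell escape. The sketch below only reassembles the command line shown in the cell above. It uses the positional arguments and flags that appear in this notebook and nothing else. The helper names `build_experiment_cmd` and `run_experiment` are hypothetical. Because the script path is relative, the current directory must be the scripts directory.

```python
import subprocess
from typing import List, Optional


def build_experiment_cmd(collection: str, exper_desc: str, test_part: str = "dev",
                         extra_args: Optional[List[str]] = None) -> List[str]:
    """Assemble the argument list for ./exper/run_experiments.sh."""
    cmd = ["./exper/run_experiments.sh", collection, exper_desc, "-test_part", test_part]
    if extra_args:
        # e.g. ["-thread_qty", "2"] or ["-test_cand_qty_list", "100,200"]
        cmd.extend(extra_args)
    return cmd


def run_experiment(collection: str, exper_desc: str, test_part: str = "dev",
                   extra_args: Optional[List[str]] = None) -> None:
    """Run the experiment script and wait for it to finish."""
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(build_experiment_cmd(collection, exper_desc, test_part, extra_args),
                   check=True)
```

Usage: `run_experiment("wikipedia_dpr_nq_sample", "exper_desc.best/bm25.json")`. Passing an argument list instead of a shell string avoids quoting problems.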

All the results are available in the directory `$COLLECT_ROOT/wikipedia_dpr_nq_sample/results/dev/final_exper/bm25`.

The following is a summary report (top-100):

In [6]:
!cat $COLLECT_ROOT/wikipedia_dpr_nq_sample/results/dev/final_exper/bm25/rep/out_100.rep

# of queries:    2500
NDCG@10:  0.400400
NDCG@20:  0.433900
NDCG@100: 0.507400
P20:      0.164800
MAP:      0.346200
MRR:      0.487300
Recall:   0.817827
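Comparing runs is easier if each summary report is loaded into a dictionary. The sketch below assumes the format is exactly what the report above shows: one `name: value` pair per line. The format has not been verified beyond these lines. The function names are hypothetical.

```python
from typing import Dict


def parse_report(text: str) -> Dict[str, float]:
    """Parse a summary report such as rep/out_100.rep into {metric: value}."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        # Split on the first colon only: "NDCG@10:  0.400400" -> ("NDCG@10", "  0.400400")
        key, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # skip blank or malformed lines
        metrics[key.strip()] = float(value)  # float() tolerates surrounding whitespace
    return metrics


def load_report(path: str) -> Dict[str, float]:
    """Read and parse a report file."""
    with open(path, encoding="utf-8") as f:
        return parse_report(f.read())
```

For example, `load_report(os.path.join(os.environ["COLLECT_ROOT"], "wikipedia_dpr_nq_sample/results/dev/final_exper/bm25/rep/out_100.rep"))["NDCG@10"]` would return the NDCG@10 value from the report above.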


## Sanity check: testing whether the intermediate re-ranker functionality works

In [7]:
# The results should be the same as for the BM25 re-ranker
!./exper/run_experiments.sh \
  wikipedia_dpr_nq_sample \
  exper_desc.best/bm25_test_interm.json \
  -test_part dev \
  -test_cand_qty_list 100,200

Using collection root: /home/leo/flexneuart_collections
The number of CPU cores:      8
The number of || experiments: 1
The number of threads:        8
Experiment descriptor file:                                 /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/exper_desc.best/bm25_test_interm.json
Default test set:                                           dev
Number of parallel experiments:                             1
Number of threads in feature extractors/query applications: 8
Parsed experiment parameters:
experSubdir:final_exper/bm25_test_interm
candProvAddConf:exper_desc.best/lucene.json
extrTypeInterm:exper_desc.best/extractors/bm25.json
modelInterm:exper_desc.best/models/one_feat.model
candProvQty:5000
extrTypeFinal:exper_desc.best/extractors/bm25.json
modelFinal:exper_desc.best/models/one_feat.model
testOnly:1
Started a process 8439, working dir: /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/results/dev/final_exper/bm25_test_interm
Process log file: /home/l

Top-100 results should be the same as for the BM25 re-ranker (the numbers below agree with the BM25 report to within 0.0004):

In [9]:
!cat $COLLECT_ROOT/wikipedia_dpr_nq_sample/results/dev/final_exper/bm25_test_interm/rep/out_100.rep

# of queries:    2500
NDCG@10:  0.400600
NDCG@20:  0.434100
NDCG@100: 0.507700
P20:      0.164800
MAP:      0.346400
MRR:      0.487700
Recall:   0.817879
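To make this sanity check explicit, you can compare the two reports metric by metric. The sketch below hard-codes the values printed above and reports the largest absolute difference. The helper name is hypothetical. Any pass/fail threshold, such as the 1e-3 mentioned in the note below, is an arbitrary choice and not a FlexNeuART setting.

```python
from typing import Dict, Tuple


def max_metric_diff(a: Dict[str, float], b: Dict[str, float]) -> Tuple[str, float]:
    """Return (metric, |a - b|) for the metric that differs most between two reports."""
    common = sorted(set(a) & set(b))
    if not common:
        raise ValueError("reports have no metrics in common")
    name = max(common, key=lambda k: abs(a[k] - b[k]))
    return name, abs(a[name] - b[name])


# Values copied from the two top-100 reports above.
bm25 = {"NDCG@10": 0.4004, "NDCG@20": 0.4339, "NDCG@100": 0.5074, "P20": 0.1648,
        "MAP": 0.3462, "MRR": 0.4873, "Recall": 0.817827}
bm25_interm = {"NDCG@10": 0.4006, "NDCG@20": 0.4341, "NDCG@100": 0.5077, "P20": 0.1648,
               "MAP": 0.3464, "MRR": 0.4877, "Recall": 0.817879}

print(max_metric_diff(bm25, bm25_interm))
```

For these two runs, the largest gap is in MRR, at about 0.0004. That is well under a 1e-3 tolerance.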


## Testing dense retrieval (ANCE) in the re-ranking mode

In [10]:
!./exper/run_experiments.sh \
  wikipedia_dpr_nq_sample \
  exper_desc.best/ance.json \
  -test_part dev

Using collection root: /home/leo/flexneuart_collections
The number of CPU cores:      8
The number of || experiments: 1
The number of threads:        8
Experiment descriptor file:                                 /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/exper_desc.best/ance.json
Default test set:                                           dev
Number of parallel experiments:                             1
Number of threads in feature extractors/query applications: 8
Parsed experiment parameters:
experSubdir:final_exper/ance
candProvAddConf:exper_desc.best/lucene.json
extrTypeFinal:exper_desc.best/extractors/ance.json
modelFinal:exper_desc.best/models/one_feat.model
testOnly:1
Started a process 8537, working dir: /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/results/dev/final_exper/ance
Process log file: /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/results/dev/final_exper/ance/exper.log
Waiting for 1 child processes
Process with pid=8537 finished succe

Top-100 report:

In [11]:
!cat $COLLECT_ROOT/wikipedia_dpr_nq_sample/results/dev/final_exper/ance/rep/out_100.rep

# of queries:    2500
NDCG@10:  0.649100
NDCG@20:  0.651700
NDCG@100: 0.692300
P20:      0.152200
MAP:      0.555200
MRR:      0.865000
Recall:   0.639296


## Testing dense retrieval (averaged GloVe embeddings) in the re-ranking mode
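The following is a rough illustration of the averaged-embedding idea. The actual extractor configuration (`exper_desc.best/extractors/avgembed.json`) is not shown here, so the details are assumptions: unweighted averaging, whitespace tokenization, and cosine similarity. Under these assumptions, each text is represented by the mean of its word vectors. The candidates returned by the first-stage retriever are then re-ordered by similarity to the query vector. All function names and the toy 2-d vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def average_embedding(tokens: Sequence[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Unweighted mean of the vectors of in-vocabulary tokens (zero vector if none)."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def rerank_by_avg_embedding(query: str, candidates: Dict[str, str],
                            vectors: Dict[str, List[float]]) -> List[str]:
    """Return candidate ids sorted by decreasing cosine similarity to the query."""
    q = average_embedding(query.lower().split(), vectors)
    scores = {doc_id: cosine(q, average_embedding(text.lower().split(), vectors))
              for doc_id, text in candidates.items()}
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

Averaging discards word order and gives every word the same weight. This helps explain why such a representation is usually much weaker than a trained encoder such as ANCE.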

In [13]:
!./exper/run_experiments.sh \
  wikipedia_dpr_nq_sample \
  exper_desc.best/avgembed.json \
  -test_part dev

Using collection root: /home/leo/flexneuart_collections
The number of CPU cores:      8
The number of || experiments: 1
The number of threads:        8
Experiment descriptor file:                                 /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/exper_desc.best/avgembed.json
Default test set:                                           dev
Number of parallel experiments:                             1
Number of threads in feature extractors/query applications: 8
Parsed experiment parameters:
experSubdir:final_exper/avgembed
candProvAddConf:exper_desc.best/lucene.json
extrTypeFinal:exper_desc.best/extractors/avgembed.json
modelFinal:exper_desc.best/models/one_feat.model
testOnly:1
Experimental directory already exists (removing contents): /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/results/dev/final_exper/avgembed
Cleaning the experimental directory: /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/results/dev/final_exper/avgembed
Started a proce

Top-100 report:

In [15]:
!cat $COLLECT_ROOT/wikipedia_dpr_nq_sample/results/dev/final_exper/avgembed/rep/out_100.rep

# of queries:    2500
NDCG@10:  0.144400
NDCG@20:  0.157500
NDCG@100: 0.216500
P20:      0.067500
MAP:      0.101900
MRR:      0.225200
Recall:   0.419581


## Testing a BERT ranking model
One needs to start a query server that binds to port 8080, as shown below. This needs to be done in __a separate terminal__, because notebooks do not support background processes. Please note that we have to specify __the same maximum query and document lengths__ as during training.

```
COLLECT_ROOT=/home/leo/flexneuart_collections

./featextr_server/nn_rank_server.py  \
   --init_model $COLLECT_ROOT/wikipedia_dpr_nq_sample/derived_data/ir_models/vanilla_bert/model.best \
   --port 8080
```
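Before you launch the experiment, it can help to confirm that the server is up. The sketch below only checks that something on `localhost:8080` accepts TCP connections. It does not verify that the listener is `nn_rank_server.py` or that the model has finished loading. The helper name is hypothetical.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 8080,
                        timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection returns a connected socket; close it right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unresolvable host, ...
        return False
```

Usage: `assert server_is_listening(port=8080), "start nn_rank_server.py first"`.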

Note that we re-rank only the top 50 candidates (`-max_final_rerank_qty 50`). The ranking of candidates below the 50th position does not change.
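The sketch below is a conceptual illustration of what limiting re-ranking to the top candidates means. It is not FlexNeuART's implementation. The first `k` candidates are re-ordered by the (expensive) model score. The remaining candidates stay in their original order after them. The function name and the toy scores are hypothetical.

```python
from typing import Callable, List, Sequence


def rerank_top_k(candidates: Sequence[str], score: Callable[[str], float], k: int) -> List[str]:
    """Re-order only the first k candidates by score; keep the tail in its original order."""
    head = sorted(candidates[:k], key=score, reverse=True)
    # The model is never called on the tail, so its order is untouched.
    return head + list(candidates[k:])


scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 10.0}
print(rerank_top_k(["a", "b", "c", "d"], scores.__getitem__, 3))
```

Candidate `d` stays last even though its score is the highest, because it lies outside the re-ranked prefix. Limiting `k` bounds the number of expensive BERT calls per query.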

In [22]:
!./exper/run_experiments.sh \
  wikipedia_dpr_nq_sample \
  exper_desc.best/cedr8080.json \
  -thread_qty 2 \
  -max_final_rerank_qty 50 \
  -test_part dev 

Using collection root: /home/leo/flexneuart_collections
The number of CPU cores:      8
The number of || experiments: 1
The number of threads:        2
Experiment descriptor file:                                 /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/exper_desc.best/cedr8080.json
Default test set:                                           dev
Number of parallel experiments:                             1
Number of threads in feature extractors/query applications: 2
Parsed experiment parameters:
experSubdir:final_exper/cedr8080
extrTypeFinal:exper_desc.best/extractors/cedr8080.json
modelFinal:exper_desc.best/models/one_feat.model
testOnly:1
Started a process 9373, working dir: /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/results/dev/final_exper/cedr8080
Process log file: /home/leo/flexneuart_collections/wikipedia_dpr_nq_sample/results/dev/final_exper/cedr8080/exper.log
Waiting for 1 child processes
Process with pid=9373 finished successfully.
Waiting for 0 c

Top-100 report:

In [23]:
!cat $COLLECT_ROOT/wikipedia_dpr_nq_sample/results/dev/final_exper/cedr8080/rep/out_100.rep

# of queries:    2500
NDCG@10:  0.570000
NDCG@20:  0.583500
NDCG@100: 0.625800
P20:      0.190800
MAP:      0.497300
MRR:      0.678000
Recall:   0.808185
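The four top-100 reports in this notebook can be put side by side. The sketch below uses a subset of the metrics, with values copied from the reports above. It prints a fixed-width table and picks the best run for a given metric. The helper names are hypothetical.

```python
from typing import Dict, List

# Top-100 numbers copied from the reports above (subset of the metrics).
RESULTS: Dict[str, Dict[str, float]] = {
    "bm25":     {"NDCG@10": 0.4004, "MAP": 0.3462, "MRR": 0.4873, "Recall": 0.817827},
    "ance":     {"NDCG@10": 0.6491, "MAP": 0.5552, "MRR": 0.8650, "Recall": 0.639296},
    "avgembed": {"NDCG@10": 0.1444, "MAP": 0.1019, "MRR": 0.2252, "Recall": 0.419581},
    "cedr8080": {"NDCG@10": 0.5700, "MAP": 0.4973, "MRR": 0.6780, "Recall": 0.808185},
}


def format_table(results: Dict[str, Dict[str, float]], metrics: List[str]) -> str:
    """Render runs (rows) vs. metrics (columns) as a fixed-width text table."""
    header = f"{'run':<10}" + "".join(f"{m:>10}" for m in metrics)
    rows = [f"{run:<10}" + "".join(f"{vals[m]:>10.4f}" for m in metrics)
            for run, vals in results.items()]
    return "\n".join([header] + rows)


def best_run(results: Dict[str, Dict[str, float]], metric: str) -> str:
    """Name of the run with the highest value of the given metric."""
    return max(results, key=lambda run: results[run][metric])


print(format_table(RESULTS, ["NDCG@10", "MAP", "MRR", "Recall"]))
```

With these numbers, `best_run(RESULTS, "NDCG@10")` is `"ance"`, while `best_run(RESULTS, "Recall")` is `"bm25"`.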
