# Experimentation: Training & testing fusion models

### First, we need to move to the top-level directory

```
cd ../..
```

```
/home/leo/SourceTreeGit/FlexNeuART.refact2021
```


## Training a fusion of BM25 and Model1 using the optimal configuration obtained during the fine-tuning step

Training uses only the first 5000 queries from the fusion set:

```
!scripts/exper/run_experiments.sh \
   wikipedia_dpr_nq_sample \
   exper_desc.best/bm25_model1.json \
   -max_num_query_train 5000 \
   -train_cand_qty 20 \
   -test_part dev
```

```
Using collection root: collections
The number of CPU cores:      8
The number of || experiments: 1
The number of threads:        8
Experiment descriptor file:                                 collections/wikipedia_dpr_nq_sample/exper_desc.best/bm25_model1.json
Default test set:                                           dev
Number of parallel experiments:                             1
Number of threads in feature extractors/query applications: 8
Parsed experiment parameters:
experSubdir:feat_exper/bm25_model1
extrType:exper_desc.best/extractors/bm25=text+model1=text_bert_tok+lambda=0.3+probSelfTran=0.35.json
testOnly:0
Started a process 18046, working dir: collections/wikipedia_dpr_nq_sample/results/dev/feat_exper/bm25_model1
Process log file: collections/wikipedia_dpr_nq_sample/results/dev/feat_exper/bm25_model1/exper.log
Waiting for 1 child processes
Process with pid=18046 finished successfully.
Waiting for 0 child processes
1 experiments executed
0 experiments failed
```
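The extractor name in the log encodes the Model1 settings (`lambda=0.3`, `probSelfTran=0.35`). To give a rough idea of what these parameters plausibly control, here is a minimal sketch of textbook IBM Model 1 scoring with Jelinek-Mercer smoothing and a fixed self-translation probability. We *assume* that `lambda` is the weight of the collection model and that `probSelfTran` is the probability of a word translating into itself; the translation table and probabilities are toy values, and FlexNeuART's actual implementation may differ in details.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical translation table: tran[q][w] = T(q | w), the probability that
# document word w "translates" into query word q. Toy values only.
TranTable = Dict[str, Dict[str, float]]


def tran_prob(tran: TranTable, q: str, w: str, prob_self_tran: float) -> float:
    """T'(q|w): assumed to give self-translation a fixed probability and to
    scale all other translation probabilities by (1 - prob_self_tran)."""
    if q == w:
        return prob_self_tran
    return (1.0 - prob_self_tran) * tran.get(q, {}).get(w, 0.0)


def model1_log_score(
    query: List[str],
    doc: List[str],
    tran: TranTable,
    coll_prob: Dict[str, float],
    lam: float,
    prob_self_tran: float,
    min_prob: float = 1e-9,
) -> float:
    """log P(query | doc) under IBM Model 1 with Jelinek-Mercer smoothing."""
    doc_len = len(doc)
    doc_tf = Counter(doc)
    score = 0.0
    for q in query:
        # Translation component: sum over doc words w of T'(q|w) * P(w|doc),
        # where P(w|doc) is the maximum-likelihood estimate tf / |doc|.
        p_tran = 0.0
        if doc_len > 0:
            p_tran = sum(
                tran_prob(tran, q, w, prob_self_tran) * tf / doc_len
                for w, tf in doc_tf.items()
            )
        # Interpolate with the collection probability P(q|C) using weight lam.
        p = (1.0 - lam) * p_tran + lam * coll_prob.get(q, min_prob)
        score += math.log(max(p, min_prob))
    return score
```

Documents that contain the query word, or words that translate into it, receive a higher log-probability than documents that rely on the collection model alone.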


Top-100 report:

```
!cat collections/wikipedia_dpr_nq_sample/results/dev/feat_exper/bm25_model1/rep/out_100.rep
```

```
# of queries:    2500
NDCG@10:  0.500700
NDCG@20:  0.534900
NDCG@100: 0.598400
P20:      0.198700
MAP:      0.439500
MRR:      0.586900
Recall:   0.887168
```
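The `.rep` files are plain text with one `name: value` pair per line. To compare runs programmatically, a report can be loaded into a dictionary. The helpers below (`parse_report_text` and `parse_report` are hypothetical names, not part of FlexNeuART) are a minimal sketch that assumes exactly the layout shown above.

```python
from pathlib import Path
from typing import Dict


def parse_report_text(text: str) -> Dict[str, float]:
    """Parse lines such as 'NDCG@10:  0.500700' into {'NDCG@10': 0.5007}."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or unrelated lines
        name, value = line.split(":", 1)
        try:
            metrics[name.strip()] = float(value.strip())
        except ValueError:
            continue  # ignore lines whose value is not numeric
    return metrics


def parse_report(path: str) -> Dict[str, float]:
    """Read and parse a report file such as rep/out_100.rep."""
    return parse_report_text(Path(path).read_text())
```

For example, `parse_report("collections/wikipedia_dpr_nq_sample/results/dev/feat_exper/bm25_model1/rep/out_100.rep")["NDCG@10"]` should return the NDCG@10 value from the report above.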


## Training a fusion of (query-normalized) BM25 and BERT-model scores
One needs to start a query server that binds to port 8080, as shown below. This needs to be done in __a separate terminal__, because notebooks do not support background processes. Please note that we have to specify __the same maximum query and document lengths__ as were used to train the BERT model.

```
scripts/py_featextr_server/cedr_server.py  \
   --init_model collections/wikipedia_dpr_nq_sample/derived_data/ir_models/vanilla_bert/model.best \
   --max_query_len 64 \
   --max_doc_len 445 \
   --port 8080
```
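Because the server runs in another terminal, it is easy to launch the experiment before the server has finished loading the model. Below is a minimal readiness check using only the Python standard library. It merely verifies that *something* accepts TCP connections on the given port (assumed to be `localhost:8080`); it does not confirm that the listener is the CEDR server, and `wait_for_port` is a hypothetical helper, not part of FlexNeuART.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_sec: float = 60.0,
                  poll_sec: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds,
    or False if none succeeds before timeout_sec elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        try:
            # Open and immediately close a connection; success means the port is open.
            with socket.create_connection((host, port), timeout=poll_sec):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_sec)
```

In a notebook cell, `assert wait_for_port("localhost", 8080)` before starting the experiment fails fast instead of letting the feature extractor time out.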

Now we can run an experiment that trains on the `train_fusion` subset of queries and tests on the `dev` subset. Please note the following:
1. During training, we use 20 candidates, but for testing on `dev` we re-rank 50 candidates. Candidates ranked below the 50th position keep their original order.
2. We use two threads and print the log to the screen (i.e., the process is not started in a separate shell).
3. Training uses only the __first 5000__ queries from the fusion set.
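The behavior described in the list above can be illustrated with a toy sketch. The function names are hypothetical and only mimic what `-max_num_query_train`, `-train_cand_qty`, and `-max_final_rerank_qty` are described as doing; this is not FlexNeuART code. Training keeps the first N queries and the top-k candidates of each, while testing re-scores only the top `rerank_qty` candidates and leaves the tail in its first-stage order.

```python
from typing import Callable, Dict, List

Ranking = List[str]  # document IDs ordered by first-stage score


def select_training_data(runs: Dict[str, Ranking], query_order: List[str],
                         max_num_query: int, cand_qty: int) -> Dict[str, Ranking]:
    """Keep the first max_num_query queries and the top cand_qty candidates of each."""
    return {qid: runs[qid][:cand_qty] for qid in query_order[:max_num_query]}


def rerank_top_k(ranking: Ranking, score_fn: Callable[[str], float],
                 rerank_qty: int) -> Ranking:
    """Re-sort the top rerank_qty documents by score_fn; keep the tail unchanged."""
    head, tail = ranking[:rerank_qty], ranking[rerank_qty:]
    # sorted() is stable even with reverse=True, so ties keep first-stage order.
    head = sorted(head, key=score_fn, reverse=True)
    return head + tail
```

With `rerank_qty=50`, positions 51 and beyond are returned exactly as the candidate provider produced them, which matches the note in item 1.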

```
!scripts/exper/run_experiments.sh \
   wikipedia_dpr_nq_sample \
   exper_desc.best/bm25_cedr8080.json \
   -max_num_query_train 5000 \
   -train_cand_qty 20 \
   -max_final_rerank_qty 50 \
   -test_part dev \
   -thread_qty 2 \
   -no_separate_shell
```

All the results are available in the directory `collections/wikipedia_dpr_nq_sample/results/dev/feat_exper/bm25_cedr8080`.

The following is a summary report (top-100):

```
!cat collections/wikipedia_dpr_nq_sample/results/dev/feat_exper/bm25_cedr8080/rep/out_100.rep
```

```
# of queries:    2500
NDCG@10:  0.570300
NDCG@20:  0.583000
NDCG@100: 0.625100
P20:      0.191000
MAP:      0.496600
MRR:      0.677900
Recall:   0.808288
```
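This section fuses query-normalized BM25 scores with BERT scores. One common way to put scores from different models on a comparable scale is to normalize them within each query before combining them. The sketch below *assumes* per-query min-max normalization followed by a weighted linear combination with a hypothetical weight `w_bert`; the normalization and the learner that FlexNeuART actually uses are determined by the extractor and experiment descriptors and may well differ.

```python
from typing import Dict


def minmax_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one query's scores to [0, 1]; constant scores map to 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / span for doc, s in scores.items()}


def fuse(bm25: Dict[str, float], bert: Dict[str, float],
         w_bert: float) -> Dict[str, float]:
    """Linear fusion of normalized BM25 and BERT scores for a single query.
    Assumes both dicts cover the same candidate documents."""
    nb, nr = minmax_normalize(bm25), minmax_normalize(bert)
    return {doc: (1.0 - w_bert) * nb[doc] + w_bert * nr[doc] for doc in bm25}
```

Normalizing per query matters because raw BM25 scores vary widely across queries, whereas a single fusion weight is shared by all queries.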


## Training a fusion of BM25 and ANCE

```
!scripts/exper/run_experiments.sh \
   wikipedia_dpr_nq_sample \
   exper_desc.best/bm25_ance.json \
   -max_num_query_train 5000 \
   -train_cand_qty 20 \
   -test_part dev
```

```
Using collection root: collections
The number of CPU cores:      8
The number of || experiments: 1
The number of threads:        8
Experiment descriptor file:                                 collections/wikipedia_dpr_nq_sample/exper_desc.best/bm25_ance.json
Default test set:                                           dev
Number of parallel experiments:                             1
Number of threads in feature extractors/query applications: 8
Parsed experiment parameters:
experSubdir:feat_exper/bm25_ance
candProvAddConfParam:exper_desc.best/lucene.json
extrType:exper_desc.best/extractors/bm25_ance.json
testOnly:0
Started a process 32022, working dir: collections/wikipedia_dpr_nq_sample/results/dev/feat_exper/bm25_ance
Process log file: collections/wikipedia_dpr_nq_sample/results/dev/feat_exper/bm25_ance/exper.log
Waiting for 1 child processes
Process with pid=32022 finished successfully.
Waiting for 0 child processes
1 experiments executed
0 experiments failed
```
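ANCE is a dense retriever: queries and passages are encoded independently into vectors, and relevance is scored by the inner product of the two embeddings. As a rough illustration only, here is a dependency-free sketch of that scoring step with made-up 3-dimensional vectors. Real ANCE embeddings come from a trained transformer encoder, and how FlexNeuART stores them and combines them with BM25 is configured in `exper_desc.best/extractors/bm25_ance.json`.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("dimension mismatch")
    return sum(a * b for a, b in zip(u, v))


def dense_rank(query_vec: Vector,
               doc_vecs: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Rank documents by inner product with the query embedding (highest first)."""
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In the fusion above, a score of this kind would serve as one feature next to BM25 rather than as a stand-alone ranking.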


Top-100 results:

```
!cat collections/wikipedia_dpr_nq_sample/results/dev/feat_exper/bm25_ance/rep/out_100.rep
```

```
# of queries:    2500
NDCG@10:  0.655400
NDCG@20:  0.657300
NDCG@100: 0.698300
P20:      0.155500
MAP:      0.561800
MRR:      0.866000
Recall:   0.652528
```
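To put the three fusion runs side by side, the sketch below copies the top-100 metrics from the three `out_100.rep` reports in this notebook into a dictionary and computes each run's difference from the BM25+Model1 run. The numbers are taken verbatim from those reports; in practice one would read them from the `.rep` files instead of hard-coding them.

```python
from typing import Dict

Metrics = Dict[str, float]

# Top-100 metrics copied from the three out_100.rep reports in this notebook.
REPORTS: Dict[str, Metrics] = {
    "bm25_model1": {"NDCG@10": 0.5007, "NDCG@20": 0.5349, "NDCG@100": 0.5984,
                    "P20": 0.1987, "MAP": 0.4395, "MRR": 0.5869, "Recall": 0.887168},
    "bm25_cedr8080": {"NDCG@10": 0.5703, "NDCG@20": 0.5830, "NDCG@100": 0.6251,
                      "P20": 0.1910, "MAP": 0.4966, "MRR": 0.6779, "Recall": 0.808288},
    "bm25_ance": {"NDCG@10": 0.6554, "NDCG@20": 0.6573, "NDCG@100": 0.6983,
                  "P20": 0.1555, "MAP": 0.5618, "MRR": 0.8660, "Recall": 0.652528},
}


def deltas(reports: Dict[str, Metrics], baseline: str) -> Dict[str, Metrics]:
    """Per-metric difference of every run from the baseline run."""
    base = reports[baseline]
    return {
        run: {m: round(v - base[m], 6) for m, v in metrics.items()}
        for run, metrics in reports.items()
        if run != baseline
    }


if __name__ == "__main__":
    for run, diff in deltas(REPORTS, "bm25_model1").items():
        print(run, " ".join(f"{m}={d:+.4f}" for m, d in diff.items()))
```

Keep in mind that the runs are not configured identically (for example, the BERT run re-ranks only 50 candidates at test time), so the differences reflect both the models and the experimental settings.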
