# Title Baseline for TOMT retrieval

### Resources

- The [PyTerrier tutorial](https://github.com/terrier-org/ecir2021tutorial)
- The [PyTerrier documentation](https://pyterrier.readthedocs.io/en/latest/)



### Step 1: Import Dependencies

In [6]:
import json

import pandas as pd
import pyterrier as pt
from tqdm import tqdm
from tira.third_party_integrations import (
    ensure_pyterrier_is_loaded,
    get_input_directory_and_output_directory,
    get_preconfigured_chatnoir_client,
    persist_and_normalize_run,
)

ensure_pyterrier_is_loaded()
input_directory, output_directory = get_input_directory_and_output_directory('/workspace/tomt-dataset-tira')

chatnoir = get_preconfigured_chatnoir_client(config_directory=input_directory, features=[], verbose=True, num_results=1000, page_size=1000)

I will use a small hardcoded example located in /workspace/tomt-dataset-tira.
The output directory is /tmp/
ChatNoir Client will retrieve the top-1000 with page size of 1000 from index ClueWeb22 with 25 retries.


### Step 2: Load the Data

In [2]:
print('Step 2: Load the data.')

queries = pt.io.read_topics(input_directory + '/queries.xml', format='trecxml')

Step 2: Load the data.


In [4]:
print(queries)

  qid                                              query
0  20   website selling t shirts bags posters with te...
1  21   dutch dystopian webcomic having a white laven...
2  22   search engine for pictures of a predominant c...
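
To make the loading step less opaque, here is a minimal standard-library sketch of reading topics from XML into `qid`/`query` pairs. The layout is an assumption for illustration only: a `number` attribute on each `<topic>` and a `<query>` child are hypothetical, the sample queries are placeholders, and the real `queries.xml` may be structured differently. This is not PyTerrier's implementation of `pt.io.read_topics`.

```python
import xml.etree.ElementTree as ET

# Hypothetical topics file; the real queries.xml may use other tags.
SAMPLE_XML = """<topics>
  <topic number="20"><query>first example query</query></topic>
  <topic number="21"><query>second example query</query></topic>
</topics>"""


def read_topics_sketch(xml_text):
    """Return a list of {'qid', 'query'} dicts, one per <topic> element."""
    root = ET.fromstring(xml_text)
    topics = []
    for topic in root.iter("topic"):
        qid = topic.attrib["number"]  # assumed attribute name
        query = (topic.findtext("query") or "").strip()  # assumed child tag
        topics.append({"qid": qid, "query": query})
    return topics


topics = read_topics_sketch(SAMPLE_XML)
for topic in topics:
    print(topic["qid"], topic["query"])
```

Keeping `qid` as a string mirrors how query identifiers are usually treated in run files, where they are matched textually rather than numerically.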


### Step 3: Create Run

In [5]:
print('Step 3: Create Run.')
run = chatnoir(queries)

Step 3: Create Run.


Searching with ChatNoir: 100%|██████████████████████████████████████████████████████████████████████████████████| 3/3 [09:40<00:00, 193.62s/query]


In [7]:
run.head(3)

  qid                                              query                      docno      score  rank
0  20   website selling t shirts bags posters with te...  clueweb22-en0032-53-13789  1446.5402     0
1  20   website selling t shirts bags posters with te...  clueweb22-en0004-43-14766  1431.8256     1
2  20   website selling t shirts bags posters with te...  clueweb22-en0036-45-11003  1398.1755     2
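
The `rank` column above starts at 0. If ranks starting at 1 are wanted, for example before writing a run file, they can be recomputed per query from the scores. The following is a minimal sketch in plain Python, assuming rows are dictionaries with `qid`, `docno`, and `score` keys; the helper name `rerank_from_one` is hypothetical and this is not how TIRA or PyTerrier normalize runs internally.

```python
from collections import defaultdict


def rerank_from_one(rows):
    """Return copies of rows with ranks 1..n per qid, by descending score."""
    by_qid = defaultdict(list)
    for row in rows:
        by_qid[row["qid"]].append(row)

    reranked = []
    for group in by_qid.values():
        ordered = sorted(group, key=lambda r: r["score"], reverse=True)
        for rank, row in enumerate(ordered, start=1):
            # Copy the row so the input list is left untouched.
            reranked.append({**row, "rank": rank})
    return reranked


# The three rows shown in the output above.
rows = [
    {"qid": "20", "docno": "clueweb22-en0032-53-13789", "score": 1446.5402, "rank": 0},
    {"qid": "20", "docno": "clueweb22-en0004-43-14766", "score": 1431.8256, "rank": 1},
    {"qid": "20", "docno": "clueweb22-en0036-45-11003", "score": 1398.1755, "rank": 2},
]
reranked = rerank_from_one(rows)
```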


In [6]:
print('Step 4: Run stance detection')

def detect_stance(query_document_pair):
    # As a baseline, always return the neutral label.
    return 'NEU'

# Apply the detector row-wise; each row is one query-document pair.
run['Q0'] = run.apply(detect_stance, axis=1)


Step 4: Run stance detection


### Step 5: Persist Run

In [8]:
print('Step 5: Persist Run.')

persist_and_normalize_run(run, 'chatnoir-title-baseline', output_file=output_directory + '/run.txt')

print('Done...')

Step 5: Persist Run.
Done...


In [9]:
!head -3 {output_directory}/run.txt

20 0 clueweb22-en0032-53-13789 1 1446.5402 chatnoir-title-baseline
20 0 clueweb22-en0004-43-14766 2 1431.8256 chatnoir-title-baseline
20 0 clueweb22-en0036-45-11003 3 1398.1755 chatnoir-title-baseline
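
Each line of `run.txt` follows the six-column TREC run format: query id, a placeholder column (conventionally named `Q0`), document id, rank, score, and run tag. As a small sketch for reading such a file back, the helper below splits one line into typed fields; the names `RunEntry` and `parse_run_line` are hypothetical and not part of TIRA or PyTerrier.

```python
from typing import NamedTuple


class RunEntry(NamedTuple):
    qid: str
    q0: str
    docno: str
    rank: int
    score: float
    tag: str


def parse_run_line(line):
    """Parse one whitespace-separated run line into a RunEntry."""
    qid, q0, docno, rank, score, tag = line.split()
    return RunEntry(qid, q0, docno, int(rank), float(score), tag)


# First line of the run file shown above.
entry = parse_run_line(
    "20 0 clueweb22-en0032-53-13789 1 1446.5402 chatnoir-title-baseline"
)
print(entry.docno, entry.rank, entry.score)
```

A line with more or fewer than six fields raises a `ValueError` during unpacking, which is a cheap way to catch a malformed run file early.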
