<a href="https://colab.research.google.com/github/marcospiau/ia368-dd-dl4ir/blob/main/aula02/aula03_rerank.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What's in here

This notebook compares runs produced with:
- Pyserini BM25
- our finetuned reranker: https://huggingface.co/marcospiau/MiniLM-L6-H384-uncased-msmarco-tiny-finetune
- a strong off-the-shelf cross-encoder: https://huggingface.co/cross-encoder/ms-marco-TinyBERT-L-2

In [None]:
!pip install -q transformers toolz datasets ftfy neptune-client polars sentence_transformers


# Imports

In [362]:
import toolz
import transformers
import torch
import datasets
import pandas as pd
import fileinput
import os
import gc
import itertools
import functools
import more_itertools
import random
from collections import Counter
import ftfy
import multiprocessing as mp
import polars as pl
import matplotlib.pyplot as plt
import numpy as np
import json

from tqdm import tqdm
import pyarrow as pa

%matplotlib inline
%config InlineBackend.figure_format='retina'

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Reranker finetuning

Already done in the finetuning notebook.

# Reranking

## Environment Setup

This notebook assumes that the following requirements are met:
- MSMARCO data is available locally
- `anserini`, `anserini-tools` and `pyserini` are installed

## Loading queries with associated relevance judgements

We can only evaluate queries that have associated relevant documents, so we remove queries with no relevance judgments.

These queries were generated during the last class, and we reuse them here.
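The filtering step described above can be sketched in plain Python. This is a minimal illustration, not the code used in the previous class: the function names `judged_qids` and `filter_judged_queries` and the toy judgments are hypothetical, and the sketch assumes each qrels line has the four-column TREC layout `qid iteration docid relevance`, with a relevance of at least 1 meaning "relevant".

```python
def judged_qids(qrels_lines, min_relevance=1):
    """Return the set of query ids with at least one relevant judged document."""
    qids = set()
    for line in qrels_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, _docid, relevance = parts
        if int(relevance) >= min_relevance:
            qids.add(qid)
    return qids


def filter_judged_queries(queries, qrels_lines, min_relevance=1):
    """Keep only the queries whose qid appears in the judged set."""
    keep = judged_qids(qrels_lines, min_relevance)
    return [q for q in queries if str(q['qid']) in keep]


# Toy judgments (hypothetical values, only to exercise the function)
sample_qrels = [
    '1030303 0 100 2',
    '1030303 0 101 0',
    '1037496 0 200 0',
]
sample_queries = [
    {'qid': 1030303, 'query': 'who is aziz hashim'},
    {'qid': 1037496, 'query': 'who is rep scalise?'},
]
kept = filter_judged_queries(sample_queries, sample_qrels)
print(kept)
```

In this toy example only the first query survives, because the second one has no document judged relevant.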

In [2]:
%%bash
ls -lht collections/msmarco-passage/topics.dl20.small.tsv
head collections/msmarco-passage/topics.dl20.small.tsv
wc -l collections/msmarco-passage/topics.dl20.small.tsv

-rw-rw-r-- 1 marcospiau marcospiau 2.3K Mar  6 15:43 collections/msmarco-passage/topics.dl20.small.tsv
1030303	who is aziz hashim
1037496	who is rep scalise?
1043135	who killed nicholas ii of russia
1051399	who sings monk theme song
1064670	why do hunters pattern their shotguns?
1071750	why is pete rose banned from hall of fame
1105792	define: geon
1106979	define pareto chart in statistics
1108651	what the best way to get clothes white
1109707	what medium do radio waves travel through
54 collections/msmarco-passage/topics.dl20.small.tsv


In [3]:
df_queries = (pl.read_csv('collections/msmarco-passage/topics.dl20.small.tsv',
                       has_header=False, sep='\t', 
                       new_columns=['qid', 'query']))
len(df_queries), df_queries[:5]

(54,
 shape: (5, 2)
 ┌─────────┬─────────────────────────────────────┐
 │ qid     ┆ query                               │
 │ ---     ┆ ---                                 │
 │ i64     ┆ str                                 │
 ╞═════════╪═════════════════════════════════════╡
 │ 1030303 ┆ who is aziz hashim                  │
 │ 1037496 ┆ who is rep scalise?                 │
 │ 1043135 ┆ who killed nicholas ii of russia    │
 │ 1051399 ┆ who sings monk theme song           │
 │ 1064670 ┆ why do hunters pattern their sho... │
 └─────────┴─────────────────────────────────────┘)

In [4]:
queries = df_queries.to_dicts()
len(queries), queries[:5]

(54,
 [{'qid': 1030303, 'query': 'who is aziz hashim'},
  {'qid': 1037496, 'query': 'who is rep scalise?'},
  {'qid': 1043135, 'query': 'who killed nicholas ii of russia'},
  {'qid': 1051399, 'query': 'who sings monk theme song'},
  {'qid': 1064670, 'query': 'why do hunters pattern their shotguns?'}])

## BM25 run using Pyserini

We already have a run generated with Pyserini's BM25 in the last class exercise. We use it both as a baseline result and as the initial candidate list for reranking. The command below was used to generate this run:

```bash
# Note: saved in TREC format so that the trec_eval script can be used
mkdir -pv runs
time python3 -m pyserini.search.lucene \
  --index indexes/lucene-index-msmarco-passage \
  --topics collections/msmarco-passage/topics.dl20.small.tsv \
  --output runs/run.dl20-passage.small.bm25default.txt \
  --output-format trec \
  --hits 1000 \
  --bm25 --k1 0.9 --b 0.4
```
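To make the roles of `--k1 0.9` and `--b 0.4` concrete, here is a sketch of the textbook Okapi BM25 score for a single query term. It is an illustration under stated assumptions, not Anserini's exact scoring: Lucene's implementation differs in details such as how document lengths are encoded, and the function name `bm25_term_score` and the example statistics are hypothetical.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=0.9, b=0.4):
    """Textbook BM25 contribution of one query term to one document.

    tf: term frequency in the document
    doc_len / avg_doc_len: document length and collection average length
    n_docs / doc_freq: collection size and number of documents with the term
    """
    # Rarer terms get a larger inverse document frequency
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly long documents are penalized (b=0 disables it)
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # k1 controls how quickly repeated occurrences of the term saturate
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Hypothetical statistics: same term frequency, short vs. long document
short_doc = bm25_term_score(tf=2, doc_len=40, avg_doc_len=60, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=2, doc_len=200, avg_doc_len=60, n_docs=1000, doc_freq=10)
print(short_doc, long_doc)
```

With the same term frequency, the shorter document scores higher, and a larger `b` widens that gap.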

In [5]:
%%bash
wc runs/run.dl20-passage.small.bm25default.txt
head runs/run.dl20-passage.small.bm25default.txt

  54000  324000 2175645 runs/run.dl20-passage.small.bm25default.txt
23849 Q0 4348282 1 10.066300 Anserini
23849 Q0 2674124 2 9.865500 Anserini
23849 Q0 7119957 3 9.644200 Anserini
23849 Q0 8133127 4 9.431700 Anserini
23849 Q0 542113 5 9.385200 Anserini
23849 Q0 2516458 6 9.338800 Anserini
23849 Q0 4834498 7 9.251600 Anserini
23849 Q0 436721 8 9.249500 Anserini
23849 Q0 6667419 9 9.207700 Anserini
23849 Q0 8246990 10 9.108900 Anserini
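Each line of the run file above has six whitespace-separated columns: query id, the literal `Q0`, document id, rank, score and run tag. The following sketch parses and re-serializes one such line; the names `RunEntry`, `parse_run_line` and `format_run_line` are hypothetical helpers, and the six-decimal score format is chosen only to match the output shown above.

```python
from typing import NamedTuple


class RunEntry(NamedTuple):
    qid: str
    docid: str
    rank: int
    score: float
    run_id: str


def parse_run_line(line):
    """Parse one line of a TREC run file: qid Q0 docid rank score run_id."""
    qid, _q0, docid, rank, score, run_id = line.split()
    return RunEntry(qid, docid, int(rank), float(score), run_id)


def format_run_line(entry):
    """Serialize an entry back to the six-column TREC layout."""
    return (f'{entry.qid} Q0 {entry.docid} {entry.rank} '
            f'{entry.score:.6f} {entry.run_id}')


line = '23849 Q0 4348282 1 10.066300 Anserini'
entry = parse_run_line(line)
print(entry)
print(format_run_line(entry))
```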


Metrics:

In [6]:
!tools/eval/trec_eval.9.0.4/trec_eval -M 1000 -m ndcg_cut.10 tools/topics-and-qrels/qrels.dl20-passage.txt runs/run.dl20-passage.small.bm25default.txt

ndcg_cut_10           	all	0.4796
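For intuition about what `ndcg_cut.10` measures, here is a sketch of nDCG@k for a single query in its common textbook form (linear gain, `log2(rank + 1)` discount). It is not a reimplementation of `trec_eval`, which may treat ties and edge cases differently; the function names and the toy judgments are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a list of gains in rank order."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_docids, qrels, k=10):
    """nDCG@k for one query.

    ranked_docids: document ids in the order returned by the system
    qrels: dict mapping docid -> graded relevance (unjudged docs count as 0)
    """
    gains = [qrels.get(docid, 0) for docid in ranked_docids[:k]]
    # The ideal ranking lists judged documents by decreasing relevance
    ideal_gains = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal_gains)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy judgments for a single query (hypothetical values)
toy_qrels = {'a': 3, 'b': 1, 'c': 0}
perfect = ndcg_at_k(['a', 'b', 'c'], toy_qrels)
reversed_order = ndcg_at_k(['c', 'b', 'a'], toy_qrels)
print(perfect, reversed_order)
```

The reported metric is the mean of this per-query value over all evaluated queries.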


## Getting data for reranking

In [26]:
def raw_to_doc(raw):
    decoded = json.loads(raw)
    return {'id': decoded['id'],
            'contents': ftfy.fix_text(decoded['contents'])}

def get_doc(index_reader, docid: str):
    docid = str(docid)
    raw = index_reader.doc(docid).raw()
    doc = raw_to_doc(raw)
    assert doc['id'] == docid
    return doc['contents']

def load_run_polars(path):
    return pl.read_csv(path, sep=' ', has_header=False, new_columns=[
        'qid', 'q0', 'docid', 'rank', 'score', 'run_id'])

def add_raw_docs_to_run_polars(df_run, index_reader):
    get_doc_partial = functools.partial(get_doc, index_reader)
    # retrieve distinct docids from index
    doc_contents = (
        df_run.select(pl.col('docid').unique())
        .with_columns(
            pl.col('docid').apply(get_doc_partial).alias('document'))
    )
    return (
        df_run.select(pl.exclude('document'))
        .join(doc_contents, on='docid', how='left'))

def add_raw_batch_docs_to_run_polars(df_run, searcher, threads=32):
    docids = df_run.get_column('docid').unique().cast(pl.Utf8).to_list()
    doc_map = toolz.valmap(
        lambda doc: raw_to_doc(doc.raw())['contents'],
        searcher.batch_doc(docids, threads=threads)
    )
    return df_run.with_columns(
        pl.col('docid').cast(pl.Utf8).map_dict(doc_map).alias('document')
    )

def add_query_to_runs(df_run, df_queries):
    return (
        df_run.select(pl.exclude('query'))
        .join(df_queries, on='qid', how='left'))

In [305]:
def sort_by_n_words(df, cols=['query', 'document']):
    """Sort dataframe by total count of words of columns `cols`"""
    n_words_expr = (
        pl.sum(pl.col(*cols).str.split(' ').arr.lengths()).alias('n_words'))
    return df.sort(n_words_expr)
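The reason for sorting by length can be illustrated without any model: when every batch is padded to its longest sequence, grouping similar lengths together wastes fewer padded positions. The sketch below counts the total number of token slots processed; `padded_tokens` and the toy lengths are hypothetical.

```python
def padded_tokens(lengths, batch_size):
    """Total token slots when each batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch) * len(batch)
    return total


# Toy sequence lengths alternating between short and long inputs
lengths = [5, 120, 8, 110, 6, 115, 7, 118]
unsorted_cost = padded_tokens(lengths, batch_size=2)
sorted_cost = padded_tokens(sorted(lengths), batch_size=2)
print(unsorted_cost, sorted_cost, sum(lengths))
```

The sorted order costs far fewer slots than the interleaved order and comes close to the number of real tokens.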

In [19]:
from pyserini.index.lucene import IndexReader
from pyserini.search.lucene import LuceneSearcher

# we need the index with raw docs, not the slim one
index_reader = IndexReader.from_prebuilt_index('msmarco-v1-passage')
searcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-passage')

In [20]:
index_reader.stats()

{'total_terms': 352316036,
 'documents': 8841823,
 'non_empty_documents': 8841823,
 'unique_terms': 2660824}

In [33]:
df_bm25_run = load_run_polars('runs/run.dl20-passage.small.bm25default.txt')
df_bm25_run.head()

qid,q0,docid,rank,score,run_id
i64,str,i64,i64,f64,str
23849,"""Q0""",4348282,1,10.0663,"""Anserini"""
23849,"""Q0""",2674124,2,9.8655,"""Anserini"""
23849,"""Q0""",7119957,3,9.6442,"""Anserini"""
23849,"""Q0""",8133127,4,9.4317,"""Anserini"""
23849,"""Q0""",542113,5,9.3852,"""Anserini"""


In [34]:
df_bm25_run = df_bm25_run.pipe(add_query_to_runs, df_queries)
df_bm25_run.head()

qid,q0,docid,rank,score,run_id,query
i64,str,i64,i64,f64,str,str
23849,"""Q0""",4348282,1,10.0663,"""Anserini""","""are naturaliza..."
23849,"""Q0""",2674124,2,9.8655,"""Anserini""","""are naturaliza..."
23849,"""Q0""",7119957,3,9.6442,"""Anserini""","""are naturaliza..."
23849,"""Q0""",8133127,4,9.4317,"""Anserini""","""are naturaliza..."
23849,"""Q0""",542113,5,9.3852,"""Anserini""","""are naturaliza..."


In [35]:
# 1 by 1
# df_bm25_run = df_bm25_run.pipe(add_raw_docs_to_run_polars, index_reader)
df_bm25_run.head()

qid,q0,docid,rank,score,run_id,query
i64,str,i64,i64,f64,str,str
23849,"""Q0""",4348282,1,10.0663,"""Anserini""","""are naturaliza..."
23849,"""Q0""",2674124,2,9.8655,"""Anserini""","""are naturaliza..."
23849,"""Q0""",7119957,3,9.6442,"""Anserini""","""are naturaliza..."
23849,"""Q0""",8133127,4,9.4317,"""Anserini""","""are naturaliza..."
23849,"""Q0""",542113,5,9.3852,"""Anserini""","""are naturaliza..."


In [36]:
# multithreaded batch
df_bm25_run = add_raw_batch_docs_to_run_polars(df_bm25_run, searcher)
df_bm25_run.head()

qid,q0,docid,rank,score,run_id,query,document
i64,str,i64,i64,f64,str,str,str
23849,"""Q0""",4348282,1,10.0663,"""Anserini""","""are naturaliza...","""Civil Records ..."
23849,"""Q0""",2674124,2,9.8655,"""Anserini""","""are naturaliza...","""See our FAQ's ..."
23849,"""Q0""",7119957,3,9.6442,"""Anserini""","""are naturaliza...","""Yes, in most c..."
23849,"""Q0""",8133127,4,9.4317,"""Anserini""","""are naturaliza...","""Spokeo pulls d..."
23849,"""Q0""",542113,5,9.3852,"""Anserini""","""are naturaliza...","""Public Records..."


## Reranking with our finetuned model

In [39]:
from torch.utils.data import DataLoader
from transformers.trainer_utils import RemoveColumnsCollator
from transformers import DataCollatorWithPadding
import inspect

In [495]:
def encode(ex, tokenizer, **tokenizer_kwargs):
    """Encode a pair of query and document using a tokenizer"""
    return tokenizer(ex['query'], ex['document'], **tokenizer_kwargs)

def generate_new_run(df_run, scores):
    """Generate new run from df_run and array of scores"""
    df = (df_run.select('qid', 'q0', 'docid',
                        pl.from_arrow(scores).alias('score'),
                        pl.lit('DONT_CARE').alias('run_id')))
    rank_expr = (pl.col('score').rank(method='ordinal', descending=True)\
                 .over('qid').alias('rank'))
    df = df.with_columns(rank_expr).sort('qid', 'rank')
    return df

def write_df_run(df_run, path):
    """Serialize polars dataframe to trec format"""
    cols_order = ['qid', 'q0', 'docid', 'rank', 'score', 'run_id']
    to_write = df_run.select(cols_order).sort('qid', 'rank').rechunk()
    return to_write.write_csv(path, has_header=False, sep=' ')

class Reranker:
    def __init__(self, model, tokenizer, max_length=200):
        self.model = model
        self.tokenizer = tokenizer
        self.max_length = max_length
        self.encode = functools.partial(encode, tokenizer=tokenizer,
                                        max_length=max_length,
                                        padding=False, truncation=True,
                                        return_length=True)
        self.data_collator = DataCollatorWithPadding(
            tokenizer=tokenizer, max_length=max_length,padding=True)
        self.collate_fn = RemoveColumnsCollator(
            data_collator=self.data_collator,
            signature_columns=set(
                inspect.signature(model.forward).parameters.keys()).union(
                    {'label', 'label_ids'}))
        self.partial_dataloader = functools.partial(
            DataLoader, shuffle=False, collate_fn=self.collate_fn)
        n_classes = model.classifier.out_features
        if n_classes == 1:
            self.is_cross_encoder = True
        elif n_classes == 2:
            self.is_cross_encoder = False
        else:
            raise ValueError(
                f'n_classes should be 1 or 2, got {n_classes}')

    
    @torch.no_grad()
    def __call__(self, df_runs, batch_size=128, sort_by_length=True):
        """Process an existing df_run into a new one, using reranked scores"""
        # sort by n_words to reduce padding
        if sort_by_length:
            df_runs = sort_by_n_words(df_runs)
        # load data into a Hugging Face dataset (zero-copy)
        ds = datasets.arrow_dataset.Dataset(df_runs.to_arrow())
        # encode and set format to torch (lazily); use self.encode rather
        # than the global `reranker` so the instance does not depend on it
        ds = ds.with_transform(self.encode).to_iterable_dataset()
        ds = ds.with_format('torch')
        device = self.model.device
        dataloader = self.partial_dataloader(ds, batch_size=batch_size)

        all_scores = []
        # round up so the progress bar total includes the last partial batch
        for batch in tqdm(dataloader, 'Scoring query-doc pairs',
                          total=-(-len(df_runs) // batch_size)):
            # send batch to same device as model
            batch = toolz.valmap(lambda x: x.to(device), batch)
            logits = self.model(**batch).logits
            # NOTE: this was written to work only with the cross-encoder I
            # planned to use; I am not sure it is correct for every model
            if self.is_cross_encoder:
                scores = logits.ravel().sigmoid()
            else:
                # scores = torch.nn.functional.log_softmax(logits, dim=1)[:, 1].exp()
                scores = logits.softmax(-1)[:, 1]
            # move to CPU first: .numpy() fails on tensors stored on the GPU
            all_scores.append(pa.array(scores.cpu().numpy()))

            
        all_scores = pa.concat_arrays(all_scores)
        new_run = generate_new_run(df_runs, all_scores)
        return new_run
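The two scoring branches in `Reranker.__call__` can be checked on plain numbers. The sketch below mirrors them without PyTorch: a single logit goes through a sigmoid, and a pair of logits goes through a softmax from which the second entry is taken. Treating index 1 as the "relevant" class is an assumption inherited from the class above, and `relevance_score` is a hypothetical helper.

```python
import math


def relevance_score(logits):
    """Map the classifier logits of one query-document pair to a score in (0, 1)."""
    if len(logits) == 1:
        # single-output head: sigmoid of the logit
        return 1 / (1 + math.exp(-logits[0]))
    if len(logits) == 2:
        # two-class head: softmax probability of class 1 (assumed "relevant");
        # subtracting the max keeps the exponentials numerically stable
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]
        return exps[1] / sum(exps)
    raise ValueError(f'expected 1 or 2 logits, got {len(logits)}')


single = relevance_score([2.0])
pair = relevance_score([1.0, 3.0])
print(single, pair)
```

The two values agree because a two-class softmax equals a sigmoid of the logit difference. Both mappings are monotonic, so the ranking within a query depends only on the logit (or the logit difference), not on the squashing function.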

In [496]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = AutoModelForSequenceClassification.from_pretrained('marcospiau/MiniLM-L6-H384-uncased-msmarco-tiny-finetune')
model.to(device)
tokenizer = AutoTokenizer.from_pretrained('marcospiau/MiniLM-L6-H384-uncased-msmarco-tiny-finetune')
reranker = Reranker(model, tokenizer)

In [497]:
df_run_finetune = reranker(df_bm25_run, 128, True)
print(df_run_finetune, df_run_finetune[:5])

Scoring query-doc pairs:   0%|          | 0/421 [00:00<?, ?it/s]
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Scoring query-doc pairs: 422it [04:02,  1.74it/s]

shape: (54000, 6)
┌─────────┬─────┬─────────┬──────────┬───────────┬──────┐
│ qid     ┆ q0  ┆ docid   ┆ score    ┆ run_id    ┆ rank │
│ ---     ┆ --- ┆ ---     ┆ ---      ┆ ---       ┆ ---  │
│ i64     ┆ str ┆ i64     ┆ f32      ┆ str       ┆ u32  │
╞═════════╪═════╪═════════╪══════════╪═══════════╪══════╡
│ 23849   ┆ Q0  ┆ 2647769 ┆ 0.999556 ┆ DONT_CARE ┆ 1    │
│ 23849   ┆ Q0  ┆ 8010559 ┆ 0.999528 ┆ DONT_CARE ┆ 2    │
│ 23849   ┆ Q0  ┆ 8010561 ┆ 0.999473 ┆ DONT_CARE ┆ 3    │
│ 23849   ┆ Q0  ┆ 8010558 ┆ 0.999406 ┆ DONT_CARE ┆ 4    │
│ ...     ┆ ... ┆ ...     ┆ ...      ┆ ...       ┆ ...  │
│ 1136962 ┆ Q0  ┆ 80877   ┆ 0.000278 ┆ DONT_CARE ┆ 997  │
│ 1136962 ┆ Q0  ┆ 8065423 ┆ 0.000277 ┆ DONT_CARE ┆ 998  │
│ 1136962 ┆ Q0  ┆ 7101410 ┆ 0.000274 ┆ DONT_CARE ┆ 999  │
│ 1136962 ┆ Q0  ┆ 1880431 ┆ 0.000272 ┆ DONT_CARE ┆ 1000 │
└─────────┴─────┴─────────┴──────────┴───────────┴──────┘ shape: (5, 6)
┌───────┬─────┬─────────┬──────────┬───────────┬──────┐
│ qid   ┆ q0  ┆ docid   ┆ score    ┆ run_i




In [503]:
FINETUNE_RUN_FILE = 'runs/run.dl20-passage.small.reranker.mini.lm.20.epochs.v1.txt'
write_df_run(df_run_finetune, FINETUNE_RUN_FILE)
!wc -l {FINETUNE_RUN_FILE}
!head {FINETUNE_RUN_FILE}
!tools/eval/trec_eval.9.0.4/trec_eval -M 1000 -m ndcg_cut.10 tools/topics-and-qrels/qrels.dl20-passage.txt runs/run.dl20-passage.small.reranker.mini.lm.20.epochs.v1.txt

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
54000 runs/run.dl20-passage.small.reranker.mini.lm.20.epochs.v1.txt
23849 Q0 2647769 1 0.999556 DONT_CARE
23849 Q0 8010559 2 0.9995277 DONT_CARE
23849 Q0 8010561 3 0.9994733 DONT_CARE
23849 Q0 8010558 4 0.99940634 DONT_CARE
23849 Q0 188246 5 0.998933 DONT_CARE
23849 Q0 1680203 6 0.99798274 DONT_CARE
23849 Q0 653142 7 0.9978331 DONT_CARE
23849 Q0 4806514 8 0.9974049 DONT_CARE
23849 Q0 1622747 9 0.9962842 DONT_CARE
23849 Q0 1449785 

In [499]:
del model, tokenizer, reranker
gc.collect()

3160

## Reranking with a (probably) better finetuned model - https://huggingface.co/cross-encoder/ms-marco-TinyBERT-L-2

In [500]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_name = 'cross-encoder/ms-marco-TinyBERT-L-2'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [501]:
reranker = Reranker(model, tokenizer)
df_run_cross_encoder = reranker(df_bm25_run, 128, True)
print(df_run_cross_encoder.shape, df_run_cross_encoder, df_run_cross_encoder[:5])

Scoring query-doc pairs:   0%|          | 0/421 [00:00<?, ?it/s]
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Scoring query-doc pairs: 422it [00:43,  9.72it/s]

(54000, 6) shape: (54000, 6)
┌─────────┬─────┬─────────┬──────────┬───────────┬──────┐
│ qid     ┆ q0  ┆ docid   ┆ score    ┆ run_id    ┆ rank │
│ ---     ┆ --- ┆ ---     ┆ ---      ┆ ---       ┆ ---  │
│ i64     ┆ str ┆ i64     ┆ f32      ┆ str       ┆ u32  │
╞═════════╪═════╪═════════╪══════════╪═══════════╪══════╡
│ 23849   ┆ Q0  ┆ 8010561 ┆ 0.958647 ┆ DONT_CARE ┆ 1    │
│ 23849   ┆ Q0  ┆ 2647769 ┆ 0.954198 ┆ DONT_CARE ┆ 2    │
│ 23849   ┆ Q0  ┆ 4834498 ┆ 0.932085 ┆ DONT_CARE ┆ 3    │
│ 23849   ┆ Q0  ┆ 8010558 ┆ 0.921999 ┆ DONT_CARE ┆ 4    │
│ ...     ┆ ... ┆ ...     ┆ ...      ┆ ...       ┆ ...  │
│ 1136962 ┆ Q0  ┆ 6058232 ┆ 0.000907 ┆ DONT_CARE ┆ 997  │
│ 1136962 ┆ Q0  ┆ 4481766 ┆ 0.000905 ┆ DONT_CARE ┆ 998  │
│ 1136962 ┆ Q0  ┆ 3422218 ┆ 0.000897 ┆ DONT_CARE ┆ 999  │
│ 1136962 ┆ Q0  ┆ 8239770 ┆ 0.000871 ┆ DONT_CARE ┆ 1000 │
└─────────┴─────┴─────────┴──────────┴───────────┴──────┘ shape: (5, 6)
┌───────┬─────┬─────────┬──────────┬───────────┬──────┐
│ qid   ┆ q0  ┆ docid   ┆ score




In [502]:
CROSS_ENCODER_RUN_FILE = 'runs/run.dl20-passage.small.reranker.cross.encoder.msmarco.tiny.bert.txt'
write_df_run(df_run_cross_encoder, CROSS_ENCODER_RUN_FILE)
!wc -l {CROSS_ENCODER_RUN_FILE}
!head {CROSS_ENCODER_RUN_FILE}

!tools/eval/trec_eval.9.0.4/trec_eval -M 1000 -m ndcg_cut.10 tools/topics-and-qrels/qrels.dl20-passage.txt {CROSS_ENCODER_RUN_FILE}

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
54000 runs/run.dl20-passage.small.reranker.cross.encoder.msmarco.tiny.bert.txt
23849 Q0 8010561 1 0.95864666 DONT_CARE
23849 Q0 2647769 2 0.9541975 DONT_CARE
23849 Q0 4834498 3 0.93208456 DONT_CARE
23849 Q0 8010558 4 0.92199904 DONT_CARE
23849 Q0 5888570 5 0.90353805 DONT_CARE
23849 Q0 4091551 6 0.7117987 DONT_CARE
23849 Q0 2017213 7 0.69111776 DONT_CARE
23849 Q0 8246990 8 0.66432697 DONT_CARE
23849 Q0 8769995 9 0.6448785 DONT_CAR