# Using Our Margin-MSE trained Bert_Dot (or BERT Dense Retrieval) Checkpoint

We provide a DistilBERT-based Bert_Dot instance, fully trained for retrieval (with Margin-MSE, using an ensemble of three Bert_Cat teachers on MSMARCO-Passage), on the HuggingFace model hub: https://huggingface.co/sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco

This instance can be used to **re-rank a candidate set** or **directly for dense retrieval with a vector index**. The architecture is a 6-layer DistilBERT without any additions or modifications (only the weights change during training). To obtain a query or passage representation, we pool the CLS vector.

If you want to know more about our simple, yet effective knowledge distillation method for efficient information retrieval models for a variety of student architectures, check out our paper: https://arxiv.org/abs/2010.02666 🎉

This notebook gives you a minimal usage example of downloading our Bert_Dot checkpoint to encode passages and queries to create a dot-product based score of their relevance. 



---


Let's get started by installing the awesome *transformers* library from HuggingFace:


In [None]:
pip install transformers

The next step is to download our checkpoint and initialize the tokenizer and models:


In [25]:
from transformers import AutoTokenizer, AutoModel

# you can switch the model to the original "distilbert-base-uncased" to see that the usage example then breaks and the score ordering is reversed :O
#pre_trained_model_name = "distilbert-base-uncased"
pre_trained_model_name = "sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco"

tokenizer = AutoTokenizer.from_pretrained(pre_trained_model_name) 
bert_model = AutoModel.from_pretrained(pre_trained_model_name)

Now we are ready to use the model to encode two sample passages and a query:

In [26]:
# our relevant example
passage1_input = tokenizer("We are very happy to show you the 🤗 Transformers library for pre-trained language models. We are helping the community work together towards the goal of advancing NLP 🔥.",return_tensors="pt")
# a non-relevant example
passage2_input = tokenizer("Hmm I don't like this new movie about transformers that i got from my local library. Those transformers are robots?",return_tensors="pt")
# the user query -> which should give us a better score for the first passage
query_input = tokenizer("what is the transformers library",return_tensors="pt")

print("Passage 1 Tokenized:",passage1_input)
print("Passage 2 Tokenized:",passage2_input)
print("Query Tokenized:",query_input)

# note how we call the bert model separately for each passage and for the query :)
# [0][:,0,:] pools (or selects) the CLS vector from the full output
passage1_encoded = bert_model(**passage1_input)[0][:,0,:].squeeze(0)
passage2_encoded = bert_model(**passage2_input)[0][:,0,:].squeeze(0)
query_encoded    = bert_model(**query_input)[0][:,0,:].squeeze(0)

print("---")
print("Passage Encoded Shape:",passage1_encoded.shape)
print("Query Encoded Shape:",query_encoded.shape)

Passage 1 Tokenized: {'input_ids': tensor([[  101,  2057,  2024,  2200,  3407,  2000,  2265,  2017,  1996,   100,
         19081,  3075,  2005,  3653,  1011,  4738,  2653,  4275,  1012,  2057,
          2024,  5094,  1996,  2451,  2147,  2362,  2875,  1996,  3125,  1997,
         10787, 17953,  2361,   100,  1012,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
Passage 2 Tokenized: {'input_ids': tensor([[  101, 17012,  1045,  2123,  1005,  1056,  2066,  2023,  2047,  3185,
          2055, 19081,  2008,  1045,  2288,  2013,  2026,  2334,  3075,  1012,
          2216, 19081,  2024, 13507,  1029,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1]])}
Query Tokenized: {'input_ids': tensor([[  101,  2054,  2003,  1996, 19081,  3075,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}
---
Passage Encoded Shape: torch.Size([768])
Query Encoded Shape: torch.Size([768])
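
The `[0][:,0,:]` pooling used in the cell above can be illustrated on a dummy array without downloading the model. This is only a minimal sketch: it uses NumPy instead of PyTorch, and the random `last_hidden_state` array is an assumed stand-in for the model output, shaped like the query above (1 sequence, 7 tokens, 768 hidden dimensions).

```python
import numpy as np

# Stand-in for the model's last hidden state: (batch, sequence_length, hidden_size).
# The values are random and only serve to illustrate the shapes.
rng = np.random.default_rng(0)
last_hidden_state = rng.standard_normal((1, 7, 768))

# Select the vector at position 0 (the [CLS] token) for every item in the batch.
cls_vectors = last_hidden_state[:, 0, :]   # shape: (1, 768)

# Drop the batch dimension, as .squeeze(0) does in the notebook cell.
cls_vector = cls_vectors.squeeze(0)        # shape: (768,)

print(cls_vectors.shape, cls_vector.shape)
```

The result is one 768-dimensional vector per input text, which is why the passage and the query end up with the same shape even though they have different token counts.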

Now that we have our encoded vectors, we can generate the score with a simple dot product! 

(This can be offloaded to a vector indexing library like Faiss)


In [27]:
score_for_p1 = query_encoded.dot(passage1_encoded)
print("Score passage 1 <-> query: ",float(score_for_p1))

score_for_p2 = query_encoded.dot(passage2_encoded)
print("Score passage 2 <-> query: ",float(score_for_p2))

Score passage 1 <-> query:  108.82856750488281
Score passage 2 <-> query:  99.5865249633789


As we can see, the model gives the first passage a higher score than the second. Running this comparison over all passages in a collection or candidate set and sorting by score produces the ranked list. The scores are in the 100+ range because they are unnormalized dot products of 768-dimensional vectors, which naturally yields large values.
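
Scoring a whole candidate set can be sketched as a single matrix-vector product followed by a sort. This is a hedged illustration, not our training or evaluation code: `rank_by_dot_product` is a hypothetical helper, and the 4-dimensional toy vectors stand in for real 768-dimensional encodings.

```python
import numpy as np

def rank_by_dot_product(query_vector, passage_matrix):
    """Return passage indices and scores, sorted by descending dot-product score."""
    scores = passage_matrix @ query_vector   # one score per passage (row)
    order = np.argsort(-scores)              # highest score first
    return order, scores[order]

# Toy stand-ins for encoded vectors (assumed values, for illustration only).
query = np.array([1.0, 0.0, 2.0, 0.0])
passages = np.array([
    [0.5, 1.0, 0.0, 0.0],   # passage 0 -> score 0.5
    [1.0, 0.0, 3.0, 1.0],   # passage 1 -> score 7.0
    [0.0, 2.0, 1.0, 0.0],   # passage 2 -> score 2.0
])

order, sorted_scores = rank_by_dot_product(query, passages)
print(order)  # [1 2 0]
```

An exact inner-product search in a vector index computes the same scores; the brute-force version here is only practical for small candidate sets.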

*As a fun exercise you can swap the pre-trained model to the initial distilbert checkpoint and see that the example doesn't work anymore*

- If you want to look at more complex usages and training code we have a library for that: https://github.com/sebastian-hofstaetter/transformer-kernel-ranking 👏

- If you use our model checkpoint please cite our work as:

    ```
@misc{hofstaetter2020_crossarchitecture_kd,
      title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation}, 
      author={Sebastian Hofst{\"a}tter and Sophia Althammer and Michael Schr{\"o}der and Mete Sertkan and Allan Hanbury},
      year={2020},
      eprint={2010.02666},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}
    ```

Thank you 😊 If you have any questions, feel free to reach out to Sebastian by email (the address is in the paper).
