# Using Our Margin-MSE trained BERT_Cat Checkpoint

We provide a DistilBERT-based BERT_Cat instance, fully trained for retrieval (with Margin-MSE, using an ensemble of three BERT_Cat teachers on MSMARCO-Passage), on the HuggingFace model hub: https://huggingface.co/sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco

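This instance can be used to **re-rank a candidate set**. Because it scores each concatenated query-passage pair jointly, it does not produce independent query and passage vectors for a vector index based dense retrieval (look for ColBERT or BERT_Dot for that). The architecture is a 6-layer DistilBERT with a single additional linear layer at the end.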

If you want to know more about our simple yet effective knowledge distillation method for efficient information retrieval models, which works for a variety of student architectures, check out our paper: https://arxiv.org/abs/2010.02666 🎉
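As a rough illustration of the Margin-MSE idea (the paper has the exact formulation and training setup): instead of matching the teacher's raw scores, the student is trained to match the teacher's score *margin* between a relevant and a non-relevant passage for the same query. The sketch below computes that loss in plain Python for a batch of scores. The function name `margin_mse` and its argument names are chosen here for illustration and are not taken from our training code.

```python
from typing import Sequence


def margin_mse(student_pos: Sequence[float], student_neg: Sequence[float],
               teacher_pos: Sequence[float], teacher_neg: Sequence[float]) -> float:
    """Mean squared error between student and teacher score margins over a batch."""
    n = len(student_pos)
    if n == 0 or not (n == len(student_neg) == len(teacher_pos) == len(teacher_neg)):
        raise ValueError("all score lists must be non-empty and of equal length")

    squared_errors = [
        # (student margin) - (teacher margin), squared
        ((s_p - s_n) - (t_p - t_n)) ** 2
        for s_p, s_n, t_p, t_n in zip(student_pos, student_neg, teacher_pos, teacher_neg)
    ]
    return sum(squared_errors) / n
```

Note that the loss is zero whenever the margins agree, even if the student and teacher score ranges differ, which is what allows students with different architectures to learn from the same teacher scores.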

This notebook gives you a minimal usage example: downloading our BERT_Cat checkpoint and using it to score concatenated query-passage pairs for relevance.



---


Let's get started by installing the awesome *transformers* library from HuggingFace:


In [None]:
pip install transformers

The next step is to download our checkpoint and initialize the tokenizer and models:


In [14]:
from transformers import AutoTokenizer, AutoModel, PreTrainedModel, PretrainedConfig
from typing import Dict
import torch

class BERT_Cat_Config(PretrainedConfig):
    model_type = "BERT_Cat"
    bert_model: str
    trainable: bool = True

class BERT_Cat(PreTrainedModel):
    """
    The vanilla/mono BERT concatenated (we lovingly refer to as BERT_Cat) architecture 
    -> requires input concatenation before model, so that batched input is possible
    """
    config_class = BERT_Cat_Config
    base_model_prefix = "bert_model"

    def __init__(self,
                 cfg) -> None:
        super().__init__(cfg)
        
        self.bert_model = AutoModel.from_pretrained(cfg.bert_model)

        for p in self.bert_model.parameters():
            p.requires_grad = cfg.trainable

        self._classification_layer = torch.nn.Linear(self.bert_model.config.hidden_size, 1)

    def forward(self,
                query_n_doc_sequence):

        # [0] = last hidden states; [:,0,:] = the vector at the [CLS] position (assuming a distilbert model here)
        vecs = self.bert_model(**query_n_doc_sequence)[0][:,0,:]
        score = self._classification_layer(vecs)
        return score

#
# init the model & tokenizer (using the distilbert tokenizer)
#
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased") # honestly not sure if that is the best way to go, but it works :)
model = BERT_Cat.from_pretrained("sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco")

Now we are ready to use the model to score two sample query-passage pairs:

In [15]:
# our relevant example (with the query)
passage1_input = tokenizer("what is the transformers library","We are very happy to show you the 🤗 Transformers library for pre-trained language models. We are helping the community work together towards the goal of advancing NLP 🔥.",return_tensors="pt")
# a non-relevant example (with the query)
passage2_input = tokenizer("what is the transformers library","Hmm I don't like this new movie about transformers that i got from my local library. Those transformers are robots?",return_tensors="pt")

#print("Passage 1 Tokenized:",passage1_input)
#print("Passage 2 Tokenized:",passage2_input)

# note: the model scores query and passage jointly as one concatenated input, this can not be changed (look for colbert or bert_dot for independent forward calls)
score_for_p1 = model(passage1_input).squeeze(0)
score_for_p2 = model(passage2_input).squeeze(0)

print("---")
print("Score passage 1 <-> query: ",float(score_for_p1))
print("Score passage 2 <-> query: ",float(score_for_p2))

---
Score passage 1 <-> query:  5.899686336517334
Score passage 2 <-> query:  2.2803378105163574


As we can see, the model gives the first passage a higher score than the second. If we run this comparison on all passages in our candidate set, sorting by these scores would produce the ranked list.
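The re-ranking step described above could be sketched as follows. This is a minimal sketch, not part of our library: the helper names `make_model_scorer` and `rerank`, the `batch_size` default and the `max_length=512` truncation are assumptions made for illustration. The ranking logic only needs a callable that returns one score per query-passage pair, so it can be tried with any scorer.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps parallel lists of queries and passages to one relevance score per pair.
PairScorer = Callable[[List[str], List[str]], List[float]]


def make_model_scorer(model, tokenizer, max_length: int = 512) -> PairScorer:
    """Wrap the notebook's `model` and `tokenizer` into a batched pair scorer (assumed wiring)."""
    import torch  # imported lazily so `rerank` stays usable without torch

    def score_pairs(queries: List[str], passages: List[str]) -> List[float]:
        # Tokenize each (query, passage) pair into one concatenated, padded sequence.
        inputs = tokenizer(queries, passages, padding=True, truncation=True,
                           max_length=max_length, return_tensors="pt")
        with torch.no_grad():  # inference only, no gradients needed
            scores = model(inputs)  # shape: [batch, 1]
        return scores.squeeze(-1).tolist()

    return score_pairs


def rerank(query: str, passages: Sequence[str], score_pairs: PairScorer,
           batch_size: int = 32) -> List[Tuple[str, float]]:
    """Score every candidate passage against the query and sort by descending score."""
    scored: List[Tuple[str, float]] = []
    for start in range(0, len(passages), batch_size):
        batch = list(passages[start:start + batch_size])
        batch_scores = score_pairs([query] * len(batch), batch)
        scored.extend(zip(batch, batch_scores))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With the model and tokenizer from the cells above, a call might look like `rerank("what is the transformers library", candidate_passages, make_model_scorer(model, tokenizer))`, where `candidate_passages` is a hypothetical list of passage strings from a first-stage retriever.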

- If you want to look at more complex usages and training code, we have a library for that: https://github.com/sebastian-hofstaetter/transformer-kernel-ranking 👏

- If you use our model checkpoint, please cite our work as:

    ```
    @misc{hofstaetter2020_crossarchitecture_kd,
          title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
          author={Sebastian Hofst{\"a}tter and Sophia Althammer and Michael Schr{\"o}der and Mete Sertkan and Allan Hanbury},
          year={2020},
          eprint={2010.02666},
          archivePrefix={arXiv},
          primaryClass={cs.IR}
    }
    ```

Thank You 😊 If you have any questions, feel free to reach out to Sebastian via email (address in the paper).
