
# Loading a HuggingFace model into AllenNLP

In [111]:
%pip install transformers allennlp eli5 --quiet
%pip install -U google-cloud-storage==1.40.0 --quiet

In this case we will use `nlptown/bert-base-multilingual-uncased-sentiment`. This is a bert-base-multilingual-uncased model finetuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish and Italian. It predicts the sentiment of the review as a number of stars (between 1 and 5).

The model can be used directly as a sentiment analysis model for product reviews in any of the six languages, or further fine-tuned on related sentiment analysis tasks. To keep the example small, we won't do any fine-tuning with our own data here.

## Loading the model with transformers

In [1]:
from transformers.models.auto import AutoConfig, AutoModelForSequenceClassification
from transformers.models.auto.tokenization_auto import AutoTokenizer

model_uri = 'nlptown/bert-base-multilingual-uncased-sentiment'

config = AutoConfig.from_pretrained(model_uri)
tokenizer = AutoTokenizer.from_pretrained(model_uri)
classifier = AutoModelForSequenceClassification.from_pretrained(model_uri, config=config)

The transformers library provides a convenient way to store all the artifacts of a given model: the model's `save_pretrained` method.

In [2]:
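# AllenNLP's pretrained-transformer components accept either a model name on the
# HuggingFace Hub or a local directory, so the saved directory doubles as the model name
model_path = 'rating_classifier'
model_name = 'rating_classifier'
classifier.save_pretrained(model_path)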

This writes the model's configuration (`config.json`) and weights (`pytorch_model.bin`; recent versions of transformers write `model.safetensors` instead) to that directory. However, to run the model we also need its corresponding tokenizer. The tokenizer has the same `save_pretrained` method, which generates another set of files:

In [3]:
tokenizer.save_pretrained(model_path)

('rating_classifier/tokenizer_config.json',
 'rating_classifier/special_tokens_map.json',
 'rating_classifier/vocab.txt',
 'rating_classifier/added_tokens.json',
 'rating_classifier/tokenizer.json')
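
Since AllenNLP will load everything from this directory, it can be handy to check that it holds a configuration, weights and a tokenizer vocabulary before moving on. A minimal sketch: `missing_artifacts` is a hypothetical helper (not part of transformers or AllenNLP), and the candidate file names are assumptions based on the files listed above and the weights files transformers typically writes.

```python
from pathlib import Path
from typing import Dict, List

# Assumed file-name conventions: each artifact group is satisfied if at least
# one of its candidate files is present in the directory.
ARTIFACT_GROUPS: Dict[str, List[str]] = {
    "configuration": ["config.json"],
    "weights": ["pytorch_model.bin", "model.safetensors"],
    "tokenizer vocabulary": ["vocab.txt", "tokenizer.json"],
}


def missing_artifacts(model_dir: str) -> List[str]:
    """Return the names of the artifact groups that have no file in model_dir."""
    root = Path(model_dir)
    return [
        group
        for group, candidates in ARTIFACT_GROUPS.items()
        if not any((root / name).is_file() for name in candidates)
    ]


# An empty list means the directory looks complete
print(missing_artifacts("rating_classifier"))
```

After both `save_pretrained` calls above, the call should return an empty list.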

## Loading the saved model using AllenNLP

### Vocabulary

As with any other framework, a vocabulary in AllenNLP maps strings to integers. Vocabularies are usually fit to a particular dataset, which decides which tokens are in-vocabulary; any token outside the vocabulary is mapped to a special out-of-vocabulary (OOV) token. Here we don't fit anything: `Vocabulary.from_pretrained_transformer` takes the vocabulary of the pretrained tokenizer instead.
An important distinction is that AllenNLP vocabularies can have several namespaces, each with its own indices for the same token. For instance, in a text generation setting you can have one vocabulary for the input words and another one for the output words.

In [4]:
from allennlp.data.vocabulary import Vocabulary

transformer_vocab = Vocabulary.from_pretrained_transformer(model_name)
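
To make the namespace idea concrete, here is a toy, pure-Python sketch of a namespaced vocabulary. `ToyVocabulary` is hypothetical and much simpler than AllenNLP's `Vocabulary`; it only mimics what we understand to be AllenNLP's defaults: padded namespaces reserve index 0 for the padding token `@@PADDING@@` and index 1 for the OOV token `@@UNKNOWN@@`, while label namespaces reserve nothing (AllenNLP matches those with the patterns `*labels` and `*tags`; the toy only treats the exact name `labels` that way).

```python
from typing import Dict, Iterable

PADDING_TOKEN = "@@PADDING@@"
OOV_TOKEN = "@@UNKNOWN@@"


class ToyVocabulary:
    """Toy namespaced token -> index mapping (a sketch, not AllenNLP's Vocabulary)."""

    def __init__(self, non_padded_namespaces: Iterable[str] = ("labels",)) -> None:
        self._non_padded = set(non_padded_namespaces)
        self._token_to_index: Dict[str, Dict[str, int]] = {}

    def _namespace(self, namespace: str) -> Dict[str, int]:
        # Create the namespace on first use; padded namespaces reserve indices 0 and 1
        if namespace not in self._token_to_index:
            mapping: Dict[str, int] = {}
            if namespace not in self._non_padded:
                mapping[PADDING_TOKEN] = 0
                mapping[OOV_TOKEN] = 1
            self._token_to_index[namespace] = mapping
        return self._token_to_index[namespace]

    def add_token(self, token: str, namespace: str = "tokens") -> int:
        mapping = self._namespace(namespace)
        if token not in mapping:
            mapping[token] = len(mapping)
        return mapping[token]

    def get_index(self, token: str, namespace: str = "tokens") -> int:
        mapping = self._namespace(namespace)
        if token in mapping:
            return mapping[token]
        if OOV_TOKEN in mapping:
            return mapping[OOV_TOKEN]  # unknown tokens fall back to OOV
        raise KeyError(f"{token!r} is not in namespace {namespace!r}")


vocab = ToyVocabulary()
print(vocab.add_token("read"))                    # 2: indices 0 and 1 are reserved
print(vocab.add_token("great"))                   # 3
print(vocab.add_token("great", "target_tokens"))  # 2: same token, separate index
print(vocab.add_token("5", "labels"))             # 0: "labels" reserves nothing
print(vocab.get_index("awesome"))                 # 1: unknown token -> OOV index
```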

### Tokenizer

The tokenization work is divided into two parts in AllenNLP, which allows a more modular approach:

 - Tokenizer: splits chunks of text into a sequence of discrete tokens, typically words, word pieces or characters. It goes from text to a sequence of strings. For BERT, `PretrainedTransformerTokenizer` produces word pieces and adds the special `[CLS]` and `[SEP]` tokens.
 - Indexer: takes a sequence of tokens and translates each one into an index according to the vocabulary. It goes from a sequence of strings to a sequence of vocabulary indices.


In [5]:
from allennlp.data.tokenizers.pretrained_transformer_tokenizer import PretrainedTransformerTokenizer
from allennlp.data.token_indexers.pretrained_transformer_indexer import PretrainedTransformerIndexer

transformer_tokenizer = PretrainedTransformerTokenizer(model_name)
token_indexer = PretrainedTransformerIndexer(model_name)
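
The two steps can be sketched in plain Python. This toy version splits on whitespace instead of using BERT's WordPiece algorithm, so it only agrees with `PretrainedTransformerTokenizer` on sentences made of whole, in-vocabulary words. The tiny vocabulary uses the ids printed in the model output later in this notebook; the `[UNK]` id and the function names are hypothetical.

```python
from typing import Dict, List

# Tiny vocabulary built from the ids in the model output shown later in this notebook;
# the [UNK] id is a hypothetical placeholder.
TOY_VOCAB: Dict[str, int] = {
    "[UNK]": 100, "[CLS]": 101, "[SEP]": 102,
    "this": 10372, "is": 10127, "a": 143, "great": 11838,
    "read": 18593, "everyone": 36053, "should": 14693, "have": 10574,
}


def toy_tokenize(text: str) -> List[str]:
    """Tokenizer step, text -> strings: lowercase, split on whitespace, add special tokens."""
    return ["[CLS]"] + text.lower().split() + ["[SEP]"]


def toy_index(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Indexer step, strings -> vocabulary indices, with [UNK] for unknown tokens."""
    return [vocab.get(token, vocab["[UNK]"]) for token in tokens]


tokens = toy_tokenize("this is a great read everyone should have")
print(tokens)
print(toy_index(tokens, TOY_VOCAB))
```

Keeping the two steps separate means the same tokens could be indexed in different ways (for example against different namespaces) without touching the tokenizer.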


### Embedder

The embedder's job is to provide a vector for each token index, since most NLP models work with these dense representations rather than with the indices themselves. In the simplest case it is a lookup table from index to vector; `PretrainedTransformerEmbedder`, however, runs the whole BERT encoder, so the vector of each word piece depends on its context.
AllenNLP supports providing multiple embedders for the same or different inputs. In our case, we register a single embedder under the key "tokens".

In [6]:
from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
from allennlp.modules.token_embedders.pretrained_transformer_embedder import PretrainedTransformerEmbedder

In [7]:
token_embedder = BasicTextFieldEmbedder({ "tokens": PretrainedTransformerEmbedder(model_name) })

Some weights of the model checkpoint at rating_classifier were not used when initializing BertModel: ['classifier.bias', 'classifier.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


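> Note that the key `"tokens"` must match the key of the token indexer used by the dataset reader (`token_indexers={ "tokens": token_indexer }` below): AllenNLP pairs the output of each indexer with the embedder registered under the same name.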
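
As a contrast with the contextual `PretrainedTransformerEmbedder`, here is a toy sketch of the simplest kind of embedder, a static lookup table from index to vector. The random table is a stand-in for learned weights, and `make_embedding_table` and `embed` are hypothetical names.

```python
import random
from typing import List

Vector = List[float]


def make_embedding_table(vocab_size: int, dim: int, seed: int = 0) -> List[Vector]:
    """Random stand-in for a learned (vocab_size, dim) embedding matrix."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def embed(token_ids: List[int], table: List[Vector]) -> List[Vector]:
    """Static embedding: (sequence_length,) indices -> (sequence_length, dim) vectors."""
    return [table[index] for index in token_ids]


table = make_embedding_table(vocab_size=10, dim=4)
vectors = embed([3, 1, 3], table)
# Same index, same vector, whatever the surrounding tokens are
print(vectors[0] == vectors[2])
```

Because the lookup ignores context, the same index always yields the same vector; with BERT, the vector of a word piece changes with the surrounding sentence.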

### Encoder

A `Seq2VecEncoder` is a module that takes a sequence of vectors and returns a single vector: its input has shape `(batch_size, sequence_length, input_dim)` and its output has shape `(batch_size, output_dim)`. The BERT architecture ends with such a pooling layer: it takes the final hidden state of the [CLS] token and passes it through a dense layer with a tanh activation. This pooler is also part of the pretrained BERT model.

AllenNLP provides this layer as `BertPooler`, which takes the pooler weights from the pretrained model:

In [8]:
from allennlp.modules.seq2vec_encoders.bert_pooler import BertPooler

transformer_encoder = BertPooler(model_name)
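
The pooling computation itself is small. A minimal pure-Python sketch, assuming the weight matrix is stored as `(output_dim, input_dim)` like a PyTorch `Linear` layer; `bert_style_pool` is a hypothetical name, not AllenNLP's implementation:

```python
import math
from typing import List

Vector = List[float]


def bert_style_pool(batch: List[List[Vector]], weight: List[Vector], bias: Vector) -> List[Vector]:
    """Toy Seq2Vec pooler: (batch_size, sequence_length, input_dim) -> (batch_size, output_dim).

    Keeps the first ([CLS]) vector of each sequence and applies tanh(W x + b),
    with `weight` stored as (output_dim, input_dim).
    """
    pooled = []
    for sequence in batch:
        cls_vector = sequence[0]  # the remaining positions are ignored
        pooled.append([
            math.tanh(sum(w * x for w, x in zip(row, cls_vector)) + b)
            for row, b in zip(weight, bias)
        ])
    return pooled


# One sequence of three 2-dimensional vectors, identity weights and zero bias
batch = [[[1.0, 0.0], [5.0, 5.0], [9.0, 9.0]]]
print(bert_style_pool(batch, weight=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0]))
```

Only the first position influences the result, which is why BERT relies on the [CLS] token to summarize the whole sequence for classification.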

### Building the model

In [68]:
from allennlp.models import BasicClassifier

model = BasicClassifier(vocab=transformer_vocab, 
                        text_field_embedder=token_embedder, 
                        seq2vec_encoder=transformer_encoder, 
                        dropout=0.1, 
                        num_labels=5)

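The embedder and the pooler already loaded their weights from the checkpoint, but the classification layer of `BasicClassifier` is randomly initialized (the warning above shows that `classifier.weight` and `classifier.bias` were not used). We copy those weights from the HuggingFace model: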

In [69]:
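# Reuse the trained classification head: both Linear layers map 768 features to 5 labels.
# Assigning the Parameters makes the two models share the same tensors.
model._classification_layer.weight = classifier.classifier.weight
model._classification_layer.bias = classifier.classifier.bias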

In [70]:
_ = model.eval()

## Data readers

AllenNLP uses the concept of a `DatasetReader` to create `Instance`s, which hold the inputs in the format the model expects. This abstraction lets the framework perform any preprocessing needed before the data is actually sent to the model.

In [77]:
from allennlp.data.dataset_readers import TextClassificationJsonReader

dataset_reader = TextClassificationJsonReader(token_indexers={ "tokens": token_indexer },
                                              tokenizer=transformer_tokenizer,
                                              max_sequence_length=400)
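
`TextClassificationJsonReader` reads files in the JSON lines format: each line is a JSON object with a `"text"` key and, optionally, a `"label"` key. The sketch below mirrors only that parsing step in plain Python, without tokenization or indexing; `read_jsonl` is a hypothetical helper, and turning labels into strings reflects our reading of the reader's behavior.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def read_jsonl(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (text, label) pairs from JSON lines; the label is optional."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        item = json.loads(line)
        label = item.get("label")
        yield item["text"], (str(label) if label is not None else None)


sample = [
    '{"text": "this is a great read everyone should have", "label": 5}',
    '{"text": "an example without a label"}',
]
for text, label in read_jsonl(sample):
    print(label, text)
```

With a real file, `dataset_reader.read(path)` yields one `Instance` per such line.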


Let's check that the reader works by converting a single sentence into an `Instance`:

In [80]:
instance = dataset_reader.text_to_instance("this is a great read everyone should have")
dataset_reader.apply_token_indexers(instance)

In [112]:
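from allennlp.nn import util
from allennlp.data import Batch

# Group the instances into a batch and index them with the vocabulary
dataset = Batch([instance])
dataset.index_instances(transformer_vocab)
# Convert the batch to (padded) tensors on the same device as the model
model_input = util.move_to_device(dataset.as_tensor_dict(), model._get_prediction_device())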
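
With a single instance there is nothing to pad, but when a `Batch` holds several instances, each field is padded to the longest one. For the transformer indexer, the tensor dictionary contains the `token_ids` together with a `mask` (and `type_ids`) that marks the real tokens. A minimal sketch of the padding step, assuming a padding id of 0; `pad_batch` is a hypothetical helper:

```python
from typing import List, Tuple


def pad_batch(sequences: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[List[bool]]]:
    """Right-pad id sequences to the longest one and build the matching mask."""
    max_len = max(len(seq) for seq in sequences)
    ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # True marks a real token, False marks padding
    mask = [[True] * len(seq) + [False] * (max_len - len(seq)) for seq in sequences]
    return ids, mask


ids, mask = pad_batch([[101, 11838, 18593, 102], [101, 102]])
print(ids)
print(mask)
```

The mask lets the model ignore padded positions, so padding does not change the prediction for the shorter sequence.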

In [113]:
model.make_output_human_readable(model(**model_input))

{'label': ['4'],
 'logits': tensor([[-2.3093, -2.4158, -0.5280,  1.7162,  2.7962]],
        grad_fn=<AddmmBackward0>),
 'probs': tensor([[0.0044, 0.0039, 0.0260, 0.2448, 0.7209]], grad_fn=<SoftmaxBackward0>),
 'token_ids': tensor([[  101, 10372, 10127,   143, 11838, 18593, 36053, 14693, 10574,   102]]),
 'tokens': [['[CLS]',
   'this',
   'is',
   'a',
   'great',
   'read',
   'everyone',
   'should',
   'have',
   '[SEP]']]}
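
The `probs` are the softmax of the `logits`, and `label` is the index of the most probable class. As far as we can tell from `BasicClassifier`, since our vocabulary has no `labels` namespace it falls back to printing the index as a string, so `'4'` is the fifth class, which for this model should correspond to 5 stars. A short check in plain Python using the logits printed above (`predict_label` is a hypothetical helper):

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    # Subtracting the maximum first avoids overflow and does not change the result
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits: List[float]) -> Tuple[str, List[float]]:
    """Return the index of the most probable class (as a string) and the probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return str(best), probs


logits = [-2.3093, -2.4158, -0.5280, 1.7162, 2.7962]  # copied from the output above
label, probs = predict_label(logits)
print(label, [round(p, 4) for p in probs])
```

The computed probabilities agree with the `probs` printed above up to rounding.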

## Making the tokenizer and the model a single piece

In [None]:
from allennlp.predictors import TextClassifierPredictor

predictor = TextClassifierPredictor(model, dataset_reader)

In [108]:
predictor.predict("this is a great read everyone should have")

{'label': '3',
 'logits': [-0.03681138902902603,
  0.12418578565120697,
  -0.22534894943237305,
  0.4560336172580719,
  0.14134210348129272],
 'probs': [0.17138470709323883,
  0.20132245123386383,
  0.1419355571269989,
  0.2805510461330414,
  0.20480619370937347],
 'token_ids': [101, 10372, 10127, 143, 11838, 18593, 36053, 14693, 10574, 102],
 'tokens': ['[CLS]',
  'this',
  'is',
  'a',
  'great',
  'read',
  'everyone',
  'should',
  'have',
  '[SEP]']}

## Using Jsonnet

AllenNLP also lets you define models declaratively: the entire architecture can be specified in the Jsonnet language. This allows even faster iteration when trying new combinations of components, since it only takes changing the relevant part of the declarative configuration.
The following configuration, passed here as a `Params` object (you can do exactly the same by saving the JSON structure to a Jsonnet file), is equivalent to the dataset reader and model we built before.

In [123]:
from allennlp.common import Params
from allennlp.data.dataset_readers import DatasetReader

params = Params({
      "type": "text_classification_json",
      "tokenizer": {
          "type": "pretrained_transformer",
          "model_name": model_name,
      },
      "token_indexers": {
          "tokens": {
              "type": "pretrained_transformer",
              "model_name": model_name,
          }
      }
})

dataset_reader = DatasetReader.from_params(params)
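
Since JSON is a subset of Jsonnet, one simple way to move this configuration into a file is to dump the dictionary as JSON. A minimal sketch; `save_config` and the file name are hypothetical:

```python
import json
from typing import Any, Dict

model_name = "rating_classifier"

# Same structure as the dataset reader Params above
reader_config: Dict[str, Any] = {
    "type": "text_classification_json",
    "tokenizer": {"type": "pretrained_transformer", "model_name": model_name},
    "token_indexers": {
        "tokens": {"type": "pretrained_transformer", "model_name": model_name}
    },
}


def save_config(config: Dict[str, Any], path: str) -> None:
    """Write a config as JSON; any JSON document is also valid Jsonnet."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)


save_config(reader_config, "dataset_reader.jsonnet")
```

The file can then be loaded with `DatasetReader.from_params(Params.from_file("dataset_reader.jsonnet"))`. A hand-written Jsonnet file could also declare `local model_name = "rating_classifier";` once and reference `model_name` wherever it is needed.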

In [124]:
from allennlp.common import Params
from allennlp.models import Model

params = Params({
    "type": "basic_classifier",
    "vocab": {
        "type": "from_pretrained_transformer",
        "model_name": model_name,
    },
    "text_field_embedder": {
        "type": "basic",
        "token_embedders": {
            "tokens": {
                "type": "pretrained_transformer",
                "model_name": model_name
            }
        }
    },
    "seq2vec_encoder": {
        "type": "bert_pooler",
        "pretrained_model": model_name
    },
    "dropout": 0.1,
    "num_labels": 5,
})

model = Model.from_params(params)
model._classification_layer.weight = classifier.classifier.weight
model._classification_layer.bias = classifier.classifier.bias
model.eval()

In [None]:
predictor = TextClassifierPredictor(model, dataset_reader)

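If you don't need to copy the classification weights by hand, you can build the whole predictor, dataset reader and model, from a single configuration. Keep in mind that in this case the classification layer stays randomly initialized (the embedder and the pooler still load their weights from the checkpoint), so this is mostly useful when you plan to fine-tune the model: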

In [129]:
from allennlp.common import Params
from allennlp.predictors import Predictor

params = Params({
    "type": "text_classifier",
    "dataset_reader": {
        "type": "text_classification_json",
        "tokenizer": {
            "type": "pretrained_transformer",
            "model_name": model_name,
        },
        "token_indexers": {
            "tokens": {
                "type": "pretrained_transformer",
                "model_name": model_name,
            }
        }
    },
    "model": {
        "type": "basic_classifier",
        "vocab": {
            "type": "from_pretrained_transformer",
            "model_name": model_name,
        },
        "text_field_embedder": {
            "type": "basic",
            "token_embedders": {
                "tokens": {
                    "type": "pretrained_transformer",
                    "model_name": model_name
                }
            }
        },
        "seq2vec_encoder": {
            "type": "bert_pooler",
            "pretrained_model": model_name
        },
        "dropout": 0.1,
        "num_labels": 5,
    }
})

predictor = Predictor.from_params(params)