## Loading the BERT Model

In this project we use the `bert-base-uncased` checkpoint from Hugging Face as the base for our transformer model.

In [1]:
from transformers import BertTokenizer, BertModel

### Initialise a pre-trained model

In [2]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

This warning is expected here: the listed `cls.*` weights belong to the pre-training heads (masked language modelling and next-sentence prediction), which the bare `BertModel` encoder does not include.


In [6]:
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')

In [17]:
encoded_input['input_ids'][0]

tensor([ 101, 5672, 2033, 2011, 2151, 3793, 2017, 1005, 1040, 2066, 1012,  102])

In [22]:
' '.join(tokenizer.convert_ids_to_tokens(encoded_input['input_ids'][0]))

"[CLS] replace me by any text you ' d like . [SEP]"

In [7]:
# Pass the full encoding (input_ids, token_type_ids, attention_mask) to the model
output = model(**encoded_input)

In [11]:
output['last_hidden_state'].shape

torch.Size([1, 12, 768])
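
The three dimensions of `last_hidden_state` are batch size, sequence length (12 tokens here, including `[CLS]` and `[SEP]`) and hidden size. The sketch below shows how that tensor is typically unpacked. To run without downloading weights it assumes a tiny, randomly initialised model and random token ids, so the names `tiny_model` and `input_ids` and all the sizes are illustrative and not part of the notebook above.

```python
import torch
from transformers import BertConfig, BertModel

# Assumption: a tiny, randomly initialised BERT used only to illustrate
# output shapes; the sizes below are arbitrary.
config = BertConfig(
    hidden_size=64,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=128,
)
tiny_model = BertModel(config)
tiny_model.eval()  # disable dropout for deterministic inference

# Hypothetical batch: one sequence of 12 random token ids.
input_ids = torch.randint(0, config.vocab_size, (1, 12))

with torch.no_grad():  # no gradients needed for inference
    out = tiny_model(input_ids)

# last_hidden_state has shape (batch_size, sequence_length, hidden_size).
batch_size, seq_len, hidden_size = out.last_hidden_state.shape
print(batch_size, seq_len, hidden_size)

# Hidden state at position 0, where the tokenizer places [CLS].
cls_vector = out.last_hidden_state[:, 0, :]

# pooler_output is that position-0 vector after a dense layer and tanh.
pooled = out.pooler_output
print(cls_vector.shape, pooled.shape)
```

With the pre-trained `model` from the cells above, the same indexing applies, with a hidden size of 768.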

### Initialise a custom model

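A model built directly from a `BertConfig` has the chosen architecture but randomly initialised weights, so its outputs are not meaningful until it has been trained.

Refer to: https://huggingface.co/transformers/v3.0.2/model_doc/bert.html#transformers.BertConfig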

In [24]:
from transformers import BertModel, BertConfig

# Initializing a BERT bert-base-uncased style configuration
configuration = BertConfig()

# Initializing a model from the bert-base-uncased style configuration
model = BertModel(configuration)

# Accessing the model configuration
configuration = model.config

In [29]:
model(encoded_input['input_ids'])['last_hidden_state'].shape

torch.Size([1, 12, 768])
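
The output shape matches the pre-trained model because the `BertConfig()` defaults describe the same architecture as `bert-base-uncased`. As a rough sketch of how the configuration controls model size, the example below inspects those defaults and then builds a smaller variant. The reduced sizes and the names `small_config` and `small_model` are assumptions chosen for illustration.

```python
from transformers import BertConfig, BertModel

# Default configuration: same architecture as bert-base-uncased.
default_config = BertConfig()
print(default_config.hidden_size)          # width of each token vector
print(default_config.num_hidden_layers)    # number of transformer layers
print(default_config.num_attention_heads)  # attention heads per layer

# Hypothetical smaller variant; hidden_size must be divisible by
# num_attention_heads.
small_config = BertConfig(
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=1024,
)
small_model = BertModel(small_config)  # weights are randomly initialised

# Count parameters to see the effect of the smaller configuration.
num_params = sum(p.numel() for p in small_model.parameters())
print(f"{num_params:,} parameters")
```

Only `from_pretrained` loads trained weights; a model constructed from a configuration alone starts from random initialisation.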