# Models (PyTorch)

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [None]:
!pip install datasets evaluate transformers[sentencepiece]

## Creating a Transformer

In [None]:
from transformers import BertConfig, BertModel

# Building the config
config = BertConfig()

# Building the model from the config
model = BertModel(config)

In [None]:
print(config)

BertConfig {
  [...]
  "hidden_size": 768,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  [...]
}

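The config is a blueprint of the model: it holds the hyperparameters needed to build the architecture (hidden size, number of layers, number of attention heads, and so on), but it contains no weights.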
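Because the config only describes the architecture, changing its attributes changes the model that gets built. The following is a minimal sketch, assuming we want a BERT much smaller than the default; the sizes and the `count_parameters` helper are arbitrary illustration choices, not recommended settings.

```python
from transformers import BertConfig, BertModel

# Arbitrary small values, chosen for illustration only.
# hidden_size must be divisible by num_attention_heads.
small_config = BertConfig(
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=512,
)

# The model is built from the blueprint and randomly initialized
small_model = BertModel(small_config)


def count_parameters(model):
    """Hypothetical helper: total number of parameters in the model."""
    return sum(p.numel() for p in model.parameters())


print(len(small_model.encoder.layer))  # number of Transformer layers built
print(count_parameters(small_model))
```

The number of encoder layers follows `num_hidden_layers` directly, which shows that the config alone decides the architecture.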

## Different loading methods

In [None]:
from transformers import BertConfig, BertModel

config = BertConfig()
model = BertModel(config)

# Model is randomly initialized!

The model can be used in **this state**, but its output will be **gibberish**.

We must train it **first**.
- Training from scratch requires a lot of data and compute, and it has an environmental impact.

To avoid that cost, we should reuse and share already trained models.

To load a pretrained model, we use the `from_pretrained()` method.

In [None]:
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-cased")

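We could replace `BertModel` with the equivalent `AutoModel` class.
In fact, that is preferable, because it makes the code checkpoint-agnostic: code that works for one checkpoint should also work with another checkpoint trained for a similar task, even if the architecture is different.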
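To illustrate what checkpoint-agnostic means, here is a minimal sketch that runs without a download. It assumes a tiny, randomly initialized BERT saved to a temporary folder (using `save_pretrained()`, covered in the "Saving methods" section); the sizes and variable names are illustration choices. `AutoModel` is never told which architecture to use and infers it from the saved configuration.

```python
import tempfile

from transformers import AutoModel, BertConfig, BertModel

# Tiny random BERT, used only so the example runs offline (arbitrary sizes)
tiny_config = BertConfig(
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
tiny_model = BertModel(tiny_config)

with tempfile.TemporaryDirectory() as checkpoint_dir:
    tiny_model.save_pretrained(checkpoint_dir)
    # AutoModel reads config.json in the folder and picks the matching class
    auto_model = AutoModel.from_pretrained(checkpoint_dir)

print(type(auto_model).__name__)
```

The same `AutoModel.from_pretrained(...)` call accepts a Hub identifier such as `"bert-base-cased"` in place of the local folder.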

In the code sample above we didn’t use `BertConfig`, and instead loaded a pretrained model via the `bert-base-cased` identifier.

This is a model checkpoint that was trained by the authors of BERT themselves.

The weights have been downloaded and cached (so future calls to the `from_pretrained()` method won’t re-download them) in the cache folder, which defaults to:
 - `~/.cache/huggingface/transformers` (in older releases of 🤗 Transformers; recent releases use `~/.cache/huggingface/hub`).

You can customize your cache folder by setting the `HF_HOME` environment variable.
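As a minimal sketch, the variable can be set from Python before the library is imported. The folder name below is a hypothetical location chosen for illustration; setting the variable in your shell profile works equally well.

```python
import os
import tempfile

# Hypothetical cache location, chosen only for illustration
custom_cache = os.path.join(tempfile.gettempdir(), "hf_home_demo")

# Set the variable before importing transformers so the library picks it up
os.environ["HF_HOME"] = custom_cache

print(os.environ["HF_HOME"])
```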

## Saving methods

We use the `save_pretrained()` method

In [None]:
model.save_pretrained("directory_on_my_computer")

This saves two files to your disk:
- `config.json`: the attributes necessary to build the model architecture. It also contains metadata, such as where the checkpoint originated and which 🤗 Transformers version you were using when you last saved the checkpoint.
- `pytorch_model.bin`: the state dictionary, which contains all your model’s weights. (Recent releases of 🤗 Transformers write the weights to `model.safetensors` by default instead.)

The two files go hand in hand: the configuration is necessary to know your model’s architecture, while the model weights are your model’s parameters.
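To see the saved files for yourself, the following sketch saves a tiny, randomly initialized BERT to a temporary folder, lists the folder, and reads back `config.json`. The tiny sizes and variable names are assumptions made so the example runs quickly and offline; the name of the weights file depends on the installed library version.

```python
import json
import os
import tempfile

from transformers import BertConfig, BertModel

# Tiny random BERT (arbitrary sizes), used only to keep the example fast
tiny_model = BertModel(
    BertConfig(
        hidden_size=32,
        num_hidden_layers=1,
        num_attention_heads=2,
        intermediate_size=64,
    )
)

with tempfile.TemporaryDirectory() as save_dir:
    tiny_model.save_pretrained(save_dir)
    # The configuration file plus one weights file
    saved_files = sorted(os.listdir(save_dir))
    with open(os.path.join(save_dir, "config.json")) as f:
        saved_config = json.load(f)

print(saved_files)
print(saved_config["model_type"], saved_config.get("transformers_version"))
```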

## Using a Transformer model for inference

We already know how to load a model (`from_pretrained()`) and how to save a model (`save_pretrained()`). 

We can now use it for making predictions. 

Transformer models can **only** process numbers, which the **tokenizer** generates from the input text.

In [0]:
sequences = ["Hello!", "Cool.", "Nice!"]

In [0]:
encoded_sequences = [
    [101, 7592, 999, 102],
    [101, 4658, 1012, 102],
    [101, 3835, 999, 102],
]

The encoded sequences form a list of lists of token IDs, which we can convert into a **tensor**:

In [None]:
import torch

model_inputs = torch.tensor(encoded_sequences)
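The conversion works here because every encoded sequence has the same length, so the nested list is rectangular. The sketch below checks the resulting shape and dtype, and shows that a ragged list (a hypothetical example with rows of different lengths) is rejected.

```python
import torch

encoded_sequences = [
    [101, 7592, 999, 102],
    [101, 4658, 1012, 102],
    [101, 3835, 999, 102],
]

model_inputs = torch.tensor(encoded_sequences)
print(model_inputs.shape)  # (batch size, sequence length)
print(model_inputs.dtype)  # Python ints become 64-bit integers

# Hypothetical ragged input: rows of different lengths are not rectangular
ragged_sequences = [[101, 7592, 999, 102], [101, 102]]
try:
    torch.tensor(ragged_sequences)
    ragged_accepted = True
except (ValueError, TypeError):
    ragged_accepted = False

print(ragged_accepted)
```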

### Using the tensor as input to the model

In [None]:
output = model(model_inputs)
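The returned object exposes the model's hidden states. The sketch below inspects their shapes. It assumes a tiny, randomly initialized BERT (arbitrary sizes, so the example runs offline); the values it produces are meaningless and only the shapes are of interest. With the pretrained `bert-base-cased` model, the last dimension would be 768 instead of 32.

```python
import torch
from transformers import BertConfig, BertModel

# Tiny random BERT, used only to illustrate output shapes (arbitrary sizes)
tiny_model = BertModel(
    BertConfig(
        hidden_size=32,
        num_hidden_layers=1,
        num_attention_heads=2,
        intermediate_size=64,
    )
)
tiny_model.eval()  # disable dropout for inference

model_inputs = torch.tensor(
    [
        [101, 7592, 999, 102],
        [101, 4658, 1012, 102],
        [101, 3835, 999, 102],
    ]
)

# No gradients are needed for inference
with torch.no_grad():
    output = tiny_model(model_inputs)

# One vector per token: (batch size, sequence length, hidden size)
print(output.last_hidden_state.shape)
# One pooled vector per sequence: (batch size, hidden size)
print(output.pooler_output.shape)
```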