# Models (PyTorch)

This notebook accompanies chapter 2, section 3 of the Hugging Face course: [Models](https://huggingface.co/course/chapter2/3?fw=pt).

The original code for this notebook comes from the Hugging Face notebooks repository, linked here through SageMaker Studio Lab: [section3_pt.ipynb](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section3_pt.ipynb).

## Run conditions

This notebook has been tested in the following environment:
- Environment: Project created in [Paperspace Gradient](https://gradient.paperspace.com) with Python 3.9.13.
- Machine: P5000 (30 GiB RAM, 8 CPUs, 16 GiB GPU); see [Paperspace Machines](https://docs.paperspace.com/gradient/machines/) for more details.
- IDE: Visual Studio Code with remote Jupyter server.

## Install dependencies

In [2]:
# Install the libraries datasets v2.7.1, evaluate v0.3.0, and transformers v4.25.1 with quiet and upgrade flags.
%pip install -q datasets==2.7.1 evaluate==0.3.0 transformers==4.25.1 --upgrade

Note: you may need to restart the kernel to use updated packages.


## Creating a Transformer

In [3]:
# Import BERT configuration and model from Transformers.
from transformers import BertConfig, BertModel

# Build the configuration of the model.
config = BertConfig()
# Build the model.
model = BertModel(config)
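
The default configuration can also be adjusted before the model is built. The following is a minimal sketch, assuming a hypothetical smaller variant with 6 encoder layers; the value 6 is chosen only for illustration and does not come from the course.

```python
from transformers import BertConfig, BertModel

# Hypothetical variant: 6 encoder layers instead of the default 12.
small_config = BertConfig(num_hidden_layers=6)
# A model built from a configuration starts with randomly initialized weights.
small_model = BertModel(small_config)

# Both values should reflect the overridden setting.
print(small_config.num_hidden_layers)
print(len(small_model.encoder.layer))
```

Any attribute left out of the `BertConfig(...)` call keeps its default value.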

In [4]:
# Print the configuration.
print(config)

BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.25.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
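
The values in this configuration determine the size of the model. As a rough sketch, assuming the standard BERT layout (embeddings followed by a LayerNorm, identical encoder layers, and a pooler), the parameter count can be worked out by hand from the printed numbers. The helper `linear` is a name introduced here for illustration only.

```python
# Values copied from the printed BertConfig above.
vocab_size = 30522
hidden_size = 768
intermediate_size = 3072
max_position_embeddings = 512
type_vocab_size = 2
num_hidden_layers = 12


def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


layer_norm = 2 * hidden_size  # one scale and one shift vector

# Word, position, and token-type embeddings, followed by a LayerNorm.
embeddings = (
    vocab_size * hidden_size
    + max_position_embeddings * hidden_size
    + type_vocab_size * hidden_size
    + layer_norm
)

# One encoder layer: self-attention block plus feed-forward block.
encoder_layer = (
    3 * linear(hidden_size, hidden_size)        # query, key, value projections
    + linear(hidden_size, hidden_size)          # attention output projection
    + layer_norm
    + linear(hidden_size, intermediate_size)    # feed-forward expansion
    + linear(intermediate_size, hidden_size)    # feed-forward projection
    + layer_norm
)

pooler = linear(hidden_size, hidden_size)

total = embeddings + num_hidden_layers * encoder_layer + pooler
print(f"{total:,}")  # 109,482,240
```

This figure should agree with `sum(p.numel() for p in model.parameters())` for the model built above.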



## Different loading methods

In [5]:
# Creating a model from the default configuration initializes it with random weights, so it needs training before its outputs are useful.
from transformers import BertModel, BertConfig

# Build the configuration of the model.
config = BertConfig()
# Build the model.
model = BertModel(config)

In [6]:
# Load a Transformer model with pretrained weights.
from transformers import BertModel

# Build the model from "bert-base-cased".
model = BertModel.from_pretrained("bert-base-cased")

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
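
The warning is expected in this case: the checkpoint also stores the pretraining heads, while `BertModel` contains only the base encoder. A small sketch that groups the unused keys from the warning by their first two name components makes this visible. The grouping logic is written for illustration and is not part of the Transformers library.

```python
# Keys copied from the warning above.
unused_keys = [
    "cls.predictions.transform.dense.weight",
    "cls.predictions.decoder.weight",
    "cls.predictions.transform.LayerNorm.bias",
    "cls.predictions.transform.LayerNorm.weight",
    "cls.seq_relationship.bias",
    "cls.predictions.transform.dense.bias",
    "cls.predictions.bias",
    "cls.seq_relationship.weight",
]

# Group the keys by head name, for example "cls.predictions".
heads = {}
for key in unused_keys:
    head = ".".join(key.split(".")[:2])
    heads.setdefault(head, []).append(key)

for head, keys in heads.items():
    print(head, len(keys))
# cls.predictions 6
# cls.seq_relationship 2
```

`cls.predictions` is the masked language modeling head and `cls.seq_relationship` is the next sentence prediction head; neither is needed to produce hidden states.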


## Saving methods

In [7]:
# Save the model to the directory "/hugging_face_course/2_using_transformers/model/section_3" (an absolute path).
model.save_pretrained("/hugging_face_course/2_using_transformers/model/section_3")
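
A saved directory can be passed back to `from_pretrained` in place of a checkpoint name. The sketch below assumes a hypothetical tiny configuration and a temporary directory so that it runs quickly without any download; neither is part of the course code.

```python
import os
import tempfile

from transformers import BertConfig, BertModel

# Hypothetical tiny configuration, used only to keep the example fast.
tiny_config = BertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
tiny_model = BertModel(tiny_config)

with tempfile.TemporaryDirectory() as save_dir:
    # Write the configuration and the weights to the directory.
    tiny_model.save_pretrained(save_dir)
    saved_files = sorted(os.listdir(save_dir))
    # Reload the model from the same directory.
    reloaded = BertModel.from_pretrained(save_dir)

print(saved_files)
print(reloaded.config.hidden_size)
```

The directory should contain `config.json` together with a weights file whose name depends on the installed Transformers version.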

## Using a Transformer model for inference

In [None]:
# Create a list of three short sequences.
sequences = ["Hello!", "Cool.", "Nice!"]

In [None]:
# Create the encoded_sequences list: one list of token IDs per sequence.
encoded_sequences = [
    [101, 7592, 999, 102],
    [101, 4658, 1012, 102],
    [101, 3835, 999, 102],
]
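
A nested list converts cleanly to a tensor only when every row has the same length. A quick check in plain Python, shown here as an illustrative sketch, confirms that the three encoded sequences form a rectangular 3 x 4 grid.

```python
encoded_sequences = [
    [101, 7592, 999, 102],
    [101, 4658, 1012, 102],
    [101, 3835, 999, 102],
]

# Collect the distinct row lengths; a rectangular batch has exactly one.
lengths = {len(ids) for ids in encoded_sequences}
is_rectangular = len(lengths) == 1

print(is_rectangular, lengths)  # True {4}
```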

In [None]:
# Import the torch library.
import torch

# Create a tensor from the encoded_sequences list.
input_ids = torch.tensor(encoded_sequences)

In [None]:
# Use the tensor as input to the model.
outputs = model(input_ids)
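
The returned object exposes the hidden states as tensors. The sketch below, which assumes a hypothetical tiny randomly initialized model so that it runs without a download, shows how the output shapes relate to the batch size, the sequence length, and `hidden_size`.

```python
import torch
from transformers import BertConfig, BertModel

# Hypothetical tiny configuration; the pretrained model uses hidden_size=768.
tiny_config = BertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
tiny_model = BertModel(tiny_config)
tiny_model.eval()  # disable dropout for inference

input_ids = torch.tensor(
    [
        [101, 7592, 999, 102],
        [101, 4658, 1012, 102],
        [101, 3835, 999, 102],
    ]
)

# No gradients are needed for inference.
with torch.no_grad():
    outputs = tiny_model(input_ids)

# (batch size, sequence length, hidden size)
print(outputs.last_hidden_state.shape)  # torch.Size([3, 4, 32])
# (batch size, hidden size)
print(outputs.pooler_output.shape)  # torch.Size([3, 32])
```

With a `hidden_size` of 768, as in the configuration printed earlier, `last_hidden_state` would have shape `(3, 4, 768)` for the same inputs.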