<a href="https://colab.research.google.com/github/nidhiashok/huggingfacelearning/blob/main/Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%pip install transformers

Collecting transformers
  Downloading transformers-4.32.0-py3-none-any.whl (7.5 MB)
Collecting huggingface-hub<1.0,>=0.15.1 (from transformers)
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

# A deep dive into Transformers Models!

The AutoModel class is used to instantiate a model from a checkpoint, that is, to create a new instance of a model using the saved parameters and weights stored in a checkpoint file. A checkpoint file contains the trained parameters of a model at a specific point in training or after fine-tuning.

The PyTorch version.

In [None]:
# Creating a Transformer
# Initialising a model by loading a configuration object.

from transformers import BertConfig, BertModel

# Building the config
config = BertConfig()

# Building the model from the config
model = BertModel(config)

# Here the model uses the default configuration and is initialised with random values
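
Any attribute of the configuration can be overridden before the model is built. The sketch below is an illustration rather than part of the original notebook: the small sizes passed to `BertConfig` are hypothetical values chosen only to keep the model light, and the names `small_config` and `small_model` are made up for this example.

```python
from transformers import BertConfig, BertModel

# Hypothetical small values (assumption, not the defaults shown in this notebook).
small_config = BertConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
)

# As with the default config, the weights are randomly initialised.
small_model = BertModel(small_config)

# The architecture follows the config: two encoder layers instead of twelve.
num_layers = len(small_model.encoder.layer)
num_params = sum(p.numel() for p in small_model.parameters())
print(num_layers, num_params)
```

Shrinking the configuration like this is a convenient way to experiment with the API without waiting for a full-size model to build.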

In [None]:
#Examining the config attributes

print(config)

BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.31.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}



In [None]:
# A randomly initialised model must first be trained on the task at hand,
# so loading a pretrained checkpoint is usually the better starting point.

from transformers import BertModel

model = BertModel.from_pretrained("bert-base-cased")



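Using AutoModel instead of BertModel produces checkpoint-agnostic code: if the code works for one checkpoint, it should work with another checkpoint trained for a similar task, even if the architecture is different.
The model is now initialised with the weights of the checkpoint. The identifier bert-base-cased refers to a checkpoint trained by the authors of BERT.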
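
A minimal sketch of how AutoModel picks the concrete class is shown below. To avoid a download it uses `AutoModel.from_config` with a tiny, hypothetical configuration (an assumption made for this example) instead of a real checkpoint; the `from_pretrained` call that would be used with a checkpoint is left as a comment.

```python
from transformers import AutoModel, BertConfig, BertModel

# Tiny, hypothetical configuration so that nothing has to be downloaded.
config = BertConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
)

# AutoModel inspects the configuration type and builds the matching class.
auto_model = AutoModel.from_config(config)
print(type(auto_model).__name__)  # BertModel

# With a checkpoint, the equivalent call would be (downloads the weights):
# auto_model = AutoModel.from_pretrained("bert-base-cased")
```

Because the calling code never names BertModel, swapping in a different checkpoint identifier should not require any other change.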


In [None]:
# Saving methods

model.save_pretrained("directory_on_my_computer")

This saves two files to disk: a config.json file and a pytorch_model.bin file.
The config.json file holds the attributes needed to build the model architecture, plus metadata such as where the checkpoint originated and which Transformers version was used when the checkpoint was last saved.
The pytorch_model.bin file, known as the state dictionary, contains all the weights of the model.
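
The round trip can be checked with a short sketch. It assumes a tiny, randomly initialised model (hypothetical sizes, chosen to keep it fast) and a temporary directory rather than "directory_on_my_computer". Depending on the installed Transformers version, the weights file may be named pytorch_model.bin or model.safetensors, so the example simply lists whatever was written.

```python
import os
import tempfile

from transformers import BertConfig, BertModel

# Tiny, hypothetical configuration (assumption made for this example).
config = BertConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
)
model = BertModel(config)

with tempfile.TemporaryDirectory() as save_dir:
    # Write the configuration and the weights to the directory.
    model.save_pretrained(save_dir)
    saved_files = sorted(os.listdir(save_dir))

    # from_pretrained also accepts a local directory instead of a Hub identifier.
    reloaded = BertModel.from_pretrained(save_dir)

print(saved_files)
```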

Using a Transformer model for inference.

Transformer models can only process numbers, which the tokenizer generates.
Tokenizers also take care of casting the inputs to the appropriate framework's tensors.

Sequences -> Tokenizer -> input IDs (vocabulary indices, as a list of lists of encoded sequences) -> Tensor (only accepts rectangular shapes) -> model inputs
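
The rectangular-shape requirement means that every encoded sequence in a batch must have the same length. The plain-Python sketch below shows one way to get there by right-padding shorter lists. The helper `pad_to_rectangle` and the token IDs are made up for illustration, and the padding value 0 is taken from the `pad_token_id` shown in the config above.

```python
def pad_to_rectangle(batch, pad_id=0):
    """Right-pad every list in `batch` to the length of the longest one."""
    max_len = max(len(ids) for ids in batch)
    return [ids + [pad_id] * (max_len - len(ids)) for ids in batch]


# Made-up IDs of unequal length, for illustration only.
ragged_ids = [[101, 7, 8, 9, 102], [101, 7, 102]]

rectangular_ids = pad_to_rectangle(ragged_ids)
print(rectangular_ids)
# [[101, 7, 8, 9, 102], [101, 7, 102, 0, 0]]
```

In practice the tokenizer can do this padding itself and also returns an attention mask so the model ignores the padded positions; the helper here only illustrates the shape constraint.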

In [None]:
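sequences = ["Hello!", "Cool.", "Nice!"]

# Placeholder: each inner list should hold the token IDs that the tokenizer
# produces for the matching sequence. The lists are left empty here.
encoded_sequences = [[], [], []]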

In [None]:
import torch

model_inputs = torch.tensor(encoded_sequences)

In [None]:
# Using the tensors as inputs to the model

output = model(model_inputs)
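
To see what the model returns, here is a self-contained sketch of the same steps. It makes two assumptions that are not in the original notebook: the model is a tiny, randomly initialised BERT (so nothing is downloaded and the output values are meaningless), and the token IDs are illustrative numbers below `vocab_size`, not real tokenizer output.

```python
import torch
from transformers import BertConfig, BertModel

# Tiny, hypothetical configuration (assumption made for this example).
config = BertConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
)
model = BertModel(config)
model.eval()  # switch off dropout for inference

# Illustrative IDs only; a real run would use the tokenizer's output.
encoded_sequences = [
    [101, 2000, 2001, 102],
    [101, 3000, 3001, 102],
    [101, 4000, 4001, 102],
]
model_inputs = torch.tensor(encoded_sequences)

# No gradients are needed when only running a forward pass.
with torch.no_grad():
    output = model(model_inputs)

# Shape is (batch size, sequence length, hidden size).
print(output.last_hidden_state.shape)  # torch.Size([3, 4, 64])
```

The output holds one hidden vector per input token, so its first two dimensions mirror the rectangular input tensor.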

# Working with TensorFlow

The TFAutoModel class is handy when you want to instantiate any model from a checkpoint.

In [2]:
from transformers import BertConfig, TFBertModel

#Building the config
config = BertConfig()

#Building the model from the config
model = TFBertModel(config)

In [3]:
print(config)

BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.32.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}



In [4]:
# Initialise the model from the default configuration

from transformers import BertConfig, TFBertModel

config = BertConfig()
model = TFBertModel(config)

# The model is initialised with random values

In [5]:
# Initialising from a pretrained checkpoint

from transformers import TFBertModel

model = TFBertModel.from_pretrained("bert-base-cased")


Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBertModel: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias']
- This IS expected if you are initializing TFBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions w

In [None]:
#saving methods

model.save_pretrained("directory_on_my_computer")

In [None]:
# Using a Transformer model for inference

import tensorflow as tf

model_inputs = tf.constant(encoded_sequences)

In [None]:
# Using the tensors as inputs to the model

output = model(model_inputs)