<a href="https://colab.research.google.com/github/plaban1981/Huggingface_transformers_course/blob/main/Huggingface_transformers_pipeline_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##  Creating and using a model

* The AutoModel class is used to instantiate any model from a checkpoint.

* The **AutoModel** class and all of its relatives are actually simple wrappers over the wide variety of models available in the library.
* The AutoModel wrapper can automatically guess the appropriate model architecture for your checkpoint, and then instantiates a model with this architecture.
* If you already know the type of model you want to use, you can use the class that defines its architecture directly.
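
As a minimal sketch of the two routes described above, the cell below builds a model once through AutoModel and once through the architecture class directly. To avoid downloading a checkpoint it uses `AutoModel.from_config` with a deliberately tiny, hypothetical configuration (the sizes are illustrative assumptions, not those of any published model); with a real checkpoint you would call `AutoModel.from_pretrained(...)` instead.

```python
from transformers import AutoModel, BertConfig, BertModel

# Hypothetical tiny configuration so the example builds quickly;
# these sizes are illustrative and do not match any published checkpoint.
tiny_config = BertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)

# AutoModel maps the configuration class to the matching model class.
auto_model = AutoModel.from_config(tiny_config)

# Using the architecture class directly yields the same kind of object.
direct_model = BertModel(tiny_config)

print(type(auto_model).__name__)
print(type(direct_model).__name__)
```

Both objects are randomly initialized here, since no checkpoint weights are loaded.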

In [1]:
!pip install transformers


## Creating a Transformer

#### Creating a model from the default configuration initializes it with random values:

In [2]:
from transformers import BertConfig, BertModel

# Build the configuration
config = BertConfig()

# Build the model from the configuration; its weights are randomly initialized
model = BertModel(config)

In [3]:
print(config)

BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.8.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}



* The model can be used in this state, but it will output gibberish; it needs to be trained first.

* Training the model from scratch would require a long time and a lot of data, and it would have a non-negligible environmental impact.

* To avoid unnecessary and duplicated effort, it’s imperative to be able to share and reuse models that have already been trained.
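
To make the random initialization concrete, here is a small sketch that builds two models from the same configuration and compares one of their weight matrices. The tiny configuration values are assumptions chosen only to keep the example fast; the same comparison should hold for the default `BertConfig()`.

```python
import torch
from transformers import BertConfig, BertModel

# Hypothetical tiny configuration (illustrative sizes only).
tiny_config = BertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)

# Two models built from the same configuration share an architecture...
model_a = BertModel(tiny_config)
model_b = BertModel(tiny_config)

weights_a = model_a.embeddings.word_embeddings.weight
weights_b = model_b.embeddings.word_embeddings.weight

# ...but each one draws its own random initial weights.
same_shape = weights_a.shape == weights_b.shape
same_values = torch.equal(weights_a, weights_b)
print(same_shape, same_values)
```

The shapes match while the values differ, which is why an untrained model produces meaningless output.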

## Loading a Transformer model that is already trained

In [4]:
from transformers import BertModel
model = BertModel.from_pretrained("bert-base-cased")


Some weights of the model checkpoint at bert-base-cased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [6]:
model.config

BertConfig {
  "_name_or_path": "bert-base-cased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.8.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 28996
}

* Here we didn’t use BertConfig; instead, we loaded a **pretrained model** via the bert-base-cased identifier.
* This is a model checkpoint that was trained by the authors of BERT themselves.
* This model is now initialized with all the weights of the checkpoint.
* It can be used directly for inference on the tasks it was trained on, and it can also be fine-tuned on a new task.
* By training with pretrained weights rather than from scratch, we can quickly achieve good results.

## Saving methods

In [7]:
model.save_pretrained("/content")

#### This saves two files to your disk:

* /content/pytorch_model.bin
* /content/config.json

* The config.json file contains the following information:
    * attributes necessary to build the model architecture
    * metadata, such as where the checkpoint originated
    * the 🤗 Transformers version used

* The pytorch_model.bin file is known as the state dictionary.
    * It contains all your model’s weights.
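
The save and reload round trip can be sketched as follows. This is an illustration under assumptions: it uses a tiny, randomly initialized model and a temporary directory rather than /content, and depending on the installed 🤗 Transformers version the weights file may be named pytorch_model.bin or model.safetensors.

```python
import os
import tempfile

import torch
from transformers import BertConfig, BertModel

# Hypothetical tiny configuration (illustrative sizes only).
tiny_config = BertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(tiny_config)

with tempfile.TemporaryDirectory() as save_dir:
    # Write the configuration and the weights to disk.
    model.save_pretrained(save_dir)
    saved_files = sorted(os.listdir(save_dir))

    # Rebuild the model from the saved directory.
    reloaded = BertModel.from_pretrained(save_dir)

    # Compare every tensor in the two state dictionaries.
    original_state = model.state_dict()
    reloaded_state = reloaded.state_dict()
    weights_match = all(
        torch.equal(original_state[name], reloaded_state[name])
        for name in original_state
    )

print(saved_files)
print(weights_match)
```

Passing a local directory to `from_pretrained` reads the same two kinds of files that `save_pretrained` wrote, so the reloaded weights are identical to the originals.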

## Read config.json file

In [11]:
import json

# Open the saved configuration file and print each key/value pair
with open('/content/config.json') as f:
    data = json.load(f)

for k, v in data.items():
    print(f"{k} : {v}")

_name_or_path : bert-base-cased
architectures : ['BertModel']
attention_probs_dropout_prob : 0.1
gradient_checkpointing : False
hidden_act : gelu
hidden_dropout_prob : 0.1
hidden_size : 768
initializer_range : 0.02
intermediate_size : 3072
layer_norm_eps : 1e-12
max_position_embeddings : 512
model_type : bert
num_attention_heads : 12
num_hidden_layers : 12
pad_token_id : 0
position_embedding_type : absolute
transformers_version : 4.8.2
type_vocab_size : 2
use_cache : True
vocab_size : 28996


* The configuration file is necessary to know your model’s architecture.

* The model weights are your model’s parameters.

## Using a Transformer model for inference
* Transformer models can only process numbers: the numbers that the tokenizer generates.
* Tokenizers can take care of casting the inputs to the appropriate framework’s tensors.

In [12]:
sequences = [
  "Hello!",
  "Cool.",
  "Nice!"
]

In [13]:
sequences

['Hello!', 'Cool.', 'Nice!']

* The tokenizer converts these to vocabulary indices which are typically called input IDs. 
* Each sequence is now a list of numbers.

The resulting output is:

In [14]:
encoded_sequences = [
  [ 101, 7592,  999,  102],
  [ 101, 4658, 1012,  102],
  [ 101, 3835,  999,  102]
]

* This is a list of encoded sequences: a list of lists. 

Transformer models accept tensors as input, and tensors must be rectangular: every sequence in the batch needs the same length.

In [15]:
import torch

model_inputs = torch.tensor(encoded_sequences)

In [17]:
model_inputs.shape

torch.Size([3, 4])
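
The three sequences above happen to have the same length, so they convert to a tensor directly. When lengths differ, the batch has to be padded first. The helper below is a hypothetical plain-Python sketch of that step, not the tokenizer's own implementation; it assumes a padding ID of 0, which matches the pad_token_id shown in the configuration, and it also builds the attention mask that models usually take alongside padded inputs.

```python
def pad_sequences(sequences, pad_token_id=0):
    """Right-pad every sequence to the length of the longest one."""
    max_length = max(len(seq) for seq in sequences)
    padded, attention_mask = [], []
    for seq in sequences:
        n_pad = max_length - len(seq)
        padded.append(list(seq) + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return padded, attention_mask


ragged = [
    [101, 7592, 999, 102],
    [101, 4658, 102],  # hypothetical shorter sequence
]

padded_ids, attention_mask = pad_sequences(ragged)
print(padded_ids)      # [[101, 7592, 999, 102], [101, 4658, 102, 0]]
print(attention_mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

Once every row has the same length, `torch.tensor(padded_ids)` produces a rectangular tensor. In practice the tokenizer can do this padding for you.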

## Using the tensors as inputs to the model

In [18]:
output = model(model_inputs)

In [19]:
output

BaseModelOutputWithPoolingAndCrossAttentions([('last_hidden_state',
                                               tensor([[[ 4.4496e-01,  4.8276e-01,  2.7797e-01,  ..., -5.4033e-02,
                                                          3.9394e-01, -9.4770e-02],
                                                        [ 2.4943e-01, -4.4093e-01,  8.1772e-01,  ..., -3.1917e-01,
                                                          2.2992e-01, -4.1172e-02],
                                                        [ 1.3668e-01,  2.2518e-01,  1.4502e-01,  ..., -4.6915e-02,
                                                          2.8224e-01,  7.5566e-02],
                                                        [ 1.1789e+00,  1.6738e-01, -1.8187e-01,  ...,  2.4671e-01,
                                                          1.0441e+00, -6.1964e-03]],
                                               
                                                       [[ 3.6436e-01,  3.2464e-02,  2.0

* While the model accepts many different arguments, only the input IDs are necessary.
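
To see how the output is organized without scrolling through raw tensors, the sketch below passes the same input IDs to a model and prints only the shapes. It assumes a tiny, randomly initialized configuration (illustrative sizes, no download), so the values are meaningless, but the layout is the same: one hidden vector per token in last_hidden_state, and one pooled vector per sequence in pooler_output.

```python
import torch
from transformers import BertConfig, BertModel

# Hypothetical tiny configuration (illustrative sizes only).
tiny_config = BertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(tiny_config)
model.eval()  # disable dropout for inference

encoded_sequences = [
    [101, 7592, 999, 102],
    [101, 4658, 1012, 102],
    [101, 3835, 999, 102],
]
model_inputs = torch.tensor(encoded_sequences)

# Only the input IDs are passed; no gradients are needed for inference.
with torch.no_grad():
    output = model(model_inputs)

# (batch size, sequence length, hidden size)
print(output.last_hidden_state.shape)  # torch.Size([3, 4, 32])
# (batch size, hidden size)
print(output.pooler_output.shape)      # torch.Size([3, 32])
```

With the bert-base-cased checkpoint loaded earlier, the last dimension would be 768, the hidden_size from its configuration.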