# Models

## AutoModel Class

https://huggingface.co/docs/transformers/v4.44.0/en/model_doc/auto#auto-classes

The AutoModel class and all of its relatives are simple wrappers over the wide variety of models available in the Hugging Face Transformers library
- Wrappers (explanation from ChatGPT): a wrapper is a design pattern in which an object or function is wrapped inside another object or function to extend or modify its behavior without changing its core structure. Wrappers are commonly used to enhance functionality

It can automatically guess the appropriate model architecture for your checkpoint and instantiate the model with this architecture
- The architecture is selected from the checkpoint's configuration (its model_type), falling back to pattern matching on the name of the pretrained model supplied to the ```from_pretrained()``` method. Loading through AutoModel therefore returns an instance of the relevant architecture's class (for example, a BertModel for a BERT checkpoint)

However, if you know the type of model you want to use, you can use the class that defines its architecture directly
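
As a rough illustration of the two routes described above, the sketch below builds the same architecture once through AutoModel and once through the architecture class itself. To avoid downloading anything it uses `AutoModel.from_config()` with a deliberately tiny `BertConfig`; the sizes are arbitrary values chosen for this example, not those of any published checkpoint.

```python
from transformers import AutoModel, BertConfig, BertModel

# Deliberately tiny, arbitrary sizes so the example builds quickly;
# these are NOT the values of any published checkpoint.
tiny_config = BertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)

# Route 1: let AutoModel pick the class from the config's model_type ("bert").
auto_model = AutoModel.from_config(tiny_config)

# Route 2: name the architecture class directly.
bert_model = BertModel(tiny_config)

print(type(auto_model).__name__)  # BertModel
print(type(bert_model).__name__)  # BertModel
```

Both routes end up with the same class; AutoModel only saves you from hard-coding which one.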


### Exploring model configuration

In [1]:
"""
Let's say we know we want to use the BERT model

https://huggingface.co/docs/transformers/model_doc/bert

Creating a model from the default configuration initializes it with random values

The model can be used in this state, but it will output gibberish; it needs to be trained first

We could train the model from scratch on the task at hand but this would require a long time and a lot of data

To avoid unnecessary and duplicated effort, it’s imperative to be able to share and reuse models that have already been trained
"""

from transformers import BertConfig, BertModel

# Building the config
# BertConfig is the configuration class that stores the configuration of a BertModel
# and is used to instantiate a BERT model
# Calling it with no arguments creates the default configuration
config = BertConfig()

# Building the model from the config
# Initialise a BERT model with the configuration object
# Model is randomly initialised
model = BertModel(config)

In [2]:
"""
The configuration contains many attributes used to build the model
Examples:
- hidden_size attribute defines the size of the hidden_states vector
- num_hidden_layers defines the number of layers the Transformer model has
"""
print(config)

BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.44.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
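
The attributes printed above can also be overridden when the config is created. The following is a minimal sketch of that idea; the specific values are arbitrary and chosen only to keep the model small, and the model is still randomly initialised.

```python
from transformers import BertConfig, BertModel

# Override a few defaults; the values are arbitrary and only for illustration.
small_config = BertConfig(
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,  # hidden_size must be divisible by this
    intermediate_size=512,
)
small_model = BertModel(small_config)  # randomly initialised

# The config values show up in the built model.
num_layers = len(small_model.encoder.layer)
embedding_shape = tuple(small_model.embeddings.word_embeddings.weight.shape)
print(num_layers)       # 2
print(embedding_shape)  # (30522, 128), i.e. (vocab_size, hidden_size)

# Total number of parameters in the model
num_params = sum(p.numel() for p in small_model.parameters())
print(f"{num_params:,} parameters")
```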



### Exploring pre-trained models

In [None]:
"""
Loading a Transformer model that is already trained is simple — we can do this using the from_pretrained() method

We could replace BertModel with the equivalent AutoModel class, which is more flexible
because it works with any checkpoint: if you later switch to a different model for a similar task, the loading code does not change

In the code below, we load a pretrained model via the bert-base-cased identifier
This is a model checkpoint (a saved instance of the model: weights and configuration) that was trained by the authors of BERT themselves
Details: https://huggingface.co/google-bert/bert-base-cased
- Note: the weights are downloaded and cached (so future calls to the from_pretrained() method
won’t re-download them) in the cache folder, which defaults to ~/.cache/huggingface/hub in recent versions
of the library (older versions used ~/.cache/huggingface/transformers).
You can customize the cache location by setting the HF_HOME environment variable.

This model is now initialized with all the weights of the checkpoint. 
It can be used directly for inference on the tasks it was trained on, and 
it can also be fine-tuned on a new task. By training with pretrained weights rather than from scratch, we can quickly achieve good results.

Identifiers
- the identifier used to load the model can be the identifier of any model on the Model Hub
- Since we are using the BERT model, we can use any identifier compatible with the BERT architecture
- List of available BERT checkpoints: https://huggingface.co/models?other=bert
"""

from transformers import BertModel

model = BertModel.from_pretrained("bert-base-cased")

## Saving Models

Saving a model: ```model.save_pretrained("directory_on_my_computer")```

This saves 2 files to our disk:
1. ```config.json```: Contains the attributes necessary to build the model architecture, plus metadata such as where the checkpoint originated and which Hugging Face Transformers version was in use when the checkpoint was last saved.

2. ```pytorch_model.bin```: Known as the state dictionary, contains all the model's weights (its parameters). Recent versions of the library write the weights to ```model.safetensors``` by default instead; the role of the file is the same.
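
A small sketch of the save and reload round trip follows. It assumes a tiny randomly initialised model (arbitrary sizes) standing in for a trained one, and a temporary directory in place of `directory_on_my_computer`. The name of the weights file depends on the library version, so the code just lists whatever was written.

```python
import os
import tempfile

import torch
from transformers import BertConfig, BertModel

# Tiny random model with arbitrary sizes, standing in for a trained one.
config = BertConfig(hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config)

with tempfile.TemporaryDirectory() as save_dir:
    model.save_pretrained(save_dir)
    saved_files = sorted(os.listdir(save_dir))
    print(saved_files)  # config.json plus a weights file

    # from_pretrained() also accepts a local directory
    reloaded = BertModel.from_pretrained(save_dir)

    # Check that the reloaded weights match the ones that were saved.
    original_state = model.state_dict()
    weights_match = all(
        torch.equal(tensor, original_state[name])
        for name, tensor in reloaded.state_dict().items()
    )

print(weights_match)
```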





## Using a Transformer Model for Inference

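Transformer models can only process numbers: the input IDs that the tokenizer generates from text
- Tokenizers take care of casting the inputs to the appropriate framework’s tensors

We then feed the tensors as inputs to the model. Here model_inputs is a tensor of input IDs; a dictionary returned by a tokenizer would be unpacked with ```model(**model_inputs)``` instead
```output = model(model_inputs)```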

While the model accepts a lot of different arguments, only the input IDs are necessary
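
The call pattern can be sketched as below. To stay self-contained, this example assumes a tiny randomly initialised model (arbitrary sizes) and made-up token IDs rather than a pretrained checkpoint and a real tokenizer, so the output values are meaningless; only the shapes are of interest.

```python
import torch
from transformers import BertConfig, BertModel

# Tiny random model (arbitrary sizes): the outputs are meaningless numbers,
# only the call pattern and the shapes matter here.
config = BertConfig(hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config)
model.eval()  # switch off dropout for inference

# Made-up token IDs: a batch of 2 sequences with 4 tokens each.
# In practice these come from the tokenizer.
model_inputs = torch.tensor([[101, 7592, 999, 102],
                             [101, 4658, 1012, 102]])

with torch.no_grad():  # gradients are not needed for inference
    output = model(model_inputs)

# One vector of size hidden_size per token: (batch, sequence_length, hidden_size)
print(output.last_hidden_state.shape)  # torch.Size([2, 4, 32])
```

Passing only the input IDs works because the remaining arguments, such as the attention mask, fall back to defaults when they are not supplied.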
