# Loading models with Transformer Architectures

In [None]:
!pip install transformers

Under the hood, the transformers library relies on a machine learning framework for its computations, which can be either PyTorch or TensorFlow. In this practical session we use PyTorch. The command below imports PyTorch and tells it not to track gradients. Gradients are only needed when training (or finetuning) a model. In this practical we only perform inference, since training and finetuning are rather compute-intensive.

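Disabling gradient computations when we don't need them speeds up inference and reduces memory use, because PyTorch no longer stores the intermediate results it would need to compute gradients.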

In [None]:
import torch
torch.set_grad_enabled(False)
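To see what gradient tracking changes in practice, here is a minimal sketch that runs the same computation once with tracking enabled and once without. The tensor `x` and the toy computation are illustrative assumptions and are not part of the practical; `torch.no_grad()` is the scoped counterpart of the global switch used above.

```python
import torch

# Toy input tensor (illustrative assumption, not part of the practical).
x = torch.ones(3, requires_grad=True)

# With tracking explicitly enabled, the result is attached to the autograd graph.
with torch.enable_grad():
    tracked = (x * 2).sum()

# Inside no_grad, the same computation is not tracked.
with torch.no_grad():
    untracked = (x * 2).sum()

print(tracked.requires_grad)
print(untracked.requires_grad)
```

The context managers only affect the code inside the `with` block, which is convenient when a notebook mixes inference and training. Since this practical only performs inference, the global switch is simpler.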

### Llama-2

We will use transformers in a decoder setup: at each step, the model can only look at the tokens that precede the current position, and it uses this information to predict the next token in the sequence. This is the setup of the well-known GPT (Generative Pretrained Transformer) models. A recent instantiation is GPT-4o, but its parameters (a massive number of them) are not publicly available. Llama-2 follows the same decoder-only setup, and its parameters can be downloaded from the Hugging Face Hub once Meta's license has been accepted. We can import the necessary modules with the following command:

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
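The restriction that a decoder only looks at earlier tokens is usually enforced with a causal attention mask. The block below is a minimal sketch of that idea in plain PyTorch. The sequence length and the all-zero placeholder scores are assumptions chosen for illustration, and this is not Llama-2's actual attention code.

```python
import torch

seq_len = 4  # illustrative sequence length

# Lower-triangular mask: position i may attend to positions 0..i only.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Placeholder attention scores (all zeros), standing in for query-key products.
scores = torch.zeros(seq_len, seq_len)

# Future positions get -inf, so they receive zero weight after the softmax.
scores = scores.masked_fill(~causal_mask, float("-inf"))
weights = torch.softmax(scores, dim=-1)

print(weights)
```

Each row of `weights` sums to one, and every entry above the diagonal is zero: the first token attends only to itself, while the last token can attend to the whole sequence.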

And we can load the model's parameters and tokenizer using the following commands:

In [None]:
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", pad_token_id=tokenizer.eos_token_id)
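To get a feel for what a causal language model returns without downloading the full 7B checkpoint, the following sketch builds a tiny, randomly initialised model with the Llama architecture. All sizes in the `LlamaConfig` are arbitrary assumptions chosen to keep the model small. Because the weights are random, the predicted token is meaningless; the point is only the shape of the output.

```python
import torch
from transformers import AutoModelForCausalLM, LlamaConfig

# Tiny configuration (arbitrary sizes, assumed for illustration only).
config = LlamaConfig(
    vocab_size=100,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
)

# Randomly initialised model with the Llama architecture; nothing is downloaded.
tiny_model = AutoModelForCausalLM.from_config(config)
tiny_model.eval()

# A batch with one sequence of five random token ids.
input_ids = torch.randint(0, config.vocab_size, (1, 5))

with torch.no_grad():
    outputs = tiny_model(input_ids)

# One score per vocabulary entry, for every position in the sequence.
print(outputs.logits.shape)

# Greedy choice for the next token: the highest score at the last position.
next_token_id = outputs.logits[0, -1].argmax().item()
print(next_token_id)
```

The pretrained model loaded above returns logits of the same form, with the vocabulary size of the Llama-2 tokenizer in the last dimension.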