# Running OPT up to 30B using `accelerate`

This notebook shows how to use the `accelerate` dispatching utilities in Colab to load even very large checkpoints.

This should handle models of up to 11B parameters in Colab Free and 30B in Colab Pro.

In [1]:
! pip install transformers accelerate

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.28.1-py3-none-any.whl (7.0 MB)
Collecting accelerate
  Downloading accelerate-0.18.0-py3-none-any.whl (215 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.13.4-py3-none-any.whl (200 kB)

The next cell downloads the checkpoint. Several checkpoint sizes are available:

- [facebook/opt-125m](https://huggingface.co/facebook/opt-125m)
- [facebook/opt-350m](https://huggingface.co/facebook/opt-350m)
- [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b)
- [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b)
- [facebook/opt-6.7b](https://huggingface.co/facebook/opt-6.7b)
- [facebook/opt-13b](https://huggingface.co/facebook/opt-13b)
- [facebook/opt-30b](https://huggingface.co/facebook/opt-30b)
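To see why the larger checkpoints need offloading, here is a back-of-the-envelope sketch: weight memory is roughly the number of parameters times the bytes per parameter. It assumes the nominal parameter counts from the model names (the real counts differ slightly) and ignores activations, the generation cache and framework overhead.

```python
# Rough size of the weights alone for each OPT checkpoint, using the nominal
# parameter counts from the model names (real counts differ slightly).
# Activations, caches and framework overhead are NOT included.
NOMINAL_PARAMS = {
    "facebook/opt-125m": 125e6,
    "facebook/opt-350m": 350e6,
    "facebook/opt-1.3b": 1.3e9,
    "facebook/opt-2.7b": 2.7e9,
    "facebook/opt-6.7b": 6.7e9,
    "facebook/opt-13b": 13e9,
    "facebook/opt-30b": 30e9,
}

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_size_gb(n_params: float, dtype: str) -> float:
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, n in NOMINAL_PARAMS.items():
        print(f"{name:20s} fp32: {weight_size_gb(n, 'float32'):6.1f} GB"
              f"   fp16: {weight_size_gb(n, 'float16'):6.1f} GB")
```

By this estimate `facebook/opt-30b` needs about 60 GB in fp16 and 120 GB in fp32, far more than a typical single GPU holds, which is where CPU and disk offloading come in. Note that `from_pretrained` loads weights in `float32` unless a `torch_dtype` is given; passing `torch_dtype=torch.float16` roughly halves the footprint.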

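The weights are downloaded to the local cache, and we keep the checkpoint name in `checkpoint` so it can be reused afterwards (for example, to load the matching tokenizer).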
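A minimal sketch of where those files end up, assuming `huggingface_hub`'s default cache layout: a `hub` folder under `HF_HOME` (which defaults to `~/.cache/huggingface`), with one `models--<org>--<name>` folder per repository. The helper `hub_cache_dir` is hypothetical, and other cache settings (such as a `cache_dir` passed to `from_pretrained`) are not handled.

```python
import os
from pathlib import Path


def hub_cache_dir(repo_id: str) -> Path:
    """Hypothetical helper: expected cache folder for a model repository.

    Assumes the default layout <HF_HOME>/hub/models--<org>--<name>, where
    HF_HOME defaults to ~/.cache/huggingface. Other overrides are ignored.
    """
    default_home = Path.home() / ".cache" / "huggingface"
    hf_home = Path(os.environ.get("HF_HOME", default_home)).expanduser()
    # "facebook/opt-1.3b" -> "models--facebook--opt-1.3b"
    folder = "models--" + repo_id.replace("/", "--")
    return hf_home / "hub" / folder


if __name__ == "__main__":
    print(hub_cache_dir("facebook/opt-1.3b"))
```

Because the files stay in this cache, running the notebook again with the same `checkpoint` reuses them instead of downloading them a second time.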

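We then instantiate the model with `device_map="auto"`, so that `accelerate` builds the device map automatically from the current system's configuration: layers are placed on the GPU first, then in CPU RAM, and whatever still does not fit is offloaded to disk in `offload_folder`.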
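The idea behind the automatic map can be illustrated with a toy greedy allocator: walk through the layers in order and fill each device until it runs out of room, falling back to disk at the end. This is a simplified sketch, not `accelerate`'s actual algorithm (the real one, `infer_auto_device_map`, also accounts for tied weights, modules that must not be split, and memory reserved for activations); the function name, layer names and memory budgets are all hypothetical.

```python
from typing import Dict, List, Tuple


def greedy_device_map(
    layers: List[Tuple[str, float]],
    budgets_gb: List[Tuple[str, float]],
) -> Dict[str, str]:
    """Toy illustration of automatic placement (NOT accelerate's real algorithm).

    Layers are assigned in order to the current device while it has room;
    once a layer does not fit, filling moves on to the next device
    (e.g. GPU, then CPU). Layers that fit nowhere go to "disk",
    mirroring the offload_folder fallback.
    """
    remaining = dict(budgets_gb)
    order = [name for name, _ in budgets_gb]
    device_map: Dict[str, str] = {}
    current = 0  # index of the device currently being filled
    for layer_name, size in layers:
        # Skip to the next device once the current one cannot hold this layer.
        while current < len(order) and remaining[order[current]] < size:
            current += 1
        if current < len(order):
            device = order[current]
            remaining[device] -= size
        else:
            device = "disk"
        device_map[layer_name] = device
    return device_map


if __name__ == "__main__":
    # Hypothetical model: six equal 2 GB layers; 5 GB of GPU and 4 GB of CPU memory.
    layers = [(f"decoder.layers.{i}", 2.0) for i in range(6)]
    print(greedy_device_map(layers, [("cuda:0", 5.0), ("cpu", 4.0)]))
```

With these numbers two layers land on `cuda:0`, two on `cpu` and the last two on `disk`. For the real model loaded below, the map `accelerate` actually chose can be inspected with `model.hf_device_map`.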

In [2]:
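from transformers import AutoModelForCausalLM, AutoTokenizer

# Any of the checkpoints listed above can be used; larger ones need more memory or disk offload.
checkpoint = 'facebook/opt-1.3b'

# Folder for weights that fit neither on the GPU nor in CPU RAM (-p: no error if it already exists).
! mkdir -p offload_folder
# device_map="auto" lets accelerate spread the layers across GPU, CPU and disk automatically.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, device_map="auto", offload_folder='./offload_folder'
)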

Finally, we tokenize a prompt and generate text from it.

In [5]:
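tokenizer = AutoTokenizer.from_pretrained(checkpoint, use_fast=False)
# Prompt (a well-known Spanish riddle): "What color is Santiago's white horse?"
inputs = tokenizer("¿De qué color es el caballo blanco de Santiago?", return_tensors="pt")

# Send the prompt to GPU 0 (assumes a GPU runtime) and sample a continuation.
output = model.generate(inputs["input_ids"].to(0), min_length=30, max_length=30, do_sample=True)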
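In these `generate` calls, `min_length` and `max_length` count the prompt tokens as well as the generated ones (newer versions of `transformers` also accept `min_new_tokens` and `max_new_tokens`, which count only generated tokens). A quick arithmetic sketch of the resulting budget; the function name and the 14-token prompt length are hypothetical.

```python
from typing import Tuple


def new_token_range(prompt_len: int, min_length: int, max_length: int) -> Tuple[int, int]:
    """(fewest, most) newly generated tokens when both limits include the prompt.

    Generation can only stop before `most` if an end-of-sequence token is
    produced after `min_length` total tokens have been reached.
    """
    most = max(max_length - prompt_len, 0)
    fewest = min(max(min_length - prompt_len, 0), most)
    return fewest, most


if __name__ == "__main__":
    prompt_len = 14  # hypothetical token count for the prompt
    print(new_token_range(prompt_len, min_length=30, max_length=30))   # → (16, 16)
    print(new_token_range(prompt_len, min_length=30, max_length=100))  # → (16, 86)
```

So with `min_length=30, max_length=30` the cell above always produces exactly enough new tokens to reach 30 in total, while `max_length=100` leaves room for a longer continuation.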

In [17]:
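# Second prompt ("prueba" = "test"): "Riddle me, riddle me: what does the king have in his belly?"
prueba = tokenizer("Adivina, adivinanza, ¿qué tiene el rey en la panza?", return_tensors="pt")
# This call still generates from `inputs` (the first prompt), which is what produced the output below;
# pass prueba["input_ids"].to(0) instead to continue the riddle.
output = model.generate(inputs["input_ids"].to(0), min_length=30, max_length=100, do_sample=True)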

In [18]:
print(tokenizer.decode(output[0].tolist()))

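</s>¿De qué color es el caballo blanco de Santiago? Me da pena en ver que hay tan mágicos bárbaros en la humanidad.
Era el caballo blanco de la revolución francesa y luego se vino a la maldita República del Sur.
Santiago todo es el caballo blanco de la revolución francésia.</s>

English translation of the sample: "What color is Santiago's white horse? It saddens me to see that there are such magical barbarians in humanity. It was the white horse of the French revolution and then it came to the damned Republic of the South. Santiago everything is the white horse of the French revolution."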
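Because both calls pass `do_sample=True`, `generate` draws each next token at random from the model's probability distribution instead of always taking the most likely one, so re-running the cell gives a different text each time. A toy sketch of that single step, using made-up scores for a three-word vocabulary; the real `generate` works on the model's logits over the full vocabulary and supports further options such as `top_k` and `temperature`, which are omitted here.

```python
import math
import random
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Turn raw scores into probabilities (max subtracted for numerical stability)."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def next_token(logits: Dict[str, float], do_sample: bool, rng: random.Random) -> str:
    """Greedy decoding picks the highest-scoring token; sampling draws one at random."""
    if not do_sample:
        return max(logits, key=logits.get)
    probs = softmax(logits)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    # Made-up scores for a tiny three-word vocabulary.
    logits = {"blanco": 3.0, "negro": 1.0, "gris": 0.5}
    rng = random.Random(0)
    print(next_token(logits, do_sample=False, rng=rng))  # always "blanco"
    print([next_token(logits, do_sample=True, rng=rng) for _ in range(5)])
```

Greedy decoding is deterministic, whereas sampling usually picks the likeliest word but occasionally a less likely one, which is also why the OPT output above can wander off the question.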
