<a href="https://colab.research.google.com/github/maxim-popkov/study/blob/master/transformers/Recurser_GPT2_XL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Recurser-GPT2-XL reduces GPT2-XL's VRAM usage by 25%, so that GPT2-XL can run in Colab with PyTorch.
<br>⭐
**Remember to select GPU runtime.**
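
One quick way to confirm that the runtime actually has a GPU attached is to ask the NVIDIA driver. The sketch below is only an illustration: the helper name `list_gpus` is hypothetical, and it assumes the `nvidia-smi` tool is on the `PATH`, which is typical for GPU runtimes but not guaranteed.

```python
import shutil
import subprocess


def list_gpus():
    """Return the GPU description lines reported by `nvidia-smi -L`.

    Hypothetical helper: returns an empty list when nvidia-smi is missing
    or fails, which usually means no GPU runtime is selected.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No GPU found: select a GPU runtime first.")
```

An empty list is a hint to switch the runtime type before running the cells that follow.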

In [None]:
!pip install recursers

The model weights are taken from the original OpenAI GPT-2 checkpoints.

In [None]:
import torch
import tiktoken

from contextlib import nullcontext
from recursers import GPTConfig, Recurser


init_from = 'gpt2-xl' 
max_new_tokens = 200 # number of tokens generated in each sample
temperature = 0.9 # 1.0 = no change, < 1.0 = less random, > 1.0 = more random, in predictions
top_k = 40 # retain only the top_k most likely tokens, clamp others to have 0 probability
seed = 1337
device = 'cuda' # examples: 'cuda', 'cuda:0', 'cuda:1', etc. CPU is deprecated in this version.
dtype = 'float16' # 'float32' or 'bfloat16' or 'float16'

torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.backends.cuda.matmul.allow_tf32 = True # allow tf32 on matmul
torch.backends.cudnn.allow_tf32 = True # allow tf32 on cudnn
device_type = 'cuda' if 'cuda' in device else 'cpu' # for later use in torch.autocast
ptdtype = {'float32': torch.float32, 'bfloat16': torch.bfloat16, 'float16': torch.float16}[dtype]
ctx = nullcontext() if device_type == 'cpu' else torch.amp.autocast(device_type=device_type, dtype=ptdtype)


model = Recurser.from_pretrained(init_from, dict(dropout=0.0))
model.eval()
model.to(device)

enc = tiktoken.get_encoding("gpt2")
encode = lambda s: enc.encode(s, allowed_special={"<|endoftext|>"})
decode = lambda l: enc.decode(l)

loading weights from pretrained gpt: gpt2-xl



The program only takes up about 7.5 GB of VRAM.
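
For context on that figure, a back-of-the-envelope estimate of the weight footprint can help. The sketch below counts parameters for the standard GPT-2 XL architecture (48 layers, width 1600, vocabulary 50257, context length 1024, with the output head tied to the token embedding) and converts the count to gigabytes for a few dtypes. The function names are hypothetical, and the estimate covers weights only; activations, the CUDA context, and whatever Recurser does internally are not modeled here.

```python
def gpt2_param_count(n_layer, n_embd, vocab_size=50257, block_size=1024):
    """Count parameters of a standard GPT-2 style model (tied output head)."""
    embeddings = vocab_size * n_embd + block_size * n_embd  # token + position
    # Per block: attention (4*n^2 + 4*n), MLP (8*n^2 + 5*n), two LayerNorms (4*n)
    per_block = 12 * n_embd**2 + 13 * n_embd
    final_layernorm = 2 * n_embd
    return embeddings + n_layer * per_block + final_layernorm


# Bytes needed to store one parameter in each dtype
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_gb(n_params, dtype):
    """Size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


n_params = gpt2_param_count(n_layer=48, n_embd=1600)
print(f"parameters: {n_params:,}")
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gb(n_params, dtype):.2f} GB")
```

The count comes out at roughly 1.56 billion parameters, so the weights alone need about 6.2 GB in float32 and about 3.1 GB in a 16-bit dtype.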

In [None]:
start = "What is science?\n" # or "<|endoftext|>" or etc.

start_ids = encode(start)
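x = torch.tensor(start_ids, dtype=torch.long, device=device)[None, ...] # add a batch dimension

with torch.no_grad():
    with ctx:
        y = model.generate(x, max_new_tokens, temperature=temperature, top_k=top_k)
        # Assumes generate returns a (1, T) tensor of token ids, as in nanoGPT
        print(decode(y[0].tolist()))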


Science is a way of knowing, which includes all that happens when we live our lives. It has as its essence the understanding of nature, how the world works, the laws of nature and how they change.

What is science teaching?

Science is a way of knowing, which includes all that happens when we live our lives. It has as its essence the understanding of nature, how the world works, the laws of nature and how they change.

Are there people or groups working to promote science?

Yes and no. People are working to promote science and they do so through various institutions, including some educational institutions, universities, laboratories, companies, scientists – and a whole range of community initiatives. Some of these activities are in the public interest, but some are not. Some of these activities are also often in conflict with the interests of their funding bodies.

Science is, above all, an open and inclusive way of knowing. It is based upon the
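
The `temperature` and `top_k` settings passed to `generate` control how each next token is drawn. The following is a minimal pure-Python sketch of that sampling step on a toy list of logits. The function name `sample_top_k` is hypothetical, and this is not Recurser's actual implementation, only the standard technique those two parameters usually refer to.

```python
import math
import random


def sample_top_k(logits, temperature=1.0, top_k=None, rng=random):
    """Sample an index from logits after temperature scaling and top-k filtering."""
    # Temperature < 1.0 sharpens the distribution, > 1.0 flattens it
    scaled = [x / temperature for x in logits]
    if top_k is not None:
        k = min(top_k, len(scaled))
        cutoff = sorted(scaled, reverse=True)[k - 1]
        # Mask everything below the k-th largest logit so it gets probability 0
        scaled = [x if x >= cutoff else -math.inf for x in scaled]
    # Numerically stable softmax weights (masked entries become 0.0)
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(scaled)), weights=weights, k=1)[0]


rng = random.Random(1337)
toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
samples = [
    sample_top_k(toy_logits, temperature=0.9, top_k=2, rng=rng)
    for _ in range(1000)
]
print(sorted(set(samples)))  # only the two most likely indices can appear
```

With `top_k=2`, indices 2 through 4 are never sampled, however many draws are made; lowering `temperature` further shifts more of the remaining probability onto index 0.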

Thank you for your support. I want to make AI models more accessible to everyone.
<br>
Recurser repo: https://github.com/max-ng/recurser 💪



References:
<br>
Karpathy's elegant GPT implementation
https://github.com/karpathy/nanoGPT
<br>
Hugging Face's library
https://github.com/huggingface/transformers