#### Basic usage of the Bloom model for text prediction - zero-shot

##### Setup

In [1]:
# Bloom is part of the transformers library --> install it
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.22.1-py3-none-any.whl (4.9 MB)
Collecting huggingface-hub<1.0,>=0.9.0
  Downloading huggingface_hub-0.9.1-py3-none-any.whl (120 kB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.9.1 tokenizers-0.12.1 transformers-4.22.1


In [2]:
# import the needed libraries
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM  # Auto classes that load the tokenizer and a causal language model

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.



In [3]:
# test GPU availability - otherwise fall back to CPU

!nvidia-smi
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device


Sun Sep 25 06:35:21 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

device(type='cuda', index=0)

In [4]:
# if a GPU is available, create all new tensors on it; otherwise they default to the CPU

if 'cuda' in str(device):
  torch.set_default_tensor_type(torch.cuda.FloatTensor)  # new tensors are now allocated on CUDA



In [5]:
# define the tokenizer and model

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-3b")

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b")  # the 3-billion-parameter BLOOM checkpoint


In [6]:
# define a text prompt or a list of prompts
text_prompt = ["Albert Einstein won a Nobel prize for", "Otto Waarburg won the Nobel prize for"]  # anyone can change this part

In [7]:
# let's look at how bloom tokenizes text
tokens = tokenizer.tokenize(text_prompt)
tokens

['Albert',
 'ĠEinstein',
 'Ġwon',
 'Ġa',
 'ĠNobel',
 'Ġprize',
 'Ġfor',
 'Ot',
 'to',
 'ĠWa',
 'ar',
 'burg',
 'Ġwon',
 'Ġthe',
 'ĠNobel',
 'Ġprize',
 'Ġfor']
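
The `Ġ` prefix in the output above is how byte-level BPE tokenizers render a token that starts with a space; tokens without it (such as `to`, `ar`, `burg`) continue the previous word. Note that the tokens of both prompts come back as one flat list. A minimal sketch of reading such tokens back as text, assuming only the `Ġ`-to-space mapping (the helper name `tokens_to_text` is hypothetical, and a real tokenizer remaps other bytes as well):

```python
def tokens_to_text(tokens):
    """Join byte-level BPE tokens, turning the 'Ġ' marker back into a space."""
    return "".join(tokens).replace("Ġ", " ")


# Tokens copied from the first prompt in the output above.
tokens = ["Albert", "ĠEinstein", "Ġwon", "Ġa", "ĠNobel", "Ġprize", "Ġfor"]
print(tokens_to_text(tokens))
```

For real use, `tokenizer.convert_tokens_to_string(tokens)` is the library's own way to do this.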

In [8]:
# convert the prompts to input ids and return PyTorch tensors
input_ids = tokenizer(text_prompt, return_tensors="pt", padding=True)
input_ids

{'input_ids': tensor([[     3,      3,      3, 124190,  78426,  15974,    267,  41530, 127901,
            613],
        [125112,   1025,  50738,    273,  19616,  15974,    368,  41530, 127901,
            613]]), 'attention_mask': tensor([[0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
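
In the output above, the shorter prompt is padded on the left with id 3, and `attention_mask` marks those positions with 0 so the model ignores them. Left padding keeps the last real token at the end of each row, which is where generation continues from. The sketch below reproduces that layout in plain Python; the helper `pad_left` is hypothetical and only illustrates the idea, it is not how the tokenizer is implemented.

```python
def pad_left(batch, pad_id):
    """Left-pad lists of token ids to a common length and build attention masks."""
    width = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = width - len(ids)
        input_ids.append([pad_id] * n_pad + list(ids))
        attention_mask.append([0] * n_pad + [1] * len(ids))
    return input_ids, attention_mask


# Token ids copied from the output above; 3 is the pad id shown there.
einstein = [124190, 78426, 15974, 267, 41530, 127901, 613]
waarburg = [125112, 1025, 50738, 273, 19616, 15974, 368, 41530, 127901, 613]

ids, mask = pad_left([einstein, waarburg], pad_id=3)
print(ids[0])
print(mask[0])
```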

In [9]:
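# generate continuation token ids for each prompt
# note: temperature only takes effect with do_sample=True; without it, generate() decodes greedily by default
gen_text = model.generate(**input_ids, min_length=20, max_length=40, temperature=0.2)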
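
To see what a `temperature` of 0.2 would do when sampling is enabled, here is a small sketch of temperature-scaled softmax. The logits are made-up numbers and the function name `softmax_with_temperature` is hypothetical; it illustrates the general technique rather than the library's internals. Dividing the logits by a temperature below 1 concentrates probability on the highest-scoring token.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
p_default = softmax_with_temperature(logits, 1.0)
p_sharp = softmax_with_temperature(logits, 0.2)

print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_sharp])
```

The ranking of the tokens is the same at both temperatures, so greedy decoding picks the same token either way; temperature matters only once tokens are drawn at random from the distribution.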

In [10]:
# decode the generated ids back into text, one string per prompt

predictions = []
for generated_ids in gen_text:
  predicted_text = tokenizer.decode(generated_ids)
  predictions.append(predicted_text)

In [11]:
# clean up the text:
# 1. find the last "." and delete everything after it to end on a complete sentence
# 2. remove the newline character \n to make the text easier to review
# 3. remove <pad> tokens

for item, pred in enumerate(predictions):
  pred = pred[0:pred.rfind(".")+1]
  pred = pred.replace("\n", "")
  pred = pred.replace("<pad>", "")
  print(f"Paragraph {item}: {pred}\n")





Paragraph 0: Albert Einstein won a Nobel prize for his work on the theory of relativity. He was the first person to use the word “relativity” in his Nobel lecture.

Paragraph 1: Otto Waarburg won the Nobel prize for physics in 1901 for his work on the theory of the electron.
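
One edge case in the cleaning step: when a prediction contains no ".", `rfind` returns -1 and the slice `pred[0:0]` yields an empty string. A minimal sketch of a more defensive variant follows; the helper name `clean_prediction` is hypothetical, and unlike the cell above it replaces newlines with a space rather than deleting them, so adjacent words do not run together.

```python
def clean_prediction(text, pad_token="<pad>"):
    """Strip pad tokens and newlines, then cut after the last full stop if there is one."""
    text = text.replace(pad_token, "").replace("\n", " ")
    last_period = text.rfind(".")
    if last_period != -1:
        # Keep everything up to and including the last full stop.
        text = text[: last_period + 1]
    return text.strip()


print(clean_prediction("<pad><pad>Hello world. This is cut"))
print(clean_prediction("no full stop here"))
```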

