#### Testing the approach from https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/HuggingFace_int8_demo.ipynb#scrollTo=Aep1KMF6dqdm
Using the bitsandbytes (bnb) library with 8-bit integers, we can run BLOOM 7B on Colab Pro. The cells below load the smaller `bigscience/bloom-1b7` checkpoint.
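
To give a feel for what "8-bit integers" means here, below is a minimal sketch of absmax quantization in plain Python. It is a simplified illustration only, not the bitsandbytes implementation (which, among other things, handles outlier values separately). The function names `quantize_absmax` and `dequantize` are made up for this example.

```python
def quantize_absmax(values):
    """Map floats to integers in [-127, 127], scaling by the largest absolute value."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0 for _ in values], 1.0
    scale = 127.0 / max_abs
    return [round(v * scale) for v in values], scale


def dequantize(q_values, scale):
    """Recover approximate floats from the stored integers."""
    return [q / scale for q in q_values]


weights = [0.4, -1.0, 0.25, 0.0]  # toy values, not real model weights
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)
print(q)
print(restored)
```

Each weight is stored in one byte instead of two or four, at the cost of a small rounding error bounded by half a quantization step.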

## Important note: 8-bit loading is supported only on T4 and A100 GPUs. If you are assigned any other GPU, reset the environment until at least a T4 is allocated

#### Setup

In [1]:
# check that a T4 GPU is available; otherwise the 8-bit model cannot be loaded
import torch

gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)


Tue Sep 27 12:16:50 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
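
The `!nvidia-smi` syntax only works inside IPython/Colab. As a sketch, the same check can be done in plain Python and turned into a yes/no answer. The helper names `read_nvidia_smi` and `has_supported_gpu` are hypothetical, and the list of GPU names is simply taken from the note at the top of this notebook.

```python
import shutil
import subprocess

SUPPORTED_GPUS = ("T4", "A100")  # names taken from the note at the top of this notebook


def read_nvidia_smi():
    """Return the nvidia-smi output as text, or None if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


def has_supported_gpu(smi_text):
    """True if the nvidia-smi text mentions one of the GPU names listed above."""
    if not smi_text:
        return False
    return any(name in smi_text for name in SUPPORTED_GPUS)


smi_text = read_nvidia_smi()
if has_supported_gpu(smi_text):
    print("Supported GPU found")
else:
    print("No T4/A100 detected; reset the runtime and try again")
```

The substring match is deliberately crude; it is enough to decide whether to reset the runtime.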

In [2]:
# install the libraries needed for 8-bit loading with bnb (bitsandbytes)
!pip install --quiet bitsandbytes
!pip install --quiet git+https://github.com/huggingface/transformers.git # Install latest version of transformers
!pip install --quiet accelerate
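
Since the installs run with `--quiet`, it can be handy to confirm afterwards what actually got installed. This is a small sketch using only the standard library; `installed_versions` is a made-up helper name.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["bitsandbytes", "transformers", "accelerate"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```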


In [4]:
# if a GPU is available, create all new tensors on it; otherwise they default to the CPU

if torch.cuda.is_available():
  torch.set_default_tensor_type(torch.cuda.FloatTensor) # allocates all new tensors on CUDA (deprecated since PyTorch 2.1)

### Using BLOOM (bloom-1b7 checkpoint) for text prediction and Q&A

In [5]:
# define the tokenizer and model

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloom-1b7"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model_8bit = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", load_in_8bit=True) # here we load the model in 8bit




Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 111
CUDA SETUP: Loading binary /usr/local/lib/python3.7/dist-packages/bitsandbytes/libbitsandbytes_cuda111.so...


  f'{candidate_env_vars["LD_LIBRARY_PATH"]} did not contain '
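
A rough back-of-the-envelope estimate of why 8-bit loading helps. The sketch below assumes the 3.44 GB checkpoint downloaded above stores 2 bytes per parameter (the notebook does not state the dtype, so this is an assumption), and counts the weights only. `weight_memory_gb` is a made-up helper.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: the 3.44 GB checkpoint stores 2 bytes per parameter.
num_params = 3.44e9 / 2

fp16_gb = weight_memory_gb(num_params, 2)
int8_gb = weight_memory_gb(num_params, 1)
print(f"16-bit weights: ~{fp16_gb:.2f} GB, 8-bit weights: ~{int8_gb:.2f} GB")
```

Actual GPU usage is typically higher than the 8-bit figure, since not every layer is quantized and activations need memory too.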


In [6]:
# add prompt with questions

prompt = [
    "Who was Otto Warburg and what he is known for?",
    "What Prof. Thomas Seyfried theory about cancer as a metabolic disease says?",
    "What is the role of the m-Tor pathway in cancer progression?",
    "What are the best natural blood Glucose inhibitors?",
    "Is the theory of cancer as a metabolic disease correct or not?",
]

In [7]:
# tokenize the prompt 
tokenized_ids = tokenizer(prompt, return_tensors='pt', padding=True)
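
With `padding=True` the shorter prompts are padded to the length of the longest one in the batch. The `<pad>` tokens at the start of the decoded outputs further down suggest the padding goes on the left. Here is a plain-Python sketch of that idea; the token IDs and the pad ID are invented, and `left_pad` is not a tokenizer function.

```python
def left_pad(batch, pad_id):
    """Pad every sequence on the left to the longest length and build attention masks."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))  # 0 marks padding
    return input_ids, attention_mask


# Token IDs and the pad ID are invented for illustration.
batch = [[11, 12, 13, 14], [21, 22], [31, 32, 33]]
ids, mask = left_pad(batch, pad_id=0)
for row_ids, row_mask in zip(ids, mask):
    print(row_ids, row_mask)
```

Left padding keeps the real prompt tokens right next to the position where generation starts.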


In [8]:
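# generate predicted ids
# note: max_length counts the prompt tokens too; temperature only has an effect when
# sampling is enabled (do_sample=True), so with the default greedy decoding it is ignored
generated_ids = model_8bit.generate(**tokenized_ids, min_length=20, max_length=80, temperature=0.2, repetition_penalty=1.1)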
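
To make the `temperature` and `repetition_penalty` arguments less of a black box, here is a plain-Python sketch of what these two knobs are commonly described as doing to the next-token scores. It is an illustration with toy logits, not the transformers source, and both function names are made up. Temperature rescales the distribution that tokens are sampled from, so it matters only when sampling is used.

```python
import math


def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Make already-seen tokens less likely: shrink positive logits, push negative ones lower."""
    adjusted = list(logits)
    for tok in set(seen_token_ids):
        if adjusted[tok] > 0:
            adjusted[tok] = adjusted[tok] / penalty
        else:
            adjusted[tok] = adjusted[tok] * penalty
    return adjusted


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a temperature below 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, -1.0]  # toy scores for a 3-token vocabulary
penalized = apply_repetition_penalty(logits, seen_token_ids=[0, 2], penalty=1.1)
sharp = softmax_with_temperature(logits, 0.2)
flat = softmax_with_temperature(logits, 1.0)
print(penalized)
print(sharp)
print(flat)
```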

In [9]:
# decode the generated ids back to text

predictions = []
for ids in generated_ids:
  predicted_text = tokenizer.decode(ids)
  print(f"Q&A: {predicted_text}\n")
  predictions.append(predicted_text)

Q&A: <pad><pad><pad>Who was Otto Warburg and what he is known for? He has been a scientist, researcher, educator, philanthropist, and author. His work on cancer research led to the discovery of many new drugs that have helped millions of people live longer lives.
Otto Warburg (1883-1963) was born in Vienna, Austria as Otto Friedrich Wilhelm von Huberthal

Q&A: What Prof. Thomas Seyfried theory about cancer as a metabolic disease says?"
The answer is that the tumor cells are not metabolically active, and they do not produce any of their own energy.
In fact, it seems to me that this idea has been misunderstood for quite some time now.  The main reason why people think that tumors have no metabolism is because there was an epidemic of

Q&A: <pad>What is the role of the m-Tor pathway in cancer progression? The mTOR signaling cascade has been shown to be involved in a variety of cellular processes, including cell growth and proliferation. In addition, it plays an important role in tumorigen
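
The decoded text contains the padding tokens and the original question followed by the continuation. If only the answer is wanted, the echoed prompt can be stripped off. A minimal sketch, where `split_answer` is a hypothetical helper and the sample string is shortened from the output above:

```python
def split_answer(decoded, prompt, pad_token="<pad>"):
    """Remove leading pad tokens and the echoed prompt; return only the continuation."""
    text = decoded
    while text.startswith(pad_token):
        text = text[len(pad_token):]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Sample shortened from the third output above.
prompt_text = "What is the role of the m-Tor pathway in cancer progression?"
decoded_text = (
    "<pad>What is the role of the m-Tor pathway in cancer progression? "
    "The mTOR signaling cascade has been shown to be involved in a variety of cellular processes."
)
answer = split_answer(decoded_text, prompt_text)
print(answer)
```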

In [10]:
# clean up the text:
# 1. find the last "." and delete everything after it, so the text ends on a complete sentence
# 2. remove the newline character \n to make the text easier to review
# 3. remove <pad> tokens

for item, pred in enumerate(predictions):
  pred = pred[0:pred.rfind(".")+1] # truncate words beyond last period
  pred = pred.replace("\n", "") # remove newlines
  pred = pred.replace("<pad>", "") # remove padding tokens
  print(f"Paragraph {item}: {pred}\n")

Paragraph 0: Who was Otto Warburg and what he is known for? He has been a scientist, researcher, educator, philanthropist, and author. His work on cancer research led to the discovery of many new drugs that have helped millions of people live longer lives.

Paragraph 1: What Prof. Thomas Seyfried theory about cancer as a metabolic disease says?"The answer is that the tumor cells are not metabolically active, and they do not produce any of their own energy.In fact, it seems to me that this idea has been misunderstood for quite some time now.

Paragraph 2: What is the role of the m-Tor pathway in cancer progression? The mTOR signaling cascade has been shown to be involved in a variety of cellular processes, including cell growth and proliferation.

Paragraph 3: What are the best natural blood Glucose inhibitors? What is a good diet for diabetes?The most important thing to remember when choosing an oral glucose-lowering agent is that it should be safe and effective.

Paragraph 4: Is the t
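
Two things stand out in the cleaned output: removing `\n` without a replacement glues sentences together ("energy.In fact"), and if a prediction contains no period at all, `rfind` returns -1 and the slice leaves an empty string. A possible variant of the cleaning step that avoids both, as a sketch only (`clean_prediction` is a made-up name, and the sample text is invented):

```python
def clean_prediction(text, pad_token="<pad>"):
    """Strip pad tokens, flatten newlines to spaces, and cut after the last full stop."""
    text = text.replace(pad_token, "")
    text = " ".join(text.split())  # collapses newlines and repeated spaces
    last_period = text.rfind(".")
    if last_period == -1:
        return text  # no full stop found: keep the text rather than returning ""
    return text[: last_period + 1]


raw = "<pad>Energy matters.\nIn fact, it is central. The main reason is"
print(clean_prediction(raw))
```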