### Getting started with BLOOM

References: 
- https://towardsdatascience.com/getting-started-with-bloom-9e3295459b65
- https://towardsdatascience.com/run-bloom-the-largest-open-access-ai-model-on-your-desktop-computer-f48e1e2a9a32

In [1]:
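import os
import ssl

import torch
import transformers
from transformers import BloomForCausalLM, BloomTokenizerFast

# Workaround for environments where HTTPS certificate verification fails:
# fall back to an unverified SSL context unless PYTHONHTTPSVERIFY is set.
# This disables certificate checks, so use it only on a trusted network.
if not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None):
    ssl._create_default_https_context = ssl._create_unverified_context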



In [3]:
# For now, use the 3B-parameter model
cache_dir = '/data/chuang/temp'
model = BloomForCausalLM.from_pretrained("bigscience/bloom-3b", cache_dir=cache_dir)
tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-3b", cache_dir=cache_dir)

Downloading config.json: 100%|██████████| 688/688 [00:00<00:00, 254kB/s]
Downloading pytorch_model.bin: 100%|██████████| 5.59G/5.59G [07:31<00:00, 13.3MB/s]
Downloading tokenizer_config.json: 100%|██████████| 222/222 [00:00<00:00, 76.5kB/s]
Downloading tokenizer.json: 100%|██████████| 13.8M/13.8M [00:00<00:00, 34.4MB/s]
Downloading special_tokens_map.json: 100%|██████████| 85.0/85.0 [00:00<00:00, 33.3kB/s]


In [15]:
prompt = "Generate python code to loop through a list of 5 items and change their values:"
result_length = 500  # maximum total length in tokens (prompt + generated text)
inputs = tokenizer(prompt, return_tensors='pt')  # PyTorch tensors

In [16]:
# Greedy search: always pick the single most probable next token
output_ids = model.generate(inputs["input_ids"], max_length=result_length)
print(tokenizer.decode(output_ids[0]))

Generate python code to loop through a list of 5 items and change their values:
import random
import time

def change_item(item):
    print("Item: " + item)
    item = item.replace(" ", "")
    item = item.replace(",", "")
    item = item.replace(".", "")
    item = item.replace(" ", "")
    item = item.replace(",", "")
    item = item.replace(".", "")
    item = item.replace(" ", "")
    item = item.replace(",", "")
    item = item.replace(".", "")
    item = item.replace(" ", "")
    item = item.replace(",", "")
    item = item.replace(".", "")
    item = item.replace(" ", "")
    item = item.replace(",", "")
    item = item.replace(".", "")
    item = item.replace(" ", "")
    item = item.replace(",", "")
    item = item.replace(".", "")
    item = item.replace(" ", "")
    item = item.replace(",", "")
    item = item.replace(".", "")
    item = item.replace(" ", "")
    item = item.replace(",", "")
    item = item.replace(".", "")
    item = item.replace(" ", "")
    item = item.re
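
The repeated `replace` calls above are typical of greedy search: once the most probable continuation leads back to an earlier state, the same choices are made again. A minimal sketch of that behaviour, using a hand-written next-token table (`NEXT_TOKEN_PROBS` and `greedy_decode` are hypothetical stand-ins, not part of `transformers`):

```python
# Toy next-token table standing in for a language model (made-up values).
NEXT_TOKEN_PROBS = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"b": 0.7, "</s>": 0.3},
    "b": {"a": 0.8, "</s>": 0.2},
}


def greedy_decode(start, max_length):
    """Pick the single most probable next token at every step."""
    tokens = [start]
    while len(tokens) < max_length:
        probs = NEXT_TOKEN_PROBS.get(tokens[-1])
        if probs is None:  # token with no continuation, e.g. "</s>"
            break
        tokens.append(max(probs, key=probs.get))
    return tokens


print(greedy_decode("<s>", max_length=8))
# ['<s>', 'a', 'b', 'a', 'b', 'a', 'b', 'a']
```

In this toy table the end token `</s>` is never the most probable choice, so decoding cycles between `a` and `b` until `max_length` cuts it off, much like the truncated output above.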

In [13]:
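# Beam search: keep the 2 most probable sequences at each step, forbid any
# repeated 2-gram, and stop once num_beams finished candidates exist.
# Note: the outputs of this cell and of the sampling cell below begin with a
# different prompt than the one defined above. Their execution counts (13, 14)
# are lower than the prompt cell's (15), so they were presumably run with an
# earlier prompt.
output_ids = model.generate(inputs["input_ids"],
                            max_length=result_length,
                            num_beams=2,
                            no_repeat_ngram_size=2,
                            early_stopping=True)
print(tokenizer.decode(output_ids[0]))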

Generate python code to loop through a dataframe:
import pandas as pd

df = pd.DataFrame({'A':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,
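
The `no_repeat_ngram_size=2` setting can be pictured as a ban list computed before each step: any token that would complete a 2-gram already present in the sequence is excluded. The sketch below illustrates the idea on plain Python lists; `banned_next_tokens` is a hypothetical helper, not the `transformers` implementation.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would complete an n-gram already in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size - 1:
        return set()
    # The last (n - 1) tokens are the prefix of the n-gram about to be completed.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


# After [1, 2, 3, 1], emitting 2 would repeat the 2-gram (1, 2).
print(banned_next_tokens([1, 2, 3, 1], ngram_size=2))
# {2}
```

The constraint only blocks exact repeats of an n-gram, so a sequence that keeps producing new n-grams can still continue until `max_length` is reached.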


In [14]:
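# Sampling with top-k and top-p (nucleus) filtering: sample the next token
# from the 50 most probable tokens, further restricted to the smallest set
# whose cumulative probability reaches 0.9
output_ids = model.generate(inputs["input_ids"],
                            max_length=result_length,
                            do_sample=True,
                            top_k=50,
                            top_p=0.9)
print(tokenizer.decode(output_ids[0]))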

Generate python code to loop through a dataframe:
import pandas as pd

df = pd.DataFrame({
   'sample': [
        {'A': 'AA', 'B': 'BB'},
        {'A': 'AA', 'B': 'AA'},
        {'A': 'AA', 'B': 'AA'},
        {'A': 'AA', 'B': 'AA'},
        {'A': 'AA', 'B': 'AA'},
        {'A': 'AA', 'B': 'AA'},
    ],
    'N': [
        {'X': 1, 'Y': 2},
        {'X': 1, 'Y': 3},
        {'X': 2, 'Y': 2},
        {'X': 3, 'Y': 2},
        {'X': 3, 'Y': 3},
    ],
    'Z': [
        {'X': 1, 'Y': 1},
        {'X': 1, 'Y': 1},
        {'X': 1, 'Y': 1},
        {'X': 1, 'Y': 1},
        {'X': 2, 'Y': 1},
        {'X': 2, 'Y': 1},
    ]
})

df.to_sql('df_tst', connection='postgres')
pd.read_sql_query('df_tst'.to_sql('SELECT * FROM df_tst', connection='postgres'), output_format='list')

How do I loop through the dataframe using pandas to calculate the sum of the rows, the maximum value and the minimum value. The result should look like this.

The expected output should be:

A:

You can use.set_index and t
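
To make the `top_k` and `top_p` arguments more concrete, here is a minimal pure-Python sketch of the filtering step on a made-up next-token distribution. `top_k_top_p_filter` and the probabilities are hypothetical and only illustrate the idea; the library works on logits and may differ in detail.

```python
import random


def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most probable tokens, then the smallest prefix of them
    whose cumulative (renormalized) probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / total
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {token: p / norm for token, p in kept}


# Made-up distribution over the next token.
probs = {"for": 0.5, "while": 0.3, "if": 0.15, "def": 0.05}
filtered = top_k_top_p_filter(probs, top_k=3, top_p=0.8)
print(filtered)  # only "for" and "while" survive, renormalized to sum to 1

# Draw the next token from the filtered distribution.
rng = random.Random(0)
print(rng.choices(list(filtered), weights=list(filtered.values()))[0])
```

Because the next token is drawn at random from the surviving candidates, repeated runs of the sampling cell produce different text, unlike greedy or beam search.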