# Large Language Models FLAN-T5 and GPT-Neo locally
In this notebook we run different versions of FLAN-T5 and GPT-Neo on a local machine.

We define the following prompt:

In [1]:
prompt = "A step by step recipe to make bolognese pasta:"

## FLAN-T5-small 
This cell downloads about 300 MB of model weights.

In [2]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))



['Pour a cup of bolognese into a large bowl and add the pasta']
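The answer above stops after a single short sentence. A likely reason is that `generate()` was called without a length argument, so it falls back to a small default limit. Below is a minimal sketch that passes `max_new_tokens` explicitly, as the Gradio cells later in this notebook do. The helper name `generate_text` and its default of 100 tokens are assumptions made for illustration, not part of the original notebook.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=100, device="cpu"):
    """Tokenize `prompt`, generate up to `max_new_tokens` new tokens, and decode.

    `model` and `tokenizer` are expected to be the objects returned by
    `from_pretrained(...)` in the cell above (hypothetical helper).
    """
    # The tokenizer returns a batch object that can be moved to a device.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # An explicit budget avoids relying on the library's default length limit.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # batch_decode returns one string per sequence; we generated only one.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

# Usage with the objects loaded above (not executed here):
#   print(generate_text(model, tokenizer, prompt, max_new_tokens=100))
```

A longer budget does not guarantee a better recipe from a model this small; it only removes the early cut-off.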


## FLAN-T5-large
This cell downloads about 3 GB of model weights.

In [3]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

['Toss the pasta with the sauce, then add the meat and toss again.']
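The download sizes quoted for the two FLAN-T5 checkpoints are roughly what 32-bit weights would occupy. Here is a back-of-the-envelope sketch. The parameter counts are rounded figures we assume for these checkpoints (about 80 million and 780 million), and both helper names are hypothetical.

```python
# Rounded parameter counts: assumptions for illustration, not measured values.
APPROX_PARAMS = {
    "google/flan-t5-small": 80_000_000,
    "google/flan-t5-large": 780_000_000,
}

def estimate_weights_mb(num_params, bytes_per_param=4):
    """Estimate weight storage in MB, assuming float32 (4 bytes) by default."""
    return num_params * bytes_per_param / 1e6

def measured_weights_mb(model):
    """Sum the storage of every parameter tensor of an already loaded model."""
    return sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6

for name, count in APPROX_PARAMS.items():
    print(f"{name}: ~{estimate_weights_mb(count):.0f} MB in float32")
```

Once a model is loaded, `measured_weights_mb(model)` gives the actual figure for that checkpoint instead of the estimate.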


## CUDA Capability

In [4]:
import torch
is_cuda = torch.cuda.is_available()
if is_cuda:
    print("This computer uses CUDA")
else:
    print("This computer uses CPU")

This computer uses CUDA


In [5]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large").to(device)

In [6]:
inputs = tokenizer(prompt, return_tensors="pt").to(device)

In [7]:
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

['Toss the pasta with the sauce, then add the meat and toss again.']
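The GPU run produces the same text as the CPU run, so the benefit of CUDA here is speed rather than quality. One way to check this is to time the `generate` call on each device. The sketch below uses a hypothetical helper named `timed`; the optional `sync` callback is there because CUDA kernels run asynchronously, so a synchronization point before reading the clock keeps the measurement honest.

```python
import time

def timed(fn, *args, sync=None, **kwargs):
    """Call `fn(*args, **kwargs)` and return (result, elapsed_seconds).

    `sync` is an optional zero-argument callable that is invoked before
    starting and before stopping the clock (hypothetical helper).
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if sync is not None:
        sync()
    return result, time.perf_counter() - start

# Usage with the objects from the cells above (not executed here):
#   outputs, seconds = timed(model.generate, **inputs,
#                            sync=torch.cuda.synchronize if is_cuda else None)
#   print(f"generate() took {seconds:.2f} s on {device}")
```

The first call on a GPU often includes one-time setup cost, so timing a second call usually gives a more representative number.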


## GPT-Neo-125M

In [16]:
from transformers import AutoModelForCausalLM, AutoTokenizer
# GPT-Neo is a decoder-only model, so it is loaded with AutoModelForCausalLM.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
model.to(device)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
input_ids = tokenizer(prompt, return_tensors="pt").to(device)

In [17]:
#outputs = model.generate(**input_ids)
outputs = model.generate(**input_ids, do_sample=True, max_length=400)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [18]:
generation = tokenizer.batch_decode(outputs, skip_special_tokens=False)

In [19]:
print('\n'.join(generation))

A step by step recipe to make bolognese pasta:
1. Begin recipe steps #1-3. Prepare recipe ingredients carefully and use the pasta
tubing tips (note: they should be in a bottle and not on the body of pasta). It is
essential not to use the tubing tips for pasta in the recipe.
3. Make topping.

For pasta the tubing tips should be in a large bowl, leaving the pasta empty and
sticking on as the pasta water is hot.
4. Place the spaghetti and saucepan directly on a wire rack.
5. To make sauce, spray with a thin coating. Using a fork, remove the pasta
parting, then pour off any excess saucepan.
6. Bring the pan down to just below the bottom of the spaghetti saucepan.
7. Place the pans in the skillet and sprinkle the meat filling and sauce on top. As you
add meat, pour the sauce from the bottom of the saucepan onto the bottom of
the saucepan. Cover the pan evenly with towels. Let the sauce cook while you
cooking.
8. Put the pan in the oven, and cook for about 10 minutes. Remove from the
pan and
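Two things stand out in this run: the library warns that it is setting `pad_token_id` itself, and the generated text starts by repeating the prompt, as decoder-only models do. Here is a minimal sketch of how both could be handled. The helper names and the `temperature` and `top_p` values are assumptions chosen for illustration, not settings from the original notebook.

```python
def sampling_kwargs(eos_token_id, max_new_tokens=200, temperature=0.8, top_p=0.95):
    """Build keyword arguments for `model.generate` with sampling enabled.

    Passing `pad_token_id` explicitly avoids the "Setting `pad_token_id` to
    `eos_token_id`" message shown above (hypothetical helper).
    """
    return {
        "do_sample": True,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "pad_token_id": eos_token_id,
    }

def strip_prompt(generated, prompt):
    """Remove the echoed prompt from the start of a causal LM generation."""
    if generated.startswith(prompt):
        return generated[len(prompt):].lstrip()
    return generated

# Usage with the objects from the cells above (not executed here):
#   outputs = model.generate(**input_ids, **sampling_kwargs(tokenizer.eos_token_id))
#   text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
#   print(strip_prompt(text, prompt))
```

Because sampling is random, each run gives a different recipe; calling `transformers.set_seed` before generating is one way to make a run repeatable.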

# Hugging Face Environment

In [12]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import gradio as gr
import re
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large").to(device)
class GUI:
    def query(self, query, tokens=30):
        # No extra context is supplied, so the Context field stays empty.
        options = ""
        tok_len = tokens
        t5query = f"""Question: "{query}" Context: {options}"""
        inputs = tokenizer(t5query, return_tensors="pt").to(device)
        outputs = model.generate(**inputs, max_new_tokens=tok_len)
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)

    def begin(self, question, tokens):
        # Call the method on this instance rather than on the global `app`.
        results = self.query(question, tokens)
        return results

app = GUI()
title = "Get answers to questions with FLAN-T5"
description = "Results will show up in a few seconds."
css = """.output_image, .input_image {height: 600px !important}"""

iface = gr.Interface(fn=app.begin,
                     inputs=[gr.Textbox(label="Question"),
                             gr.Slider(30, 100, value=30, step=1, label="Max Tokens")],
                     outputs=gr.Text(label="Answer Summary"),
                     title=title,
                     description=description,
                     css=css,
                     analytics_enabled=True)
# Enable request queuing with queue() rather than the `enable_queue` argument.
iface.queue()
iface.launch(inline=False, share=False, debug=False)


Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.




In [13]:
iface.launch()

Rerunning server... use `close()` to stop if you need to change `launch()` parameters.
----

To create a public link, set `share=True` in `launch()`.
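The `query` method wraps the user's question in a fixed `Question: ... Context: ...` template and forwards the slider value as the token budget. A small sketch of those two steps as standalone functions, which makes them easy to test without starting the interface. Both function names are hypothetical, and the integer cast is a precaution under the assumption that a slider may deliver a float.

```python
def build_t5_query(question, context=""):
    """Format a question with the same template used by GUI.query."""
    return f'Question: "{question}" Context: {context}'

def coerce_token_budget(value, lower=30, upper=100):
    """Turn a slider value into an int clamped to the slider's range."""
    return max(lower, min(upper, int(round(value))))

print(build_t5_query("How long should I boil spaghetti?"))
print(coerce_token_budget(45.0))
```

The bounds of 30 and 100 mirror the slider defined in the cell above; a different slider would need different defaults.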




## FLAN-T5 vs GPT-Neo
Now let us combine all three models in one interface and compare their answers.

In [14]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM
import torch
import gradio as gr
import re
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
class GUI:
    def query(self,query,modelo="flan-t5-small",tokens=100):
        options=""
        tok_len=tokens
        t5query = f"""Question: "{query}" Context: {options}"""        
        if modelo in ("flan-t5-small", "flan-t5-large"):
            # FLAN-T5 is an encoder-decoder model.
            tokenizer = AutoTokenizer.from_pretrained("google/{}".format(modelo))
            model = AutoModelForSeq2SeqLM.from_pretrained("google/{}".format(modelo)).to(device)
            inputs = tokenizer(t5query, return_tensors="pt").to(device)
            outputs = model.generate(**inputs, max_new_tokens=tok_len)
        else:
            # GPT-Neo is a decoder-only model; sampling gives more varied text.
            model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M").to(device)
            tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
            input_ids = tokenizer(t5query, return_tensors="pt").to(device)
            outputs = model.generate(**input_ids, do_sample=True, max_length=tok_len)
        generation = tokenizer.batch_decode(outputs, skip_special_tokens=True)

        return '\n'.join(generation)

    def begin(self, question, modelo, tokens):
        # Pass the selected model name through to query() along with the token budget.
        results = self.query(question, modelo, tokens)
        return results

app = GUI()
title = "Get answers to questions with FLAN-T5 and GPT-Neo"
description = "Results will show up in a few seconds."
article="More info  <a href='https://ruslanmv.com/'>ruslanmv.com</a><br>" 
css = """.output_image, .input_image {height: 600px !important}"""

iface = gr.Interface(fn=app.begin,
                     inputs=[gr.Textbox(label="Question"),
                             gr.Radio(["flan-t5-small", "flan-t5-large", "gpt-neo-125M"],
                                      label="Model", value="flan-t5-small"),
                             gr.Slider(30, 200, value=100, step=1, label="Max Tokens")],
                     outputs=gr.Text(label="Answer Summary"),
                     title=title,
                     description=description,
                     article=article,
                     css=css,
                     analytics_enabled=True)
# Enable request queuing with queue() rather than the `enable_queue` argument.
iface.queue()
iface.launch(inline=False, share=False, debug=False)


Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.




In [15]:
iface.launch()

Rerunning server... use `close()` to stop if you need to change `launch()` parameters.
----

To create a public link, set `share=True` in `launch()`.
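In the combined interface, `query` calls `from_pretrained` on every request, so each question pays the model loading cost again. One possible refinement is to keep a small registry of the three models and cache the loaded objects. This is a sketch under our own naming (`MODEL_REGISTRY`, `resolve_model`, and `load_model` are hypothetical), not the notebook's implementation.

```python
from functools import lru_cache

# Radio-button label -> (Hugging Face repository id, model family).
MODEL_REGISTRY = {
    "flan-t5-small": ("google/flan-t5-small", "seq2seq"),
    "flan-t5-large": ("google/flan-t5-large", "seq2seq"),
    "gpt-neo-125M": ("EleutherAI/gpt-neo-125M", "causal"),
}

def resolve_model(name):
    """Return (repo_id, family) for a label, or raise ValueError if unknown."""
    try:
        return MODEL_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown model: {name!r}") from None

@lru_cache(maxsize=None)
def load_model(name, device="cpu"):
    """Load tokenizer and model once per (name, device) and reuse them afterwards."""
    repo_id, family = resolve_model(name)
    # Imported lazily so the registry can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer
    model_cls = AutoModelForSeq2SeqLM if family == "seq2seq" else AutoModelForCausalLM
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = model_cls.from_pretrained(repo_id).to(device)
    return tokenizer, model

# Usage inside GUI.query (not executed here):
#   tokenizer, model = load_model(modelo, str(device))
```

Caching keeps every loaded model in memory, which trades RAM or GPU memory for faster answers; with the 3 GB checkpoint that trade-off is worth checking on smaller machines.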


