# LLM Demo Notebook


## Getting Set Up

You can run this notebook on the Jupyter Hub machines, but you will need to set up an OpenAI account. Alternatively, if you are running on your own computer, you can also try running a model locally.

### Step 1. Create an OpenAI account

You can create a free account, which comes with some initial free credits, here:

https://platform.openai.com

You will then need to get an API key. Save that API key to a local file called `openai.key`:

In [1]:
# with open("openai.key", "w") as f:
#     f.write("sk-zPutYourkeyHereYJ")

In [2]:
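# Read the key and strip any trailing newline; the context manager closes the file.
with open("openai.key", "r") as f:
    openai_key = f.readline().strip()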
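
Keeping the key in a file works for a demo, but it is easy to commit that file by accident. A minimal sketch of an alternative, assuming a hypothetical helper `load_api_key` (not part of the original notebook) that prefers the `OPENAI_API_KEY` environment variable and falls back to the `openai.key` file:

```python
import os
from pathlib import Path


def load_api_key(path="openai.key", env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, else from a key file.

    `load_api_key` and its defaults are illustrative assumptions,
    not part of the original notebook.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    key_file = Path(path)
    if not key_file.exists():
        raise FileNotFoundError(f"Set {env_var} or create {path}")
    # strip() drops the trailing newline that editors often append
    return key_file.read_text().strip()
```

In the notebook this could replace the cell above with `openai_key = load_api_key()`.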

### Step 2. Install Python Tools

Uncomment the following line to install the required packages:

In [3]:
#!pip install openai langchain

In [4]:
from openai import OpenAI

client = OpenAI(api_key=openai_key)
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of California?",
        }
    ],
    model="gpt-3.5-turbo", temperature=0
)
chat_completion

ChatCompletion(id='chatcmpl-8iaqMA8ouHlfBXFgweahpfkpEbmpz', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The capital of California is Sacramento.', role='assistant', function_call=None, tool_calls=None))], created=1705638750, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=7, prompt_tokens=14, total_tokens=21))
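
The printed `ChatCompletion` object is verbose; usually only the reply text and token counts are needed. A small sketch that uses the attribute names visible in the output above (`choices`, `message.content`, `finish_reason`, `usage`). The helper name `summarize_completion` is hypothetical, and the stand-in object only mimics the response shape so the example runs without an API call:

```python
from types import SimpleNamespace


def summarize_completion(completion):
    """Pull the reply text and token usage out of a chat completion."""
    choice = completion.choices[0]
    return {
        "text": choice.message.content,
        "finish_reason": choice.finish_reason,
        "total_tokens": completion.usage.total_tokens,
    }


# Stand-in mirroring the fields printed above (not a real API response).
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            message=SimpleNamespace(
                role="assistant",
                content="The capital of California is Sacramento.",
            ),
        )
    ],
    usage=SimpleNamespace(completion_tokens=7, prompt_tokens=14, total_tokens=21),
)

print(summarize_completion(fake_completion))
```

With the real client, `summarize_completion(chat_completion)` should expose the same three fields.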

## Using OpenAI with LangChain

In [5]:
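# Note: this import rebinds the name `OpenAI`, replacing the client class imported from `openai` above.
from langchain.llms import OpenAI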
llm = OpenAI(openai_api_key=openai_key,
             model="gpt-3.5-turbo-instruct", 
             #temperature=0, 
             max_tokens=512)

In [6]:
llm.invoke("What is the capital of California? Provide a short answer.")

'\n\nSacramento'

In [7]:
for chunk in llm.stream("Write a short song about data science and large language models."):
    print(chunk, end="", flush=True)



Verse 1:
Data science, it's the key
To unlock the mysteries
Of the world we see
With numbers and algorithms

Chorus:
Large language models, they can speak
With human-like complexity
They learn from data, endlessly
To help us understand our reality

Verse 2:
Gathering data, from every source
We analyze and make it work
To find patterns, and insights
That open up new worlds for our eyes

Chorus:
Large language models, they can speak
With human-like complexity
They learn from data, endlessly
To help us understand our reality

Bridge:
From chatbots to translation
They can handle any conversation
With endless possibilities
Data science and language models, a perfect harmony

Chorus:
Large language models, they can speak
With human-like complexity
They learn from data, endlessly
To help us understand our reality

Outro:
Data science and large language models
Together, they'll take us far
Unlocking the secrets of the world
With every data point, we'll reach for the stars.
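
Streaming prints text as it arrives, but the full response is often needed afterwards as well. A minimal sketch, assuming a hypothetical helper `stream_and_collect` that accepts any iterable of string chunks (such as the generator returned by `llm.stream(...)`); the fake generator stands in for a model so the example runs offline:

```python
def stream_and_collect(chunks):
    """Print chunks as they arrive and return the concatenated text."""
    pieces = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        pieces.append(chunk)
    return "".join(pieces)


def fake_stream():
    # Stand-in for llm.stream(...); yields the text a few words at a time.
    yield from ["Data ", "science, ", "it's the key"]


song = stream_and_collect(fake_stream())
```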

## Running Locally with Ollama and LangChain
You can download and install Ollama from:

https://ollama.ai/download

Ollama runs models locally on your own machine.
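
Before creating the LangChain wrapper, it can help to confirm that the local Ollama server is reachable. A minimal sketch, assuming Ollama is listening on its default address `http://localhost:11434`; the helper name `ollama_is_running` is hypothetical:

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if something answers HTTP 200 at the given base URL."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if not ollama_is_running():
    print("Ollama does not appear to be running; start it before creating the model.")
```

The model used below also has to be downloaded first, for example with `ollama pull vicuna:7b`.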

In [8]:
from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
vicuna = Ollama(
    model="vicuna:7b", 
    #temperature=0,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])
)

In [9]:
vicuna.invoke("What is the capital of California? Answer with only one word.")


Sacramento

'\nSacramento'

In [10]:
vicuna.invoke("Write a short song about data science and large language models.")

Data science is the name of the game
Big data's the target, we aim for fame
With machine learning algorithms we create
New insights and knowledge, it's what we're great

Large language models are our tools of choice
They help us process and analyze with ease
From natural language processing to computer vision
We use them all to gain new perspectives

With Python and R, we code and we test
To build models that are the best
We train and we tweak until we're satisfied
Our data science skills, they never lack in quality

So here's to data science, it's our passion
Helping us make sense of all the information we can access with haste
Large language models, they help us achieve
The insights we need to solve and improve and perceive.

"Data science is the name of the game\nBig data's the target, we aim for fame\nWith machine learning algorithms we create\nNew insights and knowledge, it's what we're great\n\nLarge language models are our tools of choice\nThey help us process and analyze with ease\nFrom natural language processing to computer vision\nWe use them all to gain new perspectives\n\nWith Python and R, we code and we test\nTo build models that are the best\nWe train and we tweak until we're satisfied\nOur data science skills, they never lack in quality\n\nSo here's to data science, it's our passion\nHelping us make sense of all the information we can access with haste\nLarge language models, they help us achieve\nThe insights we need to solve and improve and perceive."

## Data Analytics

We can use LLMs to help analyze data.

In [11]:
import pandas as pd
tweets = pd.read_json("AOC_recent_tweets.txt")
list(tweets['full_text'][0:10])

['RT @RepEscobar: Our country has the moral obligation and responsibility to reunite every single family separated at the southern border.\n\nT…',
 'RT @RoKhanna: What happens when we guarantee $15/hour?\n\n💰 31% of Black workers and 26% of Latinx workers get raises.\n😷 A majority of essent…',
 '(Source: https://t.co/3o5JEr6zpd)',
 'Joe Cunningham pledged to never take corporate PAC money, and he never did. Mace said she’ll cash every check she gets. Yet another way this is a downgrade. https://t.co/DytsQXKXgU',
 'What’s even more gross is that Mace takes corporate PAC money.\n\nShe’s already funded by corporations. Now she’s choosing to swindle working people on top of it.\n\nPeak scam artistry. Caps for cash 💰 https://t.co/CcVxgDF6id',
 'Joe Cunningham already proving to be leagues more decent + honest than Mace seems capable of.\n\nThe House was far better off w/ Cunningham. It’s sad to see Mace diminish the representation of her community by launching a reputation of craven dishone

Suppose I wanted to evaluate whether a tweet is making a statement about the minimum wage.

In [12]:
prompt = """
Is the following text making a statement about minimum wage? You should answer either Yes or No.

{text}

Answer:
"""
questions = [prompt.format(text=t) for t in tweets['full_text'].head(20)]

Ask each of the LLMs to answer the questions:

In [13]:
open_ai_answers = llm.batch(questions)
open_ai_answers

['\nYes',
 'Yes',
 '\nNo',
 '\nNo',
 'No',
 '\nNo',
 '\nNo',
 'No',
 '\nNo',
 '\nYes',
 '\nNo',
 '\nYes',
 '\nYes',
 '\nNo',
 '\nNo',
 '\nYes',
 '\nNo',
 '\nNo',
 '\nYes',
 '\nNo']

In [14]:
vicuna_answers = vicuna.batch(questions)
vicuna_answers

No.No.Yes.No.No.No.Yes.No.Yes.
No.Yes.No.No.Yes.No.No.No.Yes.No.Yes.

['No.',
 'No.',
 'Yes.',
 'No.',
 'No.',
 'No.',
 'Yes.',
 'No.',
 'Yes.',
 '\nNo.',
 'Yes.',
 'No.',
 'No.',
 'Yes.',
 'No.',
 'No.',
 'No.',
 'Yes.',
 'No.',
 'Yes.']
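
The two models format their answers differently: one adds leading newlines, the other trailing periods. A minimal sketch of a normalizer, assuming a hypothetical helper `parse_yes_no` that maps a raw reply to `True`, `False`, or `None` when the reply is neither:

```python
import re


def parse_yes_no(answer):
    """Map a raw model reply to True (yes), False (no), or None (unclear)."""
    match = re.match(r"\s*(yes|no)\b", answer, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


raw_answers = ["\nYes", "No.", "Yes.", "\nNo", "Maybe"]
print([parse_yes_no(a) for a in raw_answers])
# prints [True, False, True, False, None]
```

Unlike a substring check for "Y", this does not count an unrelated capital Y elsewhere in the reply as a yes, and it makes unparseable replies visible as `None`.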

In [15]:
pd.set_option('display.max_colwidth', None)
df = pd.DataFrame({"OpenAI": open_ai_answers, 
                   "Vicuna": vicuna_answers,
                   "Text": tweets['full_text'].head(20)})
df["OpenAI"] = df["OpenAI"].str.contains("Y")
df["Vicuna"] = df["Vicuna"].str.contains("Y")
df

|  | OpenAI | Vicuna | Text |
|---|---|---|---|
| 0 | True | False | RT @RepEscobar: Our country has the moral obligation and responsibility to reunite every single family separated at the southern border.\n\nT… |
| 1 | True | False | RT @RoKhanna: What happens when we guarantee $15/hour?\n\n💰 31% of Black workers and 26% of Latinx workers get raises.\n😷 A majority of essent… |
| 2 | False | True | (Source: https://t.co/3o5JEr6zpd) |
| 3 | False | False | Joe Cunningham pledged to never take corporate PAC money, and he never did. Mace said she’ll cash every check she gets. Yet another way this is a downgrade. https://t.co/DytsQXKXgU |
| 4 | False | False | What’s even more gross is that Mace takes corporate PAC money.\n\nShe’s already funded by corporations. Now she’s choosing to swindle working people on top of it.\n\nPeak scam artistry. Caps for cash 💰 https://t.co/CcVxgDF6id |
| 5 | False | False | Joe Cunningham already proving to be leagues more decent + honest than Mace seems capable of.\n\nThe House was far better off w/ Cunningham. It’s sad to see Mace diminish the representation of her community by launching a reputation of craven dishonesty right off the bat. |
| 6 | False | True | Pretty horrible.\n\nWell, it’s good to know what kind of person she is early. Also good to know that Mace is cut from the same Trump cloth of dishonesty and opportunism.\n\nSad to see a colleague intentionally hurt other women and survivors to make a buck. Thought she’d be better. https://t.co/CcVxgDF6id |
| 7 | False | False | RT @jaketapper: .@RepNancyMace fundraising off the false smear that @AOC misrepresented her experience during the insurrection. She didn’t.… |
| 8 | False | True | RT @RepMcGovern: One reason Washington can’t “come together” is because of people like her sending out emails like this.\n\nShe should apolog… |
| 9 | True | False | RT @JoeNeguse: Just to be clear, “targeting” stimulus checks means denying them to some working families who would otherwise receive them. |
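
One way to quantify how often the two models disagree is the fraction of items on which they give the same label. A sketch in plain Python, using the ten labels shown in the table above; the helper name `agreement_rate` is an assumption:

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of positions where two label lists match."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if not labels_a:
        raise ValueError("label lists must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Labels copied from the first ten rows of the table above.
openai_labels = [True, True, False, False, False, False, False, False, False, True]
vicuna_labels = [False, False, True, False, False, False, True, False, True, False]

rate = agreement_rate(openai_labels, vicuna_labels)
print(f"Agreement: {rate:.0%}")  # prints "Agreement: 40%"
```

Low agreement alone does not show which model is right; comparing both against a few hand-labeled rows is a reasonable next step.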


In [18]:
prompt = """
Is the following text self promoting? You should answer either Yes or No.

{text}

Answer:
"""
questions = [prompt.format(text=t) for t in tweets['full_text'].head(20)]
open_ai_answers2 = llm.batch(questions)
df2 = pd.DataFrame({"OpenAI": open_ai_answers2, 
                   "Text": tweets['full_text'].head(20)})
df2["OpenAI"] = df2["OpenAI"].str.contains("Y")
df2

|  | OpenAI | Text |
|---|---|---|
| 0 | False | RT @RepEscobar: Our country has the moral obligation and responsibility to reunite every single family separated at the southern border.\n\nT… |
| 1 | False | RT @RoKhanna: What happens when we guarantee $15/hour?\n\n💰 31% of Black workers and 26% of Latinx workers get raises.\n😷 A majority of essent… |
| 2 | False | (Source: https://t.co/3o5JEr6zpd) |
| 3 | False | Joe Cunningham pledged to never take corporate PAC money, and he never did. Mace said she’ll cash every check she gets. Yet another way this is a downgrade. https://t.co/DytsQXKXgU |
| 4 | True | What’s even more gross is that Mace takes corporate PAC money.\n\nShe’s already funded by corporations. Now she’s choosing to swindle working people on top of it.\n\nPeak scam artistry. Caps for cash 💰 https://t.co/CcVxgDF6id |
| 5 | True | Joe Cunningham already proving to be leagues more decent + honest than Mace seems capable of.\n\nThe House was far better off w/ Cunningham. It’s sad to see Mace diminish the representation of her community by launching a reputation of craven dishonesty right off the bat. |
| 6 | True | Pretty horrible.\n\nWell, it’s good to know what kind of person she is early. Also good to know that Mace is cut from the same Trump cloth of dishonesty and opportunism.\n\nSad to see a colleague intentionally hurt other women and survivors to make a buck. Thought she’d be better. https://t.co/CcVxgDF6id |
| 7 | True | RT @jaketapper: .@RepNancyMace fundraising off the false smear that @AOC misrepresented her experience during the insurrection. She didn’t.… |
| 8 | False | RT @RepMcGovern: One reason Washington can’t “come together” is because of people like her sending out emails like this.\n\nShe should apolog… |
| 9 | False | RT @JoeNeguse: Just to be clear, “targeting” stimulus checks means denying them to some working families who would otherwise receive them. |
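
Both experiments follow the same pattern: fill a template, send a batch, and convert the replies to booleans. A sketch that factors this out, assuming a hypothetical helper `ask_yes_no` and any callable `batch_fn` that maps a list of prompts to a list of replies (for example `llm.batch`); a fake model keeps the example runnable offline:

```python
PROMPT_TEMPLATE = """
{question} You should answer either Yes or No.

{text}

Answer:
"""


def ask_yes_no(texts, question, batch_fn):
    """Ask one Yes/No question about each text; return a list of booleans."""
    prompts = [PROMPT_TEMPLATE.format(question=question, text=t) for t in texts]
    replies = batch_fn(prompts)
    return [reply.strip().lower().startswith("yes") for reply in replies]


def fake_batch(prompts):
    # Stand-in for llm.batch: answers Yes only when "$15/hour" is in the prompt.
    return ["\nYes" if "$15/hour" in p else "\nNo" for p in prompts]


texts = ["What happens when we guarantee $15/hour?", "(Source: link)"]
question = "Is the following text making a statement about minimum wage?"
print(ask_yes_no(texts, question, fake_batch))
# prints [True, False]
```

In the notebook, a call such as `ask_yes_no(tweets['full_text'].head(20), "Is the following text self promoting?", llm.batch)` would cover the second experiment.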
