# Streaming AI Responses

*[Coding along with the Udemy online course [LLM Engineering: Master AI & Large Language Models](https://www.udemy.com/course/llm-engineering-master-ai-and-large-language-models/) by Ed Donner; GitHub repo can be found at [github.com/ed-donner/llm_engineering](https://github.com/ed-donner/llm_engineering)]*

## Implementing Real-Time LLM Output in Python

To get access to Google's Gemini API we first tried to just `poetry add google`.

But according to https://github.com/googleapis/google-api-python-client#installation, you need to install the google-api-python-client package:

`poetry add google-api-python-client`

The next error, “No module named ‘google.generativeai’”, means that the `google.generativeai` module itself is still not installed; neither of the two packages above provides it.

Solution: install the `google-generativeai` package with `poetry add google-generativeai`.

It's highly likely that `google` and `google-api-python-client` aren't necessary and can be removed.
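
A quick way to see whether an import will succeed, before or after removing packages, is to ask Python whether it can locate the module. The sketch below uses only the standard library; the helper name `is_importable` is hypothetical and not part of the course code.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the module (parent packages get imported)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ModuleNotFoundError, ValueError):
        # find_spec raises when a parent package (e.g. "google") is missing
        return False


# Report the status of the three client libraries used in this notebook
for name in ("google.generativeai", "openai", "anthropic"):
    print(f"{name}: {'found' if is_importable(name) else 'missing'}")
```

Running this inside the Poetry environment (for example via `poetry run python`) checks the same interpreter the notebook uses.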

In [6]:
from openai import OpenAI
import anthropic
import google.generativeai
from IPython.display import Markdown, display, update_display
import pandas as pd

In [7]:
openai_api_key = pd.read_csv("~/tmp/chat_gpt/agentic-design-1.txt", sep=" ", header=None)[0][0]
print("Don't be a fool and send your api key to github")

Don't be a fool and send your api key to github


### <span style="color:green">Interlude: Importing API Keys for Claude and Gemini and Testing them Briefly</span>

In [8]:
anthropic_api_key = pd.read_csv("~/tmp/anthropic/anthropic-key-1.txt", sep=" ", header=None)[0][0]
print("Don't be a fool and send your api key to github")

Don't be a fool and send your api key to github


In [9]:
google_api_key = pd.read_csv("~/tmp/google-gemini/gemini-key-1.txt", sep=" ", header=None)[0][0]
print("Don't be a fool and send your api key to github")

Don't be a fool and send your api key to github
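
Reading a one-line key file with `pd.read_csv` works, but pandas is not required for it. Here is a minimal alternative sketch, assuming the key is the first whitespace-separated token in the file. The helper name `read_api_key` and the optional environment-variable fallback are illustrative assumptions, not part of the course code.

```python
import os
from pathlib import Path
from typing import Optional


def read_api_key(path: str, env_var: Optional[str] = None) -> str:
    """Return the key from env_var if it is set, else the first token in the file."""
    if env_var and os.environ.get(env_var):
        return os.environ[env_var].strip()
    # expanduser() resolves a leading "~" to the home directory
    text = Path(path).expanduser().read_text(encoding="utf-8")
    tokens = text.split()
    if not tokens:
        raise ValueError(f"No API key found in {path}")
    return tokens[0]
```

With this helper, a call such as `read_api_key("~/tmp/google-gemini/gemini-key-1.txt")` would stand in for the `pd.read_csv(...)[0][0]` expression above.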


In [18]:
# connect to openai
openai = OpenAI(api_key=openai_api_key)

In [11]:
# connect to anthropic
claude = anthropic.Anthropic(api_key=anthropic_api_key)

In [12]:
# Claude example from https://www.datacamp.com/tutorial/getting-started-with-claude-3-and-the-claude-3-api
from anthropic import HUMAN_PROMPT, AI_PROMPT

completion = claude.completions.create(
    model="claude-2.1",
    max_tokens_to_sample=300,
    prompt=f"{HUMAN_PROMPT} What is Matthew effect? {AI_PROMPT}",
)
Markdown(completion.completion)

 The Matthew effect refers to the sociological phenomenon of "the rich get richer and the poor get poorer" over time. Some key points about the Matthew effect:

- It derives its name from a passage in the Biblical Gospel of Matthew: "For to all those who have, more will be given, and they will have an abundance; but from those who have nothing, even what they have will be taken away."

- In sociology, it was first coined by Robert K. Merton to describe how eminent scientists tend to get more credit and recognition than less known researchers, even when their work is similar in quality and importance. Their fame brings them more influence and funding opportunities.

- It describes the dynamics of inequality and how small advantages accrue over time into large disparities. Those who start with more capital, skills, resources etc. are able to leverage them to gain more over time, widening the gap.

- It operates across many domains beyond academia as well, including economic, socioeconomic, and health disparities between demographic groups in society. Initial advantages allow certain groups to gain more wealth, opportunities, and wellbeing over time.

In essence, the Matthew effect points to the self-perpetuating nature of advantages and disadvantages in society, and how this can lead to increasing inequality and social gaps if systemic factors do not counterbalance these dynamics. Many policy interventions aim to address aspects of the Matthew effect in societies.

In [13]:
# connect to gemini
google.generativeai.configure(api_key=google_api_key)

In [14]:
# a first google request
# https://ai.google.dev/gemini-api/docs/quickstart?lang=python
gem_model = google.generativeai.GenerativeModel("gemini-1.5-flash")
response = gem_model.generate_content("Write a story about a magic backpack.")
print(response.text)

Ten-year-old Finn was a master of lost and found. He wasn't particularly good at *finding* things, but he was exceptional at *losing* them. It was an unfortunate trait for a boy whose family was about to embark on a month-long road trip across the American Southwest.

His mother, exasperated, had bought him a brand-new backpack. It was a bright, cheery orange with a worn leather flap that held a single, brass lock. "This is special, Finn," she said, handing it to him. "It’s the last one they had, and it'll keep all your things safe.”


The first day of the trip was a blur of dusty highways and endless vistas. That night, in a campground outside a sleepy town, Finn sat by the campfire, his new backpack at his feet. He was bored. He missed his friends, his video games, and the comfort of his own room. 

He idly kicked the backpack. It felt heavier than it should be. He unzipped it and peered inside. He swore he saw something move. He reached in, and to his surprise, his hand brushed agai

### <span style="color:green">Interlude End, let's continue with the course</span>

## Comparing LLMs: Let's ask them to tell a joke

__In general we pass the following information to the API:__

- The __name of the model__ that should be used
- A __system message__ that gives `overall context` for the role the LLM is playing
- A __user message__ that provides the `actual prompt`

Another parameter that can be used is __temperature__ (typically a number between 0 and 1; OpenAI's API accepts values up to 2). The higher the temperature, the more random the `output`; the lower the temperature, the more focused and deterministic the output will be.
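
To build some intuition for temperature, a common textbook explanation is that the model's raw token scores are divided by the temperature before being turned into probabilities. The sketch below illustrates that idea with made-up scores. How each provider applies temperature internally is an assumption here, and the function name is hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by temperature."""
    if temperature <= 0:
        # the APIs accept 0, but dividing by zero is undefined in this toy version
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all probability mass lands on the top-scoring token; at higher temperature the alternatives get a larger share, which is why the output becomes more varied.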

In [15]:
# basic messages for the "tell a joke" example
system_message = "You are an assistant that is great at telling jokes"
user_prompt = "Tell a light-hearted joke for an audience of Data Scientists"

In [16]:
# putting prompts into a list
prompts = [
    { "role": "system", "content": system_message },
    { "role": "user", "content": user_prompt },
]

### OpenAI's GPT model

In [17]:
# OpenAI's GPT model using the outdated GPT-3.5-Turbo
completion = openai.chat.completions.create(model='gpt-3.5-turbo', messages=prompts)
print(completion.choices[0].message.content)

Why did the data scientist break up with their calculator? 

Because it couldn't handle their complex relationship!


In [19]:
# OpenAI's GPT model using GPT-4o-mini
# Temperature setting controls creativity
completion = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    temperature=0.7
)
print(completion.choices[0].message.content)

Why did the data scientist bring a ladder to work?

Because they wanted to reach new heights in their analysis!


In [20]:
# OpenAI's GPT model using GPT-4o
# Temperature setting controls creativity
completion = openai.chat.completions.create(
    model='gpt-4o',
    messages=prompts,
    temperature=0.4
)
print(completion.choices[0].message.content)

Why did the data scientist bring a ladder to the bar?

Because they heard the drinks were on the house, and they wanted to elevate their data points!


In [22]:
# OpenAI's GPT model using GPT-4o
# Temperature raised to 1.0 for more varied output
completion = openai.chat.completions.create(
    model='gpt-4o',
    messages=prompts,
    temperature=1.0
)
print(completion.choices[0].message.content)

Why do data scientists love nature hikes?

Because they can't resist a good random forest!


### Anthropic's Claude model

In [23]:
# Claude 3.5 Sonnet
# API needs system message provided separately from user prompt
# Also adding max_tokens
message = claude.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=200,
    temperature=0.7,
    system=system_message,
    messages=[
        {"role": "user", "content": user_prompt},
    ],
)
print(message.content[0].text)

Sure, here's a light-hearted joke for data scientists:

Why did the data scientist break up with their significant other?

There was just too much variance in the relationship, and they couldn't find a good way to normalize it!
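
Because Claude takes the system message as a separate argument, an OpenAI-style message list such as `prompts` cannot be passed in unchanged. The helper below is a small sketch of one way to split it; the name `split_system_message` is hypothetical.

```python
def split_system_message(messages):
    """Separate OpenAI-style messages into (system_text, remaining_messages)."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]
    # join in case more than one system message was supplied
    return "\n".join(system_parts), chat


prompts = [
    {"role": "system", "content": "You are an assistant that is great at telling jokes"},
    {"role": "user", "content": "Tell a light-hearted joke for an audience of Data Scientists"},
]
system_text, chat_messages = split_system_message(prompts)
print(system_text)
print(chat_messages)
```

The two results map onto the `system=` and `messages=` arguments of `claude.messages.create` used in the cell above.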


#### __Prompting Claude and streaming back the results:__

In [25]:
# Claude 3.5 Sonnet again
result = claude.messages.stream(
    model="claude-3-5-sonnet-20240620",
    max_tokens=200,
    temperature=0.7,
    system=system_message,
    messages=[
        {"role": "user", "content": user_prompt},
    ],
)
# now let's stream back results
with result as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Sure, here's a light-hearted joke for data scientists:

Why did the data scientist break up with their significant other?

Because there was no significant correlation between them!

Ba dum tss! 🥁

This joke plays on the statistical concept of "significant correlation" that data scientists often deal with in their work, while also making a pun about relationships. It's a bit nerdy, but should get a chuckle from a data-savvy audience!

### Google's Gemini model

In [26]:
# The API for Gemini has a slightly different structure
# first putting in system_message
gemini = google.generativeai.GenerativeModel(
    model_name='gemini-1.5-flash',
    system_instruction=system_message
)
# second putting in the user_prompt
response = gemini.generate_content(user_prompt)
print(response.text)

Why did the data scientist break up with the statistician? 

Because they couldn't see eye to eye on the p-value! 
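
The Gemini call above waits for the complete answer. A streaming variant might look like the sketch below, assuming that `generate_content` accepts `stream=True` and yields chunks exposing a `.text` attribute, as the `google-generativeai` package documents. The wrapper name `stream_gemini` is hypothetical, and a chunk without text (for example a blocked response) is not handled here.

```python
def stream_gemini(model, prompt):
    """Print chunks as they arrive and return the full text.

    `model` is expected to behave like google.generativeai.GenerativeModel.
    """
    pieces = []
    for chunk in model.generate_content(prompt, stream=True):
        print(chunk.text, end="", flush=True)
        pieces.append(chunk.text)
    print()  # finish the line once the stream ends
    return "".join(pieces)
```

With the objects defined above, the call would be `stream_gemini(gemini, user_prompt)`.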



## Back to the Business Solution question

### OpenAI's gpt-4o model

In [27]:
prompts = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "How do I decide if a business problem is suitable for an LLM solution?"}
]

In [28]:
stream = openai.chat.completions.create(
    model='gpt-4o',
    messages=prompts,
    temperature=0.7,
    stream=True
)

# have it stream back results in markdown
reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''
    # strip code fences; note this also removes every occurrence of the word "markdown"
    reply = reply.replace("```","").replace("markdown","")
    update_display(Markdown(reply), display_id=display_handle.display_id)

Determining whether a business problem is suitable for a Large Language Model (LLM) solution involves several considerations. LLMs are powerful tools capable of understanding and generating human language, but they are not suitable for all problems. Here’s a guide to help you assess if an LLM is the right fit for your business problem:

1. **Nature of the Problem**:
   - **Language-based**: LLMs excel at tasks involving natural language, such as text generation, summarization, translation, sentiment analysis, and question answering.
   - **Complexity and Nuance**: If the problem requires understanding of complex language nuances or generating coherent and contextually relevant text, an LLM might be appropriate.

2. **Data Availability**:
   - **Quality and Volume**: Ensure you have access to large volumes of high-quality text data relevant to the problem. LLMs require significant data to learn and perform effectively.
   - **Diversity**: The data should cover a diverse range of scenarios and contexts to train the model adequately.

3. **Performance Requirements**:
   - **Accuracy and Precision**: Consider whether the level of accuracy and precision required by your business problem can be achieved with an LLM.
   - **Latency and Real-time Processing**: Assess whether the response time of LLMs aligns with your application’s requirements, as they can be computationally intensive.

4. **Ethical and Compliance Considerations**:
   - **Bias and Fairness**: LLMs can inadvertently learn biases present in the training data. Evaluate if this poses a risk to your application.
   - **Privacy and Security**: Ensure that using an LLM complies with data privacy regulations and security standards relevant to your industry.

5. **Integration and Scalability**:
   - **Technical Infrastructure**: Determine if your existing infrastructure can support the deployment and scaling of an LLM solution.
   - **Interoperability**: Consider how easily the LLM can integrate with your existing systems and workflows.

6. **Cost and Resource Constraints**:
   - **Development and Maintenance Cost**: LLMs can be resource-intensive. Evaluate the cost of development, deployment, and ongoing maintenance.
   - **Computational Resources**: Ensure you have the necessary computational resources to train, fine-tune, and deploy LLMs.

7. **Alternative Solutions**:
   - **Rule-based Systems or Simpler Models**: Sometimes, simpler models or rule-based systems might suffice, especially for well-defined tasks.
   - **Custom Models**: Explore if a custom-built solution would be more effective for specific problems.

8. **Potential for Innovation and Improvement**:
   - **Value Addition**: Assess whether an LLM can significantly improve existing processes or create new opportunities for innovation within your business.

9. **User Experience**:
   - **Understandability and Transparency**: Consider if the LLM’s outputs can be easily understood and trusted by end-users.

By carefully evaluating these factors, you can determine whether an LLM solution is suitable for your business problem. Remember that while LLMs are powerful, they are not a one-size-fits-all solution and should be used where they add the most value.
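
The streaming loop in cell `In [28]` can be tried out without an API call by feeding it a list of fake deltas. The sketch below accumulates the pieces the same way, but removes only a wrapping code fence rather than every occurrence of the word "markdown". The function names and the sample deltas are made up for illustration, and a fence that arrives split across two chunks may briefly show up in a partial result.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def strip_markdown_fence(text):
    """Remove a leading fence (optionally tagged 'markdown') and a trailing fence."""
    stripped = text.strip()
    for opener in (FENCE + "markdown", FENCE):
        if stripped.startswith(opener):
            stripped = stripped[len(opener):]
            break
    if stripped.endswith(FENCE):
        stripped = stripped[: -len(FENCE)]
    return stripped.strip()


def accumulate(deltas):
    """Yield the cleaned cumulative reply after each streamed piece."""
    reply = ""
    for delta in deltas:
        reply += delta or ""  # a chunk's content can be None
        yield strip_markdown_fence(reply)


fake_deltas = [FENCE + "markdown\n", "LLMs handle ", "markdown well", None, "\n" + FENCE]
for partial in accumulate(fake_deltas):
    print(repr(partial))
```

In the notebook, each yielded value would be passed to `update_display(Markdown(...), display_id=...)` in place of `reply`.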