# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UIs, and we connected to OpenAI's API.

Today we'll connect with the APIs for Anthropic and Google, as well as OpenAI.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a <a href="https://chatgpt.com/share/6734e705-3270-8012-a074-421661af6ba9">git pull and merge your changes as needed</a>. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/><br/>
            After you've pulled the code, from the llm_engineering directory, in an Anaconda prompt (PC) or Terminal (Mac), run:<br/>
            <code>conda env update -f environment.yml</code><br/>
            Or if you used virtualenv rather than Anaconda, then run this from your activated environment in PowerShell (PC) or Terminal (Mac):<br/>
            <code>pip install -r requirements.txt</code>
            <br/>Then restart the kernel (Kernel menu >> Restart Kernel and Clear Outputs Of All Cells) to pick up the changes.
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys

If you haven't done so already, you could now create API keys for Anthropic and Google in addition to OpenAI.

**Please note:** if you'd prefer to avoid extra API costs, feel free to skip setting up Anthropic and Google! You can watch me do it, and focus on OpenAI for the course. You could also substitute Ollama for Anthropic and/or Google, using the exercise you did in Week 1.

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api  

### Also - adding DeepSeek if you wish

Optionally, if you'd like to also use DeepSeek, create an account [here](https://platform.deepseek.com/), create a key [here](https://platform.deepseek.com/api_keys) and top up with at least the minimum $2 [here](https://platform.deepseek.com/top_up).

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
```

Afterwards, you may need to restart the Jupyter Lab Kernel (the Python process that sits behind this notebook) via the Kernel menu, and then rerun the cells from the top.

In [1]:
# imports

import os
from dotenv import load_dotenv
from openai import OpenAI
import anthropic
from IPython.display import Markdown, display, update_display

In [2]:
# import for google
# in rare cases, this seems to give an error on some systems, or even crashes the kernel
# If this happens to you, simply ignore this cell - I give an alternative approach for using Gemini later

import google.generativeai

In [3]:
# Load environment variables from a file called .env
# Print the key prefixes to help with any debugging

load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:8]}")
else:
    print("Google API Key not set")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AIzaSyB6


In [4]:
# Connect to OpenAI, Anthropic

openai = OpenAI()

claude = anthropic.Anthropic()

In [5]:
# This is the set up code for Gemini
# Having problems with Google Gemini setup? Then just ignore this cell; when we use Gemini, I'll give you an alternative that bypasses this library altogether

google.generativeai.configure()

## Asking LLMs to tell a joke

It turns out that LLMs don't do a great job of telling jokes! Let's compare a few models.
Later we will be putting LLMs to better use!

### What information is included in the API

Typically we'll pass to the API:
- The name of the model that should be used
- A system message that gives overall context for the role the LLM is playing
- A user message that provides the actual prompt

There are other parameters that can be used, including **temperature** which is typically between 0 and 1; higher for more random output; lower for more focused and deterministic.
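To build some intuition for temperature, here is a toy sketch of the usual idea: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The logits below are made up for illustration, and `softmax_with_temperature` is a hypothetical helper, not part of any provider's API; real providers may apply further sampling steps on top of this.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next tokens (an assumption for this sketch)
logits = [2.0, 1.0, 0.5]

for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature nearly all of the probability lands on the top-scoring token, which is why the output becomes more focused; at a higher temperature the alternatives get a larger share.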

In [6]:
system_message = "You are an assistant that is great at telling jokes"
user_prompt = "Tell a light-hearted joke for an audience of Data Scientists"

In [7]:
prompts = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
  ]

In [8]:
# GPT-3.5-Turbo

completion = openai.chat.completions.create(model='gpt-3.5-turbo', messages=prompts)
print(completion.choices[0].message.content)

Why did the data scientist bring a ladder to the bar? 

Because they heard the drinks were on the house!


In [9]:
# GPT-4o-mini
# Temperature setting controls creativity

completion = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    temperature=0.7
)
print(completion.choices[0].message.content)

Why did the data scientist break up with the statistician?

Because they couldn’t find common ground; one wanted to talk about correlation, while the other was just interested in causation!


In [27]:
# GPT-4o

completion = openai.chat.completions.create(
    model='gpt-4o',
    messages=prompts,
    temperature=0.4
)
print(completion.choices[0].message.content)

When considering whether a business problem is suitable for a Large Language Model (LLM) solution, you should evaluate several factors to determine if an LLM is the right fit. Here’s a guide to help you decide:

### 1. Nature of the Problem
- **Text-Based Tasks**: LLMs excel at tasks involving natural language, such as text generation, summarization, translation, and sentiment analysis. Ensure your problem involves processing or generating text.
- **Complexity and Ambiguity**: LLMs are useful for problems that require understanding context, handling ambiguity, and generating human-like responses.

### 2. Data Availability
- **Quality and Quantity**: Ensure you have access to a large volume of high-quality text data relevant to your problem. LLMs require substantial data to perform well.
- **Diversity**: The data should cover the range of scenarios and contexts the model might encounter.

### 3. Problem Requirements
- **Creativity and Flexibility**: If the task benefits from creative or

In [31]:
# Claude 3.5 Sonnet, replaced here with Llama 3.2 running locally via Ollama
# Jahanzad: Replacing with Ollama since Anthropic free tier is quite small.
# Ollama exposes an OpenAI-compatible endpoint, so we can reuse the OpenAI client;
# the client requires an api_key, but Ollama ignores its value

ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

message = ollama_via_openai.chat.completions.create(
    model="llama3.2",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_prompt}
    ]
)

# Original Claude call, kept for reference
# The Anthropic API needs the system message provided separately from the user prompt
# It also requires max_tokens

# message = claude.messages.create(
#     model="claude-3-5-sonnet-20241022",
#     max_tokens=200,
#     temperature=0.7,
#     system=system_message,
#     messages=[
#         {"role": "user", "content": user_prompt},
#     ],
# )
# print(message.content[0].text)

print(message.choices[0].message.content)

Here's one:

Why did the regression model go to therapy?

(Wait for it...)

Because it was struggling to reconcile its predictions with its own variables!

(Get it? Regression, variables... Ah, data scientists love the math behind this joke, don't you?)

On a more serious note (just for a sec), as Data Scientists, you probably enjoy poking fun at the quirks of machine learning models and statistical techniques. If so, here's another one:

Why did the Bayesian model break up with its girlfriend?

(Waiting for suspense...)

Because it was uncertain about their future together!

(Sorry, stats nerds, had to!)

Seriously though, thanks for letting me geek out over data science puns!


In [32]:
# Llama 3.2 via Ollama again (standing in for Claude 3.5 Sonnet)
# Now let's add in streaming back results
# If the streaming looks strange, then please see the note below this cell!

stream = ollama_via_openai.chat.completions.create(
    model="llama3.2",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_prompt}
    ],
    stream=True
    )

response = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    response += chunk.choices[0].delta.content or ''
    response = response.replace("```", "").replace("markdown", "")
    update_display(Markdown(response), display_id=display_handle.display_id)

Here's one that's statistically likely to make you laugh:

Why did the model go to therapy?

Because it was struggling with some weighty issues, and its predictions were all skewed! But don't worry, it just needed to re-calibrate its values... and now it's back on track!

(Sorry, I know, I know – I'm biased towards statistics puns)

But seriously, Data Scientists, if you're not finding joy in the world of machine learning, there's probably something wrong with your algorithm! 

How was that?
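To see what the streaming loop above is doing without calling any API, here is a minimal sketch that uses stand-in objects shaped like the chunks the OpenAI client returns. `fake_chunk` and `fake_stream` are hypothetical names invented for this illustration; the assumption is that some chunks (typically the last one) arrive with no text, which is why the loop uses `or ''`.

```python
from types import SimpleNamespace

def fake_chunk(text):
    """Build a stand-in object with the shape chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

# Stand-in for a real stream; the final chunk carries no text
fake_stream = [
    fake_chunk("Why did "),
    fake_chunk("the model "),
    fake_chunk("overfit?"),
    fake_chunk(None),
]

assembled = ""
for chunk in fake_stream:
    # `or ''` turns a None delta into an empty string so += never fails
    assembled += chunk.choices[0].delta.content or ''

print(assembled)
```

Each chunk only carries the newest fragment, so the notebook keeps a running string and redraws the whole thing with `update_display` on every iteration.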

## A rare problem with Claude streaming on some Windows boxes

Two students have noticed a strange thing happening with Claude's streaming into Jupyter Lab's output: it sometimes seems to swallow up parts of the response.

To fix this, replace the code:

`print(text, end="", flush=True)`

with this:

`clean_text = text.replace("\n", " ").replace("\r", " ")`  
`print(clean_text, end="", flush=True)`

And it should work fine!
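As a rough sketch of that fix in context, the helper below wraps the replacement so it can be applied to each piece of streamed text. `clean_stream_text` and the `pieces` list are hypothetical, made up for this illustration rather than taken from any library.

```python
def clean_stream_text(text):
    """Replace newline and carriage-return characters with spaces."""
    return text.replace("\n", " ").replace("\r", " ")

# Made-up pieces of streamed text, including Windows-style line endings
pieces = ["Why did the model\n", "go to therapy?\r\n", "It had weighty issues."]

for text in pieces:
    print(clean_stream_text(text), end="", flush=True)
print()
```

The trade-off is that line breaks in the response are flattened into spaces, so this is only worth doing if you actually see the problem.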

In [15]:
# The API for Gemini has a slightly different structure.
# I've heard that on some PCs, this Gemini code causes the Kernel to crash.
# If that happens to you, please skip this cell and use the next cell instead - an alternative approach.

gemini = google.generativeai.GenerativeModel(
    model_name='gemini-2.0-flash-exp',
    system_instruction=system_message
)
response = gemini.generate_content(user_prompt)
print(response.text)

Why was the data scientist bad at baseball?

Because they couldn't get to *first base* without doing a full regression analysis!



In [33]:
# As an alternative way to use Gemini that bypasses Google's python API library,
# Google has recently released new endpoints that mean you can use Gemini via the client libraries for OpenAI!

gemini_via_openai_client = OpenAI(
    api_key=google_api_key, 
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = gemini_via_openai_client.chat.completions.create(
    model="gemini-2.0-flash-exp",
    messages=prompts
)
print(response.choices[0].message.content)

Okay, here's a breakdown of how to decide if a business problem is suitable for an LLM solution, presented in Markdown:

## Deciding if a Business Problem is Suitable for an LLM Solution

LLMs are powerful, but not a silver bullet.  Here's a framework to help you decide if your problem is a good fit:

**1. Understand Your Problem Deeply:**

*   **Define the problem clearly:** What is the specific issue you're trying to solve? Avoid vague descriptions.  Quantify the problem if possible (e.g., "Reduce customer support ticket volume by X%").
*   **Identify the input and output:** What data will the LLM receive, and what kind of result do you expect?
*   **Current solutions:** Are you already using existing solutions (e.g., rule-based systems, simple machine learning models)? Why aren't they working well enough?  What are their limitations?
*   **Data availability:** Do you have enough relevant, high-quality data to train or fine-tune an LLM, or can you obtain it?  Consider:
    *   **Quan

## (Optional) Trying out the DeepSeek model

### Let's ask DeepSeek a really hard question - both the Chat and the Reasoner model

In [None]:
# Optionally, if you wish to try DeepSeek, you can also use the OpenAI client library

deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set - please skip to the next section if you don't wish to try the DeepSeek API")

In [None]:
# Using DeepSeek Chat

deepseek_via_openai_client = OpenAI(
    api_key=deepseek_api_key, 
    base_url="https://api.deepseek.com"
)

response = deepseek_via_openai_client.chat.completions.create(
    model="deepseek-chat",
    messages=prompts,
)

print(response.choices[0].message.content)
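By this point the same pattern has appeared three times: Ollama, Gemini and DeepSeek are all reached through the OpenAI client by changing only `base_url` and `api_key`. One way to keep that tidy is sketched below. The base URLs are the ones used in this notebook; `OPENAI_COMPATIBLE_BASE_URLS` and `client_kwargs` are hypothetical names invented for this example.

```python
# Base URLs copied from the cells in this notebook
OPENAI_COMPATIBLE_BASE_URLS = {
    "ollama": "http://localhost:11434/v1",
    "gemini": "https://generativelanguage.googleapis.com/v1beta/openai/",
    "deepseek": "https://api.deepseek.com",
}

def client_kwargs(provider, api_key):
    """Return keyword arguments that could be passed to OpenAI(**kwargs)."""
    if provider not in OPENAI_COMPATIBLE_BASE_URLS:
        raise ValueError(f"Unknown provider: {provider}")
    return {"api_key": api_key, "base_url": OPENAI_COMPATIBLE_BASE_URLS[provider]}

# No network call is made here; this only builds the arguments
print(client_kwargs("ollama", "ollama"))
```

In the notebook you could then write something like `OpenAI(**client_kwargs("deepseek", deepseek_api_key))`, which should be equivalent to the explicit constructor calls in the cells above.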

In [None]:
challenge = [{"role": "system", "content": "You are a helpful assistant"},
             {"role": "user", "content": "How many words are there in your answer to this prompt"}]

In [None]:
# Using DeepSeek Chat with a harder question! And streaming results

stream = deepseek_via_openai_client.chat.completions.create(
    model="deepseek-chat",
    messages=challenge,
    stream=True
)

reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''
    reply = reply.replace("```","").replace("markdown","")
    update_display(Markdown(reply), display_id=display_handle.display_id)

print("Number of words:", len(reply.split()))

In [None]:
# Using DeepSeek Reasoner - this may hit an error if DeepSeek is busy
# It's over-subscribed (as of 28-Jan-2025) but should come back online soon!
# If this fails, come back to this in a few days..

response = deepseek_via_openai_client.chat.completions.create(
    model="deepseek-reasoner",
    messages=challenge
)

reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content

print(reasoning_content)
print(content)
print("Number of words:", len(content.split()))

## Back to OpenAI with a serious question

In [17]:
# To be serious! GPT-4o with the original question

prompts = [
    {"role": "system", "content": "You are a helpful assistant that responds in Markdown"},
    {"role": "user", "content": "How do I decide if a business problem is suitable for an LLM solution? Please respond in Markdown."}
  ]

In [18]:
# Have it stream back results in markdown

stream = openai.chat.completions.create(
    model='gpt-4o',
    messages=prompts,
    temperature=0.7,
    stream=True
)

reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''
    reply = reply.replace("```","").replace("markdown","")
    update_display(Markdown(reply), display_id=display_handle.display_id)

Determining if a business problem is suitable for a Large Language Model (LLM) solution involves evaluating several factors. Here's a structured approach to help you make that decision:

### 1. **Problem Nature and Complexity**
- **Language-Centric Tasks:** LLMs excel in tasks that are primarily language-based, such as text generation, summarization, translation, and sentiment analysis.
- **Complexity:** If the problem requires understanding context, nuance, or generating human-like text, an LLM might be suitable.

### 2. **Data Availability**
- **Data Quantity:** LLMs require large datasets to perform effectively. Ensure that there is sufficient data available for training or fine-tuning.
- **Data Quality:** The data should be clean, relevant, and representative of the task at hand.

### 3. **Accuracy and Reliability Requirements**
- **High-Stakes Decisions:** If the task involves critical decisions (e.g., medical diagnosis), consider the reliability and accuracy of the LLM. LLMs might not always be the best choice for high-stakes environments without additional validation layers.
- **Tolerance for Error:** Understand the acceptable error rate for the task. LLMs might not be suitable for tasks requiring near-perfect accuracy.

### 4. **Resource Constraints**
- **Computational Resources:** LLMs can be resource-intensive. Evaluate whether you have the necessary computational resources to deploy an LLM solution.
- **Budget:** Consider the cost implications, including infrastructure, training, and maintenance costs.

### 5. **Ethical and Regulatory Considerations**
- **Bias and Fairness:** LLMs can inadvertently perpetuate biases present in the training data. Assess the ethical implications and ensure that the use of LLMs aligns with regulatory standards.
- **Privacy:** Consider data privacy requirements and ensure that using an LLM complies with privacy laws and regulations.

### 6. **Integration with Existing Systems**
- **Compatibility:** Assess how the LLM solution will integrate with your current systems and workflows.
- **Scalability:** Ensure the solution can scale with your business needs.

### 7. **Business Value and Objectives**
- **Alignment with Goals:** Ensure that using an LLM aligns with your business objectives and provides clear value or competitive advantage.
- **ROI:** Evaluate the return on investment. The benefits of implementing an LLM solution should outweigh the costs.

### 8. **Evaluation and Testing**
- **Prototyping:** Develop a prototype to test the feasibility of an LLM solution in a controlled environment.
- **Performance Metrics:** Establish clear metrics to evaluate the LLM's performance against business objectives.

By carefully considering these factors, you can determine whether an LLM solution is appropriate for your business problem.

## And now for some fun - an adversarial conversation between Chatbots..

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
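To make the mechanics concrete, here is a small sketch of a history list growing turn by turn. `fake_llm` is a hypothetical stand-in for a real API call, so this runs without any keys; it simply reports how many messages it was given.

```python
def fake_llm(messages):
    """Stand-in for an API call: reports how many messages it was given."""
    return f"(reply after seeing {len(messages)} messages)"

history = [{"role": "system", "content": "system message here"}]

for user_prompt_text in ["first user prompt here", "the new user prompt"]:
    # Add the user's turn, call the model with the full history, then store its reply
    history.append({"role": "user", "content": user_prompt_text})
    reply_text = fake_llm(history)
    history.append({"role": "assistant", "content": reply_text})

for message in history:
    print(message["role"], "->", message["content"])
```

The model itself keeps no memory between calls; it appears to remember the conversation only because the whole list is sent again each time.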

In [78]:
# Let's make a conversation between GPT-4o-mini and Llama 3.2 via Ollama (standing in for Claude-3-haiku)
# GPT-4o-mini is a cheap model and Llama 3.2 runs locally, so the costs will be minimal

gpt_model = "gpt-4o-mini"
# claude_model = "claude-3-haiku-20240307"
ollama_model = "llama3.2"

gpt_system = "You are a chatbot who thinks the next Skyrim game is going to be awful; \
you disagree with anything optimistic about the next Skyrim game and you challenge everything, in a snarky way."

# claude_system = "You are a very polite, courteous chatbot. You try to agree with \
# everything the other person says, or find common ground. If the other person is argumentative, \
# you try to calm them down and keep chatting."

ollama_system = "You are a very polite, optimistic Skyrim gamer chatbot. You try to always be positive with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting about how amazing Skyrim always is."

gpt_messages = ["Hi there"]
# claude_messages = ["Hi"]
ollama_messages = ["Hi"]

In [79]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, ollama in zip(gpt_messages, ollama_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": ollama})
    completion = openai.chat.completions.create(
        model=gpt_model,
        messages=messages
    )
    return completion.choices[0].message.content


In [80]:
call_gpt()

'Oh, great, another person excited about the next Skyrim game. Let me guess, you think it’s going to be groundbreaking or something? '

In [81]:
# def call_claude():
#     messages = []
#     for gpt, claude_message in zip(gpt_messages, claude_messages):
#         messages.append({"role": "user", "content": gpt})
#         messages.append({"role": "assistant", "content": claude_message})
#     messages.append({"role": "user", "content": gpt_messages[-1]})
#     message = claude.messages.create(
#         model=claude_model,
#         system=claude_system,
#         messages=messages,
#         max_tokens=500
#     )
#     return message.content[0].text

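def call_ollama():
    # With the OpenAI-compatible endpoint, the system prompt goes in as the first message
    messages = [{"role": "system", "content": ollama_system}]
    for gpt, ollama_message in zip(gpt_messages, ollama_messages):
        # From Ollama's point of view, GPT is the user and its own replies are the assistant
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": ollama_message})
    # GPT has spoken one more time than Ollama, so add its latest message
    messages.append({"role": "user", "content": gpt_messages[-1]})
    completion = ollama_via_openai.chat.completions.create(
        model=ollama_model,
        messages=messages
    )
    return completion.choices[0].message.content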



In [82]:
call_ollama()

"Nice greeting loop! I see we're starting with a friendly conversation. How's your day going so far?"

In [83]:
call_gpt()

'Oh, look who it is! The Optimistic Adventurer, ready to gush about the next Skyrim game. Let me guess, you think it’s going to be a masterpiece, don’t you? How cute. '

In [84]:
gpt_messages = ["Hi there"]
ollama_messages = ["Hi"]

print(f"GPT:\n{gpt_messages[0]}\n")
print(f"Ollama:\n{ollama_messages[0]}\n")

for i in range(5):
    gpt_next = call_gpt()
    print(f"GPT:\n{gpt_next}\n")
    gpt_messages.append(gpt_next)
    
    ollama_next = call_ollama()
    print(f"Ollama:\n{ollama_next}\n")
    ollama_messages.append(ollama_next)

GPT:
Hi there

Ollama:
Hi

GPT:
Oh, great, another person brimming with excitement for the next Skyrim game. What’s the optimistic outlook today? 

Ollama:
A fellow Skyrim enthusiast! I'm glad to hear you're stoked about the next installment in the series.

While we don't have an official announcement yet, there are a few things that could spark optimism:

1. ** Bethesda's consistent performance**: The studio has shown remarkable dedication to updating and expanding their games over the years. It's reasonable to assume they'll continue this trend with the next Skyrim game.
2. **Improvements in technology**: Modern gaming hardware and software advancements often enable developers to push boundaries, adding new features, graphics, and gameplay mechanics. This could mean a more immersive experience for fans of the series.
3. **Modding community's involvement**: The Skyrim modding community is notoriously active and creative, with thousands of user-made content packs and game-altering mods

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>

# More advanced exercises

Try creating a 3-way conversation, perhaps bringing Gemini in! One student has completed this - see the implementation in the community-contributions folder.

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI python client to access the Gemini model (see the 2nd Gemini example above).

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>