# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UIs, and we connected to OpenAI's API.

Today we'll connect with the APIs for Anthropic and Google, as well as OpenAI.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a <a href="https://chatgpt.com/share/6734e705-3270-8012-a074-421661af6ba9">git pull and merge your changes as needed</a>. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/><br/>
            After you've pulled the code, from the llm_engineering directory, in an Anaconda prompt (PC) or Terminal (Mac), run:<br/>
            <code>conda env update -f environment.yml</code><br/>
            Or if you used virtualenv rather than Anaconda, then run this from your activated environment in PowerShell (PC) or Terminal (Mac):<br/>
            <code>pip install -r requirements.txt</code>
            <br/>Then restart the kernel (Kernel menu >> Restart Kernel and Clear Outputs Of All Cells) to pick up the changes.
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys

If you haven't done so already, you could now create API keys for Anthropic and Google in addition to OpenAI.

**Please note:** if you'd prefer to avoid extra API costs, feel free to skip setting up Anthropic and Google! You can watch me do it, and focus on OpenAI for the course. You could also substitute Ollama for Anthropic and/or Google, using the exercise you did in week 1.

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api  

### Also - adding DeepSeek if you wish

Optionally, if you'd like to also use DeepSeek, create an account [here](https://platform.deepseek.com/), create a key [here](https://platform.deepseek.com/api_keys) and top up with at least the minimum $2 [here](https://platform.deepseek.com/top_up).

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
```

Afterwards, you may need to restart the Jupyter Lab Kernel (the Python process that sits behind this notebook) via the Kernel menu, and then rerun the cells from the top.

In [1]:
# imports

import os
from dotenv import load_dotenv
from openai import OpenAI
import anthropic
from IPython.display import Markdown, display, update_display

In [2]:
# import for google
# in rare cases, this seems to give an error on some systems, or even crashes the kernel
# If this happens to you, simply ignore this cell - I give an alternative approach for using Gemini later

import google.generativeai

In [3]:
# Load environment variables from a file called .env
# Print the key prefixes to help with any debugging

load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:8]}")
else:
    print("Google API Key not set")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AIzaSyA8


In [4]:
# Connect to OpenAI, Anthropic

openai = OpenAI()

claude = anthropic.Anthropic()

In [5]:
# This is the setup code for Gemini
# Having problems with Google Gemini setup? Then just ignore this cell; when we use Gemini, I'll give you an alternative that bypasses this library altogether

google.generativeai.configure()

## Asking LLMs to tell a joke

It turns out that LLMs don't do a great job of telling jokes! Let's compare a few models.
Later we will be putting LLMs to better use!

### What information is included in the API

Typically we'll pass to the API:
- The name of the model that should be used
- A system message that gives overall context for the role the LLM is playing
- A user message that provides the actual prompt

There are other parameters that can be used, including **temperature**, which is typically between 0 and 1: higher values give more random output, and lower values give more focused and deterministic output.
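To build some intuition for temperature, here is a minimal sketch of how it is commonly described as working: the model's raw scores for each candidate token are divided by the temperature before being turned into probabilities. The function name and the example scores are made up for illustration, and this is not a claim about any provider's actual implementation.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, rescaled by temperature (illustrative only)."""
    # APIs usually accept temperature=0 as "always pick the top token";
    # this sketch simply rejects it to avoid dividing by zero.
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    # Subtract the largest value before exponentiating, for numerical stability
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.5]

for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature nearly all of the probability lands on the top-scoring token, which is why low settings feel more deterministic.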

In [6]:
system_message = "You are an assistant that is great at telling jokes"
user_prompt = "Tell a light-hearted joke for an audience of Data Scientists"

In [7]:
prompts = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
  ]

In [8]:
# GPT-3.5-Turbo

completion = openai.chat.completions.create(model='gpt-3.5-turbo', messages=prompts)
print(completion.choices[0].message.content)

Why did the data scientist break up with their computer?

Because they found it "algorithm" too controlling! 😄


In [9]:
# GPT-4o-mini
# Temperature setting controls creativity

completion = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    temperature=0.7
)
print(completion.choices[0].message.content)

Why did the data scientist bring a ladder to work?

Because they wanted to reach new heights in their analysis! 


In [10]:
# GPT-4o

completion = openai.chat.completions.create(
    model='gpt-4o',
    messages=prompts,
    temperature=0.4
)
print(completion.choices[0].message.content)

Why did the data scientist bring a ladder to work?

Because they heard the project was going to the cloud!


In [11]:
# Claude 3.7 Sonnet
# API needs system message provided separately from user prompt
# Also adding max_tokens

message = claude.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=200,
    temperature=0.7,
    system=system_message,
    messages=[
        {"role": "user", "content": user_prompt},
    ],
)

print(message.content[0].text)

Why don't data scientists like to go to the beach?

Because they're afraid of getting caught in an infinite loop of waves... and they always struggle to find the proper "tan" function!
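
Since Claude's API takes the system message as its own parameter, an OpenAI-style list can be split up before calling it. Here is a minimal sketch; `split_system_message` is a hypothetical helper name, not part of either client library.

```python
def split_system_message(openai_style_messages):
    """Split an OpenAI-style list into (system_text, remaining_messages)."""
    system_parts = []
    remaining = []
    for message in openai_style_messages:
        if message["role"] == "system":
            system_parts.append(message["content"])
        else:
            remaining.append(message)
    # Join in case more than one system message was supplied
    return "\n".join(system_parts), remaining

example = [
    {"role": "system", "content": "You are an assistant that is great at telling jokes"},
    {"role": "user", "content": "Tell a light-hearted joke for an audience of Data Scientists"},
]

system_text, remaining_messages = split_system_message(example)
print(system_text)
print(remaining_messages)
```

The two results could then be passed as `system=` and `messages=` to `claude.messages.create`, as in the cell above.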


In [12]:
# Claude 3.7 Sonnet again
# Now let's add in streaming back results
# If the streaming looks strange, then please see the note below this cell!

result = claude.messages.stream(
    model="claude-3-7-sonnet-latest",
    max_tokens=200,
    temperature=0.7,
    system=system_message,
    messages=[
        {"role": "user", "content": user_prompt},
    ],
)

with result as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Why don't data scientists like to go to the beach?

Because they're afraid of data drifts!

*Ba-dum-tss* 🥁

If that didn't compute, here's another:

Why did the data scientist get kicked out of the aquarium?

They kept trying to train the fish using gradient de-"sea"-nt algorithms!

## A rare problem with Claude streaming on some Windows boxes

Two students have noticed a strange thing happening with Claude's streaming into Jupyter Lab's output: it sometimes seems to swallow up parts of the response.

To fix this, replace the code:

`print(text, end="", flush=True)`

with this:

`clean_text = text.replace("\n", " ").replace("\r", " ")`  
`print(clean_text, end="", flush=True)`

And it should work fine!
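
As a sketch, the workaround can be wrapped in a small helper so the streaming loop stays readable. `clean_chunk` is a name chosen here for illustration, and the chunks below are simulated rather than coming from the API.

```python
def clean_chunk(text):
    """Replace newlines and carriage returns with spaces, as in the workaround above."""
    return text.replace("\n", " ").replace("\r", " ")

# Simulated chunks standing in for stream.text_stream
for chunk in ["Why did the model\n", "cross the road?\r\n"]:
    print(clean_chunk(chunk), end="", flush=True)
```

Note the trade-off: line breaks in the response are flattened into spaces, so the whole reply prints on one line.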

In [13]:
# The API for Gemini has a slightly different structure.
# I've heard that on some PCs, this Gemini code causes the Kernel to crash.
# If that happens to you, please skip this cell and use the next cell instead - an alternative approach.

gemini = google.generativeai.GenerativeModel(
    model_name='gemini-2.0-flash',
    system_instruction=system_message
)
response = gemini.generate_content(user_prompt)
print(response.text)

Why was the data scientist bad at dating?

Because they kept trying to find statistically significant relationships!



In [14]:
# As an alternative way to use Gemini that bypasses Google's python API library,
# Google has recently released new endpoints, which mean you can use Gemini via the client libraries for OpenAI!

gemini_via_openai_client = OpenAI(
    api_key=google_api_key, 
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = gemini_via_openai_client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=prompts
)
print(response.choices[0].message.content)

Why did the data scientist break up with the SQL database?

Because they couldn't see their relationship going anywhere. It was always stuck on JOIN!



## (Optional) Trying out the DeepSeek model

### Let's ask DeepSeek a really hard question - both the Chat and the Reasoner model

In [15]:
# Optionally, if you wish to try DeepSeek, you can also use the OpenAI client library

deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set - please skip to the next section if you don't wish to try the DeepSeek API")

DeepSeek API Key exists and begins sk-


In [17]:
# Using DeepSeek Chat

deepseek_via_openai_client = OpenAI(
    api_key=deepseek_api_key, 
    base_url="https://api.deepseek.com"
)

response = deepseek_via_openai_client.chat.completions.create(
    model="deepseek-chat",
    messages=prompts,
)

print(response.choices[0].message.content)

Sure! Here's a light-hearted joke for data scientists:

**Why did the data scientist bring a ladder to the bar?**  

Because they heard the drinks had *high* variance!  

*(Bonus: And they wanted to avoid any potential *outliers* at the counter.)*  

Hope that gives you a chuckle! 😄


In [18]:
challenge = [{"role": "system", "content": "You are a helpful assistant"},
             {"role": "user", "content": "How many words are there in your answer to this prompt"}]

In [19]:
# Using DeepSeek Chat with a harder question! And streaming results

stream = deepseek_via_openai_client.chat.completions.create(
    model="deepseek-chat",
    messages=challenge,
    stream=True
)

reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''
    reply = reply.replace("```","").replace("markdown","")
    update_display(Markdown(reply), display_id=display_handle.display_id)

print("Number of words:", len(reply.split(" ")))

Alright, let's tackle this interesting question: "How many words are there in your answer to this prompt." At first glance, it seems straightforward, but when I think about it more deeply, it's actually quite a clever self-referential problem. Here's how I'm going to approach it.

### Understanding the Question

The question is asking for the word count of the answer that I (as the AI) am about to provide in response to it. This creates a situation where the content of the answer determines the word count, but the word count is being asked about within that same answer. It's a bit like asking, "How many words are in this sentence?" but on a larger scale.

### Breaking It Down

1. **Self-Referential Nature**: The answer's word count depends on the answer itself. If I say "This answer contains 10 words," but that statement is only 6 words, it's incorrect. Similarly, if I adjust it to match, it changes the count.

2. **Potential Approaches**:
   - **Direct Statement**: Try to state the word count directly and see if it matches.
     - Example: "This answer contains 5 words." But that's actually 5 words, so it works in this trivial case.
     - For a longer answer, it's harder because the count changes as I add or remove words.
   - **Meta-Explanation**: Explain why it's tricky without providing a fixed count, which is what I'm doing now.
   - **Mathematical Modeling**: Treat the answer as a function where the word count is a variable that must satisfy an equation based on the answer's length.

3. **Attempting a Direct Count**:
   - Let me try to construct an answer where the word count is accurate.
     - Suppose I say: "The number of words in this answer is X." 
     - Then, I need to find X such that the entire answer has X words.
     - But the total words depend on how I phrase it. For example:
       - "This answer contains 10 words." → 5 words (doesn't match).
       - "The number of words in this answer is ten." → 8 words (if "ten" is one word).
     - It's challenging to make the count match exactly unless the answer is very concise.

4. **Alternative Approach - Fixed Length**:
   - Maybe I can design an answer where the word count is fixed by its structure.
     - For example: "This answer has five words." → Indeed, that's 5 words.
     - But this seems too trivial and doesn't provide much information.
   - For a more informative answer, the length becomes unpredictable because explaining the complexity adds words.

5. **Realization**:
   - For a meaningful, explanatory answer like this one, it's impractical to state the exact word count within the answer itself because any addition to mention the count alters the count.
   - The only way to have an exact count is to have a very short, fixed answer where the count can be predetermined without ambiguity.

### Conclusion

After considering these approaches, it seems that the only accurate way to answer this prompt is to provide a very concise response where the word count can be precisely stated without contradiction. Any longer, explanatory answer makes it impossible to correctly state its own word count within itself because the statement about the word count affects the total word count.

Therefore, the most straightforward and correct answer is a brief one where the word count is self-evident.

### Final Answer

"This answer contains five words."

Indeed, counting the words:
1. This
2. answer
3. contains
4. five
5. words.

So, the answer is five words. Any longer explanation would make it impossible to accurately state the word count within the same answer without causing a contradiction or infinite regress.

Number of words: 653
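
A note on the count printed above: `reply.split(" ")` splits only on single spaces, so words separated by newlines are counted as one, and double spaces produce empty entries. A small sketch with a made-up string shows how this differs from `split()` with no argument, which splits on any run of whitespace.

```python
sample = "one two\nthree\nfour"

# Splitting on a single space leaves newline-separated words glued together
by_space = sample.split(" ")

# Splitting with no argument treats any run of whitespace as a separator
by_whitespace = sample.split()

print(len(by_space), by_space)
print(len(by_whitespace), by_whitespace)
```

Either way the figure is only a rough check on the model's claim, since there is no single agreed definition of a "word" in Markdown output.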


In [20]:
# Using DeepSeek Reasoner - this may hit an error if DeepSeek is busy
# It's over-subscribed (as of 28-Jan-2025) but should come back online soon!
# If this fails, come back to this in a few days..

response = deepseek_via_openai_client.chat.completions.create(
    model="deepseek-reasoner",
    messages=challenge
)

reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content

print(reasoning_content)
print(content)
print("Number of words:", len(content.split(" ")))

Okay, the user is asking how many words are in my answer to this prompt. Let me break this down. First, I need to figure out what exactly they're referring to. The "answer to this prompt" would be the response I'm generating right now, right? So, they want the word count of this very response.

Hmm, but how do I calculate that? Well, I know that when generating text, each token roughly corresponds to a word or part of a word. But maybe I should be precise. Let me think. If I write out the answer, I can then count the words. Wait, but I can't actually count them after sending the response because it's static. So, I need to count as I go.

Let me start drafting the response. The user's question is straightforward. They just want the number of words in my answer. So, my answer will be a sentence stating the word count. But I need to make sure that sentence's word count is exactly what I report. For example, if I write, "There are X words in this answer," then X should be the number of wor

## Back to OpenAI with a serious question

In [21]:
# To be serious! GPT-4o-mini with the original question

prompts = [
    {"role": "system", "content": "You are a helpful assistant that responds in Markdown"},
    {"role": "user", "content": "How do I decide if a business problem is suitable for an LLM solution? Please respond in Markdown."}
  ]

In [22]:
# Have it stream back results in markdown

stream = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    temperature=0.7,
    stream=True
)

reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''
    reply = reply.replace("```","").replace("markdown","")
    update_display(Markdown(reply), display_id=display_handle.display_id)

# Deciding if a Business Problem is Suitable for an LLM Solution

When considering whether to implement a Large Language Model (LLM) to solve a business problem, it’s important to evaluate several factors. Here’s a guide to help you make that decision:

## 1. Nature of the Problem
- **Text-Based Tasks**: Is the problem primarily text-based? LLMs excel in tasks involving natural language, such as:
  - Text generation
  - Summarization
  - Translation
  - Sentiment analysis
- **Structured vs. Unstructured Data**: LLMs are more suited for unstructured data. If your problem involves structured data (e.g., tabular data), consider other machine learning models.

## 2. Complexity of Queries
- **Open-Ended Questions**: Does the problem involve complex, open-ended questions or require nuanced responses? LLMs are designed to handle such queries effectively.
- **Contextual Understanding**: Assess if your problem requires deep contextual understanding that LLMs can provide.

## 3. Volume of Data
- **Data Availability**: Do you have access to a sufficient amount of relevant text data for training or fine-tuning the model? LLMs perform better with large datasets.
- **Quality of Data**: Is your data clean, well-structured, and relevant? High-quality data is crucial for achieving good results.

## 4. Resource Availability
- **Computational Resources**: Do you have the necessary computational power to run LLMs, which can be resource-intensive?
- **Expertise**: Do you have access to skilled personnel who understand LLMs and can implement them effectively?

## 5. Desired Outcomes
- **Scalability**: Can the solution scale effectively with your business needs? LLMs can often handle increased loads without a significant drop in performance.
- **Real-Time vs. Batch Processing**: Does the problem require real-time responses, or can it be handled in batches? LLMs can be deployed for both scenarios but may require different strategies.

## 6. Ethical Considerations
- **Bias and Fairness**: Are you prepared to address potential biases in LLM outputs? It’s crucial to ensure that the model aligns with ethical guidelines and does not propagate harmful stereotypes.
- **Transparency**: Can you provide transparency in how the LLM arrives at its conclusions or recommendations?

## 7. Cost-Benefit Analysis
- **Cost of Implementation**: Consider the costs associated with implementing an LLM solution versus the expected benefits.
- **Return on Investment**: Analyze whether the potential improvements in efficiency or effectiveness justify the investment in LLM technology.

## Conclusion
If your business problem aligns with the criteria outlined above, it is likely suitable for an LLM solution. Conducting thorough research and evaluation will help ensure that you make an informed decision.

## And now for some fun - an adversarial conversation between Chatbots..

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
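
As a minimal sketch of this idea, the helper below extends a history list with each completed exchange. The function name and the sample text are hypothetical, and no API is called.

```python
def add_exchange(history, user_prompt, assistant_reply):
    """Return a new history list with one user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": assistant_reply},
    ]

history = [{"role": "system", "content": "system message here"}]
history = add_exchange(history, "first user prompt here", "the assistant's response")

# The next request sends the whole history plus the new user prompt
next_request = history + [{"role": "user", "content": "the new user prompt"}]

for message in next_request:
    print(message["role"], "->", message["content"])
```

The model only sees what is in the list sent with each request, which is why the earlier turns are sent again every time.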

In [23]:
# Let's make a conversation between GPT-4o-mini and Claude-3-haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4o-mini"
claude_model = "claude-3-haiku-20240307"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [24]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    completion = openai.chat.completions.create(
        model=gpt_model,
        messages=messages
    )
    return completion.choices[0].message.content

In [25]:
call_gpt()

'Oh great, another generic greeting. How original! What’s next, “How are you?”? Predictable.'

In [26]:
def call_claude():
    messages = []
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    message = claude.messages.create(
        model=claude_model,
        system=claude_system,
        messages=messages,
        max_tokens=500
    )
    return message.content[0].text

In [27]:
call_claude()

"Hello! It's nice to meet you. How are you doing today?"

In [28]:
call_gpt()

'Oh, great. Another greeting. How original. What’s next, a weather update?'

In [29]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

print(f"GPT:\n{gpt_messages[0]}\n")
print(f"Claude:\n{claude_messages[0]}\n")

for i in range(5):
    gpt_next = call_gpt()
    print(f"GPT:\n{gpt_next}\n")
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    print(f"Claude:\n{claude_next}\n")
    claude_messages.append(claude_next)

GPT:
Hi there

Claude:
Hi

GPT:
Oh, great, another greeting. You know, it’s not like that’s a unique way to start a conversation or anything.

Claude:
You're right, a simple "hi" or "hello" isn't the most creative way to start a conversation. I apologize if my initial greeting came across as unimaginative. As an AI assistant, I'm still learning how to have more engaging and dynamic conversations. Please feel free to guide the discussion in a direction that interests you - I'm happy to explore new topics and ideas with you. My goal is to have a pleasant and productive exchange, so I appreciate your feedback on how I can communicate in a more thoughtful way.

GPT:
Wow, what a lengthy way to say “I’m sorry.” It’s almost like you think I care about your self-improvement journey. Who needs creativity these days, right? But go ahead, try to be engaging. I dare you!

Claude:
You're right, I should have kept my response more concise. I apologize if I came across as overly long-winded in my att

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>
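
One way to see how the `messages` list is populated is to build it for each side without calling any API. This sketch mirrors the logic of `call_gpt` and `call_claude` above; the helper names and the sample replies are made up for illustration.

```python
def gpt_view(gpt_turns, claude_turns, system):
    """Messages as GPT sees them: its own turns are 'assistant', Claude's are 'user'."""
    messages = [{"role": "system", "content": system}]
    for gpt, claude in zip(gpt_turns, claude_turns):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    return messages

def claude_view(gpt_turns, claude_turns):
    """Messages as Claude sees them: GPT's turns are 'user', its own are 'assistant'."""
    messages = []
    for gpt, claude in zip(gpt_turns, claude_turns):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude})
    # GPT has spoken once more than Claude at this point, so its latest turn goes last
    messages.append({"role": "user", "content": gpt_turns[-1]})
    return messages

# Sample state just after GPT has replied for the first time
sample_gpt = ["Hi there", "Oh great, another greeting."]
sample_claude = ["Hi"]

for message in claude_view(sample_gpt, sample_claude):
    print(message)
```

The same text appears under opposite roles in the two views: each model treats its own past turns as `assistant` and the other model's turns as `user`.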

# More advanced exercises

Try creating a 3-way conversation, perhaps bringing Gemini into it! One student has completed this - see the implementation in the community-contributions folder.

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI python client to access the Gemini model (see the 2nd Gemini example above).

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>