# Welcome to the Day 2 Lab!


<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Just before we get started --</h2>
            <span style="color:#f71;">I thought I'd take a second to point you at this page of useful resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## First - let's talk about the Chat Completions API

1. The simplest way to call an LLM
2. It's called Chat Completions because it's saying: "here is a conversation, please predict what should come next". The Chat Completions API is designed to handle multi-turn conversations.  
   Breaking down `openai.chat.completions.create()`:
   - `openai`: the OpenAI client object
   - `chat`: indicates we are using the chat functionality
   - `completions`: indicates we want to generate completions (responses)
   - `create()`: the method to create a new chat completion request
3. The Chat Completions API was invented by OpenAI, but it's so popular that everybody uses it! 
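
To make "here is a conversation" concrete, here is a minimal sketch of how a multi-turn conversation can be represented as a list of messages. The `add_turn` helper and the example replies are made up for illustration; only the `role` / `content` structure comes from the API.

```python
def add_turn(messages, role, content):
    """Return a new conversation with one more message appended."""
    return messages + [{"role": role, "content": content}]

# A conversation is just an ordered list of role/content dictionaries
conversation = [{"role": "system", "content": "You are a helpful assistant"}]
conversation = add_turn(conversation, "user", "Tell me a fun fact")
conversation = add_turn(conversation, "assistant", "Honey never spoils.")
conversation = add_turn(conversation, "user", "Why is that?")

for message in conversation:
    print(f"{message['role']}: {message['content']}")
```

Each call sends the whole list, so the model sees every earlier turn when predicting what comes next.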

### We will start by calling OpenAI again - but don't worry non-OpenAI people, your time is coming!


In [1]:
import os
from dotenv import load_dotenv

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!


## Do you know what an Endpoint is?
An endpoint is a specific URL where an API can be accessed. It is the point of interaction between the client and the server. HTTP methods like GET, POST, PUT and DELETE are used to perform operations at these endpoints.

If this is new to you, please review the Technical Foundations guide in the guides folder.
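
As a quick illustration, an endpoint URL can be pulled apart into its pieces with Python's standard library. This is only a sketch, and the URL below is a hypothetical placeholder rather than a real API.

```python
from urllib.parse import urlsplit

# Hypothetical endpoint, used purely for illustration
endpoint = "https://api.example.com/v1/items"
parts = urlsplit(endpoint)

print(parts.scheme)  # the protocol, here https
print(parts.netloc)  # the server that hosts the API
print(parts.path)    # the specific resource on that server
```

The client picks an HTTP method (such as POST) and sends its request to that server and path.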

And, here is an endpoint that might interest you...

In [2]:
import requests

# stuffing the API key into headers for authentication
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

# This payload mirrors a call to openai.chat.completions.create():
# we pass the model and the messages
payload = {
    "model": "gpt-5-nano",
    "messages": [
        {"role": "system", "content": "You are a very humorous assistant"},
        {"role": "user", "content": "Tell me a fun fact"}]
}

payload

{'model': 'gpt-5-nano',
 'messages': [{'role': 'system',
   'content': 'You are a very humorous assistant'},
  {'role': 'user', 'content': 'Tell me a fun fact'}]}

In [3]:
# Make the POST request to the Chat Completions endpoint; there is no .create() here because we are calling POST directly
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json=payload
)

response.json()

{'id': 'chatcmpl-CUIWO2Vh4gQL4DpdQ54ksSxAzujpQ',
 'object': 'chat.completion',
 'created': 1761337072,
 'model': 'gpt-5-nano-2025-08-07',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': 'Fun fact: Wombats poop cube-shaped feces. The cube shape helps prevent rolling and gives wombats a handy way to mark their territory—talk about nature’s own building blocks!',
    'refusal': None,
    'annotations': []},
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 21,
  'completion_tokens': 943,
  'total_tokens': 964,
  'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0},
  'completion_tokens_details': {'reasoning_tokens': 896,
   'audio_tokens': 0,
   'accepted_prediction_tokens': 0,
   'rejected_prediction_tokens': 0}},
 'service_tier': 'default',
 'system_fingerprint': None}

In [4]:
# choices is a list (there can be several if the n parameter is set); take the first element, then access message and content
response.json()["choices"][0]["message"]["content"]


'Fun fact: Wombats poop cube-shaped feces. The cube shape helps prevent rolling and gives wombats a handy way to mark their territory—talk about nature’s own building blocks!'

Parsing the raw JSON like this is not the preferred approach in code, because it takes too much JSON handling. A better way is to use the official SDK, which converts the response directly into Python objects and makes for more elegant, easier-to-use code. The OpenAI SDK is open source and available on GitHub. This is the vanilla SDK.

# What is the openai package?

It's known as a Python Client Library.

It's nothing more than a wrapper around making this exact call to the HTTP endpoint.

It just allows you to work with nice Python code instead of messing around with janky JSON objects.

But that's it. It's open-source and lightweight. Some people think it contains OpenAI model code - it doesn't!
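
To get a feel for what "nice Python code instead of JSON" means, here is a rough sketch that turns a parsed JSON response into objects with attribute access. It uses `SimpleNamespace` as a stand-in and is not how the openai package is actually implemented. The `to_objects` helper and the sample JSON are made up for illustration.

```python
import json
from types import SimpleNamespace

def to_objects(data):
    """Recursively convert parsed JSON (dicts and lists) into attribute-style objects."""
    if isinstance(data, dict):
        return SimpleNamespace(**{key: to_objects(value) for key, value in data.items()})
    if isinstance(data, list):
        return [to_objects(item) for item in data]
    return data

# A trimmed-down, made-up response in the same shape as a chat completion
raw = '{"choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello!"}}]}'
response = to_objects(json.loads(raw))

print(response.choices[0].message.content)  # Hello!
```

Compare `response.choices[0].message.content` with the dictionary lookups needed on the raw JSON: same data, friendlier access.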


In [None]:
# Create OpenAI client

from openai import OpenAI
openai = OpenAI()

response = openai.chat.completions.create(model="gpt-5-nano", messages=[{"role": "user", "content": "Tell me a fun fact"}])

response.choices[0].message.content



## And then this great thing happened:

OpenAI's Chat Completions API was so popular that the other model providers created endpoints that are identical.

They are known as the "OpenAI Compatible Endpoints".

For example, Google made one here: https://generativelanguage.googleapis.com/v1beta/openai/

And OpenAI decided to be kind: they said, hey, you can just use the same client library that we made for GPT. We'll allow you to specify a different endpoint URL and a different key, to use another provider.

So you can use:

```python
gemini = OpenAI(base_url="https://generativelanguage.googleapis.com/v1beta/openai/", api_key="AIz....")
gemini.chat.completions.create(...)
```

And to be clear - even though OpenAI is in the code, we're only using this lightweight Python client library to call the endpoint - there's no OpenAI model involved here.

If you're confused, please review Guide 9 in the Guides folder!
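
Here is a minimal sketch of what changing `base_url` amounts to. It assumes the client builds the request URL by appending `chat/completions` to the base URL, which matches the OpenAI endpoint we called directly earlier in this lab. The helper name `chat_completions_url` is hypothetical and not part of the openai package.

```python
def chat_completions_url(base_url):
    """Join a provider's base URL with the Chat Completions path."""
    return base_url.rstrip("/") + "/chat/completions"

# OpenAI's own base URL, then Google's OpenAI compatible endpoint
print(chat_completions_url("https://api.openai.com/v1"))
print(chat_completions_url("https://generativelanguage.googleapis.com/v1beta/openai/"))
```

Same path, same request shape; only the server and the key change.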

And now let's try it!

In [None]:
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"

google_api_key = os.getenv("GOOGLE_API_KEY")

if not google_api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not google_api_key.startswith("AIz"):
    print("An API key was found, but it doesn't start AIz")
else:
    print("API key found and looks good so far!")



In [None]:
# OpenAI is just the client that can connect to any OpenAI compatible endpoint
gemini = OpenAI(base_url=GEMINI_BASE_URL, api_key=google_api_key)

response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=[{"role": "user", "content": "Tell me a fun fact"}])

response.choices[0].message.content

## And Ollama also gives an OpenAI compatible endpoint

...and it's on your local machine!

If the next cell doesn't print "Ollama is running" then please open a terminal and run `ollama serve`

Because Ollama also gives an OpenAI compatible endpoint, we can use the same OpenAI client library to call it!

In [None]:
# ollama endpoint
requests.get("http://localhost:11434").content

b'Ollama is running'

### Download llama3.2 from Meta

Change this to llama3.2:1b if your computer is smaller.

Don't use llama3.3 or llama4! They are too big for your computer.

In [6]:
!ollama pull llama3.2


In [9]:
OLLAMA_BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI compatible endpoint

# Create the client; Ollama doesn't check the API key, so any placeholder string works
from openai import OpenAI
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')
response = ollama.chat.completions.create(model="llama3.2", messages=[{"role": "user", "content": "Tell me a fun fact"}])
response.choices[0].message.content


"Here's one:\n\nDid you know that honey never spoils? Archaeologists have found pots of honey in ancient Egyptian tombs that are over 3,000 years old and still perfectly edible! Honey's unique combination of acidity and water content creates an environment that is inhospitable to bacteria and other microorganisms, making it virtually immortal. Isn't that sweet?"

In [None]:
# Get a fun fact

response = ollama.chat.completions.create(model="llama3.2", messages=[{"role": "user", "content": "Tell me a fun fact"}])

response.choices[0].message.content

Next up is a DeepSeek model. It isn't the original DeepSeek model: it's a Qwen model trained by the large DeepSeek model using a technique called model distillation, in which a smaller model is trained to mimic the behavior of a larger model by using the larger model's outputs as training data. The smaller model is trained on a dataset labeled with the larger model's predictions rather than the original data. This lets it learn the same patterns and relationships, and approximate the larger model's performance, with fewer parameters and less complexity, so it is more efficient in compute and memory.
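
To give a flavor of "training on the larger model's outputs", here is a toy sketch of one common formulation of a distillation loss: the cross-entropy between the teacher's soft predictions and the student's. This is an illustration of the general idea only, not necessarily the recipe used to produce `deepseek-r1:1.5b`. All scores below are made-up numbers.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's distribution against the teacher's soft labels."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

teacher = [4.0, 1.0, 0.5]       # made-up teacher scores for three candidate tokens
good_student = [3.8, 1.1, 0.4]  # close to the teacher's preferences
poor_student = [0.5, 4.0, 1.0]  # prefers a different token

loss_good = distillation_loss(teacher, good_student)
loss_poor = distillation_loss(teacher, poor_student)
print(loss_good < loss_poor)  # True
```

Training nudges the student's parameters to lower this loss, so its predictions drift towards the teacher's.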

In [None]:
# Now let's try deepseek-r1:1.5b - this is DeepSeek "distilled" into Qwen from Alibaba Cloud

!ollama pull deepseek-r1:1.5b

In [None]:
response = ollama.chat.completions.create(model="deepseek-r1:1.5b", messages=[{"role": "user", "content": "Tell me a fun fact"}])

response.choices[0].message.content

# HOMEWORK EXERCISE ASSIGNMENT

Upgrade the day 1 webpage summarization project to use an open-source model running locally via Ollama rather than OpenAI.

You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.

**Benefits:**
1. No API charges - open-source
2. Data doesn't leave your box

**Disadvantages:**
1. Significantly less power than a frontier model

## Recap on installation of Ollama

Simply visit [ollama.com](https://ollama.com) and install!

Once complete, the ollama server should already be running locally.  
If you visit:  
[http://localhost:11434/](http://localhost:11434/)

You should see the message `Ollama is running`.  

If not, bring up a new Terminal (Mac) or Powershell (Windows) and enter `ollama serve`  
And in another Terminal (Mac) or Powershell (Windows), enter `ollama pull llama3.2`  
Then try [http://localhost:11434/](http://localhost:11434/) again.

If Ollama is slow on your machine, try using `llama3.2:1b` as an alternative. Run `ollama pull llama3.2:1b` from a Terminal or Powershell, and change the code from `MODEL = "llama3.2"` to `MODEL = "llama3.2:1b"`

In [10]:
from bs4 import BeautifulSoup
import requests


# Standard headers to fetch a website
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}


def fetch_website_contents(url):
    """
    Return the title and contents of the website at the given url;
    truncate to 2,000 characters as a sensible limit
    """
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, "html.parser")
    title = soup.title.string if soup.title else "No title found"
    if soup.body:
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        text = soup.body.get_text(separator="\n", strip=True)
    else:
        text = ""
    return (title + "\n\n" + text)[:2_000]


def fetch_website_links(url):
    """
    Return the links on the website at the given url
    I realize this is inefficient as we're parsing twice! This is to keep the code in the lab simple.
    Feel free to use a class and optimize it!
    """
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, "html.parser")
    links = [link.get("href") for link in soup.find_all("a")]
    return [link for link in links if link]


In [None]:
from openai import OpenAI
import os
from IPython.display import display, Markdown
# from dotenv import load_dotenv
# load_dotenv(override=True)
# ollama_api_key = os.getenv('OLLAMA_API_KEY')
website_content = fetch_website_contents("https://en.wikipedia.org/wiki/Artificial_intelligence")

teacher_system_prompt = "You are a great teacher who knows about all the fields in AI. From the following content, prepare a learning roadmap for a beginner who wants to learn about AI. The roadmap should include the main topics to study, recommended resources, and a suggested timeline for learning. Also include links to online courses, books, and tutorials that would be helpful for each topic and are available for free or at a low cost."

predictor_system_prompt = "You are a highly intelligent predictor bot which takes website content and predicts the next evolution in the field based on current trends and research. Analyze the content and provide insights into potential future developments, breakthroughs, and applications in the field. Your predictions should be based on current data and trends, and should be presented in a clear and concise manner."

predictor_user_prompt = "Based on the following content, predict the next big evolution in the field: " + website_content

system_prompt = "You are a highly intelligent summarizing bot which takes website content and summarizes it into concise bullet points, also providing the main ideas of the content. In the summary, focus on clarity and brevity."

user_prompt = "Summarize the following content into bullet points: " + website_content

ollama = OpenAI(base_url="http://localhost:11434/v1", api_key='ollama')

response = ollama.chat.completions.create(model="llama3.2", messages=[{"role": "system", "content": predictor_system_prompt}, {"role": "user", "content": predictor_user_prompt}])
display(Markdown("### Summary"))
display(Markdown(response.choices[0].message.content))

### Summary

Based on the provided content from Wikipedia's article on Artificial Intelligence, I predict the next big evolution in this field as follows:

**Prediction: Fusion of Cognitive Architectures with Generative Models**

In recent years, advancements in deep learning have led to significant progress in image, speech, and language processing. However, these models often rely on pre-defined architectures and limited contextual understanding. To overcome this limitation, the next big evolution in AI will see a fusion of cognitive architectures (e.g., SOAR, ACT-R, and Beta-CAC) with generative models.

**Rationale:**

1. **Cognitive Architectures**: These frameworks aim to integrate multiple AI techniques into a coherent whole, enabling systems to reason, learn, and apply knowledge across various domains.
2. **Generative Models**: Recent advancements in deep learning have led to the development of impressive generative models (e.g., GPT-3) capable of producing human-like language, images, music, and more.

**Predicted Impact:**

The fusion of cognitive architectures with generative models will enable AI systems to:

1. **Improve Contextual Understanding**: By leveraging the contextual understanding capabilities of cognitive architectures, these systems will be able to better comprehend complex tasks, such as natural language processing and image understanding.
2. **Enhance Reasoning and Learning**: The combination of cognitive architectures and generative models will facilitate more robust reasoning and learning mechanisms, enabling AI systems to adapt to new problems and domains with greater ease.
3. **Facilitate Human-AI Collaboration**: By integrating human cognitive abilities (e.g., reasoning, intuition) into AI systems, we can create more effective collaboration frameworks between humans and machines.

**Timeline:**

In the next 5-10 years, we can expect to see significant progress in this area, with:

1. **2018-2020**: Initial experiments and prototypes of cognitive architectures integrated with generative models.
2. **2021-2023**: Development of more sophisticated frameworks, such as hybrid neural networks combining symbolic AI and generative capabilities.
3. **2024-2025**: Widespread adoption of these frameworks in various industries (e.g., healthcare, finance) to drive innovation and improvement.

**Potential Applications:**

1. **Autonomous Systems**: Improved understanding of complex environments through cognitive architectures + generative models.
2. **Personalized Medicine**: Enhanced ability to reason about individual medical profiles and generate personalized treatment plans.
3. **Creative Industries**: AI systems capable of collaborative creative processes with humans, such as music composition or art curation.

This prediction is based on current trends and the rapid progress made in both cognitive architectures and generative models. The future of artificial intelligence will likely be shaped by the integration of these two areas, leading to more human-like understanding and reasoning capabilities.