# Welcome to your first assignment!

Instructions are below. Please give this a try, and look in the solutions folder if you get stuck (or feel free to ask me!)

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Just before we get to the assignment --</h2>
            <span style="color:#f71;">I thought I'd take a second to point you at this page of useful resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

# HOMEWORK EXERCISE ASSIGNMENT

Upgrade the day 1 webpage summarizer project so that it uses an open-source model running locally via Ollama rather than OpenAI.

You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.

**Benefits:**
1. No API charges, since the model is open-source and runs on your own machine
2. Data doesn't leave your box

**Disadvantages:**
1. Significantly less powerful than a frontier model

## Recap on installation of Ollama

Simply visit [ollama.com](https://ollama.com) and install!

Once complete, the ollama server should already be running locally.  
If you visit:  
[http://localhost:11434/](http://localhost:11434/)

You should see the message `Ollama is running`.  

If not, bring up a new Terminal (Mac) or PowerShell (Windows) and enter `ollama serve`  
And in another Terminal (Mac) or PowerShell (Windows), enter `ollama pull llama3.2`  
Then try [http://localhost:11434/](http://localhost:11434/) again.

If Ollama is slow on your machine, try using `llama3.2:1b` as an alternative. Run `ollama pull llama3.2:1b` from a Terminal or PowerShell, and change the code below from `MODEL = "llama3.2"` to `MODEL = "llama3.2:1b"`.
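
The browser check described above can also be done from Python. This is a minimal sketch using only the standard library; the helper name `is_ollama_running` is hypothetical and is not part of the course code or the ollama package.

```python
import urllib.error
import urllib.request


def is_ollama_running(base_url="http://localhost:11434/", timeout=2.0):
    """Return True if the server at base_url replies with 'Ollama is running'."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout or DNS failure: treat as "not running"
        return False
    return "Ollama is running" in body


if __name__ == "__main__":
    print(is_ollama_running())
```

If this prints `False`, start the server with `ollama serve` as described above and try again.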

In [None]:
# imports

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display

In [None]:
# Constants

OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2"

In [None]:
# Create a messages list using the same format that we used for OpenAI

messages = [
    {"role": "user", "content": "Describe some of the business applications of Generative AI"}
]

In [None]:
payload = {
    "model": MODEL,
    "messages": messages,
    "stream": False
}

In [None]:
# Let's just make sure the model is loaded

!ollama pull llama3.2

pulling manifest
pulling dde5aa3fc5ff... 100% ▕████████████████▏ 2.0 GB
pulling 966de95ca8a6... 100% ▕████████████████▏ 1.4 KB
pulling fcc5a6bec9da... 100% ▕████████████████▏ 7.7 KB
pulling a70ff7e570d9... 100% ▕████████████████▏ 6.0 KB
pulling 56bb8bd477a5... 100% ▕████████████████▏   96 B
pulling 34bb5ab01051... 100% ▕████████████████▏  561 B
verifying sha256 digest
writing manifest
success


In [None]:
# If this doesn't work for any reason, try the 2 versions in the following cells
# And double check the instructions in the 'Recap on installation of Ollama' at the top of this lab
# And if none of that works - contact me!

response = requests.post(OLLAMA_API, json=payload, headers=HEADERS)
print(response.json()['message']['content'])

Generative AI has numerous business applications across various industries. Here are some examples:

1. **Content Creation**: Generative AI can generate high-quality content such as articles, blog posts, social media posts, product descriptions, and more. This can be especially useful for companies that need to produce a large amount of content quickly or have limited resources.
2. **Marketing Automation**: Generative AI can help automate marketing processes by generating personalized messages, emails, and even ad copy. It can also analyze customer data and suggest tailored campaigns based on the generated content.
3. **Product Design and Development**: Generative AI can aid in product design by generating 3D models, prototypes, and designs for products such as furniture, electronics, or other consumer goods.
4. **Virtual Assistants**: Generative AI can power virtual assistants like chatbots, voice assistants, and even customer service bots to provide personalized support to customers.
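
If the call above fails with a `KeyError` on `'message'`, the server has probably sent back an error body instead of a chat reply. The sketch below assumes that such a failure (for example, a model that hasn't been pulled yet) arrives as JSON with an `"error"` key. The helper name `extract_content` is hypothetical, and the dictionaries are hand-written stand-ins for `response.json()`.

```python
def extract_content(data):
    """Return the assistant text from a parsed /api/chat reply (stream=False)."""
    if "error" in data:
        # Assumption: failures are reported under an "error" key
        raise RuntimeError(f"Ollama returned an error: {data['error']}")
    return data["message"]["content"]


# Hand-written example standing in for response.json()
ok = {
    "model": "llama3.2",
    "message": {"role": "assistant", "content": "Hello!"},
    "done": True,
}
print(extract_content(ok))
```

In the cell above you could then write `print(extract_content(response.json()))` to get a readable message when something goes wrong.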

# Introducing the ollama package

And now we'll do the same thing, but using the elegant ollama Python package instead of a direct HTTP call.

Under the hood, it's making the same call as above to the Ollama server running at localhost:11434.

In [None]:
import ollama

response = ollama.chat(model=MODEL, messages=messages)
print(response['message']['content'])

Generative AI has numerous business applications across various industries. Here are some examples:

1. **Content Generation**: Generative AI can be used to generate high-quality content such as blog posts, social media posts, articles, and more. This can help businesses save time and resources while maintaining a consistent tone and style.
2. **Product Design and Development**: Generative AI can assist in product design by generating 3D models, prototypes, and even entire products. This can be especially useful for companies that produce complex or innovative products.
3. **Marketing Automation**: Generative AI can help automate marketing campaigns by generating personalized content, social media posts, and even lead generation emails.
4. **Chatbots and Virtual Assistants**: Generative AI can be used to create chatbots and virtual assistants that can engage with customers, answer questions, and provide support 24/7.
5. **Image and Video Generation**: Generative AI can generate high-qu

## Alternative approach - using OpenAI python library to connect to Ollama

In [None]:
# There's actually an alternative approach that some people might prefer
# You can use the OpenAI client python library to call Ollama:

from openai import OpenAI
ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

response = ollama_via_openai.chat.completions.create(
    model=MODEL,
    messages=messages
)

print(response.choices[0].message.content)

## Also trying the amazing reasoning model DeepSeek

Here we use the version of DeepSeek-reasoner that's been distilled to 1.5B.  
This is actually a 1.5B variant of Qwen that has been fine-tuned using synthetic data generated by DeepSeek R1.

Other sizes of DeepSeek are [here](https://ollama.com/library/deepseek-r1), all the way up to the full 671B parameter version, which would use up 404GB of your drive and is far too large for most!

In [None]:
!ollama pull deepseek-r1:1.5b

pulling manifest
pulling aabd4debf0c8...   0% ▕                ▏ 692 KB/1.1 GB

In [None]:
# This may take a few minutes to run! You should then see a fascinating "thinking" trace inside <think> tags, followed by some decent definitions

response = ollama_via_openai.chat.completions.create(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": "Please give definitions of some core concepts behind LLMs: a neural network, attention and the transformer"}]
)

print(response.choices[0].message.content)
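
Since the reply contains the "thinking" trace inside `<think>` tags followed by the actual answer, it can be handy to separate the two. This is a minimal sketch that assumes at most one `<think>...</think>` block per reply. The helper name `split_thinking` and the sample text are made up for illustration.

```python
import re

# DOTALL lets "." match newlines, since the trace spans many lines
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Split a reply into (thinking, answer); thinking is '' if no <think> block exists."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    thinking = match.group(1).strip()
    # Keep everything outside the <think> block as the answer
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thinking, answer


sample = "<think>\nLet me recall what attention is.\n</think>\n\n**Attention** weighs tokens."
thinking, answer = split_thinking(sample)
print(answer)
```

With the cell above, `split_thinking(response.choices[0].message.content)` would give you the trace and the definitions as two separate strings.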

# NOW the exercise for you

Take the code from day1 and incorporate it here, to build a website summarizer that uses Llama 3.2 running locally instead of OpenAI; use either of the above approaches.
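
If you'd like a starting point, here is one possible shape for the summarizer. The day 1 code isn't reproduced in this notebook, so treat this as a sketch under assumptions rather than the official solution: the prompt wording, the helper names `build_messages` and `summarize`, and the 8000-character cut-off are all illustrative choices.

```python
SYSTEM_PROMPT = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary in markdown, ignoring navigation text."
)


def build_messages(title, text, max_chars=8000):
    """Build the chat messages for summarizing one page (wording is illustrative)."""
    user_prompt = (
        f"You are looking at a website titled {title}.\n"
        "Please provide a short summary of its contents in markdown.\n\n"
        + text[:max_chars]  # truncate so a long page fits a small local model
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]


def summarize(url, model="llama3.2"):
    """Fetch url, strip non-content tags and ask the local model for a summary."""
    # Third-party imports live here so build_messages works without them installed
    import ollama
    import requests
    from bs4 import BeautifulSoup

    page = requests.get(url, timeout=30)
    page.raise_for_status()
    soup = BeautifulSoup(page.content, "html.parser")
    title = soup.title.string if soup.title and soup.title.string else "No title found"
    for tag in soup(["script", "style", "img", "input"]):
        tag.decompose()  # drop elements that carry no readable text
    text = soup.get_text(separator="\n", strip=True)
    response = ollama.chat(model=model, messages=build_messages(title, text))
    return response["message"]["content"]
```

In the notebook you would then call something like `display(Markdown(summarize("https://edwarddonner.com")))`.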

In [None]:
# This may take a few minutes to run! The reply starts with a "thinking" trace inside <think> tags, followed by some decent definitions
import ollama
from IPython.display import Markdown, display

messages = [{"role": "user", "content": "Please give definitions of some core concepts behind LLMs: a neural network, attention and the transformer"}]
response = ollama.chat(model="deepseek-r1:1.5b", messages=messages)

# Dump the full response, including timing metadata, as JSON
print(response.model_dump_json())

# Attribute access; response['message']['content'] is equivalent
info = response.message.content
print(info)

# Render the same text as formatted Markdown
display(Markdown(info))

{"model":"deepseek-r1:1.5b","created_at":"2025-02-10T01:39:28.8703468Z","done":true,"done_reason":"stop","total_duration":31365737100,"load_duration":52914500,"prompt_eval_count":23,"prompt_eval_duration":35000000,"eval_count":957,"eval_duration":31274000000,"message":{"role":"assistant","content":"<think>\nOkay, so I need to understand the definition of some core concepts behind Large Language Models (LLMs), specifically focusing on neural networks and attention mechanisms, especially the Transformer within that context. I'm not too familiar with all these terms, but I know they're pretty important in NLP and AI.\n\nFirst, let me think about what a neural network is. From what I remember, it's a type of machine learning model inspired by the structure and function of the human brain. It consists of layers of interconnected nodes or neurons, which process information through weighted connections. Each layer processes data differently, like in convolutional networks for images where eac

<think>
Okay, so I need to understand the definition of some core concepts behind Large Language Models (LLMs), specifically focusing on neural networks and attention mechanisms, especially the Transformer within that context. I'm not too familiar with all these terms, but I know they're pretty important in NLP and AI.

First, let me think about what a neural network is. From what I remember, it's a type of machine learning model inspired by the structure and function of the human brain. It consists of layers of interconnected nodes or neurons, which process information through weighted connections. Each layer processes data differently, like in convolutional networks for images where each node responds to a specific feature.

But wait, how do these neural networks translate into LLMs? I think LLMs are built on top of these neural networks but with some modifications. Maybe the traditional approach used in NLP relies heavily on neural networks because they can model complex patterns and relationships between words. So, perhaps the core concept here is that a neural network in an LLM uses layers to capture various levels of features or representations of data.

Now, attention mechanisms come into play. I've heard terms like "self-attention" before. It must have something to do with how models focus on different parts of input data. The idea being that each model cell doesn't know where the most important information is in a sequence; it just processes all tokens. This might make the model more adaptable and better at understanding varying contexts.

The Transformer specifically is a neural network architecture that stands out, especially for NLP tasks like text generation or translation. I think the key features are self-attention and position embeddings. Self-attention allows models to weigh the importance of different positions in a sequence relative to each other, enabling them to attend to different parts at once. Position embeddings add positional information about where tokens are in the model's input.

So putting it all together, the core concepts behind LLMs include neural networks as building blocks that process data through layers, attention mechanisms like self-attention which focus on relevant parts of the input, and the Transformer architecture with its self-attention mechanism and position embeddings. These components work together to enable models to understand language at a very deep level, handle context-aware information, and adaptively focus on different aspects as needed.

I should make sure I'm not mixing up some other architectures like GPT or BERT. Both are Transformers, so the core concepts of self-attention, position embeddings, and layered neural networks should be common to them. Also, LLMs use these mechanisms to predict or generate text by learning from data distributions and context.

I wonder if there's a difference between how traditional models handle context versus Transformers. Maybe traditional approaches like recurrent neural networks (RNNs) process sequences step by step but don't naturally handle long-range dependencies as effectively as Transformers can with self-attention. That must be why Transformers have become so popular in NLP tasks where contextual understanding is crucial.

In summary, the core concepts seem to revolve around using layered neural networks for processing data and employing attention mechanisms that enable models to focus on relevant parts of input sequences, especially through self-attention. The Transformer specifically excels with these features due to its self-attention and positional encoding.
</think>

**Core Concepts Behind LLMs**

1. **Neural Networks:**
   - **Inspiration:** Inspired by the structure and function of the human brain, neural networks are composed of layers of interconnected nodes or neurons.
   - **Functionality:** Process information through weighted connections within layers to capture various levels of features or representations.

2. **Attention Mechanisms:**
   - **Self-Attention:** Enables models to focus on different parts of input data by allowing each model cell to weigh the importance of different positions in a sequence relative to each other.
   - **Key Insight:** Without self-attention, each model cell would process all tokens without prior knowledge of their most relevant context.

3. **Transformer Architecture:**
   - **Self-Attention Mechanism:** Allows models to attend to different parts of input sequences at once, facilitating contextual understanding and adaptability.
   - **Position Embeddings:** Add positional information about token locations in the model's input, enhancing context awareness.

4. **Learning Context and Context-Dependent Features:**
   - Neural networks within LLMs process data through layers, while attention mechanisms enable models to focus on relevant parts of sequences. Transformers excels with self-attention and position embeddings, making them highly effective for tasks requiring contextual understanding and adaptability.

In summary, neural networks form the building blocks, attention mechanisms allow focused processing, and Transformers leverage self-attention and positional encoding to handle context efficiently in NLP tasks.
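
The JSON dump printed above also carries timing metadata. Ollama reports its `*_duration` fields in nanoseconds, so a little arithmetic turns them into seconds and a tokens-per-second figure. This is a small sketch using the values from that dump; the helper name `summarize_timing` is made up for illustration.

```python
def summarize_timing(stats):
    """Convert Ollama's nanosecond timing fields into seconds and tokens per second."""
    ns = 1_000_000_000
    return {
        "total_seconds": stats["total_duration"] / ns,
        "load_seconds": stats["load_duration"] / ns,
        "tokens_per_second": stats["eval_count"] / (stats["eval_duration"] / ns),
    }


# Values copied from the JSON dump printed above
stats = {
    "total_duration": 31365737100,
    "load_duration": 52914500,
    "eval_count": 957,
    "eval_duration": 31274000000,
}
timing = summarize_timing(stats)
print(f"{timing['total_seconds']:.1f}s total, {timing['tokens_per_second']:.1f} tokens/s")
```

For this run that works out to roughly 31 seconds in total and about 30.6 generated tokens per second, which gives you a baseline when comparing models such as `llama3.2:1b`.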