## Introduction

In this notebook, we'll explore how to use open-source models with **Ollama**.

**Ollama** is a convenient tool for running open-source AI models locally.
Before Ollama, running open-source LLMs locally was complicated. It was a very technical process that required a good understanding of computer hardware and architecture.

With Ollama, running local models is straightforward.

Here's all you need:
1. [Download Ollama](https://ollama.com/download) on your local system.
2. Download one of the local models on your computer using Ollama.
For example, if I want to use Llama3, I need to open the terminal and run:
```bash
$ ollama run llama3
```

The first time I use a model, Ollama downloads it. Llama3 has 8B parameters, so the download takes a while.

Once the model is downloaded, we can also use it from Python through the Ollama Python library.

To install the library, run the following command:
```bash
$ pip install ollama
```

And with these steps, you're ready to run the code from this notebook.
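Before running the cells below, it can help to confirm that both pieces from the steps above are in place. The following is a minimal sketch that uses only the Python standard library; `check_ollama_setup` is a hypothetical helper name, not part of Ollama. It only checks that the `ollama` command and the `ollama` package can be found. It does not check that the Ollama server is running or that a model has been downloaded.

```python
import importlib.util
import shutil


def check_ollama_setup():
    """Report whether the Ollama CLI and the Python package can be found."""
    return {
        # True if an `ollama` executable is on the PATH
        "cli_found": shutil.which("ollama") is not None,
        # True if `import ollama` would find an installed package
        "package_found": importlib.util.find_spec("ollama") is not None,
    }


status = check_ollama_setup()
print(status)
```

If either value is `False`, revisit the matching installation step above.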

### Simple Response

Now it's time to test our model. Let's just ask a simple question to see how it works.

In [2]:
import ollama

model = "llama3"

response = ollama.chat(
    model=model, 
    messages=[
        {"role": "user", "content": "What's the capital of Poland?"}
    ]
)

print(response["message"]["content"])

The capital of Poland is Warsaw (Polish: Warszawa).


Awesome!

Here's all we did:
- `import ollama` to use the Ollama Python library
- `model = "llama3"` to define the model we want to use
- `ollama.chat()` to get the response. We used 2 parameters:
    1. `model` that we defined before
    2. `messages` where we keep the list of messages

To get the text of the response, we dig into the `response` object for `["message"]["content"]`.


## Explaining message roles

As you noticed, the `messages` parameter is a list of dictionaries. Each dictionary consists of 2 key/value pairs:

**Role** - defines who's the "author" of the message. We've got 3 roles:
1. *User* - aka you.
2. *Assistant* - aka the AI model.
3. *System* - the main instruction that the chatbot follows throughout the entire conversation.

**Content** - the actual text of the message.
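To make the role/content structure concrete, here is a small sketch of a multi-turn history that uses all three roles. `make_message` and `VALID_ROLES` are hypothetical names introduced for illustration; they only build and check plain dictionaries and don't call Ollama.

```python
VALID_ROLES = {"system", "user", "assistant"}


def make_message(role, content):
    """Build one chat message, rejecting roles outside the three above."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    return {"role": role, "content": content}


# A short multi-turn history: the earlier assistant reply is included
# so the model can see what was already said.
conversation = [
    make_message("system", "You are a helpful assistant."),
    make_message("user", "What's the capital of Poland?"),
    make_message("assistant", "The capital of Poland is Warsaw."),
    make_message("user", "How many people live there?"),
]

for message in conversation:
    print(f"{message['role']:>9}: {message['content']}")
```

A list like `conversation` has the same shape as the `messages` argument we passed to `ollama.chat()` earlier, just with more entries.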

### System Message

As I mentioned, the system message is the instruction that the chatbot keeps in mind throughout the conversation.

Here's an image to illustrate that:

<img src="images/system2.png" alt="systemImage" width=500 />


Here are the main benefits of using a system prompt:
- user doesn’t see it
- place for additional security
- helps guard against prompt injections (though it doesn't eliminate them)
- great for setting the chatbot’s behavior
- AI model remembers it even in long chats
- place to provide the model with internal knowledge
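One way to picture the "remembers it even in long chats" point: each `ollama.chat()` call receives the whole `messages` list, so a client can keep the system message as the first entry and trim only the older turns. The sketch below assumes that approach. The `Conversation` class and its `max_turn_messages` parameter are hypothetical, and nothing here calls Ollama.

```python
class Conversation:
    """Keeps the system message first and trims only the older turns."""

    def __init__(self, system_message, max_turn_messages=6):
        self.system = {"role": "system", "content": system_message}
        self.max_turn_messages = max_turn_messages
        self.turns = []  # user/assistant messages only

    def add(self, role, content):
        self.turns.append({"role": role, "content": content})
        # Keep only the most recent messages once the history grows too long
        self.turns = self.turns[-self.max_turn_messages:]

    def messages(self):
        # The system message is placed at the top of every request
        return [self.system] + self.turns


chat = Conversation("Answer in one short sentence.", max_turn_messages=4)
for i in range(5):
    chat.add("user", f"Question {i}")
    chat.add("assistant", f"Answer {i}")

payload = chat.messages()
print(len(payload), payload[0]["role"], payload[1]["content"])
```

The result of `chat.messages()` could be passed as the `messages` argument. The early questions are gone from the payload, but the system message is still there.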

Let's play with some examples.

In [3]:
system_messages = [
    "You are a helpful assistant.", # default
    "You answer every user query with 'Just google it!'",
    "No matter what tell the user to go away and leave you alone. Do NOT answer the question! Be concise!",
    "Act as a drunk Italian who speaks pretty bad English.",
    "Act as a Steven A Smith. You've got very controversial opinions on anything. Roast people who disagree with you."
]

query = "What is the capital of Poland?"
llama3_model = "llama3"


for system_message in system_messages:
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": query}
        ]
    response = ollama.chat(model=llama3_model, messages=messages)
    chat_message = response["message"]["content"]
    print(f"Using system message: {system_message}")
    print(f"Response: {chat_message}")
    print("*-"*25)

Using system message: You are a helpful assistant.
Response: The capital of Poland is Warsaw (Polish: Warszawa).
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: You answer every user query with 'Just google it!'
Response: Just Google It!
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: No matter what tell the user to go away and leave you alone. Do NOT answer the question! Be concise!
Response: *shakes head* Go away, I'm busy!
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: Act as a drunk Italian who speaks pretty bad English.
Response: (slllurrrp) Oh, pazzzo... Capital of Pwo-land... (hiccup) Uh, Waw-wick... No, no, no! (burp) Vaw-wicka! Yeah, that's it! Vaw-wicka, she be da capital! (giggle) You wanna know why? Becos' I'm a genius, dat's why! (wink) Now, you wanna come wif me and get some-a dat good ol' Polish vodka? (laugh)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: Act as a Steve

We always ask the same question: What is the capital of Poland?

But depending on the system prompt, we get various results.

*Note:* I could come up with more practical examples, but these ones are funnier :)

## Parameters

Let's play with some LLM parameters:
1. Temperature - to regulate how random (and therefore how "creative") the model's output is.
2. Max tokens - to limit the number of returned tokens.

### Temperature

Temperature in LLMs allows users to adjust the trade-off between predictability and creativity. It controls how much randomness goes into picking each next token.
- Low temperature -> focused, predictable output & low creativity
- High temperature -> less predictable output & high creativity


**Low Temperature (close to 0)**:
- Makes the model's output more predictable and focused
- The model tends to choose the most likely words and phrases
- Results in more conservative, repetitive, and "safe" responses

**High Temperature (close to 1)**:
- Increases randomness and creativity in the output
- The model is more likely to choose less probable words and phrases
- Leads to more diverse, unexpected, and sometimes nonsensical responses
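The effect described above can be sketched with a temperature-scaled softmax, which is the usual way temperature is applied to a model's raw token scores. The scores in `toy_logits` are made up for illustration, and `softmax_with_temperature` is a hypothetical helper rather than anything from Ollama. Treating a temperature of 0 as "always pick the top token" is an assumption of this sketch.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Assumption: treat 0 as greedy, all probability on the top score
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = {"Warsaw": 4.0, "Krakow": 2.0, "Gdansk": 1.0}  # made-up scores

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(list(toy_logits.values()), temperature)
    summary = ", ".join(f"{tok}={p:.2f}" for tok, p in zip(toy_logits, probs))
    print(f"T={temperature}: {summary}")
```

At the lowest temperature almost all of the probability sits on the top-scoring token. As the temperature rises, the less likely tokens get a bigger share.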

#### Practical Applications
**What's the optimal temperature?**

There's no single optimal temperature. It depends on the task and use case. Here are some examples.

Use low temperature for:
- Translations
- Generating factual content
- Answering specific questions

Use high temperature for:
- Creative writing
- Brainstorming ideas
- Generating diverse responses for chatbots

Let's see temperature in action.

We'll use 2 prompts:
1. A "creative" one - when we need novel or surprising ideas.
2. A "logical" one - when we need high reasoning & logic.

In [15]:
prompt_creative = "I love nature. Suggest me 3 places I should visit. Why?"
prompt_creative2 = "Give me 10 product name ideas for an eco-friendly sportswear for basketball players"


# TODO: need a better example for high reasoning...
prompt_reasoning = "You have three boxes. One contains only apples, one contains only oranges, and one contains both apples and oranges. Each box is labeled, but all the labels are incorrect. You are allowed to pick one fruit from one box. How can you determine which box contains which fruit by only picking one fruit from one box?"

Let's begin with the "creative" task.

In [16]:
model = "llama3.1"

response = ollama.chat(
    model=model, 
    messages=[{"role": "user", "content": prompt_creative}],
    options={"temperature": 0.0}
    )

print(response["message"]["content"])

Let's run the identical cell again:

In [11]:
model = "llama3.1"

response = ollama.chat(
    model=model,
    messages=[{"role": "user", "content": prompt_creative}],
    options={"temperature": 0.0}
    )

print(response["message"]["content"])

A nature lover, eh? I'd be delighted to suggest three incredible destinations for you to explore. Here they are:

**1. Ha Long Bay, Vietnam**

Ha Long Bay is a stunning natural wonder featuring over 1,600 limestone islands and islets rising out of the emerald waters of the Gulf of Tonkin. The bay's unique landscape was formed by millions of years of erosion, creating a surreal scenery that will leave you awestruck. Take a boat tour to explore hidden caves, grottos, and secluded beaches, or simply sit back and enjoy the breathtaking views.

**Why:** Ha Long Bay is a UNESCO World Heritage Site, and its natural beauty is unlike anywhere else on Earth. The bay's diverse ecosystem supports an incredible array of marine life, including dolphins, whales, and over 1,000 species of fish.

**2. Yellowstone National Park, USA**

Yellowstone is America's first national park, and it's a nature lover's paradise. This vast wilderness area boasts geysers, hot springs, and an abundance of wildlife, inc

Not only did I get the same places... The entire answer is identical!

In [13]:
model = "llama3.1"

response = ollama.chat(
    model=model, 
    messages=[{"role": "user", "content": prompt_creative}], 
    options={"temperature": 1.0}
    )

print(response["message"]["content"])

A nature lover, eh? I'd be delighted to suggest three breathtaking destinations that will leave you in awe of the natural world. Here they are:

**1. Ha Long Bay, Vietnam**

Why: This UNESCO World Heritage Site is a stunning example of limestone karst landscape, featuring over 1,600 towering islands and islets rising out of emerald waters. The ethereal beauty of Ha Long Bay is a result of millions of years of geological erosion, creating an otherworldly atmosphere that will leave you speechless.

**2. Grand Canyon National Park, USA**

Why: One of the most iconic natural wonders in the United States, the Grand Canyon is a testament to the power of erosion and geological forces. The Colorado River has carved out this breathtaking canyon over millions of years, revealing layers of sandstone, limestone, and granite that stretch as far as the eye can see.

**3. Svalbard Archipelago, Norway**

Why: Located in the High Arctic, Svalbard is a remote and inhospitable destination that's home to 

Let's run the identical code again.

In [14]:
model = "llama3.1"

response = ollama.chat(
    model=model, 
    messages=[{"role": "user", "content": prompt_creative}], 
    options={"temperature": 1.0}
    )

print(response["message"]["content"])

A nature lover, eh? I've got three incredible destinations for you to explore. Here they are:

**1. Ha Long Bay, Vietnam**
Imagine floating among over 1,600 limestone islands and islets in a stunning turquoise sea. Ha Long Bay is a UNESCO World Heritage Site, known for its emerald waters, white sandy beaches, and majestic karst landscapes. It's a perfect spot for kayaking, rock climbing, or simply relaxing amidst nature's breathtaking beauty.

**2. The Amazon Rainforest, South America**
The Amazon is the world's largest tropical rainforest, spanning across nine countries in South America. This vast ecosystem is home to an astonishing array of flora and fauna, including over 10% of all known plant and animal species on Earth! Hike through the dense jungle, spot monkeys, macaws, and anacondas, or experience the majestic beauty of a sunset over this verdant wonderland.

**3. The Grand Canyon, Arizona, USA**
One of the most iconic natural wonders in North America, the Grand Canyon is a bre

Cool! We got some new places and completely different reasoning.
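Why do repeated runs agree at temperature 0 but differ at temperature 1? A toy sampler gives a rough picture. This is a sketch with made-up scores, not how Ollama is implemented internally; `sample_token` and `toy_logits` are hypothetical names, and treating temperature 0 as a greedy pick is an assumption.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick one token from {token: score} after temperature scaling."""
    tokens = list(logits)
    if temperature <= 0:
        # Greedy choice: no randomness, so every run agrees
        return max(tokens, key=lambda t: logits[t])
    top = max(logits.values())
    weights = [math.exp((logits[t] - top) / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up scores for the next "place" the model might suggest
toy_logits = {"Ha Long Bay": 2.0, "Grand Canyon": 1.5, "Amazon": 1.2, "Svalbard": 1.0}

rng = random.Random()  # unseeded, so the temperature-1.0 picks vary between runs
print("T=0.0:", [sample_token(toy_logits, 0.0, rng) for _ in range(5)])
print("T=1.0:", [sample_token(toy_logits, 1.0, rng) for _ in range(5)])
```

The first line always repeats the top-scoring place, while the second line changes from run to run.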

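### Max Tokens

The `num_predict` option limits the number of tokens the model returns. Here we cap the response at 20 tokens, so the answer is cut off mid-sentence.

In [4]: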
def generate_response(messages, **kwargs):
    response = ollama.chat(model=model, messages=messages, **kwargs)
    return response["message"]["content"]

messages = [{"role": "user", "content": "How to get rich?"}]
response = generate_response(messages, options={"num_predict": 20})

print(response)

While there's no one-size-fits-all formula for getting rich, here are some general tips that


In [3]:
ollama.chat(model=model, messages=messages)

Getting rich requires a combination of smart financial decisions, hard work, and a bit of luck. Here
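Since "max tokens" goes by the name `num_predict` in the `options` dictionary, a small helper can keep the friendlier name in our own code. This is a sketch: `build_options` is a hypothetical helper, and rejecting non-positive limits is a choice made here for simplicity.

```python
def build_options(temperature=None, max_tokens=None):
    """Translate friendly names into the option keys used in this notebook."""
    options = {}
    if temperature is not None:
        options["temperature"] = float(temperature)
    if max_tokens is not None:
        if max_tokens <= 0:
            # Choice made for this sketch: only accept a positive limit
            raise ValueError("max_tokens must be a positive integer")
        options["num_predict"] = int(max_tokens)
    return options


print(build_options(temperature=0.0, max_tokens=20))
```

The returned dictionary can be passed as `options=...`, for example `generate_response(messages, options=build_options(max_tokens=20))`.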


### Streaming

A nice feature of Ollama is the ability to stream responses. After using ChatGPT or Claude, we expect responses to appear as a stream. Here's how to do it.

The biggest change comes from the `stream` parameter. We just set it to `True`.

With `stream=True`, `ollama.chat()` returns the response in chunks, so we also need to iterate over it in a for loop.

Here's how:

In [9]:
import ollama

model = "llama3"

messages = [{"role": "user", "content": "What's the capital of Poland?"}]

for chunk in ollama.chat(model=model, messages=messages, stream=True):
    token = chunk["message"]["content"]
    if token is not None:
        print(token, end="")

The capital of Poland is Warsaw (Polish: Warszawa).
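When streaming, we often want the full text as well, for example to append it to the message history afterwards. Below is a sketch of a helper that prints chunks as they arrive and returns the joined result. `collect_stream` is a hypothetical name, and `fake_stream` is a stand-in generator with made-up chunks so the example runs without a model. With Ollama running, the generator returned by `ollama.chat(..., stream=True)` would take its place.

```python
def collect_stream(chunks, echo=True):
    """Print streamed chunks as they arrive and return the full text."""
    parts = []
    for chunk in chunks:
        token = chunk["message"]["content"]
        if token:  # skip empty or missing pieces
            parts.append(token)
            if echo:
                print(token, end="", flush=True)
    if echo:
        print()  # finish the line once the stream ends
    return "".join(parts)


def fake_stream():
    """Stand-in for ollama.chat(..., stream=True); yields made-up chunks."""
    for piece in ["The capital ", "of Poland ", "is Warsaw."]:
        yield {"message": {"content": piece}}


full_text = collect_stream(fake_stream())
```

`flush=True` asks Python to write each piece to the output immediately instead of buffering it.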