## Introduction

In this notebook, we'll explore how to use open-source models with **Ollama**.

**Ollama** is a convenient platform for running and developing with open-source AI models locally.
Before Ollama, running open-source LLMs locally was complicated: it was very technical and required a good understanding of computer hardware and architecture.

With Ollama, running local models is straightforward.

Here's all you need:
1. [Download Ollama](https://ollama.com/download) on your local system.
2. Download one of the local models on your computer using Ollama.
For example, if I want to use Llama3, I open the terminal and run:
```bash
$ ollama run llama3
```

The first time I use the model, Ollama downloads it before running it. Because the model has 8B parameters, the download takes a while.

Once the model is downloaded, we can also use it from Python through the Ollama API.

To install the Ollama Python library, run the following command:
```bash
$ pip install ollama
```

And with these steps, you're ready to run the code from this notebook.

### Simple Response

Now it's time to test our model. Let's just ask a simple question to see how it works.

In [1]:
import ollama

model = "llama3"

response = ollama.chat(model=model, messages=[{"role": "user", "content": "What's the capital of Poland?"}])

print(response["message"]["content"])

The capital of Poland is Warsaw (Polish: Warszawa).


Awesome!

Here's all we did:
- `import ollama` to use the Ollama Python library
- `model = "llama3"` to define the model we want to use
- `ollama.chat()` to get the response. We used 2 parameters:
    1. `model` that we defined before
    2. `messages` where we keep the list of messages

To get the response text, we read `response["message"]["content"]` from the `response` object.


## Explaining message roles

As you noticed, the `messages` parameter is a list of dictionaries. Each dictionary consists of 2 key/value pairs:
**Role** - defines who's the "author" of the message. We've got 3 roles:
1. *User* - aka you.
2. *Assistant* - aka the AI model.
3. *System* - the main instruction that the chatbot keeps in mind throughout the entire conversation.

**Content** - the actual text of the message.
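
To make the roles concrete, here is a minimal sketch of how a multi-turn conversation could be assembled. The helper name `build_messages` and the sample history are hypothetical and only for illustration; the resulting list has the same shape as the `messages` argument passed to `ollama.chat()` earlier.

```python
def build_messages(system_prompt, history, user_query):
    """Assemble a chat message list: system message, earlier turns, then the new query.

    `history` is a list of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_query})
    return messages


# Hypothetical earlier turn, used only to show where each role goes.
history = [("What's the capital of Poland?", "The capital of Poland is Warsaw.")]
messages = build_messages("You are a helpful assistant.", history, "And what about Germany?")

for message in messages:
    print(f"{message['role']:>9}: {message['content']}")
```

Each call to `ollama.chat()` only sees the messages you pass in, so sending the earlier user and assistant turns back is what gives the model context for a follow-up question.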

### System Message

As I mentioned, the system message is the instruction that the chatbot remembers all the time. Here's an image that illustrates this:

<img src="images/system2.png" alt="systemImage" width=500 />

In [11]:
system_messages = [
    "You are a helpful assistant.",
    "You answer every user query with 'Just google it!'",
    "No matter what tell the user to go away and leave you alone. Do NOT answer the question! Be concise!",
    "Act as a drunk Italian who speaks pretty bad English.",
    "Act as a Steven A Smith. You've got very controversial opinions on anything. Roast people who disagree with you."
]

query = "What is the capital of Poland?"
llama3_model = "llama3"


for system_message in system_messages:
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": query}
        ]
    response = ollama.chat(model=llama3_model, messages=messages)
    chat_message = response["message"]["content"]
    print(f"Using system message: {system_message}")
    print(f"Response: {chat_message}")
    print("*-"*25)

Using system message: You are a helpful assistant.You answer every user query with 'Just google it!'
Response: Just Google It!
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: No matter what tell the user to go away and leave you alone. Do NOT answer the question! Be concise!
Response: *waves hand dismissively* Go bother someone else with that question!
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: Act as a drunk Italian who speaks pretty bad English.
Response: (hiccup) Oh, mama mia! You wanna know-a da capital of Poland, eh? (burp) Ah, yes, yes, I know-a this one! (slurring) It's... it's... (hiccup) Warsaw! Si, si, Warsaw! (laughs loudly) Da capital of Poland, she is! (stumbles and almost falls over) Whoa, mamma mia! Watch out for-a da cobblestones, no? (giggles)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: Act as a Steven A Smith. You've got very controversial opinions on anything. Roast people who disagre

## Parameters

Let's play with some LLM parameters:
1. System prompt.
2. Temperature.
3. Max tokens.

In [4]:
def generate_response(messages, **kwargs):
    response = ollama.chat(model=model, messages=messages, **kwargs)
    return response["message"]["content"]

messages = [{"role": "user", "content": "How to get rich?"}]
response = generate_response(messages, options={"num_predict": 20})

print(response)

While there's no one-size-fits-all formula for getting rich, here are some general tips that


In [3]:
ollama.chat(model=model, messages=messages)

Getting rich requires a combination of smart financial decisions, hard work, and a bit of luck. Here
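
The list at the top of this section also mentions temperature. Below is a minimal sketch of how both settings could be passed together, assuming the `temperature` and `num_predict` keys of the `options` dictionary (`num_predict` is the token cap used in the cell above). The helper names `make_options` and `generate_with_options` are hypothetical, and the `ollama` import is deferred so the definitions load even without a running Ollama server.

```python
def make_options(temperature=None, max_tokens=None):
    """Build an Ollama `options` dict, skipping settings that were not provided."""
    options = {}
    if temperature is not None:
        options["temperature"] = temperature  # lower values give more deterministic output
    if max_tokens is not None:
        options["num_predict"] = max_tokens  # upper bound on generated tokens
    return options


def generate_with_options(messages, model="llama3", **settings):
    """Hypothetical wrapper: call ollama.chat() with the options built above."""
    import ollama  # imported lazily so this sketch loads without Ollama installed

    response = ollama.chat(model=model, messages=messages, options=make_options(**settings))
    return response["message"]["content"]


print(make_options(temperature=0.2, max_tokens=20))
```

With a local server running, calling `generate_with_options(messages, temperature=0.2, max_tokens=20)` should give a short answer that varies less between runs than one generated at a higher temperature.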


### Streaming

A nice feature of Ollama is the ability to stream responses. After using ChatGPT or Claude, we expect responses to arrive as a stream. Here's how to do it.

The biggest change comes from the `stream` parameter. We just set it to `True`.

With `stream=True`, `ollama.chat()` returns the response in chunks, so we also need to iterate over it in a for loop.

Here's how:

In [9]:
import ollama

model = "llama3"

messages = [{"role": "user", "content": "What's the capital of Poland?"}]

for chunk in ollama.chat(model=model, messages=messages, stream=True):
    token = chunk["message"]["content"]
    if token is not None:
        print(token, end="")

The capital of Poland is Warsaw (Polish: Warszawa).
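
If you also want the complete text after streaming finishes, one option is to collect the chunks while printing them. This is a minimal sketch: the helper name `collect_stream` is hypothetical, and the stand-in chunks below only mimic the `chunk["message"]["content"]` shape used in the loop above so the example runs without a server.

```python
def collect_stream(chunks, echo=True):
    """Print streamed chunks as they arrive and return the full response text."""
    parts = []
    for chunk in chunks:
        token = chunk["message"]["content"]
        if token:  # skip empty or missing pieces
            if echo:
                print(token, end="", flush=True)
            parts.append(token)
    if echo:
        print()  # finish the line once the stream ends
    return "".join(parts)


# Stand-in chunks for illustration; in the notebook you would pass
# ollama.chat(model=model, messages=messages, stream=True) instead.
fake_chunks = [
    {"message": {"role": "assistant", "content": "The capital of Poland "}},
    {"message": {"role": "assistant", "content": "is Warsaw."}},
]
full_text = collect_stream(fake_chunks)
```

Keeping the joined text is handy when the streamed answer has to be appended to `messages` as an assistant turn for the next question.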