## Introduction

In this notebook, we'll explore how to use open-source models with **Ollama**.

**Ollama** is a convenient platform for running open-source AI models locally.
Before Ollama, running open-source LLMs locally was complicated: it was a very technical process that required a good understanding of computer hardware and architecture.

With Ollama, running local models is straightforward.

Here's all you need:
1. [Download Ollama](https://ollama.com/download) on your local system.
2. Download one of the local models on your computer using Ollama.
For example, if I want to use Llama3, I need to open the terminal and run:
```bash
$ ollama run llama3
```

If this is the first time I'm using the model, Ollama will download it first. Because the model has 8B parameters, the download takes a while.

Once the model is downloaded, we can also use it through the Ollama Python library.

To install the Ollama Python library, run the following command:
```bash
$ pip install ollama
```

And with these steps, you're ready to run the code from this notebook.

### Simple Response

Now it's time to test our model. Let's just ask a simple question to see how it works.

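```python
import ollama

# Name of a model that has already been downloaded with Ollama
model = "llama3"

# Send a single user message and wait for the complete reply
response = ollama.chat(
    model=model,
    messages=[{"role": "user", "content": "What's the capital of Poland?"}],
)

print(response["message"]["content"])
```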

The capital of Poland is Warsaw (Polish: Warszawa).


Awesome!

Here's all we did:
- `import ollama` to use the Ollama Python library
- `model = "llama3"` to define the model we want to use
- `ollama.chat()` to get the response. We used 2 parameters:
    1. `model` that we defined before
    2. `messages` where we keep the list of messages
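
Because `messages` is a plain list of role/content dictionaries, a follow-up question can be asked by appending the model's reply and the next user message to it. The sketch below assumes this pattern. `add_turn` and `ask` are hypothetical helper names, not part of the Ollama library, and `ask` needs a running Ollama server with `llama3` already downloaded.

```python
def add_turn(history, role, content):
    """Return a new message list with one more message appended."""
    return history + [{"role": role, "content": content}]


def ask(history, question, model="llama3"):
    """Send the history plus a new question; return (answer, updated history)."""
    import ollama  # imported here so add_turn() can be used without a server

    history = add_turn(history, "user", question)
    response = ollama.chat(model=model, messages=history)
    answer = response["message"]["content"]
    return answer, add_turn(history, "assistant", answer)


# Building a conversation by hand, without calling the model:
history = add_turn([], "user", "What's the capital of Poland?")
history = add_turn(history, "assistant", "The capital of Poland is Warsaw.")
history = add_turn(history, "user", "How many people live there?")
print([message["role"] for message in history])
```

Passing the updated history back into `ask` on every turn is what lets the model see the earlier questions and answers.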



## Parameters

Let's play with some LLM parameters:
1. System prompt.
2. Temperature.
3. Max tokens.


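In Ollama, max tokens is set with `num_predict` inside the `options` dictionary. Here the response is capped at 20 tokens:

```python
def generate_response(messages, **kwargs):
    # Extra keyword arguments (such as `options`) are passed straight to ollama.chat()
    response = ollama.chat(model=model, messages=messages, **kwargs)
    return response["message"]["content"]

messages = [{"role": "user", "content": "How to get rich?"}]
# num_predict limits the number of tokens the model generates
response = generate_response(messages, options={"num_predict": 20})

print(response)
```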

While there's no one-size-fits-all formula for getting rich, here are some general tips that


```python
ollama.chat(model=model, messages=messages)
```

Getting rich requires a combination of smart financial decisions, hard work, and a bit of luck. Here
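
The other two parameters from the list can be set in a similar way. This is a minimal sketch, assuming the system prompt is passed as a message with the `system` role and `temperature` goes into `options` next to `num_predict`. `build_request` and `ask_with_params` are hypothetical helper names, and the default values are arbitrary.

```python
def build_request(prompt, system=None, temperature=0.7, max_tokens=100):
    """Build the `messages` and `options` arguments for ollama.chat()."""
    messages = []
    if system:
        # The system prompt goes first so it frames the whole conversation
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    options = {"temperature": temperature, "num_predict": max_tokens}
    return {"messages": messages, "options": options}


def ask_with_params(prompt, model="llama3", **params):
    """Call the model with the given parameters (needs a running Ollama server)."""
    import ollama  # imported here so build_request() can be used without a server

    request = build_request(prompt, **params)
    response = ollama.chat(model=model, **request)
    return response["message"]["content"]


request = build_request(
    "How to get rich?",
    system="You answer in one short sentence.",
    temperature=0.2,
    max_tokens=50,
)
print(request["options"])
```

Lower temperature values should make the answers more predictable, and higher values make them more varied.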


### Streaming

A nice feature of Ollama is the ability to stream responses. After using ChatGPT or Claude, we expect responses to appear piece by piece as they are generated. Here's how to do it.

The biggest change comes from the `stream` parameter. We just set it to `True`.

With streaming enabled, `ollama.chat()` returns an iterator of chunks instead of a single response, so we also need to loop over it with a `for` loop.

Here's how:

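```python
import ollama

model = "llama3"

messages = [{"role": "user", "content": "What's the capital of Poland?"}]

# With stream=True, ollama.chat() yields chunks as the model generates them
for chunk in ollama.chat(model=model, messages=messages, stream=True):
    token = chunk["message"]["content"]
    if token is not None:
        # end="" keeps the tokens on one line; flush=True shows each one right away
        print(token, end="", flush=True)
```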

The capital of Poland is Warsaw (Polish: Warszawa).
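
When streaming, it's often useful to keep the full text as well, for example to add it to the message history afterwards. The sketch below assumes each chunk has the same `chunk["message"]["content"]` shape as in the loop above. `collect_stream` is a hypothetical helper, and `fake_chunks` is stand-in data so the example runs without an Ollama server.

```python
def collect_stream(chunks, show=True):
    """Print streamed chunks as they arrive and return the full text."""
    parts = []
    for chunk in chunks:
        token = chunk["message"]["content"]
        if token:
            if show:
                print(token, end="", flush=True)
            parts.append(token)
    if show:
        print()  # finish the line once the stream ends
    return "".join(parts)


# Stand-in chunks shaped like the streamed ones (not real model output)
fake_chunks = [
    {"message": {"content": token}}
    for token in ["The", " capital", " is", " Warsaw", "."]
]
text = collect_stream(fake_chunks, show=False)
```

With a running server, `ollama.chat(model=model, messages=messages, stream=True)` can be passed in place of `fake_chunks`.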

