### Ollama - Chat
This notebook demonstrates a range of language models available via Ollama (https://github.com/ollama/ollama).
Ollama runs a local service that downloads and runs models for you. You control it either through CLI commands or through a dedicated REST API, which makes it easy to integrate into any web app. This notebook therefore needs no dependencies beyond ordinary Python HTTP requests (the `requests` package).

Which language models can be used depends on the size of their weights, since the weights have to be loaded entirely into system memory. In our case, a Raspberry Pi 5 with 8GB of RAM leaves roughly 7GB of memory available to a model. Many models with 7bn parameters therefore work, but not all of them: the weight files I came across range from 4GB to 15GB.
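
The spread in weight sizes mostly comes from how many bits are stored per parameter. The following is a rough rule-of-thumb sketch, not a measurement: the function names and the `overhead_gb` allowance for runtime buffers are assumptions made for illustration, and real model files add some format-specific overhead.

```python
def estimate_weights_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10^9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters * bytes per parameter = gigabytes
    return n_params_billion * bytes_per_weight


def fits_in_memory(n_params_billion: float, bits_per_weight: float,
                   available_gb: float = 7.0, overhead_gb: float = 1.0) -> bool:
    """Check the estimate against the memory budget (overhead_gb is a guess)."""
    needed = estimate_weights_gb(n_params_billion, bits_per_weight) + overhead_gb
    return needed <= available_gb


for bits in (4, 8, 16):
    size = estimate_weights_gb(7, bits)
    print(f"7bn parameters at {bits}-bit: ~{size:.1f} GB, fits: {fits_in_memory(7, bits)}")
```

Under these assumptions, a 7bn model only fits into the roughly 7GB budget when it is quantized to about 4 bits per weight.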

### Setup
See the repo's tutorial to learn how to spin up the Ollama service.

### Available Models
To see which models are installed on the Raspi, run the cell in the Configuration section of this notebook.

Browse the models available for Ollama:
https://ollama.com/search

Tested models are:
- `phi3` (3.8b)
- `deepseek-r1` (7b)
- `llama3.2` (3b)
- `codellama` (7b)
- `moondream` (1.8b, multi-modal)

--------------------------------------------

### Chat

#### Configuration

In [None]:
# run this cell first
import requests
import lib

url = "http://127.0.0.1:11434/api"

model = "deepseek-r1" # define model here

stream = True # set to False to receive the complete answer in one piece instead of as it is generated
keep_alive = True # set to True to keep the model loaded, which speeds up consecutive prompts to the same model;
# Ollama then frees the memory automatically after 5 minutes, or you can unload it manually with lib.unload_model(f"{url}/generate", model)
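
The five-minute default corresponds to the `keep_alive` field of Ollama's REST API, and a value of `0` asks the server to unload the model right away. How `lib.unload_model` works internally is not shown in this notebook; the sketch below assumes it sends such a request. The helper names `unload_payload` and `with_keep_alive` are hypothetical.

```python
def unload_payload(model: str) -> dict:
    """Request body asking Ollama to unload `model` immediately."""
    # No prompt is included: with keep_alive set to 0 the server just unloads.
    return {"model": model, "keep_alive": 0}


def with_keep_alive(data: dict, keep_alive) -> dict:
    """Return a copy of a request body with an explicit keep_alive value."""
    return {**data, "keep_alive": keep_alive}


print(unload_payload("deepseek-r1"))
print(with_keep_alive({"model": "deepseek-r1", "prompt": "hi"}, "10m"))
```

A possible use would be `requests.post(f"{url}/generate", json=unload_payload(model))`. Passing a duration string such as `"10m"` with a normal prompt should keep the model loaded for that long instead.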

In [None]:
# Show the Ollama server's info about the model (also tells you whether it is installed)
lib.check_model(url, model)

In [None]:
# List locally installed models
lib.list_models(url)
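
Ollama lists locally installed models at `GET /api/tags`, returning a JSON object with a `models` array. The `lib` helpers are not shown here, so the following is only a sketch of how such a response could be inspected. The function names are hypothetical and the sample payload is hand-written with made-up sizes.

```python
def installed_models(tags_payload: dict) -> list[tuple[str, float]]:
    """Return (name, size in GB) for each model in a GET /api/tags response."""
    return [
        (entry["name"], round(entry.get("size", 0) / 1e9, 1))
        for entry in tags_payload.get("models", [])
    ]


def is_installed(tags_payload: dict, model: str) -> bool:
    """Look for `model`, treating a name without a tag as ':latest'."""
    wanted = model if ":" in model else f"{model}:latest"
    return any(name == wanted for name, _ in installed_models(tags_payload))


# Hand-written sample in the shape /api/tags returns (sizes are made up).
sample = {"models": [
    {"name": "deepseek-r1:latest", "size": 4_700_000_000},
    {"name": "phi3:latest", "size": 2_200_000_000},
]}
print(installed_models(sample))
print(is_installed(sample, "deepseek-r1"))
```

Against a running server, the payload could come from `requests.get(f"{url}/tags").json()`.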

##### Demo 1: Chat without chat history

In [None]:
data = {
    "model": model,
    "prompt": "who won the 2022 Champions League in soccer?"
}
result = requests.post(f"{url}/generate", json=data, stream=stream)

response_to_prompt = lib.extract_response_stream(result) if stream else lib.extract_responses(result)

if not keep_alive:
    lib.unload_model(f"{url}/generate", model)
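
With `stream = True`, Ollama answers with one JSON object per line. `/api/generate` carries the text in a `response` field, `/api/chat` carries it in `message.content`, and the last object has `"done": true`. The sketch below shows how a helper like `lib.extract_response_stream` might read such a stream. That is an assumption, since the helper's code is not part of this notebook, and `iter_stream_text` is a hypothetical name.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from Ollama's newline-delimited JSON stream."""
    for raw in lines:
        if not raw:  # skip empty lines
            continue
        chunk = json.loads(raw)
        if "message" in chunk:  # shape used by /api/chat
            yield chunk["message"].get("content", "")
        else:  # shape used by /api/generate
            yield chunk.get("response", "")
        if chunk.get("done"):
            break


# Hand-written sample lines standing in for result.iter_lines().
sample_lines = [
    b'{"response": "Hello ", "done": false}',
    b'{"response": "world", "done": false}',
    b'{"response": "", "done": true}',
]
print("".join(iter_stream_text(sample_lines)))
```

With a real request, `for piece in iter_stream_text(result.iter_lines()): print(piece, end="")` would print the answer as it arrives.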

##### Demo 2: Chat with chat history

In [None]:
# run this once at the beginning of a chat
messages = []

In [None]:
# run this for every new chat message you want to send

# put every new prompt in here
prompt = "who was playing on the field?"

# record prompt
messages.append(lib.to_message("user", prompt))

data = {
    "model": model,
    "messages": messages,
}

# dispatch generation request
result = requests.post(f"{url}/chat", json=data, stream=stream)

response_to_prompt = lib.extract_response_stream(result, history=True) if stream else lib.extract_responses(result, history=True)

# record the AI's answer
messages.append(lib.to_message(
    "assistant", 
    lib.without_chain_of_thought(response_to_prompt, model)
))

if not keep_alive:
    lib.unload_model(f"{url}/generate", model)
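
Reasoning models such as `deepseek-r1` emit their reasoning inside `<think>...</think>` tags before the actual answer, and keeping that text in `messages` makes the history grow quickly. The code of `lib.without_chain_of_thought` is not shown in this notebook; the sketch below assumes it removes that block. The name `strip_think` and the sample text are made up for illustration.

```python
import re

# deepseek-r1 wraps its reasoning in <think>...</think> ahead of the answer.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> blocks and the surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()


raw = "<think>\nThe user greets me.\n</think>\n\nHello! How can I help?"
print(strip_think(raw))
```

Text without such a block passes through unchanged, so the same call would be safe for models that do not emit reasoning.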