# Ollama access

Our `compose.yaml` starts Ollama on the Tailscale network (`tailnet`) and makes it reachable on `localhost`: from within the running notebook container, the Ollama API answers at `http://localhost:11434`, even though the Ollama service has a different Docker IP (i.e., both the notebook and Ollama are exposed over Tailscale).
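Before going further, it can help to confirm that the Ollama endpoint answers from inside the notebook container. The following is a minimal sketch using only the Python standard library. It assumes Ollama's `GET /api/version` endpoint (listed in the Ollama API reference) and the `http://localhost:11434` address above. The helper name `ollama_is_reachable` is hypothetical.

```python
import json
import urllib.request


def ollama_is_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers GET /api/version at base_url (hypothetical helper)."""
    url = base_url.rstrip("/") + "/api/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # A healthy server replies with a small JSON object such as {"version": "..."}
            payload = json.loads(resp.read().decode("utf-8"))
            return isinstance(payload, dict) and "version" in payload
    except (OSError, ValueError):
        # OSError covers URLError, refused connections and timeouts;
        # ValueError covers a non-JSON reply from something else on that port
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_reachable())
```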


## Python

This notebook requires a few libraries that are not pre-installed in CoreAI. We install them into a dedicated virtual environment that can still see CoreAI's system packages (`--system-site-packages`).

**Create and Activate the Virtual Environment:**

Open a terminal from within Jupyter (`File -> New -> Terminal`).
Type `bash` to get a shell compatible with the following commands.
Change to the directory containing this notebook (where the environment will be created), then run:

```bash
export PROJECT_NAME="Ollama"
export PIP_CACHE_DIR="$(pwd)/.cache/pip"
mkdir -p "$PIP_CACHE_DIR"
python -m venv --system-site-packages myvenv
source myvenv/bin/activate
pip install ipykernel
python -m ipykernel install --user --name=${PROJECT_NAME}_myvenv --display-name="Python (${PROJECT_NAME}_myvenv)"
echo ""; echo "Before continuing load the created Python kernel: Python (${PROJECT_NAME}_myvenv)"
```

Load the Python kernel described above before running the cell below (it might take a few seconds for the kernel to appear in the list of kernels).

**AFTER the kernel is loaded, install the required libraries (from `requirements.txt`)**

The rest of this notebook relies on the correct kernel being loaded and the environment variables being set.

In [None]:
!. ./myvenv/bin/activate && pip install -r requirements.txt
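Once the install finishes, a quick sanity check can confirm that the notebook is running on the virtual-environment kernel and that the packages used later (`openai` and `ollama`) are importable. This is a minimal sketch. It assumes those two modules are what `requirements.txt` provides, and the helper names are hypothetical.

```python
import importlib.util
import sys


def running_in_venv() -> bool:
    """True when the interpreter was started from a virtual environment (e.g. myvenv)."""
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the top-level module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", running_in_venv())
    missing = missing_modules(["openai", "ollama"])
    print("Missing modules:", missing or "none")
```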

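[Infotrend's CoreAI](https://github.com/Infotrend-Inc/CoreAI) is an Ubuntu 24.04-based Docker container with PyTorch, OpenCV (GPU build), and CUDA.

Because it is Ubuntu-based, we can install additional components with `apt`. Here we install `jq`, which later cells use to pretty-print the JSON returned by the Ollama API.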

In [None]:
! sudo apt update && sudo apt install -y jq

## Pulling an Ollama model with `docker exec`

It is also possible to run the `ollama` CLI directly inside the Ollama container. This requires access to the Docker daemon, so it is typically run from a terminal on the Docker host:

```bash
docker exec -it oi25-coreai-ollama ollama pull llama3.1:8b
```
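The same pull can be scripted from Python on a machine that has access to the Docker daemon. This is a minimal sketch using `subprocess`. It assumes the container name `oi25-coreai-ollama` from the command above, and it drops `-it` because a script has no interactive terminal to attach. The helper names are hypothetical.

```python
import subprocess


def docker_ollama_cmd(container: str, *ollama_args: str) -> list[str]:
    """Build the argv for running an `ollama` subcommand inside a container."""
    return ["docker", "exec", container, "ollama", *ollama_args]


def docker_pull(model: str, container: str = "oi25-coreai-ollama") -> int:
    """Run `ollama pull <model>` in the container and return the process exit code."""
    result = subprocess.run(docker_ollama_cmd(container, "pull", model), check=False)
    return result.returncode


# Example (requires Docker access):
# docker_pull("llama3.1:8b")
```

Building the argument list separately from running it keeps the command inspectable. For instance, `docker_ollama_cmd("oi25-coreai-ollama", "list")` produces the equivalent of `ollama list`.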

## Ollama access from a shell: using `curl`

Details on how to use the API are available at https://github.com/ollama/ollama/blob/main/docs/api.md

In [None]:
# Pull a model
! curl http://localhost:11434/api/pull -d '{ "model": "llama3.1:8b" }'

In [None]:
! curl http://localhost:11434/api/pull -d '{ "model": "gpt-oss:20b" }'
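By default, `/api/pull` streams its progress as newline-delimited JSON objects. Each object carries a `status` field, plus `total` and `completed` byte counts while layers download, and the last object has the status `success`. The following is a minimal standard-library sketch of the same request that assumes this streaming format. The function names are hypothetical.

```python
import json
import urllib.request


def iter_ndjson(lines):
    """Yield one decoded JSON object per non-empty line (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if line:
            yield json.loads(line)


def final_pull_status(lines) -> str:
    """Print progress for each status object and return the last status seen."""
    status = ""
    for event in iter_ndjson(lines):
        status = event.get("status", status)
        if event.get("total") and "completed" in event:
            pct = 100 * event["completed"] / event["total"]
            print(f"{status}: {pct:.1f}%")
        else:
            print(status)
    return status


def pull_model(model: str, base_url: str = "http://localhost:11434") -> str:
    """POST /api/pull and consume the streamed progress; returns the final status."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/pull",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # The HTTP response iterates line by line, matching the NDJSON framing
        return final_pull_status(resp)


# Example (requires a running Ollama server):
# pull_model("llama3.1:8b")
```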

In [None]:
# Check list of downloaded model(s)
! curl -X GET http://localhost:11434/api/tags | jq '.'
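`/api/tags` returns a JSON object whose `models` array lists each local model, with fields such as `name` and `size` (in bytes). This minimal sketch fetches that list and summarizes the names and sizes in Python instead of `jq`. The helper names are hypothetical.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """GET /api/tags and return the decoded JSON body."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/tags") as resp:
        return json.load(resp)


def summarize_models(tags: dict) -> list[tuple[str, float]]:
    """Return (name, size in GB) pairs, largest model first."""
    rows = [(m["name"], m.get("size", 0) / 1e9) for m in tags.get("models", [])]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Example (requires a running Ollama server):
# for name, gb in summarize_models(fetch_tags()):
#     print(f"{name:30} {gb:6.1f} GB")
```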

In [None]:
# Show Model specific information
! curl http://localhost:11434/api/show -d '{ "model": "llama3.1:8b" }' | jq '.model_info'
! curl http://localhost:11434/api/show -d '{ "model": "gpt-oss:20b" }' | jq '.model_info'
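The `model_info` object returned by `/api/show` uses architecture-prefixed keys. For example, a Llama model reports `general.architecture` as `llama` and stores its context window under `llama.context_length`. This minimal sketch reads those two keys. It assumes the same naming pattern holds for the model being inspected, which may not be true for every architecture. The function names are hypothetical.

```python
import json
import urllib.request


def show_model(model: str, base_url: str = "http://localhost:11434") -> dict:
    """POST /api/show for one model and return the decoded JSON body."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def context_length(model_info: dict):
    """Return '<architecture>.context_length' from model_info, or None if absent."""
    arch = model_info.get("general.architecture")
    if arch is None:
        return None
    return model_info.get(f"{arch}.context_length")


# Example (requires a running Ollama server):
# print(context_length(show_model("llama3.1:8b")["model_info"]))
```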

In [None]:
%%bash

## Asking a question of the model
# - disable streaming to receive the whole answer in a single JSON response (instead of token by token)
# - set a `seed` and a `temperature` of `0` so that asking the same model the same question returns the same answer


curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "messages": [
    {
      "role": "user",
      "content": "What is OpenStack"
    }
  ],
  "options": {
    "seed": 101,
    "temperature": 0
  },
  "stream": false
}'
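The same non-streaming, seeded request can be sent from Python using only the standard library. This minimal sketch mirrors the `curl` body above (`stream: false`, `seed`, `temperature`). With streaming disabled, `/api/chat` returns one JSON object whose `message.content` holds the full answer. The function names are hypothetical.

```python
import json
import urllib.request


def build_chat_payload(model: str, question: str, seed: int = 101, temperature: float = 0.0) -> dict:
    """Build an /api/chat body with a single user message and fixed sampling options."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "options": {"seed": seed, "temperature": temperature},
        "stream": False,
    }


def ask(model: str, question: str, base_url: str = "http://localhost:11434") -> str:
    """Send a non-streaming chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/chat",
        data=json.dumps(build_chat_payload(model, question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]


# Example (requires a running Ollama server); repeating the call with the
# same seed and temperature 0 should normally return the same text:
# print(ask("llama3.1:8b", "What is OpenStack"))
```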

In [None]:
%%bash

## Asking a question of the model
# - disable streaming to receive the whole answer in a single JSON response (instead of token by token)
# - set a `seed` and a `temperature` of `0` so that asking the same model the same question returns the same answer


curl http://localhost:11434/api/chat -d '{
  "model": "gpt-oss:20b",
  "messages": [
    {
      "role": "user",
      "content": "What is OpenStack"
    }
  ],
  "options": {
    "seed": 101,
    "temperature": 0
  },
  "stream": false
}'

# Ollama access with Python

### OpenAI API

We use Ollama's OpenAI compatibility layer to access the installed models through the `openai` Python client. For more details, see https://github.com/ollama/ollama/blob/main/docs/openai.md

In [None]:
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama' # required but ignored
)

response = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'What is the OpenInfra foundation',
        }
    ],
    model='llama3.1:8b'
)

print(response.choices[0].message.content)

In [None]:
response = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'What is the OpenInfra foundation',
        }
    ],
    model='gpt-oss:20b'
)

print(response.choices[0].message.content)
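The `openai` client is a thin wrapper over HTTP. It POSTs to `/chat/completions` under the configured `base_url` and reads `choices[0].message.content` from the reply. The following is a minimal standard-library sketch of that exchange, which can be useful for checking the compatibility layer without the SDK. The helper names are hypothetical. The `Authorization` header mirrors what OpenAI-style clients send, since the key is required but ignored.

```python
import json
import urllib.request


def openai_style_request(model: str, question: str, base_url: str = "http://localhost:11434/v1/") -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible chat completions route."""
    body = {"model": model, "messages": [{"role": "user", "content": question}]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer ollama"},
        method="POST",
    )


def first_choice_text(response: dict) -> str:
    """Extract the first choice's text, as response.choices[0].message.content does."""
    return response["choices"][0]["message"]["content"]


# Example (requires a running Ollama server):
# with urllib.request.urlopen(openai_style_request("llama3.1:8b", "What is the OpenInfra foundation")) as resp:
#     print(first_choice_text(json.load(resp)))
```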

### Ollama API

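The official Ollama Python library (https://github.com/ollama/ollama-python) uses Ollama's native API. By default, it connects to the local server on port 11434, which can be overridden with the `OLLAMA_HOST` environment variable. The cell below streams the reply as it is generated.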

In [None]:
from ollama import chat

stream = chat(
    model='gpt-oss:20b',
    messages=[{'role': 'user', 'content': 'What is the OpenStack Scientific SIG?'}],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
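With `stream=True`, each chunk carries a fragment of the reply in `message.content`, and the final chunk has `done` set to true. Over raw HTTP, the same stream arrives from `/api/chat` as newline-delimited JSON. This minimal sketch reassembles such a stream into the full answer, assuming that chunk shape. The helper name is hypothetical.

```python
import json


def collect_stream(lines) -> str:
    """Concatenate message.content from NDJSON chat chunks until one reports done."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

Fed with an HTTP response from `/api/chat` sent with `"stream": true`, this returns in one string the same text that the loop above prints piece by piece.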