# OLLAMA - REST API Approach

> This page was generated from [ollama-interactive-inference/ollama-sif-api-ibex.ipynb](https://github.com/kaust-rccl/Data-science-onboarding/tree/main/notebooks/inference/ollama-interactive-inference/ollama-sif-api-ibex.ipynb). You can [view or Download notebook](https://github.com/kaust-rccl/Data-science-onboarding/tree/main/notebooks/inference/ollama-interactive-inference/ollama-sif-api-ibex.ipynb). Or [view it on nbviewer](https://nbviewer.org/github/kaust-rccl/Data-science-onboarding/tree/main/notebooks/inference/ollama-interactive-inference/ollama-sif-api-ibex.ipynb)

## Objective
In this notebook, we use Ollama through its REST API.

## Initial Setup
If you haven't installed conda yet, please follow :ref:`conda_ibex_` to get started.

After conda has been installed, save the following environment YAML file on Ibex as ``ollama_env.yaml``:

```yml
name: ollama_env
channels:
- conda-forge
dependencies:
- _libgcc_mutex=0.1
- _openmp_mutex=4.5
- bzip2=1.0.8
- ca-certificates=2025.7.14
- icu=75.1
- ld_impl_linux-64=2.44
- libexpat=2.7.1
- libffi=3.4.6
- libgcc=15.1.0
- libgcc-ng=15.1.0
- libgomp=15.1.0
- liblzma=5.8.1
- libmpdec=4.0.0
- libsqlite=3.50.3
- libstdcxx=15.1.0
- libstdcxx-ng=15.1.0
- libuuid=2.38.1
- libzlib=1.3.1
- ncurses=6.5
- openssl=3.5.1
- pip=25.1.1
- python=3.13.5
- python_abi=3.13
- readline=8.2
- tk=8.6.13
- tzdata=2025b
- pip:
    - annotated-types==0.7.0
    - anyio==4.9.0
    - argon2-cffi==25.1.0
    - argon2-cffi-bindings==21.2.0
    - arrow==1.3.0
    - asttokens==3.0.0
    - async-lru==2.0.5
    - attrs==25.3.0
    - babel==2.17.0
    - beautifulsoup4==4.13.4
    - bleach==6.2.0
    - certifi==2025.7.14
    - cffi==1.17.1
    - charset-normalizer==3.4.2
    - comm==0.2.2
    - debugpy==1.8.15
    - decorator==5.2.1
    - defusedxml==0.7.1
    - executing==2.2.0
    - fastjsonschema==2.21.1
    - fqdn==1.5.1
    - h11==0.16.0
    - httpcore==1.0.9
    - httpx==0.28.1
    - idna==3.10
    - ipykernel==6.30.0
    - ipython==9.4.0
    - ipython-pygments-lexers==1.1.1
    - ipywidgets==8.1.7
    - isoduration==20.11.0
    - jedi==0.19.2
    - jinja2==3.1.6
    - json5==0.12.0
    - jsonpointer==3.0.0
    - jsonschema==4.25.0
    - jsonschema-specifications==2025.4.1
    - jupyter==1.1.1
    - jupyter-client==8.6.3
    - jupyter-console==6.6.3
    - jupyter-core==5.8.1
    - jupyter-events==0.12.0
    - jupyter-lsp==2.2.6
    - jupyter-server==2.16.0
    - jupyter-server-terminals==0.5.3
    - jupyterlab==4.4.5
    - jupyterlab-pygments==0.3.0
    - jupyterlab-server==2.27.3
    - jupyterlab-widgets==3.0.15
    - lark==1.2.2
    - markupsafe==3.0.2
    - matplotlib-inline==0.1.7
    - mistune==3.1.3
    - nbclient==0.10.2
    - nbconvert==7.16.6
    - nbformat==5.10.4
    - nest-asyncio==1.6.0
    - notebook==7.4.4
    - notebook-shim==0.2.4
    - ollama==0.5.1
    - overrides==7.7.0
    - packaging==25.0
    - pandocfilters==1.5.1
    - parso==0.8.4
    - pexpect==4.9.0
    - platformdirs==4.3.8
    - prometheus-client==0.22.1
    - prompt-toolkit==3.0.51
    - psutil==7.0.0
    - ptyprocess==0.7.0
    - pure-eval==0.2.3
    - pycparser==2.22
    - pydantic==2.11.7
    - pydantic-core==2.33.2
    - pygments==2.19.2
    - python-dateutil==2.9.0.post0
    - python-json-logger==3.3.0
    - pyyaml==6.0.2
    - pyzmq==27.0.0
    - referencing==0.36.2
    - requests==2.32.4
    - rfc3339-validator==0.1.4
    - rfc3986-validator==0.1.1
    - rfc3987-syntax==1.1.0
    - rpds-py==0.26.0
    - send2trash==1.8.3
    - setuptools==80.9.0
    - six==1.17.0
    - sniffio==1.3.1
    - soupsieve==2.7
    - stack-data==0.6.3
    - terminado==0.18.1
    - tinycss2==1.4.0
    - tornado==6.5.1
    - traitlets==5.14.3
    - types-python-dateutil==2.9.0.20250708
    - typing-extensions==4.14.1
    - typing-inspection==0.4.1
    - uri-template==1.3.0
    - urllib3==2.5.0
    - wcwidth==0.2.13
    - webcolors==24.11.1
    - webencodings==0.5.1
    - websocket-client==1.8.0
    - widgetsnbextension==4.0.14
```

Run the following command to build the conda environment:
```bash
conda env create -f ollama_env.yaml
```
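
Once the environment is built, a quick check from Python can confirm that the packages the notebook relies on are actually installed. The sketch below is a convenience that is not part of the original setup: `missing_packages` is a hypothetical helper name, and the package list passed to it is only an example drawn from the environment file above.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run inside the activated environment (conda activate ollama_env).
    print("Missing:", missing_packages(["requests", "ollama", "jupyterlab"]))
```

An empty list means every listed package was found in the active environment.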

## Starting JupyterLab
Follow [`using_jupyter`](../../jupyter) to start JupyterLab on an Ibex GPU node, using your conda environment instead of the ``machine_learning`` module.
To do so, make the following changes to the Jupyter launch script:
```bash
#module load machine_learning/2024.01
conda activate ollama_env
```

## Starting The Ollama Server
Start the OLLAMA REST API server using the following bash script in a terminal:
```bash
#!/bin/bash

# Cleanup process while exiting the server
cleanup() {
    echo "🧹   Cleaning up before exit..."
    # Put your exit commands here, e.g.:
    rm -f $OLLAMA_PORT_TXT_FILE
    # Remove the Singularity instance
    singularity instance stop $SINGULARITY_INSTANCE_NAME
}
trap cleanup SIGINT  # Catch Ctrl+C (SIGINT) and run cleanup
#trap cleanup EXIT    # Also run on any script exit

# User Editable Section
# 1. Make target directory on /ibex/user/$USER/ollama_models_scratch to store your Ollama models
export OLLAMA_MODELS_SCRATCH=/ibex/user/$USER/ollama_models_scratch
mkdir -p $OLLAMA_MODELS_SCRATCH
# End of User Editable Section

SINGULARITY_INSTANCE_NAME="ollama"
OLLAMA_PORT_TXT_FILE='ollama_port.txt'

# 2. Load Singularity module
module load singularity

# 3. Pull OLLAMA docker image
singularity pull docker://ollama/ollama

# 4. Change the default port for OLLAMA_HOST: (default 127.0.0.1:11434)
export PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')

# 5. Save the assigned port to a file; the notebook reads it later to find the server.
echo "$PORT" > $OLLAMA_PORT_TXT_FILE

echo "OLLAMA PORT: $PORT  -- Stored in $OLLAMA_PORT_TXT_FILE"

# 6. Define the OLLAMA Host
export SINGULARITYENV_OLLAMA_HOST=127.0.0.1:$PORT

# 7. Change the directory where models are stored (default: ~/.ollama/models)
export SINGULARITYENV_OLLAMA_MODELS=$OLLAMA_MODELS_SCRATCH

# 8. Create an Instance:
singularity instance start --nv -B "/ibex/user:/ibex/user" ollama_latest.sif $SINGULARITY_INSTANCE_NAME

# 9. Run the Ollama REST API server (it stays in the foreground; press Ctrl+C to stop it and trigger the cleanup)
singularity exec instance://$SINGULARITY_INSTANCE_NAME bash -c "ollama serve"
```

> Note: Save the above script in a file called start_ollama_server.sh

```bash
# Run the script to start the Ollama server.
bash start_ollama_server.sh
```

The script does the following:
- Provides a user-editable section, where you define the Ollama models scratch directory.
- Saves the allocated port in a temporary `ollama_port.txt` file, which the Python notebook reads to find the Ollama server.
- Registers a cleanup handler that removes the port file and stops the Singularity instance when the script is terminated with CTRL+C.
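
Because the script writes the port file before the image pull and the server start, the file can exist while the server is not yet answering. The sketch below reads the port defensively and polls until the server responds. It uses only the standard library. `read_base_url` and `wait_until_ready` are hypothetical helper names, and the timeout values are arbitrary assumptions. It also assumes the notebook runs from the directory where `ollama_port.txt` was written.

```python
import time
import urllib.error
import urllib.request
from pathlib import Path


def read_base_url(port_file="ollama_port.txt"):
    """Build the server URL from the port file written by the start script."""
    text = Path(port_file).read_text().strip()
    port = int(text)  # raises ValueError if the file does not hold a number
    if not 0 < port < 65536:
        raise ValueError(f"Port out of range: {port}")
    return f"http://127.0.0.1:{port}"


def wait_until_ready(base_url, timeout=60.0, interval=2.0):
    """Poll the server root until it answers with 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry after a short pause
        time.sleep(interval)
    return False
```

A notebook cell could then call `wait_until_ready(read_base_url())` before sending any requests.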

## Using REST API Requests
Follow the Python notebook below. It contains code for testing the connection to the Ollama server, listing local models, pulling models, and chatting with the models:


### Initialization

```python
# Configuration
with open("ollama_port.txt") as f:
    PORT = f.read().strip()

BASE_URL = f"http://127.0.0.1:{PORT}"
print(BASE_URL)
```

```text
http://127.0.0.1:45855
```


```python
# Testing the server connectivity
import requests

try:
    r = requests.get(BASE_URL)
    print("Ollama is running!", r.status_code)
except requests.ConnectionError as e:
    print("Ollama is NOT reachable:", e)
```

```text
Ollama is running! 200
```


### Get a List of Local Models

```python
# Get a list of downloaded models
def list_local_models(base_url=BASE_URL):
    r = requests.get(f"{base_url}/api/tags")
    if r.ok:
        models = r.json().get("models", [])
        return [m["name"] for m in models]
    else:
        raise RuntimeError(f"Failed to list models: {r.text}")

list_local_models()
```

```text
['qwen3:latest', 'llama3:latest', 'gemma3:latest', 'deepseek-r1:1.5b']
```
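
The names returned above carry an explicit tag, while a pull request such as `qwen3` omits it and ends up listed as `qwen3:latest`. A minimal sketch of a presence check that accounts for this is shown below. `normalize_model_name` and `has_model` are hypothetical helpers, and the assumption is that an untagged name means the `latest` tag, which matches the listing above.

```python
def normalize_model_name(name):
    """Append the default ':latest' tag when the name carries no tag."""
    # Look only at the last path segment so a registry host with a port
    # (for example "host:5000/model") is not mistaken for a tag.
    last_segment = name.rsplit("/", 1)[-1]
    return name if ":" in last_segment else f"{name}:latest"


def has_model(local_models, wanted):
    """Check whether `wanted` is in a list like the one list_local_models() returns."""
    available = {normalize_model_name(m) for m in local_models}
    return normalize_model_name(wanted) in available
```

This makes it possible to skip a pull when `has_model(list_local_models(), "qwen3")` is already true.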

### Pull The Model

```python
# Pull the required model
# You can check the available models in: https://ollama.com/library
import requests

def pull_model(model_name, base_url=BASE_URL):
    url = f"{base_url}/api/pull"
    response = requests.post(url, json={"name": model_name}, stream=True)

    if response.status_code != 200:
        print("❌ Failed to pull model:", response.text)
        return

    for line in response.iter_lines():
        if line:
            decoded = line.decode("utf-8")
            print(decoded)

# Usage
pull_model("qwen3")
```

```text
{"status":"pulling manifest"}
{"status":"pulling a3de86cd1c13","digest":"sha256:a3de86cd1c132c822487ededd47a324c50491393e6565cd14bafa40d0b8e686f","total":5225374496}
{"status":"pulling a3de86cd1c13","digest":"sha256:a3de86cd1c132c822487ededd47a324c50491393e6565cd14bafa40d0b8e686f","total":5225374496}
{"status":"pulling a3de86cd1c13","digest":"sha256:a3de86cd1c132c822487ededd47a324c50491393e6565cd14bafa40d0b8e686f","total":5225374496}
{"status":"pulling a3de86cd1c13","digest":"sha256:a3de86cd1c132c822487ededd47a324c50491393e6565cd14bafa40d0b8e686f","total":5225374496}
{"status":"pulling a3de86cd1c13","digest":"sha256:a3de86cd1c132c822487ededd47a324c50491393e6565cd14bafa40d0b8e686f","total":5225374496}
{"status":"pulling a3de86cd1c13","digest":"sha256:a3de86cd1c132c822487ededd47a324c50491393e6565cd14bafa40d0b8e686f","total":5225374496}
{"status":"pulling a3de86cd1c13","digest":"sha256:a3de86cd1c132c822487ededd47a324c50491393e6565cd14bafa40d0b8e686f","total":5225374496}
{"status":"pulling
```
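
Printing every raw JSON line gets noisy for multi-gigabyte models. As a sketch, each streamed line could instead be condensed into a short status message. `describe_pull_status` is a hypothetical helper. It assumes that progress events may carry a `completed` byte count next to `total`, and that failures arrive as an `error` key, as described in the Ollama API documentation. The output above shows only `status`, `digest`, and `total`.

```python
import json


def describe_pull_status(line):
    """Turn one JSON line from the /api/pull stream into a short message."""
    event = json.loads(line)  # accepts the bytes yielded by iter_lines()
    if "error" in event:
        return f"error: {event['error']}"
    status = event.get("status", "")
    total = event.get("total")
    if total:
        # Assumption: "completed" is absent until some bytes have arrived.
        completed = event.get("completed", 0)
        return f"{status}: {100 * completed / total:.1f}%"
    return status
```

Inside `pull_model`, the `print(decoded)` call could be swapped for `print(describe_pull_status(line))`.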

### Running a sample query for testing the server connection

```python
import requests
import json

# Testing sample query
r = requests.post(
    f"{BASE_URL}/api/chat",
    json={
        "model": "deepseek-r1:1.5b",
        "messages": [{"role": "user", "content": "How old are you"}]
    },
    stream=True
)

for line in r.iter_lines():
    if line:
        data = json.loads(line.decode('utf-8'))
        if "message" in data:
            print(data["message"]["content"], end="", flush=True)
```

```text
<think>

</think>

I'm DeepSeek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek. I'll do my best to help you. How can I assist you today?
```
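
The loop above prints chunks as they arrive. When the full reply is needed as a single string, the same stream can be folded into one value. The sketch below takes any iterable of JSON lines, so it can be tried without a running server. `collect_reply` is a hypothetical helper, and the handling of the `error` and `done` keys is an assumption based on the Ollama API documentation.

```python
import json


def collect_reply(lines):
    """Join the content chunks of an /api/chat stream into one string."""
    parts = []
    for line in lines:
        if not line:
            continue  # iter_lines() can yield empty keep-alive lines
        data = json.loads(line)
        if "error" in data:
            raise RuntimeError(data["error"])
        parts.append(data.get("message", {}).get("content", ""))
        if data.get("done"):
            break  # final event of the stream
    return "".join(parts)
```

With the request above, this would be called as `collect_reply(r.iter_lines())`.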

### Querying the Model

```python
import requests
import json


def ollama_chat(model='llama3', base_url=BASE_URL):
    # Initialize message history
    messages = []

    print("🤖 Chat started — type 'exit' to quit.\n")

    while True:
        user_input = input("👤 You: ")
        if user_input.lower() == 'exit':
            print("👋 Goodbye!")
            break

        # Compose full message payload with system + history
        request_messages = [
            {'role': 'system', 'content': 'You are a helpful assistant. You only give a short sentence by answer.'}
        ] + messages + [{'role': 'user', 'content': user_input}]

        # Start request
        try:
            response = requests.post(
                f"{base_url}/api/chat",
                json={"model": model, "messages": request_messages},
                stream=True
            )

            assistant_reply = ""
            print("🤖 Ollama:", end=" ", flush=True)

            for line in response.iter_lines():
                if line:
                    data = json.loads(line.decode("utf-8"))
                    if "message" in data and "content" in data["message"]:
                        chunk = data["message"]["content"]
                        assistant_reply += chunk
                        print(chunk, end='', flush=True)

            print("\n")

            # Add interaction to message history
            messages.append({'role': 'user', 'content': user_input})
            messages.append({'role': 'assistant', 'content': assistant_reply})

        except Exception as e:
            print("\n⚠️ Error:", e)

ollama_chat(model='qwen3')
```

```text
🤖 Chat started — type 'exit' to quit.

👤 You:  hello

🤖 Ollama: <think>
Okay, the user said "hello". I need to respond with a short sentence. Let me make sure to keep it friendly and concise. Maybe something like "Hello! How can I assist you today?" That sounds good. It's polite and opens the door for them to ask for help. I should stick to that.
</think>

Hello! How can I assist you today?

👤 You:  what model is you?

🤖 Ollama: <think>
Okay, the user asked, "what model is you?" I need to respond in a short sentence. Let me check the previous conversation. The user said "hello" and I replied with a greeting. Now they're asking about my model.

I should mention that I'm a large language model developed by Alibaba Cloud. But wait, the user might be asking for the specific model name, like Qwen. However, the initial instruction says to keep answers short. Maybe just state that I'm a large language model from Alibaba Cloud. Let me confirm the exact name. The model is called Qwen, but maybe the user wants the full name. Alternatively, if the user is asking for the model type, like whether I'm a transformer-based model, but the question is more about the specific model name. 

Wait, the user's question is "what model is you?" which is a bit unclear. They might be asking for the model's name. Since the user is asking about the model, I should provide the name. But the initial instruction says to keep answer

👤 You:  exit

👋 Goodbye!
```
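
Reasoning models such as the ones used above wrap their deliberation in `<think>...</think>` tags, and the chat loop stores that text in the history along with the answer. One option, sketched below, is to strip those blocks before the reply is appended to the history, which keeps later requests smaller. `strip_think` is a hypothetical helper, and whether dropping the reasoning is desirable depends on the use case.

```python
import re

# Non-greedy match so several separate blocks are removed one by one.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think(text):
    """Remove complete <think>...</think> blocks and surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()
```

In `ollama_chat`, this would mean appending `strip_think(assistant_reply)` instead of the raw reply. A block that was cut off before its closing tag is left untouched by this pattern.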
