# Interacting with Ollama Locally Using Python

This notebook demonstrates how to set up and interact with a local Ollama instance using Python. It covers two primary methods: using the OpenAI-compatible API endpoint and using the native `ollama` Python package.

## Table of Contents

- [1. Setup](#1.-Setup)
  - [1.1. Verify Ollama Installation](#1.1.-Verify-Ollama-Installation)
  - [1.2. Install Python Dependencies](#1.2.-Install-Python-Dependencies)
- [2. Interacting with Ollama](#2.-Interacting-with-Ollama)
  - [2.1. Using the OpenAI-Compatible Endpoint](#2.1.-Using-the-OpenAI-Compatible-Endpoint)
  - [2.2. Using the `ollama` Python Package](#2.2.-Using-the-ollama-Python-Package)
    - [Basic Chat](#Basic-Chat)
    - [Streaming Responses](#Streaming-Responses)
    - [Handling Streamed Chunks](#Handling-Streamed-Chunks)

## 1. Setup

### 1.1. Verify Ollama Installation

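First, ensure that Ollama is installed and running on your local machine. You can verify this by listing the locally downloaded models from the command line. The examples in this notebook use `smollm:1.7b`; if it does not appear in the list, download it first with `ollama pull smollm:1.7b`.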

In [None]:
! ollama list
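
The same check can be done from Python without the CLI. The sketch below assumes the server listens on the default address `http://localhost:11434` and queries Ollama's `/api/tags` route, which lists local models. The helper name `list_local_models` is made up for this notebook; it returns `None` instead of raising when the server cannot be reached.

```python
import json
import urllib.request


def list_local_models(base_url="http://localhost:11434", timeout=2.0):
    """Return the names of locally available models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers invalid JSON.
        return None
    return [model.get("name", "") for model in payload.get("models", [])]


models = list_local_models()
print("Ollama is not reachable" if models is None else models)
```

An empty list means the server is running but no model has been pulled yet.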

### 1.2. Install Python Dependencies

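To interact with Ollama, you'll need to install the necessary Python libraries. This notebook uses Conda for environment management; the commands below install into an environment named `hw-1-submission`, so substitute the name of your own environment.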

#### OpenAI Library
The `openai` library allows you to connect to Ollama's OpenAI-compatible endpoint.

In [None]:
! conda install --name hw-1-submission openai -y

#### Ollama Library
For a more direct approach, you can use the native `ollama` Python package. On `conda-forge` the Python client is published as `ollama-python` (the `conda-forge` package named `ollama` is the Ollama server itself), so add the channel first if it is not already configured.

In [None]:
! conda config --add channels conda-forge
! conda config --set channel_priority strict
! conda install --name hw-1-submission ollama-python -y
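
After installing, it can help to confirm that both libraries are importable from the kernel this notebook is running in. This is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, and it checks import names (`openai`, `ollama`), not Conda package names.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(["openai", "ollama"])
print("All dependencies found" if not missing else f"Missing: {missing}")
```

If a package is reported missing even though the install succeeded, the notebook kernel is probably running in a different environment from the one you installed into.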

## 2. Interacting with Ollama

### 2.1. Using the OpenAI-Compatible Endpoint

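Ollama's local server exposes an OpenAI-compatible API under the `/v1` path. You can therefore use the `openai` Python client with your local models by pointing its `base_url` at the Ollama server. In the example below, the earlier `user` and `assistant` messages act as conversation history, so the model can resolve what "it" refers to in the final question.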

In [None]:
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',  # required, but unused
)

response = client.chat.completions.create(
    model="smollm:1.7b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The LA Dodgers won in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

print(response.choices[0].message.content)
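
To see what "OpenAI-compatible" means on the wire, the sketch below builds the equivalent HTTP request by hand with the standard library. It assumes the same base URL as above and the OpenAI-style `/chat/completions` route; `build_chat_request` is an illustrative helper, not part of either library. The request is only constructed here, and the lines that send it are left commented out because they need a running server.

```python
import json
import urllib.request


def build_chat_request(model, messages, base_url="http://localhost:11434/v1"):
    """Build (but do not send) a POST request for the chat completions route."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Ollama does not check the key, but OpenAI-style clients always send one.
            "Authorization": "Bearer ollama",
        },
        method="POST",
    )


request = build_chat_request(
    "smollm:1.7b",
    [{"role": "user", "content": "Where was the 2020 World Series played?"}],
)
print(request.get_method(), request.full_url)

# Sending it requires a running server with the model pulled:
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```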

### 2.2. Using the `ollama` Python Package

The official `ollama` package provides a more direct way to interact with the Ollama server. Below are a few examples of its usage.

#### Basic Chat

This is the simplest way to send a prompt and receive a complete response.

In [None]:
from ollama import chat, ChatResponse

response: ChatResponse = chat(model='smollm:1.7b', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])

# You can access the response content as a dictionary key
print(response['message']['content'])

# Or as a field from the response object
print(response.message.content)
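
If you call the model from several cells, a small wrapper keeps the message boilerplate in one place. The sketch below is one possible shape, not something the `ollama` package provides: `ask` and its `chat_fn` parameter are hypothetical, and `chat_fn` exists only so the helper can be exercised with a stub when no server is running.

```python
def ask(prompt, model="smollm:1.7b", chat_fn=None):
    """Send a single user prompt and return the reply text."""
    if chat_fn is None:
        # Imported lazily so the helper can be defined without the package installed.
        from ollama import chat as chat_fn
    response = chat_fn(model=model, messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]


def fake_chat(model, messages):
    """Stand-in for ollama.chat that echoes the prompt back."""
    return {"message": {"role": "assistant", "content": f"echo: {messages[-1]['content']}"}}


print(ask("Why is the sky blue?", chat_fn=fake_chat))

# With a running server, omit chat_fn to use the real client:
# print(ask("Why is the sky blue?"))
```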

#### Streaming Responses

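For long-running requests, you can stream the response as it's being generated. With `stream=True`, `chat` returns an iterator of chunks rather than a single response, and each chunk carries the next piece of the reply.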

In [None]:
from ollama import chat

stream = chat(
    model='smollm:1.7b',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

#### Handling Streamed Chunks

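This example handles streamed chunks in more detail: it accumulates the full reply and, when a chunk carries a separate "thinking" trace, prints and stores that trace apart from the final answer. Thinking output appears only with models that support it; for other models, the loop simply collects the answer.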

In [None]:
from ollama import chat

stream = chat(
    model='smollm:1.7b',
    messages=[{'role': 'user', 'content': 'What is 17 × 23?'}],
    stream=True,
)

in_thinking = False
content = ''
thinking = ''

for chunk in stream:
    if hasattr(chunk['message'], 'thinking') and chunk['message']['thinking']:
        if not in_thinking:
            in_thinking = True
            print('Thinking:\n', end='', flush=True)
        print(chunk['message']['thinking'], end='', flush=True)
        thinking += chunk['message']['thinking']
    elif chunk['message']['content']:
        if in_thinking:
            in_thinking = False
            print('\n\nAnswer:\n', end='', flush=True)
        print(chunk['message']['content'], end='', flush=True)
        content += chunk['message']['content']

# The final response can be appended to the message history for context in future requests
new_messages = [{'role': 'assistant', 'thinking': thinking, 'content': content}]
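
To continue the conversation, the accumulated reply goes back into the message list together with the next user turn. The sketch below shows one way to do that; `next_request_messages` is a hypothetical helper, the literal reply `"391"` stands in for real model output, and dropping an empty `thinking` field is a design choice of this sketch, not a requirement of the library.

```python
def next_request_messages(history, thinking, content, follow_up):
    """Return a new list: prior history, the assistant reply, then the next user turn."""
    assistant = {"role": "assistant", "content": content}
    if thinking:
        # Carry the thinking trace along only when the model produced one.
        assistant["thinking"] = thinking
    return [*history, assistant, {"role": "user", "content": follow_up}]


history = [{"role": "user", "content": "What is 17 × 23?"}]
messages = next_request_messages(history, thinking="", content="391", follow_up="Now add 9.")
for message in messages:
    print(message["role"], "->", message["content"])

# With a running server, the list can be passed straight back to chat():
# from ollama import chat
# reply = chat(model="smollm:1.7b", messages=messages)
```

Because the helper returns a new list, the original `history` is left unchanged and can be reused or logged separately.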