## Ollama local LLMs  

This notebook uses Ollama to run local LLMs through two SDKs:  
* Ollama SDK
* OpenAI SDK
  
Source 1: [Ollama Docs](https://docs.ollama.com/capabilities/tool-calling#python-2)  
Source 2: [OpenAI Cookbook](https://cookbook.openai.com/articles/gpt-oss/run-locally-ollama)  
  
**Setup:**  
1. To load a local model, run the following in a terminal: `ollama run gpt-oss:20b`
2. Run the notebook in a virtual environment (venv) with the libraries from `requirements.txt` installed
  
### 1. Ollama SDK

In [1]:
import sys
print(sys.executable)

c:\repo\LLM\venv\Scripts\python.exe


In [None]:
# !pip install -r /requirements.txt


[notice] A new release of pip is available: 25.2 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip
ERROR: Could not open requirements file: [Errno 2] No such file or directory: '/requirements.txt'


In [2]:
!python --version

Python 3.13.7


In [3]:
from ollama import chat

In [4]:
# Define the message to send to the model
message = """
        Propose the optimal prompt for the gpt-oss:20b model.
        Give an example of a poorly constructed prompt and improve it until it is optimal.
        The prompt should be useful for agentic tasks with tool calling.
        The response should be in markdown format.
        """
print(message)


        Propose the optimal prompt for the gpt-oss:20b model.
        Give an example of a poorly constructed prompt and improve it until it is optimal.
        The prompt should be useful for agentic tasks with tool calling.
        The response should be in markdown format.
        


#### Thinking

In [None]:
# Define the chat interaction with the model. For tool calling, `think` needs to be enabled.
# `think` accepts True/False for most models; gpt-oss uses levels: "low", "medium", "high".
response = chat(
    model='gpt-oss:20b',
    messages=[{'role': 'user', 'content': message}],
    think="high",
    stream=False
)

print('Thinking: \n', response.message.thinking, '\n')
print('Answer: \n', response.message.content)

Thinking: 
 We need to propose an optimal prompt for gpt-oss:20b model. Provide an example of a poorly constructed prompt and improve it until optimal. The prompt should be useful for agentic tasks with tool calling. The response should be in markdown.

We need to consider the gpt-oss:20b model: it's a 20-billion-parameter open-source model, presumably from the GPT-OSS family. We need to propose an optimal prompt: likely a structured prompt that encourages the model to act as an agent, making use of tool calling, and making sure it's concise, gives context, instructions, and clarifies tool usage. We can propose a prompt format: include system messages, user messages, clarifying that the model should think step by step, use tool calls. For GPT-OSS, tool calling support might be via "function calls" interface or "tool calls" similar to GPT-4.

We should provide an example of a poorly constructed prompt, then show improved versions.

We should respond in markdown format.

We need to consi

In [8]:
print(response.message.content)

# Optimal Prompt for **gpt‑oss:20b** – Agentic Tasks with Tool Calling  
> **Goal:** Create a single, reusable prompt that turns GPT‑OSS 20B into a reliable autonomous agent that can think step‑by‑step, decide when to call tools, and format tool calls in a machine‑readable way.

---

## 1. Why a Well‑Structured Prompt Matters  

| Challenge | Impact | How a Good Prompt Helps |
|-----------|--------|------------------------|
| **Ambiguous instructions** | The model may guess, producing irrelevant or noisy output. | Explicit rules (`think first`, `output JSON`, `only JSON when calling a tool`) remove guesswork. |
| **Tool misuse or missing arguments** | Tool calls fail or return incomplete data. | Providing a *tool signature* in the prompt lets the model know exactly what arguments to supply. |
| **Excess verbosity** | Tokens are wasted, hitting limits and lowering response quality. | Concise, bullet‑pointed guidelines keep the prompt short. |
| **Inconsistent format** | The orchestrator

#### Define Tools for LLM
Factors that improve tool-calling accuracy,  
ranked by importance:
1. Clear parameter names
2. Clear docstring
3. Simple signature
4. Stable return format (JSON-serializable)
5. Clean error handling
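
Points 4 and 5 can be combined in a thin wrapper around each tool. The sketch below is only an illustration: `run_tool_safely` and the `{"ok": ..., "result": ...}` payload shape are hypothetical names chosen here, not part of the Ollama SDK. It always returns a JSON string, so the model sees the same format whether the tool succeeds or fails.

```python
import json
from typing import Any, Callable


def run_tool_safely(tool: Callable[..., Any], arguments: dict[str, Any]) -> str:
    """Run a tool and always return a JSON string (hypothetical helper)."""
    try:
        payload = {"ok": True, "result": tool(**arguments)}
    except (TypeError, ValueError, IndexError) as exc:
        # Report the failure in the same JSON shape instead of raising
        payload = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return json.dumps(payload)


def square_number(number: int) -> int:
    """Compute the square of a single integer."""
    if not isinstance(number, int):
        raise ValueError("number must be an integer.")
    return number * number


print(run_tool_safely(square_number, {"number": 3}))
print(run_tool_safely(square_number, {"number": "3"}))
```

The first call prints a payload with `"ok": true`; the second reports the `ValueError` as data instead of interrupting the loop, which gives the model a chance to correct its arguments.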

In [7]:
# Define tool 1 - Improved according to above
def square_number(number: int) -> int:
    """Compute the square of a single integer.
    Args:
        number (int): An integer value to be squared.

    Returns:
        result (int): The square of the input number.
    """
    if not isinstance(number, int):
        raise ValueError("number must be an integer.")
    result = number * number
    
    return result

square_number(3)

9

In [60]:
# Define tool 2 - IMPROVE THIS FUNCTION - TBD
def select_item(number: int) -> str:
    """This function returns an item (str) from a list of items based on the number provided.
    Args:
        number (int): The 1-based position of the item to select from the list of items.

    Returns:
        item (str): The selected item from the list.    
    """
    if not isinstance(number, int):
        raise ValueError("number must be an integer.")

    items: list[str] = ['BILLY', 'PAX', 'MARKUS', 'MICKE', 'ALGOT']

    if number < 1 or number > len(items):
        raise IndexError("number out of range.")

    return items[number - 1]

select_item(1)

'BILLY'
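
One possible direction for the pending improvement, applying the ranking above (clear parameter name, clear docstring, explicit errors). This is a sketch only: the name `select_item_by_position` and the module-level `ITEMS` constant are hypothetical and not used elsewhere in this notebook.

```python
ITEMS: list[str] = ['BILLY', 'PAX', 'MARKUS', 'MICKE', 'ALGOT']


def select_item_by_position(position: int) -> str:
    """Return the item at a 1-based position in a fixed list of items.

    Args:
        position (int): 1-based position of the item (1 selects the first item).

    Returns:
        str: The item stored at that position.

    Raises:
        ValueError: If position is not an integer.
        IndexError: If position is outside the valid range.
    """
    # bool is a subclass of int, so reject it explicitly
    if isinstance(position, bool) or not isinstance(position, int):
        raise ValueError("position must be an integer.")
    if not 1 <= position <= len(ITEMS):
        raise IndexError(f"position must be between 1 and {len(ITEMS)}, got {position}.")
    return ITEMS[position - 1]


print(select_item_by_position(1))  # BILLY
```

The error message states the valid range, which gives the model something concrete to act on if the result is fed back as a tool message.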

#### Tools calling alt 1 - One tool

In [24]:
# Define the message to send to the model and use the tool
message = """
        Compute the square of 3 using the available tool. Only return the numeric result.
"""
messages = [{'role': 'user', 'content': message}]

# Pass Python functions directly as tools (or alternatively provide a JSON schema)
response = chat(
    model='gpt-oss:20b',
    messages=messages,
    tools=[square_number],
    think="high")  # True/False for most models; gpt-oss uses levels: "low", "medium", "high"

# Add the model's response to the messages list to keep the chat context up to date
messages.append(response.message)

# Debug only. Not to be used in production.
if response.message.thinking:
    print("Thinking:\n", response.message.thinking)

Thinking:
 We have a user request: "Compute the square of 3 using the available tool. Only return the numeric result." There's a tool named "square_number" that computes the square of a single integer. So we must call the function "square_number" with argument number: 3. Then we only return the numeric result. The user wants only numeric result, so we should output just the number 9.

We must use the function tool: we need to produce a function call: {"name":"square_number","arguments":{...}}.

We should not include any other text. Just the function call? The instructions say "Only return the numeric result." But the tool system requires us to call the function first. So we must call the function. Then the system will respond with the result. However, the user only wants numeric result. But we can't skip the function call. According to guidelines: We must call the tool. The tool returns something like 9. The system will produce output? We need to decide: In the conversation we call the

In [25]:
if response.message.tool_calls:
    print("yes")

yes
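
The tool-calling cell above passes the Python function itself in `tools`. The alternative it mentions, a JSON schema, could look roughly like the sketch below. The layout follows the function-tool format shown in the Ollama tool-calling docs (Source 1); the variable name `square_number_schema` and the description strings are illustrative choices.

```python
import json

# Hand-written schema describing the square_number tool
square_number_schema = {
    "type": "function",
    "function": {
        "name": "square_number",
        "description": "Compute the square of a single integer.",
        "parameters": {
            "type": "object",
            "properties": {
                "number": {
                    "type": "integer",
                    "description": "An integer value to be squared.",
                },
            },
            "required": ["number"],
        },
    },
}

# The schema is plain data, so it can be inspected or stored as JSON
print(json.dumps(square_number_schema, indent=2))
```

It would then be passed as `tools=[square_number_schema]` in place of `tools=[square_number]`. Writing the schema by hand gives full control over the descriptions the model sees, at the cost of keeping it in sync with the function.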


##### Step-by-step breakdown of Tools calling alt 1 - One tool

In [None]:
# Step 1 - Break down 
print(type(messages))
print(len(messages))
print(messages[0])
print(messages[0]['role'])
print(messages[0]['content'])
print(messages[1])

<class 'list'>
2
{'role': 'user', 'content': '\n        Compute the square of 3 using the available tool. Only return the numeric result.\n'}
user

        Compute the square of 3 using the available tool. Only return the numeric result.

role='assistant' content='' thinking='We have a user request: "Compute the square of 3 using the available tool. Only return the numeric result." There\'s a tool named "square_number" that computes the square of a single integer. So we must call the function "square_number" with argument number: 3. Then we only return the numeric result. The user wants only numeric result, so we should output just the number 9.\n\nWe must use the function tool: we need to produce a function call: {"name":"square_number","arguments":{...}}.\n\nWe should not include any other text. Just the function call? The instructions say "Only return the numeric result." But the tool system requires us to call the function first. So we must call the function. Then the system will r

In [None]:
# Step 2 - Break down 
print(response.message.tool_calls[0])

function=Function(name='square_number', arguments={'number': 3})


In [None]:
# Step 3 - Break down 
call = response.message.tool_calls[0]
print(call)
result = square_number(**call.function.arguments)
print(result)

function=Function(name='square_number', arguments={'number': 3})
9


In [None]:
# Step 4 - Break down 

# add the tool result to the messages
messages.append({"role": "tool", 
                 "tool_name": call.function.name, 
                 "content": str(result)})

In [None]:
# Step 5 - Break down 

# debug prints after adding tool result
print(type(messages))
print(len(messages))
print(messages[0])
print(messages[0]['role'])
print(messages[0]['content'])
print(messages[1])
print(messages[2])

<class 'list'>
3
{'role': 'user', 'content': '\n        Compute the square of 3 using the available tool. Only return the numeric result.\n'}
user

        Compute the square of 3 using the available tool. Only return the numeric result.

role='assistant' content='' thinking='We have a user request: "Compute the square of 3 using the available tool. Only return the numeric result." There\'s a tool named "square_number" that computes the square of a single integer. So we must call the function "square_number" with argument number: 3. Then we only return the numeric result. The user wants only numeric result, so we should output just the number 9.\n\nWe must use the function tool: we need to produce a function call: {"name":"square_number","arguments":{...}}.\n\nWe should not include any other text. Just the function call? The instructions say "Only return the numeric result." But the tool system requires us to call the function first. So we must call the function. Then the system will r

In [33]:
# Only recommended for models that return a single tool call
# Note: skip this append if the assistant message was already added in an earlier cell
messages.append(response.message)
if response.message.tool_calls:

  call = response.message.tool_calls[0] # get function call - function=Function(name='square_number', arguments={'number': 3})
  result = square_number(**call.function.arguments) # run the tool with the arguments

  # add the tool result to the messages - appends another item to the list
  messages.append({"role": "tool", 
                   "tool_name": call.function.name, 
                   "content": str(result)})

  final_response = chat(model="gpt-oss:20b", 
                        messages=messages, 
                        tools=[square_number], 
                        think="high")

  print('Final Thinking: \n', final_response.message.thinking, '\n')

Final Thinking: 
 None 



In [None]:
# Step 6 - Break down 

print(final_response.message) # Why is thinking None?

role='assistant' content='9' thinking=None images=None tool_name=None tool_calls=None


In [None]:
# Step 7 - Break down - END OF BREAKDOWN 

print(final_response.message.content)

9


In [None]:
# Improved version (tool complete)
# It seems that the model needs to be called twice - once for tool planning 
# and once for final response after tool execution

# User prompt
message = "Compute the square of 3 using the available tool. Return only the numeric result."

messages = [{"role": "user", "content": message}]

# First model call (tool planning)
response = chat(
    model="gpt-oss:20b",
    messages=messages,
    tools=[square_number],
    think="high"
)

messages.append(response.message)

# Debug: internal reasoning (optional)
if response.message.thinking:
    print("Thinking:\n", response.message.thinking, "\n") # Print thinking process before tool calling

# Tool execution
if response.message.tool_calls:
    call = response.message.tool_calls[0] # get function call - function=Function(name='square_number', arguments={'number': 3})

    result = square_number(**call.function.arguments) # run the tool with the arguments

    # Inject tool result
    messages.append({
        "role": "tool",
        "tool_name": call.function.name,
        "content": str(result)
    })

    # Final model response
    final_response = chat(
        model="gpt-oss:20b",
        messages=messages,
        tools=[square_number],
        think="high"
    )

    print("Final Answer:", final_response.message.content)

Thinking:
 The user says: "Compute the square of 3 using the available tool. Return only the numeric result."

We must call the square_number function with number: 3. Then return only numeric result. So we need to call the function via the "functions" tool. We'll do that. 

Final Answer: 9


##### Tools calling alt 1 - DEBUGGING

In [83]:
# print('Answer: \n', response.message.content) # There is no content now?
print(response.message)

role='assistant' content='4' thinking='User asks: "What is the square of 2?" Only return numeric result. So we should compute 2 squared = 4. But we also have tool square_number. They want numeric result. We can just call the function? The instructions say "Only return the numeric result." So we should produce just "4". We can use the tool or not. According to guidelines, when using tool we produce a tool call JSON. But user expects only numeric result. We might not need to use tool. It\'s simpler: just output 4. But guidelines: The user didn\'t ask for the function output. But we can still compute ourselves. The result is 4. We\'ll output "4".' images=None tool_name=None tool_calls=None


In [None]:
# debug prints
print(type(messages))
print(len(messages))
print(messages[0])
print(messages[0]['role'])
print(messages[0]['content'])

In [None]:
# debug prints
print(type(response))
help(response)

<class 'ollama._types.ChatResponse'>
Help on ChatResponse in module ollama._types object:

class ChatResponse(BaseGenerateResponse)
 |  ChatResponse(
 |      *,
 |      model: Optional[str] = None,
 |      created_at: Optional[str] = None,
 |      done: Optional[bool] = None,
 |      done_reason: Optional[str] = None,
 |      total_duration: Optional[int] = None,
 |      load_duration: Optional[int] = None,
 |      prompt_eval_count: Optional[int] = None,
 |      prompt_eval_duration: Optional[int] = None,
 |      eval_count: Optional[int] = None,
 |      eval_duration: Optional[int] = None,
 |      message: ollama._types.Message
 |  ) -> None
 |
 |  Response returned by chat requests.
 |
 |  Method resolution order:
 |      ChatResponse
 |      BaseGenerateResponse
 |      SubscriptableBaseModel
 |      pydantic.main.BaseModel
 |      builtins.object
 |
 |  Data and other attributes defined here:
 |
 |  __abstractmethods__ = frozenset()
 |
 |  __annotations__ = {'message': <class 'oll

In [None]:
# inspect response prints
print(response.model)
print(response.created_at)
print(response.done)
print(response.done_reason)
print(response.total_duration)
print(response.load_duration)
print(response.prompt_eval_count)
print(response.prompt_eval_duration)
print(response.eval_count)
print(response.eval_duration)
print(response.message)

gpt-oss:20b
2025-12-26T21:05:27.5932441Z
True
stop
624075300
148588300
149
103599800
42
356871600
role='assistant' content='' thinking='We need to call the tool square_nums with number=2. Then return the result.' images=None tool_name=None tool_calls=[ToolCall(function=Function(name='square_nums', arguments={'number': 2}))]


In [65]:
print(type(response.message))
help(response.message)

<class 'ollama._types.Message'>
Help on Message in module ollama._types object:

class Message(SubscriptableBaseModel)
 |  Message(
 |      *,
 |      role: str,
 |      content: Optional[str] = None,
 |      thinking: Optional[str] = None,
 |      images: Optional[Sequence[ollama._types.Image]] = None,
 |      tool_name: Optional[str] = None,
 |      tool_calls: Optional[Sequence[ollama._types.Message.ToolCall]] = None
 |  ) -> None
 |
 |  Chat message.
 |
 |  Method resolution order:
 |      Message
 |      SubscriptableBaseModel
 |      pydantic.main.BaseModel
 |      builtins.object
 |
 |  Data and other attributes defined here:
 |
 |  ToolCall = <class 'ollama._types.Message.ToolCall'>
 |      Model tool calls.
 |
 |
 |  __abstractmethods__ = frozenset()
 |
 |  __annotations__ = {'content': typing.Optional[str], 'images': typing.O...
 |
 |  __class_vars__ = set()
 |
 |  __private_attributes__ = {}
 |
 |  __pydantic_complete__ = True
 |
 |  __pydantic_computed_fields__ = {}
 |
 |  

In [97]:
# inspect response.message prints
print("role:", response.message.role)
print("content:", response.message.content)
print("thinking:", response.message.thinking)
print("images:", response.message.images)
print("tool_name:", response.message.tool_name)
print("tool_calls:", response.message.tool_calls)
print("length of tool_calls:", len(response.message.tool_calls))

role: assistant
content: 
thinking: The user wants the square of 2. They want only the numeric result, no other explanation. They gave a tool: compute square_number. So we should use the tool. The input should be number: 2. Then we return the result.

We need to call the function.
images: None
tool_name: None
tool_calls: [ToolCall(function=Function(name='square_number', arguments={'number': 2}))]
length of tool_calls: 1


##### Tools calling alt 1 - END OF DEBUGGING

#### Tools calling alt 1 - Two tools

In [71]:
# Define the message to send to the model and use the tool
message = """
        Compute the square of 2 using the available tool. Return only the numeric result and use it to call another tool to select an item.
        Only return the squared number and the item.
"""
messages = [
    {'role': 'user',
     'content': message,
    }
]

# # pass tools to the model
# response = chat(
#     model='gpt-oss:20b',
#     messages=messages,
#     tools=[square_number, select_item],
#     think="high")

# messages.append(response.message)

# # Debug: internal reasoning (optional)
# if hasattr(response.message, "thinking"):
#     print("thinking:\n", response.message.thinking, "\n") # Print thinking process before tool calling

In [72]:
tools = [square_number, select_item]

while True:
    response = chat(
        model="gpt-oss:20b",
        messages=messages,
        tools=tools,
        think="high"
    )

    message = response.message
    messages.append(message)

    tool_calls = message.get("tool_calls")

    # Stop if no tool calls are requested
    if not tool_calls:
        break

    for call in tool_calls:
        tool_name = call["function"]["name"]
        args = call["function"]["arguments"]

        if tool_name == "square_number":
            result = square_number(**args)
        elif tool_name == "select_item":
            result = select_item(**args)
        else:
            raise RuntimeError(f"Unknown tool: {tool_name}")

        messages.append({
            "role": "tool",
            "tool_name": tool_name,
            "content": str(result)
        })

In [73]:
# At this point tools have been executed
# Results are in messages
# But there is no final answer yet

final_response = chat(
    model="gpt-oss:20b",
    messages=messages
)

In [75]:
print("Final Answer:", final_response.message.content)

Final Answer: 


In [74]:
print(final_response.message.get("content"))




In [65]:
# print(len(response.message.thinking))
print(response.message.thinking)

None


In [66]:
print(response.message.tool_calls)

None


In [67]:
# debug prints after adding tool result
print(type(messages))
print(len(messages))

<class 'list'>
3


In [68]:
print(final_response.message.content)




In [56]:
if response.message.tool_calls:
  # process each tool call 
  for call in response.message.tool_calls:
    # execute the appropriate tool
    if call.function.name == 'square_number':
      result = square_number(**call.function.arguments)
    elif call.function.name == 'select_item':
      result = select_item(**call.function.arguments)
    else:
      result = 'Unknown tool'
    # add the tool result to the messages
    messages.append({'role': 'tool',  'tool_name': call.function.name, 'content': str(result)})


In [57]:
# debug prints after adding tool result
print(type(messages))
print(len(messages))

<class 'list'>
2


In [58]:
# generate the final response
final_response = chat(model='gpt-oss:20b', 
                      messages=messages, 
                      tools=[square_number, select_item], 
                      think="high")

In [59]:
print(final_response.message.content)




Is this a limitation in Ollama? It does not seem able to chain multiple tool calls.
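
The empty final answer may also come from the orchestration rather than from Ollama itself. This is an assumption, not something verified here: the `while` loop stops on the first reply without tool calls, and that reply is usually the final answer already, so a further `chat` call follows an assistant turn and has nothing new to respond to. The offline sketch below shows the loop shape with the answer read from the last reply inside the loop. `scripted_chat` is a hypothetical stand-in for the model, and plain dicts stand in for the SDK message objects.

```python
from typing import Any, Callable

Message = dict[str, Any]


def square_number(number: int) -> int:
    return number * number


def select_item(number: int) -> str:
    items = ['BILLY', 'PAX', 'MARKUS', 'MICKE', 'ALGOT']
    return items[number - 1]


TOOLS: dict[str, Callable[..., Any]] = {
    "square_number": square_number,
    "select_item": select_item,
}


def scripted_chat(messages: list[Message]) -> Message:
    """Hypothetical offline model: square first, then select, then answer."""
    results = [m["content"] for m in messages if m["role"] == "tool"]
    if len(results) == 0:
        call = {"function": {"name": "square_number", "arguments": {"number": 2}}}
        return {"role": "assistant", "content": "", "tool_calls": [call]}
    if len(results) == 1:
        call = {"function": {"name": "select_item", "arguments": {"number": int(results[0])}}}
        return {"role": "assistant", "content": "", "tool_calls": [call]}
    return {"role": "assistant", "content": f"{results[0]}, {results[1]}"}


def run_agent(chat_fn: Callable[[list[Message]], Message],
              messages: list[Message], max_steps: int = 5) -> str:
    """Loop until a reply has no tool calls; that reply holds the final answer."""
    for _ in range(max_steps):
        reply = chat_fn(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content", "")
        for call in tool_calls:
            name = call["function"]["name"]
            result = TOOLS[name](**call["function"]["arguments"])
            messages.append({"role": "tool", "tool_name": name, "content": str(result)})
    raise RuntimeError("step limit reached without a final answer")


history = [{"role": "user", "content": "Square 2, then select the item at that position."}]
print(run_agent(scripted_chat, history))  # 4, MICKE
```

With the real model, a quick check would be to print `response.message.content` right after the loop, before making any extra `chat` call.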

### 2. OpenAI SDK

#### Tools calling alt 2

In [21]:
import openai
print("Library version:", openai.__version__)

Library version: 2.8.1


Ollama exposes a Chat Completions-compatible API, so we can use the OpenAI SDK without changing much.

In [14]:
from openai import OpenAI
 
client = OpenAI(
    base_url="http://localhost:11434/v1",  # Local Ollama API
    api_key="ollama"                       # Dummy key
)
 
response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain how to prompt the gpt-oss:20b model effectively."}
    ]
)
 
print(response.choices[0].message.content)

### Prompt‑Engineering for the **gpt‑oss:20B** Model  
*(An 20‑billion‑parameter, open‑source, instruction‑tuned GPT‑style model)*  

Below is a practical, step‑by‑step reference that covers everything you need to get **clean, consistent, and usable output** from gpt‑oss:20B. It blends the *science* of prompt design (token limits, instruction tone, etc.) with *hands‑on tips* that you can apply right away, whether you’re calling the model via the 🤗 Transformers API, a custom Docker deployment, or a local web front‑end.

---

## 1. Know the Model’s Key Characteristics

| Feature | What You Need to Know |
|---------|------------------------|
| **Token limit** | 8 192 tokens (≈ 32 k words).  |  
| **Best instruction‑style** | A **system message** followed by a **user message** gives the model the clearest direction. |
| **Architecture** | LLaMA‑style transformer with full‑attention. No special prompt format like GPT‑3’s “system” vs. “assistant” tags – just a list of messages. |
| **Calibra

In [None]:
# Try out the above... TBD
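
When trying tool calling through the OpenAI SDK, two details of the Chat Completions format differ from the Ollama SDK cells earlier in this notebook: the tool arguments arrive as a JSON string rather than a dict, and the tool result message refers back to the call through `tool_call_id`. The sketch below assumes Ollama's OpenAI-compatible endpoint follows the same shapes. `build_tool_message` and the `TOOLS` registry are hypothetical helpers, and the call id is a made-up placeholder.

```python
import json
from typing import Any, Callable


def square_number(number: int) -> int:
    """Compute the square of a single integer."""
    return number * number


# Hypothetical registry mapping tool names to Python callables
TOOLS: dict[str, Callable[..., Any]] = {"square_number": square_number}


def build_tool_message(call_id: str, name: str, arguments_json: str) -> dict[str, str]:
    """Run the named tool and wrap the result as a Chat Completions tool message."""
    arguments = json.loads(arguments_json)  # arguments arrive as a JSON string
    result = TOOLS[name](**arguments)
    return {"role": "tool", "tool_call_id": call_id, "content": str(result)}


# With a real response, the three values would come from
# response.choices[0].message.tool_calls[i]: .id, .function.name, .function.arguments
print(build_tool_message("call_1", "square_number", '{"number": 3}'))
```

The returned dict would be appended to `messages` after the assistant message that requested the call, followed by a second `client.chat.completions.create(...)` call to obtain the final answer.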

### 3. Ollama + Python agent loop  
Fully local, minimal dependencies, deterministic.  
We must code the orchestration ourselves.

#### Tools calling alt 3

In [None]:
# Try out below ... TBD

In [None]:
# tools.py

def square_number(number: int) -> int:
    """Compute the square of a number."""
    if not isinstance(number, int):
        raise ValueError("number must be an integer.")
    return number * number


def select_item(index: int) -> str:
    """Select an item from a list based on index."""
    items = ['BILLY', 'PAX', 'MARKUS', 'MICKE', 'ALGOT']
    if index < 1 or index > len(items):
        raise ValueError("Index out of range")
    return items[index - 1]

In [None]:
### TRY THIS - Download ministral model!

# agent_loop.py

from ollama import chat
#from tools import square_number, select_item

# Register tools
TOOLS = {
    "square_number": square_number,
    "select_item": select_item
}

# Initial user prompt
messages = [{
    "role": "user",
    "content": (
        "Compute the square of 2. "
        "Then use that number to select an item from the list. "
        "Return only the squared number and the selected item."
    )
}]

MAX_STEPS = 5  # safety to prevent infinite loops

for step in range(MAX_STEPS):
    # Step 1: Ask the model what to do next
    response = chat(
        model="ministral-3:14b", # This is preferred for tool use
        messages=messages,
        tools=list(TOOLS.values())  # Ollama knows the available functions
    )

    assistant_msg = response.message
    messages.append(assistant_msg)

    # Step 2: Check for tool calls
    tool_calls = assistant_msg.get("tool_calls", [])
    if not tool_calls:
        break  # no more tools to execute

    # Step 3: Execute each tool call
    for call in tool_calls:
        name = call["function"]["name"]
        args = call["function"]["arguments"]

        if name not in TOOLS:
            raise RuntimeError(f"Unknown tool: {name}")

        # Execute tool
        result = TOOLS[name](**args)

        # Feed result back into conversation
        messages.append({
            "role": "tool",
            "tool_name": name,
            "content": str(result)
        })

# Step 4: Force final answer generation
messages.append({
    "role": "user",
    "content": "Provide the final answer using the tool results only."
})

final_response = chat(
    model="ministral-3:14b", # This is preferred for tool use
    messages=messages
)

print("Final Answer:", final_response.message["content"])


Final Answer: 4, None


### Realistic solutions

1. Switch to a more instruction-tuned local model such as `ministral-3:14b`.  
   * It reliably follows structured instructions and JSON output.
   * Fits your RTX 5090 VRAM with 4/8-bit quantization.
2. A simpler agent loop (without JSON) works better with `gpt-oss:20b`:
   * Ask the model explicitly for the next tool and arguments in plain text.
   * Parse the reply in Python with simple string matching (e.g., a regex for numbers) instead of strict JSON.  
3. If you insist on JSON:
   * You may need a “validator + retry” loop: keep prompting the model until it outputs valid JSON.
   * `gpt-oss:20b` may fail 30–50% of the time on the first attempt.
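
The "validator + retry" idea from option 3 can be sketched without any model running. Everything named here is hypothetical: `parse_tool_request`, `request_with_retry`, the `{"tool": ..., "arguments": ...}` reply format, and `fake_generate`, which stands in for a function that sends a prompt to the model and returns its text reply.

```python
import json
from typing import Any, Callable


def parse_tool_request(text: str) -> dict[str, Any]:
    """Validate that text is JSON shaped like {"tool": str, "arguments": dict}."""
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    if not isinstance(data.get("tool"), str):
        raise ValueError('missing string field "tool"')
    if not isinstance(data.get("arguments"), dict):
        raise ValueError('missing object field "arguments"')
    return data


def request_with_retry(generate: Callable[[str], str], prompt: str,
                       max_attempts: int = 3) -> dict[str, Any]:
    """Ask until the reply validates, feeding the error back into the next prompt."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = generate(current_prompt)
        try:
            return parse_tool_request(raw)
        except ValueError as exc:
            current_prompt = (
                f"{prompt}\n\nYour previous reply was invalid ({exc}). "
                "Reply with JSON only."
            )
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts")


# Offline stand-in for the model: the first reply is prose, the second is valid JSON
replies = iter([
    "Sure! I will call square_number.",
    '{"tool": "square_number", "arguments": {"number": 2}}',
])


def fake_generate(prompt: str) -> str:
    return next(replies)


print(request_with_retry(fake_generate, "Which tool should run next?"))
```

Capping the attempts keeps a model that never produces valid JSON from looping forever, and including the validation error in the retry prompt tells the model what to fix.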