## Ollama local LLMs  

This notebook runs local LLMs with Ollama through two SDKs:  
* Ollama SDK
* OpenAI SDK
  
Source 1: [Ollama Docs](https://docs.ollama.com/capabilities/tool-calling#python-2)  
Source 2: [OpenAI Cookbook](https://cookbook.openai.com/articles/gpt-oss/run-locally-ollama)  
  
**Setup:**  
1. To load a local model, run the following in a terminal: `ollama run gpt-oss:20b`
2. Run the notebook in a virtual environment (venv) with the libraries from `requirements.txt` installed
  
### 1. Ollama SDK

In [1]:
import sys
print(sys.executable)

c:\repo\LLM\venv\Scripts\python.exe


In [None]:
# !pip install -r /requirements.txt


[notice] A new release of pip is available: 25.2 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip
ERROR: Could not open requirements file: [Errno 2] No such file or directory: '/requirements.txt'


In [2]:
!python --version

Python 3.13.7


In [3]:
from ollama import chat

In [4]:
# Define the message to send to the model
message = """
        Propose the optimal prompt for the gpt-oss:20b model.
        Give an example of a poorly constructed prompt and improve it until it is optimal.
        The prompt should be useful for agentic tasks with tool calling.
        The response should be in markdown format.
        """
print(message)


        Propose the optimal prompt for the gpt-oss:20b model.
        Give an example of a poorly constructed prompt and improve it until it is optimal.
        The prompt should be useful for agentic tasks with tool calling.
        The response should be in markdown format.
        


#### Thinking

In [None]:
# Define the chat interaction with the model. Tool calling needs thinking enabled.
response = chat(
    model='gpt-oss:20b',
    messages=[{'role': 'user', 'content': message}],
    think="high",  # True/False for most models; gpt-oss uses levels: "low", "medium", "high"
    stream=False
)

print('Thinking: \n', response.message.thinking, '\n')
print('Answer: \n', response.message.content)

Thinking: 
 We need to propose an optimal prompt for gpt-oss:20b model. Provide an example of a poorly constructed prompt and improve it until optimal. The prompt should be useful for agentic tasks with tool calling. The response should be in markdown.

We need to consider the gpt-oss:20b model: it's a 20-billion-parameter open-source model, presumably from the GPT-OSS family. We need to propose an optimal prompt: likely a structured prompt that encourages the model to act as an agent, making use of tool calling, and making sure it's concise, gives context, instructions, and clarifies tool usage. We can propose a prompt format: include system messages, user messages, clarifying that the model should think step by step, use tool calls. For GPT-OSS, tool calling support might be via "function calls" interface or "tool calls" similar to GPT-4.

We should provide an example of a poorly constructed prompt, then show improved versions.

We should respond in markdown format.

We need to consi

In [8]:
print(response.message.content)

# Optimal Prompt for **gpt‑oss:20b** – Agentic Tasks with Tool Calling  
> **Goal:** Create a single, reusable prompt that turns GPT‑OSS 20B into a reliable autonomous agent that can think step‑by‑step, decide when to call tools, and format tool calls in a machine‑readable way.

---

## 1. Why a Well‑Structured Prompt Matters  

| Challenge | Impact | How a Good Prompt Helps |
|-----------|--------|------------------------|
| **Ambiguous instructions** | The model may guess, producing irrelevant or noisy output. | Explicit rules (`think first`, `output JSON`, `only JSON when calling a tool`) remove guesswork. |
| **Tool misuse or missing arguments** | Tool calls fail or return incomplete data. | Providing a *tool signature* in the prompt lets the model know exactly what arguments to supply. |
| **Excess verbosity** | Tokens are wasted, hitting limits and lowering response quality. | Concise, bullet‑pointed guidelines keep the prompt short. |
| **Inconsistent format** | The orchestrator

#### Define Tools for LLM
What improves tool usage accuracy, ranked by importance:
1. Clear parameter names
2. Clear docstring
3. Simple signature
4. Stable return format (JSON-serializable)
5. Clean error handling

In [95]:
# Define tool 1 - Improved according to above
def square_number(number: int) -> int:
    """Compute the square of a single integer.
    Args:
        number (int): An integer value to be squared.

    Returns:
        result (int): The square of the input number.
    """
    if not isinstance(number, int):
        raise ValueError("number must be an integer.")
    result = number * number
    
    return result

square_number(3)

9

In [None]:
# Define tool 2 - IMPROVE THIS FUNCTION - TBD
def select_item(number: int) -> str:
    """Return an item from a fixed list of items based on the number provided.
    Args:
        number (int): The 1-based index of the item to select from the list.

    Returns:
        item (str): The selected item, or an error string if the number is out of range.
    """
    items: list[str] = ['BILLY', 'PAX', 'MARKUS', 'MICKE', 'ALGOT']
    # Indexing is 1-based: valid numbers are 1..len(items)
    if number <= 0 or number > len(items):
        return "Error: Number out of range."

    return items[number - 1]

select_item(1)

'BILLY'
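
One possible direction for the "TBD" improvement above, applying the accuracy checklist (clear parameter name, stable JSON-serializable return, clean error handling). This is only a sketch: the name `select_item_by_position`, the module-level `ITEMS` constant, and the `{"ok": ...}` result shape are assumptions made for illustration, not part of the original notebook.

```python
ITEMS: list[str] = ["BILLY", "PAX", "MARKUS", "MICKE", "ALGOT"]


def select_item_by_position(position: int) -> dict:
    """Select one item from a fixed list by its 1-based position.

    Args:
        position (int): 1-based position of the item (1 selects the first item).

    Returns:
        dict: {"ok": True, "item": <str>} on success, or
              {"ok": False, "error": <str>} if the position is invalid.
    """
    # bool is a subclass of int, so reject it explicitly
    if isinstance(position, bool) or not isinstance(position, int):
        return {"ok": False, "error": "position must be an integer."}
    if not 1 <= position <= len(ITEMS):
        return {"ok": False, "error": f"position must be between 1 and {len(ITEMS)}."}
    return {"ok": True, "item": ITEMS[position - 1]}


print(select_item_by_position(1))
print(select_item_by_position(9))
```

Returning the same dictionary shape for success and failure means the model always sees one format, and an out-of-range request comes back as data rather than as a string that could be mistaken for an item.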

#### Tool calling alt 1 - One tool

In [102]:
# Define the message to send to the model and use the tool
message = """
        Compute the square of 2 using the available tool? Only return the numeric result.
"""
messages = [{'role': 'user', 'content': message}]

# Pass Python functions directly as tools (or alternatively provide a JSON schema)
response = chat(
    model='gpt-oss:20b',
    messages=messages,
    tools=[square_number],
    think="high")  # True/False for most models; gpt-oss uses levels: "low", "medium", "high"

# Append the model's response to the messages list to keep the chat context up to date
messages.append(response.message)

# Debug only. Not to be used in production.
if response.message.thinking:
    print("Thinking:\n", response.message.thinking)

Thinking:
 We are ChatGPT, with knowledge of available tools. The user says "Compute the square of 2 using the available tool? Only return the numeric result." So we must use the function square_number. There's a tool defined. We need to call the function. The response should be numeric only.

We must use the functions namespace: functions.square_number. We'll call with number: 2. The result will be something like {result: ...} but we want only numeric result.

We need to produce the function call. The spec says: "Only return the numeric result." However, to use the tool we must produce a JSON function call as the output, I think.

But the instructions: "You are ChatGPT, ... Only return the numeric result." Possibly we must directly call the function and output the numeric result. But we can't just call function. The system expects us to produce a tool invocation. Let's think: The instruction says "Compute the square of 2 using the available tool? Only return the numeric result." That 

In [103]:
if response.message.tool_calls:
    print("yes")

yes
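
The tool-calling cell above passes the Python function `square_number` directly and mentions a JSON schema as the alternative. A minimal sketch of what that schema could look like follows. The description strings are copied from the function's docstring, the variable name `square_number_schema` is made up for this example, and the commented-out `chat` call is an assumption that needs a running Ollama server.

```python
import json

# JSON-schema description of the square_number tool
square_number_schema = {
    "type": "function",
    "function": {
        "name": "square_number",
        "description": "Compute the square of a single integer.",
        "parameters": {
            "type": "object",
            "properties": {
                "number": {
                    "type": "integer",
                    "description": "An integer value to be squared.",
                },
            },
            "required": ["number"],
        },
    },
}

# Hypothetical usage (requires a running Ollama server and the notebook's `messages`):
# response = chat(model="gpt-oss:20b", messages=messages,
#                 tools=[square_number_schema], think="high")

print(json.dumps(square_number_schema, indent=2))
```

With a schema the model only sees the description, so the notebook code still has to map the returned tool name back to the real Python function before executing it.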


In [100]:
# Break down this function calling cell - TBD
# Note: response.message was already appended to messages in the cell above
if response.message.tool_calls:
    # only recommended for models which only return a single tool call
    call = response.message.tool_calls[0]
    result = square_number(**call.function.arguments)

    # add the tool result to the messages
    messages.append({"role": "tool",
                     "tool_name": call.function.name,
                     "content": str(result)})

    final_response = chat(model="gpt-oss:20b",
                          messages=messages,
                          tools=[square_number],
                          think="high")

    print('Final Thinking: \n', final_response.message.thinking, '\n')

Final Thinking: 
 None 



In [101]:
print(final_response.message.content)

4


In [None]:
# Improved version (tool complete) - BREAK THIS DOWN ROW BY ROW - TBD
# It seems that the model needs to be called twice - once for tool planning 
# and once for final response after tool execution

# User prompt
message = "Compute the square of 2 using the available tool. Return only the numeric result."

messages = [{"role": "user", "content": message}]

# First model call (tool planning)
response = chat(
    model="gpt-oss:20b",
    messages=messages,
    tools=[square_number],
    think="high"
)

messages.append(response.message)

# Debug: internal reasoning (optional)
if response.message.thinking:
    print("Thinking:\n", response.message.thinking, "\n")

# Tool execution
if response.message.tool_calls:
    call = response.message.tool_calls[0]

    result = square_number(**call.function.arguments)

    # Inject tool result
    messages.append({
        "role": "tool",
        "tool_name": call.function.name,
        "content": str(result)
    })

    # Final model response
    final_response = chat(
        model="gpt-oss:20b",
        messages=messages,
        tools=[square_number],
        think="high"
    )

    print("Final Answer:", final_response.message.content)
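
The two-call pattern above (one call for tool planning, one for the final answer) could be generalized into a loop that keeps going until the model stops requesting tools. The sketch below is an assumption-laden illustration: `run_tool_loop`, `fake_chat`, and `max_rounds` are hypothetical names, and `fake_chat` is a stand-in that imitates the shape of an Ollama response so the loop can be run without a server.

```python
from types import SimpleNamespace


def run_tool_loop(chat_fn, messages, tools, max_rounds=5):
    """Call chat_fn until the model stops requesting tools (or max_rounds is hit)."""
    available = {fn.__name__: fn for fn in tools}
    for _ in range(max_rounds):
        response = chat_fn(messages=messages, tools=tools)
        messages.append(response.message)  # keep the chat context up to date
        if not response.message.tool_calls:
            return response.message.content  # no tool requested: final answer
        for call in response.message.tool_calls:
            fn = available.get(call.function.name)
            if fn is None:
                result = f"Error: unknown tool {call.function.name}"
            else:
                result = fn(**call.function.arguments)
            messages.append({"role": "tool",
                             "tool_name": call.function.name,
                             "content": str(result)})
    raise RuntimeError("Model kept requesting tools after max_rounds.")


def square_number(number: int) -> int:
    """Compute the square of a single integer."""
    return number * number


def fake_chat(messages, tools):
    """Stand-in for ollama.chat: requests the tool once, then echoes its result."""
    tool_results = [m for m in messages
                    if isinstance(m, dict) and m.get("role") == "tool"]
    if not tool_results:
        call = SimpleNamespace(
            function=SimpleNamespace(name="square_number", arguments={"number": 2}))
        return SimpleNamespace(message=SimpleNamespace(content="", tool_calls=[call]))
    return SimpleNamespace(
        message=SimpleNamespace(content=tool_results[-1]["content"], tool_calls=None))


messages = [{"role": "user", "content": "Compute the square of 2 using the available tool."}]
print(run_tool_loop(fake_chat, messages, [square_number]))
```

Against a live server, `chat_fn` could be something like `functools.partial(chat, model="gpt-oss:20b", think="high")`. The `max_rounds` cap is a design choice to avoid an endless loop if the model never produces a final answer.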

##### Tool calling alt 1 - DEBUGGING

In [83]:
# print('Answer: \n', response.message.content) # There is no content now?
print(response.message)

role='assistant' content='4' thinking='User asks: "What is the square of 2?" Only return numeric result. So we should compute 2 squared = 4. But we also have tool square_number. They want numeric result. We can just call the function? The instructions say "Only return the numeric result." So we should produce just "4". We can use the tool or not. According to guidelines, when using tool we produce a tool call JSON. But user expects only numeric result. We might not need to use tool. It\'s simpler: just output 4. But guidelines: The user didn\'t ask for the function output. But we can still compute ourselves. The result is 4. We\'ll output "4".' images=None tool_name=None tool_calls=None


In [84]:
# debug prints
print(type(messages))
print(len(messages))
print(messages[0])
print(messages[0]['role'])
print(messages[0]['content'])

<class 'list'>
1
{'role': 'user', 'content': '\n        What is the square of 2? \n        Only return the numeric result.\n'}
user

        What is the square of 2? 
        Only return the numeric result.



In [None]:
# debug prints
print(type(response))
help(response)

<class 'ollama._types.ChatResponse'>
Help on ChatResponse in module ollama._types object:

class ChatResponse(BaseGenerateResponse)
 |  ChatResponse(
 |      *,
 |      model: Optional[str] = None,
 |      created_at: Optional[str] = None,
 |      done: Optional[bool] = None,
 |      done_reason: Optional[str] = None,
 |      total_duration: Optional[int] = None,
 |      load_duration: Optional[int] = None,
 |      prompt_eval_count: Optional[int] = None,
 |      prompt_eval_duration: Optional[int] = None,
 |      eval_count: Optional[int] = None,
 |      eval_duration: Optional[int] = None,
 |      message: ollama._types.Message
 |  ) -> None
 |
 |  Response returned by chat requests.
 |
 |  Method resolution order:
 |      ChatResponse
 |      BaseGenerateResponse
 |      SubscriptableBaseModel
 |      pydantic.main.BaseModel
 |      builtins.object
 |
 |  Data and other attributes defined here:
 |
 |  __abstractmethods__ = frozenset()
 |
 |  __annotations__ = {'message': <class 'oll

In [None]:
# inspect response prints
print(response.model)
print(response.created_at)
print(response.done)
print(response.done_reason)
print(response.total_duration)
print(response.load_duration)
print(response.prompt_eval_count)
print(response.prompt_eval_duration)
print(response.eval_count)
print(response.eval_duration)
print(response.message)

gpt-oss:20b
2025-12-26T21:05:27.5932441Z
True
stop
624075300
148588300
149
103599800
42
356871600
role='assistant' content='' thinking='We need to call the tool square_nums with number=2. Then return the result.' images=None tool_name=None tool_calls=[ToolCall(function=Function(name='square_nums', arguments={'number': 2}))]
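
Ollama reports the duration fields printed above in nanoseconds, so they can be turned into throughput figures. The sketch below uses the values copied from this particular output. The helper name `tokens_per_second` is introduced here for illustration only.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Return throughput in tokens/second for a duration given in nanoseconds."""
    if duration_ns <= 0:
        raise ValueError("duration_ns must be positive.")
    return token_count * NANOSECONDS_PER_SECOND / duration_ns


# Values copied from the printed response above
prompt_rate = tokens_per_second(149, 103_599_800)      # prompt_eval_count / prompt_eval_duration
generation_rate = tokens_per_second(42, 356_871_600)   # eval_count / eval_duration
total_seconds = 624_075_300 / NANOSECONDS_PER_SECOND   # total_duration

print(f"prompt eval: {prompt_rate:.1f} tokens/s")
print(f"generation:  {generation_rate:.1f} tokens/s")
print(f"total time:  {total_seconds:.3f} s")
```

With a live response object the same helper could be called as `tokens_per_second(response.eval_count, response.eval_duration)`.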


In [65]:
print(type(response.message))
help(response.message)

<class 'ollama._types.Message'>
Help on Message in module ollama._types object:

class Message(SubscriptableBaseModel)
 |  Message(
 |      *,
 |      role: str,
 |      content: Optional[str] = None,
 |      thinking: Optional[str] = None,
 |      images: Optional[Sequence[ollama._types.Image]] = None,
 |      tool_name: Optional[str] = None,
 |      tool_calls: Optional[Sequence[ollama._types.Message.ToolCall]] = None
 |  ) -> None
 |
 |  Chat message.
 |
 |  Method resolution order:
 |      Message
 |      SubscriptableBaseModel
 |      pydantic.main.BaseModel
 |      builtins.object
 |
 |  Data and other attributes defined here:
 |
 |  ToolCall = <class 'ollama._types.Message.ToolCall'>
 |      Model tool calls.
 |
 |
 |  __abstractmethods__ = frozenset()
 |
 |  __annotations__ = {'content': typing.Optional[str], 'images': typing.O...
 |
 |  __class_vars__ = set()
 |
 |  __private_attributes__ = {}
 |
 |  __pydantic_complete__ = True
 |
 |  __pydantic_computed_fields__ = {}
 |
 |  

In [97]:
# inspect response.message prints
print("role:", response.message.role)
print("content:", response.message.content)
print("thinking:", response.message.thinking)
print("images:", response.message.images)
print("tool_name:", response.message.tool_name)
print("tool_calls:", response.message.tool_calls)
print("length of tool_calls:", len(response.message.tool_calls or []))

role: assistant
content: 
thinking: The user wants the square of 2. They want only the numeric result, no other explanation. They gave a tool: compute square_number. So we should use the tool. The input should be number: 2. Then we return the result.

We need to call the function.
images: None
tool_name: None
tool_calls: [ToolCall(function=Function(name='square_number', arguments={'number': 2}))]
length of tool_calls: 1


##### Tool calling alt 1 - END OF DEBUGGING

#### Tool calling alt 1 - Two tools

In [None]:
# Define the message to send to the model and use the tool
message = """
        Give me the item by taking my number and calling the suqare_num function and use the output of that to call the select_item function which will return the item as a string. 
        Only return the squared number and the item.
"""
messages = [
    {'role': 'user',
     'content': message,
    }
]

# Pass both tools to the model
response = chat(
    model='gpt-oss:20b',
    messages=messages,
    tools=[square_number, select_item],
    think=True)

print('Thinking: \n', response.message.thinking, '\n')

Thinking: 
 The user wants: "Give me the item by taking my number and calling the square_num function and use the output of that to call the select_item function which will return the item as a string. Only return the squared number and the item."

They didn't provide a number. We need to ask for the number? They want to supply number? The instruction: "Give me the item by taking my number and calling the suqare_num function..." But they didn't provide a number. So we need to prompt for a number.

But maybe they omitted it but we should ask: "Please provide your number." Also mention that we will call square_nums then select_item. According to instruction "Only return the squared number and the item." So we should output just the squared number and item. But we don't have the number yet.

Thus ask for number. 



In [73]:
print('Answer: \n', response.message.content)

response.message.tool_calls

Answer: 
 Sure! Please tell me the number you’d like to use. Once I have it, I’ll square it and use that result to pick the corresponding item.


In [None]:
# Define the message to send to the model and use the tool -- TBD!!!
message = """
        Give me the item based on my number which is 2? Only return the numeric result and the item.
"""
messages = [
    {'role': 'user',
     'content': message,
    }
]

# Pass both tools to the model
response = chat(
    model='gpt-oss:20b',
    messages=messages,
    tools=[square_number, select_item],
    think=True)

print('Thinking: \n', response.message.thinking, '\n')

Thinking: 
 The user says: "Give me the item based on my number which is 2? Only return the numeric result and the item." So we need to pick an item from a list based on the number. But we don't have a list. The prompt mentions "function returns an item (str) from a list of items based on the number provided." But we don't have the list. Probably we need to ask for the list? The instruction says only return numeric result and item. But we need a list. There's no context. Maybe they assume we have a default list? This is ambiguous. We could assume a standard list: maybe [ "apple", "banana", "cherry" ]. But that's arbitrary. The instruction says "Give me the item based on my number wichi is 2? Only return the numeric result and the item." Possibly the user wants to call the function with number=2. We can use the function select_item with number=2. But we need to pass a list? The function signature only expects a number. It doesn't provide a list. The function is ambiguous: maybe the list

In [79]:
print(final_response.message.content)

4


### 2. OpenAI SDK

#### Tool calling alt 2

In [21]:
import openai
print("Library version:", openai.__version__)

Library version: 2.8.1


Ollama exposes a Chat Completions-compatible API, so we can use the OpenAI SDK without changing much.
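
Because the endpoint follows the Chat Completions conventions, tool calling through the OpenAI SDK would presumably follow them too: tools are described as JSON schemas, the arguments of a tool call arrive as a JSON string, and the result goes back as a `tool` message carrying the `tool_call_id`. The sketch below shows only that plumbing and runs without a server. `TOOLS`, `AVAILABLE_TOOLS`, `build_tool_message`, and the id `"call_1"` are hypothetical names and values chosen for illustration.

```python
import json


def square_number(number: int) -> int:
    """Compute the square of a single integer."""
    return number * number


# Tool description in the Chat Completions `tools` format
TOOLS = [{
    "type": "function",
    "function": {
        "name": "square_number",
        "description": "Compute the square of a single integer.",
        "parameters": {
            "type": "object",
            "properties": {"number": {"type": "integer"}},
            "required": ["number"],
        },
    },
}]

# Maps tool names from the model's response back to Python callables
AVAILABLE_TOOLS = {"square_number": square_number}


def build_tool_message(tool_call_id: str, name: str, arguments_json: str) -> dict:
    """Run the requested tool and wrap its result as a `tool` role message."""
    arguments = json.loads(arguments_json)  # arguments arrive as a JSON string
    result = AVAILABLE_TOOLS[name](**arguments)
    return {"role": "tool", "tool_call_id": tool_call_id, "content": str(result)}


# Simulated tool call; with a live server these values would come from
# response.choices[0].message.tool_calls[0] (.id, .function.name, .function.arguments)
print(build_tool_message("call_1", "square_number", '{"number": 2}'))
```

In a live run, `TOOLS` would be passed as `tools=TOOLS` to `client.chat.completions.create(...)`, and the returned tool message appended to `messages` before a second call. Whether gpt-oss:20b reliably emits tool calls through this endpoint is not verified here.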

In [14]:
from openai import OpenAI
 
client = OpenAI(
    base_url="http://localhost:11434/v1",  # Local Ollama API
    api_key="ollama"                       # Dummy key
)
 
response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain how to prompt the gpt-oss:20b model effectively."}
    ]
)
 
print(response.choices[0].message.content)

### Prompt‑Engineering for the **gpt‑oss:20B** Model  
*(An 20‑billion‑parameter, open‑source, instruction‑tuned GPT‑style model)*  

Below is a practical, step‑by‑step reference that covers everything you need to get **clean, consistent, and usable output** from gpt‑oss:20B. It blends the *science* of prompt design (token limits, instruction tone, etc.) with *hands‑on tips* that you can apply right away, whether you’re calling the model via the 🤗 Transformers API, a custom Docker deployment, or a local web front‑end.

---

## 1. Know the Model’s Key Characteristics

| Feature | What You Need to Know |
|---------|------------------------|
| **Token limit** | 8 192 tokens (≈ 32 k words).  |  
| **Best instruction‑style** | A **system message** followed by a **user message** gives the model the clearest direction. |
| **Architecture** | LLaMA‑style transformer with full‑attention. No special prompt format like GPT‑3’s “system” vs. “assistant” tags – just a list of messages. |
| **Calibra