## Ollama local LLMs  

This notebook runs local LLMs through Ollama using two client libraries:  
* Ollama SDK
* OpenAI SDK
  
Source 1: [Ollama Docs](https://docs.ollama.com/capabilities/tool-calling#python-2)  
Source 2: [OpenAI Cookbook](https://cookbook.openai.com/articles/gpt-oss/run-locally-ollama)  
  
**Setup:**  
1. To load a local model, run `ollama run gpt-oss:20b` in a terminal.
2. Run the notebook in a virtual environment (venv) with the libraries from `requirements.txt` installed.
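
Before running the cells below, it can help to confirm that the Ollama server is up and that the model is available locally. The following is a minimal sketch using only the standard library. It assumes the default local address `http://localhost:11434` and Ollama's `/api/tags` endpoint, which lists local models; the helper names `fetch_tags` and `has_model` are illustrative and not part of either SDK.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama address


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> Optional[dict]:
    """Return the parsed /api/tags payload, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None


def has_model(tags: Optional[dict], model: str) -> bool:
    """Check whether `model` appears in a /api/tags payload."""
    if not tags:
        return False
    return any(m.get("name") == model for m in tags.get("models", []))


if __name__ == "__main__":
    tags = fetch_tags()
    if tags is None:
        print("Ollama server not reachable; is it running?")
    elif has_model(tags, "gpt-oss:20b"):
        print("gpt-oss:20b is available locally")
    else:
        print("Model missing; try: ollama run gpt-oss:20b")
```

Keeping the network call (`fetch_tags`) separate from the lookup (`has_model`) means the check can be exercised without a running server.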
  
### 1. Ollama SDK

In [1]:
import sys
print(sys.executable)

c:\repo\LLM\venv\Scripts\python.exe


In [None]:
# Install dependencies (path is relative to the notebook's working directory)
# !pip install -r requirements.txt


In [2]:
!python --version

Python 3.13.7


In [3]:
from ollama import chat

In [4]:
# Define the message to send to the model
message = """
        Propose the optimal prompt for the gpt-oss:20b model.
        Give an example of a poorly constructed prompt and improve it until it is optimal.
        """
print(message)


        Propose the optimal prompt for the gpt-oss:20b model.
        Give an example of a poorly constructed prompt and improve it until it is optimal.
        


#### Thinking

In [8]:
# Send the message to the model with thinking enabled
response = chat(
    model='gpt-oss:20b',
    messages=[{'role': 'user', 'content': message}],
    think=True,
    stream=False,
)

print('Thinking: \n', response.message.thinking, '\n')
print('Answer: \n', response.message.content)

Thinking: 
 We need to propose optimal prompt for GPT-OSS:20B model. Provide an example of a poorly constructed prompt and improve it until optimal. Likely we need to discuss aspects: context, clarity, length, format, explicitness, etc. Provide before and after examples. Also propose the final prompt guidelines. So answer: first a general best-practice prompt. Then an example of bad prompt, improvement steps.

We should keep it succinct but thorough. Provide guidelines: specify role, task, constraints, format, etc.

We need to mention that GPT-OSS:20B is an open-source large language model; likely similar to GPT-3/4. Provide prompt design guidelines: role specification, instruction clarity, context, examples, formatting, output specification, etc. Provide sample prompts: initial poorly constructed (e.g., "Explain quantum computing.") and improved version: "You are a quantum physicist... produce a 300-word explanation...".

But the user says: "Propose the optimal prompt for the gpt-oss:
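
The cell above uses `stream=False`, so the thinking trace and the answer arrive together once generation finishes. With `stream=True`, the Ollama SDK yields partial chunks instead. The sketch below shows one way to accumulate the two fields separately. `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks only mimic the `chunk.message.thinking` / `chunk.message.content` shape so the example runs without a server.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def collect_stream(chunks: Iterable) -> Tuple[str, str]:
    """Accumulate thinking and answer text from streamed chat chunks."""
    thinking_parts, content_parts = [], []
    for chunk in chunks:
        # Either field may be None or empty on any given chunk.
        if getattr(chunk.message, "thinking", None):
            thinking_parts.append(chunk.message.thinking)
        if getattr(chunk.message, "content", None):
            content_parts.append(chunk.message.content)
    return "".join(thinking_parts), "".join(content_parts)


def fake_chunk(thinking=None, content=None):
    """Stand-in for one streamed chunk (illustrative only)."""
    return SimpleNamespace(message=SimpleNamespace(thinking=thinking, content=content))


demo = [
    fake_chunk(thinking="Plan the answer. "),
    fake_chunk(thinking="Keep it short."),
    fake_chunk(content="Hello, "),
    fake_chunk(content="world."),
]
thinking, answer = collect_stream(demo)
print("Thinking:", thinking)
print("Answer:", answer)
```

With a live server, the same helper could be fed the iterator returned by `chat(model='gpt-oss:20b', messages=[...], think=True, stream=True)`.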

In [9]:
print(response.message.content)

## 1.  “Optimal Prompt” for **gpt‑oss:20B**

> **Purpose** – The prompt is the *only* thing you can hand to a large‑language‑model (LLM) to shape its behavior.  
> **Goal** – Make the LLM produce a single, high‑quality answer that follows your constraints exactly, using the fewest possible tokens so the model has more capacity for the answer itself.

Below is a **template** you can copy‑paste and adjust for almost any task.  It contains all the ingredients that research and practice have shown to work best with a 20‑B‑parameter LLM:

```text
You are a {role: e.g. "professional copywriter"} who {has expertise in X: e.g. "B2B SaaS marketing"}.

Task: {explicit, concise instruction}
    • {Primary objective: e.g. "Write a 150‑word product description"}
    • {Tone: e.g. "Professional and persuasive"}
    • {Target audience: e.g. "C‑level executives in the tech industry"}
    • {Formatting: e.g. "Start with a headline, then a short paragraph"}.

Constraints: 
    • No personal data or priv
```

#### Tool calling, alt 1

In [10]:
# Define the tool
def square_nums(number: int) -> int:
    """Return the square of a number.

    Args:
        number (int): The number to be squared.

    Returns:
        int: The squared number.
    """
    return number * number

square_nums(23)

529
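
Passing a Python function in `tools=[...]` works because the SDK turns the function's signature and docstring into a JSON tool description for the model. The sketch below builds such a description by hand with `inspect`, to make visible roughly what the model sees. It is an illustration under assumptions, not the SDK's actual conversion code: `function_to_tool` and the small `JSON_TYPES` map are hypothetical, and only the outer `{"type": "function", "function": {...}}` layout follows the common tool-calling format.

```python
import inspect
import json

# Hypothetical, minimal mapping from Python annotations to JSON Schema types.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def square_nums(number: int) -> int:
    """Return the square of a number.

    Args:
        number (int): The number to be squared.
    """
    return number * number


def function_to_tool(func) -> dict:
    """Build a tool description from a function's signature and docstring."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Fall back to "string" for annotations missing from the map.
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    doc = inspect.getdoc(func) or ""
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": doc.splitlines()[0] if doc else "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


tool = function_to_tool(square_nums)
print(json.dumps(tool, indent=2))
```

Type hints and a clear first docstring line matter here, since they are the only information the model gets about when and how to call the tool.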

In [16]:
# Define the message to send to the model and use the tool
message = """
        What is the square of 22?
"""
messages = [
    {'role': 'user',
     'content': message,
    }
]

# pass tool to the model
response = chat(
    model='gpt-oss:20b',
    messages=messages,
    tools=[square_nums],
    think=True)

print('Thinking: \n', response.message.thinking, '\n')

Thinking: 
 The user asks: "What is the square of 22?" We should use the tool "square_nums" to compute. Then respond. 



In [None]:
# The answer is empty here: the model requested a tool call instead of replying
print('Answer: \n', response.message.content)

Answer: 
 


In [17]:
# Run the requested tool and send its result back to the model
messages.append(response.message)
if response.message.tool_calls:
    # Only recommended for models that return a single tool call
    call = response.message.tool_calls[0]
    result = square_nums(**call.function.arguments)

    # Add the tool result to the messages
    messages.append({
        "role": "tool",
        "tool_name": call.function.name,
        "content": str(result),
    })

    final_response = chat(
        model="gpt-oss:20b",
        messages=messages,
        tools=[square_nums],
        think=True,
    )

    print('Final Thinking: \n', final_response.message.thinking, '\n')


Final Thinking: 
 None 



In [18]:
print(final_response.message.content)

The square of 22 is **484**.
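
The tool-calling cell above handles only the first entry in `tool_calls`. When a model may request several calls in one turn, a name-to-function lookup keeps the dispatch generic. The following sketch assumes each call exposes `function.name` and `function.arguments` (a dict), as in the cell above. `AVAILABLE_TOOLS`, `run_tool_calls`, `fake_call`, and the `cube_nums` tool are hypothetical, and the fake calls stand in for a real response so the code runs offline.

```python
from types import SimpleNamespace
from typing import List


def square_nums(number: int) -> int:
    """Return the square of a number."""
    return number * number


def cube_nums(number: int) -> int:
    """Return the cube of a number (second tool, for illustration only)."""
    return number ** 3


# Registry mapping the tool names the model may emit to Python callables.
AVAILABLE_TOOLS = {"square_nums": square_nums, "cube_nums": cube_nums}


def run_tool_calls(tool_calls) -> List[dict]:
    """Execute every requested tool and return one 'tool' message per call."""
    tool_messages = []
    for call in tool_calls or []:
        func = AVAILABLE_TOOLS.get(call.function.name)
        if func is None:
            # Report unknown tools back to the model instead of raising.
            content = f"Unknown tool: {call.function.name}"
        else:
            content = str(func(**call.function.arguments))
        tool_messages.append(
            {"role": "tool", "tool_name": call.function.name, "content": content}
        )
    return tool_messages


def fake_call(name: str, arguments: dict):
    """Stand-in for one entry of response.message.tool_calls."""
    return SimpleNamespace(function=SimpleNamespace(name=name, arguments=arguments))


calls = [fake_call("square_nums", {"number": 22}), fake_call("cube_nums", {"number": 3})]
results = run_tool_calls(calls)
for msg in results:
    print(msg)
```

In the notebook flow, `messages.extend(run_tool_calls(response.message.tool_calls))` could replace the single `messages.append({...})` before the follow-up `chat` call.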


### 2. OpenAI SDK

#### Tool calling, alt 2

In [None]:
import openai
print("Library version:", openai.__version__)

Library version: 2.8.1


Ollama exposes a Chat Completions-compatible API, so we can use the OpenAI SDK without changing much.
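
One place where the request and response shapes do differ from the Ollama SDK is tool calling. In the Chat Completions format, tools are declared as JSON schemas rather than Python functions, `tool_call.function.arguments` arrives as a JSON string rather than a dict, and the tool result message carries a `tool_call_id`. The sketch below shows those pieces without contacting a server. `TOOLS` and `build_tool_message` are hypothetical names, and whether a given Ollama version honors `tools` on the `/v1` endpoint is an assumption to verify locally.

```python
import json


def square_nums(number: int) -> int:
    """Return the square of a number."""
    return number * number


# Tool declaration in the Chat Completions JSON-schema format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "square_nums",
            "description": "Return the square of a number.",
            "parameters": {
                "type": "object",
                "properties": {"number": {"type": "integer"}},
                "required": ["number"],
            },
        },
    }
]


def build_tool_message(tool_call_id: str, arguments_json: str) -> dict:
    """Run square_nums from JSON-encoded arguments and wrap the result."""
    arguments = json.loads(arguments_json)  # arguments arrive as a JSON string
    result = square_nums(**arguments)
    return {"role": "tool", "tool_call_id": tool_call_id, "content": str(result)}


# Simulated values; a real call would come from
# response.choices[0].message.tool_calls[0] after passing tools=TOOLS.
message = build_tool_message("call_demo_1", '{"number": 22}')
print(message)
```

The returned dict would be appended to `messages`, after the assistant message that contains the tool call, before calling `client.chat.completions.create` a second time.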

In [21]:
from openai import OpenAI
 
client = OpenAI(
    base_url="http://localhost:11434/v1",  # Local Ollama API (OpenAI-compatible endpoint)
    api_key="ollama"                       # Placeholder; the SDK requires a key, Ollama ignores it
)
 
response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain how to prompt the gpt-oss:20b model effectively."}
    ]
)
 
print(response.choices[0].message.content)

## Prompt‑Engineering 101 for **GPT‑OSS:20B**

*GPT‑OSS:20B* is a 20‑billion‑parameter open‑source transformer (≈ 10 GB of model weights).  
It behaves similarly to GPT‑3.5‑turbo but has a **≈ 4 k–5 k token context window** (depending on the build) and no built‑in instruction‑following or safety filters. That means the *quality* of your prompt is the single biggest lever for consistent, useful results.

Below is a practical checklist and set of templates you can copy‑paste, tweak, or build on.  

---

### 1. Understand the Basics

| What you need to know | Why it matters |
|-----------------------|----------------|
| **Token limit** | 4 k–5 k tokens max. Keep the prompt + expected reply under that. |
| **No implicit safety** | The model can hallucinate or produce harmful content. Use filtering or post‑processing. |
| **No external memory** | Each generation is independent; you must pass all context you want the model to see. |

---

### 2. Three‑Part Prompt Architecture

| Section | 