# 2 - Serve Local LLMs

Quick sanity check on the current Python environment:

In [86]:
import sys
sys.executable

'/Users/ian/miniforge3/envs/tutorial-local-llm/bin/python'

In [88]:
sys.version

'3.12.12 | packaged by conda-forge | (main, Oct 22 2025, 23:34:53) [Clang 19.1.7 ]'

## 2.1 Check available Ollama models

If you did the setup correctly (see `README.md` in the repo root), you should already have at least a few models available locally.

In [20]:
import ollama

The model details are a bit buried in the object returned by `ollama.list()`. The list of models is held in its `models` attribute:

In [21]:
models = ollama.list().models  # list of Model objects (name, size, details, ...)

In [53]:
print('\n\nOllama models (local):\n')
for m in models:
    print(f'{m.model:<30}\t'+
          f'{m.details.family}\t'+
          f'{m.details.parameter_size}\t'+
          f'{int(m.size/(1e6)):>6} MB\t'+
          f'{m.details.quantization_level}\t'+
          f'{m.details.format}')



Ollama models (local):

qwen3:4b                      	qwen3	4.0B	  2497 MB	Q4_K_M	gguf
sematre/orpheus:ft-en-3b-q2_k 	llama	3.8B	  1595 MB	Q2_K	gguf
sematre/orpheus:ft-en-3b      	llama	3.8B	  4028 MB	Q8_0	gguf
llama3.2-vision:latest        	mllama	10.7B	  7816 MB	Q4_K_M	gguf
tinyllama:1.1b                	llama	1B	   637 MB	Q4_0	gguf
gemma2:2b                     	gemma2	2.6B	  1629 MB	Q4_0	gguf
glm-4.6:cloud                 	glm4	355B	     0 MB	FP8	
codestral:latest              	llama	22.2B	 12569 MB	Q4_0	gguf
mathstral:latest              	llama	7.2B	  4113 MB	Q4_0	gguf
mistral:7b                    	llama	7.2B	  4372 MB	Q4_K_M	gguf
falcon3:1b                    	llama	1.7B	  1778 MB	Q8_0	gguf
gemma3:270m                   	gemma3	268.10M	   291 MB	Q8_0	gguf
qwen:latest                   	qwen2	4B	  2330 MB	Q4_0	gguf


## 2.2 Connect to your Ollama local LLM server

Now make a one-shot request to the smallest model in the list, `gemma3:270m`:
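Rather than reading the name off the listing, you could pick the smallest local model programmatically. The sketch below is a hypothetical helper, not part of the `ollama` package. It assumes that cloud-hosted models can be recognized by a tag ending in `cloud` (like `glm-4.6:cloud` above, which reports 0 MB because its weights are not stored locally), so they are skipped.

```python
from types import SimpleNamespace

def is_cloud_model(name: str) -> bool:
    # Assumption: cloud-hosted models have a tag ending in "cloud"
    # (e.g. "glm-4.6:cloud" in the listing above); they are not stored locally.
    tag = name.rsplit(':', 1)[-1]
    return tag.endswith('cloud')

def smallest_local_model(models) -> str:
    '''Name of the smallest locally stored model.

    Expects objects with `.model` (name) and `.size` (bytes), like the
    entries of `ollama.list().models`.
    '''
    local = [m for m in models if not is_cloud_model(m.model)]
    if not local:
        raise ValueError('no locally stored models found')
    return min(local, key=lambda m: m.size).model

# Offline check with stand-in objects; in the notebook pass `models` from 2.1
demo = [SimpleNamespace(model='glm-4.6:cloud', size=0),
        SimpleNamespace(model='gemma3:270m', size=291_000_000),
        SimpleNamespace(model='qwen3:4b', size=2_497_000_000)]
print(smallest_local_model(demo))  # gemma3:270m
```

With the `models` list from section 2.1, `smallest_local_model(models)` should return `'gemma3:270m'`.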

In [37]:
%%time
response = ollama.generate(model='gemma3:270m', prompt='Tell me a one paragraph story about a chicken')

CPU times: user 1.72 ms, sys: 4.33 ms, total: 6.05 ms
Wall time: 995 ms


If that didn't work, make sure the Ollama server is running. There are two ways to start it:

* Desktop app: search the *Start* menu (Windows) or Spotlight (*Cmd+Space*, macOS) for "Ollama" and launch it
* From the command line:

```bash
ollama start
```

Running the server from a terminal has the advantage that you can watch incoming requests in its log output.
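To check from Python whether a server is reachable before sending prompts, here is a minimal sketch using only the standard library. It assumes the default local address, port 11434, and queries Ollama's `/api/version` endpoint; if your server listens elsewhere (for example because you set `OLLAMA_HOST`), pass that address as `host`. The helper name is hypothetical.

```python
import urllib.request

def ollama_is_running(host: str = 'http://localhost:11434', timeout: float = 2.0) -> bool:
    '''Return True if an Ollama server answers GET /api/version at `host`.'''
    try:
        with urllib.request.urlopen(f'{host}/api/version', timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError (e.g. connection refused) and timeouts are both OSError subclasses
        return False

print(ollama_is_running())
```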

In [40]:
response.response

A lonely chicken, with a feathered heart, found its way to a cozy coop. Its journey was filled with foraging for food, cleaning the coop, and protecting its young from predators. Despite the hardships, the chicken persevered, its strong legs and unwavering spirit proving its resilience.



### EXERCISE: Experiment with different models & one-shot queries
*(5 minutes)*

Notes:
* Start with the smallest model and work up to larger parameter sizes
* Use *"Task Manager"* (Windows) or *"Activity Monitor"* (macOS) to see how much CPU and RAM Ollama is using
* Try the same prompt more than once with the same model to get a sense of intra-model variability
* Try the same prompt more than once with different models to get a sense of inter-model variability
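To compare intra- and inter-model variability side by side, you could send one prompt several times to several models. This is a sketch with a hypothetical helper; it takes the generate function as a parameter so it does not depend on any particular client (the commented usage assumes the `ollama` package used above).

```python
from collections.abc import Callable

def run_prompt_grid(generate: Callable[[str, str], str],
                    models: list[str], prompt: str,
                    repeats: int = 2) -> list[tuple[str, int, str]]:
    '''Send `prompt` `repeats` times to each model; return (model, run, text) tuples.

    `generate(model, prompt)` must return the response text.
    '''
    results = []
    for model in models:
        for run in range(1, repeats + 1):
            results.append((model, run, generate(model, prompt)))
    return results

# In the notebook, for example:
# grid = run_prompt_grid(lambda m, p: ollama.generate(model=m, prompt=p).response,
#                        ['gemma3:270m', 'tinyllama:1.1b', 'gemma2:2b'],
#                        'Tell me a one paragraph story about a chicken')
# for model, run, text in grid:
#     print(f'--- {model} (run {run}) ---\n{text}\n')
```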

If the model outputs Markdown, you can display it in a Jupyter notebook with:

```python
from IPython.display import display, Markdown

display(Markdown(response.response))
```

Outside of a Jupyter notebook, you'll need something like [`python-markdown`](https://python-markdown.github.io/) to convert Markdown text to HTML.

The helper function `printmd()` defined below displays generated Markdown directly in a Jupyter notebook.

In [41]:
%%time
response = ollama.generate(model='qwen3:4b', prompt='What are some of the current geo-political issues?')

CPU times: user 6.86 ms, sys: 18.8 ms, total: 25.7 ms
Wall time: 1min 15s


In [95]:
from IPython.display import display, Markdown

def printmd(text: str) -> None:
    '''Render Markdown text as notebook output (Jupyter only)'''
    display(Markdown(text))

In [43]:
printmd(response.response)

Here are some of the most **significant, active, and interconnected geo-political issues** as of late 2023/early 2024, based on current global events, expert analysis, and real-world impact. I've prioritized issues with high stakes, ongoing volatility, and direct relevance to global stability:

---

### 1. **The Russia-Ukraine War (Ongoing)**  
   - **Why it matters**: The largest active conflict since WWII, with profound implications for global security, energy, food, and finance.  
   - **Current dynamics**:  
     - Russia's continued invasion (2022–present), including attacks on infrastructure in Ukraine.  
     - Western sanctions on Russia (e.g., oil/gas price caps, banking isolation) and Russia's response (e.g., energy exports to China, hybrid warfare).  
     - Ukraine's push for NATO integration and European security reform.  
     - **Humanitarian impact**: Over 10 million displaced, massive food insecurity in Ukraine, and global food price volatility.  
   - *Why it's critical*: A direct test of Western unity, energy security, and the future of European stability.

---

### 2. **The Israel-Hamas War (Ongoing)**  
   - **Why it matters**: A rapidly escalating conflict with global implications for Middle Eastern security, refugees, and regional power dynamics.  
   - **Current dynamics**:  
     - Hamas' October 7, 2023 attack on Israel (resulting in ~1,200+ deaths) and Israel's military response in Gaza.  
     - **Gaza humanitarian crisis**: Over 40,000+ deaths, widespread destruction, and severe shortages of water, food, and medical supplies.  
     - Regional tensions: Escalating conflicts in Lebanon (Hezbollah vs. Israel), Jordan, and the broader Middle East.  
     - U.S. mediation efforts and global calls for ceasefire.  
   - *Why it's critical*: Threatens regional stability, displaces millions, and risks a broader conflict in the Middle East.

---

### 3. **U.S.-China Strategic Competition (Intensifying)**  
   - **Why it matters**: The defining global power struggle of the 21st century, impacting technology, trade, climate, and security.  
   - **Current dynamics**:  
     - **Tech war**: U.S. restrictions on Chinese tech (e.g., semiconductor exports, AI), China's push for "self-reliance" (e.g., semiconductor production).  
     - **Trade tensions**: Tariffs, supply chain shifts (e.g., U.S. moving manufacturing to Asia), and China's economic slowdown.  
     - **Military competition**: U.S. naval presence in the South China Sea, China's assertive claims in the South China Sea and Taiwan Strait.  
     - **Climate & security**: Both nations competing for influence in climate policy and global security frameworks (e.g., nuclear proliferation).  
   - *Why it's critical*: Could reshape global trade, technology governance, and even climate action.

---

### 4. **Climate Change & Geopolitical Instability**  
   - **Why it matters**: Climate disasters are increasingly triggering migration, resource conflicts, and political upheaval.  
   - **Current dynamics**:  
     - **Extreme weather**: Record floods in Pakistan (2022), wildfires in Canada (2023), and heatwaves in Europe (2023–2024).  
     - **Resource conflicts**: Water scarcity in the Middle East (e.g., Jordan), competition over Arctic resources (oil/gas), and food security crises (e.g., Ukraine war disrupting grain exports).  
     - **Migration**: Climate-driven displacement (e.g., 30+ million displaced by climate disasters since 2015) straining regions like Europe and Africa.  
     - **Policy gaps**: Failure to implement the Paris Agreement effectively, with major economies lagging on emissions targets.  
   - *Why it's critical*: Climate change is now a **direct driver** of conflict, migration, and economic instability, making it inseparable from geo-politics.

---

### 5. **Global Economic Instability & Debt Crises**  
   - **Why it matters**: Economic fragility threatens global growth, inequality, and financial systems.  
   - **Current dynamics**:  
     - **Debt crises**: High debt levels in developing nations (e.g., Argentina, Ghana), exacerbated by climate disasters and the Ukraine war.  
     - **Inflation & recessions**: Persistent inflation in the U.S. (2022–2024), China's economic slowdown, and supply chain disruptions.  
     - **Currency wars**: Competition over reserves (e.g., U.S. dollar dominance vs. China's push for digital currencies).  
     - **Food/fuel prices**: Ukraine war disrupting grain exports (30% of global wheat), driving up food costs globally.  
   - *Why it's critical*: Economic instability fuels social unrest, migration, and geopolitical realignments (e.g., "de-dollarization" movements).

---

### 6. **The Rise of Non-State Actors & Hybrid Warfare**  
   - **Why it matters**: Groups like Hezbollah, Hamas, and terrorist networks are increasingly pivotal in conflicts.  
   - **Current dynamics**:  
     - **Hybrid warfare**: Russia's use of disinformation, cyberattacks, and proxy forces in Ukraine.  
     - **Regional conflicts**: Somalia (Al-Shabaab), Sudan (civil war), and Afghanistan (Taliban's return).  
     - **Tech-enabled warfare**: AI for targeting, drone strikes, and disinformation campaigns.  
   - *Why it's critical*: Weakens state-centric security models and shifts power to decentralized actors.

---

### Why These Issues Matter *Together*  
These problems are **interconnected**, not isolated:  
- The Ukraine war → food price spikes → global inflation → economic instability.  
- U.S.-China tech competition → supply chain shifts → climate tech access → economic inequality.  
- Climate disasters → migration → regional conflicts (e.g., Africa, Middle East) → resource wars.  

> 💡 **Key takeaway for the user**: The most urgent geo-political challenges today are **not just about "who wins"**, but about **how the world adapts to interlinked crises** (war, climate, economics). Solutions require cooperation, not competition, between nations, regions, and even within societies.

---

### Resources for Deeper Understanding:
- **For real-time updates**: [Reuters Geo-Politics](https://www.reuters.com/world/geopolitics), [Bloomberg Geo-Politics](https://www.bloomberg.com/news/topics/geopolitics)  
- **For analysis**: *The Economist*'s "Geo-Political Hotspots" section, [CIA World Factbook](https://www.cia.gov/the-world-factbook/) (for country-specific context).

If you're interested in a specific region, issue, or historical context (e.g., "How did the Ukraine war start?" or "What's the impact on developing countries?"), I can dive deeper!

## 2.3 Chat Sessions

In [66]:
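from ollama import chat

class ChatSession:
    ''' Minimal multi-turn chat that keeps the full message history '''
    def __init__(self, model: str, system: str = 'You are a helpful chatbot'):
        self.model    = model
        self.system   = system
        # The system prompt is always the first message in the history
        self.messages = [dict(role='system', content=system)]

    def prompt(self, msg: str) -> str:
        # Add the user's message, then send the *whole* history to the model
        self.messages.append(dict(role='user', content=msg))
        response = chat(model=self.model, messages=self.messages).message.content
        # Record the reply so the next turn includes it as context
        self.messages.append(dict(role='assistant', content=response))
        return response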
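Ollama's `chat()` call does not remember earlier turns by itself; `ChatSession` works because it re-sends the entire `messages` list on every call. Since that list grows with every turn, a long session will eventually exceed the model's context window. One simple mitigation is a sliding window over the history. The helper below is a hypothetical sketch of that idea, not an Ollama feature.

```python
def trim_history(messages: list[dict], max_messages: int = 10) -> list[dict]:
    '''Keep all system messages plus the most recent `max_messages` other messages.'''
    system = [m for m in messages if m['role'] == 'system']
    others = [m for m in messages if m['role'] != 'system']
    if max_messages <= 0:
        return system
    # Slice from the end so only the latest turns are kept
    return system + others[-max_messages:]
```

To use it, `ChatSession.prompt()` could send `trim_history(self.messages)` instead of `self.messages`, keeping the full record for logging while the model only sees a bounded window. Counting messages is crude; counting tokens would be more precise.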

In [96]:
cs = ChatSession(model='gemma2:2b', system='Please provide short and concise answers')

In [97]:
printmd(cs.prompt("I am thinking about a good gift for my mother"))

Here are some gift ideas for your mom, depending on her interests:

**Experiences:**

* **Concert/Show tickets** 
* **Spa day**
* **Cooking class** 
* **Weekend getaway**

**Personal Gifts:**

* **Personalized jewelry** (bracelet, necklace)
* **Photo album or scrapbook** with memories
* **Handmade gift basket** of her favorite treats
* **Subscription box** for something she enjoys (books, coffee, beauty products)

**Classic & Thoughtful:**

* **Flowers and a card** 
* **Donations to her charity** in her name
* **Homemade meal** she'll love


Let me know what your mom likes and I can suggest more specific ideas! 😊 


In [98]:
printmd(cs.prompt("I think she'd like a piece of jewelry. Do you have any recommendations?"))

To give better recommendations, tell me:

1. **What kind of metals does she prefer (gold, silver, platinum)?**
2. **Does she prefer simple or more intricate designs?**  (e.g., dainty chain necklace, bold statement ring) 
3. **Any particular gemstones she likes (emerald, diamond, etc.)?** 
4. **What's your budget range for the piece?**


Once I have these details, I can give you more tailored suggestions! 😊  


In [99]:
printmd(cs.prompt("I have a budget of $200, can you just make a suggestion?"))

Okay, here is a suggestion within your budget:

* **A beautiful sterling silver pendant with a birthstone or meaningful symbol engraved on it.** 
    * You could choose a crescent moon for peace/new beginnings, a heart for love, or an infinity symbol for eternity.  
    * This combines style and personalization at an appealing price point.

Let me know if you'd like to brainstorm further! 🎁 


### Record of interaction

Our `ChatSession` object retains a record of the whole interaction in its `.messages` list attribute.

Depending on your objectives, you may need to log details of chat sessions, including:

* model
* input
* output
* performance



In [100]:
cs.messages

[{'role': 'system', 'content': 'Please provide short and concise answers'},
 {'role': 'user', 'content': 'I am thinking about a good gift for my mother'},
 {'role': 'assistant',
  'content': "Here are some gift ideas for your mom, depending on her interests:\n\n**Experiences:**\n\n* **Concert/Show tickets** \n* **Spa day**\n* **Cooking class** \n* **Weekend getaway**\n\n**Personal Gifts:**\n\n* **Personalized jewelry** (bracelet, necklace)\n* **Photo album or scrapbook** with memories\n* **Handmade gift basket** of her favorite treats\n* **Subscription box** for something she enjoys (books, coffee, beauty products)\n\n**Classic & Thoughtful:**\n\n* **Flowers and a card** \n* **Donations to her charity** in her name\n* **Homemade meal** she'll love\n\n\nLet me know what your mom likes and I can suggest more specific ideas! 😊 \n"},
 {'role': 'user',
  'content': "I think she'd like a piece of jewelry. Do you have any recommendations?"},
 {'role': 'assistant',
  'content': "To give bet

### EXERCISE: Experiment with a chat session

## 2.4 Creating Your Own Models

Ollama provides a number of ways to create your own model, from any of these sources:

* your local Ollama model repository
* the global/public Ollama model repository
* GGUF files you have locally

This allows you to create model variants to meet your specific needs. We'll experiment more with this later, but we can start with a basic example: a variant that uses a *system prompt* to customize an existing model. Such a variant reuses the base model's weights, so creating it is fast, but it is saved to your local model store (it will show up in `ollama list`) until you remove it, e.g. with `ollama rm mario`.

See the [Ollama API documentation](https://github.com/ollama/ollama/blob/main/docs/api.md#create-a-model) for more details on the `create` endpoint.

### Note on Prompt Engineering

Prompt engineering is a critical skill for interacting successfully with LLMs. How to do it well is out of scope for this tutorial, but at a minimum it is important to understand that the *system prompt* provides a shared context for every message within a session: it is included in the model's context for every request, so it conditions every response (although models do not always follow it faithfully).

In [122]:
%%time
ollama.create(model='mario', from_='qwen3:4b', system="You are Mario from Super Mario Bros.")

CPU times: user 2.78 ms, sys: 6.29 ms, total: 9.08 ms
Wall time: 87.4 ms


ProgressResponse(status='success', completed=None, total=None, digest=None)

In [111]:
%%time
mario = ollama.generate(model='mario', prompt='What is on your mind today?')
printmd(mario.response)

CPU times: user 5.88 ms, sys: 11.3 ms, total: 17.2 ms
Wall time: 1min 7s


## 2.5 Model and Response Objects

Let's take a quick look at the properties of an Ollama `Model` object (here `m`, the last model from the loop in section 2.1) and of the response object returned by `generate()`.

**Tangent:** how do you investigate/discover properties of objects?

* t...
* d...
* h...
* g...
* c...

In [23]:
m_properties = set()
for p in dir(m):
    m_properties.add(p)
    if not p.startswith('_'):
        print(f'{p:<30}{type(getattr(m,p)).__name__}')

construct                     method
copy                          method
details                       ModelDetails
dict                          method
digest                        str
from_orm                      method
get                           method
json                          method
model                         str
model_computed_fields         dict
model_config                  dict
model_construct               method
model_copy                    method
model_dump                    method
model_dump_json               method
model_extra                   NoneType
model_fields                  dict
model_fields_set              set
model_json_schema             method
model_parametrized_name       method
model_post_init               method
model_rebuild                 method
model_validate                method
model_validate_json           method
model_validate_strings        method
modified_at                   datetime
parse_file                    method
parse_

/var/folders/xb/w88hbd6d14gfdjbk3j4lj_nc0000gn/T/ipykernel_42462/2218771710.py:5: PydanticDeprecatedSince211: Accessing the 'model_computed_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
  print(f'{p:<30}{type(getattr(m,p)).__name__}')
/var/folders/xb/w88hbd6d14gfdjbk3j4lj_nc0000gn/T/ipykernel_42462/2218771710.py:5: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
  print(f'{p:<30}{type(getattr(m,p)).__name__}')
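The warnings above are harmless: Pydantic 2.11+ warns when class-level attributes such as `model_fields` are read from an instance, which `getattr` does for every name returned by `dir()`. A minimal sketch of a reusable helper (hypothetical name) that lists public attributes with those warnings silenced, relying on Pydantic's deprecation warnings being `DeprecationWarning` subclasses:

```python
import warnings

def public_attributes(obj, exclude=()) -> dict[str, str]:
    '''Map each public attribute name of `obj` to the type name of its value.'''
    result = {}
    with warnings.catch_warnings():
        # Silence deprecation warnings triggered by reading class-level attributes
        warnings.simplefilter('ignore', DeprecationWarning)
        for name in dir(obj):
            if name.startswith('_') or name in exclude:
                continue
            result[name] = type(getattr(obj, name)).__name__
    return result
```

For example, `public_attributes(m.details, exclude=m_properties)` should reproduce the listing in the next cell.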


Repeat for `m.details`, only looking at the new properties:

In [24]:
for p in dir(m.details):
    if not p.startswith('_') and p not in m_properties:
        print(f'{p:<30}{type(getattr(m.details,p)).__name__}')

families                      list
family                        str
format                        str
parameter_size                str
parent_model                  str
quantization_level            str


In [38]:
type(response)

ollama._types.GenerateResponse

In [39]:
for p in dir(response):
    if not p.startswith('_') and p not in m_properties:
        print(f'{p:<30}{type(getattr(response,p)).__name__}')

context                       list
created_at                    str
done                          bool
done_reason                   str
eval_count                    int
eval_duration                 int
load_duration                 int
logprobs                      NoneType
prompt_eval_count             int
prompt_eval_duration          int
response                      str
thinking                      NoneType
total_duration                int
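The `*_duration` fields are reported in nanoseconds, and `eval_count` / `prompt_eval_count` are token counts, so together they give throughput figures that are handy for the model comparisons in the exercises. A minimal sketch with hypothetical helper names; missing (`None`) values are treated as 0:

```python
NS_PER_SECOND = 1_000_000_000  # Ollama reports durations in nanoseconds

def tokens_per_second(token_count: int, duration_ns: int) -> float:
    '''Throughput for `token_count` tokens processed in `duration_ns` nanoseconds.'''
    if not duration_ns:
        return 0.0
    return token_count * NS_PER_SECOND / duration_ns

def timing_summary(r) -> dict[str, float]:
    '''Summarize the timing fields of a GenerateResponse (or ChatResponse).'''
    return {
        'load_s':              (r.load_duration or 0) / NS_PER_SECOND,
        'prompt_tokens_per_s': tokens_per_second(r.prompt_eval_count or 0,
                                                 r.prompt_eval_duration or 0),
        'output_tokens_per_s': tokens_per_second(r.eval_count or 0,
                                                 r.eval_duration or 0),
        'total_s':             (r.total_duration or 0) / NS_PER_SECOND,
    }
```

In the notebook, `timing_summary(response)` summarizes the most recent `generate()` call. `load_s` is large when a model is first loaded into memory and small on later calls.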
