# ollama Colab demo

Version 1: using `xterm`

## Step 1: runtime

Make sure your Colab runtime uses the "T4 GPU" hardware accelerator: open the _Runtime_ => _Change runtime type_ menu and select it there.

After that, execute the next cells.
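If you want to confirm that the runtime actually exposes a GPU before going further, a check along these lines may help. It is a minimal sketch that assumes the `nvidia-smi` tool is on the `PATH` of GPU runtimes; the helper name `gpu_names` is made up for this example.

```python
import shutil
import subprocess


def gpu_names() -> list[str]:
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA tools on PATH: probably a CPU-only runtime
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # One GPU name per non-empty output line
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


names = gpu_names()
print(names if names else "No GPU detected; check the runtime type.")
```

An empty result suggests the runtime type was not changed, or that the runtime needs to be reconnected.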

## Step 2: install xterm

Run the following to install and load the xterm extension for Colab:

In [None]:
!pip install -q colab-xterm
%load_ext colabxterm

## Step 3: start xterm

This opens a terminal into the container running the Colab notebook. It stays open, and you will type the commands of the next step in this window. Two things to keep in mind:

1. Typing latency is substantial, so be patient when entering commands.
2. Progress bars and similar output may look garbled; this is harmless.

In [None]:
%xterm

## Step 4: install and set up ollama

Click inside the xterm window above to give it focus, then type the following three commands. Wait for each one to complete before launching the next.

```
# Command 1: install ollama (-L follows redirects, -f fails on HTTP errors, -sS hides the progress meter but still shows errors)
curl -fsSL https://ollama.ai/install.sh | sh

# Command 2: start ollama in the background (press Enter after a little while to get the xterm prompt back)
ollama serve &

# Command 3: retrieve an LLM
ollama pull llama3.2

# The last line of ollama's output should read "success".
```
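Because `ollama serve` runs in the background, it can be handy to confirm from a notebook cell that the server is reachable before moving on. The sketch below polls the server's root URL until it answers with HTTP 200. The port 11434 is the one used later in this notebook; the helper name `wait_for_ollama` and its timeout values are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_ollama(base_url: str = "http://localhost:11434",
                    timeout_s: float = 30.0,
                    interval_s: float = 1.0) -> bool:
    """Poll base_url until it answers with HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(base_url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_ollama()` in a cell should return `True` once Command 2 has started the server, and `False` if nothing answers within the timeout.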

## Step 5: use ollama from other cells

You can launch ollama commands with the usual notebook bang-notation:

In [None]:
!ollama list
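The same information should also be available without the CLI, through the server's HTTP API. The sketch below assumes the `/api/tags` endpoint, which returns a JSON object with a `models` list whose entries carry a `name` field; the helper names and the sample payload are illustrative only.

```python
import json
import urllib.request


def model_names(tags: dict) -> list[str]:
    """Extract model names from a parsed /api/tags response."""
    return [m["name"] for m in tags.get("models", [])]


def list_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask the local ollama server which models it has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return model_names(json.load(resp))


# Hand-written sample payload (not real server output); a real response
# carries more fields per model.
sample = {"models": [{"name": "llama3.2:latest"}]}
print(model_names(sample))
```

With the server running, `list_models()` should include the model pulled in Step 4.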

To control ollama from Python code you need a client package. This example uses the LangChain integration:

In [None]:
%pip install -q -U langchain-ollama

A small example to demonstrate. First a little setup:

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM

template = """Question: {question}

Answer as concisely as possible.

Answer:"""

prompt = ChatPromptTemplate.from_template(template)

model = OllamaLLM(model="llama3.2")

chain = prompt | model
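To see what the `{question}` placeholder does, the substitution can be reproduced with plain Python string formatting. This only approximates the template step: LangChain additionally wraps the text in a chat message before handing it to the model, so the string below is not exactly what the model receives. The question is borrowed from the HTTP example further down.

```python
template = """Question: {question}

Answer as concisely as possible.

Answer:"""

# str.format stands in for the prompt template here: it fills the
# {question} placeholder and leaves the rest of the text untouched.
rendered = template.format(
    question="Is Puccinia graminis an obligate parasite of Poaceae?"
)
print(rendered)
```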

Now you can run the model:

In [None]:
chain.invoke({"question": "Are the anamorphs of Ascomycota all mapped to their holomorphs as of today? And if not, what would be the best technique to achieve that?"})

Behind the scenes, the chain talks to ollama through HTTP requests on localhost. You can send such a request yourself:

_(note: this request turns off streaming responses for demonstration purposes, as outlined in [the docs](https://github.com/ollama/ollama/blob/main/docs/api.md#request-no-streaming).)_

In [None]:
import json
import requests

body = {
    "model": "llama3.2",
    "prompt": "Is Puccinia graminis an obligate parasite of Poaceae?",
    "stream": False,
}

response = requests.post("http://localhost:11434/api/generate", json=body)
response.raise_for_status()  # fail early if the server returned an HTTP error
resp_json = response.json()

resp_short = {k: f"{str(v)[:30]} ..." for k, v in resp_json.items()}
print("Response (shortened):")
print(json.dumps(resp_short, indent=2))

print("\nAnswer:")
print(resp_json["response"])

You can check that each request is being logged in the xterm panel above.
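For comparison, here is a sketch of the streaming variant of the same request. It assumes that with `"stream": True` the server replies with one JSON object per line, each carrying a `response` fragment and a `done` flag, as described in the API docs linked above. The helper names are made up for this example, and it uses the standard library's `urllib` instead of `requests` to stay self-contained.

```python
import json
import urllib.request
from typing import Iterable


def join_stream(lines: Iterable[bytes]) -> str:
    """Concatenate the "response" fragments of a streamed /api/generate reply."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the answer
    return "".join(parts)


def generate_streaming(prompt: str, model: str = "llama3.2",
                       base_url: str = "http://localhost:11434") -> str:
    """POST to /api/generate with streaming on and return the full answer."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return join_stream(resp)  # the response object iterates line by line


# Hand-written sample lines for illustration (not real model output).
sample = [
    b'{"response": "Yes", "done": false}\n',
    b'{"response": ", mostly.", "done": false}\n',
    b'{"response": "", "done": true}\n',
]
print(join_stream(sample))
```

With the server running, `generate_streaming("Is Puccinia graminis an obligate parasite of Poaceae?")` should return the same kind of answer as the non-streaming request, assembled from the fragments.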