<img align="right" width="400" src="https://drive.google.com/thumbnail?id=1rPeHEqFWHJcauZlU82a4hXM10TUjmHxM&sz=s4000" alt="FHNW Logo">

# Provide Access to LLMs

## Summary
The aim of this notebook is to provide access to LLMs through [Ollama](https://ollama.com/).

If you use Google Colab as your programming environment, you can install Ollama directly in your Colab notebook. Alternatively, you can install Ollama in a dedicated Colab notebook and access that Ollama server from other Colab notebooks, which avoids reinstalling Ollama in every notebook. Remote connections are only allowed for paid Colab users (see the [terms of service](https://research.google.com/colaboratory/faq.html#disallowed-activities)). It is therefore highly recommended to buy a [Colab Pro](https://colab.research.google.com/signup) license; otherwise your free quota will be consumed rather quickly.


## Links
- [Buy a Colab License](https://colab.research.google.com/signup)
- [Ollama](https://ollama.com/)
- [Install Ollama](https://ollama.com/download)
- [Ollama Models](https://ollama.com/search)
- [Ollama with Docker](https://hub.docker.com/r/ollama/ollama)
- [ollama-remote](https://github.com/amitness/ollama-remote)

## Side Note
This notebook builds on [Ollama](https://ollama.com/). There are alternatives:
  - [vLLM](https://github.com/vllm-project/vllm) 
  - [llama.cpp](https://github.com/ggerganov/llama.cpp)
  - [OpenLLM](https://github.com/bentoml/OpenLLM)
  - ...

Ollama has several [Community Integrations](https://github.com/ollama/ollama?tab=readme-ov-file#community-integrations) like Web & Desktop UIs, terminals etc.

This notebook contains assignments: <font color='red'>Questions are written in red.</font>

<a href="https://colab.research.google.com/github/markif/2025_HS_Advanced_NLP_LAB/blob/master/03_a_Provide_Access_to_LLMs.ipynb">
  <img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Ollama

Ollama is a free and open-source project that lets you run various open source LLMs locally (or on a server with GPUs). Think of it like Docker. With Docker, you download various images from a central repository and run them in a container. With Ollama, you download various open source LLMs and run them in your terminal.

### Setup Ollama

We are going to install Ollama from within this notebook. This makes sense in the context of this class (since it lets us use the resources provided by Colab), but for your project you probably want to use your own hardware.

If you have access to GPU hardware that is reachable from your programming environment, you can install [Ollama](https://ollama.com/download) on that hardware. The simplest option is to use [Ollama's Docker container](https://hub.docker.com/r/ollama/ollama). Please follow the [installation instructions](https://hub.docker.com/r/ollama/ollama) for Nvidia GPU or AMD GPU and then start Ollama:

```bash
# Nvidia
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# AMD
docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm

# install LLMs
docker exec -it ollama ollama pull qwen2.5vl:3b
docker exec -it ollama ollama pull qwen3:4b
docker exec -it ollama ollama pull gemma3:4b
docker exec -it ollama ollama pull phi4-mini
docker exec -it ollama ollama pull deepseek-r1:7b
docker exec -it ollama ollama pull nomic-embed-text
```

Run this code from <b>Google Colab</b>. It will install and run Ollama.

**Make sure that a GPU is available (see [here](https://www.tutorialspoint.com/google_colab/google_colab_using_free_gpu.htm))!!!**

In [1]:
%%capture

# install dependencies
!sudo apt update && sudo apt install -y pciutils lshw wget

In [2]:
# download and install Ollama

!curl -fsSL https://ollama.com/install.sh | sh

>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> NVIDIA GPU installed.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [3]:
# start Ollama

import subprocess
process = subprocess.Popen(["ollama", "serve"])

#!nohup ollama serve > ollama.log 2>&1 &

In [4]:
import time

# wait for Ollama
time.sleep(10)
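
A fixed ten-second sleep can be too short on a slow machine and needlessly long on a fast one. As a minimal sketch, the wait could instead poll the server until it answers. The helper name `wait_for_server` and its parameters are hypothetical; the default address is taken from the install log above, and the sketch assumes the server answers a plain `GET` on its root URL with HTTP 200 once it is up.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://127.0.0.1:11434", timeout=60.0, interval=1.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds have passed.

    Hypothetical helper: the default URL is the address printed by the
    Ollama installer; adjust it if your server listens elsewhere.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server is not reachable yet; retry after a short pause.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_server()` after starting `ollama serve` returns `True` as soon as the server responds and `False` if it never comes up within the timeout, so a failed start becomes visible instead of surfacing later as a connection error.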

By default, a service running inside a Google Colab notebook (such as Ollama) cannot be accessed from outside the notebook. The following package opens a tunnel to Cloudflare and provides access to the Ollama instance running within this notebook through a reverse proxy.

Cloudflare Tunnel is a combination of a tunnel and a reverse proxy. It establishes a permanent tunnel that starts locally and ends in the Cloudflare cloud. This tunnel is used to route traffic into and out of your network without having to drill holes in the firewall. Cloudflare then acts as a reverse proxy, forwarding traffic to your backend through the established tunnel.

In [5]:
%%capture

!pip install ollama-remote

In [6]:
# check if we are not on google colab
if 'google.colab' not in str(get_ipython()):
    import os
    # trick to allow for background processes
    get_ipython().system = os.system
    # download and install cloudflared
    !curl --location --output cloudflared.deb https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb && dpkg -i cloudflared.deb

# run ollama-remote
!nohup ollama-remote > ollama-remote.log 2>&1 &

In [7]:
import time

# wait for ollama-remote
time.sleep(10)

In [8]:
# show log messages

!cat ollama-remote.log

Installing Cloudflared...
Error: listen tcp 127.0.0.1:11434: bind: address already in use
Setup is complete.
# Commands:
---------------------------------------------

export OLLAMA_HOST='https://occasion-caroline-deborah-centuries.trycloudflare.com'


ollama run phi3:mini --verbose


# Use below code to access via OpenAI SDK:
-------------------------------------------

from openai import OpenAI

client = OpenAI(
    base_url="https://occasion-caroline-deborah-centuries.trycloudflare.com/v1/",
    api_key="ollama",
)

response = client.chat.completions.create(
    model="phi3:mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)


Copy the URL stored in `OLLAMA_HOST` from the above printout (at the time of execution this was `https://occasion-caroline-deborah-centuries.trycloudflare.com`; be aware that this URL changes with every new execution).

You will need this URL for your other notebooks (if you do not see such a URL, re-execute the cell above).
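
Instead of copying the URL by hand, it could also be pulled out of the log text with a regular expression. The following is a small sketch: the function name `extract_tunnel_url` is hypothetical, and the pattern assumes that the tunnel URL always ends in `.trycloudflare.com`, as it does in the printout above.

```python
import re
from typing import Optional

# Assumption: the tunnel host consists of lowercase letters, digits and
# hyphens followed by ".trycloudflare.com", as in the log shown above.
TUNNEL_URL = re.compile(r"https://[a-z0-9-]+\.trycloudflare\.com")


def extract_tunnel_url(log_text: str) -> Optional[str]:
    """Return the first trycloudflare.com URL found in `log_text`, or None."""
    match = TUNNEL_URL.search(log_text)
    return match.group(0) if match else None


# Sample taken from the printout above.
sample = (
    "Setup is complete.\n"
    "export OLLAMA_HOST='https://occasion-caroline-deborah-centuries.trycloudflare.com'\n"
)
print(extract_tunnel_url(sample))
# prints: https://occasion-caroline-deborah-centuries.trycloudflare.com
```

In the notebook, the same function could be applied to the contents of `ollama-remote.log`. A return value of `None` indicates that the tunnel is not up yet and the log should be read again a little later.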

<font color='red'>**TASK: Go to [Ollama Models](https://ollama.com/search) and select one text model and one multimodal model (a model that can handle both text and images) that you want to use. To get detailed information about a model of interest, you can also consult its Hugging Face model card (e.g. for [Qwen3](https://huggingface.co/Qwen/Qwen3-4B)).**</font>

Install the LLMs you want to use...

In [9]:
!ollama pull qwen2.5vl:3b #multimodal
!ollama pull qwen3:4b #text/reasoning
#!ollama pull gemma3:4b #multimodal
#!ollama pull phi4-mini #text
#!ollama pull deepseek-r1:7b #text/reasoning
#!ollama pull nomic-embed-text #embedding
#!ollama pull llama3.2-vision #multimodal

# all at once
#!echo qwen2.5vl:3b qwen3:4b gemma3:4b phi4-mini deepseek-r1:7b nomic-embed-text | xargs -n1 ollama pull

# all at once when you use docker
#docker exec -it ollama sh -c "echo qwen2.5vl:3b qwen3:4b gemma3:4b phi4-mini deepseek-r1:7b nomic-embed-text | xargs -n1 ollama pull"
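
After pulling, it can be useful to check which models the server actually reports. On the command line, `ollama list` shows them; from Python, a sketch along the following lines could query the `/api/tags` endpoint of Ollama's REST API. The helper names are hypothetical and the default address is the local one from the install log. The sketch also assumes that a model pulled without an explicit tag is reported under the `latest` tag (for example `phi4-mini` as `phi4-mini:latest`).

```python
import json
import urllib.request


def fetch_tags(base_url="http://127.0.0.1:11434"):
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as response:
        return json.load(response)


def model_names(tags_payload):
    """Return the sorted model names contained in an /api/tags response."""
    return sorted(model["name"] for model in tags_payload.get("models", []))


def missing_models(wanted, tags_payload):
    """Return the entries of `wanted` that the server does not report."""
    installed = set(model_names(tags_payload))
    missing = []
    for name in wanted:
        # Assumption: names without a tag are listed as "<name>:latest".
        full_name = name if ":" in name else f"{name}:latest"
        if full_name not in installed:
            missing.append(name)
    return missing
```

With the server running, `missing_models(["qwen2.5vl:3b", "qwen3:4b"], fetch_tags())` would return an empty list once both pulls have completed, and otherwise the names that still need to be pulled.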
