# OLLAMA server on google COLAB

This project servers as an Ollama server on google Colab and utilizes ngrok to expose the endpoint.


## Step 1: Installing ollama
It provides a simple API for creating, running, and managing models, as well as a library of pre-built models

https://ollama.com/

In [1]:
!curl https://ollama.ai/install.sh | sh

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13269    0 13269    0     0  49013      0 --:--:-- --:--:-- --:--:-- 49144
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
############################################################################################# 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


Running Ollama
In order to use Ollama it needs to run as a service in background parallel to your scripts. Becasue Jupyter Notebooks is built to run code blocks in sequence this make it difficult to run two blocks at the same time. As a workaround we will create a service using subprocess in Python so it doesn't block any cell from running.

Service can be started by command ollama serve.

time.sleep(5) adds some delay to get the Ollama service up before downloading the model.

In [12]:
import threading
import subprocess
import time

def run_ollama_serve():
  subprocess.Popen(["ollama", "serve"])

thread = threading.Thread(target=run_ollama_serve)
thread.start()
time.sleep(5)

In [13]:
!ollama pull llama3.3

[1;30;43mStreaming output truncated to the last 5000 lines.[0m
pulling 4824460d29f2... 100% ▕▏  42 GB                         
pulling 948af2743fc7... 100% ▕▏ 1.5 KB                         
pulling bc371a43ce90... 100% ▕▏ 7.6 KB                         
pulling 53a87df39647... 100% ▕▏ 5.6 KB                         
pulling 56bb8bd477a5... 100% ▕▏   96 B                         
pulling c7091aa45e9b... 100% ▕▏  562 B                         
verifying sha256 digest ⠏ [?25h[?25l[2K[1G[A[2K[1G[A[2K[1G[A[2K[1G[A[2K[1G[A[2K[1G[A[2K[1G[A[2K[1Gpulling manifest 
pulling 4824460d29f2... 100% ▕▏  42 GB                         
pulling 948af2743fc7... 100% ▕▏ 1.5 KB                         
pulling bc371a43ce90... 100% ▕▏ 7.6 KB                         
pulling 53a87df39647... 100% ▕▏ 5.6 KB                         
pulling 56bb8bd477a5... 100% ▕▏   96 B                         
pulling c7091aa45e9b... 100% ▕▏  562 B                         
verifying sha256 digest ⠋ 

In [13]:
!ollama pull qwen2 && !ollama pull qwen2 && !ollama pull nomic-embed-text

## Step 2: Installing ngrok

### Download:

In [2]:
!wget https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.tgz

--2024-12-30 17:51:35--  https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.tgz
Resolving bin.equinox.io (bin.equinox.io)... 35.71.179.82, 75.2.60.68, 99.83.220.108, ...
Connecting to bin.equinox.io (bin.equinox.io)|35.71.179.82|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14809556 (14M) [application/octet-stream]
Saving to: ‘ngrok-v3-stable-linux-amd64.tgz’


2024-12-30 17:51:37 (16.8 MB/s) - ‘ngrok-v3-stable-linux-amd64.tgz’ saved [14809556/14809556]



### Extract the bin file:

In [3]:
!tar xvzf ngrok-v3-stable-linux-amd64.tgz ngrok

ngrok


### Set the auth token

a ngrok auth token must be aquired in order to use it.  

Follow the instruction below:  
https://ngrok.com/docs/getting-started/  

Then replace your AuthToken with `<NGROK_AUTH_TOKEN>` below:


In [4]:
!./ngrok authtoken 1XeNRJzaDz4FHQQccNdtZQl1cF6_4d2j96ASTM9ni7JXjNEJS

Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml


## Final Step: Start ollama, pull llama model and stablish ngrok token

After pulling the selected model (in this example llama3) an endpoint will be created which can be used to access to ollama api.   
**You can also find the endpoint inside your ngrok panel: https://dashboard.ngrok.com/cloud-edge/endpoints**


In [6]:
!./ngrok http 11434 --host-header="localhost:11434" --log stdout

Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAID/hCnYCllFbLMAj86OsTRbPw65VX2d0szEcpyHvhXPt

2024/12/30 17:53:19 routes.go:1259: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* 