# Running Ollama with Ngrok in Google Colab

This notebook sets up an Ollama server with Ngrok tunneling in Google Colab, allowing you to access your Ollama instance from anywhere.

## Setup Instructions

First, you'll need to configure your ngrok authentication token:

1. Go to https://ngrok.com and sign up for a free account
2. After signing in, go to https://dashboard.ngrok.com/get-started/your-authtoken
3. Copy your authtoken
4. In Google Colab:
   - Click the key icon in the left sidebar to open "Secrets"
   - Click "Add new secret"
   - Set "Name" to: authtoken (the code below reads this exact name)
   - Set "Value" to: your ngrok authtoken
   - Turn on "Notebook access" for the secret
   - Click "Add"
5. Make sure to use a GPU runtime (Runtime -> Change runtime type -> GPU)
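To confirm the runtime actually has a GPU before installing anything, a quick check like the one below can help. It is a minimal sketch that assumes the NVIDIA `nvidia-smi` tool is on `PATH`, as it normally is on Colab GPU runtimes; `gpu_available` is a hypothetical helper, not part of Colab or Ollama.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if `nvidia-smi -L` runs and lists at least one GPU (hypothetical helper)."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA driver tools, e.g. a CPU-only runtime
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    # `nvidia-smi -L` prints one line per device, starting with "GPU 0: ..."
    return result.returncode == 0 and result.stdout.lstrip().startswith("GPU")


if __name__ == "__main__":
    print("GPU runtime detected" if gpu_available() else "No GPU found: change the runtime type to GPU")
```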

## 1. Install Required Packages


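```python
# Install Ollama with the official install script
!curl -fsSL https://ollama.com/install.sh | sh

# Install the NVIDIA CUDA drivers without interactive prompts
!echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
!sudo apt-get update && sudo apt-get install -y cuda-drivers

# Install pyngrok (the Python wrapper for ngrok; it also installs an `ngrok` command)
!pip install pyngrok
```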

```text
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13281    0 13281    0     0  55376      0 --:--:-- --:--:-- --:--:-- 55569
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
Hit:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Get:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:3 https://cli.github.com/packages stable InRelease [3,917 B]
Hit:4 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:5 https://ppa.launchpadc
```
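Before moving on, you may want to confirm that the two commands the later cells launch are on `PATH`. A minimal sketch, assuming the `ngrok` command comes from the pyngrok install above; `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(names=("ollama", "ngrok")):
    """Return the command names from `names` that cannot be found on PATH (hypothetical helper)."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("All tools found" if not missing else f"Missing: {', '.join(missing)}")
```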

## 2. Import Dependencies and Configure Environment


```python
from pyngrok import ngrok
from google.colab import userdata
import os
import subprocess
import threading
import time

# Read the ngrok authtoken from Colab secrets and register it with ngrok
try:
    token = userdata.get('authtoken')
    if not token:
        raise ValueError("No authtoken found in Colab secrets")
    ngrok.set_auth_token(token)
except Exception:
    print("ERROR: Could not find ngrok authtoken in Colab secrets!")
    print("Please follow the instructions at the top of this notebook to set up your ngrok authtoken")
    print("Then restart the runtime and run again")
    raise  # re-raise the original error with its traceback

# Point LD_LIBRARY_PATH at Colab's NVIDIA libraries so Ollama can use the GPU
os.environ.update({'LD_LIBRARY_PATH': '/usr/lib64-nvidia'})
```
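If you want the token lookup to also work outside Colab, one option is to fall back to an environment variable. This is a sketch under stated assumptions: `load_ngrok_token` is a hypothetical helper, and the fallback variable `NGROK_AUTHTOKEN` is chosen because the ngrok agent itself reads that name.

```python
import os


def load_ngrok_token(secret_name="authtoken", env_var="NGROK_AUTHTOKEN"):
    """Return the ngrok authtoken from Colab secrets, else from an environment variable.

    Hypothetical helper: the environment-variable fallback is an assumption for non-Colab use.
    """
    token = None
    try:
        from google.colab import userdata  # only importable inside Colab
        token = userdata.get(secret_name)
    except Exception:
        # ImportError outside Colab, or the secret is missing / notebook access is off
        token = None
    token = token or os.environ.get(env_var)
    if not token:
        raise ValueError(f"Set the '{secret_name}' Colab secret or the {env_var} environment variable")
    return token
```

With this helper, the `try` block above reduces to `ngrok.set_auth_token(load_ngrok_token())`.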




## 3. Helper Functions


```python
def run_and_print_output(process):
    """Continuously read a process's stdout and print each line until the process exits."""
    while True:
        output = process.stdout.readline()
        # An empty read plus a non-None poll() means the pipe is drained and the process has exited
        if output == '' and process.poll() is not None:
            break
        if output:
            print(output.strip())
```
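Because the Ollama and ngrok logs are printed from two threads, their lines can interleave. A variant that tags each line with a prefix, and returns what it read, is sketched below. `stream_output` is hypothetical; it assumes the process was started with `stdout=subprocess.PIPE` in text mode, as every `Popen` call in this notebook is.

```python
import subprocess
import sys


def stream_output(process, prefix=""):
    """Print each stdout line of `process` with `prefix`; return all lines once the pipe closes."""
    lines = []
    # readline() returns '' only at EOF, so iter(..., '') stops when the child closes stdout
    for line in iter(process.stdout.readline, ""):
        line = line.rstrip()
        print(prefix + line)
        lines.append(line)
    process.stdout.close()
    process.wait()  # reap the child so returncode is set
    return lines


if __name__ == "__main__":
    demo = subprocess.Popen(
        [sys.executable, "-c", "print('hello'); print('world')"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    stream_output(demo, prefix="[demo] ")
```

In the notebook, passing `args=(ollama_serve, "[ollama] ")` to the thread would label the server log.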


## 4. Start Ollama Server


```python
# Start `ollama serve` in the background, merging stderr into stdout
print('>>> starting ollama serve')
ollama_serve = subprocess.Popen(
    ['ollama', 'serve'],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True
)

# Print the server log from a daemon thread so this cell does not block
serve_thread = threading.Thread(target=run_and_print_output, args=(ollama_serve,), daemon=True)
serve_thread.start()

# Give ollama serve a moment to start up
time.sleep(5)
```


```text
>>> starting ollama serve
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOjuDi15m/U5pvYxugI93evrSxLejT0IKpB+XexbZ656

time=2025-09-24T08:06:24.224Z level=INFO source=routes.go:1475 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127
```
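The fixed `time.sleep(5)` may be too short on a cold start or longer than needed otherwise. An alternative sketch polls Ollama's `GET /api/version` endpoint until it answers with HTTP 200. `wait_for_server` is a hypothetical helper, and its timeout and polling interval are arbitrary assumptions.

```python
import http.client
import time
import urllib.request


def wait_for_server(url="http://127.0.0.1:11434/api/version", timeout=30.0, interval=0.5):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds pass (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError (e.g. connection refused), HTTPError and socket timeouts are OSError subclasses
            pass
        time.sleep(interval)
    return False
```

In the notebook, `wait_for_server()` could replace `time.sleep(5)`, for example with `assert wait_for_server(), "Ollama did not start"`.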

## 5. Pull Ollama Model


```python
# Pull the model
# ****************************************************
# Change MODEL to the one you want to pull
# See the model library on the Ollama website: https://ollama.com/library
# ****************************************************
MODEL = 'qwen2.5-coder:32b'
print(f'>>> starting ollama pull {MODEL}')
pull_process = subprocess.Popen(
    ['ollama', 'pull', MODEL],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True
)

# Print the pull output in the foreground; this blocks until the download finishes
run_and_print_output(pull_process)
```

```text
pulling manifest
pulling ac3d1ba8aa77: 100% ▕██████████████████▏  19 GB
pulling 66b9ea09bd5b: 100% ▕██████████████████▏   68 B
pulling 1e65450c3067: 100% ▕██████████████████▏ 1.6 KB
pulling 832dd9e00a68: 100% ▕██████████████████▏  11 KB
pulling f0676bd3c336: 100% ▕██████████████████▏  488 B
verifying sha256 digest
```
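The CLI draws its progress bars with terminal control codes, which do not render cleanly in notebook output. As an alternative sketch, the model can be pulled through Ollama's REST endpoint `POST /api/pull`, which streams one JSON object per line with a `status` field and, while downloading, `total` and `completed` byte counts. The helper names are hypothetical; the request field `model` and the `{"error": ...}` failure shape are assumptions based on the Ollama API docs (older versions used `name` for the field).

```python
import json
import urllib.request


def format_progress(event):
    """Turn one JSON progress event from /api/pull into a short status line."""
    status = event.get("status", "")
    total = event.get("total")
    if total:
        percent = 100 * event.get("completed", 0) / total
        return f"{status}: {percent:.0f}%"
    return status


def pull_model(model, base_url="http://127.0.0.1:11434"):
    """Pull `model` via Ollama's REST API, printing deduplicated progress lines (hypothetical helper)."""
    request = urllib.request.Request(
        f"{base_url}/api/pull",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    last_line = None
    with urllib.request.urlopen(request) as resp:
        for raw in resp:  # newline-delimited JSON, one event per line
            if not raw.strip():
                continue
            event = json.loads(raw)
            if "error" in event:  # assumption: failures arrive as {"error": "..."}
                raise RuntimeError(event["error"])
            line = format_progress(event)
            if line != last_line:  # skip repeats so the cell output stays short
                print(line)
                last_line = line
    return last_line
```

Calling `pull_model("qwen2.5-coder:32b")` after the server is up prints one line per whole-percent change and returns the last status line, which should be `success` when the pull completes.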

## 6. Start Ngrok Tunnel


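```python
# Start an ngrok HTTP tunnel to Ollama's port 11434.
# --host-header rewrites the Host header to localhost:11434 so Ollama treats
# tunneled requests as local (it may otherwise reject the public hostname).
print('>>> starting ngrok http server')
ngrok_process = subprocess.Popen(
    ['ngrok', 'http', '--log', 'stderr', '11434', '--host-header=localhost:11434'],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True
)

# Print ngrok's log from a daemon thread so this cell does not block
ngrok_thread = threading.Thread(target=run_and_print_output, args=(ngrok_process,), daemon=True)
ngrok_thread.start()
```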


```text
>>> starting ngrok http server
```
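pyngrok's `ngrok.get_tunnels()` reports tunnels of the ngrok process that pyngrok itself manages, so it may not see a tunnel opened by a separately launched CLI process like the one above. The CLI agent exposes its own local API, by default at `http://127.0.0.1:4040/api/tunnels`, which lists each tunnel's `public_url` and `config.addr`. A minimal sketch that reads it, assuming the agent claimed the default port 4040; both helper names are hypothetical.

```python
import json
import urllib.request


def pick_public_url(tunnels_json, port=11434):
    """Return the HTTPS public URL of the first tunnel whose local addr ends in :port, else None."""
    for tunnel in tunnels_json.get("tunnels", []):
        addr = tunnel.get("config", {}).get("addr", "")
        public_url = tunnel.get("public_url", "")
        if addr.endswith(f":{port}") and public_url.startswith("https://"):
            return public_url
    return None


def get_cli_tunnel_url(port=11434, api_url="http://127.0.0.1:4040/api/tunnels"):
    """Query the ngrok agent's local API (default port 4040 assumed) for the tunnel to `port`."""
    with urllib.request.urlopen(api_url, timeout=5) as resp:
        return pick_public_url(json.load(resp), port)
```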


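```python
# Open a tunnel to Ollama's port 11434 through pyngrok.
# The authtoken was already registered from Colab secrets in step 2; never hard-code it here.
# (Port 4040 is ngrok's local web inspection interface, not the Ollama API.)
# If ngrok reports that the endpoint is already online, stop the CLI tunnel first
# with ngrok_process.terminate().
tunnel = ngrok.connect(11434, "http", host_header="localhost:11434")
print(tunnel.public_url)  # Print the public URL
```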

```text
https://pearlie-conjoined-swashingly.ngrok-free.dev
```


## 7. Get Public URL and Keep Server Running


```python
# Get and display the public URL
time.sleep(5)  # Wait for ngrok to start
try:
    tunnels = ngrok.get_tunnels()
    if tunnels:
        print("\n=== Your Ollama server is available at ===")
        print(tunnels[0].public_url)
        print("=====================================")
    else:
        print("No active ngrok tunnels found")
except Exception as e:
    print("Error getting ngrok URL:", e)

# Keep the main process running
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("Shutting down...")
    ollama_serve.terminate()
    ngrok_process.terminate()
```



```text
=== Your Ollama server is available at ===
https://pearlie-conjoined-swashingly.ngrok-free.dev
```
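The shutdown branch above calls `terminate()` but does not wait for the processes to exit. A slightly more defensive sketch waits with a timeout and falls back to `kill()`; `stop_process` is a hypothetical helper and the 10-second default is an arbitrary choice. pyngrok's own ngrok process, if one was started, can be shut down separately with `ngrok.kill()`.

```python
import subprocess
import sys


def stop_process(proc, timeout=10.0):
    """Ask `proc` to exit, force-kill it after `timeout` seconds, and return its exit code."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


if __name__ == "__main__":
    # Demo on a throwaway child; in the notebook you would call
    # stop_process(ollama_serve) and stop_process(ngrok_process)
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_process(child))
```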


## Usage Instructions

1. Run all cells in order
2. Wait for the model to download (the default `qwen2.5-coder:32b` model is about 19 GB, so this can take a while)
3. Once complete, you'll see a public URL where your Ollama server is accessible
4. You can now use this URL to connect to your Ollama instance from anywhere

Note: The server will keep running until you stop the notebook execution or disconnect from the Colab runtime.
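To call the server from another machine, send requests to the public URL exactly as you would to `http://localhost:11434`. The sketch below uses Ollama's `POST /api/generate` endpoint with `stream` set to `false`, which returns a single JSON object whose `response` field holds the generated text. `BASE_URL` is a placeholder for the URL the notebook prints, the helper names are hypothetical, and the `ngrok-skip-browser-warning` header is an assumption that only matters if ngrok's free-tier browser warning page intercepts the request. The `ollama` CLI can also target the tunnel by setting the `OLLAMA_HOST` environment variable to the public URL.

```python
import json
import urllib.request

# Placeholder: replace with the public URL printed by the notebook
BASE_URL = "https://your-ngrok-url.ngrok-free.dev"


def build_generate_request(base_url, model, prompt):
    """Build a non-streaming POST /api/generate request for an Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: skips ngrok's free-tier browser warning page if it is shown
            "ngrok-skip-browser-warning": "true",
        },
        method="POST",
    )


def generate(base_url, model, prompt, timeout=600):
    """Send the request and return the generated text from the `response` field."""
    with urllib.request.urlopen(build_generate_request(base_url, model, prompt), timeout=timeout) as resp:
        return json.load(resp)["response"]
```

For example, `print(generate(BASE_URL, "qwen2.5-coder:32b", "Write a Python function that reverses a string."))` sends one prompt and prints the reply; the long default timeout allows for the first request loading a large model.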
