# Ollama on Google Colab with Google Drive + Public Access

Run Ollama AI models in Google Colab with persistent storage via Google Drive and public web access via ngrok.

## üöÄ Quick Start

1. **Mount Google Drive** (for persistent model storage)
2. **Install Ollama** in the Colab VM
3. **Configure model directory** to use Drive
4. **Download models** (stored in Drive for reuse)
5. **Start Ollama server**
6. **Expose publicly** with ngrok tunnel

## ‚ö†Ô∏è Limitations

- **Session timeout**: ~90 min idle, ~12 hours max (free Colab)
- **Resource constraints**: Limited RAM/CPU - big models may not fit
- **Not production-safe**: Unpredictable uptime and latency
- **Treat as demo/sandbox**: Great for testing, not production use

## üìã Prerequisites

- Google account with Colab access
- Google Drive space for models (GGUF files can be large)
- ngrok account (free tier available)

---

## 1. Mount Google Drive

Mount your Google Drive to store Ollama models persistently. This prevents re-downloading large model files every session.

In [15]:
# Mount Google Drive
from google.colab import drive

drive.mount("/content/drive")

# Create Ollama directory in Drive (if it doesn't exist)
import os

ollama_drive_path = "/content/drive/MyDrive/ollama"
os.makedirs(ollama_drive_path, exist_ok=True)
os.makedirs(f"{ollama_drive_path}/models", exist_ok=True)

print(f"‚úÖ Google Drive mounted and Ollama directory created at: {ollama_drive_path}")
print(f"üìÅ Models will be stored in: {ollama_drive_path}/models")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
‚úÖ Google Drive mounted and Ollama directory created at: /content/drive/MyDrive/ollama
üìÅ Models will be stored in: /content/drive/MyDrive/ollama/models


## 2. Install Ollama

Install Ollama in the Colab environment. We'll use the official installation script.

In [16]:
# Install Ollama
!curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
!ollama --version

print("‚úÖ Ollama installed successfully!")


>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
ollama version is 0.13.0
‚úÖ Ollama installed successfully!


## 3. Configure Ollama to Use Google Drive

Set up Ollama to use your Google Drive directory for model storage instead of the default Colab location.

In [17]:
# Set environment variables to use Drive for Ollama data
import os

ollama_drive_path = "/content/drive/MyDrive/ollama"

# Set OLLAMA_MODELS to point to Drive
os.environ["OLLAMA_MODELS"] = f"{ollama_drive_path}/models"

# Create symlink from default location to Drive (for compatibility)
default_models_path = "/root/.ollama/models"
os.makedirs("/root/.ollama", exist_ok=True)

# Remove existing symlink/directory if it exists
if os.path.exists(default_models_path):
    if os.path.islink(default_models_path):
        os.unlink(default_models_path)
    else:
        import shutil

        shutil.rmtree(default_models_path)

# Create symlink to Drive
os.symlink(f"{ollama_drive_path}/models", default_models_path)

print(f"‚úÖ Ollama configured to use Google Drive: {ollama_drive_path}/models")
print(f"üìÅ Symlink created: {default_models_path} -> {ollama_drive_path}/models")


‚úÖ Ollama configured to use Google Drive: /content/drive/MyDrive/ollama/models
üìÅ Symlink created: /root/.ollama/models -> /content/drive/MyDrive/ollama/models


## 4. Download Models

Download your desired models. Since they're stored in Google Drive, they'll persist across sessions.

**Popular lightweight models for Colab:**
- `qwen2.5:3b` - Fast, capable 3B parameter model
- `deepseek-coder:1.3b` - Code-focused model
- `gemma:2b` - Google's lightweight model
- `phi3:3.8b` - Microsoft's efficient model

**Note:** Large models (>7B) may not fit in Colab's RAM/CPU constraints.

In [18]:
# Choose your models (uncomment the ones you want)
models_to_download = [
    "qwen2.5:3b",  # Fast, capable general model
    "deepseek-coder:1.3b",  # Code-focused model
    "gemma:2b",  # Google's lightweight model
    # "phi3:3.8b",       # Microsoft's efficient model (uncomment if needed)
    # "llama3:8b",       # Large model - may not fit in Colab (uncomment if you have Pro)
]

print("üöÄ Starting model downloads...")
print(f"üìÅ Models will be saved to Google Drive: {ollama_drive_path}/models")
print()

for model in models_to_download:
    print(f"‚¨áÔ∏è Downloading {model}...")
    !ollama pull {model}
    print(f"‚úÖ {model} downloaded successfully!")
    print()

print("üéâ All models downloaded!")
print("üìã Available models:")
!ollama list


üöÄ Starting model downloads...
üìÅ Models will be saved to Google Drive: /content/drive/MyDrive/ollama/models

‚¨áÔ∏è Downloading qwen2.5:3b...
[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[A[A[A[A[A[A[A[1G[?25h[?2026l
‚úÖ qwen2.5:3b downloaded successfully!

‚¨áÔ∏è Downloading deepseek-coder:1.3b...
[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?

## 5. Start Ollama Server

Start the Ollama server in the background. It will run on localhost:11434.

In [25]:
# Start Ollama server in background
import subprocess
import time
import os

# Set environment variables to allow public access via ngrok
print("‚öôÔ∏è Configuring environment variables for public access...")
os.environ["OLLAMA_HOST"] = "0.0.0.0:11434"
os.environ["OLLAMA_ORIGINS"] = "*"

print("üîÑ Stopping any existing Ollama instances...")
!pkill ollama
time.sleep(2)

print("üöÄ Starting Ollama server...")

# Start server in background
process = subprocess.Popen(
    ["ollama", "serve"], stdout=subprocess.PIPE, stderr=subprocess.PIPE
)

# Wait a moment for server to start
time.sleep(5)

# Check if server is running
try:
    result = subprocess.run(
        ["curl", "-s", "http://localhost:11434/api/tags"],
        capture_output=True,
        text=True,
        timeout=5,
    )
    if result.returncode == 0:
        print("‚úÖ Ollama server started successfully!")
        print("üåê Server running on: http://localhost:11434")
    else:
        print("‚ùå Server may not be responding yet. Please wait a few seconds and try the test cell.")
except Exception as e:
    print(f"‚ö†Ô∏è Error checking server status: {e}")

# Test with a simple model locally
print("\nüß™ Testing locally with a simple model...")
!ollama run qwen2.5:3b "Hello!" --format json

‚öôÔ∏è Configuring environment variables for public access...
üîÑ Stopping any existing Ollama instances...
üöÄ Starting Ollama server...
‚úÖ Ollama server started successfully!
üåê Server running on: http://localhost:11434

üß™ Testing locally with a simple model...
[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h

## 6. Set Up Public Access with ngrok

Expose your Ollama server to the internet using ngrok. This gives you a public URL that forwards to your Colab instance.

**Requirements:**
- Free ngrok account (sign up at ngrok.com)
- ngrok auth token

In [22]:
# Install ngrok
!pip install pyngrok

from pyngrok import ngrok
import os

# Set your ngrok auth token (replace with your actual token)
# Get your token from: https://dashboard.ngrok.com/get-started/your-authtoken
NGROK_AUTH_TOKEN = "367SkOQHlBFw8AG1TsVNI0L9y46_3WJdDXhwNNLSJ1nn8JzCB"  # ‚ò¢ Replace with your token!

if not NGROK_AUTH_TOKEN or NGROK_AUTH_TOKEN == "YOUR_NGROK_AUTH_TOKEN_HERE":
    print("‚ùå Please set your ngrok auth token!")
    print("1. Sign up at https://ngrok.com")
    print("2. Get your auth token from https://dashboard.ngrok.com/get-started/your-authtoken")
    print("3. Replace the token value above with your actual token")
else:
    # Authenticate ngrok
    ngrok.set_auth_token(NGROK_AUTH_TOKEN)

    # Start tunnel to Ollama port
    print("üåê Starting ngrok tunnel to Ollama server...")
    tunnel = ngrok.connect(11434, "http")

    public_url = tunnel.public_url
    print(f"‚úÖ Public URL: {public_url}")
    print(f"üåç Your Ollama API is now accessible at: {public_url}/v1/chat/completions")
    print()
    print("üìã Example API call:")
    print(f"""curl -X POST {public_url}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{{\"model\": \"qwen2.5:3b\", \"messages\": [{{\"role\": \"user\", \"content\": \"Hello!\"}}]}}'""")

    # Keep tunnel alive
    print("\n‚ÖÜ Tunnel is active. Keep this cell running to maintain the connection.")
    print("‚ö†Ô∏è Colab sessions timeout, so this URL will only work while the session is active.")


üåê Starting ngrok tunnel to Ollama server...
‚úÖ Public URL: https://thomasena-auxochromic-joziah.ngrok-free.dev
üåç Your Ollama API is now accessible at: https://thomasena-auxochromic-joziah.ngrok-free.dev/v1/chat/completions

üìã Example API call:
curl -X POST https://thomasena-auxochromic-joziah.ngrok-free.dev/v1/chat/completions   -H 'Content-Type: application/json'   -d '{"model": "qwen2.5:3b", "messages": [{"role": "user", "content": "Hello!"}]}'

‚ÖÜ Tunnel is active. Keep this cell running to maintain the connection.
‚ö†Ô∏è Colab sessions timeout, so this URL will only work while the session is active.


## 7. Alternative: Pinggy Tunnel (if ngrok doesn't work)

If ngrok has issues, you can use Pinggy as an alternative tunneling service.

In [None]:
# Alternative: Pinggy tunnel
# Download and install Pinggy
!wget https://pinggy.io/cli/pinggy_0.1.1_linux_amd64.tar.gz
!tar -xzf pinggy_0.1.1_linux_amd64.tar.gz
!chmod +x pinggy

print("üåê Starting Pinggy tunnel...")
print("üìã Your public URL will be displayed below:")
print("‚ö†Ô∏è Keep this cell running to maintain the tunnel")
print()

# Start Pinggy tunnel (this will run indefinitely)
!./pinggy -p 11434 http


## 8. Test Your Public API

Test that your public Ollama API is working correctly.

In [26]:
# Test the public API
import requests
import json

# Get the public URL (replace with your actual ngrok URL)
# If using ngrok, copy the URL from the previous cell
# If using Pinggy, copy the URL from the Pinggy output

public_url = "https://thomasena-auxochromic-joziah.ngrok-free.dev"  # Replace with your actual URL

if public_url == "YOUR_PUBLIC_URL_HERE":
    print("‚ùå Please set your public URL!")
    print("Copy the URL from the ngrok or Pinggy output above.")
else:
    try:
        # Test models endpoint
        response = requests.get(f"{public_url}/api/tags", timeout=10)
        if response.status_code == 200:
            models = response.json()
            print("‚úÖ Public API is working!")
            print("üìã Available models:")
            for model in models.get("models", []):
                print(f"  ‚Ä¢ {model['name']}")
        else:
            print(f"‚ùå API returned status code: {response.status_code}")

        # Test chat completion
        print("\nüß™ Testing chat completion...")
        payload = {
            "model": "qwen2.5:3b",
            "messages": [{"role": "user", "content": "Say hello in 10 words or less."}],
            "stream": False,
        }
        response = requests.post(
            f"{public_url}/v1/chat/completions", json=payload, timeout=30
        )
        if response.status_code == 200:
            result = response.json()
            content = result["choices"][0]["message"]["content"]
            print(f"ü§ñ Response: {content}")
            print("‚úÖ Chat completion working!")
        else:
            print(f"‚ùå Chat completion failed: {response.status_code}")

    except Exception as e:
        print(f"‚ùå Test failed: {e}")
        print("üí° Make sure:")
        print("   1. Ollama server is running")
        print("   2. The tunnel is active")
        print("   3. You copied the correct public URL")

‚úÖ Public API is working!
üìã Available models:
  ‚Ä¢ gemma:2b
  ‚Ä¢ deepseek-coder:1.3b
  ‚Ä¢ qwen2.5:3b

üß™ Testing chat completion...
ü§ñ Response: Hello! How can I assist you today?
‚úÖ Chat completion working!


## 9. Integration with Goblin Assistant

Configure your Goblin Assistant backend to use this Colab-hosted Ollama instance.

In [28]:
# Configuration for Goblin Assistant
public_url = "https://thomasena-auxochromic-joziah.ngrok-free.dev"  # Replace with your actual URL

if public_url != "YOUR_PUBLIC_URL_HERE":
    print("üîß Goblin Assistant Configuration:")
    print("=" * 50)
    print()
    print("# Automatic setup using the integration script:")
    print(f"cd /path/to/ForgeMonorepo")
    print(f"python3 setup_colab_ollama_integration.py --colab-url {public_url} --auto-test")
    print()
    print("# Or manually update providers.toml with:")
    print("[providers.ollama_colab]")
    print('name = "Ollama (Colab)"')
    print(f'endpoint = "{public_url}"')
    print('capabilities = ["chat", "reasoning", "code", "embedding"]')
    print('models = ["qwen2.5:3b", "deepseek-coder:1.3b", "gemma:2b"]')
    print("priority_tier = 0")
    print("cost_score = 0.0")
    print("default_timeout_ms = 30000")
    print("bandwidth_score = 0.5")
    print("supports_cot = false")
    print()
    print("‚ö†Ô∏è Remember: Colab sessions timeout, so this endpoint is temporary!")
    print("üîÑ Use the setup script to update the URL when you restart Colab.")
else:
    print("‚ùå Please set your public URL first!")

üîß Goblin Assistant Configuration:

# Automatic setup using the integration script:
cd /path/to/ForgeMonorepo
python3 setup_colab_ollama_integration.py --colab-url https://thomasena-auxochromic-joziah.ngrok-free.dev --auto-test

# Or manually update providers.toml with:
[providers.ollama_colab]
name = "Ollama (Colab)"
endpoint = "https://thomasena-auxochromic-joziah.ngrok-free.dev"
capabilities = ["chat", "reasoning", "code", "embedding"]
models = ["qwen2.5:3b", "deepseek-coder:1.3b", "gemma:2b"]
priority_tier = 0
cost_score = 0.0
default_timeout_ms = 30000
bandwidth_score = 0.5
supports_cot = false

‚ö†Ô∏è Remember: Colab sessions timeout, so this endpoint is temporary!
üîÑ Use the setup script to update the URL when you restart Colab.


## üìö Troubleshooting

### Common Issues:

1. **Colab session crashed/restarted**
   - Models in Drive persist, but you need to remount Drive and restart Ollama
   - Run cells 1-5 again to get back up and running

2. **Out of RAM/CPU**
   - Use smaller models (3B or less parameters)
   - Upgrade to Colab Pro for more resources
   - Close other Colab tabs

3. **ngrok/Pinggy connection issues**
   - Check your auth token is correct
   - Try the alternative tunneling service
   - Make sure Ollama is running on port 11434

4. **Models not persisting**
   - Ensure Drive is mounted correctly
   - Check that the symlink was created
   - Verify models are in `/content/drive/MyDrive/ollama/models`

### Performance Tips:

- Use lightweight models for best performance
- Keep Colab tab focused to prevent timeouts
- Use Colab Pro for longer sessions and more resources
- Test with simple prompts first

---

## üéâ Success!

You now have Ollama running in Colab with:
- ‚úÖ Persistent model storage via Google Drive
- ‚úÖ Public web access via ngrok/Pinggy
- ‚úÖ Integration ready for Goblin Assistant
- ‚úÖ Cost-effective AI inference (free Colab)

**Remember:** This is perfect for demos, testing, and development, but not production use due to session timeouts!

**Next steps:**
1. Copy your public URL
2. Configure your Goblin Assistant backend
3. Start chatting with your AI models! ü§ñ