<a href="https://www.kaggle.com/code/preunpatching/ollama-runtime-for-kaggle?scriptVersionId=226511346" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Ollama Runtime for Kaggle
This notebook lets you quickly run Ollama on Kaggle's servers, which is useful if your own PC is not powerful enough to run certain models.
## How to run
### Start notebook
It's recommended that the runtime is configured to use 2x T4 GPUs. To do that, go to Settings > Accelerator and set it to "GPU T4 x2". The runtime can also run on a CPU alone, but this seriously degrades performance compared to running on a GPU.
Once ready, press Ctrl+Shift+Alt+Enter, or go to Runtime > Run all.
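
If you want to confirm which accelerator the session actually received before running everything, a quick check along these lines may help. This is only a sketch: the helper name `list_nvidia_gpus` is hypothetical, and it assumes `nvidia-smi` is reachable through `PATH`, whereas the notebook itself checks the fixed path `/opt/bin/nvidia-smi`.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    exe = shutil.which("nvidia-smi")  # assumption: nvidia-smi is on PATH
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # One GPU name per non-empty output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
if gpus:
    print(f"{len(gpus)} GPU(s) detected: {gpus}")
else:
    print("No Nvidia GPU detected; the runtime would fall back to CPU.")
```

With the "GPU T4 x2" accelerator selected, you would expect two entries in the list; an empty list suggests the accelerator setting did not take effect.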
### Start model
When the notebook asks you to select a compatible Ollama model, you can go to https://ollama.com/models to browse the library of available models. **Be aware that not all models will run on the best free version of the runtime (29 GB RAM, 2x T4 GPU with 32 GB VRAM), so pick a model that fits within these limits!**
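
As a rough way to judge whether a model is likely to fit, you can estimate its memory footprint from its parameter count. The sketch below is a back-of-the-envelope calculation, not something the notebook does: the function name `estimate_model_gb`, the default of 4 bits per weight, and the 1.2 overhead factor are all assumptions chosen for illustration. The 32 GB figure is the VRAM total quoted above.

```python
def estimate_model_gb(params_billion, bits_per_weight=4, overhead=1.2):
    """Very rough memory estimate in GB for a model's weights.

    bits_per_weight and overhead are illustrative assumptions; real usage
    also depends on context length and the runtime's own buffers.
    """
    weight_gb = params_billion * bits_per_weight / 8  # billions of params * bytes per param
    return weight_gb * overhead


VRAM_GB = 32  # total quoted above for 2x T4

for label, size in [("7B", 7), ("70B", 70)]:
    need = estimate_model_gb(size)
    verdict = "likely fits" if need <= VRAM_GB else "likely too large"
    print(f"{label} at 4-bit: about {need:.1f} GB, {verdict}")
```

Treat the result as a first filter only; the download size shown on a model's page is usually a better guide when it is available.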
## Bugs
*   None

Once done, you're ready to chat with your selected model!
Have fun!

In [3]:
# Check if we're using an Nvidia GPU.
import os
if os.path.isfile("/opt/bin/nvidia-smi"):
  print("Nvidia GPU detected.")
  # Install pciutils so that Ollama can auto-detect the Nvidia GPU.
  !sudo apt install -y pciutils
else:
  print("Couldn't detect an Nvidia GPU. Running on a CPU only will seriously degrade performance. It is recommended that you switch to an Nvidia GPU.")
  input("Press ENTER to continue . . .")
# Install the Ollama Python client and BS4 (for the model search function), then Ollama itself.
!pip install ollama bs4
!curl -fsSL https://ollama.com/install.sh | sh
print("\033[0m", end='')
# Start the Ollama server in a separate process.
import subprocess
import signal
# Launch without a shell so that signals sent later reach the server process itself.
process = subprocess.Popen(["ollama", "serve"])
# Wait for Ollama server to initialize.
import time
time.sleep(0.5)
# Prompt user for which Ollama model to use.
model = ""
while not model:
  model = input("Select compatible Ollama model to use: ")
# Pull selected model.
from tqdm import tqdm
from ollama import pull
current_digest, bars = '', {}
for progress in pull(model, stream=True):
  digest = progress.get('digest', '')
  if digest != current_digest and current_digest in bars:
    bars[current_digest].close()

  if not digest:
    print(progress.get('status'))
    continue

  if digest not in bars and (total := progress.get('total')):
    bars[digest] = tqdm(total=total, desc=f'pulling {digest[7:19]}', unit='B', unit_scale=True)

  if completed := progress.get('completed'):
    bars[digest].update(completed - bars[digest].n)

  current_digest = digest
# Start model.
from ollama import chat
messages = []
while True:
  try:
    user_input = input('>>> ')
    if user_input.lower() in ["/bye"]:
      print("Terminating Ollama server...")
      process.send_signal(signal.SIGTERM)
      try:
          process.wait(timeout=5)  # Wait up to 5 seconds
          print("Ollama server terminated gracefully.")
      except subprocess.TimeoutExpired:
          print("Terminating Ollama server forcefully...")
          process.send_signal(signal.SIGKILL) # Force termination
          process.wait()
          print("Ollama server terminated forcefully.")
      break
    elif user_input.lower() in ["/clear"]:
      messages = []
      print("Cleared session context")
    else:
      response = ''
      for part in chat(
        model,
        messages=messages
        + [
          {'role': 'user', 'content': user_input},
        ],
        stream=True,
      ):
        response = response + part['message']['content']
        print(part['message']['content'], end='', flush=True)

      # Add the response to the messages to maintain the history.
      messages += [
        {'role': 'user', 'content': user_input},
        {'role': 'assistant', 'content': response},
      ]
  except KeyboardInterrupt:
    print("Terminating Ollama server...")
    process.send_signal(signal.SIGTERM)
    try:
        process.wait(timeout=5)  # Wait up to 5 seconds
        print("Ollama server terminated gracefully.")
    except subprocess.TimeoutExpired:
        print("Terminating Ollama server forcefully...")
        process.send_signal(signal.SIGKILL) # Force termination
        process.wait()
        print("Ollama server terminated forcefully.")
    break
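
The cell above waits a fixed 0.5 seconds after launching `ollama serve` before it talks to the server. A more defensive alternative, sketched here, is to poll the server address until it answers. The address is the one the installer reports (127.0.0.1:11434); the helper name `wait_for_server` and its timeout values are hypothetical, and the sketch assumes the server answers a plain GET on its root URL with HTTP 200.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://127.0.0.1:11434", timeout=30.0, interval=0.5):
    """Poll url until it returns HTTP 200 or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not accepting connections yet; retry below.
        time.sleep(interval)
    return False


# Possible usage in place of the fixed sleep:
# if not wait_for_server():
#     raise RuntimeError("Ollama server did not become ready in time.")
```

Polling costs almost nothing when the server starts quickly and avoids a failed `pull` on a slow start.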

Couldn't detect Nvidia GPU. This will cause serious performance degredations when using a CPU only. It is recommended that you switch to a Nvidia GPU to prevent performance degredations.


Press ENTER to continue . . . 


>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.

Select compatible Ollama model to use:  
Select compatible Ollama model to use:  smollm2


pulling manifest


pulling 4d2396b16114: 100%|██████████| 1.82G/1.82G [00:05<00:00, 337MB/s] 
pulling fbacade46b4d: 100%|██████████| 68.0/68.0 [00:01<00:00, 57.1B/s]
pulling dfebd0343bdd: 100%|██████████| 1.83k/1.83k [00:01<00:00, 1.56kB/s]
pulling 58d1e17ffe51: 100%|██████████| 11.4k/11.4k [00:01<00:00, 9.74kB/s]
pulling f02dd72bb242: 100%|██████████| 59.0/59.0 [00:01<00:00, 49.7B/s]
pulling 6c6b9193c417: 100%|██████████| 559/559 [00:00<00:00, 564B/s]  


verifying sha256 digest
writing manifest
success


>>>  What is pi?


Pi (π) is a mathematical constant that represents the ratio of a circle's circumference to its diameter. It is approximately equal to 3.14159 but is an irrational number, meaning it cannot be expressed as a simple fraction and its decimal representation goes on indefinitely without repeating. Pi is used in various mathematical formulas, particularly those dealing with circles or spheres, such as calculating the area of a circle (A = πr^2) or the volume of a sphere (V = 4/3πr^3).

>>>  /bye


Terminating Ollama server...
Ollama server terminated gracefully.
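
The termination sequence that produced the messages above appears twice in the cell, once in the `/bye` branch and once in the `KeyboardInterrupt` handler. One way to avoid the duplication is a small helper like the following sketch. The name `stop_process` and its return values are hypothetical; the five-second grace period mirrors the value used in the notebook.

```python
import signal
import subprocess


def stop_process(process, grace_seconds=5):
    """Ask a child process to exit with SIGTERM, then force it with SIGKILL."""
    if process.poll() is not None:
        return "already exited"
    process.send_signal(signal.SIGTERM)
    try:
        process.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        process.kill()  # Sends SIGKILL on POSIX systems.
        process.wait()
        return "forced"
```

Both exit paths could then call `stop_process(process)` and print a message based on the returned string.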
