# Welcome to the Small Language Model (SLM) Tutorial - Ollama
**Credits:** Sections 1-6 are the work of Pamela Fox, Python Cloud Advocate, Microsoft. You can read the original notebook here: https://github.com/pamelafox/ollama-python-playground/blob/main/ollama.ipynb

Section 0 is written by Balaji Alwar, Service Lead, Datahub.

Tested tinyllama model on July 8th on https://data100.datahub.berkeley.edu/ and https://datahub.berkeley.edu/. We recommend using 2 GB/4 GB RAM to execute the cells in the notebook.

**Overview**
In this tutorial, you will learn how to install and interact with an Ollama framework service instance running on a Jupyter server. This hands-on guide will take you through the necessary steps to set up the environment, download the framework and a model, and perform basic operations.

**What You Will Learn**
- **Introduction to Ollama:** Understand what Ollama is and its applications.
- **Setting Up the Environment:** Learn how to set up your Jupyter server environment to work with Ollama.
- **Downloading and Installing Ollama:** Step-by-step instructions for downloading the Ollama executable file and making the downloaded file executable.
- **Running Ollama Commands:** Execute basic Ollama management commands from within the Jupyter notebook. Interact with an Ollama-served model to perform specific tasks.
- **Using Ollama for Machine Learning:** Load a pre-trained model using Ollama. Use the model via the Ollama framework API to make predictions or analyze data.
- **Practical Examples:** Walkthrough of practical examples to solidify your understanding. Apply an Ollama-served model to real-world datasets.

**Prerequisites**
- **Basic Knowledge of Command Line:** Familiarity with basic command-line operations will be helpful.
- **Python Basics:** Understanding basic Python programming concepts. 
- **Jupyter Notebook Usage:** Basic knowledge of how to navigate and use Jupyter notebooks.

Let's Get Started!

By the end of this tutorial, you'll have a solid understanding of how to set up and use Ollama within a Jupyter environment.

## 0. Install the Ollama framework and a supported model in JupyterHub

The OpenAI Python package facilitates interactions with OpenAI-style API endpoints to access machine learning models, including language models like GPT-3. It provides a convenient way to integrate AI capabilities into your projects, enabling tasks such as natural language processing, text generation, and translation. The cell below installs the OpenAI package if it is not already installed.

In [None]:
try:
    import openai
except ImportError:
    !pip install openai
    import openai

**Import the `os` and `requests` packages.** The cell below imports the `os` module, which provides a way to interact with the operating system: executing system commands, manipulating the file system, and performing other OS-level operations. It also imports `requests`, which is used later to download the Ollama binary.

In [None]:
import os
import requests

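The exclamation mark `!` is used in Jupyter notebooks to run the command that follows in a shell (i.e., as a command-line instruction). This allows users to run shell commands directly from a notebook cell. Note that each `!` command runs in its own temporary subshell, so the `cd` below changes to your home directory only inside that subshell; the working directory of the notebook itself stays the same for later cells. To change it persistently, use the `%cd` magic or `os.chdir()`.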

In [None]:
!cd
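The following is a minimal sketch, using only the standard library, of how a directory change inside a shell command relates to the notebook's own working directory. The shell runs as a child process with its own working directory, while `os.chdir()` changes the directory of the Python process itself. The temporary directory is only a stand-in for illustration.

```python
import os
import subprocess
import tempfile

start_dir = os.getcwd()
target_dir = os.path.realpath(tempfile.mkdtemp())  # stand-in directory for this sketch

# `cd` inside a shell command only affects that child shell process.
child = subprocess.run(
    f'cd "{target_dir}" && pwd', shell=True, capture_output=True, text=True
)
child_dir = os.path.realpath(child.stdout.strip())
dir_after_shell = os.getcwd()  # still the directory we started in

# os.chdir changes the working directory of the Python process itself,
# which is what later notebook cells see.
os.chdir(target_dir)
dir_after_chdir = os.path.realpath(os.getcwd())

os.chdir(start_dir)  # restore the original directory
print(child_dir, dir_after_shell, dir_after_chdir, sep="\n")
```

The child shell reports the new directory, but the Python process only moves once `os.chdir()` is called.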

This section of the script defines the name and download URL for the Ollama framework Linux binary, and constructs the paths to it in the shared directory and in the current working directory. Ideally, the binary is already present in the shared directory. If it is not, the function defined below downloads it to the current working directory.

In [None]:
# Define the binary name
binary_name = 'ollama-linux-amd64'
# Define the URL to download the binary
binary_url = 'https://github.com/ollama/ollama/releases/download/v0.1.48/ollama-linux-amd64'  # pinned to release v0.1.48
# Construct path for shared drive
shared_drive_path = '/home/jovyan/shared'
shared_binary_path = os.path.join(shared_drive_path, binary_name)
# Construct path for the current working directory
current_binary_path = os.path.join(os.getcwd(), binary_name)

The `check_and_download_binary()` function checks for the presence of the Ollama framework Linux binary in the shared directory. If the binary is not found, the function downloads it from https://github.com/ollama/ollama/releases/download/v0.1.48/ollama-linux-amd64 to the current working directory and makes it executable. If the binary is found, the function changes the working directory to the shared directory.

In [None]:
# Function to check for the binary and download it if it is missing
def check_and_download_binary():
    if not os.path.exists(shared_binary_path):
        print(f"{binary_name} not found in shared drive. Downloading now...")
        # Stream the download so the large binary is not held in memory all at once
        response = requests.get(binary_url, stream=True)
        if response.status_code == 200:
            with open(current_binary_path, 'wb') as file:
                for chunk in response.iter_content(chunk_size=1024 * 1024):
                    file.write(chunk)
            print(f"{binary_name} downloaded successfully to {current_binary_path}.")
            # Make the downloaded binary executable (permissions rwxr-xr-x)
            os.chmod(current_binary_path, 0o755)
        else:
            print(f"Failed to download {binary_name}. HTTP Status Code: {response.status_code}")
    else:
        print(f"{binary_name} is already present in the shared drive.")
        # Run later commands from the shared directory, where the binary lives
        os.chdir(shared_drive_path)

In [None]:
# Create the drive directory if it doesn't exist
os.makedirs(shared_drive_path, exist_ok=True)

# Check and download the binary
check_and_download_binary()
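To make the "make it executable" step more concrete, here is a small sketch that sets the execute permission bits from Python and checks the result. The helper names `make_executable` and `has_execute_bit` are hypothetical, and the sketch works on a throwaway temporary file rather than the real Ollama binary.

```python
import os
import stat
import tempfile

def make_executable(path):
    """Add execute permission for user, group, and others, like `chmod a+x`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

def has_execute_bit(path):
    """Return True if the owner's execute bit is set on the file at `path`."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)

# Demonstrate on a throwaway file rather than the real binary.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"#!/bin/sh\necho hello\n")
demo_path = tmp.name

before = has_execute_bit(demo_path)
make_executable(demo_path)
after = has_execute_bit(demo_path)
os.remove(demo_path)
print(before, after)
```

A check like `has_execute_bit(...)` on the binary's path can confirm that the binary is ready to run before the server is started.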

The `pwd` command prints the current working directory. If the Ollama binary was found in the shared directory, the output is `/home/jovyan/shared`. Otherwise, the output is the directory where your notebook is currently located.

In [None]:
!pwd

The `ls` command lists the files in your current directory. Check whether the `ollama-linux-amd64` binary file appears in the listing.

In [None]:
!ls

The command below tells the operating system to run the file `ollama-linux-amd64`:
- `./` at the beginning specifies that the file is in the current directory.
- `serve` is an argument passed to the `ollama-linux-amd64` program. It tells the program to start a server.
- `&` tells the shell to run the program in the background, so that you can execute other cells in the notebook while it runs.

In [None]:
os.system("./ollama-linux-amd64 serve&")
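Because the server starts in the background, later cells may run before it is ready. The sketch below polls the server until it responds. It assumes the server listens on `http://localhost:11434` (the address used later in this notebook) and answers a plain HTTP GET at its root URL with status 200; the helper name `wait_for_server` is hypothetical.

```python
import time
import urllib.error
import urllib.request

def wait_for_server(url="http://localhost:11434", timeout_s=30.0, interval_s=0.5):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet
        time.sleep(interval_s)
    return False

# Example use in the notebook, after starting `serve` in the background:
# print(wait_for_server())
```

The function returns `False` rather than raising an error when the timeout passes, so a notebook cell can decide how to react.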

The command below pulls the Phi-3.5 model from the Ollama library and launches it on your Jupyter server. Ollama supports the models listed at ollama.com/library. Phi-3.5 is a compact model with about 3.8B parameters, which makes it suitable for applications that must run within a restricted compute and memory footprint.

NOTE: 'ollama-linux-amd64 run...' also launches the simple terminal chat interface app! What happens in the notebook context?

In [None]:
os.system("./ollama-linux-amd64 run phi3.5")

The command below lists the models that are currently installed on your Jupyter server.

In [None]:
os.system("./ollama-linux-amd64 list")
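The same information can also be read from Python. The sketch below assumes that the running server exposes Ollama's REST endpoint `/api/tags` and that it returns JSON of the form `{"models": [{"name": ...}, ...]}`; treat that shape as an assumption to verify against your Ollama version. The helper names and the sample payload are hypothetical and used only so that the parsing can be checked without a server.

```python
import json
import urllib.request

def parse_model_names(payload):
    """Extract model names from a decoded /api/tags-style JSON payload."""
    return [model["name"] for model in payload.get("models", [])]

def list_local_models(base_url="http://localhost:11434"):
    """Ask a running Ollama server which models are installed locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return parse_model_names(json.load(response))

# Hypothetical payload for illustration only, so the parsing can be checked offline.
sample_payload = {"models": [{"name": "phi3.5:latest"}, {"name": "tinyllama:latest"}]}
sample_names = parse_model_names(sample_payload)
print(sample_names)
```

With the server running, `list_local_models()` should return a Python list that can be used to confirm that the model named in the next section is installed.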

## 1. Specify the model name

If you pulled in a different model than "phi3.5", change the value in the cell below.
That variable will be used in code throughout the notebook.

In [None]:
MODEL_NAME = "phi3.5"

## 2. Set up the OpenAI client

Typically the OpenAI client is used with OpenAI.com or Azure OpenAI to interact with large language models.
However, it can also be used with Ollama, since Ollama provides an OpenAI-compatible endpoint at "http://localhost:11434/v1".

In [None]:
import openai

client = openai.OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="nokeyneeded",
)
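"OpenAI-compatible" means that the client talks to the endpoint with ordinary HTTP requests carrying JSON. As a rough sketch of what the client sends for a chat completion, the code below builds (but does not send) such a request with the standard library. The helper name `build_chat_request` is hypothetical, and the request body shows only a few of the fields that the OpenAI chat completions format accepts.

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, messages, temperature=0.7):
    """Build (but do not send) the HTTP request behind a chat completion call."""
    body = {"model": model, "messages": messages, "temperature": temperature}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

http_request = build_chat_request(
    base_url="http://localhost:11434/v1",
    api_key="nokeyneeded",
    model="phi3.5",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(http_request.get_method(), http_request.full_url)
print(http_request.data.decode("utf-8"))

# With the Ollama server running, the request could be sent with:
# with urllib.request.urlopen(http_request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```

In practice the `openai` package handles these details, so the rest of the notebook uses the `client` object defined above.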

## 3. Generate a chat completion

Now we can use the OpenAI SDK to generate a response for a conversation. This request should generate a haiku about a hungry cat:

In [None]:
response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.7,
    n=1,
    messages=[
        {"role": "user", "content": "Write a haiku about a hungry cat"},
    ],
)

print("Response:")
print(response.choices[0].message.content)


## 4. Include a hidden "system message" at the start of the conversation, before the user prompt

In [None]:
SYSTEM_MESSAGE = """
Dobby is a small, friendly house-elf who lives in the Hogwarts castle garden.
He has an irritating habit of speaking in short, clipped sentences that are often difficult to understand.
Please role-play as Dobby.
Ok, begin!
"""

USER_MESSAGE = """
How are you?
"""

response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.7,
    n=1,
    max_tokens=100,
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": USER_MESSAGE},
    ],
)

print("Response:")
print(response.choices[0].message.content)


## 5. "Few-shot" examples

Another way to guide a language model is to provide "few shots": a sequence of sample prompt/response (user/assistant) pairs that establish a pattern for the conversation. The model will statistically tend to follow this pattern when it gets a new prompt after these examples.

The "few-shot" label is commonly used for this technique, but, in truth, this is simply a "pre-loaded" initial conversation in which both the sample prompts *and* the sample responses were written beforehand by the developer. When a real user starts a new conversation via your application, they do not know that their first prompt is *appended* to this hidden, pre-written conversation by your application.

The example below tries to get a language model to act like a teaching assistant (TA). It provides a few questions that a student might ask, each paired with a response that a TA might give: a question that helps lead the student toward the answer.

The script then includes a new prompt, a question that a student might ask. We hope that the model will respond to subsequent prompts akin to the way it responded to our hard-coded samples.

Try it first, and then modify the `SYSTEM_MESSAGE`, `EXAMPLES`, and `USER_MESSAGE` for a new scenario.



In [None]:
SYSTEM_MESSAGE = """
You are a helpful assistant that helps students with their homework by asking questions.
Instead of providing the answer to a question, you respond by asking a question to help the student work it out!
Never respond with a direct answer. Always respond with a new, related question.
Here is a question to help you work it out! ```What is...```
"""

EXAMPLES = [
    (
        "What is the capital of France?",
        "Here is a question to help you work it out! ```What is the name of the city that is known for the Eiffel Tower?```"
    ),
    (
        "What is the square root of 144?",
        "Here is a question to help you work it out! ```What is the number that when multiplied by itself equals 144?```"
    ),
    (
        "What is the atomic number of oxygen?",
        "Here is a question to help you work it out! ```What is the count of protons that an oxygen atom has?```"
    ),
]

USER_MESSAGE = "What is the largest planet in our solar system?"


response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.7,
    n=1,
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": EXAMPLES[0][0]},
        {"role": "assistant", "content": EXAMPLES[0][1]},
        {"role": "user", "content": EXAMPLES[1][0]},
        {"role": "assistant", "content": EXAMPLES[1][1]},
        {"role": "user", "content": EXAMPLES[2][0]},
        {"role": "assistant", "content": EXAMPLES[2][1]},
        {"role": "user", "content": USER_MESSAGE},
    ],
)


print("Response:")
print(response.choices[0].message.content)
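The message list above is written out by hand, one line per example. When the number of examples grows, the same list can be generated in a loop. Here is a minimal sketch; the helper name `build_few_shot_messages` and the short demo strings are hypothetical stand-ins, not part of the original notebook.

```python
def build_few_shot_messages(system_message, examples, user_message):
    """Interleave (question, answer) example pairs between system and user messages."""
    messages = [{"role": "system", "content": system_message}]
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_message})
    return messages

# Stand-in values for illustration; in the notebook you would pass
# SYSTEM_MESSAGE, EXAMPLES, and USER_MESSAGE instead.
demo_messages = build_few_shot_messages(
    "Answer with a question.",
    [("What is 2 + 2?", "What do you get when you add 2 to 2?")],
    "What is the capital of Italy?",
)
for message in demo_messages:
    print(message["role"], "->", message["content"])
```

The result can be passed directly as the `messages` argument of `client.chat.completions.create(...)`.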

## 6. Retrieval Augmented Generation

RAG (Retrieval Augmented Generation) is a technique in which your application augments user queries before they reach the LLM. The application first searches a pre-loaded data store for text that may be related to the user's query, and then bundles the likely-related "search hits" together with the query. The prompt that your application delivers to the LLM on the user's behalf is therefore the data store search hits *plus* the query written by the user. You may use this approach if your application is meant to help the user explore documentation, a book, a database, or some other source of text that you choose as the application developer.

The idea is that your application prompts the LLM with something like, "Please answer the user's question that follows, but also be informed by these several pieces of related data from our private data store: ...", rather than having the user interact directly with the LLM.

The user may feel like the LLM has "read the documentation" or "studied the book," but your application is simply doing a pre-**Generation** step of **Augmenting** the user's query with some data that your application **Retrieved** from your data store; hence the name **Retrieval Augmented Generation**.

We have provided a local CSV file with data about hybrid cars. The code below compares the words in the user question with the data in the CSV file. Each CSV row whose first or sixth column contains a word from the user question is bundled into the LLM prompt, along with the original user question.

There are more complicated and sophisticated ways to have your application match the user's query to elements of your data set, but this CSV example demonstrates the application workflow.

If you notice that the answer is still not grounded in the data, you can try prompt engineering (for example, revising the system message) or try other models. Generally, RAG is more effective with either larger models or fine-tuned versions of SLMs.

In [None]:
import csv

SYSTEM_MESSAGE = """
You are a helpful assistant that answers questions about hybrid cars.
You will be given related data from a hybrid car database to answer the question.
Please favor using the provided data, rather than information that is not provided.
"""

QUESTION = "What are the earliest and latest prius years in the provided data?"

# Open the CSV and store in a list
with open("hybrid.csv", "r") as file:
    reader = csv.reader(file)
    rows = list(reader)

# Normalize the user question to replace punctuation and make lowercase
normalized_question = QUESTION.lower().replace("?", "").replace("(", " ").replace(")", " ")
print(f"Normalized question:\n\t{normalized_question}")

# Search the CSV for user question using very naive search
words = normalized_question.split()
matches = []
for row in rows[1:]:
    # if the word matches any word in row, add the row to the matches
    if any(word in row[0].lower().split() for word in words) or any(word in row[5].lower().split() for word in words):
        matches.append(row)

# Format as a markdown table, since language models understand markdown
matches_table = " | ".join(rows[0]) + "\n" + " | ".join(" --- " for _ in range(len(rows[0]))) + "\n"
matches_table += "\n".join(" | ".join(row) for row in matches)
print(f"Number of matches from csv:\n\t{len(matches)} matches\n")

USER_MESSAGE = QUESTION + "\nProvided data: " + matches_table
print(f"USER_MESSAGE sent to LLM:\n\n {USER_MESSAGE}\n\n")

# Now we can use the matches to generate a response
response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.7,
    n=1,
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": USER_MESSAGE},
    ],
)

print("Response:\n")
print(response.choices[0].message.content)
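The retrieval step in the cell above can be separated from the file handling and the LLM call, which makes it easier to test on its own. The sketch below is one possible way to do that. The function names are hypothetical, and the demo rows and column names are made up for illustration; they are not taken from `hybrid.csv`.

```python
def find_matching_rows(question, rows, search_columns):
    """Return data rows that share at least one whole word with the question."""
    words = set(question.lower().replace("?", "").split())
    matches = []
    for row in rows[1:]:  # rows[0] is the header row
        row_words = set()
        for index in search_columns:
            row_words.update(row[index].lower().split())
        if words & row_words:
            matches.append(row)
    return matches

def to_markdown_table(header, rows):
    """Format a header and data rows as a simple markdown table."""
    lines = [" | ".join(header), " | ".join("---" for _ in header)]
    lines.extend(" | ".join(row) for row in rows)
    return "\n".join(lines)

# Made-up rows for illustration only; they are not taken from hybrid.csv.
demo_rows = [
    ["vehicle", "year", "class"],
    ["Alpha Hybrid", "2001", "Compact"],
    ["Beta", "2005", "Midsize"],
    ["Alpha Hybrid", "2010", "Compact"],
]
demo_matches = find_matching_rows("Which years have an alpha model?", demo_rows, [0, 2])
demo_table = to_markdown_table(demo_rows[0], demo_matches)
print(demo_table)
```

Only the rows that share a word with the question end up in the table, which keeps the prompt short. Smarter retrieval can later replace `find_matching_rows` without changing the rest of the workflow.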