# DDW GenAI Workshop Setup

## Jupyter Basics

This is a Jupyter notebook.
It is made up of cells that contain either Markdown text or Python code.

Code cells can be executed.
To select a cell, click it with the mouse.

To execute the selected cell, click the Play icon in the toolbar at the top of the page, or hold the Shift key and press Enter.
When you execute a Markdown cell, Jupyter renders the Markdown.
When you execute a code cell, Jupyter runs the Python code.

## Check for GPU or Metal acceleration

If only a CPU is available, you may want to stick with gemma:2b or other small models, because larger models run slowly without GPU or Metal acceleration.

In [1]:
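import torch

# Prefer an NVIDIA GPU (CUDA), then Apple Silicon's Metal backend (MPS), then fall back to the CPU.
if torch.cuda.is_available():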
    MY_DEVICE = "cuda"
elif torch.backends.mps.is_available():
    MY_DEVICE = "mps"
else:
    MY_DEVICE = "cpu"

print(f"pytorch available device is {MY_DEVICE}")

pytorch available device is cpu
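The advice at the top of this section can be turned into a small helper that picks a default model from the detected device. This is a hypothetical sketch: `pick_default_model` is not part of PyTorch or Ollama, and using `mistral` as the default on CUDA or Metal machines is an assumption based on the model list below.

```python
# Hypothetical helper: choose a default Ollama model name from the device string
# computed in the cell above ("cuda", "mps", or "cpu").
def pick_default_model(device: str) -> str:
    if device == "cpu":
        # CPU-only machines: stick with the small gemma:2b model.
        return "gemma:2b"
    # Assumption: with CUDA or Metal (mps) acceleration, the larger mistral model is usable.
    return "mistral"


# In the notebook you would call it with MY_DEVICE, e.g.:
# MODEL_ID = pick_default_model(MY_DEVICE)
print(pick_default_model("cpu"))
```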


## Start Ollama Server

Ollama lets you run LLMs locally on your own computer.

- You should already have Ollama installed.
  - If not, you can download the installer from <https://ollama.com>.
- Make sure the Ollama server is running.
  - On macOS, it appears as a llama-head icon on the right-hand side of the menu bar.
  - On Windows, it appears as an icon in the notification area (system tray) at the right end of the taskbar.
- Start the Ollama server if needed.
  - On macOS, open Spotlight (click the magnifying glass in the top-right corner or press Command-Space), search for "Ollama.app", and select it.
  - On Windows, click the search box on the taskbar, type Ollama, then click the Ollama app result.
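The Ollama server listens on port 11434, the same `http://localhost:11434` address used later in this notebook. A minimal standard-library sketch for checking from Python that something is answering there; the helper name `ollama_is_running` is hypothetical, and it only checks for an HTTP 200 response, not that the server is actually Ollama.

```python
from urllib.request import urlopen


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server at base_url answers with status 200."""
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError (connection refused, unknown host, HTTP errors) and socket
        # timeouts are all OSError subclasses: treat them as "not running".
        return False


print("Ollama running:", ollama_is_running())
```

If this prints `False`, start the Ollama app as described above and run the cell again.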


## Models

The models we are going to be using with Ollama are:

### Large Language Models
- [Mistral](https://ollama.com/library/mistral) - About 4.1 GB
- [Gemma:2b](https://ollama.com/library/gemma:2b) - About 2 GB
- OPTIONAL (largest) [Llama3](https://ollama.com/library/llama3) - About 4.7 GB

### Embedding Models
- [Snowflake Arctic Embed](https://ollama.com/library/snowflake-arctic-embed) - About 700 MB
- [Nomic Embed Text](https://ollama.com/library/nomic-embed-text) - About 300 MB
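Adding up the approximate sizes listed above gives a rough idea of how much free disk space you need before downloading. The figures are the rounded sizes from this list, so treat the result as an estimate.

```python
# Approximate download sizes in GB, taken from the model list above.
MODEL_SIZES_GB = {
    "mistral": 4.1,
    "gemma:2b": 2.0,
    "llama3": 4.7,  # optional
    "snowflake-arctic-embed": 0.7,
    "nomic-embed-text": 0.3,
}
OPTIONAL = {"llama3"}

required = sum(size for name, size in MODEL_SIZES_GB.items() if name not in OPTIONAL)
total = sum(MODEL_SIZES_GB.values())
print(f"Required models: about {required:.1f} GB")
print(f"Including the optional model: about {total:.1f} GB")
```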

## Download the Models

We need to download the models we will be using with Ollama.

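The cells below do this from within the notebook by running the `ollama` command through Python's `subprocess` module. (In Jupyter you can also run a shell command directly by starting a line with `!`, for example `!ollama list`; lines starting with `%` are IPython magic commands.)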

If you prefer, open a terminal (macOS/Linux) or Command Prompt (Windows) and run these commands:

```bash
ollama pull gemma:2b
ollama pull mistral
ollama pull llama3
ollama pull nomic-embed-text
ollama pull snowflake-arctic-embed
```

NOTE: If you have trouble running the `ollama` commands, the Ollama install directory is most likely not in your PATH environment variable.
You can check your PATH from a terminal or Command Prompt like this:

For Windows (Command Prompt):
```bat
echo %PATH%
```
For macOS/Linux:
```bash
echo $PATH
```
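You can run the same check from Python. `shutil.which` searches the directories listed in PATH, much as the shell does, and returns the full path of the executable or `None`. Note that a running Jupyter server keeps the PATH it started with, so after changing PATH you need to restart Jupyter for the change to show up here.

```python
import shutil

# Look up the ollama executable in the directories listed in PATH.
ollama_path = shutil.which("ollama")
if ollama_path:
    print(f"ollama found at {ollama_path}")
else:
    print("ollama is not on PATH; add the Ollama install directory to PATH, then restart Jupyter.")
```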

You can also browse for and try additional models at <https://ollama.com/library>.

In [2]:
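from subprocess import run


def run_cmd_helper(cmd):
    """Run a shell command and return (exit_code, stdout, stderr) as decoded strings."""
    data = run(cmd, capture_output=True, shell=True)
    output = data.stdout.decode('utf-8', errors='replace')
    errors = data.stderr.decode('utf-8', errors='replace')
    exit_code = data.returncode
    return (exit_code, output, errors)


def run_cmd(cmd, quiet=True):
    """Run cmd, printing its output if it fails or if quiet is False.

    Returns the string "Success" or "Failed". Both strings are truthy,
    so compare the result with "Success" instead of testing it with `not`.
    """
    if not quiet:
        print(f"Running: {cmd}")
    ec, std_out, std_err = run_cmd_helper(cmd)
    if ec != 0 or not quiet:
        print(std_out, std_err)
    return "Success" if ec == 0 else "Failed"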

In [3]:
# Verify Ollama is in the PATH and working.
run_cmd("ollama --version", quiet=False)

Running: ollama --version
ollama version is 0.1.32
 


'Success'

In [4]:
MODELS = ['gemma:2b', 'mistral', 'nomic-embed-text', 'snowflake-arctic-embed']
for model in MODELS:
    print(f"Pulling {model}")
    # run_cmd returns the string "Success" or "Failed"; stop at the first failure.
    if run_cmd(f"ollama pull {model}") != "Success":
        break

Pulling gemma:2b
Pulling mistral
Pulling nomic-embed-text
Pulling snowflake-arctic-embed


In [5]:
# This one is larger, and optional.
run_cmd('ollama pull llama3')

'Success'

In [6]:
run_cmd('ollama list', quiet=False);

Running: ollama list
NAME                         	ID          	SIZE  	MODIFIED               
gemma:2b                     	b50d6c999e59	1.7 GB	38 seconds ago        	
llama3:latest                	a6990ed6be41	4.7 GB	Less than a second ago	
mistral:latest               	61e88e884507	4.1 GB	22 seconds ago        	
nomic-embed-text:latest      	0a109f422b47	274 MB	20 seconds ago        	
snowflake-arctic-embed:latest	21ab8b9b0545	669 MB	17 seconds ago        	
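The table printed by `ollama list` can also be checked programmatically. This sketch parses a copy of part of the output above instead of calling the CLI; it assumes the layout shown there (a header row first, model name in the first column) and strips the `:latest` tag that Ollama adds to untagged names. The excerpt deliberately leaves out `snowflake-arctic-embed` so the check has something to report. The function name `installed_models` is hypothetical; on a live install you would pass it `run_cmd_helper("ollama list")[1]` instead of the sample text.

```python
# Excerpt of the `ollama list` output shown above (snowflake-arctic-embed omitted on purpose).
SAMPLE_LIST_OUTPUT = """NAME\tID\tSIZE\tMODIFIED
gemma:2b\tb50d6c999e59\t1.7 GB\t38 seconds ago
mistral:latest\t61e88e884507\t4.1 GB\t22 seconds ago
nomic-embed-text:latest\t0a109f422b47\t274 MB\t20 seconds ago
"""


def installed_models(list_output: str) -> set[str]:
    """Return the model names in `ollama list` output, with any ':latest' tag removed."""
    names = set()
    for line in list_output.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        name = line.split()[0]  # first whitespace-separated column is the name
        names.add(name.removesuffix(":latest"))
    return names


MODELS = ['gemma:2b', 'mistral', 'nomic-embed-text', 'snowflake-arctic-embed']
missing = [m for m in MODELS if m not in installed_models(SAMPLE_LIST_OUTPUT)]
print("Missing models:", missing)
```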
 


## Test Ollama Endpoint

This sends a request to the local Ollama server to verify it's working.

In [7]:
import requests

# Send a simple, non-streaming chat request directly to the local Ollama server.
url = "http://localhost:11434/api/chat"
payload = {
    "model": "gemma:2b",
    "stream": False,
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}

response = requests.post(url, json=payload)
response.raise_for_status()  # raise an error if the server returned an HTTP error status
response.text

'{"model":"gemma:2b","created_at":"2024-04-27T22:33:55.7277255Z","message":{"role":"assistant","content":"The sky appears blue due to Rayleigh scattering. Rayleigh scattering is the scattering of light by particles smaller than the wavelength of light. Blue light has a shorter wavelength than other colors of light, so it is scattered more strongly by air molecules. This is why the sky appears blue."},"done":true,"total_duration":114053853000,"load_duration":7318479800,"prompt_eval_count":15,"prompt_eval_duration":11394857000,"eval_count":56,"eval_duration":95332457000}'
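`response.text` is the raw JSON string; in practice you would usually call `response.json()` and read `["message"]["content"]`. The timing fields are worth a look too. Ollama reports durations in nanoseconds, so the run above took about 114 seconds in total and generated 56 tokens in about 95 seconds, well under one token per second on this CPU-only machine. The sketch below works on a shortened copy of that response (message text truncated) so it runs without a server.

```python
import json

# Shortened copy of the response above; only the message content is truncated.
raw = '''{"model":"gemma:2b","message":{"role":"assistant","content":"The sky appears blue due to Rayleigh scattering."},
"done":true,"total_duration":114053853000,"eval_count":56,"eval_duration":95332457000}'''

data = json.loads(raw)
print(data["message"]["content"])

# Ollama reports durations in nanoseconds.
NS_PER_SECOND = 1e9
total_seconds = data["total_duration"] / NS_PER_SECOND
tokens_per_second = data["eval_count"] / (data["eval_duration"] / NS_PER_SECOND)
print(f"Total time: {total_seconds:.1f} s, generation speed: {tokens_per_second:.2f} tokens/s")
```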

## Use the LangChain library to interact with the model

In [8]:
# Try out a model through Ollama to verify that LangChain can use it.
from langchain_community.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Change the model name to try other models.
MODEL_ID = "gemma:2b"

# StreamingStdOutCallbackHandler prints each token as it is generated.
llm = Ollama(model=MODEL_ID, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))

In [9]:
llm.invoke("What are the first eight elements of the periodic table")

Sure, here are the first eight elements of the periodic table:

| # | Element | Symbol | Atomic Number | Atomic Mass | Classification |
|---|---|---|---|---|---|
| 1 | Hydrogen | H | 1 | 1.008 | Nonmetal |
| 2 | Helium | He | 2 | 4.0026 | Noble gas |
| 3 | Lithium | Li | 3 | 6.941 | Alkali metal |
| 4 | Beryllium | Be | 4 | 9.0122 | Alkaline earth metal |
| 5 | Boron | B | 5 | 10.811 | Metalloid |
| 6 | Carbon | C | 6 | 12.011 | Nonmetal |
| 7 | Nitrogen | N | 7 | 14.007 | Nonmetal |
| 8 | Oxygen | O | 8 | 15.999 | Nonmetal |

'Sure, here are the first eight elements of the periodic table:\n\n| # | Element | Symbol | Atomic Number | Atomic Mass | Classification |\n|---|---|---|---|---|---|\n| 1 | Hydrogen | H | 1 | 1.008 | Nonmetal |\n| 2 | Helium | He | 2 | 4.0026 | Noble gas |\n| 3 | Lithium | Li | 3 | 6.941 | Alkali metal |\n| 4 | Beryllium | Be | 4 | 9.0122 | Alkaline earth metal |\n| 5 | Boron | B | 5 | 10.811 | Metalloid |\n| 6 | Carbon | C | 6 | 12.011 | Nonmetal |\n| 7 | Nitrogen | N | 7 | 14.007 | Nonmetal |\n| 8 | Oxygen | O | 8 | 15.999 | Nonmetal |'

## Set Up ChromaDB Vector Store

Let's store some text embeddings in ChromaDB to verify it is working correctly.

In [10]:
# First we need some text documents.

text1 = """
Gastroenterology is a branch of medicine that focuses on the digestive system and its disorders.
It deals with the diagnosis, treatment, and prevention of diseases affecting the gastrointestinal tract,
which includes organs such as the esophagus, stomach, small intestine, large intestine (colon),
liver, pancreas, and gallbladder. Gastroenterologists are medical professionals who specialize in this
field and are trained to perform various procedures like endoscopy and colonoscopy to examine and treat
conditions related to the digestive system.
"""

text2 = """
Hepatology is a subspecialty of gastroenterology that focuses specifically on the study, diagnosis,
management, and treatment of diseases related to the liver, biliary tract, and pancreas. This includes
conditions such as hepatitis, cirrhosis, liver cancer, gallstones, and pancreatitis. Hepatologists are
medical professionals who specialize in this field and possess extensive knowledge about the physiology
and pathology of the liver and its associated organs.
"""

text3 = """
Some of the most common conditions that people see gastroenterologists for include:

Gastroesophageal reflux disease (GERD)
Irritable bowel syndrome (IBS)
Inflammatory bowel disease (IBD), which includes Crohn's disease and ulcerative colitis
Peptic ulcers
Celiac disease
Colorectal cancer screening and prevention
Chronic constipation or diarrhea
Gallbladder and bile duct disorders, such as gallstones or cholecystitis
Liver diseases, including hepatitis, cirrhosis, and fatty liver disease
Pancreatic diseases, such as pancreatitis or pancreatic cancer
"""

documents = [text1, text2, text3]
titles = ["gastroenterology", "hepatology", "conditions"]


In [23]:
# Set up an embedding function that uses Ollama's nomic-embed-text model.

from chromadb.utils.embedding_functions import OllamaEmbeddingFunction

# create EF with custom endpoint
ef = OllamaEmbeddingFunction(
    model_name="nomic-embed-text",
    url="http://localhost:11434/api/embeddings",
)

# Uncomment these lines to see an example of what the embedding function produces.
#from pprint import pprint
#pprint(ef(["Where did you find that chrome plated llama?"]))
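`OllamaEmbeddingFunction` talks to the `/api/embeddings` URL configured above. For a rough idea of what that endpoint expects, here is a standard-library sketch of the same kind of request: a JSON body with `model` and `prompt`, answered with a JSON object whose `embedding` field is a list of floats. The helper names are hypothetical, and this is not necessarily how Chroma's class builds its requests; only `embed` actually contacts the server.

```python
import json
from urllib.request import Request, urlopen


def build_embedding_request(text: str, model: str = "nomic-embed-text",
                            url: str = "http://localhost:11434/api/embeddings") -> Request:
    """Build (but do not send) a POST request for Ollama's /api/embeddings endpoint."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def embed(text: str) -> list[float]:
    """Send the request and return the embedding; requires the Ollama server to be running."""
    with urlopen(build_embedding_request(text), timeout=60) as resp:
        return json.loads(resp.read())["embedding"]


req = build_embedding_request("Where did you find that chrome plated llama?")
print(req.get_method(), req.full_url)
```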

In [47]:
import chromadb
from pprint import pprint

# Store the database on disk in a folder named "ollama" in the working directory.
client = chromadb.PersistentClient(path="ollama")

COLLECTION_NAME = "my_collection"
# get_or_create_collection avoids depending on which exception get_collection
# raises for a missing collection, which differs between Chroma versions.
collection = client.get_or_create_collection(
    name=COLLECTION_NAME,
    embedding_function=ef,
    metadata={"hnsw:space": "cosine"},  # use cosine distance for similarity search
)
# Only add the documents the first time, while the collection is still empty.
if collection.count() == 0:
    collection.add(
        documents=documents,
        ids=[f"id{i}" for i in range(len(documents))],
        metadatas=[{"title": t} for t in titles],
    )

# Retrieve the single document closest to the question.
results = collection.query(
    query_texts=["What are the most common GI conditions that require a doctor"],
    n_results=1,
)
pprint(results['documents'][0])

['\n'
 'Some of the most common conditions that people see gastroenterologists for '
 'include:\n'
 '\n'
 'Gastroesophageal reflux disease (GERD)\n'
 'Irritable bowel syndrome (IBS)\n'
 "Inflammatory bowel disease (IBD), which includes Crohn's disease and "
 'ulcerative colitis\n'
 'Peptic ulcers\n'
 'Celiac disease\n'
 'Colorectal cancer screening and prevention\n'
 'Chronic constipation or diarrhea\n'
 'Gallbladder and bile duct disorders, such as gallstones or cholecystitis\n'
 'Liver diseases, including hepatitis, cirrhosis, and fatty liver disease\n'
 'Pancreatic diseases, such as pancreatitis or pancreatic cancer\n']
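The collection is created with `metadata={"hnsw:space": "cosine"}`, so query distances are cosine distances: 1 minus the cosine similarity of the two embedding vectors. A distance of 0 means the vectors point the same way, and smaller values mean closer matches. A pure-Python sketch of that formula, using made-up three-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for vectors pointing the same way, 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
docs = {"conditions": [0.9, 0.1, 1.1], "hepatology": [0.0, 1.0, 0.2]}

# Rank the toy documents by distance to the query, closest first (like n_results).
ranked = sorted(docs, key=lambda title: cosine_distance(query, docs[title]))
print(ranked[0])
```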




## Set Up the Hugging Face API

You will need a Hugging Face Hub account; create one if you don't already have it.
The prerequisite instructions cover how to do this.


In [33]:
import os
import toml

token = None

# Look for a secrets.toml file in the working directory with contents like:
# HUGGINGFACE_TOKEN="your_actual_token_instead"
try:
    with open('secrets.toml', 'r') as f:
        config = toml.load(f)
    if 'HUGGINGFACE_TOKEN' not in config:
        print("'HUGGINGFACE_TOKEN' not in secrets.toml")
    else:
        token = config['HUGGINGFACE_TOKEN']
except FileNotFoundError:
    print("No secrets.toml file found")


if not token:
    print("ERROR! No Hugging Face Hub API token found!")
else:
    # huggingface_hub reads the token from the HF_TOKEN environment variable.
    os.environ["HF_TOKEN"] = token
    print("Found Hugging Face Hub token")

Found Hugging Face Hub token


In [34]:
from huggingface_hub import login
login(token=token)

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to C:\Users\ranton\.cache\huggingface\token
Login successful


In [37]:
# Call the inference endpoint to verify the token is working.
from huggingface_hub import InferenceClient

HF_MODEL_ID = "HuggingFaceH4/zephyr-7b-beta"
# You can find other models to try at https://huggingface.co/models.

# Named hf_client so it does not overwrite the Chroma `client` created earlier.
hf_client = InferenceClient(model=HF_MODEL_ID, token=token)

prompt = "List the neon gases in order of their atomic numbers."
for out_token in hf_client.text_generation(prompt=prompt, max_new_tokens=512, stream=True):
    print(out_token, end='')



Neon is a noble gas, which means it has a full valence shell and is therefore relatively unreactive. The neon gases are all noble gases, and their atomic numbers are as follows:

1. Neon (Ne) - atomic number 10
2. Helium (He) - atomic number 2
3. Argon (Ar) - atomic number 18
4. Krypton (Kr) - atomic number 36
5. Xenon (Xe) - atomic number 54

Therefore, the neon gases in order of their atomic numbers are:

1. Helium (He)
2. Neon (Ne)
3. Argon (Ar)
4. Krypton (Kr)
5. Xenon (Xe)

However, neon is not commonly used as a neon gas in neon signs or lighting due to its low vapor pressure at standard temperatures and pressures. Instead, neon lamps typically use a mixture of neon and other noble gases, such as argon or helium, to improve their performance.</s>