# LLMs

[ollama.com](https://ollama.com)  
[ollama Github doc](https://github.com/ollama/ollama)  
[ollama Python doc](https://github.com/ollama/ollama-python)  
[markdown doc](https://python-markdown.github.io/reference/)

In [None]:
import ollama
import IPython.display  # import the submodule explicitly; it is used below
import markdown
import numpy as np

In [None]:
# https://ollama.com/library/gemma3
model_name = "gemma3:270m"

In [None]:
# Pull the model from the server if it is not already downloaded
if model_name not in [m.model for m in ollama.list().models]:
    ollama.pull(model_name)

response: ollama.ChatResponse = ollama.chat(
    model=model_name,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
# Access the reply with dictionary-style keys...
print(response["message"]["content"])
# ...or access fields directly from the response object
print(response.message.content)
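
The membership check above compares exact strings against the names reported by `ollama.list()`. Assuming Ollama lists a model pulled without a tag as `name:latest` (worth verifying against your own `ollama.list()` output), a bare name such as `gemma3` would never match and the model would be pulled again on every run. A minimal sketch of normalizing the name first; `normalize_model_name` and `is_installed` are hypothetical helpers, not part of the `ollama` package:

```python
def normalize_model_name(name: str) -> str:
    """Append the default ':latest' tag when the name carries no tag."""
    # Assumption: untagged models are listed as '<name>:latest'.
    return name if ":" in name else f"{name}:latest"


def is_installed(name: str, installed_names: list[str]) -> bool:
    """Check a model name against a list of installed model names."""
    return normalize_model_name(name) in installed_names


# Stand-in for [m.model for m in ollama.list().models]
installed = ["gemma3:270m", "all-minilm:latest"]

print(is_installed("gemma3:270m", installed))  # tagged name, listed
print(is_installed("all-minilm", installed))   # bare name, matched via ':latest'
print(is_installed("gemma3", installed))       # 'gemma3:latest' is not listed
```

In the notebook, `installed` would come from `ollama.list()` and a `False` result would trigger `ollama.pull`.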

In [None]:
print(response)

## Note: handle markdown

In [None]:
def md2html(text):
    return markdown.markdown(text)

def print_html(raw_html):
    IPython.display.display_html(raw_html, raw=True)

print_html(md2html(response.message.content))

## Gradual printing / streaming responses

In [None]:
stream = ollama.chat(
    model=model_name,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)

for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)

In [None]:
stream = ollama.chat(
    model=model_name,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)

text = ""
for chunk in stream:
    # Accumulate the fragments and re-render the whole text each time
    text += chunk["message"]["content"]
    IPython.display.clear_output(wait=True)
    print_html(md2html(text))
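
Each streamed chunk carries only the newly generated fragment, which is why the loop above concatenates fragments and re-renders the full text: markdown markers such as `**` can open in one chunk and close in a later one. The sketch below shows the accumulation pattern without a running server; `fake_stream` is a hypothetical stand-in that mimics only the `chunk["message"]["content"]` field, and the fragments are made up.

```python
def fake_stream(fragments):
    """Hypothetical stand-in for ollama.chat(..., stream=True)."""
    for fragment in fragments:
        # Mimic only the field the notebook reads from each chunk
        yield {"message": {"content": fragment}}


text = ""
snapshots = []
for chunk in fake_stream(["The sky ", "is **blue** ", "in daylight."]):
    text += chunk["message"]["content"]
    snapshots.append(text)  # the text that would be re-rendered at this step

for snapshot in snapshots:
    print(repr(snapshot))
```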

## Embeddings

In [None]:
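# https://ollama.com/library/all-minilm
# Ollama lists untagged models as "<name>:latest", so include the tag
# to make the membership check below match.
embed_model_name = "all-minilm:latest"

# Pull the model from the server if it is not already downloaded
if embed_model_name not in [m.model for m in ollama.list().models]:
    ollama.pull(embed_model_name)
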
response: ollama.EmbedResponse = ollama.embed(
    model=embed_model_name,
    input=["Why is the sky blue?"], # can be a single string, or a list of strings
)

# The embeddings are a list of vectors, one per input string
print(response.embeddings)

## Cosine Similarity

[wiki](https://en.wikipedia.org/wiki/Cosine_similarity#Definition)  

Equation: $\frac{A \cdot B}{\|A\| \|B\|}$, where $A \cdot B$ is the dot product and $\|A\|$ is the Euclidean norm of $A$.
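
As a quick check of the formula, here is a minimal sketch in plain Python (standard library only). The helper name `cosine` and the small vectors are illustrative choices, not taken from any library.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length sequences of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same_direction = cosine([1, 2, 3], [2, 4, 6])  # parallel vectors
orthogonal = cosine([1, 0], [0, 1])            # perpendicular vectors
opposite = cosine([1, 2], [-1, -2])            # opposite directions

print(same_direction, orthogonal, opposite)
```

Up to floating-point rounding, the three values are 1, 0 and -1: the measure depends only on the angle between the vectors, not on their lengths.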

**Note**:  
Cosine similarity is also implemented in `scikit-learn`. After adding the package (for example with `uv add scikit-learn`), import it with:
```python
from sklearn.metrics.pairwise import cosine_similarity
```

In [None]:
def cosine_similarity(vec1, vec2):
    """
    Return the cosine similarity between two vectors.

    See here: https://gist.github.com/robert-mcdermott/5957ef1ddcfc7c3ba898d800531b2aa7
    """
    vec1 = np.array(vec1)
    vec2 = np.array(vec2)

    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)

    # Undefined (division by zero) if either vector is all zeros
    return dot_product / (norm1 * norm2)

In [None]:
sentences = [
    "Why is the sky blue?",
    "Why is the sky orange?",
    "Tonight I'll be eating soup"
]

response: ollama.EmbedResponse = ollama.embed(
    model=embed_model_name,
    input=sentences,
)

# One embedding is returned per input sentence
print(len(response.embeddings))

In [None]:
def sentences_similarities(s1_id, s2_id, embeddings):
    print("Similarity between:")
    print(f" - '{sentences[s1_id]}'")
    print(f" - '{sentences[s2_id]}'")
    print(f"   => {cosine_similarity(embeddings[s1_id], embeddings[s2_id])}")

In [None]:
sentences_similarities(0, 1, response.embeddings)

In [None]:
sentences_similarities(0, 2, response.embeddings)

In [None]:
sentences_similarities(1, 2, response.embeddings)
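
The three calls above cover every pair by hand. With more sentences, the same comparison could be generalized by looping over all index pairs. The sketch below uses made-up 3-dimensional vectors in place of real embeddings so it runs without a model; `cosine`, `toy_sentences` and `toy_embeddings` are hypothetical names introduced for illustration.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length sequences of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings
toy_sentences = ["sky blue", "sky orange", "soup"]
toy_embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 0.1, 1.0],
]

# Score every unordered pair of indices exactly once
scores = {
    (i, j): cosine(toy_embeddings[i], toy_embeddings[j])
    for i, j in combinations(range(len(toy_embeddings)), 2)
}

# Print pairs from most to least similar
for (i, j), score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{toy_sentences[i]!r} vs {toy_sentences[j]!r}: {score:.3f}")

best_pair = max(scores, key=scores.get)
```

In the notebook, `toy_embeddings` would be replaced by `response.embeddings` and `toy_sentences` by `sentences`.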