# Ollama Query Examples

This notebook shows how to interact with a local Ollama instance to:
- Query LLMs
- Generate embeddings
- Use vision models

Documentation: https://ollama.com/blog/embedding-models

## Setup and Imports

In [1]:
import requests
import json
import numpy as np
from typing import List, Dict, Optional

## Configuration

In [2]:
# Ollama API endpoint (local default)
OLLAMA_BASE_URL = "http://localhost:11434"

# Models available on this system
AVAILABLE_MODELS = [
    "qwen3-vl:32b",          # Vision-Language model (20GB)
    "qwen3-vl:8b",           # Smaller Vision-Language model (6.1GB)
    "nomic-embed-text:v1.5", # Embedding model (274MB)
    "qwen3:latest",          # LLM (5.2GB)
    "deepseek-r1:8b",        # LLM (5.2GB)
    "moondream:1.8b",        # Vision model (1.7GB)
    "llava:13b",             # Vision-Language model (8GB)
]

## 1. List Available Models

In [6]:
def list_models():
    """List all models available in Ollama"""
    response = requests.get(f"{OLLAMA_BASE_URL}/api/tags")
    if response.status_code == 200:
        models = response.json().get('models', [])
        print(f"Found {len(models)} models:\n")
        for model in models:
            name = model.get('name', 'Unknown')
            size = model.get('size', 0) / (1024**3)  # Convert to GB
            modified = model.get('modified_at', 'Unknown')
            print(f"  - {name:30s} ({size:6.2f} GB) - Modified: {modified}")
        return models
    else:
        print(f"Error: {response.status_code}")
        return None

# List the models
models = list_models()

Found 14 models:

  - x/flux2-klein:latest           (  5.33 GB) - Modified: 2026-01-21T09:09:00.331986544Z
  - qwen3-vl:32b                   ( 19.47 GB) - Modified: 2026-01-20T17:26:48.395274756Z
  - nomic-embed-text:v1.5          (  0.26 GB) - Modified: 2026-01-19T18:44:47.091835921Z
  - moondream:1.8b                 (  1.62 GB) - Modified: 2026-01-11T00:15:36.591414258Z
  - qwen3-vl:8b                    (  5.72 GB) - Modified: 2026-01-11T00:12:49.90249849Z
  - nemotron-3-nano:30b            ( 22.61 GB) - Modified: 2025-12-26T11:40:18.119084142Z
  - nemotron-3-nano:latest         ( 22.61 GB) - Modified: 2025-12-25T17:09:04.334692225Z
  - llama4:16x17b                  ( 62.81 GB) - Modified: 2025-12-24T01:02:15.885941871Z
  - llava:13b                      (  7.46 GB) - Modified: 2025-10-13T12:46:53.755449209Z
  - qwen3:235b                     (132.39 GB) - Modified: 2025-09-24T18:16:10.146065174Z
  - qwen3:latest                   (  4.87 GB) - Modified: 2025-09-24T18:01:23.5108
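
The hard-coded `AVAILABLE_MODELS` list can drift from what Ollama actually has installed. A minimal sketch of a consistency check follows, assuming the `/api/tags` response shape used by `list_models()` above (a list of dicts carrying a `name` key). `find_missing_models` is a hypothetical helper, not part of Ollama.

```python
from typing import Dict, List


def find_missing_models(configured: List[str], installed: List[Dict]) -> List[str]:
    """Return configured model names that are absent from the installed list.

    `installed` is assumed to look like the `models` list returned by
    /api/tags, i.e. dicts that carry a 'name' key.
    """
    installed_names = {m.get("name") for m in installed}
    return [name for name in configured if name not in installed_names]


# Hand-written data shaped like the /api/tags response (not real output)
installed_example = [{"name": "qwen3:latest"}, {"name": "llava:13b"}]
configured_example = ["qwen3:latest", "deepseek-r1:8b", "llava:13b"]
missing = find_missing_models(configured_example, installed_example)
print(missing)
```

In the notebook this could be called as `find_missing_models(AVAILABLE_MODELS, models)` once `list_models()` has returned a non-`None` result.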

## 2. Simple LLM Query

In [4]:
def query_llm(prompt: str, model: str = "qwen3:latest", stream: bool = False) -> str:
    """
    Send a query to an LLM.
    
    Args:
        prompt: The prompt to send to the model
        model: Name of the model to use
        stream: If True, print the response as it streams in
    
    Returns:
        The model's complete response
    """
    url = f"{OLLAMA_BASE_URL}/api/generate"
    
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": stream
    }
    
    response = requests.post(url, json=payload, stream=stream)
    
    # Check the status before reading the body, in both modes
    if response.status_code != 200:
        return f"Error: {response.status_code}"
    
    if stream:
        # Streaming mode: one JSON object per line
        full_response = ""
        for line in response.iter_lines():
            if line:
                data = json.loads(line)
                chunk = data.get('response', '')
                print(chunk, end='', flush=True)
                full_response += chunk
                if data.get('done', False):
                    break
        print()  # New line
        return full_response
    
    # Non-streaming mode
    return response.json().get('response', '')

# Example query
prompt = "Explain what is federated learning in 2 sentences."
print(f"Prompt: {prompt}\n")
print("Response:")
response = query_llm(prompt, model="qwen3:latest", stream=True)

Prompt: Explain what is federated learning in 2 sentences.

Response:
Federated learning is a machine learning approach where multiple devices or servers collaboratively train a model without sharing their data, enhancing privacy and security. The model is trained locally on each device, with only aggregated model updates shared centrally, allowing for collaborative learning without exposing raw data.
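
In streaming mode, `query_llm` reads the body as newline-delimited JSON: each line is an object with a `response` chunk, and the last one has `done` set to true. The parsing step can be isolated and exercised without a running server. A minimal sketch, where `collect_stream` is a hypothetical helper and the byte strings are hand-written stand-ins for `response.iter_lines()`:

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[bytes]) -> str:
    """Join the 'response' chunks of a newline-delimited JSON stream."""
    parts = []
    for line in lines:
        if not line:
            continue  # skip empty lines, as query_llm does
        data = json.loads(line)
        parts.append(data.get("response", ""))
        if data.get("done", False):
            break  # anything after the final object is ignored
    return "".join(parts)


# Hand-written lines standing in for response.iter_lines()
fake_stream = [
    b'{"response": "Federated ", "done": false}',
    b"",
    b'{"response": "learning", "done": false}',
    b'{"response": "", "done": true}',
    b'{"response": "ignored", "done": false}',
]
text = collect_stream(fake_stream)
print(text)
```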


## 3. Chat Conversation

In [None]:
def chat(messages: List[Dict[str, str]], model: str = "qwen3:latest") -> str:
    """
    Send a multi-turn conversation to a model.
    
    Args:
        messages: List of messages [{'role': 'user|assistant', 'content': 'text'}]
        model: Name of the model to use
    
    Returns:
        The model's response
    """
    url = f"{OLLAMA_BASE_URL}/api/chat"
    
    payload = {
        "model": model,
        "messages": messages,
        "stream": False
    }
    
    response = requests.post(url, json=payload)
    
    if response.status_code == 200:
        result = response.json()
        return result.get('message', {}).get('content', '')
    else:
        return f"Error: {response.status_code}"

# Example conversation
conversation = [
    {"role": "user", "content": "What is knowledge distillation?"},
    {"role": "assistant", "content": "Knowledge distillation is a technique where a smaller student model learns to mimic a larger teacher model."},
    {"role": "user", "content": "Can you give me an example in federated learning?"}
]

print("Conversation:")
for msg in conversation:
    print(f"{msg['role'].upper()}: {msg['content']}")

print("\nASSISTANT:", chat(conversation, model="qwen3:latest"))
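
`/api/chat` is stateless in the example above: the whole message list is sent on every call, so continuing a conversation means appending each reply to the list before the next turn. A minimal sketch of that bookkeeping follows. `send_turn` is a hypothetical helper, and `fake_chat` stands in for `chat()` so the sketch runs without a server.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def send_turn(history: List[Message], user_text: str,
              chat_fn: Callable[[List[Message]], str]) -> str:
    """Append a user message, get a reply, and record it in the history."""
    history.append({"role": "user", "content": user_text})
    reply = chat_fn(history)
    history.append({"role": "assistant", "content": reply})
    return reply


# Stand-in for chat(); it only reports how many messages it received
def fake_chat(messages: List[Message]) -> str:
    return f"(reply after {len(messages)} messages)"


history: List[Message] = []
send_turn(history, "What is knowledge distillation?", fake_chat)
send_turn(history, "Can you give me an example in federated learning?", fake_chat)
print([m["role"] for m in history])
```

With a running server, `chat` can be passed as `chat_fn` directly, since its `model` argument has a default.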

## 4. Generate Embeddings

In [5]:
def generate_embeddings(texts: List[str], model: str = "nomic-embed-text:v1.5") -> np.ndarray:
    """
    Generate embeddings for a list of texts.
    
    Args:
        texts: List of texts to embed
        model: Embedding model to use
    
    Returns:
        Numpy array of embeddings (shape: [num_texts, embedding_dim])
    """
    url = f"{OLLAMA_BASE_URL}/api/embeddings"
    
    embeddings = []
    
    for text in texts:
        payload = {
            "model": model,
            "prompt": text
        }
        
        response = requests.post(url, json=payload)
        
        if response.status_code != 200:
            # Fail loudly: silently dropping a text would misalign rows with `texts`
            raise RuntimeError(
                f"Embedding request failed for text '{text[:50]}...': {response.status_code}"
            )
        embeddings.append(response.json().get('embedding', []))
    
    # One row per input text, in the same order
    return np.array(embeddings)

# Example of embedding generation
texts = [
    "Federated learning is a distributed machine learning approach.",
    "Knowledge distillation transfers knowledge from a teacher to a student model.",
    "The weather is nice today."
]

print("Generating embeddings for texts:")
for i, text in enumerate(texts):
    print(f"  {i+1}. {text}")

embeddings = generate_embeddings(texts)
print(f"\nGenerated embeddings shape: {embeddings.shape}")
print(f"Embedding dimension: {embeddings.shape[1]}")

Generating embeddings for texts:
  1. Federated learning is a distributed machine learning approach.
  2. Knowledge distillation transfers knowledge from a teacher to a student model.
  3. The weather is nice today.

Generated embeddings shape: (3, 768)
Embedding dimension: 768
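
`generate_embeddings` sends one request per text. The Ollama API documentation also describes a newer `/api/embed` endpoint that accepts a list under `input` and returns a list under `embeddings`, which would allow a single request for the whole batch. The sketch below assumes a server recent enough to expose that endpoint. `embed_batch` and its injectable `post` parameter are hypothetical, added so the function can be tried without a server.

```python
from typing import Callable, List, Optional

OLLAMA_BASE_URL = "http://localhost:11434"


def embed_batch(texts: List[str], model: str = "nomic-embed-text:v1.5",
                post: Optional[Callable] = None) -> List[List[float]]:
    """Embed all texts with a single POST to /api/embed (assumed available)."""
    if post is None:
        import requests  # imported lazily so the sketch can be exercised offline
        post = requests.post
    response = post(f"{OLLAMA_BASE_URL}/api/embed",
                    json={"model": model, "input": texts})
    if response.status_code != 200:
        raise RuntimeError(f"Embedding request failed: {response.status_code}")
    vectors = response.json().get("embeddings", [])
    if len(vectors) != len(texts):
        raise RuntimeError("Expected one embedding per input text")
    return vectors
```

If the endpoint is available, `np.array(embed_batch(texts))` should have the same shape as the output of `generate_embeddings(texts)`.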


## 5. Compute Similarity Between Embeddings

In [None]:
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Compute the cosine similarity between two vectors"""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Compute the similarity between the texts
print("\nSimilarity matrix:")
print("\n" + " " * 10 + "Text 1   Text 2   Text 3")
for i in range(len(embeddings)):
    similarities = []
    for j in range(len(embeddings)):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        similarities.append(sim)
    print(f"Text {i+1}:  {similarities[0]:.4f}   {similarities[1]:.4f}   {similarities[2]:.4f}")

print("\nInterpretation:")
sim_1_2 = cosine_similarity(embeddings[0], embeddings[1])
sim_1_3 = cosine_similarity(embeddings[0], embeddings[2])
print(f"Similarity between Text 1 and Text 2 (both ML-related): {sim_1_2:.4f}")
print(f"Similarity between Text 1 and Text 3 (unrelated): {sim_1_3:.4f}")
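
The nested loop above calls `cosine_similarity` once per pair. The same matrix can be computed in one step by normalizing each row to unit length and multiplying the result by its transpose. A minimal sketch on toy 2-D vectors, where `cosine_similarity_matrix` is a hypothetical helper:

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity for the rows of a 2-D array."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms  # assumes no all-zero rows
    return unit @ unit.T


# Toy 2-D vectors instead of real 768-dimensional embeddings
toy = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
sim = cosine_similarity_matrix(toy)
print(np.round(sim, 4))
```

Applied to the notebook's data, `cosine_similarity_matrix(embeddings)` would give the 3x3 matrix that the loop prints row by row.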

## 6. Vision Model Query (with an image)

In [None]:
import base64
from pathlib import Path

def query_vision_model(prompt: str, image_path: str, model: str = "qwen3-vl:8b") -> str:
    """
    Send a query with an image to a vision model.
    
    Args:
        prompt: The text prompt
        image_path: Path to the image
        model: Name of the vision model to use
    
    Returns:
        The model's response
    """
    url = f"{OLLAMA_BASE_URL}/api/generate"
    
    # Read the image and encode it as base64
    with open(image_path, 'rb') as img_file:
        image_data = base64.b64encode(img_file.read()).decode('utf-8')
    
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [image_data],
        "stream": False
    }
    
    response = requests.post(url, json=payload)
    
    if response.status_code == 200:
        result = response.json()
        return result.get('response', '')
    else:
        return f"Error: {response.status_code}"

# Example (requires an image)
# Uncomment and provide a valid path
# image_path = "/path/to/your/image.jpg"
# if Path(image_path).exists():
#     response = query_vision_model(
#         prompt="Describe what you see in this image.",
#         image_path=image_path,
#         model="qwen3-vl:8b"
#     )
#     print(f"Vision Model Response:\n{response}")
# else:
#     print(f"Image not found: {image_path}")

print("Vision model query example (commented out - provide an image path to use)")
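
The `images` field in the payload is a list, so the encoding step generalizes to several files. The sketch below builds the payload separately from the HTTP call and fails early on a missing file. `build_vision_payload` is a hypothetical helper, and whether a given model makes good use of more than one image is an assumption to verify per model.

```python
import base64
from pathlib import Path
from typing import Dict, List


def build_vision_payload(prompt: str, image_paths: List[str],
                         model: str = "qwen3-vl:8b") -> Dict:
    """Build a /api/generate payload with one base64 string per image."""
    images = []
    for image_path in image_paths:
        path = Path(image_path)
        if not path.is_file():
            # Fail before sending anything to the server
            raise FileNotFoundError(f"Image not found: {path}")
        images.append(base64.b64encode(path.read_bytes()).decode("utf-8"))
    return {"model": model, "prompt": prompt, "images": images, "stream": False}
```

The returned dict can be passed as `json=` to `requests.post`, as `query_vision_model` does with its own payload.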

## 7. Batch Queries for Federated Learning Research

In [None]:
def research_query_batch(questions: List[str], model: str = "qwen3:latest") -> Dict[str, str]:
    """
    Run a batch of research queries.
    
    Args:
        questions: List of questions
        model: Model to use
    
    Returns:
        Dictionary {question: answer}
    """
    results = {}
    
    for i, question in enumerate(questions, 1):
        print(f"\n[{i}/{len(questions)}] Processing: {question}")
        response = query_llm(question, model=model, stream=False)
        results[question] = response
        print(f"Response preview: {response[:150]}...")
    
    return results

# Example batch query for FL research
research_questions = [
    "What are the main challenges in federated learning with heterogeneous data?",
    "How does knowledge distillation help in federated learning scenarios?",
    "Explain the concept of adapter modules in transfer learning."
]

print("Starting batch research queries...")
results = research_query_batch(research_questions, model="qwen3:latest")

print("\n" + "="*80)
print("SUMMARY OF RESULTS")
print("="*80)
for question, answer in results.items():
    print(f"\nQ: {question}")
    print(f"A: {answer[:200]}...\n")
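
`research_query_batch` keys its results by question text, so a repeated question overwrites the earlier answer, and one failing request would stop the whole batch if it raised. A variant that keeps one record per question and isolates failures is sketched below. `run_batch` and the `ask` callable are hypothetical; `ask` stands in for something like `lambda q: query_llm(q, model="qwen3:latest")`.

```python
import time
from typing import Callable, Dict, List


def run_batch(questions: List[str], ask: Callable[[str], str]) -> List[Dict]:
    """Ask each question in turn; record the answer or the error, plus timing."""
    records = []
    for question in questions:
        start = time.perf_counter()
        try:
            record = {"question": question, "answer": ask(question), "error": None}
        except Exception as exc:  # keep going if one request fails
            record = {"question": question, "answer": None, "error": str(exc)}
        record["seconds"] = time.perf_counter() - start
        records.append(record)
    return records
```

Because the result is a list of plain dicts, it can be written to disk with `json.dump` for later analysis.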

## 8. Semantic Search with Embeddings

In [None]:
def semantic_search(query: str, corpus: List[str], model: str = "nomic-embed-text:v1.5", top_k: int = 3):
    """
    Semantic search: find the documents most similar to the query.
    
    Args:
        query: Query text
        corpus: List of documents
        model: Embedding model
        top_k: Number of results to return
    
    Returns:
        List of (document, similarity) tuples
    """
    # Generate embeddings for the query and the corpus
    print(f"Generating embeddings for query and {len(corpus)} documents...")
    query_embedding = generate_embeddings([query], model=model)[0]
    corpus_embeddings = generate_embeddings(corpus, model=model)
    
    # Compute similarities
    similarities = []
    for i, doc_embedding in enumerate(corpus_embeddings):
        sim = cosine_similarity(query_embedding, doc_embedding)
        similarities.append((corpus[i], sim))
    
    # Sort by decreasing similarity
    similarities.sort(key=lambda x: x[1], reverse=True)
    
    return similarities[:top_k]

# Example of semantic search
corpus_docs = [
    "Federated learning enables training models across decentralized data without sharing raw data.",
    "Knowledge distillation compresses large neural networks into smaller ones.",
    "Adapter modules allow efficient fine-tuning of pre-trained models.",
    "The VEGAS dataset contains audio-visual data for multimodal learning.",
    "Differential privacy protects individual data points in machine learning.",
    "Generative models like VAE can create synthetic training samples.",
    "Multi-modal learning combines information from different data modalities.",
]

query = "How to train models on distributed data?"

print(f"\nQuery: '{query}'\n")
print("Searching in corpus...\n")

top_results = semantic_search(query, corpus_docs, top_k=3)

print("\nTop 3 most relevant documents:")
for i, (doc, similarity) in enumerate(top_results, 1):
    print(f"\n{i}. [Similarity: {similarity:.4f}]")
    print(f"   {doc}")
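
`semantic_search` re-embeds the whole corpus on every call, which is wasteful when several queries run against the same documents. A minimal sketch of an in-memory index that embeds the corpus once follows. `EmbeddingIndex` is a hypothetical class, and `embed` is any callable that maps a list of texts to a list of vectors, for example `generate_embeddings`.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity in plain Python (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class EmbeddingIndex:
    """Embed the corpus once, then answer many queries against it."""

    def __init__(self, corpus: List[str],
                 embed: Callable[[List[str]], List[Vector]]):
        self.corpus = list(corpus)
        self.embed = embed
        self.vectors = embed(self.corpus)  # the only corpus-wide embedding call

    def search(self, query: str, top_k: int = 3) -> List[Tuple[str, float]]:
        """Return the top_k (document, similarity) pairs for the query."""
        query_vector = self.embed([query])[0]
        scored = [(doc, _cosine(query_vector, vec))
                  for doc, vec in zip(self.corpus, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

In the notebook, `EmbeddingIndex(corpus_docs, generate_embeddings).search(query)` would return the same kind of list as `semantic_search`, while each further query costs only one embedding request.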

## 9. Utility: Model Info

In [None]:
def get_model_info(model_name: str):
    """Get detailed information about a model"""
    url = f"{OLLAMA_BASE_URL}/api/show"
    
    payload = {
        "name": model_name
    }
    
    response = requests.post(url, json=payload)
    
    if response.status_code == 200:
        info = response.json()
        print(f"\nModel: {model_name}")
        print("=" * 60)
        
        if 'modelfile' in info:
            print(f"\nModelfile:\n{info['modelfile'][:500]}...")
        
        if 'parameters' in info:
            print(f"\nParameters: {info['parameters']}")
        
        if 'template' in info:
            print(f"\nTemplate: {info['template'][:200]}...")
        
        return info
    else:
        print(f"Error: {response.status_code}")
        return None

# Example: info about the embedding model
model_info = get_model_info("nomic-embed-text:v1.5")

## 10. Complete Example: RAG (Retrieval-Augmented Generation)

In [None]:
def rag_query(question: str, knowledge_base: List[str], 
               embedding_model: str = "nomic-embed-text:v1.5",
               llm_model: str = "qwen3:latest",
               top_k: int = 2):
    """
    RAG: Retrieval-Augmented Generation
    1. Search the knowledge base for relevant documents
    2. Use the documents as context to generate the answer
    """
    print(f"Question: {question}\n")
    
    # Step 1: Retrieval
    print("Step 1: Retrieving relevant documents...")
    relevant_docs = semantic_search(question, knowledge_base, 
                                   model=embedding_model, top_k=top_k)
    
    print(f"Found {len(relevant_docs)} relevant documents\n")
    
    # Step 2: Augmentation (build the prompt with context)
    context = "\n".join([doc for doc, _ in relevant_docs])
    
    augmented_prompt = f"""Context:
{context}

Question: {question}

Answer the question based on the context provided above. If the context doesn't contain enough information, say so.
"""
    
    print("Step 2: Generating answer with context...\n")
    
    # Step 3: Generation
    answer = query_llm(augmented_prompt, model=llm_model, stream=False)
    
    print("="*80)
    print("FINAL ANSWER")
    print("="*80)
    print(answer)
    
    return answer, relevant_docs

# Knowledge base for federated learning
fl_knowledge_base = [
    "Federated learning (FL) is a machine learning technique that trains models across multiple decentralized devices or servers holding local data samples, without exchanging them.",
    "In federated learning, only model updates (gradients or parameters) are shared with a central server, preserving data privacy.",
    "Knowledge distillation in FL allows a student model to learn from multiple teacher models distributed across nodes, improving generalization.",
    "Adapter modules are small neural network components that can be fine-tuned while keeping the main model frozen, enabling efficient transfer learning.",
    "The main challenges in FL include non-IID data distribution, communication costs, and model personalization.",
    "Synthetic data generation using VAEs or GANs can help address data scarcity in federated settings.",
    "Aggregation methods in FL include FedAvg (averaging model weights) and more advanced techniques like FedProx and SCAFFOLD.",
]

# RAG example
question = "How can knowledge distillation improve federated learning?"
answer, context_docs = rag_query(question, fl_knowledge_base, top_k=2)

print("\n" + "="*80)
print("CONTEXT USED")
print("="*80)
for i, (doc, sim) in enumerate(context_docs, 1):
    print(f"\n{i}. [Similarity: {sim:.4f}]")
    print(f"   {doc}")
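
The augmentation step in `rag_query` joins the retrieved documents with newlines, so the model cannot point to a specific source and weak matches are passed through regardless of score. A minimal sketch of a prompt builder that numbers the documents and drops those below a similarity cutoff follows. `build_rag_prompt` and `min_similarity` are hypothetical, and a useful cutoff would need tuning per embedding model.

```python
from typing import List, Tuple


def build_rag_prompt(question: str, docs: List[Tuple[str, float]],
                     min_similarity: float = 0.0) -> str:
    """Number the retrieved documents and ask for an answer grounded in them."""
    kept = [doc for doc, sim in docs if sim >= min_similarity]
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(kept, 1))
    if not context:
        context = "(no relevant documents found)"
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer using only the context above and cite sources as [n]. "
        "If the context is insufficient, say so.\n"
    )


# Hand-written (document, similarity) pairs shaped like semantic_search output
example_prompt = build_rag_prompt(
    "How can knowledge distillation improve federated learning?",
    [("Knowledge distillation in FL lets a student learn from many teachers.", 0.8),
     ("The weather is nice today.", 0.1)],
    min_similarity=0.5,
)
print(example_prompt)
```

The `docs` argument has the same shape as the `relevant_docs` list inside `rag_query`, so the builder could replace the inline f-string there.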

## Final Notes

### Available Models:
- **LLM**: `qwen3:latest`, `deepseek-r1:8b` - for general queries
- **Embedding**: `nomic-embed-text:v1.5` - for semantic search and RAG
- **Vision**: `qwen3-vl:8b`, `qwen3-vl:32b`, `llava:13b` - for image analysis

### API Endpoints:
- `/api/generate` - Text generation
- `/api/chat` - Multi-turn conversations
- `/api/embeddings` - Embedding generation
- `/api/tags` - List models
- `/api/show` - Model info

### Documentation:
- https://ollama.com/blog/embedding-models
- https://github.com/ollama/ollama/blob/main/docs/api.md