# Step 3: RAG Query Pipeline with Llama

This notebook brings everything together:
1. **Retrieve** - Search ChromaDB for relevant services
2. **Augment** - Add those services as context
3. **Generate** - Use Llama to create helpful responses

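**Make sure Ollama is running:** Open a terminal and run `ollama serve`. The cells below use the `llama3.2` model; if it is not installed yet, run `ollama pull llama3.2` first.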
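
The three steps above can be sketched without any external dependencies. The toy below is an illustration under loud assumptions, not this notebook's implementation: `retrieve` ranks by shared words instead of embeddings, `generate` is a stub standing in for Llama, and the two sample records and all function names are hypothetical.

```python
# Toy retrieve -> augment -> generate loop. All names and records here are
# hypothetical stand-ins; the notebook itself uses ChromaDB and Llama.

TOY_SERVICES = [
    {"name": "Example Shelter A", "text": "emergency shelter beds for adults"},
    {"name": "Example Food Bank B", "text": "weekly food boxes for seniors"},
]


def retrieve(query, services, n_results=1):
    """Rank services by how many query words appear in their text."""
    query_words = set(query.lower().split())

    def overlap(service):
        return len(query_words & set(service["text"].lower().split()))

    return sorted(services, key=overlap, reverse=True)[:n_results]


def augment(query, hits):
    """Build a prompt that places the retrieved services next to the question."""
    context = "\n".join(f"- {s['name']}: {s['text']}" for s in hits)
    return f"Question: {query}\n\nServices:\n{context}"


def generate(prompt):
    """Stub for the LLM call; it only reports how many services it was given."""
    n_services = sum(line.startswith("- ") for line in prompt.splitlines())
    return f"(stub answer based on {n_services} service(s))"


def toy_rag(query):
    hits = retrieve(query, TOY_SERVICES)
    return generate(augment(query, hits))


print(toy_rag("need emergency shelter tonight"))
# (stub answer based on 1 service(s))
```

The real pipeline keeps the same shape but swaps in semantic search for `retrieve` and an Ollama chat call for `generate`.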

## Install Required Package

In [1]:
# Run this once; the -q flag keeps pip's output quiet
!pip install ollama -q


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.3[0m[39;49m -> [0m[32;49m26.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


## Load Our Existing Components

In [2]:
import json
import chromadb
from sentence_transformers import SentenceTransformer
import ollama
import warnings
warnings.filterwarnings("ignore")

# Load embedding model (same one from notebook 02)
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
print("Embedding model loaded!")

# Connect to ChromaDB (created in notebook 02)
chroma_client = chromadb.PersistentClient(path="../data/chroma_db")
collection = chroma_client.get_collection("services")
print(f"Connected to ChromaDB with {collection.count()} services")

# Load full service data (for detailed responses)
with open('../data/homeless_services_hackathon.json', 'r') as f:
    all_services = json.load(f)
print(f"Loaded {len(all_services)} full service records")

Embedding model loaded!
Connected to ChromaDB with 1719 services
Loaded 1719 full service records


## Test Ollama Connection

In [3]:
# Test that Ollama is running
try:
    response = ollama.chat(model='llama3.2', messages=[
        {'role': 'user', 'content': 'Say "Hello, I am ready!" and nothing else.'}
    ])
    print("Ollama connection successful!")
    print(f"Response: {response['message']['content']}")
except Exception as e:
    print(f"Error: {e}")
    print("Make sure Ollama is running: open a terminal and run 'ollama serve'")

Ollama connection successful!
Response: Hello, I am ready!
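
The test above sends a full chat request, which can be slow while the model loads. A lighter preflight is sketched below. It assumes Ollama is listening on its default address, `http://localhost:11434`; the function name `ollama_is_reachable` is hypothetical. It only checks that a server answers, not that the `llama3.2` model is installed.

```python
import urllib.error
import urllib.request


def ollama_is_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if not ollama_is_reachable():
    print("Ollama does not appear to be running: try 'ollama serve'")
```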


## Build the RAG Pipeline

In [4]:
def search_services(query, n_results=5):
    """
    Search for services matching the query using semantic similarity.
    """
    query_embedding = embedding_model.encode(query).tolist()
    
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results
    )
    
    return results


def format_services_for_llm(search_results):
    """
    Format search results into a clear context string for the LLM.
    """
    services_text = ""
    
    for i, (meta, doc) in enumerate(zip(search_results['metadatas'][0], search_results['documents'][0])):
        services_text += f"""
---
SERVICE {i+1}: {meta.get('service_name', 'Unknown')}
Organization: {meta.get('organization', 'N/A')}
Phone: {meta.get('phone', 'N/A')}
Address: {meta.get('address', 'N/A')}
Types: {meta.get('types', 'N/A')}
Area Served: {meta.get('area_served', 'N/A')}

Full Details:
{doc[:1500]}
"""
    
    return services_text


def ask_case_manager_assistant(user_query, n_services=5):
    """
    Main RAG function: Search for services and generate a helpful response.
    """
    # Step 1: RETRIEVE - Search for relevant services
    search_results = search_services(user_query, n_results=n_services)
    
    # Step 2: AUGMENT - Format services as context
    services_context = format_services_for_llm(search_results)
    
    # Step 3: GENERATE - Create the prompt and get LLM response
    system_prompt = """You are a helpful assistant for case managers working with homeless and at-risk populations in San Diego.

Your job is to:
1. Analyze the services provided in the context
2. Recommend the most relevant services for the client's situation
3. Explain eligibility requirements clearly
4. Provide contact information and next steps
5. Note any important details (hours, documents needed, etc.)

Be concise but thorough. If a service doesn't seem like a good match, say so.
Always prioritize the client's immediate needs."""

    user_message = f"""A case manager is asking: "{user_query}"

Here are the relevant services from our database:
{services_context}

Based on these services, provide helpful recommendations for the case manager."""

    # Call Llama via Ollama
    response = ollama.chat(
        model='llama3.2',
        messages=[
            {'role': 'system', 'content': system_prompt},
            {'role': 'user', 'content': user_message}
        ]
    )
    
    return {
        'answer': response['message']['content'],
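        # Use .get() so a record without a name cannot raise KeyError (matches format_services_for_llm)
        'services_found': [m.get('service_name', 'Unknown') for m in search_results['metadatas'][0]]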
    }

print("RAG pipeline ready!")

RAG pipeline ready!
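
`collection.query` returns one inner list per query embedding, which is why the functions above index `['metadatas'][0]` and `['documents'][0]`. The sketch below mimics that shape with a hand-written dictionary (the records are hypothetical placeholders, not rows from the database) and estimates how much document text survives the `doc[:1500]` truncation.

```python
# Hand-built stand-in for a ChromaDB query result with a single query.
# The outer lists have one entry per query; the values are placeholders.
fake_results = {
    "metadatas": [[{"service_name": "Placeholder A"}, {"service_name": "Placeholder B"}]],
    "documents": [["x" * 4000, "short description"]],
}

MAX_DOC_CHARS = 1500  # same cut-off as doc[:1500] in format_services_for_llm


def truncated_doc_lengths(results, max_chars=MAX_DOC_CHARS):
    """Return how many characters of each document survive truncation."""
    return [len(doc[:max_chars]) for doc in results["documents"][0]]


lengths = truncated_doc_lengths(fake_results)
print(lengths)       # [1500, 17]
print(sum(lengths))  # 1517

# Upper bound on document text for the default n_services=5.
print(5 * MAX_DOC_CHARS)  # 7500
```

With the defaults, at most 7,500 characters of document text (plus the metadata lines) go into each prompt; lowering `n_services` or the cut-off shrinks it.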


## Test the RAG Pipeline

In [5]:
# Test query 1: Veteran shelter
query = "I have a homeless veteran who needs emergency shelter tonight"

print(f"QUERY: {query}")
print("=" * 70)

result = ask_case_manager_assistant(query)

print("\nSERVICES FOUND:")
for svc in result['services_found']:
    print(f"  - {svc}")

print("\n" + "=" * 70)
print("ASSISTANT RESPONSE:")
print("=" * 70)
print(result['answer'])

QUERY: I have a homeless veteran who needs emergency shelter tonight

SERVICES FOUND:
  - National Call Center for Homeless Veterans
  - Harm Reduction Shelter
  - Emergency Adult Shelter VVSD
  - Emergency Adult Shelter VVSD
  - Homeless Veterans' Reintegration Program

ASSISTANT RESPONSE:
**Recommendations for the Homeless Veteran**

Given the urgency of providing emergency shelter for the homeless veteran, I recommend:

1. **Service 2: Harm Reduction Shelter**: This service is specifically designed to provide a safe haven and support services for individuals experiencing homelessness with substance use conditions, including veterans. It's likely that this individual would benefit from the harm reduction approach and access to partner service providers.
2. **Service 3: Emergency Adult Shelter VVSD (Male)**: As the veteran is male, this shelter may be a more suitable option. However, please note that this shelter has an income limitation, which might not be relevant for emergency situ
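
The `SERVICES FOUND` list for the veteran query contains `Emergency Adult Shelter VVSD` twice. The names alone do not show whether these are separate records that share a name or true duplicates. If repeated names are unwanted in the printed list, one option is an order-preserving de-duplication; the helper name `unique_in_order` is hypothetical.

```python
def unique_in_order(names):
    """Drop repeated names while keeping the position of the first occurrence."""
    seen = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique


# Names copied from the output above.
services_found = [
    "National Call Center for Homeless Veterans",
    "Harm Reduction Shelter",
    "Emergency Adult Shelter VVSD",
    "Emergency Adult Shelter VVSD",
    "Homeless Veterans' Reintegration Program",
]

for name in unique_in_order(services_found):
    print(f"  - {name}")
```

This only tidies the display. If the two records differ in their details, the LLM context should still include both.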

In [6]:
# Test query 2: Family with children
query = "Single mother with 2 kids facing eviction, needs rent assistance and possibly shelter"

print(f"QUERY: {query}")
print("=" * 70)

result = ask_case_manager_assistant(query)

print("\nSERVICES FOUND:")
for svc in result['services_found']:
    print(f"  - {svc}")

print("\n" + "=" * 70)
print("ASSISTANT RESPONSE:")
print("=" * 70)
print(result['answer'])

QUERY: Single mother with 2 kids facing eviction, needs rent assistance and possibly shelter

SERVICES FOUND:
  - Long Term Transitional Housing
  - Rent and Utility Payment Assistance
  - City of San Diego Eviction Prevention Program
  - Emergency Family Shelter
  - Transitional Housing for Families, St Vincent de Paul Village

ASSISTANT RESPONSE:
After reviewing the provided services, I recommend the following:

1. **City of San Diego Eviction Prevention Program**: Given the family's immediate need to prevent eviction, this program would be an excellent starting point. The case manager can help the family navigate the application process and ensure they meet the eligibility criteria. This service will provide the necessary assistance to prevent homelessness.
2. **Long Term Transitional Housing at Solutions for Change**: Although it offers a longer-term solution, I recommend considering transitional housing as an option. It provides essential support services, such as case management,

In [None]:
# Test query 3: Mental health
query = "Young adult age 20 experiencing mental health crisis, needs counseling and possibly housing"

print(f"QUERY: {query}")
print("=" * 70)

result = ask_case_manager_assistant(query)

print("\nSERVICES FOUND:")
for svc in result['services_found']:
    print(f"  - {svc}")

print("\n" + "=" * 70)
print("ASSISTANT RESPONSE:")
print("=" * 70)
print(result['answer'])

QUERY: Young adult age 20 experiencing mental health crisis, needs counseling and possibly housing


## Interactive Query Mode

Run this cell to ask your own questions:

In [None]:
# Interactive mode - change this query and run the cell
your_query = "elderly person needs help with food and utilities"

print(f"QUERY: {your_query}")
print("=" * 70)

result = ask_case_manager_assistant(your_query)

print("\nSERVICES FOUND:")
for svc in result['services_found']:
    print(f"  - {svc}")

print("\n" + "=" * 70)
print("ASSISTANT RESPONSE:")
print("=" * 70)
print(result['answer'])
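
The test cells and the interactive cell repeat the same printing code. A small helper could remove that repetition. `print_result` is a hypothetical name, and it assumes only the dictionary shape returned by `ask_case_manager_assistant` (the `'answer'` and `'services_found'` keys).

```python
def print_result(query, result, width=70):
    """Print a query, the retrieved service names, and the assistant's answer."""
    rule = "=" * width
    print(f"QUERY: {query}")
    print(rule)
    print("\nSERVICES FOUND:")
    for svc in result["services_found"]:
        print(f"  - {svc}")
    print("\n" + rule)
    print("ASSISTANT RESPONSE:")
    print(rule)
    print(result["answer"])


# Stand-in result with the same keys; in the notebook this would come from
# ask_case_manager_assistant(your_query).
example = {"answer": "(answer text)", "services_found": ["Service A", "Service B"]}
print_result("elderly person needs help with food and utilities", example)
```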

## Advanced: Compare Services Side-by-Side

In [None]:
def compare_services(query, client_profile):
    """
    Ask the LLM to compare services and recommend the best fit.
    """
    search_results = search_services(query, n_results=5)
    services_context = format_services_for_llm(search_results)
    
    prompt = f"""Compare these services for this client:

CLIENT PROFILE:
{client_profile}

AVAILABLE SERVICES:
{services_context}

Please:
1. Create a comparison table of the top 3 most relevant services
2. Recommend which service is the BEST fit and why
3. List any services that are NOT a good fit and why
4. Suggest a step-by-step action plan for the case manager"""

    response = ollama.chat(
        model='llama3.2',
        messages=[{'role': 'user', 'content': prompt}]
    )
    
    return response['message']['content']

# Example comparison
client = """
- 45 year old male
- Veteran (Army, 8 years)
- Currently sleeping in car
- Has part-time job at warehouse
- No mental health or substance issues
- Needs: stable housing, maybe help with deposit
"""

print("CLIENT PROFILE:")
print(client)
print("=" * 70)
print("\nCOMPARISON & RECOMMENDATIONS:")
print("=" * 70)
print(compare_services("veteran housing assistance", client))

## Summary

We now have a working RAG pipeline that:

1. **Searches** our services database semantically
2. **Generates** helpful, contextual responses using Llama
3. **Compares** services for specific client profiles

### Next Steps (Future Enhancements)

| Component | Purpose | Model / Tool |
|-----------|---------|-------|
| Information Extraction | Structure messy fields (eligibility, hours) | DistilBERT / spaCy |
| Classification | Auto-tag services, predict eligibility | DistilBERT / BERT |
| Recommendation Ranking | Score services for specific clients | XGBoost + embeddings |
| Web Interface | Let case managers use this easily | Streamlit / Gradio |