## Adopting a Test-First Mindset
Approaching your generative AI project with a test-first mindset means you define the desired outcomes at the very start of application development. In this section, we will explore some simple ways to examine the context of your data and define a few ideal question-answer pairs to use as a testing guide.

### Step 1 - Inspect a sample patient encounter
Everything in a Retrieval-Augmented Generation (RAG) pipeline begins with understanding the raw data. Skimming a concrete record grounds the questions you’ll create next and reminds you where answers actually live (structured fields and unstructured notes).

Let's load one sample encounter that we can build a test case around.

In [None]:
# Step 1 – Load a sample encounter

# Required imports
import os, pandas as pd
from sentence_transformers import SentenceTransformer
from sqlalchemy import create_engine, text

from dotenv import load_dotenv
load_dotenv(override=True)

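# Connection settings: read from the environment (or the .env file loaded above),
# falling back to the workshop defaults. The IRIS_* variable names are assumptions.
username = os.getenv("IRIS_USERNAME", "SuperUser")
password = os.getenv("IRIS_PASSWORD", "SYS")
hostname = os.getenv("IRIS_HOSTNAME", "localhost")
port = int(os.getenv("IRIS_PORT", "1972"))
namespace = os.getenv("IRIS_NAMESPACE", "IRISAPP")
CONNECTION_STRING = f"iris://{username}:{password}@{hostname}:{port}/{namespace}"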
engine = create_engine(CONNECTION_STRING)

# Retrieve sample encounter 254
with engine.connect() as conn:
    with conn.begin():
        sql = text("""
            SELECT Encounter_ID, Description, Clinical_Notes, DESCRIPTION_OBSERVATIONS
            FROM GenAI.encounters
            WHERE Encounter_ID='254'
        """)
        result = conn.execute(sql)
        columns = list(result.keys())  # column names as returned by the query
        results = result.fetchall()

# Display results with explicit column names
df = pd.DataFrame(results, columns=columns)
pd.set_option("display.max_colwidth", None)
df.head()
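
Before writing questions, it can help to see which fields actually hold text. The sketch below counts the characters in each field of one record. The `sample_encounter` dictionary and the `summarize_fields` helper are hypothetical: the field names mirror the query above, but the values are invented for illustration.

```python
# Hypothetical record: field names mirror the query above, values are invented.
sample_encounter = {
    "Encounter_ID": "254",
    "Description": "Example encounter description",
    "Clinical_Notes": "Example free-text note. It has two sentences.",
    "DESCRIPTION_OBSERVATIONS": None,
}


def summarize_fields(record):
    """Return {field name: character count}, treating missing values as empty."""
    return {
        name: len(str(value)) if value is not None else 0
        for name, value in record.items()
    }


field_lengths = summarize_fields(sample_encounter)
for name, length in field_lengths.items():
    print(f"{name:<26}{length:>6} chars")
```

With the real data, something like `summarize_fields(df.iloc[0].to_dict())` should give the same kind of overview, assuming the query returned at least one row.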

### Step 2 - Create two realistic questions
RAG systems succeed or fail on the questions people really ask. Crafting them yourself ensures your future tests reflect authentic clinical or analytic needs.

Pretend you’re a clinician or researcher looking at the clinical notes from Encounter 254 above. Add exactly two natural-language questions to the `qa_pairs` list. Aim for specific questions (“Which comorbidities existed before this admission?”) rather than broad requests (“Tell me about the patient”). Add the questions in the placeholder code below, then run the cell.

In [None]:
# Step 2 – Add your two questions below
qa_pairs = [
    # Example:
    # {"question": "What comorbidities does this patient have prior to this encounter?"},
    # {"question": "Which medications were administered during this visit?"}
]

qa_pairs
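
A quick sanity check can catch an empty or overly broad question before it reaches the pipeline. The helper below is a minimal sketch, not part of the workshop code: `validate_qa_pairs` is a hypothetical name, and the five-word minimum is an arbitrary heuristic for "too broad" that you should adjust to your own data.

```python
def validate_qa_pairs(pairs, expected_count=2, min_words=5):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    if len(pairs) != expected_count:
        problems.append(f"expected {expected_count} questions, found {len(pairs)}")
    for i, pair in enumerate(pairs, start=1):
        # Accept either {"question": "..."} dicts or plain strings
        raw = pair.get("question", "") if isinstance(pair, dict) else str(pair)
        question = raw.strip()
        if not question:
            problems.append(f"question {i} is empty")
            continue
        if not question.endswith("?"):
            problems.append(f"question {i} does not end with '?'")
        if len(question.split()) < min_words:
            problems.append(f"question {i} may be too broad (fewer than {min_words} words)")
    return problems


# Invented examples: the first is specific, the second is deliberately vague.
example_pairs = [
    {"question": "What comorbidities does this patient have prior to this encounter?"},
    {"question": "Tell me more"},
]
example_problems = validate_qa_pairs(example_pairs)
for problem in example_problems:
    print(problem)
```

Calling `validate_qa_pairs(qa_pairs)` on your own list returns an empty list when both questions pass these checks.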

### Step 3 - Define one "gold answer"
A single high-quality Q&A pair acts as a “canary”: if later tweaks break this exact fact retrieval, you’ll see it immediately. Establish this ground truth before you write any pipeline code.

Pick one of your two questions and paste the precise answer text straight from the encounter. Fill these into the `gold_answer` dictionary.

1. `question` – copy the question text.
2. `expected_answer` – paste the exact snippet (≤ 3 sentences) from the encounter note/fields.
3. `sources` – list the encounter ID that proves the answer.

In [None]:
# Step 3 – Fill in your gold answer
gold_answer = {
    "question": "",           # copy one question here
    "expected_answer": "",    # paste the exact text snippet here
    "sources": []             # e.g., ["encounter 254"]
}

gold_answer
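
The three rules above (every field filled in, at most three sentences, text copied verbatim from the encounter) can be checked mechanically. The sketch below is one possible way to do that. `check_gold_answer` and `count_sentences` are hypothetical helpers, the sentence splitter is deliberately crude, and the example note is invented rather than taken from Encounter 254.

```python
import re


def count_sentences(text):
    """Rough sentence count: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


def collapse_whitespace(text):
    """Collapse runs of whitespace so line breaks do not cause false mismatches."""
    return " ".join(text.split())


def check_gold_answer(gold, encounter_text, max_sentences=3):
    """Return a list of problems with the gold answer; an empty list means it looks valid."""
    problems = []
    for key in ("question", "expected_answer", "sources"):
        if not gold.get(key):
            problems.append(f"'{key}' is empty")
    snippet = gold.get("expected_answer", "")
    if snippet:
        if count_sentences(snippet) > max_sentences:
            problems.append(f"expected_answer is longer than {max_sentences} sentences")
        if collapse_whitespace(snippet) not in collapse_whitespace(encounter_text):
            problems.append("expected_answer is not a verbatim snippet of the encounter text")
    return problems


# Invented note text and gold answer, for illustration only.
example_note = "Patient reports mild headache.\nNo fever noted. Follow-up in two weeks."
example_gold = {
    "question": "Was a fever noted during this visit?",
    "expected_answer": "No fever noted.",
    "sources": ["encounter 254"],
}
print(check_gold_answer(example_gold, example_note))
```

A paraphrased answer such as "The patient had no fever." would be flagged, because it does not appear word for word in the note.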

### Step 4 - Run baseline retrieval + generation
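Execute the stubbed `retrieve_and_generate()` function for your questions. You’ll see the model’s answer, the chunks it cited, and an automatic green ✅ / red ❌ check of whether the cited sources include the source listed in your gold answer. Until you replace the stub with a real pipeline, it returns a placeholder answer and a placeholder source.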

This is your baseline. All future tuning (different indexes, prompt tweaks, agents) must keep the green light on. Seeing a failure now tells you where the pipeline needs help later.

In [None]:
# Step 4 – Run baseline retrieval and generation (stub)
def retrieve_and_generate(question):
    """TODO: Implement this with your own retrieval + generation pipeline.

    It should return a dict: {'answer': str, 'sources': [str]}
    """
    return {"answer": "placeholder answer", "sources": ["E123|note_chunk_7"]}

# Show answers for both questions
for q in qa_pairs:
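    # Use a name other than `text` so the sqlalchemy `text` import from Step 1 is not shadowed
    question_text = q["question"] if isinstance(q, dict) else q
    result = retrieve_and_generate(question_text)
    print("Q:", question_text)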
    print("A:", result['answer'])
    print("Sources:", result['sources'])
    print("-" * 80)

# Simple pass/fail for the gold answer
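if gold_answer["question"]:
    res = retrieve_and_generate(gold_answer["question"])
    # Accept either a single source string or a list of source strings
    expected_sources = gold_answer["sources"]
    if isinstance(expected_sources, str):
        expected_sources = [expected_sources] if expected_sources else []
    hit = any(src in res["sources"] for src in expected_sources)
    status = "✅ PASS" if hit else "❌ FAIL"
    print(status, "- expected source(s)", "found." if hit else "NOT found.")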
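
The pass/fail check in this step compares sources only. If you also want to compare the generated answer text with `expected_answer`, one simple option is lexical overlap. The sketch below measures what fraction of the expected answer's distinct words appear in the model's answer. The helper names and the 0.8 threshold are assumptions for illustration. Word overlap does not capture paraphrase or negation, so treat it as a rough signal rather than a correctness guarantee.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def answer_recall(expected, actual):
    """Fraction of distinct expected-answer tokens that also appear in the model answer."""
    expected_tokens = set(tokenize(expected))
    if not expected_tokens:
        return 0.0
    actual_tokens = set(tokenize(actual))
    return len(expected_tokens & actual_tokens) / len(expected_tokens)


def answer_matches(expected, actual, threshold=0.8):
    """True when enough of the expected answer's words appear in the model answer."""
    return answer_recall(expected, actual) >= threshold


# Invented strings for illustration.
expected_text = "No fever noted."
good_answer = "The note states that no fever was noted."
stub_answer = "placeholder answer"

print(answer_matches(expected_text, good_answer))
print(answer_matches(expected_text, stub_answer))
```

In the workshop flow, this could be called as `answer_matches(gold_answer["expected_answer"], res["answer"])` next to the source check.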

### Step 5 - Save your tiny test set
From this moment on, every notebook you touch can import this test file to ensure retrieval remains correct. It’s your regression guardrail for the rest of the workshop -- and a template you can copy into your own projects.

Run the cell below to write `tests/tiny_test.json`, which stores your two questions and one gold answer.

In [None]:
# Step 5 – Save tiny_test.json
import json, os, pathlib, datetime

tiny_test = {
    # utcnow() is deprecated since Python 3.12; use a timezone-aware UTC timestamp
    "created": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "qa_pairs": qa_pairs,
    "gold_answer": gold_answer
}

os.makedirs("tests", exist_ok=True)
path = pathlib.Path("tests/tiny_test.json")
path.write_text(json.dumps(tiny_test, indent=2), encoding="utf-8")
print(f"Saved {path.resolve()}")
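
To use the saved file as a regression check in another notebook, you need to load it and rerun the gold-answer source check. The following is a minimal sketch of that round trip. `load_tiny_test`, `gold_source_found`, and `stub_retrieve` are hypothetical names, and the demo writes an invented test set to a temporary directory so that it does not touch your real `tests/tiny_test.json`.

```python
import json
import pathlib
import tempfile


def load_tiny_test(path="tests/tiny_test.json"):
    """Read the saved test set back into a dictionary."""
    return json.loads(pathlib.Path(path).read_text(encoding="utf-8"))


def as_list(value):
    """Normalize a single string or a list of strings to a list."""
    if not value:
        return []
    return [value] if isinstance(value, str) else list(value)


def gold_source_found(test_set, retrieve_fn):
    """True if any expected source appears in the sources returned by retrieve_fn."""
    gold = test_set["gold_answer"]
    result = retrieve_fn(gold["question"])
    expected = as_list(gold.get("sources"))
    return bool(expected) and any(src in result["sources"] for src in expected)


# Demo with invented data, written to a temporary directory.
demo_set = {
    "qa_pairs": [{"question": "Was a fever noted during this visit?"}],
    "gold_answer": {
        "question": "Was a fever noted during this visit?",
        "expected_answer": "No fever noted.",
        "sources": "encounter 254",
    },
}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = pathlib.Path(tmp) / "tiny_test.json"
    demo_path.write_text(json.dumps(demo_set, indent=2), encoding="utf-8")
    loaded = load_tiny_test(demo_path)


def stub_retrieve(question):
    """Stand-in for a real pipeline; always cites the same source."""
    return {"answer": "placeholder answer", "sources": ["encounter 254"]}


demo_passed = gold_source_found(loaded, stub_retrieve)
print("PASS" if demo_passed else "FAIL")
```

In a later notebook, the same check would be `gold_source_found(load_tiny_test(), retrieve_and_generate)`, assuming your pipeline keeps the return shape described in Step 4.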
