---

### 🎓 **Professor**: Apostolos Filippas

### 📘 **Class**: AI Engineering

### 📋 **Homework 5**: RAG System for Fordham University

### 📅 **Due Date**: Day of Lecture 7, 11:59 PM

### Difficulty: ★★★★☆


**Note**: You are not allowed to share the contents of this notebook with anyone outside this class without written permission from the professor.

---

In this homework, you'll complete the RAG (Retrieval Augmented Generation) system you started building in Lecture 5. You will build an end-to-end pipeline that can answer questions about Fordham University using real data scraped from the Fordham website.

This is an open-ended assignment: there is no single right implementation, and you're encouraged to experiment with chunking strategies, embedding models, prompt design, and retrieval parameters to improve your system. **I will grade your system by testing it with specific questions whose answers I know.**

---

## Instructions

- You may use ChatGPT, Claude, documentation, Stack Overflow, etc. When using external resources, briefly cite them in a comment.
- Your submission must include **pre-computed embeddings** and any other artifacts needed so that I can run your RAG system without recomputing anything expensive. Share them in a way that makes sense (for example, committed under `temp/`, as listed in the submission steps).
- Run all cells before submitting to ensure they work.

**Submission:**
1. Create a branch called `homework-5`
2. Commit and push your work (notebook + Streamlit app + saved embeddings/artifacts in `temp/`)
3. Create a PR and merge to main
4. Submit the `.ipynb` file on Blackboard

---

## Step 1: Load and Chunk the Fordham Website Data

In `data/fordham-website.zip` you'll find **~9,500 Markdown files** scraped from Fordham's website. Each file is one page: admissions info, program descriptions, faculty pages, financial aid, campus life, and more. The first line of every file is the **URL** of the page it was scraped from. The rest is the page content in Markdown.

Think about: chunk size, what to split on (paragraphs, headers, fixed length, etc.), whether chunks should overlap, and how to track which page each chunk came from.
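
One way to act on these considerations is to split on blank lines and pack whole paragraphs into each chunk, carrying the last paragraph forward as overlap. The sketch below is only an illustration under that assumption, not the required approach; `chunk_page`, `max_chars`, `overlap_paragraphs`, and the example URL are names invented for this example.

```python
def chunk_page(page_text: str, max_chars: int = 900, overlap_paragraphs: int = 1) -> list[dict]:
    """Pack whole paragraphs into chunks of at most max_chars characters.

    Assumes the first line of page_text is the page URL.
    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    first_line, _, body = page_text.strip().partition("\n")
    source_url = first_line.strip()
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]

    chunks: list[dict] = []
    current: list[str] = []

    def flush() -> None:
        chunks.append({
            "source_url": source_url,      # lets every chunk be traced back to its page
            "chunk_index": len(chunks),
            "content": "\n\n".join(current),
        })

    for para in paragraphs:
        if current and len("\n\n".join(current + [para])) > max_chars:
            flush()
            tail = current[-overlap_paragraphs:] if overlap_paragraphs > 0 else []
            # Keep the overlap only if it still leaves room for the new paragraph
            current = tail if len("\n\n".join(tail + [para])) <= max_chars else []
        current.append(para)
    if current:
        flush()
    return chunks
```

Splitting on paragraph boundaries keeps sentences intact, at the cost of uneven chunk lengths; fixed-length character windows make the opposite trade.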

In [2]:
# YOUR CODE HERE
# Placeholder for your implementation
from pathlib import Path
import zipfile

zip_path = Path("fordham-website.zip")

def clean_page_name(path_str: str) -> str:
    """
    Turn 'about/index.html' or 'news-events.html' into a clean page name.
    """
    # Take just the file name (last component)
    fname = Path(path_str).name
    # Strip extension
    stem = Path(fname).stem
    # Replace separators with spaces and title-case
    clean = stem.replace("-", " ").replace("_", " ").strip().title()
    return clean

docs = []

with zipfile.ZipFile(zip_path, "r") as zf:
    for info in zf.infolist():
        # Skip directories
        if info.is_dir():
            continue
        
        # Read file bytes and decode as text
        content_bytes = zf.read(info.filename)
        try:
            text = content_bytes.decode("utf-8")
        except UnicodeDecodeError:
            # Fallback if some file has a different encoding
            text = content_bytes.decode("latin-1", errors="ignore")
        
        page_name = clean_page_name(info.filename)
        
        docs.append({
            "page_name": page_name,   # clean page name / fieldname
            "content": text           # full file content as a string
        })

len(docs), docs[:3]  # quick sanity check


(9560,
 [{'page_name': 'Index',
   'content': 'https://www.fordham.edu/\n\n## Doing Good That Becomes Greater As The Jesuit University of New York\n\nWeâ€™re located in New York Cityâ€”driven by our Jesuit values and tackling todayâ€™s most pressing issues at the center of the world stage.\n\n## We Are Leaders, Dreamers, Achievers, And Doers\n\nWith sound hearts, strong minds, and the wisdom to take charge, generations of Rams have found what they have needed to growâ€”the opportunities, connections, and support of this community.\n\n## From Winding Elms to Bustling City Blocks\n\nWith residential campuses in the Bronx and Manhattan, as well as campuses in Westchester and London, Fordham provides endless opportunities to start working toward your career and building the life you want.\n\n\n## Weâ€™re Drawn to Where Weâ€™re Needed Most\n\nExplore how our values come to life: how Fordhamâ€™s students, faculty, and alumni contribute to society and make lives better.\n\n**Notice of Nondisc

In [3]:
def chunk_documents(
    docs: list[dict],
    chunk_size: int = 900,
    overlap: int = 0,
) -> list[dict]:
    """
    Split documents into fixed-size character chunks (default 900 chars).
    Each chunk keeps metadata so you know which document it came from.

    Args:
        docs: List of dicts with "page_name" and "content" keys
        chunk_size: Target number of characters per chunk
        overlap: Number of characters to overlap between chunks (0 = no overlap)

    Returns:
        List of chunk dicts with page_name, source_url, chunk_index, content
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        # overlap >= chunk_size would stop the window from advancing (infinite loop)
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    chunks = []
    step = chunk_size - overlap  # how far the window advances each iteration

    for doc in docs:
        page_name = doc["page_name"]
        text = doc["content"]

        # First line is the URL; rest is content
        lines = text.strip().split("\n")
        source_url = lines[0].strip() if lines else ""
        body = "\n".join(lines[1:]) if len(lines) > 1 else ""

        for chunk_index, start in enumerate(range(0, len(body), step)):
            chunks.append({
                "page_name": page_name,
                "source_url": source_url,
                "chunk_index": chunk_index,
                "content": body[start: start + chunk_size],
            })

    return chunks

# Run chunking on docs (from cell above)
chunks = chunk_documents(docs)
print(f"Created {len(chunks):,} chunks from {len(docs):,} documents")
print(f"Example chunk: {chunks[0]}")


Created 52,694 chunks from 9,560 documents
Example chunk: {'page_name': 'Index', 'source_url': 'https://www.fordham.edu/', 'chunk_index': 0, 'content': '\n## Doing Good That Becomes Greater As The Jesuit University of New York\n\nWeâ€™re located in New York Cityâ€”driven by our Jesuit values and tackling todayâ€™s most pressing issues at the center of the world stage.\n\n## We Are Leaders, Dreamers, Achievers, And Doers\n\nWith sound hearts, strong minds, and the wisdom to take charge, generations of Rams have found what they have needed to growâ€”the opportunities, connections, and support of this community.\n\n## From Winding Elms to Bustling City Blocks\n\nWith residential campuses in the Bronx and Manhattan, as well as campuses in Westchester and London, Fordham provides endless opportunities to start working toward your career and building the life you want.\n\n\n## Weâ€™re Drawn to Where Weâ€™re Needed Most\n\nExplore how our values come to life: how Fordhamâ€™s students, faculty

---

## Step 2: Embed the Chunks

Turn each chunk into a vector so you can search over them. You can use a local model or an API model; the choice is yours.

Once you've created your embeddings, **save them somewhere** so you (and I) don't have to redo this step. Save the chunk metadata too (text, source URL, etc.).
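
Row `i` of the saved embedding matrix only makes sense if chunk `i` of the saved metadata is the text it was computed from, so it can help to write the metadata with a simple alignment check. The sketch below is one possible way to do this with JSON; `save_chunk_metadata`, `load_chunk_metadata`, and the `temp/chunks.json` path are hypothetical names, and the embedding matrix itself is assumed to be saved separately (for example with `np.save`).

```python
import json
from pathlib import Path


def save_chunk_metadata(chunks: list[dict], path: Path, n_vectors: int) -> None:
    """Write chunk dicts as JSON, refusing to save if they do not line up with the vectors."""
    if len(chunks) != n_vectors:
        raise ValueError(f"{len(chunks)} chunks but {n_vectors} embedding rows")
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps accented characters readable in the file
    path.write_text(json.dumps(chunks, ensure_ascii=False), encoding="utf-8")


def load_chunk_metadata(path: Path) -> list[dict]:
    """Read back the list of chunk dicts written by save_chunk_metadata."""
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical usage in the notebook, after embedding:
# save_chunk_metadata(chunks_to_embed, Path("temp/chunks.json"), chunk_embeddings.shape[0])
```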

In [9]:
# YOUR CODE HERE
import sys
from pathlib import Path

# Ensure helpers is on path (run notebook from lectures/ or project root)
_lectures = Path.cwd() / "lectures"
if (_lectures / "helpers.py").exists():
    sys.path.insert(0, str(_lectures))
elif not (Path.cwd() / "helpers.py").exists():
    sys.path.insert(0, ".")

from helpers import get_local_model, batch_embed_local, batch_embed_openai, batch_cosine_similarity
from dotenv import load_dotenv
import numpy as np

load_dotenv()

# "local" = Hugging Face model | "openai" = OpenAI API (current setting)
EMBED_MODEL_TYPE = "openai"

LOCAL_MODEL_NAME = "all-MiniLM-L6-v2"
OPENAI_MODEL_NAME = "text-embedding-3-small"


def get_embedding_model(model_type: str = EMBED_MODEL_TYPE):
    """Load the embedding model. Use same model at search time."""
    if model_type == "local":
        return get_local_model(LOCAL_MODEL_NAME)
    if model_type == "openai":
        return None  # API handles calls; no model object
    raise ValueError(f"model_type must be 'local' or 'openai', got {model_type}")


def _truncate_for_openai(text: str, max_chars: int = 8000) -> str:
    """Simple safeguard: trim very long chunks before sending to OpenAI."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars]


def embed_texts(texts: list[str], model_type: str = EMBED_MODEL_TYPE, show_progress: bool = True) -> np.ndarray:
    """Embed a list of texts. Returns (n_texts, dim) numpy array."""
    if model_type == "local":
        return batch_embed_local(texts, model_name=LOCAL_MODEL_NAME, show_progress=show_progress)
    if model_type == "openai":
        safe_texts = [_truncate_for_openai(t) for t in texts]
        return batch_embed_openai(safe_texts, model=OPENAI_MODEL_NAME, batch_size=50, verbose=show_progress)
    raise ValueError(f"model_type must be 'local' or 'openai', got {model_type}")


# Set to True to only embed 100 chunks for a quick test
TEST_MODE = False
embeddings_path = Path("chunk_embeddings.npy")

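if embeddings_path.exists():
    # Load previously computed embeddings to avoid re-embedding
    chunk_embeddings = np.load(embeddings_path)
    n_chunks = chunk_embeddings.shape[0]
    if n_chunks > len(chunks):
        # Cached vectors came from a different chunking; row i would not match chunks[i]
        raise ValueError(
            f"{embeddings_path} has {n_chunks} rows but only {len(chunks)} chunks exist; "
            "delete the file and re-embed."
        )
    chunks_to_embed = chunks[:n_chunks]
    print(f"Loaded existing embeddings from {embeddings_path} with shape {chunk_embeddings.shape}")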
else:
    # Compute embeddings (optionally in TEST_MODE) and save them
    n_chunks = 100 if TEST_MODE else len(chunks)
    chunks_to_embed = chunks[:n_chunks]

    texts = [c["content"] for c in chunks_to_embed]
    chunk_embeddings = embed_texts(texts, model_type=EMBED_MODEL_TYPE)
    print(f"Embedded {len(chunks_to_embed):,} chunks → shape {chunk_embeddings.shape}")

    # Save embeddings so you don't redo this step
    np.save(embeddings_path, chunk_embeddings)
    print(f"Saved embeddings to {embeddings_path}")


Loaded existing embeddings from chunk_embeddings.npy with shape (52694, 1536)


---

## Step 3: Retrieve

Build the **R** in RAG. Write a function that takes a question and returns the most relevant chunks. You can use semantic search, BM25, hybrid, or whatever you think works best.

Test it on a few questions and eyeball whether the results make sense.
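
If you try the hybrid route, one simple way to combine a semantic ranking with a keyword (for example BM25) ranking is reciprocal rank fusion, which needs only the ranked lists and not the raw scores. The sketch below is a minimal illustration; `reciprocal_rank_fusion` is a name chosen here, the inputs are assumed to be lists of chunk ids ordered best-first, and `k=60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60, top_n: int = 5) -> list[int]:
    """Merge several ranked lists of chunk ids into one ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in (rank starts at 1).
    """
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk id so results are deterministic
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_n]


semantic_ranking = [3, 1, 2]   # chunk ids from embedding similarity, best first
keyword_ranking = [1, 4, 3]    # chunk ids from a keyword search, best first
fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking], top_n=4)
```

Chunks that appear near the top of both lists rise to the front, while a chunk found by only one method can still make the cut.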

In [10]:
# YOUR CODE HERE
from typing import List, Dict, Any


def retrieve_chunks(
    question: str,
    k: int = 5,
    model_type: str = EMBED_MODEL_TYPE,
) -> List[Dict[str, Any]]:
    """Retrieve the top-k most relevant chunks for a question.

    Args:
        question: User question in plain text
        k: Number of chunks to return
        model_type: "local" or "openai" (must match how `chunk_embeddings` were created)

    Returns:
        List of chunk dicts augmented with rank and similarity score
    """
    # 1. Embed the question using the same model type
    if model_type == "local":
        model = get_embedding_model("local")
        query_emb = model.encode(question, convert_to_numpy=True)
    elif model_type == "openai":
        q_text = _truncate_for_openai(question)
        q_embs = batch_embed_openai([q_text], model=OPENAI_MODEL_NAME, batch_size=1, verbose=False)
        query_emb = q_embs[0]
    else:
        raise ValueError("model_type must be 'local' or 'openai'")

    # 2. Compute cosine similarity against all chunk embeddings
    scores = batch_cosine_similarity(query_emb, chunk_embeddings)

    # 3. Get top-k indices
    top_idx = np.argsort(-scores)[:k]

    results: List[Dict[str, Any]] = []
    for rank, idx in enumerate(top_idx, start=1):
        chunk = chunks_to_embed[idx]
        results.append(
            {
                "rank": rank,
                "score": float(scores[idx]),
                **chunk,
            }
        )

    return results


# Quick sanity check
# sample_question = "How do I apply for financial aid?"
# retrieved = retrieve_chunks(sample_question, k=3, model_type=EMBED_MODEL_TYPE)
# for r in retrieved:
#     print(f"Rank {r['rank']} | score={r['score']:.4f} | page={r['page_name']}")
#     print(r["content"][:200].replace("\n", " "), "...\n")
#     print("-" * 80)



---

## Step 4: Generate

Build the **G** in RAG. Write a function that takes a question and the retrieved chunks, builds a prompt, and calls an LLM to generate an answer.

Think about: how to structure the prompt, what the LLM should do when the context doesn't contain the answer, and which model to use.
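
As a rough illustration of these prompt decisions, the sketch below numbers each retrieved chunk, tags it with its source, caps the total context length, and tells the model what to do when the context is insufficient. `build_rag_prompt`, `max_context_chars`, and the exact wording of the instruction are assumptions made for this example, not a prescribed design.

```python
def build_rag_prompt(
    question: str,
    retrieved_chunks: list[dict],
    max_context_chars: int = 6000,
) -> list[dict]:
    """Return chat messages: a system instruction plus numbered, source-tagged context."""
    blocks: list[str] = []
    used = 0
    for n, chunk in enumerate(retrieved_chunks, start=1):
        block = f"[{n}] Source: {chunk.get('source_url', 'unknown')}\n{chunk['content'].strip()}"
        if used + len(block) > max_context_chars:
            break  # stop before the context grows past the budget
        blocks.append(block)
        used += len(block)

    system = (
        "Answer questions about Fordham University using only the numbered context. "
        "Cite the bracketed numbers you relied on. If the context does not contain "
        "the answer, say that you could not find it instead of guessing."
    )
    user = "Context:\n\n" + "\n\n".join(blocks) + f"\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

The returned list has the shape chat-completion APIs generally expect for `messages`, so it can be passed straight to the LLM call.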

In [11]:
# YOUR CODE HERE
import litellm

def generate_answer(
    question: str,
    retrieved_chunks: list[dict],
    model: str = "gpt-4o-mini",
) -> str:
    """Build a prompt from question + chunks, call LLM, return answer."""
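    context_parts = []
    for c in retrieved_chunks:
        # Prefer the page URL as the source label; page names such as "Index" are ambiguous
        source = c.get("source_url") or c.get("page_name", "Unknown")
        context_parts.append(f"[Source: {source}]\n{c['content']}")
    context = "\n\n---\n\n".join(context_parts)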

    system = (
        "You are a helpful assistant that answers questions about Fordham University "
        "using only the provided context. If the context does not contain the answer, "
        "say so clearly. Do not make up information."
    )
    user = f"Context:\n\n{context}\n\nQuestion: {question}"

    response = litellm.completion(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return response.choices[0].message.content.strip()


---

## Step 5: Wire it Together

Combine the previous steps into a single `rag(question)` function. Question in, answer out.

In [12]:
# YOUR CODE HERE
def rag(question: str, k: int = 5, model: str = "gpt-4o-mini") -> str:
    """Retrieve the top-k chunks for the question, then generate an answer from them."""
    # Use a distinct name so the global `chunks` list is not shadowed
    retrieved = retrieve_chunks(question, k=k, model_type=EMBED_MODEL_TYPE)
    return generate_answer(question, retrieved, model=model)


# Example: question is the input; it flows to retrieve, then to generate_answer
user_question = input("Enter your question about Fordham University: ")
answer = rag(user_question)
print(answer)


Apostolos Filippas is an assistant professor at Fordham University's Gabelli School of Business. He is the Christopher Blake Distinguished Research Scholar in Business and specializes in Information, Technology, and Operations. He holds a Ph.D. in Business Administration from the NYU Stern School of Business and is an economist focusing on market design and the economics of technology, particularly in relation to online platforms. His teaching includes courses on Web Analytics and E-Commerce. Filippas also conducts research on topics such as reputation and pricing systems in online marketplaces and the economic implications of the "sharing economy."


In [13]:
# Demonstrate your RAG system

demo_questions = [
    "What programs does the Gabelli School of Business offer?",
    "How do I apply for financial aid at Fordham?",
    "What is the tuition for undergraduate students?",
    "Tell me about Fordham's campus locations.",
    "What research opportunities are available for students?",
]

for q in demo_questions:
    print(f"Q: {q}")
    answer = rag(q)
    print(f"A: {answer}")
    print("-" * 80)

Q: What programs does the Gabelli School of Business offer?
A: The Gabelli School of Business offers the following graduate business programs:

1. Full-time cohort M.B.A.
2. Professional M.B.A. (including part-time)
3. Executive M.B.A.

Additionally, it provides undergraduate business degrees.
--------------------------------------------------------------------------------
Q: How do I apply for financial aid at Fordham?
A: To apply for financial aid at Fordham University, follow these steps:

1. Complete the Free Application for Federal Student Aid (FAFSA) using Fordham’s school code: 002722. Sign your FAFSA application electronically using your Federal Student Aid (FSA) ID.
2. If you do not have an FSA ID, you will need to create one when logging in to the FAFSA.
3. Claim your Fordham ID, which will give you access to the financial aid channel of your student portal.
4. Complete the Direct Loan Request Form, which can be done electronically in your student portal once you are admitt

---

## Step 6: Evaluate Your RAG System

A working RAG system is great, but how do you know it's actually *good*? You can't improve what you can't measure. In this step you'll build an evaluation framework using concepts from Lecture 6.

There are two things to evaluate in a RAG system:
- **Retrieval quality**: Are you finding the right chunks?
- **Answer quality**: Is the generated answer correct and grounded in the context?

### Build a test set

Create a test set of at least **10 question-answer pairs**. For each pair, provide the question and the expected answer (look it up in the data). Cover a range of question types: factual, procedural, about specific programs, etc.

### Evaluate retrieval

For each question, check whether the retrieved chunks actually contain relevant information. You can do this manually or automatically (e.g., use an LLM to judge relevance). Compute **context relevance**: the fraction of retrieved chunks that are actually useful.
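
Once each retrieved chunk has a relevant/not-relevant judgment (from you or from an LLM judge), the metric itself is a simple ratio. The sketch below shows that arithmetic; `context_relevance`, `mean_context_relevance`, and the sample judgments are made up for illustration.

```python
def context_relevance(judgments: list[bool]) -> float:
    """Fraction of retrieved chunks judged relevant; 0.0 when nothing was retrieved."""
    return sum(judgments) / len(judgments) if judgments else 0.0


def mean_context_relevance(per_question: dict[str, list[bool]]) -> float:
    """Average the per-question fractions so every question counts equally."""
    if not per_question:
        return 0.0
    return sum(context_relevance(j) for j in per_question.values()) / len(per_question)


# Made-up judgments for k=5 retrieved chunks: three of five were useful
example_score = context_relevance([True, False, True, True, False])
```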

### Evaluate answers with LLM-as-judge

Use an LLM to evaluate your system's answers on two dimensions:

1. **Faithfulness**: Does the answer only use information from the retrieved context? (No hallucination)
2. **Correctness**: Is the answer factually correct compared to the expected answer?

Use **structured outputs** (Pydantic) to get consistent scores from the judge. A starting schema is provided below â€” feel free to modify it.

In [16]:
from pydantic import BaseModel, Field

class RAGEvaluation(BaseModel):
    faithfulness_score: int = Field(
        ..., ge=1, le=5,
        description="1=completely hallucinated, 5=fully grounded in context"
    )
    faithfulness_reasoning: str = Field(
        ..., description="Brief explanation of the faithfulness score"
    )
    correctness_score: int = Field(
        ..., ge=1, le=5,
        description="1=completely wrong, 5=fully correct and complete"
    )
    correctness_reasoning: str = Field(
        ..., description="Brief explanation of the correctness score"
    )

In [18]:
# Build a small evaluation harness for the RAG system

import re
import json
from statistics import mean
from typing import List, Dict

# --- 1. Test set of question-answer pairs ---
# These answers are short "gold" references the judge will compare against.
TEST_SET: List[Dict[str, str]] = [
    {
        "question": "What programs does the Gabelli School of Business offer?",
        "expected_answer": "Gabelli offers undergraduate business degrees and several MBA programs, including full-time cohort, professional/part-time, and executive MBAs.",
    },
    {
        "question": "How do I apply for financial aid at Fordham?",
        "expected_answer": "Complete the FAFSA using Fordham's school code and follow the steps in the student portal and financial aid pages.",
    },
    {
        "question": "What is the tuition for undergraduate students at Fordham?",
        "expected_answer": "Undergraduate tuition is specified on Fordham's tuition and fees page for the current academic year.",
    },
    {
        "question": "Where are Fordham's main campuses located?",
        "expected_answer": "Fordham's main campuses are Rose Hill in the Bronx and Lincoln Center in Manhattan, with additional locations such as Westchester and London.",
    },
    {
        "question": "What research opportunities are available for students at Fordham?",
        "expected_answer": "Fordham offers undergraduate and graduate research opportunities, including working with faculty, research centers, and funded projects.",
    },
    {
        "question": "How do I apply as an international undergraduate student to Fordham?",
        "expected_answer": "International applicants submit the application, transcripts, English proficiency scores, and other required documents listed on the international admission page.",
    },
    {
        "question": "What housing options are available for first-year students?",
        "expected_answer": "Fordham provides on-campus residence halls for first-year students, primarily at Rose Hill and Lincoln Center.",
    },
    {
        "question": "Does Fordham offer study abroad programs?",
        "expected_answer": "Fordham offers study abroad programs and has a campus in London along with partnerships in other locations.",
    },
    {
        "question": "What scholarships are available to incoming undergraduates?",
        "expected_answer": "Fordham offers merit-based and need-based scholarships described on its undergraduate financial aid and scholarships pages.",
    },
    {
        "question": "What is Fordham's mission as a Jesuit university?",
        "expected_answer": "Fordham's mission emphasizes Jesuit values, educating students for justice, leadership, and service in New York City and beyond.",
    },
]


# --- 2. Simple retrieval-quality metric (context relevance) ---

_STOPWORDS = {
    "what",
    "how",
    "is",
    "are",
    "the",
    "a",
    "an",
    "of",
    "for",
    "at",
    "to",
    "in",
    "on",
    "about",
    "does",
    "do",
    "me",
    "tell",
    "students",
    "student",
    "fordham",
    "university",
}


def _extract_keywords(text: str) -> List[str]:
    """Very simple keyword extractor for retrieval evaluation."""
    tokens = re.findall(r"[a-zA-Z]+", text.lower())
    return [t for t in tokens if t not in _STOPWORDS and len(t) > 2]


def evaluate_retrieval_context_relevance(question: str, retrieved_chunks: List[Dict]) -> float:
    """Heuristic: fraction of retrieved chunks containing any question keyword.

    This is a cheap, deterministic proxy for context relevance. A more
    advanced version could use an LLM-as-judge, but this keeps evaluation
    fast and reproducible.
    """
    keywords = _extract_keywords(question)
    if not retrieved_chunks or not keywords:
        return 0.0

    relevant = 0
    for c in retrieved_chunks:
        content = c.get("content", "").lower()
        if any(kw in content for kw in keywords):
            relevant += 1
    return relevant / len(retrieved_chunks)


# --- 3. LLM-as-judge for answer quality (faithfulness + correctness) ---


def judge_answer_with_llm(
    question: str,
    expected_answer: str,
    system_answer: str,
    model: str = "gpt-4o-mini",
    context: str = "",
) -> RAGEvaluation:
    """Use an LLM to score faithfulness and correctness via RAGEvaluation.

    Faithfulness is judged against `context` (the retrieved text the generator
    saw); correctness is judged against `expected_answer`. The model is
    instructed to return a JSON object that we validate with the
    RAGEvaluation Pydantic schema.
    """
    system_prompt = (
        "You are an evaluator for a RAG system. You will receive a user question, "
        "the retrieved context, the system's answer, and a reference expected answer. "
        "Score faithfulness (is every claim in the system answer supported by the "
        "retrieved context?) and correctness (alignment with the reference answer) "
        "on a 1–5 scale. Respond ONLY with a JSON object with exactly these keys: "
        "faithfulness_score (int), faithfulness_reasoning (str), "
        "correctness_score (int), correctness_reasoning (str)."
    )

    user_prompt = {
        "question": question,
        "retrieved_context": context,
        "system_answer": system_answer,
        "expected_answer": expected_answer,
    }

    resp = litellm.completion(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": json.dumps(user_prompt)},
        ],
        response_format={"type": "json_object"},
    )

    raw = resp.choices[0].message.content
    data = json.loads(raw)

    # Map model keys into the schema fields, allowing for "faithfulness"/"correctness" short keys
    faith_score = data.get("faithfulness_score", data.get("faithfulness"))
    corr_score = data.get("correctness_score", data.get("correctness"))

    eval_data = {
        "faithfulness_score": faith_score,
        "correctness_score": corr_score,
        "faithfulness_reasoning": data.get(
            "faithfulness_reasoning",
            f"Model rated faithfulness as {faith_score}.",
        ),
        "correctness_reasoning": data.get(
            "correctness_reasoning",
            f"Model rated correctness as {corr_score}.",
        ),
    }

    return RAGEvaluation(**eval_data)


# --- 4. Run evaluation over the test set and summarize ---

retrieval_scores: List[float] = []
faithfulness_scores: List[int] = []
correctness_scores: List[int] = []

for i, item in enumerate(TEST_SET, start=1):
    q = item["question"]
    gold = item["expected_answer"]

    # Retrieve context
    retrieved = retrieve_chunks(q, k=5, model_type=EMBED_MODEL_TYPE)
    ctx_rel = evaluate_retrieval_context_relevance(q, retrieved)
    retrieval_scores.append(ctx_rel)

    # Generate system answer using our RAG pipeline
    system_answer = generate_answer(q, retrieved, model="gpt-4o-mini")

    # Judge with LLM; pass the retrieved text so faithfulness is checked against it
    judge_context = "\n\n---\n\n".join(c["content"] for c in retrieved)
    eval_result = judge_answer_with_llm(
        q, gold, system_answer, model="gpt-4o-mini", context=judge_context
    )
    faithfulness_scores.append(eval_result.faithfulness_score)
    correctness_scores.append(eval_result.correctness_score)

    print(f"===== Example {i} =====")
    print(f"Question: {q}")
    print(f"System answer: {system_answer[:400]}{'...' if len(system_answer) > 400 else ''}")
    print(f"Context relevance (fraction of useful chunks): {ctx_rel:.2f}")
    print(
        f"Faithfulness: {eval_result.faithfulness_score} | "
        f"Correctness: {eval_result.correctness_score}"
    )
    print("-")

print("\n===== Aggregate Evaluation =====")
print(f"Average context relevance: {mean(retrieval_scores):.2f}")
print(f"Average faithfulness score: {mean(faithfulness_scores):.2f} / 5")
print(f"Average correctness score: {mean(correctness_scores):.2f} / 5")


===== Example 1 =====
Question: What programs does the Gabelli School of Business offer?
System answer: The Gabelli School of Business offers three variants of the MBA program: 

1. Full-time cohort M.B.A.
2. Professional M.B.A. (including part-time options)
3. Executive M.B.A.

Additionally, they provide graduate business degrees and undergraduate business degrees, focusing on leveraging business for positive change.
Context relevance (fraction of useful chunks): 1.00
Faithfulness: 5 | Correctness: 5
-
===== Example 2 =====
Question: How do I apply for financial aid at Fordham?
System answer: To apply for financial aid at Fordham University, you need to follow these steps:

1. Complete the Free Application for Federal Student Aid (FAFSA) using Fordham’s school code: 002722.
2. Sign your FAFSA application electronically using your Federal Student Aid (FSA) ID. If you do not have an FSA ID, you will be prompted to create one while logging in.
3. Claim your Fordham ID to gain access to

---

## Step 7: Build a Streamlit App

Your RAG system lives inside a notebook. That's great for development, but nobody is going to use a Jupyter notebook to ask questions about Fordham. Turn it into a web app using [Streamlit](https://docs.streamlit.io/).

Create a `.py` file (e.g., `scripts/fordham_rag_app.py`) that:
1. Lets the user type a question about Fordham
2. Runs your RAG pipeline
3. Displays the answer and the source pages used

**Getting started:**
- Install: `uv pip install streamlit`
- Run: `streamlit run scripts/fordham_rag_app.py`

**Tip**: Use `@st.cache_resource` to avoid reloading embeddings on every interaction.
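
For orientation, a Streamlit script covering the three requirements could be laid out roughly as follows. This is a sketch under assumptions, not a reference solution: `run_app`, `unique_sources`, and the `load_artifacts` / `retrieve` / `generate` callables are hypothetical stand-ins for your own functions, and Streamlit is imported inside `run_app` only so that the helper can be exercised without Streamlit installed.

```python
from typing import Callable


def unique_sources(chunks: list[dict]) -> list[str]:
    """Source URLs of the retrieved chunks, de-duplicated, in rank order."""
    seen: dict[str, None] = {}
    for chunk in chunks:
        url = chunk.get("source_url", "")
        if url:
            seen.setdefault(url, None)  # dicts preserve insertion order
    return list(seen)


def run_app(
    load_artifacts: Callable[[], object],
    retrieve: Callable[[str, object], list[dict]],
    generate: Callable[[str, list[dict]], str],
) -> None:
    """Render the page: question box, answer, and the source pages used."""
    import streamlit as st  # lazy import; see the note above

    # cache_resource keeps embeddings/metadata in memory across reruns
    cached_load = st.cache_resource(load_artifacts)

    st.title("Ask about Fordham")
    question = st.text_input("Your question")
    if question:
        artifacts = cached_load()
        with st.spinner("Searching..."):
            chunks = retrieve(question, artifacts)
            answer = generate(question, chunks)
        st.subheader("Answer")
        st.write(answer)
        st.subheader("Sources")
        for url in unique_sources(chunks):
            st.markdown(f"- {url}")
```

In `scripts/fordham_rag_app.py` you would call `run_app(...)` at module level with your own loader, retriever, and generator, then start it with `streamlit run`.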

**Include a screenshot of your working app below.**

![Screenshot of the Fordham RAG Streamlit app](RAG-streamlit-app.png)

---

## Step 8: How to Run Your System

Fill in the details below so that I can run and test your RAG system.

| Item | Your Answer |
|------|-------------|
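| **Embedding model used** | OpenAI `text-embedding-3-small` (1536 dimensions) |
| **LLM used for generation** | `gpt-4o-mini` (called through `litellm`) |
| **LLM used for evaluation (judge)** | `gpt-4o-mini` |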
| **Saved artifacts** | (list the files, e.g., `embeddings.npy`, `chunks.json`) |
| **How to start the Streamlit app** | (e.g., `streamlit run scripts/fordham_rag_app.py`) |
| **Any API keys or env vars needed** | (e.g., `OPENAI_API_KEY` in `.env`) |
| **Anything else I should know** | |

---

## Bonus: Experiment and Improve

Now that you have a working RAG system *and* a way to measure its quality, try to improve it. Use your evaluation framework to measure the impact of changes.

Ideas: different chunk sizes, different embedding models, hybrid search, better prompts, reranking, query rewriting. Document what you tried and show before/after evaluation scores.

In [None]:
# YOUR CODE HERE

---

## Git Submission

- [ ] Create a new branch called `homework-5`
- [ ] Commit your work (notebook + Streamlit app + saved artifacts in `temp/`)
- [ ] Push to GitHub
- [ ] Create a Pull Request and merge to main
- [ ] Submit the `.ipynb` file on Blackboard