# AI Course Copilot – Final Prototype (Phase 05)
# Team: Oscar Cortez, Judith Barrios, Ruben Valenzuela Alvarado
# Course: ITAI-2277 – Artificial Intelligence
# Description:
# This notebook contains the working RAG prototype, final integration,
# documentation links, and instructions for reproduction as required for Phase 05.


# 📦 Complete Project Package – Index

This project package includes:

### 1. Source Code
- RAG pipeline implementation  
- Retrieval system (FAISS)  
- Embedding generation  
- Re-ranking  
- Gradio UI prototype  

### 2. Documentation
- Technical Report (Phase 03)  
- System Integration Summary (Phase 04)  
- User Guide (Phase 05)  
- Architecture diagrams  

### 3. Data Artifacts
- Embeddings  
- Vector index  
- Chunked course documents  

### 4. Prototype
- Fully working Gradio interface  
- Query → Retrieval → Answer pipeline  

### 5. Reproducibility Files
- requirements.txt  
- config.yaml  
- Instructions for running the system  
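The Source Code list names re-ranking as a component, but the retrieval cell later in this notebook returns hits in FAISS distance order with no second stage. The sketch below shows one simple way a second-stage re-ranker could reorder a short candidate list. It assumes each candidate already carries a dense `similarity` in [0, 1]. The keyword-overlap heuristic, the `alpha` weight, the function names and the toy candidates are all hypothetical and are not the team's implementation. Many production systems use a cross-encoder model for this step instead.

```python
import re


def tokenize(text):
    """Lowercase word tokens (ASCII letters and digits only)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank(query, candidates, alpha=0.7):
    """Re-order candidates by blending dense similarity with keyword overlap.

    candidates: dicts with "text" and "similarity" (higher is better, in [0, 1]).
    alpha: hypothetical weight on the dense similarity; (1 - alpha) goes to overlap.
    """
    q_tokens = tokenize(query)
    rescored = []
    for cand in candidates:
        d_tokens = tokenize(cand["text"])
        # Fraction of query tokens that also appear in the candidate text
        overlap = len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0
        combined = alpha * cand["similarity"] + (1 - alpha) * overlap
        rescored.append({**cand, "rerank_score": combined})
    # Highest combined score first
    return sorted(rescored, key=lambda c: c["rerank_score"], reverse=True)


# Made-up candidates: doc2 has the higher dense score, doc3 matches the query words
candidates = [
    {"id": "doc2", "text": "Grading: Final Project (25%), Midterm (20%)", "similarity": 0.52},
    {"id": "doc3", "text": "Make-up exams are allowed only for documented emergencies", "similarity": 0.50},
]
for c in rerank("Are make-up exams allowed?", candidates):
    print(c["id"], round(c["rerank_score"], 3))
```

Here the keyword signal lifts `doc3` above `doc2` even though its dense similarity is slightly lower. In practice the weight would need tuning against real course questions.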


# ▶️ How to Run the Prototype

1. Install dependencies  
2. Encode the course documents and build the FAISS index  
3. Define the retriever and answer-formatting functions  
4. Launch the Gradio interface  

All steps are included in this notebook for testing and evaluation.
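Before running step 1, it can help to record which versions of the three packages installed in the next cell are present, so a later run can be reproduced. The following is a minimal sketch using the standard-library `importlib.metadata`. The helper name `installed_versions` is hypothetical, and pinning the printed versions in `requirements.txt` is a suggestion rather than the team's documented procedure.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as passed to pip in the install cell
REQUIRED = ["sentence-transformers", "faiss-cpu", "gradio"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    # "name==version" is the pin format that requirements.txt accepts
    print(f"{name}=={ver}" if ver else f"# {name} is not installed")
```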


In [1]:
!pip install sentence-transformers faiss-cpu gradio


Collecting faiss-cpu
  Downloading faiss_cpu-1.13.0-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.7 kB)
Downloading faiss_cpu-1.13.0-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (23.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.6/23.6 MB 26.4 MB/s eta 0:00:00
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.13.0


In [2]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import gradio as gr
import textwrap

# === Real mini-corpus for ITAI-2277 (Capstone) ===
documents = [
    {
        "id": "doc1",
        "course": "ITAI 2277",
        "week": "Course Overview",
        "doc_type": "Syllabus",
        "title": "ITAI 2277 – Course Vision and Welcome",
        "text": """
Houston Community College’s vision is to deliver relevant, high-quality education that ensures
success for all students, the community, and the economy. ITAI 2277 – Artificial Intel Resource
(Capstone) is taught fully online (WW – Online Anytime).

The course is led by Professor Anna Devarakonda (Annapurna Rachapudi). The class focuses on
an AI Applications Capstone Project where students design and deploy real-world AI solutions in
domains such as healthcare, finance, sustainability, and more. Students work with tools like
TensorFlow, PyTorch, and cloud platforms, and practice AI application development, model deployment,
and professional collaboration.

The course does not use a traditional textbook. Instead, all instructional materials are provided
through Canvas, using curated, up-to-date articles, papers, videos, and resources. Students are
encouraged to check the course site and their HCC email at least once per day and to start from
the Modules section in Canvas.
"""
    },
    {
        "id": "doc2",
        "course": "ITAI 2277",
        "week": "Course Requirements",
        "doc_type": "Syllabus",
        "title": "ITAI 2277 – Assignments, Grading and Workload",
        "text": """
ITAI 2277 uses multiple graded components:

– Module Group Assignments (15%): change according to the lecture topic. Formats may include
  written Word/PDF documents, PowerPoint presentations, or multimedia submissions.

– Case Study Analysis (20%): tied to lecture topics and focused on applying concepts.

– Exams/Quizzes (20%): 4 separate assessments covering key concepts, using multiple-choice,
  true/false, and short-answer questions.

– Midterm (20%): group case study analysis that tests analytical skills related to ethical,
  philosophical, and practical applications of AI in a specific industry.

– Final Project (25%): students, working in groups, create a proposal for integrating AI into a
  new or existing process within a chosen sector. This is the main capstone-style deliverable.

There is also an optional Extra Credit Portfolio (5%) for uploading course work to GitHub to
continue building an AI portfolio.

The HCC grading system uses A (90–100), B (80–89), C (70–79), D (60–69), F or FX (failing),
W (withdrawn), and I (incomplete), according to standard HCC policies.
"""
    },
    {
        "id": "doc3",
        "course": "ITAI 2277",
        "week": "Policies",
        "doc_type": "Syllabus",
        "title": "ITAI 2277 – Incompletes, Attendance, Make-Up Work, Academic Integrity",
        "text": """
Incomplete grades (“I”) are only considered if the student has completed at least 85% of the work
in the course, and the instructor still has the discretion to decline the request.

Make-up exams and assignments are allowed only for documented emergencies, such as hospitalization
or auto accidents. They do not apply to reasons like forgetting the due date or being busy with work.
Documentation must be provided as soon as possible. All missed grades are recorded as zeros if
no approved make-up is arranged.

Online students must show satisfactory progress in the course. Students may be withdrawn if they
miss turning in assignments that total more than 12.5% of the course work before the final exam.
Students are responsible for contacting the instructor if they are having a problem.

Academic Integrity: Scholastic dishonesty results in referral to the Dean of Student Services.
Group work is allowed, but groups must not share the same files and then make minor changes to
submit as their own. Using copied work or unauthorized collaboration may result in a 0 on the
assignment and a disciplinary referral. Students must follow HCC academic integrity procedures.
"""
    },
    {
        "id": "doc4",
        "course": "ITAI 2277",
        "week": "Final Project",
        "doc_type": "Assignment",
        "title": "ITAI 2277 – Capstone Final Project and Presentation",
        "text": """
The Capstone Final Assignment for ITAI 2277 is titled “Capstone Project 2025.” Students must design,
develop, and submit a GitHub repository as their final class project, along with a PowerPoint or PDF.

This capstone course allows students to synthesize knowledge from the entire Associate degree in
Applied Technologies – AI and Robotics. Working in teams, students build a substantial project that
integrates multiple technologies from computer vision, natural language processing, robotics,
machine learning, deep learning, and related areas.

By the end of the course, students should be able to:
1. Plan and execute a comprehensive project that integrates multiple AI/ML technologies.
2. Apply knowledge from core courses to implement solutions to real-world problems.
3. Collaborate using industry-standard tools and professional practices.
4. Document and communicate technical work through professional reports and presentations.
5. Evaluate and refine solutions using technical metrics, stakeholder requirements, and ethical considerations.
6. Deliver a complete, portfolio-ready project that demonstrates readiness for professional AI/ML roles.

Final presentation requirements include:
– A public GitHub repo with a clear README and installation instructions.
– A formal project presentation (around 20 minutes per team).
– A live demonstration and Q&A session showcasing the system and its impact.
"""
    },
]
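The package index lists chunked course documents as a data artifact, while this mini-corpus embeds each document as one unit. For longer syllabi, documents are usually split into overlapping windows before embedding, so that each vector covers one focused passage. The sketch below implements a minimal word-window chunker and assumes whitespace tokenization. `chunk_document`, `max_words` and `overlap` are hypothetical names and default values, not part of the notebook.

```python
def chunk_document(doc, max_words=120, overlap=20):
    """Split doc["text"] into overlapping word windows, copying metadata to each chunk."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = doc["text"].split()
    step = max_words - overlap  # each window starts this many words after the last
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append({
            **doc,  # keep course, week, doc_type, title so citations still work
            "id": f"{doc['id']}-chunk{len(chunks)}",
            "text": " ".join(window),
        })
        if start + max_words >= len(words):
            break  # this window already reaches the end of the document
    return chunks


# Synthetic 250-word document, used only to show the window boundaries
sample = {"id": "demo", "course": "ITAI 2277", "week": "Demo", "doc_type": "Note",
          "title": "Synthetic text", "text": " ".join(f"w{i}" for i in range(250))}
parts = chunk_document(sample, max_words=120, overlap=20)
print(len(parts), [p["id"] for p in parts])
```

To use this in the notebook, `documents` itself would need to be replaced by the flattened chunk list before `corpus_texts` is built. This is because `retrieve` maps FAISS row numbers back through `documents`.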


In [3]:
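# Load the sentence-embedding model (384-dimensional MiniLM encoder)
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Encode each document's text into a dense float32 embedding (one row per document)
corpus_texts = [doc["text"] for doc in documents]
embeddings = model.encode(corpus_texts, convert_to_numpy=True, show_progress_bar=True)

# Build an exact (brute-force) FAISS index that ranks by squared L2 distance
dim = embeddings.shape[1]
index = faiss.IndexFlatL2(dim)
index.add(embeddings)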

print("Index size:", index.ntotal)


Index size: 4
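`faiss.IndexFlatL2` performs exact, brute-force search. For each query, it computes the squared Euclidean (L2) distance to every stored vector and returns the `k` smallest, so a lower value means a closer match. The pure-Python sketch below reproduces that ranking rule on toy 3-dimensional vectors to make it concrete. It is only an illustration: FAISS performs the same computation in optimized native code, and the vectors here are made up.

```python
def squared_l2(a, b):
    """Squared Euclidean distance, the metric IndexFlatL2 reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k):
    """Exact k-nearest-neighbour search: score every vector, keep the k smallest."""
    scored = [(squared_l2(v, query), i) for i, v in enumerate(vectors)]
    scored.sort()  # ascending distance; equal distances fall back to the lower index
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Toy unit-length "embeddings" and a query
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
distances, indices = flat_l2_search(vectors, [0.8, 0.6, 0.0], k=2)
print(indices, distances)
```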


In [4]:
def retrieve(query, k=3):
    """Return the top-k documents for a query.

    "score" is the squared L2 distance reported by IndexFlatL2: lower means more similar.
    """
    k = min(k, index.ntotal)  # never request more hits than there are stored vectors
    query_emb = model.encode([query], convert_to_numpy=True)
    distances, indices = index.search(query_emb, k)
    results = []
    for rank, (idx, dist) in enumerate(zip(indices[0], distances[0]), start=1):
        if idx < 0:  # FAISS pads missing hits with -1; documents[-1] would be a false match
            continue
        doc = documents[idx]
        results.append({
            "rank": rank,
            "score": float(dist),
            "id": doc["id"],
            "title": doc["title"],
            "course": doc["course"],
            "week": doc["week"],
            "doc_type": doc["doc_type"],
            "snippet": textwrap.shorten(doc["text"].replace("\n", " "), width=280),
        })
    return results


def format_answer(query, k=3):
    """Generate a simple answer using retrieved snippets + citations."""
    results = retrieve(query, k=k)
    if not results:
        return "I couldn't find any relevant course materials for this question."

    # Answer section (extractive): reuse the top-ranked snippet verbatim
    top = results[0]
    answer_intro = (
        "Based on the course materials, here is a relevant explanation:\n\n"
        f"{top['snippet']}\n\n"
    )

    # Citations: one line per retrieved document, in rank order
    citations_lines = []
    for r in results:
        citations_lines.append(
            f"[{r['rank']}] {r['title']} "
            f"({r['course']}, {r['week']}, {r['doc_type']})"
        )
    citations_text = "Sources:\n" + "\n".join(citations_lines)

    return answer_intro + citations_text


# Quick test in the console
print(format_answer("What is the late work policy in ITAI 1370?"))


Based on the course materials, here is a relevant explanation:

ITAI 2277 uses multiple graded components: – Module Group Assignments (15%): change according to the lecture topic. Formats may include written Word/PDF documents, PowerPoint presentations, or multimedia submissions. – Case Study Analysis (20%): tied to lecture topics and [...]

Sources:
[1] ITAI 2277 – Assignments, Grading and Workload (ITAI 2277, Course Requirements, Syllabus)
[2] ITAI 2277 – Incompletes, Attendance, Make-Up Work, Academic Integrity (ITAI 2277, Policies, Syllabus)
[3] ITAI 2277 – Capstone Final Project and Presentation (ITAI 2277, Final Project, Assignment)
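The `score` field returned by `retrieve` is the squared L2 distance from FAISS, so lower means more relevant. Because a flat index always returns `k` hits, the `if not results:` branch in `format_answer` cannot fire. This is why the sample question about ITAI 1370, a course not in the corpus, still returned ITAI 2277 grading material.

The sketch below assumes unit-length embeddings. The published `all-MiniLM-L6-v2` pipeline includes a normalization step, and passing `normalize_embeddings=True` to `model.encode` makes this explicit. Under that assumption, the identity ||a − b||² = 2 − 2·cos(a, b) turns a distance into a cosine similarity, and a similarity floor can then drop weak hits. `to_cosine`, `filter_hits`, the 0.3 threshold and the toy scores are hypothetical and would need tuning on real queries.

```python
def to_cosine(squared_l2_distance):
    """For unit-length vectors, ||a - b||^2 = 2 - 2*cos(a, b), so cos = 1 - d/2."""
    return 1.0 - squared_l2_distance / 2.0


def filter_hits(results, min_cosine=0.3):
    """Keep hits whose cosine similarity clears a (hypothetical) relevance floor."""
    kept = []
    for r in results:
        cos = to_cosine(r["score"])
        if cos >= min_cosine:
            kept.append({**r, "cosine": cos})
    return kept


# Toy results shaped like retrieve() output; the scores are made up
hits = [{"rank": 1, "score": 0.9, "id": "doc2"}, {"rank": 2, "score": 1.6, "id": "doc3"}]
print(filter_hits(hits))
```

Inside `format_answer`, writing `results = filter_hits(retrieve(query, k=k))` would let the existing fallback message appear for off-topic questions.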


In [5]:
def rag_chat(query):
    """Gradio callback: return the cited, extractive answer for one question."""
    return format_answer(query, k=3)

demo = gr.Interface(
    fn=rag_chat,
    inputs=gr.Textbox(lines=2, label="Ask a course-related question"),
    outputs=gr.Textbox(lines=12, label="AI Course Copilot Answer"),
    title="AI Course Copilot – RAG Prototype",
    description="Ask about course policies, assignments, or AI topics. The assistant answers using approved course materials and shows citations."
)

demo.launch()


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://6739650c6da26ebec3.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


