# COMP8420 Student Agent – Dataset Preparation

This notebook prepares the dataset for the Student Agent project. The dataset includes:

1. Lecture content extracted from COMP8420 PDF slides  
2. Practical files converted from Jupyter Notebooks (.ipynb)  
3. All content is stored in plain `.txt` files to be used in a Retrieval-Augmented Generation (RAG) pipeline


In [7]:
from langchain_community.document_loaders import PyMuPDFLoader
from pathlib import Path

# Set your input/output paths
pdf_folder = Path("/Users/shaimonrahman/Desktop/COMP8420/Lectures")  # Change this
output_folder = Path("/Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/lectures")
output_folder.mkdir(parents=True, exist_ok=True)

# Extract each lecture PDF into a .txt file
for pdf_file in pdf_folder.glob("*.pdf"):
    loader = PyMuPDFLoader(str(pdf_file))
    documents = loader.load()
    full_text = "\n".join([doc.page_content for doc in documents])
    txt_filename = pdf_file.stem + ".txt"
    with open(output_folder / txt_filename, "w", encoding="utf-8") as f:
        f.write(full_text)
    print(f"Extracted: {pdf_file.name} → {txt_filename}")


Extracted: COMP8420-W6 - Dev LLMs - Fine-tuning.pdf → COMP8420-W6 - Dev LLMs - Fine-tuning.txt
Extracted: COMP8420-W9-v2.pdf → COMP8420-W9-v2.txt
Extracted: COMP8420-W4 - Use LLMs - Text processing.pdf → COMP8420-W4 - Use LLMs - Text processing.txt
Extracted: COMP8420-W1.pdf → COMP8420-W1.txt
Extracted: HEAL.pdf → HEAL.txt
Extracted: NLP_Guest_Lecture_HMC_V2.pdf → NLP_Guest_Lecture_HMC_V2.txt
Extracted: COMP8420-W13.pdf → COMP8420-W13.txt
Extracted: COMP8420-W10.pdf → COMP8420-W10.txt
Extracted: COMP8420-W7 - Und LLMs - Risk and future.pdf → COMP8420-W7 - Und LLMs - Risk and future.txt
Extracted: COMP8420-W8 - Dev LLMs - Humanoid AI - 2.pdf → COMP8420-W8 - Dev LLMs - Humanoid AI - 2.txt
Extracted: COMP8420-W5 - Dev LLMs - Multimodal LLMs.pdf → COMP8420-W5 - Dev LLMs - Multimodal LLMs.txt
Extracted: COMP8420-W11-v2.pdf → COMP8420-W11-v2.txt
Extracted: COMP8420-W2-v1.pdf → COMP8420-W2-v1.txt
Extracted: COMP8420-W3 - Und LLMs - Foundation models.pdf → COMP8420-W3 - Und LLMs - Foundation m

In [8]:
import nbformat

# Define paths for notebooks and output .txt
prac_folder = Path("/Users/shaimonrahman/Desktop/COMP8420/Prac")  # Folder with your .ipynb files
output_folder = Path("/Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/practicals")
output_folder.mkdir(parents=True, exist_ok=True)

# Function to extract markdown + code
def extract_notebook_text(nb_path):
    nb = nbformat.read(open(nb_path, "r", encoding="utf-8"), as_version=4)
    content = []
    for cell in nb.cells:
        if cell.cell_type in ['markdown', 'code']:
            content.append(cell.source)
    return "\n\n".join(content)

# Convert all .ipynb files to .txt
for nb_file in prac_folder.glob("*.ipynb"):
    text = extract_notebook_text(nb_file)
    txt_name = nb_file.stem + ".txt"
    with open(output_folder / txt_name, "w", encoding="utf-8") as f:
        f.write(text)
    print(f"Converted: {nb_file.name} → {txt_name}")



Converted: COMP8420-week3-solution.ipynb → COMP8420-week3-solution.txt
Converted: COMP8420-week4-solution.ipynb → COMP8420-week4-solution.txt
Converted: COMP8420-week5-solution.ipynb → COMP8420-week5-solution.txt
Converted: COMP8420-week2-solution.ipynb → COMP8420-week2-solution.txt
Converted: COMP8420-week4-practice.ipynb → COMP8420-week4-practice.txt
Converted: COMP8420-week7-solution.ipynb → COMP8420-week7-solution.txt
Converted: COMP8420-week3-practice.ipynb → COMP8420-week3-practice.txt
Converted: COMP8420-week6-solution.ipynb → COMP8420-week6-solution.txt
Converted: COMP8420_week1_solution.ipynb → COMP8420_week1_solution.txt
Converted: COMP8420_week12_solution.ipynb → COMP8420_week12_solution.txt
Converted: COMP8420_week9_solution.ipynb → COMP8420_week9_solution.txt
Converted: COMP8420_week8_solution.ipynb → COMP8420_week8_solution.txt
Converted: COMP8420_week11_solution.ipynb → COMP8420_week11_solution.txt


In [9]:
import json
from pathlib import Path

# Output directory
output_dir = Path("/Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420")
output_dir.mkdir(parents=True, exist_ok=True)

# Define Q&A dataset
qa_data = [
    {
        "question": "What is a foundation model in NLP?",
        "answer": "A foundation model is a large pre-trained model trained on massive datasets, serving as the base for fine-tuning on specific NLP tasks."
    },
    {
        "question": "What are examples of foundation models?",
        "answer": "Examples include GPT-3.5, Claude, PaLM, BERT, LLaMA, and Mistral."
    },
    {
        "question": "What does fine-tuning mean in the context of LLMs?",
        "answer": "Fine-tuning is the process of continuing training on a pre-trained model using a smaller, task-specific dataset."
    },
    {
        "question": "What is the difference between prompt tuning and fine-tuning?",
        "answer": "Prompt tuning adjusts input formatting without altering the model, while fine-tuning retrains the model on new data."
    },
    {
        "question": "What was covered in Week 6 of COMP8420?",
        "answer": "Week 6 focused on fine-tuning large language models, covering parameter-efficient fine-tuning and adapter-based methods."
    },
    {
        "question": "When is the COMP8420 project presentation due?",
        "answer": "The presentation is scheduled for Week 13, Friday, June 6th, 2025."
    },
    {
        "question": "What is the final deadline for Assignment 3?",
        "answer": "Assignment 3 (code + report) is due during the exam period on June 17th, 2025."
    },
    {
        "question": "What type of model should we use for our project?",
        "answer": "You can use OpenAI’s GPT-3.5 or open-source models like LLaMA2 or Mistral, depending on your goals and data."
    },
    {
        "question": "How do we evaluate our NLP project?",
        "answer": "Use metrics like BLEU, ROUGE, accuracy, and ablation studies to compare performance against baselines or alternatives."
    },
    {
        "question": "What are embedding models used for?",
        "answer": "They convert text into vector form for similarity search and retrieval, commonly used in RAG pipelines."
    },
    {
        "question": "How do we retrieve answers in a RAG system?",
        "answer": "Text chunks are embedded into vectors and searched via similarity to retrieve relevant content for generation."
    },
    {
        "question": "How are lectures delivered in COMP8420?",
        "answer": "Lectures are delivered via PDFs and practical notebooks, combining theory and hands-on exercises."
    },
    {
        "question": "What technologies are used in this course?",
        "answer": "Hugging Face, PyTorch, LangChain, OpenAI APIs, and vector databases like FAISS."
    },
    {
        "question": "Can we use ChatGPT in our project?",
        "answer": "Yes, but your project must demonstrate additional engineering beyond just using the API."
    },
    {
        "question": "What is parameter-efficient fine-tuning?",
        "answer": "It refers to fine-tuning techniques like LoRA and Adapters that update only a small subset of the model."
    },
    {
        "question": "What should we cover in the presentation?",
        "answer": "Your project title, real-world problem, methodology, expected outcome, and team contributions."
    },
    {
        "question": "What’s the length of the presentation?",
        "answer": "You must present for 3–4 minutes during the Week 13 Practice Workshop."
    },
    {
        "question": "How can we improve our mark?",
        "answer": "Make your project novel, apply evaluation methods, and clearly explain your work in the report and presentation."
    },
    {
        "question": "What are some real-world NLP tasks?",
        "answer": "Text classification, summarization, question answering, entity recognition, translation, and dialogue generation."
    },
    {
        "question": "Can we build an agent using LangChain?",
        "answer": "Yes, LangChain is commonly used to implement RAG-based agents using LLMs and vector stores."
    }
]

# Save as qna.json
with open(output_dir / "qna.json", "w", encoding="utf-8") as f:
    json.dump(qa_data, f, indent=2)

print("qna.json saved successfully.")


qna.json saved successfully.


In [10]:
import json
from pathlib import Path

# Output directory (same as before)
output_dir = Path("/Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420")

# Define the deadlines
deadlines = [
    {
        "task": "Team Registration",
        "due_date": "2025-05-30",
        "type": "Workshop"
    },
    {
        "task": "Project Presentation (Week 13)",
        "due_date": "2025-06-06",
        "type": "Practice Workshop"
    },
    {
        "task": "Final Report + Code Submission",
        "due_date": "2025-06-17",
        "type": "Exam Period"
    }
]

# Save to JSON
with open(output_dir / "deadlines.json", "w", encoding="utf-8") as f:
    json.dump(deadlines, f, indent=2)

print("deadlines.json saved successfully.")


deadlines.json saved successfully.


In [11]:
course_info_text = """
Course Title: COMP8420 – Advanced Natural Language Processing (S1 2025)

Description:
This course teaches students how to apply modern Natural Language Processing (NLP) techniques using large language models (LLMs). It focuses on real-world applications and responsible development practices. Topics include foundation models, prompt engineering, fine-tuning, RAG pipelines, privacy, security, and AI agent design.

Teaching Team:
- Dr. Qiongkai Xu (Lecturer)
- Prof. Longbing Cao (Supervisor)
- Mr. Weijun Li (TA)

Key Technologies:
- Hugging Face, PyTorch, OpenAI APIs, LangChain, FAISS

Assessments:
- Assignment 1: Text Classification
- Assignment 2: Text Generation
- Assignment 3: Team Project (Presentation + Code + Report)

Objective:
Prepare students to build and deploy intelligent NLP systems with ethical awareness and practical skills.
"""

# Save to .txt
with open(output_dir / "course_info.txt", "w", encoding="utf-8") as f:
    f.write(course_info_text.strip())

print("course_info.txt saved successfully.")


course_info.txt saved successfully.


In [12]:
import json
from pathlib import Path

# Output directory
output_dir = Path("/Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420")

# Announcements dataset
announcements = [
    {
        "title": "Assignment 3 – Presentation Reminder",
        "content": "Don't forget to prepare a 3–4 minute talk about your project for the Week 13 workshop (Friday June 6th)."
    },
    {
        "title": "Assignment 3 Submission",
        "content": "Final code and report must be submitted by June 17th during the exam period. Late submissions will not be accepted without special consideration."
    },
    {
        "title": "Week 7 Workshop Topic",
        "content": "We'll explore risks and safety issues in LLMs. Please review the Week 7 slides before the workshop."
    },
    {
        "title": "Week 10 Team Registration",
        "content": "Make sure to form your group and register your project title by the Week 10 workshop."
    }
]

# Save as JSON
with open(output_dir / "announcements.json", "w", encoding="utf-8") as f:
    json.dump(announcements, f, indent=2)

print("announcements.json saved successfully.")


announcements.json saved successfully.


In [13]:
# Sample discussion Q&A
discussions = [
    {
        "question": "Do we have to use LangChain for Assignment 3?",
        "answer": "No, LangChain is optional. You can use any framework that supports RAG or LLM integration."
    },
    {
        "question": "Can we use ChatGPT API?",
        "answer": "Yes, but remember that you must demonstrate your own engineering effort in addition to using the API."
    },
    {
        "question": "Is it okay to work solo on Assignment 3?",
        "answer": "Projects should be completed in teams of two unless you’ve received special permission."
    },
    {
        "question": "What’s the expected length of the presentation?",
        "answer": "Each team should present for 3–4 minutes during the Week 13 practice workshop."
    },
    {
        "question": "Can we include public datasets in our project?",
        "answer": "Yes, you may include public datasets as long as they’re relevant to your topic and cited properly."
    }
]

# Save as JSON
with open(output_dir / "discussions.json", "w", encoding="utf-8") as f:
    json.dump(discussions, f, indent=2)

print("discussions.json saved successfully.")


discussions.json saved successfully.


In [11]:
import os
from pathlib import Path
import nbformat
from langchain_community.document_loaders import PyMuPDFLoader, JSONLoader, TextLoader
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Set your base dataset path
dataset_path = Path("/Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420")
persist_path = dataset_path / "chroma_store"
persist_path.mkdir(parents=True, exist_ok=True)

text_splitter = RecursiveCharacterTextSplitter(chunk_size=600, chunk_overlap=100)
all_chunks = []

# 1. Load Lecture PDFs
lecture_path = dataset_path / "lectures"
for pdf_file in lecture_path.glob("*.pdf"):
    loader = PyMuPDFLoader(str(pdf_file))
    docs = loader.load()
    chunks = text_splitter.split_documents(docs)
    all_chunks.extend(chunks)

# 2. Load Practical Notebooks (.ipynb)
pracs_path = dataset_path / "practicals"
for ipynb_file in pracs_path.glob("*.ipynb"):
    nb = nbformat.read(open(ipynb_file, "r", encoding="utf-8"), as_version=4)
    text = ""
    for cell in nb.cells:
        if cell.cell_type in ["markdown", "code"]:
            text += cell.source + "\n\n"
    doc = Document(page_content=text, metadata={"source": ipynb_file.name})
    chunks = text_splitter.split_documents([doc])
    all_chunks.extend(chunks)

# 3. Load JSON Files (qna, deadlines, etc.)
json_files = ["qna.json", "deadlines.json", "announcements.json", "discussions.json"]
for json_file in json_files:
    loader = JSONLoader(
        file_path=str(dataset_path / json_file),
        jq_schema=".[]",
        text_content=False
    )
    docs = loader.load()
    chunks = text_splitter.split_documents(docs)
    all_chunks.extend(chunks)

# 4. Load course_info.txt
info_loader = TextLoader(str(dataset_path / "course_info.txt"))
docs = info_loader.load()
chunks = text_splitter.split_documents(docs)
all_chunks.extend(chunks)

# 5. Embed and save to Chroma
embedding = OpenAIEmbeddings(disallowed_special=())
vectorstore = Chroma.from_documents(
    documents=all_chunks,
    embedding=embedding,
    persist_directory=str(persist_path)
)
vectorstore.persist()

print(f"Done. Total chunks embedded: {len(all_chunks)}")


Done. Total chunks embedded: 34


  vectorstore.persist()


### Student QA Agent with Evaluation Metrics

This section sets up the Student QA Agent using a Retrieval-Augmented Generation (RAG) pipeline with LangChain and GPT-3.5. The agent answers student queries by retrieving relevant course content from the vector store and generating responses.

To assess the quality of the agent's answers, the following evaluation metrics are implemented:

- **Precision@k**: Checks whether the correct source document (e.g., `qna.json`) is among the top retrieved documents.
- **BLEU Score**: Measures textual overlap between the generated answer and a reference answer.
- **Cosine Similarity**: Computes semantic similarity between the generated and reference answers using sentence embeddings.
- **Response Time**: Tracks latency to measure system responsiveness.

This helps quantify the performance of the agent and ensures it produces accurate, relevant, and helpful responses to students.


In [2]:
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from nltk.translate.bleu_score import sentence_bleu
from sentence_transformers import SentenceTransformer, util
import time
import os
os.environ["OPENAI_API_KEY"] = "Your_Api_Key"
# === Setup ===
embedding = OpenAIEmbeddings(disallowed_special=())
persist_path = "/Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/chroma_store"
vectorstore = Chroma(persist_directory=persist_path, embedding_function=embedding)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 4})
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

# === Evaluation Utilities ===

def precision_at_k(source_docs, expected_source):
    for doc in source_docs:
        if expected_source in doc.metadata.get("source", ""):
            return 1
    return 0

def bleu_score_metric(reference_answer, generated_answer):
    reference = [reference_answer.split()]
    candidate = generated_answer.split()
    return sentence_bleu(reference, candidate)

def cosine_similarity_metric(reference_answer, generated_answer):
    model = SentenceTransformer('all-MiniLM-L6-v2')
    emb_ref = model.encode(reference_answer, convert_to_tensor=True)
    emb_gen = model.encode(generated_answer, convert_to_tensor=True)
    return util.cos_sim(emb_gen, emb_ref).item()

# === Query & Evaluation ===

query = "When is Assignment 3 due and how do we submit it?"
reference_answer = "Assignment 3 is due on June 17th, 2025. Submit your code and report during the exam period via iLearn."
expected_source = "qna.json"

start_time = time.time()
response = qa_chain.invoke({"query": query})
end_time = time.time()

# Output answer
print("Answer:")
print(response["result"])

# Output sources
print("\nSources:")
for doc in response["source_documents"]:
    print("-", doc.metadata.get("source", "Unknown"))

# Evaluation
prec = precision_at_k(response["source_documents"], expected_source)
bleu = bleu_score_metric(reference_answer, response["result"])
cos_sim = cosine_similarity_metric(reference_answer, response["result"])
latency = end_time - start_time

print("\nEvaluation Metrics:")
print(f"Precision@k       : {prec}")
print(f"BLEU Score        : {bleu:.2f}")
print(f"Cosine Similarity : {cos_sim:.2f}")
print(f"Response Time     : {latency:.2f} sec")


  vectorstore = Chroma(persist_directory=persist_path, embedding_function=embedding)


Answer:
Assignment 3 (code + report) is due during the exam period on June 17th, 2025. Final code and report must be submitted by June 17th during the exam period. Late submissions will not be accepted without special consideration.

Sources:
- /Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/qna.json
- /Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/announcements.json
- /Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/discussions.json
- /Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/qna.json


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.5k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]


Evaluation Metrics:
Precision@k       : 1
BLEU Score        : 0.17
Cosine Similarity : 0.83
Response Time     : 1.69 sec


### Teacher Agent – Quiz Generator with Evaluation

This section implements an intelligent quiz generator that creates multiple-choice questions (MCQs) from lecture content using OpenAI's GPT-3.5 model. The generator extracts key concepts from lecture `.txt` files and formulates 3–5 quiz questions per file, making it easier for teachers to assess student understanding.

In addition to generating quizzes, the system also performs **automatic structural evaluation** on the output to ensure quality and consistency. This includes checks for completeness, proper answer formatting, and option uniqueness.

**Inputs:**
- `.txt` file containing the lecture content
- Number of questions to generate (default is 5)

**Outputs:**
- A list of MCQs with four options (A–D) and a correct answer
- Optional raw and structured quiz files (`.txt` and `.json`)
- Evaluation metrics:
  - % of complete questions
  - % with valid answer format (A–D)
  - % with unique answer options

This tool is especially useful for teachers to **automate assessment** and ensure quiz quality before sharing with students.


In [3]:
from langchain.chat_models import ChatOpenAI
from pathlib import Path
import json

def generate_quiz_from_lecture(
    lecture_path,
    n_questions=5,
    model="gpt-3.5-turbo",
    save_txt_path=None,
    save_json_path=None,
    truncate_tokens=4000
):
    """
    Generate MCQs from lecture content using GPT-3.5 via LangChain and evaluate their structure.
    """
    lecture_path = Path(lecture_path)

    # === Read and truncate content ===
    with open(lecture_path, "r", encoding="utf-8") as f:
        content = f.read()
    if len(content) > truncate_tokens:
        content = content[:truncate_tokens] + "\n\n[Content truncated]"

    # === Prompt GPT ===
    prompt = f"""
You are a quiz-generating assistant. Based on the following lecture content, generate {n_questions} multiple-choice questions (MCQs). Each question should have:

- 1 clear question
- 4 options labeled A, B, C, and D
- The correct answer at the end, formatted like: Answer: B

Lecture content:
\"\"\"
{content}
\"\"\"
"""
    chat = ChatOpenAI(model_name=model, temperature=0)
    raw_output = chat.predict(prompt)

    # === Save raw text ===
    if save_txt_path:
        with open(save_txt_path, "w", encoding="utf-8") as f:
            f.write(raw_output)
        print(f"Saved raw quiz to: {save_txt_path}")

    # === Parse to JSON ===
    parsed_questions = []
    current_q = {}
    lines = raw_output.strip().splitlines()

    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("Q") or line[0].isdigit():
            if current_q:
                parsed_questions.append(current_q)
                current_q = {}
            current_q["question"] = line.split(":", 1)[-1].strip()
            current_q["options"] = {}
        elif line[0] in ["A", "B", "C", "D"] and line[1] == ".":
            key = line[0]
            current_q["options"][key] = line[2:].strip()
        elif line.lower().startswith("answer:"):
            current_q["answer"] = line.split(":")[-1].strip().upper()
    if current_q:
        parsed_questions.append(current_q)

    # === Save JSON ===
    if save_json_path:
        with open(save_json_path, "w", encoding="utf-8") as f:
            json.dump(parsed_questions, f, indent=2)
        print(f"Saved parsed quiz to: {save_json_path}")

    # === Evaluation ===
    def evaluate_questions(questions):
        total = len(questions)
        complete = 0
        valid_answer = 0
        unique_options = 0

        for q in questions:
            # Check completeness
            if "question" in q and "options" in q and len(q["options"]) == 4 and "answer" in q:
                complete += 1
            # Check answer is A/B/C/D
            if q.get("answer") in ["A", "B", "C", "D"]:
                valid_answer += 1
            # Check all options are unique
            opts = list(q.get("options", {}).values())
            if len(opts) == len(set(opts)):
                unique_options += 1

        return {
            "Total Questions": total,
            "Complete Questions": complete,
            "Valid Answer Format": valid_answer,
            "Unique Option Sets": unique_options,
            "Completeness (%)": round(100 * complete / total, 1) if total else 0,
            "Answer Validity (%)": round(100 * valid_answer / total, 1) if total else 0,
            "Option Uniqueness (%)": round(100 * unique_options / total, 1) if total else 0,
        }

    metrics = evaluate_questions(parsed_questions)

    # === Print metrics ===
    print("\nEvaluation Metrics for Generated Quiz:")
    for key, value in metrics.items():
        print(f"{key:25}: {value}")

    return parsed_questions, metrics


In [4]:
generate_quiz_from_lecture(
    lecture_path="/Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/lectures/COMP8420-W6 - Dev LLMs - Fine-tuning.txt",
    n_questions=5,
    save_txt_path="week6_quiz_raw.txt",
    save_json_path="week6_quiz.json"
)


  chat = ChatOpenAI(model_name=model, temperature=0)
  raw_output = chat.predict(prompt)


Saved raw quiz to: week6_quiz_raw.txt
Saved parsed quiz to: week6_quiz.json

Evaluation Metrics for Generated Quiz:
Total Questions          : 5
Complete Questions       : 5
Valid Answer Format      : 5
Unique Option Sets       : 5
Completeness (%)         : 100.0
Answer Validity (%)      : 100.0
Option Uniqueness (%)    : 100.0


([{'question': '1. What is the focus of the lecture on developing LLMs?',
   'options': {'A': 'Data visualization techniques',
    'B': 'Training and fine-tuning LLMs',
    'C': 'Software development best practices',
    'D': 'Hardware optimization strategies'},
   'answer': 'B'},
  {'question': "2. Which technique is NOT mentioned in the agenda for this week's lecture?",
   'options': {'A': 'Prompt engineering',
    'B': 'Chain of Thoughts',
    'C': 'Image recognition algorithms',
    'D': 'Retrieval-augmented generation'},
   'answer': 'C'},
  {'question': '3. What is emphasized in prompt engineering for Large Language Models?',
   'options': {'A': 'Complex and ambiguous prompts',
    'B': 'Instructions without any description',
    'C': 'Vague and unclear prompts',
    'D': 'Clear and precise prompts'},
   'answer': 'D'},
  {'question': '4. Which of the following is NOT a component of prompt engineering discussed in the lecture?',
   'options': {'A': 'Role prompt',
    'B': 'One-sh

### Student Agent – Generate Demo Quiz from Topic

This feature allows students to instantly generate a short quiz on any course-related topic (e.g., *"fine-tuning"*, *"transformers"*, etc.). It helps reinforce understanding through self-assessment.

**How It Works:**
- Takes a student query (topic or concept)
- Retrieves the most relevant course materials using **vector similarity search** (RAG)
- Uses **GPT-3.5** to generate **2–3 multiple-choice questions (MCQs)**
- Returns each question with:
  - A clear question statement
  - Four answer options (A–D)
  - One correct answer

**Quiz Quality Evaluation:**
After generating the quiz, the system automatically evaluates the structural integrity of the output:
- Are all questions complete?
- Do all questions have 4 unique options?
- Is the answer format valid (A/B/C/D)?
- Are the questions properly formatted?

This ensures quiz quality before students rely on it for revision or learning.

**Educational Value:**
This self-service agent enables personalized quiz creation for active learning, and supports **automated content reinforcement** — especially useful before exams or tutorials.


In [5]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.vectorstores import Chroma
import textwrap
import re

def generate_demo_quiz_from_query(query, vectorstore_path, model="gpt-3.5-turbo", n_questions=3):
    """
    Generate a short quiz based on a student's topic query using retrieved course content and GPT-3.5.

    Parameters:
    - query (str): Topic or question from the student
    - vectorstore_path (str): Path to Chroma vector database
    - model (str): OpenAI model name
    - n_questions (int): Number of quiz questions to generate

    Returns:
    - str: Generated quiz as text
    """
    # Load vector DB and embedding
    embedding = OpenAIEmbeddings()
    vectorstore = Chroma(persist_directory=vectorstore_path, embedding_function=embedding)
    retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 2})

    # Retrieve relevant documents
    relevant_docs = retriever.get_relevant_documents(query)
    context = "\n\n".join([doc.page_content for doc in relevant_docs])[:3500]

    # Prompt for quiz generation
    prompt = f"""
You are an AI teaching assistant. Based on the following course content, generate {n_questions} multiple-choice questions (MCQs) to test understanding. Each question should include:
- A clear question
- 4 answer options (A, B, C, D)
- The correct answer clearly labeled at the end (e.g., Answer: C)

Course content:
\"\"\"
{context}
\"\"\"
"""
    # Run GPT
    llm = ChatOpenAI(model_name=model, temperature=0)
    response = llm.invoke(prompt)

    # Extract and print quiz
    quiz_text = response.content.strip() if hasattr(response, "content") else str(response)
    print("Quiz Based on Your Query:\n")
    print(textwrap.indent(quiz_text, "  "))

    return quiz_text


def evaluate_generated_quiz(quiz_text):
    """
    Basic structural evaluation of a quiz generated from text.

    Returns a dictionary of evaluation metrics.
    """
    lines = quiz_text.strip().splitlines()
    questions = []
    current_q = {"options": {}, "answer": None}
    
    for line in lines:
        line = line.strip()
        if re.match(r"^\d+\.", line) or line.lower().startswith("q"):
            if current_q["options"] or current_q["answer"]:
                questions.append(current_q)
                current_q = {"options": {}, "answer": None}
            current_q["question"] = line
        elif re.match(r"^[A-D]\)", line):
            key = line[0]
            current_q["options"][key] = line[2:].strip()
        elif "answer" in line.lower():
            answer_match = re.search(r"([A-D])", line.upper())
            if answer_match:
                current_q["answer"] = answer_match.group(1)

    if current_q.get("options") or current_q.get("answer"):
        questions.append(current_q)

    # Evaluate
    total = len(questions)
    complete = sum(1 for q in questions if len(q["options"]) == 4 and q["answer"] in ["A", "B", "C", "D"])
    unique_options = sum(1 for q in questions if len(set(q["options"].values())) == 4)

    results = {
        "Total Questions": total,
        "Complete Questions": complete,
        "Valid Answer Format": complete,
        "Unique Option Sets": unique_options,
        "Completeness (%)": round(complete / total * 100, 2) if total else 0,
        "Answer Validity (%)": round(complete / total * 100, 2) if total else 0,
        "Option Uniqueness (%)": round(unique_options / total * 100, 2) if total else 0
    }

    print("\nEvaluation of Generated Quiz:")
    for k, v in results.items():
        print(f"{k:25}: {v}")
    
    return results


In [6]:
quiz = generate_demo_quiz_from_query(
    query="fine-tuning",
    vectorstore_path="/Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/chroma_store"
)

evaluate_generated_quiz(quiz)


  relevant_docs = retriever.get_relevant_documents(query)


Quiz Based on Your Query:

  1. What does fine-tuning mean in the context of LLMs?
  A) Fine-tuning is the process of training a model from scratch
  B) Fine-tuning is the process of training a model on a large, general dataset
  C) Fine-tuning is the process of continuing training on a pre-trained model using a smaller, task-specific dataset
  D) Fine-tuning is the process of freezing all model parameters

  Answer: C

  2. What is parameter-efficient fine-tuning?
  A) It refers to training a model with a large number of parameters
  B) It refers to fine-tuning a model on a large, general dataset
  C) It refers to fine-tuning techniques like LoRA and Adapters that update only a small subset of the model
  D) It refers to fine-tuning a model without updating any parameters

  Answer: C

  3. Which fine-tuning technique updates only a small subset of the model?
  A) Gradient Descent
  B) LoRA
  C) Random Forest
  D) Support Vector Machines

  Answer: B

Evaluation of Generated Quiz:
Tot

{'Total Questions': 3,
 'Complete Questions': 3,
 'Valid Answer Format': 3,
 'Unique Option Sets': 3,
 'Completeness (%)': 100.0,
 'Answer Validity (%)': 100.0,
 'Option Uniqueness (%)': 100.0}

### Admin Agent – LMS Statistics Dashboard

This section provides a summary of LMS activity across different sources:
- `qna.json` – student Q&A data
- `announcements.json` – instructor/admin posts
- `discussions.json` – forum or general discussion threads

The dashboard gives:
- Total number of entries per source
- Frequency of key academic terms (e.g., “assignment”, “quiz”, “exam”)
- Basic engagement insights

This helps administrative users understand where student attention is focused and identify recurring topics.


In [8]:
import json
from pathlib import Path
from collections import Counter, defaultdict
import pandas as pd
import re
from IPython.display import display

# === Configuration ===
data_dir = Path("/Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/")
filenames = ["qna.json", "announcements.json", "discussions.json"]
keywords = ["assignment", "quiz", "exam", "deadline", "submission", "feedback"]

# === Helper Functions ===
def load_json_file(filepath):
    with open(filepath, "r", encoding="utf-8") as f:
        return json.load(f)

def count_keywords(text, keywords):
    text = text.lower()
    return {kw: text.count(kw) for kw in keywords}

# === Processing ===
summary = []
keyword_totals = defaultdict(int)

for fname in filenames:
    path = data_dir / fname
    data = load_json_file(path)

    if not isinstance(data, list):
        continue

    total_entries = len(data)
    all_text = " ".join(json.dumps(item).lower() for item in data)
    keyword_counts = count_keywords(all_text, keywords)
    
    for kw, count in keyword_counts.items():
        keyword_totals[kw] += count
    
    summary.append({
        "File": fname,
        "Total Entries": total_entries,
        **keyword_counts
    })

# === Display Results ===
df_summary = pd.DataFrame(summary)
print("LMS Statistics Dashboard")
display(df_summary)

# === Keyword Total Summary (across all files) ===
print("\nTotal Keyword Mentions Across All Sources:")
for kw in keywords:
    print(f"{kw.title():<12}: {keyword_totals[kw]}")


LMS Statistics Dashboard


Unnamed: 0,File,Total Entries,assignment,quiz,exam,deadline,submission,feedback
0,qna.json,20,2,0,3,1,0,0
1,announcements.json,4,2,0,1,0,2,0
2,discussions.json,5,2,0,0,0,0,0



Total Keyword Mentions Across All Sources:
Assignment  : 6
Quiz        : 0
Exam        : 4
Deadline    : 1
Submission  : 2
Feedback    : 0


### Admin Agent – Natural Language Query & Summary

This component allows administrators or instructors to ask natural language questions about the course data (e.g., Q&A posts, announcements, discussions). It uses Retrieval-Augmented Generation (RAG) to fetch the most relevant content from embedded LMS sources and generates summaries using GPT-3.5.

**Key Features:**
- Enables free-form queries like:
  - "What are students asking about Assignment 3?"
  - "Summarize concerns related to exams"
  - "Are there any posts discussing submission issues?"
- Uses vector similarity search to retrieve top-matching documents
- Summarizes retrieved content with a large language model (LLM)

**How It Works:**
- Embedded LMS content is stored in a Chroma vector database
- Queries are matched to the most relevant content using vector similarity
- GPT-3.5 generates a concise response based on retrieved information

This makes it easy for instructors to understand student concerns, spot common topics, and make data-informed decisions without reading every post manually.


In [10]:
from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA
import time

# === Configuration ===
vectorstore_path = "/Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/chroma_store"
embedding = OpenAIEmbeddings()
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Load vector store & retriever
vectorstore = Chroma(persist_directory=vectorstore_path, embedding_function=embedding)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 4})

# Setup QA Chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

# === Query and Evaluation ===
query = "What are students asking about Assignment 3?"
start_time = time.time()
response = qa_chain.invoke({"query": query})
end_time = time.time()

# Extract result and sources
result = response["result"]
sources = [doc.metadata.get("source", "Unknown") for doc in response["source_documents"]]
unique_sources = set(sources)

# Print answer
print("Answer:\n")
print(result)

# Print sources
print("\nSource Files:")
for src in unique_sources:
    print("-", src)

# === Second Admin Query ===
query2 = "Are there any common concerns about the exam?"
start_time2 = time.time()
response2 = qa_chain.invoke({"query": query2})
end_time2 = time.time()

result2 = response2["result"]
sources2 = [doc.metadata.get("source", "Unknown") for doc in response2["source_documents"]]
unique_sources2 = set(sources2)

# Print second answer
print("\n\nAnswer (Query: Exam Concerns):\n")
print(result2)

# Print second sources
print("\nSource Files:")
for src in unique_sources2:
    print("-", src)

# Second evaluation metrics
latency2 = round(end_time2 - start_time2, 2)
source_count2 = len(unique_sources2)

print("\nEvaluation Metrics (Second Query):")
print(f"Response Time (sec)     : {latency2}")
print(f"Unique Source Files     : {source_count2}")
print(f"Total Chunks Retrieved  : {len(response2['source_documents'])}")


# Evaluation Metrics
latency = round(end_time - start_time, 2)
source_count = len(unique_sources)

print("\nEvaluation Metrics:")
print(f"Response Time (sec)     : {latency}")
print(f"Unique Source Files     : {source_count}")
print(f"Total Chunks Retrieved  : {len(response['source_documents'])}")


Answer:

Students are asking about the final deadline for Assignment 3, whether it is okay to work solo on the assignment, if LangChain is required for Assignment 3, and details about the submission requirements for Assignment 3.

Source Files:
- /Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/discussions.json
- /Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/qna.json
- /Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/announcements.json


Answer (Query: Exam Concerns):

I don't know.

Source Files:
- /Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/qna.json
- /Users/shaimonrahman/Desktop/COMP8420/Assignment_3/StudentAgentDataset/COMP8420/announcements.json

Evaluation Metrics (Second Query):
Response Time (sec)     : 1.32
Unique Source Files     : 2
Total Chunks Retrieved  : 4

Evaluation Metrics:
Response Time (sec)     : 1.98
Unique Source Files     : 3
Total 