# 🎯 Use Case: AI-Powered Interview Agent

Hiring technical talent usually involves **manually assessing** candidate answers during interviews.  
This process is often:

- 🐌 **Slow**
- 🙅 **Subjective**
- 🔁 **Inconsistent**

---

## 💡 What This Notebook Does

This notebook demonstrates a **GenAI-powered Interview Agent** using a local large language model (LLM).  
It can:

🧠 **1. Ask** technical questions from a curated dataset  
💬 **2. Accept** a candidate’s answer (real or synthetic)  
🧾 **3. Evaluate** the response with:

- ✅ Strengths
- ❌ Weaknesses
- 📈 Suggestions
- 🧠 Score out of 10

---

## ⚙️ Powered By

- 🔍 `google/flan-t5-large` LLM  
- 🤗 Hugging Face Transformers pipeline  
- ✨ Few-shot prompting + structured feedback generation  
- 📊 Synthetic dataset creation for scalable evaluation

---

## Project Overview

This project simulates an AI-driven interview evaluation system. It leverages Generative AI models to assess candidate responses to technical interview questions and provide feedback. The process includes generating realistic candidate answers, comparing them to ideal answers, and offering personalized suggestions for improvement.

Key features of the project:
- **Feedback Generation**: Using the Flan-T5 Large model, it generates structured feedback based on the candidate's answer, suggesting areas for improvement.
- **Semantic Similarity Matching**: A sentence transformer model calculates semantic similarity between ideal and candidate answers to provide more accurate feedback.
- **Few-shot Prompting**: Utilizes few-shot prompting to guide the model in generating helpful and relevant feedback based on examples of previous interview evaluations.
- **LoRA Fine-Tuning**: Fine-tunes the model with a custom LoRA adapter to enhance its performance in generating specialized feedback.

This approach offers a comprehensive solution to automate the feedback process for interview assessments and aims to make interview evaluations more efficient, consistent, and informative for both candidates and interviewers.


## GenAI Capabilities Used

The notebook leverages several GenAI capabilities:

- **Text Generation** ✍️: Uses Flan-T5 Large model to generate structured feedback on interview responses, controlling output generation based on the question and ideal answer.
- **Few-shot Prompting** 🎯: Implements few-shot prompting techniques to guide the model in generating structured feedback, using example-based input for better results.
- **Document Understanding** 📄: Analyzes interview questions and candidate answers to generate relevant and coherent feedback based on ideal answers.
- **Semantic Similarity Matching** 🔍: Employs a sentence transformer model to calculate the semantic similarity between the candidate’s answer and ideal answers for more accurate feedback.
- **Embeddings** 🧠: Converts the candidate’s answer and ideal response into embeddings for semantic comparison using vector space representations.
- **Controlled Generation** 🎛️: Generates feedback with structured suggestions and scores, ensuring output aligns with the defined structure for improvement.
- **Fine-tuning** 🔧: Implements LoRA (Low-Rank Adaptation) to fine-tune a model for personalized interview feedback generation.
- **GenAI Evaluation** 🔍: Assesses candidate answers against ideal responses, combining structured feedback from Flan-T5 with semantic similarity matching to judge the relevance and quality of each answer consistently.
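
To make the embedding comparison above concrete, here is a minimal sketch of cosine similarity, the measure used later in the notebook to compare answer embeddings. The three-dimensional vectors are hypothetical toy values, not real model output; actual sentence embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for zero vectors in this sketch
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values for illustration only)
ideal = [0.9, 0.1, 0.3]
close_answer = [0.8, 0.2, 0.4]
off_topic = [-0.1, 0.9, -0.2]

print(cosine_similarity(ideal, close_answer))  # close to 1: similar direction
print(cosine_similarity(ideal, off_topic))     # near or below 0: unrelated
```

A value near 1 means the two vectors point in nearly the same direction, which is what the similarity threshold in the later steps checks for.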


# 📦 Dependencies and API Access

---

# 🛠️ Installation Commands

To ensure all dependencies are available when running the notebook, the following installation commands are included:

---


In [None]:
# Install latest Hugging Face libraries and sentence-transformers (imported below)
!pip install -U -q transformers datasets accelerate peft sentence-transformers

## 🔧 Package Imports and Dependencies

The **Interview Agent** project relies on several Python libraries and frameworks.  
Here is a consolidated list of all the imports and dependencies used throughout the notebook:

---

In [None]:
# Core data handling and utility libraries
import numpy as np
import pandas as pd
import random
import os
import re
import shutil
import torch
from tqdm import tqdm

# Hugging Face Transformers ecosystem
from transformers import pipeline
from transformers import T5Tokenizer, T5ForConditionalGeneration
from transformers import TrainingArguments, Trainer
from datasets import Dataset

# Parameter-Efficient Fine-Tuning (PEFT)
from peft import get_peft_model, LoraConfig, TaskType
from peft import PeftModel

# Semantic similarity and embeddings
from sentence_transformers import SentenceTransformer, util

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## Step 1: Load and Inspect Interview Dataset

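We start by loading a structured dataset of technical interview questions. This dataset will serve as the backbone for our AI-driven interview assistant. Note that the raw CSV uses the column names `Question Number`, `Question`, `Answer`, and `Difficulty`; the snake_case names listed below are the ones used in the derived fine-tuning dataset built in later steps.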

---
### 📦 Dataset Columns:
- **`question_id`**: Unique identifier for each question  
- **`question`**: The interview question text  
- **`ideal_answer`**: Concise reference answer, used for feedback comparison  
- **`candidate_answer`**: The user's response (used during evaluation)  
- **`difficulty`**: Labeled as Easy, Medium, or Hard  
- **`answer_type`**: How the candidate answer was produced (`unknown` or `humanlike` in the synthetic dataset)  
- **`feedback`**: Model-generated feedback added in Step 4 (used as the training target)

---
### 🎯 Purpose:
This dataset forms the foundation for both evaluating candidate answers and generating intelligent feedback. We’ll also inspect the difficulty distribution to understand how diverse and balanced the questions are.
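
As a rough illustration of the difficulty check mentioned above, the following sketch counts labels and their shares. The `difficulties` list is a hypothetical sample; in the notebook the labels would come from the dataset's difficulty column.

```python
from collections import Counter

# Hypothetical sample of difficulty labels (not taken from the real dataset)
difficulties = ["Easy", "Medium", "Hard", "Medium", "Easy", "Medium"]

def difficulty_distribution(labels):
    """Return {label: (count, share_of_total)} for each difficulty label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}

for label, (n, share) in difficulty_distribution(difficulties).items():
    print(f"{label:<6} {n:>3}  {share:.0%}")
```

A heavily skewed distribution would suggest that sampled questions, and any model trained on them, will mostly reflect one difficulty level.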


In [None]:
# SECTION: Load Interview Questions CSV
# Load the dataset (update the filename/path as needed)
df = pd.read_csv('/kaggle/input/software-engineering-interview-questions-dataset/Software Questions.csv',encoding='latin1')

# Show unique difficulty levels
print("Unique Difficulty Levels:", df['Difficulty'].unique())

# Preview dataset
df.head()


## Step 2: Setup, Generate, and Refine Feedback 🔧

### Purpose:
This step implements the feedback generation system using the instruction-tuned Flan-T5 model, guided by a few-shot prompt. The system processes candidate responses and generates structured feedback based on different answer scenarios.

### Process:
- **Input Handling**: The function handles various answer types, such as:
  - **Empty Responses**: If no answer is provided, it generates feedback highlighting the need for a more complete response.
  - **Brief Responses**: For short or incomplete answers, feedback is generated to prompt the candidate to elaborate on key points.
  - **Detailed Responses**: For longer, more detailed answers, feedback is provided on the clarity, structure, and depth of the response.

- **Feedback Components**:
  - **Strengths** 💪: Identifies positive aspects of the response.
  - **Weaknesses** ⚠️: Points out areas for improvement.
  - **Suggestions** 💡: Offers actionable suggestions to enhance the response.
  - **Score** 🏆: Provides a numerical score or a rating based on predefined criteria.


---

This step combines simple rule-based checks (for empty and brief answers) with few-shot prompting (for detailed answers) so that feedback is tailored to each response. It helps users understand where they excel and where they need to improve, leading to more effective preparation.
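
The routing between the three answer scenarios can be sketched in isolation as follows. This is a simplified illustration, not the notebook's exact function: the name `classify_answer` is hypothetical, and stripping trailing punctuation is an added assumption so that "I don't know." is treated like "I don't know".

```python
UNKNOWN_PHRASES = {"", "na", "no idea", "i don't know", "dont know"}
MIN_WORDS = 10  # answers with fewer words are treated as brief

def classify_answer(candidate_answer):
    """Return 'empty', 'brief', or 'detailed' for a candidate answer."""
    # Normalize case and whitespace; dropping trailing punctuation is an
    # assumption of this sketch
    normalized = str(candidate_answer).strip().lower().rstrip(".!?")
    if normalized in UNKNOWN_PHRASES:
        return "empty"      # canned feedback that includes the ideal answer
    if len(normalized.split()) < MIN_WORDS:
        return "brief"      # canned feedback asking for more detail
    return "detailed"       # only this case is sent to the language model

print(classify_answer("I don't know."))
print(classify_answer("It uses a hash table"))
```

Routing short or empty answers to fixed templates avoids spending a model call on inputs where the language model has little to evaluate.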


In [None]:
# Load the dataset
df = pd.read_csv('/kaggle/input/software-engineering-interview-questions-dataset/Software Questions.csv', encoding='latin1')

# Load a larger model for better generation quality
feedback_model = pipeline(
    "text2text-generation",
    model="google/flan-t5-large",  # More capable than t5-small
    do_sample=True,
    temperature=0.7,
    top_p=0.9
)

# Improved Prompt with clear feedback and edge case handling
few_shot_prompt = """
You are an AI interview coach. Your task is to evaluate a candidate's technical interview answer.

Provide the feedback in the following sections:
1. ✅ Strengths: List the strong points of the candidate's answer.
2. ❌ Weaknesses or Misconceptions: Identify areas where the answer is lacking or incorrect.
3. 📈 Suggestions for Improvement: Provide specific suggestions on how the candidate can improve their answer.
4. 🧠 Score: Give a score out of 10 based on the overall quality of the answer.

Make sure to fill each section with detailed and constructive feedback based on the candidate's answer.

If the candidate says "I don't know" or provides a vague or incomplete answer, kindly provide the correct answer or suggest areas where they can improve. Encourage learning and provide helpful feedback.

Example 1:
Question: What is the difference between compilation and interpretation?
Ideal Answer: Compilation translates source code into machine code creating an executable file. Interpretation translates and executes code line by line without an executable.
Candidate Answer: Compilation turns source code to machine code; interpretation runs code line by line.
Feedback:
✅ Strengths: The candidate correctly identified the key difference between compilation and interpretation.
❌ Weaknesses or Misconceptions: The candidate missed that compilation creates an executable file.
📈 Suggestions for Improvement: The candidate could mention real-world examples like C (compiled) vs Python (interpreted).
🧠 Score: 7/10

Example 2:
Question: What is polymorphism?
Ideal Answer: Polymorphism allows methods to have different behaviors based on the object calling them.
Candidate Answer: It allows a function to behave differently.
Feedback:
✅ Strengths: The candidate has a basic understanding of polymorphism.
❌ Weaknesses or Misconceptions: The candidate's answer is vague and does not mention key aspects such as objects, overriding, or overloading.
📈 Suggestions for Improvement: The candidate could provide examples from object-oriented programming or use analogies to clarify the concept.
🧠 Score: 6/10

Now evaluate this:
"""

# Function to generate feedback using the new, clear prompt with edge case handling
def generate_feedback(question, candidate_answer, ideal_answer):
    # Edge case handling: Check if the answer is empty or something like "I don't know"
    if candidate_answer.strip().lower() in ['i don\'t know', 'na', '', 'no idea', 'dont know']:
        feedback = f"""
✅ Strengths: The candidate was honest in stating that they don't know the answer. This is an important trait in interviews.
❌ Weaknesses or Misconceptions: The candidate did not attempt to answer the question.
📈 Suggestions for Improvement: It would be helpful to study the key concepts related to this topic. Here's the correct answer:
{ideal_answer}
🧠 Score: 3/10 (No answer provided, but this can be improved with more preparation.)
"""
    # If the answer is valid but incomplete, provide suggestions to expand it
    elif len(candidate_answer.split()) < 10:
        feedback = f"""
✅ Strengths: The candidate has made an attempt to answer the question.
❌ Weaknesses or Misconceptions: The answer lacks detail and does not fully address the question.
📈 Suggestions for Improvement: Please elaborate more on the key concepts and provide additional examples or explanations.
🧠 Score: 5/10 (The answer is too brief, but it could be improved with more information.)
"""
    else:
        # Generate feedback using the model
        prompt = f"""{few_shot_prompt}
Question: {question}
Ideal Answer: {ideal_answer}
Candidate Answer: {candidate_answer}
Feedback:"""
        result = feedback_model(prompt, max_length=512, truncation=True)
        feedback = result[0]['generated_text'].strip()

    return feedback


# Live Q&A loop
while True:
    random_row = df.sample(n=1).iloc[0]
    question = random_row['Question']
    ideal_answer = random_row['Answer']

    print("\n🧠 Random Interview Question:")
    print(question)

    candidate_answer = input("\n👤 Your Answer (type 'stop' to quit): ")
    if candidate_answer.strip().lower() == 'stop':
        print("🛑 Session ended.")
        break

    print("\n✅ AI Feedback:")
    feedback = generate_feedback(question, candidate_answer, ideal_answer)
    print(feedback)


## Step 3: Creating a Synthetic Dataset for Model Fine-tuning 🧑‍💻

### Purpose:
Generates a synthetic dataset by creating diverse candidate answers for each interview question. This dataset is crucial for fine-tuning the feedback model, ensuring it handles various response types.

---
### Process:
- **"I Don’t Know" Variations**: Adds multiple "I don't know" responses (e.g., "I'm not sure about this one") to simulate real candidate uncertainty.
- **Synthetic Answer Generation**: For each question, the system generates:
  - A random "I don’t know" answer.
  - Several **realistic answers** using a pre-trained model, providing diverse, humanlike responses.

---
### Dataset Structure:
Each generated answer is stored with:
- **question_id**, **question**, **candidate_answer**, **ideal_answer**, **difficulty**, and **answer_type** ("unknown" or "humanlike").

This synthetic dataset provides varied training examples, enhancing the model's ability to generate accurate feedback for diverse responses.
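
The record layout described above can be sketched without loading a model. In this illustration `fake_generator` is a hypothetical stand-in for the text-generation pipeline, and `build_records` is an illustrative helper name rather than a function from the notebook.

```python
import random

DONT_KNOW_VARIANTS = ["I don't know.", "I'm not sure about this one."]

def fake_generator(question, n):
    """Hypothetical stand-in for the LLM: returns n placeholder answers."""
    return [f"Placeholder answer {i + 1} to: {question}" for i in range(n)]

def build_records(qid, question, ideal, difficulty, n_generated=4, rng=random):
    """Return one 'unknown' record plus n_generated 'humanlike' records."""
    base = {"question_id": qid, "question": question,
            "ideal_answer": ideal, "difficulty": difficulty}
    # One randomly chosen "I don't know" variation per question
    records = [{**base, "candidate_answer": rng.choice(DONT_KNOW_VARIANTS),
                "answer_type": "unknown"}]
    # Several generated answers per question
    for answer in fake_generator(question, n_generated):
        records.append({**base, "candidate_answer": answer,
                        "answer_type": "humanlike"})
    return records

rows = build_records(1, "What is a stack?", "A LIFO data structure.", "Easy")
print(len(rows), [r["answer_type"] for r in rows])
```

Each question therefore contributes five rows, so the synthetic dataset is five times the size of the question set.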

---

In [None]:
# Load your dataset
df = pd.read_csv("/kaggle/input/software-engineering-interview-questions-dataset/Software Questions.csv",encoding='latin1')

# Load Flan-T5 large model
generator = pipeline("text2text-generation", model="google/flan-t5-large", max_length=200)

# Create diverse "I don't know" variations
dont_know_variants = [
    "I'm not sure about this one.",
    "I don't know the answer to this.",
    "Sorry, I can't recall this.",
    "I'm unsure, but I'd like to learn more about it.",
    "I don't know."
]

# Prepare your data
answer_data = []

def build_prompt(question):
    return f"You are a job candidate answering this interview question naturally and honestly:\nQuestion: {question}\nAnswer:"

for _, row in tqdm(df.iterrows(), total=len(df)):
    qid = row['Question Number']
    question = row['Question']
    ideal = row['Answer']
    difficulty = row['Difficulty']

    # Add one "I don't know" variation
    answer_data.append({
        "question_id": qid,
        "question": question,
        "candidate_answer": random.choice(dont_know_variants),
        "ideal_answer": ideal,
        "difficulty": difficulty,
        "answer_type": "unknown"
    })

    # Generate 4 realistic answers
    prompt = build_prompt(question)
    responses = generator(prompt, num_return_sequences=4, do_sample=True, top_p=0.9, temperature=0.7)

    for response in responses:
        answer_data.append({
            "question_id": qid,
            "question": question,
            "candidate_answer": response['generated_text'].strip(),
            "ideal_answer": ideal,
            "difficulty": difficulty,
            "answer_type": "humanlike"
        })

# Convert to DataFrame
gen_df = pd.DataFrame(answer_data)

# Save to CSV
gen_df.to_csv("/kaggle/working/finetune_feedback_dataset.csv", index=False)


## Step 4: Generating Feedback for the Training Dataset 🧠

### Purpose:
Enhances the synthetic dataset by generating feedback for each question-answer pair, forming complete training triplets: **question**, **candidate answer**, and **model-generated feedback**.

---
### Process:
- Iterates through each row in the dataset.
- For every candidate answer:
  - Applies the `generate_feedback()` function using the **question**, **candidate_answer**, and **ideal_answer**.
  - Appends the structured feedback (including strengths, weaknesses, suggestions, and score) to a new list.
- Adds the feedback as a new column in the dataset.
- Saves the enriched dataset as a CSV file, ready for fine-tuning the model.

This step transforms the synthetic dataset into a supervised format, enabling the model to learn feedback generation from worked examples (here, model-generated rather than human-written).
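
Because generated feedback does not always follow the requested structure, it can help to check each row before using it for training. The sketch below is one possible validation, assuming each section sits on a single line; `parse_feedback`, `extract_score`, and `is_complete` are hypothetical helper names, not part of the notebook.

```python
import re

SECTION_LABELS = ("Strengths", "Weaknesses or Misconceptions",
                  "Suggestions for Improvement", "Score")

def parse_feedback(text):
    """Split feedback into {label: content}; missing labels are omitted."""
    sections = {}
    for line in text.splitlines():
        for label in SECTION_LABELS:
            marker = f"{label}:"
            if marker in line:
                # Keep only the text after the label on the same line
                sections[label] = line.split(marker, 1)[1].strip()
                break
    return sections

def extract_score(sections):
    """Return the integer score out of 10, or None if it cannot be parsed."""
    match = re.search(r"(\d+)\s*/\s*10", sections.get("Score", ""))
    return int(match.group(1)) if match else None

def is_complete(text):
    """True when all four sections are present and the score is parseable."""
    sections = parse_feedback(text)
    has_all = all(label in sections for label in SECTION_LABELS)
    return has_all and extract_score(sections) is not None

sample = """✅ Strengths: Correct core idea.
❌ Weaknesses or Misconceptions: No mention of executables.
📈 Suggestions for Improvement: Add a C vs Python example.
🧠 Score: 7/10"""

print(parse_feedback(sample))
print(extract_score(parse_feedback(sample)), is_complete(sample))
```

Rows that fail `is_complete` could be dropped or regenerated before fine-tuning, so the model is not trained on malformed targets.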

---

In [None]:
# Load the dataset (written by pandas in Step 3, so it is UTF-8 encoded)
df = pd.read_csv('/kaggle/input/finetune/finetune_feedback_dataset.csv', encoding='utf-8')

# Load a capable model for feedback generation
feedback_model = pipeline(
    "text2text-generation",
    model="google/flan-t5-large",
    do_sample=True,
    temperature=0.7,
    top_p=0.9
)

# Few-shot feedback prompt template
few_shot_prompt = """
You are an AI interview coach. Your task is to evaluate a candidate's technical interview answer.

Provide the feedback in the following sections:
1. ✅ Strengths: List the strong points of the candidate's answer.
2. ❌ Weaknesses or Misconceptions: Identify areas where the answer is lacking or incorrect.
3. 📈 Suggestions for Improvement: Provide specific suggestions on how the candidate can improve their answer.
4. 🧠 Score: Give a score out of 10 based on the overall quality of the answer.

Make sure to fill each section with detailed and constructive feedback based on the candidate's answer.

If the candidate says "I don't know" or provides a vague or incomplete answer, kindly provide the correct answer or suggest areas where they can improve. Encourage learning and provide helpful feedback.

Example 1:
Question: What is the difference between compilation and interpretation?
Ideal Answer: Compilation translates source code into machine code creating an executable file. Interpretation translates and executes code line by line without an executable.
Candidate Answer: Compilation turns source code to machine code; interpretation runs code line by line.
Feedback:
✅ Strengths: The candidate correctly identified the key difference between compilation and interpretation.
❌ Weaknesses or Misconceptions: The candidate missed that compilation creates an executable file.
📈 Suggestions for Improvement: The candidate could mention real-world examples like C (compiled) vs Python (interpreted).
🧠 Score: 7/10

Example 2:
Question: What is polymorphism?
Ideal Answer: Polymorphism allows methods to have different behaviors based on the object calling them.
Candidate Answer: It allows a function to behave differently.
Feedback:
✅ Strengths: The candidate has a basic understanding of polymorphism.
❌ Weaknesses or Misconceptions: The candidate's answer is vague and does not mention key aspects such as objects, overriding, or overloading.
📈 Suggestions for Improvement: The candidate could provide examples from object-oriented programming or use analogies to clarify the concept.
🧠 Score: 6/10

Now evaluate this:
"""

# Function to generate feedback
def generate_feedback(question, candidate_answer, ideal_answer):
    # Keep the original text for the prompt; use a normalized copy for the rule checks
    original_answer = str(candidate_answer).strip()
    candidate_answer = original_answer.lower()

    # Edge case: Empty or unknown response
    # ('nan' covers missing values, which pandas reads as NaN and str() turns into 'nan')
    if candidate_answer in ['i don\'t know', 'na', '', 'no idea', 'dont know', 'nan']:
        feedback = f"""
✅ Strengths: The candidate was honest in stating that they don't know the answer. This is an important trait in interviews.
❌ Weaknesses or Misconceptions: The candidate did not attempt to answer the question.
📈 Suggestions for Improvement: It would be helpful to study the key concepts related to this topic. Here's the correct answer:
{ideal_answer}
🧠 Score: 3/10 (No answer provided, but this can be improved with more preparation.)
"""
    elif len(candidate_answer.split()) < 10:
        feedback = f"""
✅ Strengths: The candidate has made an attempt to answer the question.
❌ Weaknesses or Misconceptions: The answer lacks detail and does not fully address the question.
📈 Suggestions for Improvement: Please elaborate more on the key concepts and provide additional examples or explanations.
🧠 Score: 5/10 (The answer is too brief, but it could be improved with more information.)
"""
    else:
        # The prompt uses original_answer, which preserves the candidate's casing
        prompt = f"""{few_shot_prompt}
Question: {question}
Ideal Answer: {ideal_answer}
Candidate Answer: {original_answer}
Feedback:"""
        result = feedback_model(prompt, max_length=512, truncation=True)
        feedback = result[0]['generated_text'].strip()

    return feedback


# Generate feedback for all rows and store in list
feedbacks = []
for _, row in tqdm(df.iterrows(), total=len(df)):
    question = row['question']
    candidate_answer = row['candidate_answer']
    ideal_answer = row['ideal_answer']
    
    feedback = generate_feedback(question, candidate_answer, ideal_answer)
    feedbacks.append(feedback)

# Add feedback column
df['feedback'] = feedbacks

# Save final dataset with feedback column
df.to_csv('/kaggle/working/finetune_feedback_dataset_with_feedback.csv', index=False)
print("✅ Feedback generation complete! Saved to: finetune_feedback_dataset_with_feedback.csv")


## Step 5: Fine-tuning the T5 Model with LoRA 🧪

### Purpose:
Applies **LoRA-based fine-tuning** to adapt the Flan-T5 model for generating structured interview feedback, using minimal compute.

---
### Process:
- **Data Prep**: Loads and formats the dataset into input-output pairs (question, candidate answer, ideal answer → feedback).
- **Tokenization**: Converts text into model-ready tokens using the T5 tokenizer.
- **LoRA Setup**: Applies Low-Rank Adaptation to specific layers (`q`, `v`) of the base model for efficient tuning.
- **Training**: Trains using Hugging Face’s `Trainer` with lightweight settings (small batch, mixed precision).
- **Save Adapters**: Saves only the LoRA adapter weights for compact deployment.

This step makes the model feedback-aware without needing to retrain the full T5 architecture. ✅
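
To see why LoRA needs far fewer trainable parameters, the sketch below compares the parameter count of one full weight matrix with that of its two low-rank factors. The 1024 × 1024 shape is an assumption chosen for illustration; `r=8` matches the rank in this notebook's `LoraConfig`.

```python
def lora_param_counts(d_in, d_out, r):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning updates the d_out x d_in matrix W. LoRA freezes W and
    trains two low-rank factors instead: A (r x d_in) and B (d_out x r).
    """
    full = d_in * d_out
    lora = r * d_in + d_out * r
    return full, lora

# Assumed square projection of size 1024 x 1024 (illustrative shape)
full, lora = lora_param_counts(d_in=1024, d_out=1024, r=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

Under these assumptions each adapted matrix trains 16,384 parameters instead of 1,048,576, which is why only small adapter files need to be saved.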

---

In [None]:
# === Load and clean the dataset ===
df = pd.read_csv("/kaggle/input/finetune/finetune_feedback_dataset_with_feedback.csv")
df = df.dropna(subset=['feedback'])
df = df[df['feedback'].str.strip() != '']

# Format for T5
df['input_text'] = df.apply(
    lambda row: f"Give feedback on this answer:\nQuestion: {row['question']}\nCandidate Answer: {row['candidate_answer']}\nIdeal Answer: {row['ideal_answer']}",
    axis=1
)
df['target_text'] = df['feedback']

# Load as HuggingFace Dataset
dataset = Dataset.from_pandas(df[['input_text', 'target_text']])
dataset = dataset.train_test_split(test_size=0.1)

# Tokenizer
model_name = "google/flan-t5-large"
tokenizer = T5Tokenizer.from_pretrained(model_name)

def tokenize(example):
    input_enc = tokenizer(example['input_text'], truncation=True, padding='max_length', max_length=512)
    target_enc = tokenizer(example['target_text'], truncation=True, padding='max_length', max_length=256)
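    # Replace padding token ids in the labels with -100 so the loss ignores them
    input_enc["labels"] = [
        [(tok if tok != tokenizer.pad_token_id else -100) for tok in seq]
        for seq in target_enc["input_ids"]
    ]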
    return input_enc

tokenized = dataset.map(tokenize, batched=True)

# Load base model
base_model = T5ForConditionalGeneration.from_pretrained(model_name)

# === LoRA Configuration ===
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q", "v"],
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM
)

# Apply LoRA
model = get_peft_model(base_model, lora_config)

# === Training Arguments ===
training_args = TrainingArguments(
    output_dir="/kaggle/working/lora-t5",  
    per_device_train_batch_size=2,  
    gradient_accumulation_steps=4,  # Accumulate gradients over multiple steps
    num_train_epochs=1,
    logging_strategy="no",  
    save_strategy="no",     
    report_to="none",       
    fp16=torch.cuda.is_available() and not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_available() and torch.cuda.is_bf16_supported(),  # guard for CPU-only sessions
    disable_tqdm=False      
)

# === Trainer Setup ===
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
)

# === Train ===
trainer.train()

# === Save LoRA Adapters ===
model.save_pretrained("/kaggle/working/lora-t5-adapter")
tokenizer.save_pretrained("/kaggle/working/lora-t5-adapter")

print("✅ Training complete! Adapters saved in /kaggle/working/lora-t5-adapter")


## Step 6: Loading the Fine-tuned Model for Inference 📥

### Purpose:
Loads the base **Flan-T5 model** along with the fine-tuned **LoRA adapter** for inference.

---
### Process:
- Loads tokenizer and base model (`google/flan-t5-large`).
- Integrates the trained **LoRA adapter** to restore fine-tuned weights.
- Moves the model to GPU (if available) for faster performance.
- Switches the model to evaluation mode.

Ready for generating feedback on new candidate answers! 🚀

---

In [None]:
# Paths
base_model_name = "google/flan-t5-large"
adapter_path = "/kaggle/input/lora-adapter-checkpoint"  # your extracted LoRA files

# Load tokenizer and base model
tokenizer = T5Tokenizer.from_pretrained(base_model_name)
base_model = T5ForConditionalGeneration.from_pretrained(base_model_name)

# Load the fine-tuned LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_path)

# Use GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

print(f"✅ Model loaded on {device}")


## Step 7: Hybrid Feedback with Semantic Matching and the LoRA Model 🛠️

### Purpose:
Combines semantic similarity matching with the LoRA fine-tuned T5 model from Step 5 to produce interview feedback. The semantic layer determines strengths and weaknesses, while the fine-tuned model supplies suggestions and a score.

---

### Process:
- **Semantic Layer** 🔍:
  - Uses a SentenceTransformer (`all-MiniLM-L6-v2`) to compute semantic similarity between the **candidate answer** and **key phrases** extracted from the **ideal answer**.
  - Phrases are matched or marked as missing based on a cosine similarity threshold.

- **Hybrid Feedback Generation** ⚡:
  - Combines semantic comparison results (matched and missing phrases) with suggestions generated by the LoRA fine-tuned T5 model.
  - The prompt includes question context, the ideal answer, and the candidate’s response.
  - The model outputs **suggestions** and **scores**, which are merged with semantic results to form a complete feedback structure:
    - ✅ **Strengths**: Matched concepts
    - ❌ **Weaknesses**: Missing or unclear concepts
    - 📈 **Suggestions**: Model-generated improvements

---

### Result:
The feedback combines semantic alignment (which concepts from the ideal answer were covered) with model-generated suggestions, and it relies on the lightweight LoRA adapter rather than full-scale fine-tuning.
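
The match-or-miss logic of the semantic layer can be sketched without an embedding model. In this illustration `token_overlap` is a hypothetical stand-in for embedding-based cosine similarity, and the function names are illustrative; only the thresholding pattern mirrors the approach described above.

```python
import re

def split_phrases(text, min_len=5):
    """Split an ideal answer into key phrases on sentence punctuation."""
    parts = re.split(r"[.;\n]", text)
    return [p.strip() for p in parts if len(p.strip()) > min_len]

def token_overlap(phrase, answer):
    """Stand-in similarity: share of the phrase's words found in the answer."""
    words_p = set(re.findall(r"\w+", phrase.lower()))
    words_a = set(re.findall(r"\w+", answer.lower()))
    return len(words_p & words_a) / len(words_p) if words_p else 0.0

def match_phrases(ideal_answer, candidate_answer,
                  similarity=token_overlap, threshold=0.6):
    """Label each ideal phrase as matched or missing using a threshold."""
    matched, missing = [], []
    for phrase in split_phrases(ideal_answer):
        if similarity(phrase, candidate_answer) > threshold:
            matched.append(phrase)
        else:
            missing.append(phrase)
    return matched, missing

ideal = "A stack is a LIFO structure. Elements are pushed and popped from the top."
answer = "A stack is a LIFO structure where you add items."
matched, missing = match_phrases(ideal, answer)
print("Matched:", matched)
print("Missing:", missing)
```

Passing a different `similarity` function, such as one backed by sentence embeddings, changes how phrases are compared without touching the thresholding logic.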

---

## Step 8: Interactive Evaluation with Hybrid Feedback Loop 🤖💬

### Purpose:
Simulates a **live interview scenario**, allowing users to enter responses and receive **real-time AI feedback** using the trained model.

---

### Process:
- Randomly selects an interview question and corresponding ideal answer.
- Accepts user input as the candidate’s answer.
- Checks for uncertainty phrases like “I don’t know.”
- Uses **semantic similarity** to identify matched/missing concepts.
- Generates personalized suggestions using the **fine-tuned LoRA model**.
- Filters out contradictions and redundant advice for clean feedback.

---

### Result:
Delivers actionable and structured feedback in real time, making it a powerful tool for **interview practice** and **self-assessment**. 🎯

---


In [None]:
# Load the dataset
df = pd.read_csv("/kaggle/input/software-engineering-interview-questions-dataset/Software Questions.csv", encoding='latin1')

# Load semantic similarity model
semantic_model = SentenceTransformer('all-MiniLM-L6-v2')

# Extract key phrases from the ideal answer
def extract_key_phrases(text):
    return [p.strip() for p in re.split(r'[.;\-•\n]', text) if len(p.strip()) > 5]

# Match phrases from ideal to candidate based on semantic similarity
def get_semantic_matches(ideal_phrases, candidate_answer, threshold=0.6):
    matches, misses = [], []
    answer_embedding = semantic_model.encode(candidate_answer, convert_to_tensor=True,show_progress_bar=False)

    for phrase in ideal_phrases:
        phrase_embedding = semantic_model.encode(phrase, convert_to_tensor=True,show_progress_bar=False)
        sim = util.pytorch_cos_sim(phrase_embedding, answer_embedding).item()
        if sim > threshold:
            matches.append(phrase)
        else:
            misses.append(phrase)
    return matches, misses

# Optional: Check for "I don't know"-type answers
def is_uncertain_answer(text):
    uncertain_keywords = ["i don't know", "not sure", "no idea", "can't say"]
    return any(kw in text.lower() for kw in uncertain_keywords)
    
few_shot_prompt = """
Below are examples of interview questions, ideal answers, candidate answers, and helpful feedback focusing only on suggestions for improvement and a score.

Example 1:
Question: What is the difference between an abstract class and an interface in Java?
Ideal Answer: An abstract class can have method implementations and member variables, while an interface can only have method declarations (before Java 8). A class can extend only one abstract class but implement multiple interfaces.
Candidate Answer: Abstract classes and interfaces are almost the same. Both can have abstract methods.
Feedback (only generate suggestion and score):
📈 Suggestions for Improvement: The candidate should mention that interfaces allow multiple inheritance and abstract classes allow implemented methods. Also, Java 8+ allows default methods in interfaces.
🧠 Score: 5/10

Example 2:
Question: What is normalization in databases?
Ideal Answer: Normalization is the process of organizing data to reduce redundancy and improve integrity by splitting data into related tables and defining relationships.
Candidate Answer: It’s about organizing a database.
Feedback (only generate suggestion and score):
📈 Suggestions for Improvement: The candidate should explain how normalization removes redundancy and mention normal forms (1NF, 2NF, etc.).
🧠 Score: 4/10
"""

# Feedback generation with hybrid semantic + fine-tuned model
def generate_feedback(question, candidate_answer, ideal_answer):
    if is_uncertain_answer(candidate_answer):
        return f"""
✅ \033[92mStrengths:\033[0m It's good that the candidate acknowledged their uncertainty.
❌ \033[91mWeaknesses or Misconceptions:\033[0m No relevant content was provided.
📈 \033[94mSuggestions for Improvement:\033[0m Here's a good explanation:
{ideal_answer}
🧠 \033[95mScore:\033[0m 2/10 (With more study, you can master this!)
"""

    # Semantic comparison
    ideal_phrases = extract_key_phrases(ideal_answer)
    matched, missing = get_semantic_matches(ideal_phrases, candidate_answer)

    # LLM suggestion only
    prompt = f"""{few_shot_prompt}
Question: {question}
Ideal Answer: {ideal_answer}
Candidate Answer: {candidate_answer}
Feedback (only generate suggestion and score):"""

    inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=1024).to(device)
    outputs = model.generate(**inputs, max_length=512)
    suggestion_output = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

    # Filter contradictions and repetitions from suggestions
    cleaned_lines = []
    for line in suggestion_output.splitlines():
        if any(kw in line.lower() for kw in ["good understanding", "poor understanding", "weaknesses"]):
            continue  # Remove contradiction
        if any(match.lower() in line.lower() for match in matched):
            continue  # Remove redundancy
        cleaned_lines.append(line.strip())

    final_suggestion = "\n".join(cleaned_lines).strip()

    # Format semantic feedback
    matched_str = ', '.join(matched) if matched else 'None'
    missing_str = ', '.join(missing) if missing else 'None'

    return f"""
✅ \033[92mStrengths:\033[0m Covered key ideas: {matched_str}
❌ \033[91mWeaknesses or Misconceptions:\033[0m Missing or unclear concepts: {missing_str}
📈 \033[94mSuggestions for Improvement:\033[0m
{final_suggestion if final_suggestion else suggestion_output}
"""

# ========== Live Q&A Loop ==========
while True:
    random_row = df.sample(n=1).iloc[0]
    question = random_row['Question']
    ideal_answer = random_row['Answer']

    print("\n🧠 Random Interview Question:")
    print(question)

    candidate_answer = input("\n👤 Your Answer (type 'stop' to quit): ")
    if candidate_answer.strip().lower() == 'stop':
        print("🛑 Session ended.")
        break

    print("\n✅ AI Feedback:")
    feedback = generate_feedback(question, candidate_answer, ideal_answer)
    print(feedback)


## ✅ Conclusion

This project introduces a hybrid feedback generation system that combines:

- **Semantic similarity matching** using `SentenceTransformer` to identify strengths and weaknesses in candidate responses.
- **LoRA fine-tuned T5 model** to generate constructive suggestions and score answers.

By blending semantic understanding with generative feedback, the system delivers high-quality, tailored feedback that supports learners in improving their interview performance. This parameter-efficient approach enables effective adaptation without extensive computational resources.

The method demonstrates strong potential for scalable, automated feedback in technical learning environments.

---


## 👤 Collaborators

This project was created by **Laksh Jain** as part of a solo initiative exploring intelligent interview feedback systems using semantic similarity and fine-tuned models.
