# AI-Based Answer Evaluation System
## Complete Implementation and Demonstration

This notebook demonstrates the complete AI-based written answer evaluation system for online examination platforms.

### Features:
- PDF text extraction
- NLP preprocessing (tokenization, lemmatization, stopword removal)
- Keyword matching with fuzzy string comparison
- Semantic similarity analysis using sentence transformers
- Concept coverage detection
- Automated marking with partial credit
- Constructive feedback generation
- Topic-wise performance analysis
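
To make the keyword-matching feature concrete, here is a minimal sketch of fuzzy keyword coverage. It uses the standard library's `difflib` as a stand-in for the `fuzzywuzzy` package installed below, and the function name `fuzzy_keyword_score` and the 0.8 threshold are illustrative assumptions, not the actual `KeywordMatcher` API.

```python
from difflib import SequenceMatcher


def fuzzy_keyword_score(answer: str, keywords: list[str], threshold: float = 0.8) -> float:
    """Return the fraction of keywords that approximately appear in the answer."""
    if not keywords:
        return 0.0
    tokens = answer.lower().split()
    matched = 0
    for keyword in keywords:
        # Best similarity ratio between this keyword and any single token
        best = max(
            (SequenceMatcher(None, keyword.lower(), token).ratio() for token in tokens),
            default=0.0,
        )
        if best >= threshold:
            matched += 1
    return matched / len(keywords)


# "serch" is a misspelling that an exact match would miss
score = fuzzy_keyword_score(
    "A binary serch tree keeps keys in sorted order",
    ["binary", "search", "tree", "balanced"],
)
print(f"Keyword coverage: {score:.2f}")
```

Matching token by token keeps the sketch short; multi-word keywords would need n-gram or substring comparison instead.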

### Table of Contents:
1. [Setup and Installation](#setup)
2. [Module Overview](#modules)
3. [Single Answer Evaluation](#single)
4. [Batch Evaluation](#batch)
5. [Performance Analysis](#performance)
6. [Custom Configuration](#config)

## 1. Setup and Installation <a id='setup'></a>

First, let's install all required dependencies and download necessary models.
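
Once the packages are installed, a quick check like the following can confirm that they are importable in the current kernel. This is a sketch that uses only the standard library; the module list is an assumption derived from the `pip install` line, mapped to import names (for example `scikit-learn` is imported as `sklearn`).

```python
import importlib.util

# Import names, which differ from pip package names in some cases
REQUIRED_MODULES = [
    "pdfplumber", "nltk", "spacy", "sentence_transformers",
    "sklearn", "fuzzywuzzy", "reportlab", "matplotlib", "seaborn",
]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
print("Missing modules:", missing or "none")
```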

In [None]:
# Install required packages (run once)
!pip install -q pdfplumber nltk spacy sentence-transformers scikit-learn fuzzywuzzy python-Levenshtein reportlab matplotlib seaborn

# Download spaCy model
!python -m spacy download en_core_web_sm

print("✓ All dependencies installed!")

In [None]:
# Download NLTK data
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data that newer NLTK releases look for instead of 'punkt'
nltk.download('stopwords')
nltk.download('wordnet')

print("✓ NLTK data downloaded!")

In [None]:
# Create sample PDFs for testing
import sys
sys.path.append('.')

!python create_sample_pdfs.py

print("✓ Sample PDFs created!")

## 2. Module Overview <a id='modules'></a>

Let's import and explore our evaluation system modules.

In [None]:
# Import all modules
import sys
from pathlib import Path

# Add src to path
sys.path.insert(0, str(Path('.').absolute() / 'src'))

from config import Config
from pdf_processor import PDFProcessor
from nlp_preprocessor import NLPPreprocessor
from keyword_matcher import KeywordMatcher
from semantic_analyzer import SemanticAnalyzer
from concept_detector import ConceptDetector
from evaluation_engine import EvaluationEngine
from feedback_generator import FeedbackGenerator
from performance_analyzer import PerformanceAnalyzer

import json
import warnings
warnings.filterwarnings('ignore')

print("✓ All modules imported successfully!")
print(f"\nConfiguration:")
print(f"  Semantic Weight: {Config.SEMANTIC_WEIGHT*100}%")
print(f"  Keyword Weight: {Config.KEYWORD_WEIGHT*100}%")
print(f"  Concept Weight: {Config.CONCEPT_WEIGHT*100}%")
print(f"  Transformer Model: {Config.SENTENCE_TRANSFORMER_MODEL}")

### 2.1 Test Individual Modules

Let's test each module independently.
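
Before running the semantic analyzer test, it may help to see what a similarity score represents. The sketch below assumes that `SemanticAnalyzer` compares sentence embeddings by cosine similarity, which is the usual approach with sentence transformers but is not confirmed by this notebook. The three-dimensional vectors are made-up toy values, not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as no match
    return dot / (norm_a * norm_b)


# Toy "embeddings"; real sentence embeddings have hundreds of dimensions
emb_model_answer = [0.9, 0.1, 0.3]
emb_student = [0.8, 0.2, 0.4]
emb_unrelated = [0.0, 1.0, 0.0]

print(f"model vs student:   {cosine_similarity(emb_model_answer, emb_student):.3f}")
print(f"model vs unrelated: {cosine_similarity(emb_model_answer, emb_unrelated):.3f}")
```

Vectors pointing in similar directions score close to 1, and unrelated ones score close to 0.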

In [None]:
# Test PDF Processor
pdf_processor = PDFProcessor()
sample_pdf = "data/sample_student_answers/Q1_Student_Excellent.pdf"

result = pdf_processor.extract_text(sample_pdf)
print("📄 PDF Processing Test:")
print(f"  Status: {'✓ Success' if result['success'] else '✗ Failed'}")
print(f"  Pages: {result['pages']}")
print(f"  Text Length: {len(result['text'])} characters")
print(f"\n  First 200 characters:\n  {result['text'][:200]}...")

In [None]:
# Test NLP Preprocessor
preprocessor = NLPPreprocessor()
sample_text = result['text']

processed = preprocessor.preprocess(sample_text, pipeline=['clean', 'lemmatize', 'keywords'])

print("🔤 NLP Preprocessing Test:")
print(f"  Cleaned Text Length: {len(processed['cleaned'])} chars")
print(f"  Lemmatized Tokens: {len(processed['lemmas'])}")
print(f"\n  Top Keywords:")
for kw, freq in processed['keywords'][:10]:
    print(f"    {kw}: {freq}")

In [None]:
# Test Semantic Analyzer
print("🧠 Loading Semantic Analyzer (this may take a minute)...")
semantic_analyzer = SemanticAnalyzer()

text1 = "Binary search tree is a data structure"
text2 = "BST is a hierarchical data structure"

similarity = semantic_analyzer.calculate_similarity(text1, text2)
print(f"\n  Similarity between texts: {similarity:.3f}")
print(f"  Status: ✓ Semantic analyzer working!")

## 3. Single Answer Evaluation <a id='single'></a>

Now let's evaluate a single student answer end-to-end.
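
The results printed later in this section list a score, a weight, and a contribution for each component. A plausible reading is that the final score is a weighted sum of the three component scores, scaled to the maximum marks. The sketch below illustrates that arithmetic. The function `combine_scores` is hypothetical and the input scores are invented, so the actual `EvaluationEngine` may compute marks differently.

```python
def combine_scores(scores, weights, max_marks):
    """Weighted sum of component scores (each in [0, 1]), scaled to marks."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Each component contributes its score multiplied by its weight
    contributions = {name: scores[name] * weights[name] for name in weights}
    final_score = sum(contributions.values())
    return {
        "contributions": contributions,
        "final_score": final_score,
        "marks_obtained": round(final_score * max_marks, 2),
    }


demo = combine_scores(
    scores={"semantic": 0.8, "keyword": 0.6, "concept": 0.5},   # invented values
    weights={"semantic": 0.5, "keyword": 0.3, "concept": 0.2},
    max_marks=10.0,
)
print(demo["marks_obtained"])
```

With these inputs the contributions are 0.40, 0.18, and 0.10, so the final score is 0.68 and the marks come to 6.8 out of 10.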

In [None]:
# Load knowledge base and model answer
with open('data/knowledge_base.json', 'r') as f:
    knowledge_base = json.load(f)

with open('data/model_answers/Q1_model_answer.txt', 'r') as f:
    model_answer = f.read()

# Load question data
question_data = knowledge_base['questions']['Q1']

print("📚 Loaded Knowledge Base")
print(f"\nQuestion: {question_data['question_text']}")
print(f"Topic: {question_data['topic']}")
print(f"Max Marks: {question_data['max_marks']}")

In [None]:
# Extract student answer from PDF
student_pdf = "data/sample_student_answers/Q1_Student_Good.pdf"
pdf_result = pdf_processor.extract_text(student_pdf)
student_answer = pdf_result['text']

print(f"📝 Student Answer Extracted:")
print(f"\n{student_answer}")

In [None]:
# Initialize evaluation engine
engine = EvaluationEngine(config=Config)
engine.concept_detector.knowledge_base = knowledge_base

print("🔧 Evaluation Engine Initialized")
print("\n⚙️ Evaluating answer...\n")

# Perform evaluation
evaluation_result = engine.evaluate(
    student_answer=student_answer,
    model_answer=model_answer,
    question_data=question_data,
    max_marks=10.0
)

print("✓ Evaluation Complete!")

In [None]:
# Display evaluation results
print("═" * 70)
print("📊 EVALUATION RESULTS")
print("═" * 70)
print(f"\nMarks Obtained: {evaluation_result['marks_obtained']}/{evaluation_result['max_marks']}")
print(f"Percentage: {evaluation_result['percentage']}%")
print(f"Final Score: {evaluation_result['final_score']:.3f}")

print("\n" + "─" * 70)
print("Component Breakdown:")
print("─" * 70)

for component, data in evaluation_result['scores'].items():
    print(f"\n{component.upper()}:")
    print(f"  Score: {data['score']:.3f}")
    print(f"  Weight: {data['weight']*100}%")
    print(f"  Contribution: {data['contribution']:.3f}")

In [None]:
# Generate feedback
feedback_gen = FeedbackGenerator()
feedback = feedback_gen.generate_feedback(evaluation_result, verbose=False)

# Display formatted feedback
feedback_text = feedback_gen.format_feedback_text(feedback)
print(feedback_text)

## 4. Batch Evaluation <a id='batch'></a>

Evaluate multiple student answers at once.
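
Once a batch has been evaluated, a few descriptive statistics give a quick view of how the marks are spread. The helper below is a hypothetical sketch. It assumes only that each result is a dictionary with a `marks_obtained` key, as used elsewhere in this notebook, and the sample marks are invented.

```python
from statistics import mean


def summarize_batch(results):
    """Compute simple descriptive statistics over evaluation results."""
    if not results:
        return {"count": 0}
    marks = [r["marks_obtained"] for r in results]
    return {
        "count": len(marks),
        "mean": round(mean(marks), 2),
        "min": min(marks),
        "max": max(marks),
    }


# Invented marks standing in for real evaluation results
sample_results = [
    {"marks_obtained": 9.0},
    {"marks_obtained": 6.5},
    {"marks_obtained": 3.5},
]
summary = summarize_batch(sample_results)
print(summary)
```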

In [None]:
# Get all student PDFs for Q1
from pathlib import Path

pdf_dir = Path("data/sample_student_answers")
student_pdfs = list(pdf_dir.glob("Q1_Student_*.pdf"))

print(f"📁 Found {len(student_pdfs)} student answers to evaluate:\n")
for pdf in student_pdfs:
    print(f"  • {pdf.name}")

In [None]:
# Batch evaluation
batch_results = []

print("\n⚙️ Starting batch evaluation...\n")
print("─" * 70)

for i, pdf_path in enumerate(sorted(student_pdfs), 1):
    print(f"\nEvaluating {pdf_path.name}...")
    
    # Extract text
    pdf_result = pdf_processor.extract_text(str(pdf_path))
    if not pdf_result['success']:
        print(f"  ✗ Error: {pdf_result['error']}")
        continue
    
    # Evaluate
    result = engine.evaluate(
        student_answer=pdf_result['text'],
        model_answer=model_answer,
        question_data=question_data,
        max_marks=10.0
    )
    
    result['student_name'] = pdf_path.stem  # Use filename as student ID
    batch_results.append(result)
    
    print(f"  ✓ Marks: {result['marks_obtained']}/10 ({result['percentage']:.1f}%)")

print("\n" + "─" * 70)
print(f"✓ Batch evaluation complete! Evaluated {len(batch_results)} answers.")

In [None]:
# Display batch results summary
import pandas as pd

summary_data = []
for result in batch_results:
    summary_data.append({
        'Student': result['student_name'].replace('Q1_Student_', ''),
        'Marks': f"{result['marks_obtained']}/10",
        'Percentage': f"{result['percentage']:.1f}%",
        'Semantic': f"{result['scores']['semantic']['score']*100:.1f}%",
        'Keyword': f"{result['scores']['keyword']['score']*100:.1f}%",
        'Concept': f"{result['scores']['concept']['score']*100:.1f}%"
    })

df = pd.DataFrame(summary_data)
print("\n📊 BATCH EVALUATION SUMMARY")
print("═" * 100)
print(df.to_string(index=False))
print("═" * 100)

## 5. Performance Analysis <a id='performance'></a>

Generate topic-wise performance analysis and student profiles.
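
One way to derive strong and weak areas is to average each concept's coverage across answers and compare the average against thresholds. The sketch below shows that idea. The function name, the 75% and 50% thresholds, and the sample data are all assumptions made for illustration, so `PerformanceAnalyzer` may use different rules.

```python
from collections import defaultdict


def classify_concepts(coverage_records, strong_at=75.0, weak_below=50.0):
    """Average per-concept coverage (in percent) and split into strong/weak."""
    totals = defaultdict(list)
    for record in coverage_records:
        for concept, coverage in record.items():
            totals[concept].append(coverage)

    strong, weak = [], []
    for concept, values in totals.items():
        avg = round(sum(values) / len(values), 1)
        entry = {"concept": concept, "average_coverage": avg}
        if avg >= strong_at:
            strong.append(entry)
        elif avg < weak_below:
            weak.append(entry)
        # Concepts between the two thresholds are reported in neither list
    return strong, weak


# Invented coverage percentages for two answers
records = [
    {"BST property": 90.0, "time complexity": 40.0},
    {"BST property": 80.0, "time complexity": 30.0, "traversal": 60.0},
]
strong_areas, weak_areas = classify_concepts(records)
print("Strong:", strong_areas)
print("Weak:", weak_areas)
```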

In [None]:
# Performance analysis
perf_analyzer = PerformanceAnalyzer()

performance = perf_analyzer.analyze_single_exam(batch_results)

print("📈 PERFORMANCE ANALYSIS")
print("═" * 70)

overall = performance['overall_performance']
print(f"\nOverall Performance:")
print(f"  Total Marks: {overall['obtained_marks']}/{overall['total_marks']}")
print(f"  Percentage: {overall['percentage']}%")
print(f"  Grade: {overall['grade']}")

components = performance['component_performance']
print(f"\nComponent Averages:")
print(f"  Semantic: {components['semantic_avg']}%")
print(f"  Keyword: {components['keyword_avg']}%")
print(f"  Concept: {components['concept_avg']}%")

if performance['strong_areas']:
    print(f"\n✓ Strong Areas:")
    for area in performance['strong_areas']:
        print(f"  • {area['concept']} ({area['average_coverage']}%)")

if performance['weak_areas']:
    print(f"\n✗ Weak Areas:")
    for area in performance['weak_areas']:
        print(f"  • {area['concept']} ({area['average_coverage']}%)")

In [None]:
# Generate student profile for first student
if batch_results:
    # Create performance data for single student
    single_student = [batch_results[0]]
    student_perf = perf_analyzer.analyze_single_exam(single_student)
    
    profile = perf_analyzer.generate_student_profile(
        student_perf,
        student_id=batch_results[0]['student_name']
    )
    
    print("\n👤 STUDENT PROFILE")
    print("═" * 70)
    print(f"Student ID: {profile['student_id']}")
    print(f"Grade: {profile['overall_grade']} ({profile['percentage']}%)")
    print(f"\n{profile['performance_summary']}")
    
    print(f"\n✓ Strengths:")
    for strength in profile['strengths']:
        print(f"  • {strength}")
    
    print(f"\n📚 Recommendations:")
    for rec in profile['recommendations']:
        print(f"  [{rec['priority']}] {rec['area']}: {rec['suggestion']}")
    
    # Export profile
    output_path = "output/student_profile.json"
    Path("output").mkdir(exist_ok=True)
    perf_analyzer.export_report(profile, output_path)
    print(f"\n✓ Profile exported to: {output_path}")

## 6. Custom Configuration <a id='config'></a>

Customize evaluation weights and parameters.
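
When experimenting with weights, it is worth making sure the three values are non-negative and sum to 1, so that marks stay within the `max_marks` range. Whether `Config.update_weights` performs such a check is not shown in this notebook, so the helper below is a hypothetical sketch that normalizes arbitrary non-negative weights before they are passed on.

```python
def normalize_weights(semantic, keyword, concept):
    """Scale three non-negative weights so that they sum to 1."""
    weights = {"semantic": semantic, "keyword": keyword, "concept": concept}
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in weights.items()}


# Relative importances of 7:2:1 become 0.7, 0.2 and 0.1
print(normalize_weights(7, 2, 1))
```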

In [None]:
# Try different weight configurations
print("🔧 Testing Different Weight Configurations\n")

test_configs = [
    {"name": "Semantic-Heavy", "semantic": 0.7, "keyword": 0.2, "concept": 0.1},
    {"name": "Balanced", "semantic": 0.5, "keyword": 0.3, "concept": 0.2},
    {"name": "Keyword-Heavy", "semantic": 0.4, "keyword": 0.4, "concept": 0.2},
]

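# Use the first evaluated student's answer for testing. Re-extract it from the PDF,
# since the evaluation result is not guaranteed to include the raw answer text.
first_pdf = pdf_dir / f"{batch_results[0]['student_name']}.pdf"
test_answer = pdf_processor.extract_text(str(first_pdf))['text']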

comparison_results = []

for config in test_configs:
    # Update config
    Config.update_weights(
        semantic=config['semantic'],
        keyword=config['keyword'],
        concept=config['concept']
    )
    
    # Re-initialize engine
    test_engine = EvaluationEngine(config=Config)
    test_engine.concept_detector.knowledge_base = knowledge_base
    
    # Evaluate
    result = test_engine.evaluate(
        student_answer=test_answer,
        model_answer=model_answer,
        question_data=question_data,
        max_marks=10.0
    )
    
    comparison_results.append({
        'Configuration': config['name'],
        'Weights': f"S:{config['semantic']:.1f} K:{config['keyword']:.1f} C:{config['concept']:.1f}",
        'Marks': f"{result['marks_obtained']:.2f}/10",
        'Percentage': f"{result['percentage']:.1f}%"
    })

df_comparison = pd.DataFrame(comparison_results)
print(df_comparison.to_string(index=False))

print("\n💡 Different weight configurations can significantly impact final marks!")

## Summary

This notebook demonstrated the complete AI-based answer evaluation system with:

✓ Automated PDF processing and text extraction  
✓ Advanced NLP preprocessing  
✓ Multi-faceted evaluation (semantic + keyword + concept)  
✓ Constructive feedback generation  
✓ Performance analysis and student profiling  
✓ Configurable marking schemes  

### Next Steps:
1. Upload your own question bank and model answers
2. Customize the knowledge base for your subjects
3. Adjust weights based on your marking philosophy
4. Integrate with web platform using Flask/FastAPI
5. Add OCR support for handwritten answers
6. Implement LLM-based feedback enhancement