pulkittyagi02/Text-Analyzer


Text-Analyzer

This project is a comprehensive tool designed to automatically evaluate and score self-introduction transcripts—such as student speeches—according to a detailed, data-driven rubric.

Project Overview

This project provides an automated, detailed, and objective scoring tool for evaluating self-introduction or student transcripts, using rule-based logic and NLP. The tool is implemented entirely in a Google Colab notebook for rapid prototyping and ease of use; no backend deployment is required.

Features

Multi-criterion Evaluation: Each transcript is scored on multiple rubric-driven categories: Salutation, Keyword Presence, Flow, Speech Rate, Grammar, Vocabulary Richness, Clarity, and Engagement.

Hybrid Rule + NLP Methods: Uses both deterministic logic (for structure, keywords, etc.) and NLP models for sentiment, grammar, and vocabulary checks.

Data-Driven Rubric: All weights and scoring rules are defined in a Python dictionary (RUBRIC), making modification and transparency easy.

Detailed Feedback: Returns a normalized score (0–100) and a criterion-level breakdown with feedback.

Optimized for Colab: Easy to use: copy the code into Colab, follow the input prompts, and get instant results!
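As a rough illustration of the data-driven rubric idea, the sketch below shows one way a RUBRIC dictionary could be laid out. Only the name RUBRIC and the eight criteria come from this README; the key names, the weights, the `max_raw` values, and the `total_weight` helper are hypothetical placeholders, not the project's actual values.

```python
# Hypothetical layout for the RUBRIC dictionary. The weights and max_raw
# values below are illustrative placeholders, not the project's real numbers.
RUBRIC = {
    "salutation":          {"weight": 5,  "max_raw": 5},
    "keyword_presence":    {"weight": 30, "max_raw": 1},
    "flow":                {"weight": 5,  "max_raw": 1},
    "speech_rate":         {"weight": 10, "max_raw": 1},
    "grammar":             {"weight": 10, "max_raw": 1},
    "vocabulary_richness": {"weight": 10, "max_raw": 1},
    "clarity":             {"weight": 15, "max_raw": 1},
    "engagement":          {"weight": 15, "max_raw": 1},
}


def total_weight(rubric):
    """Sum the weights of all criteria (handy as a sanity check after edits)."""
    return sum(criterion["weight"] for criterion in rubric.values())


print(total_weight(RUBRIC))
```

Keeping every weight in one dictionary means a rubric change is a data edit rather than a code edit, which is the transparency benefit described above.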

Scoring Formula & Rubric Explanation

For each transcript:

Salutation Level: Classifies the opening phrase as strong, good, normal, or absent, awarding up to 5 points per the rubric, scaled by weight.

Keyword Presence: Awards points for each must-have and good-to-have keyword found.

Flow: Checks the logical order of key details (e.g., salutation → name → details → closing).

Speech Rate: Compares the speaking rate (words per minute) against an ideal range.

Grammar: Uses LanguageTool to compute the error rate per 100 words.

Vocabulary Richness: Uses the type-token ratio (TTR).

Clarity: Counts filler words from a preset list.

Engagement: Uses VADER sentiment analysis.
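Three of the criteria above reduce to simple counting, so they can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the regex tokenizer stands in for whatever tokenizer the notebook uses, and the `FILLER_WORDS` list and all function names are hypothetical.

```python
import re

# Hypothetical filler list; the project's preset list is not shown in this README.
FILLER_WORDS = {"um", "uh", "er", "basically", "actually"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Vocabulary richness: unique tokens divided by total tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def filler_count(text):
    """Clarity: number of tokens that appear in the filler-word list."""
    return sum(1 for token in tokenize(text) if token in FILLER_WORDS)


def words_per_minute(text, duration_seconds):
    """Speech rate: word count scaled to a one-minute window."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(tokenize(text)) * 60.0 / duration_seconds


print(type_token_ratio("the cat the dog"))        # 3 unique / 4 total
print(filler_count("um I am um here"))            # two fillers
print(words_per_minute("one two three four", 2))  # 4 words in 2 seconds
```

Note that TTR tends to fall as transcripts get longer, so any thresholds applied to it are best calibrated on transcripts of similar length.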

Calculation: Each raw score is multiplied by its rubric-assigned weight. The sum of all weighted scores is normalized to a 0–100 range. Output is provided as a JSON-style summary and detailed breakdown.
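The calculation can be illustrated with a small worked example. The README does not spell out the exact normalization, so this sketch assumes the weighted sum is divided by the maximum achievable weighted sum; the function name `normalize_total` and the sample numbers are hypothetical.

```python
def normalize_total(raw_scores, weights, max_raw):
    """Weight each raw score, sum the results, and rescale to 0-100.

    Assumption: normalization divides by the best possible weighted sum.
    """
    weighted = sum(raw_scores[name] * weights[name] for name in weights)
    best = sum(max_raw[name] * weights[name] for name in weights)
    return round(100.0 * weighted / best, 2) if best else 0.0


# Illustrative numbers only: salutation scored 4 of 5, clarity 0.5 of 1.
raw = {"salutation": 4, "clarity": 0.5}
weights = {"salutation": 2, "clarity": 10}
maxima = {"salutation": 5, "clarity": 1}

# Weighted sum = 4*2 + 0.5*10 = 13; best possible = 5*2 + 1*10 = 20.
print(normalize_total(raw, weights, maxima))  # 65.0
```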

Development Steps (In Depth)

Objective/Requirement Analysis: Score speeches in a detailed, explainable way using both rules and NLP.

Rubric Design: All criteria, weights, and scoring rules are built into a RUBRIC Python dictionary for clarity and easy editing.

Library Setup: Installed NLTK, LanguageTool, VADER, and SentenceTransformer for NLP tasks.

Model Initialization: Downloaded NLTK data and initialized the LanguageTool server, the sentence transformer, and the VADER sentiment analyzer.

Scoring Function Implementation: Each rubric category receives a dedicated, explainable function that returns both a score and feedback.

Aggregator Construction: score_transcript() collects the individual scores, multiplies them by their weights, normalizes the total, and produces the output dictionary.

Colab Input/Output Integration: Inputs are gathered via input(); output is printed or formatted as JSON. No web server needed!

Testing & Iteration: Sample cases were tested and the rubric fine-tuned for the desired granularity and objectivity. Errors were debugged and the rubric and logic kept in sync wherever needed.
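The "score plus feedback" pattern from the scoring-function step might look like the sketch below, using the salutation criterion as an example. The phrase tiers, the point values per tier, the 80-character opening window, and the name `score_salutation` are all assumptions made for illustration; the README only states that salutations are graded as strong, good, normal, or absent for up to 5 points.

```python
import json
import re

# Hypothetical tiers: (points, label, phrases). Checked from strongest to weakest.
SALUTATION_TIERS = [
    (5, "strong", ("i am excited to introduce", "it is a pleasure to")),
    (4, "good", ("good morning", "good afternoon", "good evening", "hello everyone")),
    (2, "normal", ("hi", "hello")),
]


def score_salutation(transcript):
    """Return a dict with the salutation score and human-readable feedback."""
    opening = transcript.lower()[:80]  # only inspect the start of the transcript
    for points, label, phrases in SALUTATION_TIERS:
        for phrase in phrases:
            # Word boundaries stop "hi" from matching inside words like "this".
            if re.search(r"\b" + re.escape(phrase) + r"\b", opening):
                return {
                    "criterion": "salutation",
                    "score": points,
                    "feedback": f"{label.capitalize()} salutation detected.",
                }
    return {
        "criterion": "salutation",
        "score": 0,
        "feedback": "No salutation found in the opening.",
    }


result = score_salutation("Good morning everyone, my name is Asha.")
print(json.dumps(result, indent=2))  # JSON-style output, as printed in Colab
```

Returning a plain dictionary from each criterion function keeps the aggregator simple and makes the final report directly serializable with `json.dumps`.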

Dependencies

Python 3.x

Google Colab

nltk, language_tool_python, vaderSentiment, sentence-transformers

Author: Pulkit Tyagi

Date: 24/11/2025
