
hosannaute/Question-Generator


DOCUMENT TO QUESTION GENERATOR - NLP PARSING UTILITY

A Python-based command-line tool that reads document files, sanitizes the extracted text, and applies rule-based logic to automatically generate contextual study questions.

PROJECT OVERVIEW

Developed as a technical showcase for my SIWES engineering portfolio, this application demonstrates backend data processing and text manipulation. Instead of relying on external AI APIs, it uses custom Regular Expressions (Regex) and trigger-word mapping to parse academic or technical text and programmatically generate relevant questions.

KEY FEATURES

Multi-Format File Parsing: Extracts text from both plain .txt files and multi-page .pdf documents, using the pypdf library for PDFs.
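The extraction step might look roughly like the following minimal sketch. `extract_text` is a hypothetical name, not necessarily the function used in document_reader.py. `PdfReader`, `reader.pages`, and `page.extract_text()` are real pypdf APIs; reading .txt files as UTF-8 and treating text-less pages as blank are assumptions.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return the raw text of a .txt or .pdf file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()

    if suffix == ".txt":
        # Assumption: notes are UTF-8; undecodable bytes are replaced
        # instead of raising, so one bad character does not abort the run.
        return Path(path).read_text(encoding="utf-8", errors="replace")

    if suffix == ".pdf":
        # Imported lazily so .txt input still works without pypdf installed.
        from pypdf import PdfReader

        reader = PdfReader(path)
        # extract_text() can come back empty for image-only (scanned) pages.
        pages = [page.extract_text() or "" for page in reader.pages]
        return "\n".join(pages)

    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Dispatching on the file extension keeps the rest of the pipeline unaware of the input format: every later stage only ever sees a plain string.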

Data Sanitization: Cleans the extracted data by collapsing whitespace and stripping page numbers or noisy numeric lines to prevent processing errors.
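A minimal sketch of the sanitization step, assuming that "noisy numeric lines" means lines containing only digits and separators such as `.`, `,`, `:`, `/` or `-` (for example a bare page number). `clean_text` and `NOISE_LINE` are hypothetical names.

```python
import re

# A line counts as noise if, once stripped, it holds only digits and common
# separators (e.g. "12", "3/10", "1.2.3"). Blank lines also match.
NOISE_LINE = re.compile(r"[\d.,:/\-]*")


def clean_text(raw: str) -> str:
    """Drop page-number-style lines and collapse all whitespace to single spaces."""
    kept = [
        line for line in raw.splitlines()
        if not NOISE_LINE.fullmatch(line.strip())
    ]
    # Joining with a space rejoins sentences that were broken across lines.
    return re.sub(r"\s+", " ", " ".join(kept)).strip()
```

For example, `clean_text("Intro line\n  12  \nPhotosynthesis   results in\n\nglucose.\n3")` returns `"Intro line Photosynthesis results in glucose."`. Lines that mix letters and numbers, such as "Page 4", are kept under this rule.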

Rule-Based NLP Logic: Scans each sentence for specific trigger phrases (e.g., "results in", "functions by") and maps each match to a categorized question (What, Explain, How, Why).
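The trigger-word approach can be sketched as a lookup table from phrase to question category and template. Only the phrases "results in" and "functions by" and the four categories come from this README; the other phrases, every template, the `generate_questions` name, and the naive sentence splitter are illustrative assumptions.

```python
import re

# Hypothetical trigger table: phrase -> (category, question template).
TRIGGERS = {
    "is defined as": ("What", "What is {subject}?"),
    "results in": ("Explain", "Explain what {subject} results in."),
    "functions by": ("How", "How does {subject} function?"),
    "is important because": ("Why", "Why is {subject} important?"),
}


def _soften(subject: str) -> str:
    """Lower-case a leading capital ("The heart" -> "the heart") but keep acronyms like "DNA"."""
    if len(subject) > 1 and subject[1].islower():
        return subject[0].lower() + subject[1:]
    return subject


def generate_questions(text: str) -> list:
    """Return one categorized question per sentence that contains a trigger phrase."""
    questions = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        for trigger, (category, template) in TRIGGERS.items():
            idx = lowered.find(trigger)
            # The words before the trigger become the question's subject.
            subject = sentence[:idx].strip() if idx > 0 else ""
            if subject:
                questions.append({
                    "type": category,
                    "question": template.format(subject=_soften(subject)),
                    "source": sentence,
                })
                break  # first matching trigger wins for this sentence
    return questions
```

With this table, "The heart functions by pumping blood." yields "How does the heart function?", while a sentence containing none of the phrases produces no question.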

Dual Data Export: Saves the generated questions into two formats simultaneously: a human-readable .txt file and a structured .json file for database or API integration.
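A sketch of the dual export, assuming each question is a dict with `type` and `question` keys (a hypothetical structure) and a numbered `1. [Type] question` line format for the text file. `export_questions` is a hypothetical name; `json.dump` with `indent` and `ensure_ascii` is standard library.

```python
import json
from pathlib import Path


def export_questions(questions: list, txt_path: str = "questions.txt",
                     json_path: str = "questions.json") -> None:
    """Write the same question list as numbered plain text and as JSON."""
    lines = [f"{i}. [{q['type']}] {q['question']}"
             for i, q in enumerate(questions, start=1)]
    Path(txt_path).write_text("\n".join(lines) + "\n", encoding="utf-8")

    with open(json_path, "w", encoding="utf-8") as fh:
        # indent=2 keeps the file readable; ensure_ascii=False keeps non-ASCII text as-is.
        json.dump(questions, fh, indent=2, ensure_ascii=False)
```

Both files are written from the same in-memory list, so the human-readable and machine-readable outputs cannot drift apart.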

TECHNICAL STACK

Language: Python 3.x

Core Concepts: File Input/Output, Regular Expressions (re), String Manipulation, Algorithm Design, JSON Serialization.

Dependencies: pypdf (for PDF extraction).

INSTALLATION AND USAGE

Install Dependencies

Open your terminal and install the required PDF library:

pip install pypdf

Run the Application

python document_reader.py

How to Use:

Run the script and input the path to your document when prompted.

Alternatively, pass the file path directly in the terminal (e.g., python document_reader.py notes.pdf).

The script then extracts the text, processes the sentences, and writes a questions.txt and a questions.json file to the same folder.
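The two ways of supplying a path (a command-line argument or an interactive prompt) can be handled with a small fallback. `get_input_path` is a hypothetical helper; its `prompt` parameter exists only so the function can be tested without a keyboard, and stripping surrounding quotes is an assumption about how users paste paths.

```python
def get_input_path(argv: list, prompt=input) -> str:
    """Use the first command-line argument if given, otherwise ask interactively."""
    if len(argv) > 1:
        return argv[1]
    # strip() removes stray spaces; strip('"') handles paths pasted with quotes.
    return prompt("Enter the path to your document: ").strip().strip('"')
```

In the script this would be called as `get_input_path(sys.argv)`, so `python document_reader.py notes.pdf` skips the prompt entirely.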

CODE ARCHITECTURE HIGHLIGHTS

This application demonstrates an understanding of clean software architecture. By isolating text extraction, data cleaning, and question generation in separate functions, the codebase follows the Single Responsibility Principle (SRP): each stage has one job, so it can be modified or tested without touching the others.
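One way to picture this separation is a driver that does nothing except chain the stages. The sketch below is illustrative: `run_pipeline` is hypothetical, and passing the stage functions in as arguments (dependency injection) is a design choice made here for testability, not necessarily how document_reader.py is wired.

```python
from typing import Callable, List


def run_pipeline(
    path: str,
    extract: Callable[[str], str],
    clean: Callable[[str], str],
    generate: Callable[[str], List[dict]],
    export: Callable[[List[dict]], None],
) -> List[dict]:
    """Chain the single-purpose stages; each one can be replaced or tested alone."""
    raw = extract(path)         # file I/O only
    text = clean(raw)           # string sanitization only
    questions = generate(text)  # rule-based question logic only
    export(questions)           # output serialization only
    return questions
```

Because each stage is a parameter, a unit test can pass simple stubs (for example a lambda returning fixed text) instead of reading real files.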

Status: SIWES Portfolio Project (2026)

About

A Python NLP utility that extracts text from PDFs and generates study questions
