
A Python desktop application (Tkinter) that analyzes documents paragraph-by-paragraph using a Hugging Face Transformer model to perform text classification and generate structured CSV reports.


sabdulraqeb/python-document_analyzer


python document_analyzer

Document Analysis Reporter (Python GUI)

A desktop application built with Python and Tkinter for analyzing documents by paragraph, using a Hugging Face Transformer model for real-time text classification.

Features

GUI Interface: Simple, cross-platform interface for selecting documents.

Multi-File Support: Reads TXT and MD files directly. PDF and DOCX conversion is only simulated; full support for those formats requires external libraries.

Paragraph Chunking: Splits documents reliably by double newlines (\n\n) to analyze content chunk-by-chunk.

NLP Integration: Uses the transformers library for sentiment analysis, with a graceful fallback to a keyword-based heuristic if the model is unavailable.

Report Generation: Saves the analysis results into structured CSV and human-readable TXT files in a user-selected directory.
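The chunking and fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in document_analyzer.py: the function names, the keyword lists, and the POSITIVE/NEGATIVE/NEUTRAL labels are assumptions made for the example, and the splitter also tolerates whitespace-only lines between paragraphs, which is slightly more lenient than a plain \n\n split.

```python
import re

# Hypothetical keyword lists; the app's real heuristic is not documented here.
POSITIVE_WORDS = {"good", "great", "excellent", "success", "improve"}
NEGATIVE_WORDS = {"bad", "poor", "failure", "risk", "problem"}


def split_paragraphs(text):
    """Split text on blank lines and drop empty chunks."""
    normalized = text.replace("\r\n", "\n")
    chunks = re.split(r"\n\s*\n", normalized)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def keyword_sentiment(paragraph):
    """Fallback heuristic: compare counts of positive and negative keywords."""
    words = re.findall(r"[a-z']+", paragraph.lower())
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    if positive > negative:
        return "POSITIVE"
    if negative > positive:
        return "NEGATIVE"
    return "NEUTRAL"


def load_classifier():
    """Return a paragraph -> label function, preferring the Transformer model."""
    try:
        from transformers import pipeline

        model = pipeline("sentiment-analysis")
    except Exception:
        # transformers/torch missing, or the model could not be loaded.
        return keyword_sentiment
    return lambda paragraph: model(paragraph, truncation=True)[0]["label"]


def analyze(text, classify=keyword_sentiment):
    """Return (paragraph_number, paragraph, label) for each chunk."""
    paragraphs = split_paragraphs(text)
    return [(i, p, classify(p)) for i, p in enumerate(paragraphs, start=1)]
```

Calling analyze(text, classify=load_classifier()) would use the Transformer model when it can be loaded and the keyword heuristic otherwise; the default argument keeps the example usable without any NLP dependencies.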

Installation and Setup

Clone the Repository:

git clone https://github.com/sabdulraqeb/python-document_analyzer.git
cd python-document_analyzer

Create a Virtual Environment (Recommended):

python -m venv venv
source venv/bin/activate # On Windows, use venv\Scripts\activate

Install Dependencies:

pip install -r requirements.txt

Note: Installing torch and transformers is necessary to enable the real NLP model. If you skip this step, the app will automatically use the simulated, keyword-based analysis.
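To check ahead of time which mode the app is likely to start in, something like the following sketch can be run inside the virtual environment. It only tests whether the packages are importable; the helper names are hypothetical, and the app itself may decide differently (for example, if the model download fails).

```python
import importlib.util


def nlp_dependencies_available(required=("torch", "transformers")):
    """Return True only if every required package can be found."""
    return all(importlib.util.find_spec(name) is not None for name in required)


def status_message(model_active):
    """Map the availability check to the status text shown by the app."""
    return "NLP Model Active!" if model_active else "Using Keyword Simulation"


print(status_message(nlp_dependencies_available()))
```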

Usage

Run the main script from your terminal:

python document_analyzer.py

A window will open showing the application status (either "NLP Model Active!" or "Using Keyword Simulation").

Click "Select Document" and choose a .txt, .md, .pdf (simulated), or .docx (simulated) file.

After processing, a second dialog will appear asking you to select the folder where the analysis reports (_report.csv and _report.txt) should be saved.

A final confirmation message will appear once the files are saved successfully.
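The report-saving step could look roughly like the sketch below. It assumes the report names are formed by appending _report.csv and _report.txt to the source file's base name, and that each result is a (paragraph number, text, label) tuple; the function name, column names, and TXT layout are illustrative and may not match the app's actual output.

```python
import csv
from pathlib import Path


def save_reports(source_path, output_dir, rows):
    """Write <stem>_report.csv and <stem>_report.txt; return both paths.

    rows: iterable of (paragraph_number, text, label) tuples.
    """
    stem = Path(source_path).stem
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{stem}_report.csv"
    txt_path = out_dir / f"{stem}_report.txt"
    rows = list(rows)

    # newline="" lets the csv module control line endings itself.
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["paragraph", "label", "text"])
        writer.writerows((number, label, text) for number, text, label in rows)

    # Human-readable version: one short block per paragraph.
    with txt_path.open("w", encoding="utf-8") as handle:
        for number, text, label in rows:
            handle.write(f"Paragraph {number} [{label}]\n{text}\n\n")

    return csv_path, txt_path
```

For a document named notes.md and a chosen folder reports, this would produce reports/notes_report.csv and reports/notes_report.txt.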
