maniparvas/Automation_of_data_preprocessing

Automated Data Analysis and Preprocessing System

A Python-based system that automatically analyzes and preprocesses CSV and Excel datasets.

Features

  • Data Upload: Support for CSV and Excel files
  • Automated Analysis: Detailed insights including statistics, correlations, distributions, missing values, and outliers
  • Preprocessing Pipeline:
    • Missing value handling (drop, fill, interpolation)
    • Categorical encoding (label, one-hot)
    • Scaling and normalization
    • Feature selection and dimensionality reduction
  • Visualizations: Histograms, boxplots, heatmaps, scatter plots
  • Report Generation: PDF/HTML reports with findings and transformations
  • Export: Processed data ready for ML pipelines
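To make the preprocessing steps listed above concrete, here is a minimal sketch of three of them (missing value filling, one-hot encoding, and min-max scaling) written directly against pandas. It is an illustration under stated assumptions, not the code in preprocessing.py: the function name `preprocess`, the median/mode fill strategy, and the sample columns are all hypothetical.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill gaps, scale numeric columns, one-hot encode the rest."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    categorical_cols = out.columns.difference(numeric_cols)

    # Missing values: median for numeric columns, most frequent value otherwise.
    # Assumes every categorical column has at least one non-missing value.
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].median())
    for col in categorical_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])

    # Min-max scaling to [0, 1]; constant columns become 0.0 to avoid dividing by zero.
    for col in numeric_cols:
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0

    # One-hot encoding: each category becomes its own 0/1 indicator column.
    return pd.get_dummies(out, columns=list(categorical_cols), dtype=int)


# Made-up sample data, for illustration only.
raw = pd.DataFrame({
    "age": [20.0, None, 40.0, 30.0],
    "city": ["Paris", "Oslo", None, "Paris"],
})
processed = preprocess(raw)
print(processed)
```

The numeric columns are scaled before encoding so that the generated indicator columns are left as plain 0/1 values.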

Installation

pip install -r requirements.txt

Usage

Interactive Mode (Streamlit)

streamlit run main.py

Programmatic Mode

from data_analyzer import DataAnalyzer

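analyzer = DataAnalyzer()
analyzer.load_data('your_dataset.csv')                  # load the dataset
analyzer.analyze()                                      # run the automated analysis
analyzer.preprocess()                                   # apply the preprocessing pipeline
analyzer.export_processed_data('processed_data.csv')    # save the processed data
analyzer.generate_report('report.html')                 # write the report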
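
This README does not describe what the analysis step computes internally. As a rough illustration of two checks named under Features (missing values and outliers), here is a dependency-free sketch. The helpers `count_missing` and `iqr_outliers`, the 1.5 × IQR rule, and the sample values are assumptions for illustration, not part of this repository.

```python
import statistics


def count_missing(rows):
    """Count missing entries (None or empty string) per column in a list of dict rows."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (value is None or value == "")
    return counts


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up sample data, for illustration only.
rows = [{"a": 1, "b": None}, {"a": "", "b": 2}]
sample = [10, 12, 11, 13, 12, 11, 95]

print(count_missing(rows))
print(iqr_outliers(sample))
```

The IQR rule is only one common outlier heuristic; a z-score threshold would be an equally reasonable choice for roughly normal data.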

Project Structure

├── main.py                 # Main Streamlit interface
├── data_analyzer.py        # Core analysis class
├── data_loader.py          # Data loading utilities
├── preprocessing.py        # Preprocessing pipeline
├── visualizations.py       # Visualization functions
├── report_generator.py     # Report generation
├── utils.py                # Utility functions
└── requirements.txt        # Dependencies
