A comprehensive Python-based system for automated data analysis and preprocessing of CSV/Excel datasets.
- Data Upload: Support for CSV and Excel files
- Automated Analysis: Detailed insights including statistics, correlations, distributions, missing values, and outliers
- Preprocessing Pipeline:
  - Missing value handling (drop, fill, interpolation)
  - Categorical encoding (label, one-hot)
  - Scaling and normalization
  - Feature selection and dimensionality reduction
- Visualizations: Histograms, boxplots, heatmaps, scatter plots
- Report Generation: PDF/HTML reports with findings and transformations
- Export: Processed data ready for ML pipelines
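
To make a few of the listed steps concrete, here is a minimal, dependency-free sketch of mean imputation, IQR-based outlier detection, min-max scaling, and one-hot encoding. It is an illustration only: the function names and sample values are hypothetical and not taken from this project, and the project's own `preprocessing.py` may implement these steps differently (for example with pandas or scikit-learn).

```python
from statistics import mean, quantiles


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Encode each label as a dict of 0/1 indicators, one per category."""
    categories = sorted(set(labels))
    return [{c: int(label == c) for c in categories} for label in labels]


# Hypothetical sample columns, used only for this illustration.
ages = [22, 25, None, 31, 28, 27, 24, 95]
cities = ["Oslo", "Lima", "Oslo", "Pune"]

filled = fill_missing(ages)      # the None entry becomes the column mean
outliers = iqr_outliers(filled)  # flags the unusually large value
scaled = min_max_scale(filled)   # smallest value -> 0.0, largest -> 1.0
encoded = one_hot(cities)        # one indicator per distinct city
```

Filling with the mean is only one of the strategies listed above. Dropping rows or interpolating may suit a given dataset better, and an extreme value such as the 95 here also pulls the imputed mean upward, so outlier handling and imputation interact.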
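
Install the dependencies:

pip install -r requirements.txt

Launch the Streamlit interface:

streamlit run main.py

Or use the analyzer directly from Python:

from data_analyzer import DataAnalyzer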
analyzer = DataAnalyzer()
analyzer.load_data('your_dataset.csv')
analyzer.analyze()
analyzer.preprocess()
analyzer.export_processed_data('processed_data.csv')
analyzer.generate_report('report.html')

Project structure:

├── main.py # Main Streamlit interface
├── data_analyzer.py # Core analysis class
├── data_loader.py # Data loading utilities
├── preprocessing.py # Preprocessing pipeline
├── visualizations.py # Visualization functions
├── report_generator.py # Report generation
├── utils.py # Utility functions
└── requirements.txt # Dependencies
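
As an illustration of how a loader might support both CSV and Excel input, the sketch below dispatches on the file extension. It is not the project's actual `data_loader.py`: `detect_format` and `load_rows` are hypothetical names, and the Excel branch assumes pandas (plus an Excel engine such as openpyxl) is installed, which the listing above does not confirm.

```python
import csv
from pathlib import Path

# Extensions treated as each format (an assumption for this sketch).
CSV_SUFFIXES = {".csv"}
EXCEL_SUFFIXES = {".xls", ".xlsx"}


def detect_format(path):
    """Return 'csv' or 'excel' based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in CSV_SUFFIXES:
        return "csv"
    if suffix in EXCEL_SUFFIXES:
        return "excel"
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")


def load_rows(path):
    """Load a dataset as a list of dicts, one per row."""
    if detect_format(path) == "csv":
        with open(path, newline="", encoding="utf-8") as handle:
            # DictReader uses the first line as column names.
            return list(csv.DictReader(handle))
    # Excel needs a third-party reader; pandas is assumed here and is
    # imported lazily so that the CSV path works without it.
    import pandas as pd
    return pd.read_excel(path).to_dict(orient="records")
```

Note that `csv.DictReader` returns every value as a string, so numeric columns would still need type conversion before analysis. Rejecting unknown extensions early keeps a misnamed upload from failing later with a less helpful parser error.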