eliot-99/DataScope


DataScope - AI-Powered Data Analysis Platform


Transform Your Data Into Actionable Insights

Professional AI-powered data analysis platform for comprehensive data visualization and intelligent recommendations


🚀 Overview

DataScope is a web application that combines statistical computing with artificial intelligence. Upload a CSV or Excel dataset to get an automated statistical analysis, visualizations, and AI-generated insights that support data-driven decisions.

✨ Key Highlights

  • 🤖 AI-Powered Analysis: Leverages Google Gemini AI for intelligent insights and recommendations
  • 📊 Advanced Visualizations: Automatic generation of correlation matrices, distributions, and statistical plots
  • 🔍 Comprehensive Analytics: Deep statistical analysis including outlier detection and data quality assessment
  • 💡 Smart Recommendations: AI-driven suggestions for data cleaning, preprocessing, and ML readiness
  • 🎨 Modern UI/UX: Responsive design with interactive grid backgrounds and professional styling
  • 🔒 Security-First: Temporary file processing with automatic cleanup and no permanent data storage
  • ⚡ Production-Ready: Docker, Heroku, and Railway deployment configurations included

🛠 Technology Stack

Backend Technologies

  • Python 3.10+ - Core runtime environment
  • Flask 2.3.3 - Web framework with modern architecture
  • Pandas 2.1.1 - Data manipulation and analysis
  • NumPy 1.24.3 - Numerical computing
  • Matplotlib 3.7.2 - Statistical visualizations
  • Seaborn 0.12.2 - Advanced statistical graphics
  • Scikit-learn 1.3.0 - Machine learning utilities
  • Google Generative AI 0.3.0 - AI-powered insights
  • Plotly 5.17.0 - Interactive visualizations

Frontend Technologies

  • Modern CSS3 - Custom design system with CSS Grid and Flexbox
  • Vanilla JavaScript - Interactive animations and dynamic grid system
  • Canvas API - Animated background effects
  • Font Awesome 6.5.1 - Professional iconography
  • Progressive Enhancement - Works without JavaScript

Production Infrastructure

  • Gunicorn 21.2.0 - WSGI HTTP Server
  • Docker - Containerized deployment
  • Environment Configuration - Secure settings management
  • Health Check Endpoints - Monitoring and uptime tracking

📋 Features Deep Dive

🔬 Comprehensive Data Analysis

  • Statistical Summaries: Mean, median, mode, standard deviation, quartiles
  • Data Quality Assessment: Missing value analysis, duplicate detection, data type validation
  • Feature Analysis: Unique value counts, cardinality analysis, data distribution patterns
  • Outlier Detection: Statistical outlier identification using IQR and z-score methods
  • Correlation Analysis: Pearson and Spearman correlation matrices with significance testing
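DataScope lists pandas for data manipulation. To show what the missing-value, duplicate and cardinality checks actually compute without depending on it, here is a minimal standard-library sketch. `profile_csv`, `infer_type` and the set of tokens treated as missing are hypothetical and are not taken from app.py.

```python
import csv
import io
from collections import Counter

# Cell values treated as missing (an assumption; pandas' read_csv has its own default list).
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none"}


def is_missing(value: str) -> bool:
    return value.strip().lower() in MISSING_TOKENS


def infer_type(values: list[str]) -> str:
    """Classify a column as 'integer', 'float', 'string' or 'empty' from its non-missing cells."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return "empty"
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for v in present:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"


def profile_csv(text: str) -> dict:
    """Per-column missing counts, cardinality and inferred type, plus the duplicate-row count."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [tuple(r) for r in reader]
    # Short rows are padded with "" so they count as missing instead of raising IndexError.
    columns = {
        name: [row[i] if i < len(row) else "" for row in rows]
        for i, name in enumerate(header)
    }
    # Every repeat of an already-seen row counts as one duplicate.
    duplicate_rows = sum(count - 1 for count in Counter(rows).values())
    return {
        "rows": len(rows),
        "duplicate_rows": duplicate_rows,
        "columns": {
            name: {
                "missing": sum(is_missing(v) for v in values),
                "unique": len({v for v in values if not is_missing(v)}),
                "type": infer_type(values),
            }
            for name, values in columns.items()
        },
    }
```

With pandas, the rough equivalents are `df.isna().sum()`, `df.duplicated().sum()` and `df.nunique()`.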

📈 Advanced Visualizations

  • Distribution Analysis: Histograms with KDE overlays for all numeric columns
  • Correlation Heatmaps: Interactive correlation matrices with statistical significance
  • Box Plots: Outlier visualization and quartile analysis
  • Categorical Analysis: Bar charts for categorical data distribution
  • Missing Data Visualization: Heatmaps showing missing value patterns

🤖 AI-Powered Insights

  • Data Quality Scoring: Automated assessment of dataset quality
  • Cleaning Recommendations: AI-generated suggestions for data preprocessing
  • Pattern Recognition: Automatic detection of trends, seasonality, and anomalies
  • ML Readiness Assessment: Evaluation of data suitability for machine learning
  • Next Steps Guidance: Intelligent recommendations for further analysis
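This README does not document how DataScope phrases its requests to Gemini. One plausible approach, shown here only as an assumption, is to send a compact statistical summary instead of raw rows and ask for each of the insight types listed above. `build_insight_prompt` and the summary shape it expects (`rows`, `duplicate_rows`, and a per-column `columns` mapping) are hypothetical.

```python
import json


def build_insight_prompt(summary: dict, max_columns: int = 50) -> str:
    """Build a Gemini prompt from summary statistics (no raw rows) covering each insight type."""
    # Truncate very wide datasets so the prompt stays a manageable size.
    columns = dict(list(summary.get("columns", {}).items())[:max_columns])
    payload = {**summary, "columns": columns}
    instructions = [
        "A data quality score from 0 to 100 with a one-line justification.",
        "Data cleaning and preprocessing recommendations.",
        "Notable patterns, trends, seasonality or anomalies.",
        "An assessment of machine-learning readiness.",
        "Suggested next steps for further analysis.",
    ]
    numbered = "\n".join(f"{i}. {line}" for i, line in enumerate(instructions, start=1))
    return (
        "You are a data analyst. Using only the dataset summary below, provide:\n"
        f"{numbered}\n\n"
        "Dataset summary (JSON):\n"
        + json.dumps(payload, indent=2, sort_keys=True)
    )
```

Capping the number of columns keeps prompts bounded for wide datasets. Sending summaries instead of rows also limits how much of the uploaded data leaves the server.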

🎨 Modern User Experience

  • Interactive Grid Background: Dynamic background that responds to mouse movements
  • Professional Design System: Consistent color palette and typography
  • Responsive Layout: Optimized for desktop, tablet, and mobile devices
  • Real-time Processing: Progress indicators and smooth loading states
  • Error Handling: Comprehensive error pages and user feedback

πŸ— Project Architecture

DataScope Deployment/
├── 📁 Core Application
│   ├── app.py                     # Flask application with comprehensive routing
│   ├── requirements.txt           # Python dependencies with version pinning
│   └── .env                      # Environment configuration (not in repo)
├── 📁 Frontend Assets
│   ├── static/
│   │   ├── css/
│   │   │   ├── style.css         # Modern design system with CSS variables
│   │   │   ├── dashboard.css     # Dashboard-specific styles
│   │   │   └── dashboard_fixed.css # Fixed dashboard layout styles
│   │   ├── js/
│   │   │   ├── script.js         # Interactive grid system and animations
│   │   │   └── simple_script.js  # Utility functions
│   │   ├── images/
│   │   │   ├── datascope-logo.svg    # Vector logo
│   │   │   ├── datascope-favicon.svg # Favicon
│   │   │   └── FORMAL.png        # Additional branding
│   │   └── plots/                # Generated visualization storage
├── 📁 Templates
│   ├── base.html                 # Base template with navigation
│   ├── index.html                # Landing page with features showcase
│   ├── upload.html               # File upload interface
│   ├── upload_new.html           # Alternative upload interface
│   ├── results_dashboard.html    # Analysis results display
│   ├── about.html                # About page
│   ├── contact.html              # Contact form
│   ├── 404.html                  # 404 error page
│   └── 500.html                  # 500 error page
├── 📁 Deployment Configuration
│   ├── Dockerfile                # Docker containerization
│   ├── Procfile                  # Heroku deployment config
│   ├── runtime.txt               # Python version specification
│   └── .gitignore                # Git ignore rules
└── 📁 Temporary Storage
    └── uploads/                  # Temporary file uploads (auto-cleaned)

🚀 Installation & Setup

Prerequisites

  • Python 3.10+ (specified in runtime.txt)
  • pip package manager
  • Git for repository cloning

Local Development Setup

  1. Clone the Repository

    git clone <your-repository-url>
    cd "DataScope Deployment"
  2. Create Virtual Environment

    python -m venv venv
    
    # Windows
    venv\Scripts\activate
    
    # macOS/Linux
    source venv/bin/activate
  3. Install Dependencies

    pip install -r requirements.txt
  4. Environment Configuration

    # Create .env file with required variables
    GEMINI_API_KEY=your_gemini_api_key_here
    FLASK_SECRET_KEY=your_secure_secret_key
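    FLASK_ENV=development
    # Flask 2.3 no longer reads FLASK_ENV itself; set FLASK_DEBUG=1 to enable debug mode
    # MAX_CONTENT_LENGTH is in bytes: 16777216 = 16 MB (52428800 = 50 MB)
    MAX_CONTENT_LENGTH=16777216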
    UPLOAD_FOLDER=uploads
    SMTP_SERVER=smtp.gmail.com
    SMTP_PORT=587
    SMTP_USERNAME=your_email@gmail.com
    SMTP_PASSWORD=your_app_password
  5. Run Application

    python app.py

    Application will be available at http://localhost:5000
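The variables in the .env example map naturally onto a small settings object. Below is a minimal sketch of reading them with validation and defaults. `Settings` and `load_settings` are hypothetical names, and the default values are assumptions, not DataScope's actual behaviour.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str | None
    secret_key: str
    max_content_length: int  # bytes
    upload_folder: str
    smtp_server: str
    smtp_port: int


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Build Settings from environment variables, failing fast if the secret key is missing."""
    env = os.environ if env is None else env
    secret = env.get("FLASK_SECRET_KEY")
    if not secret:
        raise RuntimeError("FLASK_SECRET_KEY must be set")
    return Settings(
        # An empty or missing key means AI features should degrade gracefully.
        gemini_api_key=env.get("GEMINI_API_KEY") or None,
        secret_key=secret,
        # Assumed defaults; environment values are strings and must be converted.
        max_content_length=int(env.get("MAX_CONTENT_LENGTH", 16 * 1024 * 1024)),
        upload_folder=env.get("UPLOAD_FOLDER", "uploads"),
        smtp_server=env.get("SMTP_SERVER", "smtp.gmail.com"),
        smtp_port=int(env.get("SMTP_PORT", 587)),
    )
```

Flask honours `app.config["MAX_CONTENT_LENGTH"]` and rejects larger request bodies with 413 Request Entity Too Large. Setting `app.config["MAX_CONTENT_LENGTH"] = settings.max_content_length` therefore enforces the upload limit. Running `python app.py` only picks up values from .env if the app loads that file itself (for example with python-dotenv's `load_dotenv()`), because `os.environ` does not read it automatically.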


🌐 Deployment Options

🐳 Docker Deployment

# Build Docker image
docker build -t datascope .

# Run container
docker run -p 5000:5000 --env-file .env datascope

🚀 Heroku Deployment

# Install Heroku CLI and login
heroku login

# Create new Heroku app
heroku create your-datascope-app

# Set environment variables
heroku config:set GEMINI_API_KEY=your_api_key
heroku config:set FLASK_SECRET_KEY=your_secret_key

# Deploy application
git push heroku main

🚄 Railway Deployment

  1. Connect your GitHub repository to Railway
  2. Configure environment variables in Railway dashboard
  3. Push to the connected branch; Railway deploys automatically on each git push

⚙️ Configuration Options

File Upload Settings

  • Maximum File Size: 50 MB, i.e. 52428800 bytes (configurable via MAX_CONTENT_LENGTH, specified in bytes)
  • Supported Formats: CSV (.csv), Excel (.xlsx, .xls)
  • Processing Timeout: 120 seconds
  • Automatic Cleanup: Files deleted after analysis
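The upload rules above (allowed extensions, a size cap, deletion after analysis) fit in a few lines. This is a minimal sketch under stated assumptions: `validate_upload` and `temporary_upload` are hypothetical helpers, and the 50 MB constant mirrors the documented limit. It is not read from app.py.

```python
from __future__ import annotations

import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator

ALLOWED_EXTENSIONS = {".csv", ".xlsx", ".xls"}
MAX_BYTES = 50 * 1024 * 1024  # assumed to mirror the documented 50 MB limit


def validate_upload(filename: str, size: int, max_bytes: int = MAX_BYTES) -> None:
    """Raise ValueError unless the extension is supported and the size is within the limit."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size > max_bytes:
        raise ValueError(f"file too large: {size} bytes (limit {max_bytes})")


@contextmanager
def temporary_upload(data: bytes, suffix: str, folder: str | None = None) -> Iterator[str]:
    """Write data to a fresh temp file, yield its path, and always delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix, dir=folder)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        yield path
    finally:
        # Automatic cleanup: runs whether the analysis succeeded or raised.
        Path(path).unlink(missing_ok=True)
```

Writing to a path from `tempfile.mkstemp` instead of the user-supplied filename avoids path-traversal problems. In Flask, `werkzeug.utils.secure_filename` is the usual alternative when the original name must be kept.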

AI Configuration

  • Provider: Google Gemini AI
  • Model: gemini-pro
  • Rate Limiting: Built-in request throttling
  • Fallback: Graceful degradation if AI unavailable
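This README does not describe how throttling and fallback are implemented. The following is a minimal sketch of both. It assumes the `google-generativeai` package from the stack (`genai.configure` and `genai.GenerativeModel(...).generate_content(...)`) and the `gemini-pro` model named above. Calls are spaced at least `min_interval` seconds apart. Any failure (missing key, missing package, network or quota error, blocked response) returns a fixed message instead of breaking the results page. `InsightClient`, `min_interval` and `FALLBACK_MESSAGE` are hypothetical.

```python
from __future__ import annotations

import os
import threading
import time

FALLBACK_MESSAGE = "AI insights are currently unavailable; the statistical analysis is unaffected."


class InsightClient:
    """Calls Gemini at most once per `min_interval` seconds and falls back on any failure."""

    def __init__(self, api_key: str | None = None, model_name: str = "gemini-pro",
                 min_interval: float = 1.0) -> None:
        self.api_key = api_key if api_key is not None else os.environ.get("GEMINI_API_KEY")
        self.model_name = model_name
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._last_call = float("-inf")

    def _throttle(self) -> None:
        # Client-side throttling: sleep until min_interval has passed since the previous call.
        with self._lock:
            wait = self._last_call + self.min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            self._last_call = time.monotonic()

    def generate(self, prompt: str) -> str:
        if not self.api_key:
            return FALLBACK_MESSAGE  # no key configured: degrade gracefully
        try:
            import google.generativeai as genai  # imported lazily so the app runs without it

            self._throttle()
            genai.configure(api_key=self.api_key)
            response = genai.GenerativeModel(self.model_name).generate_content(prompt)
            return response.text
        except Exception:
            # Missing package, network error, quota exhaustion, blocked response, ...
            return FALLBACK_MESSAGE
```

This throttling is per process. With several Gunicorn workers, each worker keeps its own clock, so a limit shared across workers would need an external store.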

Security Features

  • File Validation: Extension and content type checking
  • Data Sanitization: Automatic data cleaning and validation
  • Session Management: Secure session handling
  • CSRF Protection: Flask has no built-in CSRF protection; forms such as POST /contact need an extension such as Flask-WTF's CSRFProtect
  • No Data Persistence: Files automatically deleted post-analysis
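File extensions are easy to fake, so content-type checking usually means inspecting an upload's first bytes. Here is a minimal sketch that assumes uploads are judged only by their leading bytes. A .xlsx file is a ZIP archive starting with `PK\x03\x04`, although a ZIP signature alone does not prove the archive is a workbook. Legacy .xls files use the OLE2 compound-file signature `D0 CF 11 E0 A1 B1 1A E1`. CSV has no signature, so it is accepted if it decodes as UTF-8 and contains no NUL bytes. That rule is an assumption and would reject, for example, Latin-1 encoded CSVs. `content_matches_extension` is a hypothetical helper.

```python
import codecs
from pathlib import Path

XLSX_MAGIC = b"PK\x03\x04"  # .xlsx is a ZIP container
XLS_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls is an OLE2 compound file


def looks_like_utf8_text(head: bytes) -> bool:
    """True if head has no NUL bytes and is valid UTF-8 (a trailing partial character is allowed)."""
    if b"\x00" in head:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)  # final=False tolerates a character cut off at the end
    except UnicodeDecodeError:
        return False
    return True


def content_matches_extension(filename: str, head: bytes) -> bool:
    """Check an upload's leading bytes against what its extension claims."""
    ext = Path(filename).suffix.lower()
    if ext == ".xlsx":
        return head.startswith(XLSX_MAGIC)
    if ext == ".xls":
        return head.startswith(XLS_MAGIC)
    if ext == ".csv":
        return looks_like_utf8_text(head)  # CSV has no magic number
    return False
```

With a Werkzeug `FileStorage`, read the leading bytes with `file.stream.read(2048)` and then call `file.stream.seek(0)` so the full file can still be saved.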

πŸ“Š Analysis Capabilities

Statistical Analysis

  • Descriptive Statistics: Complete statistical summaries
  • Missing Value Analysis: Comprehensive missing data patterns
  • Data Type Detection: Automatic type inference and validation
  • Outlier Detection: Multiple statistical methods (IQR, Z-score)
  • Correlation Analysis: Pearson and Spearman correlations
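The two outlier methods listed above make different assumptions. Below is a minimal standard-library sketch of each. The IQR rule flags values beyond Tukey's fences, Q1 - 1.5·IQR and Q3 + 1.5·IQR, using linear-interpolation quartiles. The z-score rule flags values more than 3 population standard deviations from the mean. The function names and default thresholds are assumptions, not DataScope's settings.

```python
from statistics import mean, pstdev, quantiles


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates linearly, like the numpy.percentile and pandas quantile defaults.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values: list[float], threshold: float = 3.0) -> list[float]:
    """Values more than `threshold` population standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # constant column: no value stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]
```

The two rules can disagree sharply on small samples. For [10, 12, 11, 13, 12, 11, 10, 95], the IQR rule flags 95 (fences 8.5 and 14.5). However, the z-score of 95 is only about 2.64, so the z-score rule with threshold 3 flags nothing. With the population standard deviation, no value in a sample of n can exceed |z| = sqrt(n - 1), which is about 2.65 for n = 8.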

Visualization Types

  • Distribution Plots: Histograms with kernel density estimation
  • Correlation Heatmaps: Interactive correlation matrices
  • Box Plots: Quartile analysis and outlier visualization
  • Bar Charts: Categorical data frequency analysis
  • Missing Data Heatmaps: Visual missing value patterns

AI Insights

  • Data Quality Scoring: Automated quality assessment
  • Preprocessing Recommendations: AI-suggested data cleaning steps
  • Pattern Detection: Trend and anomaly identification
  • ML Readiness: Machine learning suitability assessment
  • Custom Insights: Context-aware analytical recommendations

🎨 Design System

Color Palette

:root {
  --primary-dark: #28262b;      /* Main background and primary elements */
  --primary-light: #a9a29c;     /* Secondary text and accents */
  --secondary-light: #d5ccc7;   /* Primary text and highlights */
  --neutral-dark: #333333;      /* Cards and surface elements */
}

Typography

  • Font Family: Inter (Google Fonts)
  • Weights: 300, 400, 500, 600, 700, 800, 900
  • Scale: Modular scale from 0.75rem to 3.75rem

Interactive Elements

  • Animated Grid Background: Canvas-based particle system
  • 3D Card Hover Effects: CSS transforms with parallax
  • Smooth Transitions: Easing functions for natural motion
  • Loading Animations: Progress indicators and spinners

🔧 API Endpoints

Core Endpoints

  • GET / - Landing page
  • GET /upload - File upload interface
  • POST /analyze - File analysis (accepts multipart/form-data)
  • GET /results/<timestamp> - Analysis results dashboard
  • POST /chat - AI chat interface
  • GET /export/<timestamp> - Export analysis data

Utility Endpoints

  • GET /api/health - Health check endpoint
  • GET /api/status - Application status
  • GET /about - About page
  • GET /contact - Contact form
  • POST /contact - Contact form submission

Error Handling

  • 404 - Custom 404 error page
  • 500 - Custom 500 error page
  • Graceful error handling with user-friendly messages

🧪 Testing

Manual Testing

# Test file upload functionality
curl -X POST -F "file=@sample.csv" http://localhost:5000/analyze

# Test health endpoint
curl http://localhost:5000/api/health

# Test API status
curl http://localhost:5000/api/status

Supported Test Files

  • CSV files with various data types
  • Excel files (.xlsx, .xls)
  • Files with missing values
  • Large datasets (up to 50MB)
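To exercise the curl upload command above with a file that has mixed types and missing values, a small generator script helps. This is a hypothetical helper, not part of the repository. It writes a seeded, reproducible CSV with an integer id plus numeric, categorical and date-like columns, and blanks roughly `missing_rate` of the non-id cells.

```python
import csv
import random


def write_sample_csv(path: str, rows: int = 200, missing_rate: float = 0.1, seed: int = 42) -> None:
    """Write a mixed-type CSV with some blank cells, for manually testing /analyze."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same file
    cities = ["Paris", "Berlin", "Madrid", "Rome"]
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "age", "income", "city", "signup_date"])
        for i in range(1, rows + 1):
            record = [
                i,
                rng.randint(18, 80),
                round(rng.gauss(50_000, 15_000), 2),
                rng.choice(cities),
                f"2024-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}",
            ]
            # Blank out non-id cells at random so the missing-value analysis has something to find.
            for col in range(1, len(record)):
                if rng.random() < missing_rate:
                    record[col] = ""
            writer.writerow(record)
```

Running `write_sample_csv("sample.csv")` in the project directory produces the sample.csv that the manual curl test expects.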

🔒 Security & Privacy

Data Security

  • Temporary Processing: Files stored only during analysis
  • Automatic Cleanup: All uploads deleted after processing
  • No Permanent Storage: Zero data retention policy
  • Secure File Handling: Validated file types and content
  • Memory Management: Efficient memory usage and cleanup

Privacy Protection

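  • Limited Data Sharing: No third-party data sharing beyond the requests sent to the Google Gemini API to generate AI insights
  • Server-Side Processing: Uploaded data is processed on the application's own servers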
  • Encrypted Transmission: HTTPS for all communications
  • Session Security: Secure session management
  • Environment Variables: Secrets such as API keys and SMTP credentials are read from environment variables, not stored in the repository

🀝 Contributing

Development Setup

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes with proper testing
  4. Commit changes: git commit -m 'Add amazing feature'
  5. Push to branch: git push origin feature/amazing-feature
  6. Open a Pull Request with detailed description

Code Standards

  • Python: Follow PEP 8 style guidelines
  • JavaScript: Use ES6+ features and modern practices
  • CSS: Follow BEM methodology for class naming
  • Documentation: Update README and code comments
  • Testing: Add tests for new functionality

📈 Performance

Optimization Features

  • Efficient Data Processing: Pandas vectorized operations
  • Memory Management: Automatic garbage collection
  • Caching: Static asset caching
  • Compression: Gzip compression for responses
  • CDN Integration: Ready for CDN deployment

Performance Metrics

  • Analysis Speed: < 30 seconds for most datasets
  • Memory Usage: Optimized for large datasets
  • Response Time: < 2 seconds for page loads
  • Uptime: 99.9% availability target

🆘 Support & Troubleshooting

Common Issues

File Upload Errors

  • Ensure file size is under 50MB
  • Verify file format (CSV or Excel)
  • Check file permissions and corruption

Analysis Failures

  • Verify data quality and format
  • Check for extremely large datasets
  • Ensure network connectivity for AI features

Deployment Issues

  • Verify all environment variables are set
  • Check Python version compatibility
  • Ensure all dependencies are installed

Getting Help

  • GitHub Issues: Report bugs and feature requests
  • Contact Form: Use in-app contact form for support
  • Documentation: Check this README for comprehensive info

📝 License

This project is licensed under the MIT License - see the LICENSE file for complete details.


🙏 Acknowledgments

  • Google Gemini AI - For providing advanced AI capabilities
  • Flask Community - For the excellent web framework
  • Open Source Libraries - Pandas, NumPy, Matplotlib, Seaborn, and others
  • Design Inspiration - Modern data visualization platforms
  • Contributors - All developers who contribute to this project

📊 Project Stats

  • Languages: Python, JavaScript, CSS, HTML
  • Framework: Flask with modern architecture
  • AI Integration: Google Gemini API
  • Deployment: Docker, Heroku, Railway ready
  • UI/UX: Modern responsive design
  • Security: Production-grade security features

🚀 Ready to Transform Your Data?

Start analyzing your datasets with DataScope's AI-powered platform

Try DataScope Now

Built with ❤️ for data professionals worldwide

DataScope - Where Data Meets Intelligence
