Course: Programming for Problem Solving using Python
Assignment Title: Data Analysis and Visualization with Real-World Weather Data
Type: Individual Mini Project
Name: Ayush Kumar
Roll No: 2501420003
Programme: BTech CSE (DS)
This project implements a complete weather data analysis and visualization system that covers data acquisition, cleaning, statistical analysis, visualization, and reporting. It addresses the real-world problem of climate awareness and sustainability by providing actionable insights from weather data.
By completing this project, you will:
- ✓ Load and clean real-world CSV datasets with Pandas
- ✓ Compute statistics using NumPy and group-by operations
- ✓ Create informative plots using Matplotlib
- ✓ Apply storytelling techniques to present insights
- ✓ Automate analysis and export summaries in Python
Project-4/
├── main.py                  # Main project script
├── weather_data.csv         # Sample weather dataset
├── README.md                # This file
└── output/                  # Output directory (created on first run)
    ├── 01_temperature_trends.png
    ├── 02_monthly_rainfall.png
    ├── 03_humidity_vs_temperature.png
    ├── weather_analysis_combined.png
    ├── cleaned_weather_data.csv
    ├── aggregated_data_M.csv
    └── WEATHER_ANALYSIS_REPORT.md
- Python 3.7 or higher
- pip (Python package manager)
Install the required packages using:

pip install pandas numpy matplotlib

Or install from the requirements file:

pip install -r requirements.txt

Option A: Use the provided sample data
- The weather_data.csv file is already included with the project
Option B: Use your own data
- Download a weather dataset in CSV format
- The CSV file must contain columns like: Date,Temperature,Rainfall,Humidity,Pressure
- Place the CSV file in the project directory
- Update the csv_file variable in main.py if using a different filename
Run the script:

python main.py

The script will:
- Load and inspect the weather data
- Clean and process the data
- Compute statistical metrics
- Generate visualizations
- Perform grouping and aggregation
- Export results and generate a report
- ✓ Load CSV file into Pandas DataFrame
- ✓ Inspect data structure with head(), info(), describe()
- ✓ Display missing values summary
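
The loading and inspection steps listed above can be sketched as follows. This is a minimal illustration, not the code in main.py: it reads CSV text from an in-memory buffer so it runs without a file on disk, and the third row (with a missing Humidity value) is a hypothetical addition to the two sample rows shown in this README.

```python
import io

import pandas as pd

# Two rows from the sample dataset plus one hypothetical row with a missing
# Humidity value, kept in memory so the example needs no file on disk.
CSV_TEXT = """Date,Temperature,Rainfall,Humidity,Pressure
2024-01-01,15.2,0.0,62,1013.5
2024-01-02,16.1,0.5,65,1012.8
2024-01-03,14.8,1.2,,1011.9
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.head())      # first rows of the dataset
df.info()             # column dtypes and non-null counts
print(df.describe())  # summary statistics for the numeric columns

# Missing values summary: number of NaN entries per column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

To inspect the real dataset, replace the buffer with the path to weather_data.csv.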
- ✓ Handle missing values (mean imputation for numeric, drop for dates)
- ✓ Convert date columns to datetime format
- ✓ Filter and validate relevant columns
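
A minimal sketch of the cleaning rules above, assuming that "drop for dates" means removing rows whose date is missing or cannot be parsed. The raw rows are hypothetical, and the actual logic in clean_and_process_data() may differ in detail.

```python
import io

import pandas as pd

# Hypothetical raw data: one unparseable date and one missing temperature.
RAW = """Date,Temperature,Rainfall,Humidity
2024-01-01,15.0,0.0,60
not-a-date,16.0,0.5,65
2024-01-03,,1.0,70
2024-01-04,17.0,0.0,62
"""

df = pd.read_csv(io.StringIO(RAW))

# Convert the date column; values that cannot be parsed become NaT.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Drop rows whose date is missing or invalid.
cleaned = df.dropna(subset=["Date"]).copy()

# Mean imputation: fill gaps in each numeric column with that column's mean.
numeric_cols = cleaned.select_dtypes(include="number").columns
cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].mean())

print(cleaned)
```

Note that the means are computed after the invalid-date rows are dropped, so discarded rows do not influence the imputed values.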
- ✓ Compute daily, monthly, and yearly statistics
- ✓ Calculate mean, median, std dev, min, max, percentiles
- ✓ Generate comprehensive statistical summaries
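
The metrics listed above map directly onto NumPy functions. The sketch below uses hypothetical temperature values; the choice of the sample standard deviation (ddof=1) is an assumption, since this README does not say which variant compute_statistics() uses. NumPy defaults to ddof=0, while Pandas defaults to ddof=1.

```python
import numpy as np

# Hypothetical daily temperatures in °C.
temperatures = np.array([15.2, 16.1, 14.8, 18.3, 20.0, 17.5, 16.9])

stats = {
    "mean": float(np.mean(temperatures)),
    "median": float(np.median(temperatures)),
    "std": float(np.std(temperatures, ddof=1)),  # sample standard deviation
    "min": float(np.min(temperatures)),
    "max": float(np.max(temperatures)),
    "p25": float(np.percentile(temperatures, 25)),
    "p75": float(np.percentile(temperatures, 75)),
}

for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```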
- ✓ Line Chart: Daily temperature trends
- ✓ Bar Chart: Monthly rainfall totals
- ✓ Scatter Plot: Humidity vs. Temperature correlation
- ✓ Combined Figure: Multiple plots in single view
- ✓ Group data by month/season using groupby()
- ✓ Use resample() for time-series aggregation
- ✓ Calculate aggregate statistics (mean, sum, min, max)
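
A rough sketch of both aggregation styles on hypothetical daily data. The month-end alias accepted by resample() differs between Pandas versions ('M' in older releases, 'ME' from 2.2 onward), so this example groups by calendar month with to_period('M') and uses resample() with a fixed 7-day window instead. The output column names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data covering January and February 2024 (60 days).
dates = pd.date_range("2024-01-01", periods=60, freq="D")
df = pd.DataFrame(
    {
        "Temperature": np.linspace(10.0, 20.0, 60),
        "Rainfall": np.ones(60),  # 1 mm every day keeps the totals easy to check
    },
    index=dates,
)

# groupby(): aggregate per calendar month of the DatetimeIndex.
monthly = df.groupby(df.index.to_period("M")).agg(
    mean_temp=("Temperature", "mean"),
    min_temp=("Temperature", "min"),
    max_temp=("Temperature", "max"),
    total_rain=("Rainfall", "sum"),
)
print(monthly)

# resample(): time-series aggregation into consecutive 7-day windows.
weekly_rain = df["Rainfall"].resample("7D").sum()
print(weekly_rain.head())
```

Rainfall is summed while temperature is averaged, because a monthly rainfall figure is a total and a monthly temperature figure is a typical value.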
- ✓ Export cleaned data to CSV
- ✓ Save all plots as PNG images
- ✓ Generate comprehensive Markdown report
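
The CSV export and Markdown report steps could look roughly like this. It is a sketch under stated assumptions: the DataFrame contents and the report layout are hypothetical, and a temporary directory stands in for the project's output/ folder so the example leaves no files behind in the working directory. Saving plots is not covered here.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical cleaned data.
cleaned = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
        "Temperature": [15.2, 16.1],
        "Rainfall": [0.0, 0.5],
    }
)

# A temporary directory stands in for the project's output/ folder.
output_dir = Path(tempfile.mkdtemp()) / "output"
output_dir.mkdir(parents=True, exist_ok=True)

# Export the cleaned data without the positional index column.
csv_path = output_dir / "cleaned_weather_data.csv"
cleaned.to_csv(csv_path, index=False)

# Build a small Markdown report from computed values.
report_lines = [
    "# Weather Analysis Report",
    "",
    f"- Records: {len(cleaned)}",
    f"- Mean temperature: {cleaned['Temperature'].mean():.2f} °C",
    f"- Total rainfall: {cleaned['Rainfall'].sum():.1f} mm",
]
report_path = output_dir / "WEATHER_ANALYSIS_REPORT.md"
report_path.write_text("\n".join(report_lines) + "\n", encoding="utf-8")

print(csv_path, report_path, sep="\n")
```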
After running the script, the following files will be created in the output/ directory:
- 01_temperature_trends.png - Daily temperature line chart
- 02_monthly_rainfall.png - Monthly rainfall bar chart
- 03_humidity_vs_temperature.png - Humidity-temperature scatter plot
- weather_analysis_combined.png - Combined multi-panel visualization
- cleaned_weather_data.csv - Processed dataset ready for further analysis
- aggregated_data_M.csv - Monthly aggregated statistics
- WEATHER_ANALYSIS_REPORT.md - Comprehensive analysis report with:
  - Executive summary
  - Dataset overview
  - Statistical findings
  - Key insights
  - Sustainability recommendations
  - Methodology
  - Conclusions
class WeatherDataVisualizer:
    """Main class for weather data analysis and visualization"""
    def load_and_inspect_data()    # Task 1
    def clean_and_process_data()   # Task 2
    def compute_statistics()       # Task 3
    def create_visualizations()    # Task 4
    def group_and_aggregate()      # Task 5
    def export_cleaned_data()      # Task 6
    def generate_report()          # Task 6

| Method | Purpose | Task |
|---|---|---|
| load_and_inspect_data() | Load CSV and display structure | 1 |
| clean_and_process_data() | Handle missing values, convert dates | 2 |
| compute_statistics() | Calculate NumPy-based statistics | 3 |
| create_visualizations() | Generate Matplotlib plots | 4 |
| group_and_aggregate() | Time-series grouping and aggregation | 5 |
| export_cleaned_data() | Save processed data to CSV | 6 |
| generate_report() | Create Markdown analysis report | 6 |
The provided weather_data.csv contains:
| Date | Temperature | Rainfall | Humidity | Pressure |
|---|---|---|---|---|
| 2024-01-01 | 15.2 | 0.0 | 62 | 1013.5 |
| 2024-01-02 | 16.1 | 0.5 | 65 | 1012.8 |
| ... | ... | ... | ... | ... |
Columns:
- Date: Date in YYYY-MM-DD format
- Temperature: Daily mean temperature in °C
- Rainfall: Daily precipitation in mm
- Humidity: Relative humidity as percentage (0-100)
- Pressure: Atmospheric pressure in hPa
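
When supplying your own CSV, it can help to check it against the column descriptions above before running the analysis. The helper below is a hypothetical sketch and is not part of main.py: the function name validate_weather_frame, the constant REQUIRED_COLUMNS, and the specific checks are assumptions made for illustration.

```python
import io

import pandas as pd

# Column names taken from the sample dataset described above.
REQUIRED_COLUMNS = ["Date", "Temperature", "Rainfall", "Humidity", "Pressure"]


def validate_weather_frame(df: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # the remaining checks need every column
    if not df["Humidity"].dropna().between(0, 100).all():
        problems.append("Humidity outside 0-100 %")
    if (df["Rainfall"].dropna() < 0).any():
        problems.append("negative Rainfall")
    parsed = pd.to_datetime(df["Date"], format="%Y-%m-%d", errors="coerce")
    if parsed.isna().any():
        problems.append("Date not in YYYY-MM-DD format")
    return problems


good = pd.read_csv(io.StringIO(
    "Date,Temperature,Rainfall,Humidity,Pressure\n"
    "2024-01-01,15.2,0.0,62,1013.5\n"
    "2024-01-02,16.1,0.5,65,1012.8\n"
))
# Hypothetical bad row: wrong date format, negative rain, humidity above 100.
bad = pd.read_csv(io.StringIO(
    "Date,Temperature,Rainfall,Humidity,Pressure\n"
    "01/02/2024,16.1,-0.5,165,1012.8\n"
))

print(validate_weather_frame(good))
print(validate_weather_frame(bad))
```

Returning a list of problems rather than raising on the first one lets all issues in a file be reported in a single pass.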
======================================================================
WEATHER DATA VISUALIZER - MINI PROJECT
======================================================================
======================================================================
TASK 1: DATA ACQUISITION AND LOADING
======================================================================
✓ Successfully loaded weather_data.csv
Dataset Shape: 365 rows, 5 columns
--- First 5 Rows ---
Date Temperature Rainfall Humidity Pressure
0 2024-01-01 15.2 0.0 62 1013.5
...
--- Statistical Summary ---
Temperature Rainfall Humidity Pressure
count 365.000000 365.000000 365.000000 365.000000
mean 25.234658 2.104110 52.972603 1008.921918
...
[Continues with Tasks 2-6...]
✓ ALL TASKS COMPLETED SUCCESSFULLY!
Output files saved to './output/' directory:
• weather_analysis_combined.png
• 01_temperature_trends.png
• 02_monthly_rainfall.png
• 03_humidity_vs_temperature.png
• cleaned_weather_data.csv
• aggregated_data_M.csv
• WEATHER_ANALYSIS_REPORT.md
Edit the method calls in main():

visualizer.clean_and_process_data(
    date_column='YourDateColumn',
    temp_column='YourTempColumn',
    rainfall_column='YourRainfallColumn',
    humidity_column='YourHumidityColumn'
)

Modify the style in WeatherDataVisualizer.__init__():

plt.style.use('seaborn-v0_8-whitegrid')  # Other options: 'ggplot', 'bmh', 'dark_background'
plt.rcParams['figure.figsize'] = (16, 10)  # Change figure size

Change the group_by argument passed to group_and_aggregate():

visualizer.group_and_aggregate(group_by='D')  # 'D'=daily, 'M'=monthly, 'Y'=yearly

Edit the generate_report() method to add custom sections or analysis.
- Shows daily temperature variations
- Highlights seasonal patterns and anomalies
- Includes filled area for visual emphasis
- Bar chart of accumulated rainfall per month
- Identifies wet and dry seasons
- Useful for water management planning
- Scatter plot with trend line
- Shows inverse correlation in many climates
- Includes polynomial regression line
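
A scatter plot with a polynomial trend line, as described above, can be sketched with np.polyfit. The data is synthetic and built to show an inverse relationship; the polynomial degree (2) and the plot styling are assumptions, since this README does not state what create_visualizations() uses.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data with an inverse humidity-temperature relationship.
rng = np.random.default_rng(seed=0)
temperature = np.linspace(10.0, 35.0, 50)
humidity = 90.0 - 1.5 * temperature + rng.normal(0.0, 3.0, size=50)

# Least-squares fit of a degree-2 polynomial (the degree is an assumption).
coeffs = np.polyfit(temperature, humidity, deg=2)
trend = np.poly1d(coeffs)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(temperature, humidity, alpha=0.6, label="Observations")
ax.plot(temperature, trend(temperature), color="red", label="Degree-2 fit")
ax.set_xlabel("Temperature (°C)")
ax.set_ylabel("Humidity (%)")
ax.set_title("Humidity vs. Temperature")
ax.legend()

# Save to a temporary directory instead of the project's output/ folder.
out_path = Path(tempfile.mkdtemp()) / "humidity_vs_temperature.png"
fig.savefig(out_path, dpi=100)
plt.close(fig)

# Pearson correlation coefficient; negative for an inverse relationship.
correlation = np.corrcoef(temperature, humidity)[0, 1]
print(f"Pearson r = {correlation:.2f}")
```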
- 2×2 grid combining all major visualizations
- Comprehensive overview in single image
- Professional presentation format
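
One way to assemble such a 2×2 figure is plt.subplots(2, 2). The sketch below uses synthetic data. The first three panels follow the charts named in this README; the fourth panel (a temperature histogram) is an assumption, because the README does not say what fills the remaining slot.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily data for January through March 2024 (90 days).
rng = np.random.default_rng(seed=1)
dates = pd.date_range("2024-01-01", periods=90, freq="D")
df = pd.DataFrame(
    {
        "Temperature": 15 + 5 * np.sin(np.arange(90) / 14) + rng.normal(0, 1, 90),
        "Rainfall": rng.gamma(1.0, 2.0, 90),
        "Humidity": rng.uniform(40, 90, 90),
    },
    index=dates,
)
monthly_rain = df["Rainfall"].groupby(df.index.to_period("M")).sum()

fig, axes = plt.subplots(2, 2, figsize=(16, 10))

# Top left: daily temperature line with a filled area underneath.
axes[0, 0].plot(df.index, df["Temperature"])
axes[0, 0].fill_between(df.index, df["Temperature"], df["Temperature"].min(), alpha=0.3)
axes[0, 0].set_title("Daily Temperature")

# Top right: monthly rainfall totals.
axes[0, 1].bar([str(p) for p in monthly_rain.index], monthly_rain.values)
axes[0, 1].set_title("Monthly Rainfall")

# Bottom left: humidity against temperature.
axes[1, 0].scatter(df["Temperature"], df["Humidity"], alpha=0.6)
axes[1, 0].set_title("Humidity vs. Temperature")

# Bottom right: temperature distribution (assumed fourth panel).
axes[1, 1].hist(df["Temperature"], bins=15)
axes[1, 1].set_title("Temperature Distribution")

fig.tight_layout()
out_path = Path(tempfile.mkdtemp()) / "weather_analysis_combined.png"
fig.savefig(out_path, dpi=100)
plt.close(fig)
```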
The generated report includes:
- Executive Summary - Overview of analysis
- Dataset Overview - Data dimensions and date range
- Statistical Analysis - Detailed metrics for each variable
- Visualizations - Description of charts
- Key Findings - Important insights
- Data Quality Assessment - Cleaning actions performed
- Sustainability Recommendations - Actionable insights
- Methodology - Tools and techniques used
- Conclusion - Summary and future directions
Solution: Install required packages

pip install pandas numpy matplotlib

Solution: Ensure the CSV file is in the same directory as main.py, or provide the full path
Solution: The script saves plots to files; check the output/ directory
Solution: Ensure date column format is standard (YYYY-MM-DD, MM/DD/YYYY, etc.)
Solution: Update column names in main() to match your CSV file
- Pandas - Data manipulation and analysis
- NumPy - Numerical computations and statistics
- Matplotlib - Data visualization and plotting
- Pathlib - Cross-platform file path handling
- Datetime - Date/time handling
This project demonstrates skills applicable to:
- Environmental monitoring systems
- Climate research and reporting
- Energy management optimization
- Agricultural planning
- Urban sustainability initiatives
- Weather forecasting support
- Insurance risk assessment
This project addresses the following evaluation criteria:
| Criteria | Coverage |
|---|---|
| Data Loading | ✓ CSV loading with Pandas |
| Data Cleaning | ✓ Missing value handling, type conversion |
| Statistical Analysis | ✓ NumPy-based computations |
| Visualization | ✓ Multiple chart types |
| Aggregation | ✓ Groupby and resample operations |
| Export/Reporting | ✓ CSV export and Markdown report |
| Code Quality | ✓ Well-documented, modular design |
| Functionality | ✓ All tasks automated |
This project is provided for educational purposes as part of the Programming for Problem Solving using Python course.
Last Updated: December 2025
Version: 1.0
Status: Complete and Ready for Submission