# Weather Forecasting Time Series - Data Overview

## Project Overview
This notebook provides an initial overview of the weather dataset for time series forecasting. We'll examine the basic structure, quality, and characteristics of the data to understand what we're working with.

## Objectives
1. **Data Loading & Inspection**: Load and understand the basic structure of our weather dataset
2. **Data Quality Assessment**: Identify missing values, data types, and potential issues
3. **Basic Statistics**: Compute descriptive statistics and understand data distributions
4. **Data Preparation**: Prepare the dataset for deeper analysis

---

## 1. Library Imports and Setup

Import the libraries needed for data loading and basic analysis, along with the project's custom utility module.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
import sys
import os
sys.path.append(os.path.join(os.path.dirname(os.getcwd()), '..'))
import src.data.utils as utils

## 2. Data Loading and Initial Inspection

Load the weather dataset and column descriptions to understand the data structure.

In [None]:
path = '../../data/raw/'

raw = utils._load_data(path, 'weather_data.csv')
cols_description = utils._load_data(path, 'column_descriptions.csv')
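
The implementation of `utils._load_data` is not shown in this notebook. As a rough sketch only, assuming it is a thin wrapper around `pd.read_csv` that joins a directory and a file name, a stand-in could look like the following. The function name `load_data` and the demo file are hypothetical.

```python
import os
import tempfile

import pandas as pd


def load_data(path: str, filename: str) -> pd.DataFrame:
    """Hypothetical stand-in for utils._load_data: read one CSV from a directory."""
    full_path = os.path.join(path, filename)
    # Fail early with a readable message instead of a long pandas traceback
    if not os.path.isfile(full_path):
        raise FileNotFoundError(f"No such data file: {full_path}")
    return pd.read_csv(full_path)


# Demo on a throwaway file so the sketch runs without the project data
with tempfile.TemporaryDirectory() as tmp_dir:
    pd.DataFrame({"date": ["2020-01-01 00:00:00"], "T": [1.5]}).to_csv(
        os.path.join(tmp_dir, "demo.csv"), index=False
    )
    demo = load_data(tmp_dir, "demo.csv")
```

Checking for the file before reading makes a wrong relative path (a common issue when a notebook is run from a different working directory) easier to diagnose.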

### 2.1 Dataset Column Descriptions

Understanding what each column represents in our weather dataset:

In [None]:
pd.set_option('display.max_colwidth', 200)
display(cols_description)

### 2.2 Data Preparation and Initial View

Convert the `date` column to datetime and set it as the index for time series analysis:

In [None]:
raw = utils._prepare_data(raw, 'date')
raw.head()
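
For readers without access to `src.data.utils`, the preparation step described above can be approximated as follows. This is a sketch under the assumption that `_prepare_data` parses the date column, sets it as the index, and sorts chronologically; the real helper may do more or less. The name `prepare_data` and the sample frame are hypothetical.

```python
import pandas as pd


def prepare_data(df: pd.DataFrame, date_col: str) -> pd.DataFrame:
    """Hypothetical stand-in for utils._prepare_data."""
    out = df.copy()  # leave the caller's frame untouched
    out[date_col] = pd.to_datetime(out[date_col])
    # A sorted DatetimeIndex is what time-based slicing and resampling expect
    return out.set_index(date_col).sort_index()


# Rows deliberately out of order to show the effect of sort_index()
sample = pd.DataFrame(
    {"date": ["2020-01-01 01:00:00", "2020-01-01 00:00:00"], "T": [2.0, 1.0]}
)
prepared = prepare_data(sample, "date")
```

If the raw timestamps use an ambiguous layout, passing an explicit `format=` to `pd.to_datetime` avoids silent misparsing.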

### 2.3 Basic Dataset Information

Get an overview of the dataset dimensions, column types, and data quality:

In [None]:
basic_info = utils._basic_data_info(raw)
print(f"Dataset Shape: {basic_info['shape']}")
print(f"Number of Columns: {len(basic_info['columns'])}")
print(f"Number of Missing Values: {sum(basic_info['missing_values'].values())}")
print(f"Memory Usage: {basic_info['memory_usage'] / (1024**2):.2f} MB")
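
The cell above reads the keys `shape`, `columns`, `missing_values`, and `memory_usage` from the returned dictionary (and `dtypes` in the next section). A minimal sketch of a function producing that structure, assuming plain pandas calls underneath, might be the following. The name `basic_data_info` and the sample data are hypothetical, and the real helper may compute these values differently.

```python
import numpy as np
import pandas as pd


def basic_data_info(df: pd.DataFrame) -> dict:
    """Hypothetical stand-in for utils._basic_data_info."""
    return {
        "shape": df.shape,
        "columns": df.columns.tolist(),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing_values": df.isna().sum().to_dict(),
        # deep=True also counts the contents of object (string) columns
        "memory_usage": int(df.memory_usage(deep=True).sum()),
    }


sample = pd.DataFrame({"T": [1.0, np.nan, 3.0], "rh": [80.0, 75.0, np.nan]})
info = basic_data_info(sample)
total_missing = sum(info["missing_values"].values())
```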

### 2.4 Data Types and Missing Values

Detailed view of column data types and missing value patterns:

In [None]:
# Create a summary DataFrame
data_summary = pd.DataFrame({
    'Column': basic_info['columns'],
    'Data_Type': [basic_info['dtypes'][col] for col in basic_info['columns']],
    'Missing_Values': [basic_info['missing_values'][col] for col in basic_info['columns']],
    'Missing_Percentage': [basic_info['missing_values'][col] / basic_info['shape'][0] * 100 for col in basic_info['columns']]
})

display(data_summary)

## 3. Basic Statistical Analysis

Generate descriptive statistics for all numerical columns:

In [None]:
describe, correlations = utils._basic_statistics(raw)
display(describe)
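
`utils._basic_statistics` returns two objects, unpacked above as `describe` and `correlations`. Assuming these are the descriptive statistics and the pairwise Pearson correlations of the numeric columns, an equivalent sketch in plain pandas could be the following. The function name `basic_statistics` and the sample values are hypothetical.

```python
import pandas as pd


def basic_statistics(df: pd.DataFrame):
    """Hypothetical stand-in for utils._basic_statistics."""
    # Restrict to numeric columns so corr() does not fail on strings
    numeric = df.select_dtypes(include="number")
    return numeric.describe(), numeric.corr()


sample = pd.DataFrame(
    {
        "T": [1.0, 2.0, 3.0, 4.0],
        "p": [8.0, 6.0, 4.0, 2.0],
        "station": ["a", "b", "c", "d"],  # non-numeric, ignored
    }
)
describe_demo, correlations_demo = basic_statistics(sample)
```

In the toy data `p` falls linearly as `T` rises, so their correlation is -1, which is a convenient sanity check for the helper.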

### 3.1 Time Range Analysis

Understand the temporal coverage of our dataset:

In [None]:
print(f"Dataset time range: {raw.index.min()} to {raw.index.max()}")
print(f"Total time span: {raw.index.max() - raw.index.min()}")
print(f"Number of unique years: {raw.index.year.nunique()}")
print(f"Years covered: {sorted(raw.index.year.unique())}")
# DatetimeIndex.date is a NumPy array without .nunique(); normalize() keeps an Index
print(f"Average measurements per day: {len(raw) / raw.index.normalize().nunique():.1f}")
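
Beyond the overall range, it can help to confirm the sampling interval directly from the index rather than assume it. The sketch below uses a small hypothetical index (not the weather data) to infer the most common step between consecutive timestamps and to list any larger gaps.

```python
import pandas as pd

# Hypothetical index: hourly readings with a hole between 02:00 and 05:00
idx = pd.to_datetime(
    [
        "2020-01-01 00:00",
        "2020-01-01 01:00",
        "2020-01-01 02:00",
        "2020-01-01 05:00",
    ]
)

# Time difference between each timestamp and the one before it
deltas = idx.to_series().diff().dropna()

# The most frequent difference is taken as the nominal sampling interval
typical_step = deltas.mode().iloc[0]

# Any larger difference marks a gap; the index label is where the gap ends
gaps = deltas[deltas > typical_step]
```

On the real dataset, replacing `idx` with `raw.index` would apply the same check, assuming the index is sorted.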

### 3.2 Data Completeness by Year

Check whether each year has a complete set of observations, assuming one measurement per hour:

In [None]:
# Count observations per year
yearly_counts = raw.groupby(raw.index.year).size()
print("Observations per year:")
for year, count in yearly_counts.items():
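    # Full Gregorian leap-year rule (year % 4 alone is wrong for e.g. 1900 and 2100)
    days_in_year = 366 if pd.Timestamp(year=int(year), month=1, day=1).is_leap_year else 365
    expected_hours = days_in_year * 24  # Assumes hourly data covering the whole calendar year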
    completeness = (count / expected_hours) * 100
    print(f"{year}: {count:,} observations ({completeness:.1f}% complete)")
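
Counting rows per calendar year penalizes a first or last year that is only partly covered, and it does not show which timestamps are absent. An alternative sketch, assuming a regular hourly grid, compares the observed index against the expected range between the first and last observation. The toy index here is hypothetical.

```python
import pandas as pd

one_hour = pd.Timedelta(hours=1)

# Hypothetical day of hourly data with the 03:00 and 10:00 readings removed
full_day = pd.date_range("2020-01-01 00:00", "2020-01-01 23:00", freq=one_hour)
observed = full_day.delete([3, 10])

# Every timestamp that should exist between the first and last observation
expected = pd.date_range(observed.min(), observed.max(), freq=one_hour)

# Timestamps in the expected grid that never appear in the data
missing = expected.difference(observed)

completeness = len(observed) / len(expected) * 100
print(f"{len(missing)} missing timestamps, {completeness:.1f}% complete")
```

With the real data, `observed` would be `raw.index`; duplicated timestamps would inflate the ratio, so de-duplicating first is advisable.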

## 4. Summary and Next Steps

### Key Findings from Data Overview:

**Data Quality:**
- Dataset contains comprehensive weather measurements
- Hourly measurements across multiple years
- Minimal missing values identified

**Dataset Characteristics:**
- Multiple weather variables including temperature, humidity, pressure, wind, and solar radiation
- Time series format suitable for forecasting applications
- Good temporal coverage for model training

### Next Steps:

The data appears to be in good condition for analysis. The next notebook will focus on:
1. **Detailed Exploratory Data Analysis**: Deep dive into patterns and relationships
2. **Feature Engineering**: Create time-based and derived features
3. **Pattern Analysis**: Visualize temporal patterns and correlations
4. **Data Preparation**: Prepare features for modeling

---

*This completes the initial data overview. Proceed to `02_data_analysis.ipynb` for detailed exploratory analysis.*