# County-Level Weather Data Integration

## 📊 Data Source

This notebook processes county-level weather data downloaded from **NOAA's Climate at a Glance | County Time Series**:

**Source:** [NOAA NCEI County Time Series](https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/county/time-series/CA-113/pcp/1/0/2000-2025)

## 🎯 What This Notebook Does

1. **Processes manually downloaded weather data** for 43 California counties
2. **Integrates 4 weather parameters** for each county:
   - Average Temperature (°F)
   - Maximum Temperature (°F) 
   - Minimum Temperature (°F)
   - Precipitation (inches)
3. **Cleans and standardizes** the data format
4. **Creates unified CSV files** for each county with all weather features

## 📁 Data Structure

**Raw Data Location:** `../data/raw/weather/`
- Each county has its own folder (e.g., `1-ALAMEDA/`)
- Each folder contains 4 CSV files:
  - `data.csv` - Average Temperature
  - `data (1).csv` - Maximum Temperature
  - `data (2).csv` - Minimum Temperature
  - `data (3).csv` - Precipitation

**Processed Data Location:** `../data/processed/weather/`
- Individual CSV files for each county (e.g., `1-ALAMEDA.csv`)
- Unified format: `County`, `Year`, `Month`, `Avg_Temp`, `Max_Temp`, `Min_Temp`, `Precipitation`
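
The folder layout above can be captured in a small lookup table. The sketch below is illustrative only: `PARAMETER_FILES`, `county_files`, and `county_name` are hypothetical names that do not appear in the notebook, and the raw-data root is passed in as an argument rather than assumed.

```python
from pathlib import Path
from typing import Dict

# Hypothetical mapping from downloaded file name to output column name,
# following the folder layout described above.
PARAMETER_FILES = {
    "data.csv": "Avg_Temp",
    "data (1).csv": "Max_Temp",
    "data (2).csv": "Min_Temp",
    "data (3).csv": "Precipitation",
}


def county_files(raw_root: Path, folder: str) -> Dict[str, Path]:
    """Return {column name: file path} for one county folder such as '1-ALAMEDA'."""
    return {
        column: raw_root / folder / file_name
        for file_name, column in PARAMETER_FILES.items()
    }


def county_name(folder: str) -> str:
    """Strip the numeric prefix: '6-CONTRA COSTA' -> 'CONTRA COSTA'."""
    return folder.split("-", 1)[1]
```

Keeping the mapping in one place means a renamed download only has to be corrected once.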

## 🔧 Processing Steps

1. **Date Parsing:** Converts the 6-digit `YYYYMM` date (e.g. `200001`) into `Year` and `Month` columns
2. **Data Cleaning:** Removes metadata rows and standardizes column names
3. **Integration:** Combines all 4 weather parameters into single DataFrames
4. **Export:** Saves processed data ready for merging with other datasets
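
As a minimal sketch of the date-parsing step, the snippet below splits `YYYYMM` strings by position and cross-checks the result against `pd.to_datetime`. The three date values are made up for illustration and are not taken from the NOAA files.

```python
import pandas as pd

# Illustrative values in the YYYYMM layout described above (not real NOAA rows).
dates = pd.Series(["200001", "200002", "202512"])

# Slice the string: the first four characters are the year, the rest is the month.
year = dates.str[:4].astype(int)
month = dates.str[4:].astype(int)

# Equivalent route through a real datetime, which also rejects malformed input.
parsed = pd.to_datetime(dates, format="%Y%m")
assert (parsed.dt.year == year).all() and (parsed.dt.month == month).all()

print(pd.DataFrame({"Year": year, "Month": month}))
```

String slicing only works while the column is read as text; going through `pd.to_datetime` is a stricter alternative if malformed rows are a concern.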

## 📈 Time Coverage

- **Period:** 2000-2025 (26 years)
- **Frequency:** Monthly data
- **Total Records:** 43 counties × 26 years × 12 months = 13,416 records per parameter
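
The record count above can be checked with a few lines of arithmetic, and the same idea extends to a per-county completeness check. This is a sketch under the assumption that every year in the period has all 12 months; `missing_months` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def missing_months(df: pd.DataFrame, first_year: int = 2000, last_year: int = 2025):
    """Return the (Year, Month) pairs absent from a processed county frame."""
    expected = pd.MultiIndex.from_product(
        [range(first_year, last_year + 1), range(1, 13)], names=["Year", "Month"]
    )
    present = pd.MultiIndex.from_frame(df[["Year", "Month"]])
    return expected.difference(present)


# 26 years x 12 months per county, times 43 counties.
rows_per_county = (2025 - 2000 + 1) * 12
total_rows = 43 * rows_per_county
print(rows_per_county, total_rows)  # 312 13416
```

An empty result from `missing_months` means the county file covers the full period.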


In [1]:
import pandas as pd
import numpy as np
import os
from pathlib import Path

def clean_weather_data(file_path: Path, new_value_col_name: str, county: str) -> pd.DataFrame:
    """
    Clean one NOAA county time-series CSV and return it in tidy form.

    The file is read with the default header=0, so its first line is consumed
    as the header; the first two data rows are treated as metadata and dropped.

    Steps:
    1. Read the file (first line becomes the header).
    2. Rename the two columns to ['Date', 'Value'].
    3. Drop the first two rows (metadata).
    4. Split 'Date' (YYYYMM) into integer 'Year' and 'Month' columns.
    5. Rename 'Value' to new_value_col_name.
    6. Add a 'County' column.

    Args:
        file_path: Path to the input CSV file.
        new_value_col_name: Name for the value column (e.g. 'Avg_Temp', 'Precipitation').
        county: County name written to the 'County' column.

    Returns:
        A DataFrame with columns ['County', 'Year', 'Month', new_value_col_name],
        or an empty DataFrame if the file cannot be read or does not have two columns.
    """
    try:
        # Read the file assuming the default header=0 (first line is consumed as header).
        df = pd.read_csv(str(file_path))
    except FileNotFoundError:
        # Handle case where the file path is incorrect
        print(f"Error: File not found at path: {file_path}")
        return pd.DataFrame()
    except Exception as e:
        # Handle other read errors
        print(f"Error reading file: {e}")
        return pd.DataFrame()

    # --- Cleaning steps ---

    # Rename the columns explicitly (assumes the file structure results in a 2-column DataFrame)
    new_column_names = ['Date', 'Value']
    
    # Check if the number of column labels matches the number of columns
    if len(df.columns) != len(new_column_names):
        print(f"Error: Expected {len(new_column_names)} columns, but found {len(df.columns)}.")
        print("Please check the file structure or the 'new_column_names' list.")
        return pd.DataFrame()

    df.columns = new_column_names

    # Drop the first two rows of the DataFrame (metadata/header text rows).
    # df.iloc[2:] keeps everything from row 2 onward, removing rows 0 and 1.
    df = df.iloc[2:].copy()

    # Reset the index, dropping the old index as a column
    df = df.reset_index(drop=True)

    # Create 'Year' column from the first 4 characters and convert to integer
    df['Year'] = df['Date'].str[:4].astype(int)

    # Create 'Month' column from the last 2 characters and convert to integer
    df['Month'] = df['Date'].str[4:].astype(int)

    # Delete the Date Column:
    df = df.drop(columns=['Date'])

    # Rename 'Value' column to the input name
    df = df.rename(columns={'Value': new_value_col_name})
    
    df['County'] = county

    # Define the desired order of columns
    desired_columns = ['County', 'Year', 'Month', new_value_col_name]

    # Select only the desired columns in the specified order
    df_cleaned = df[desired_columns]

    return df_cleaned
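
One point worth checking is the dtype of the value column. If the metadata rows contain text, pandas will likely read the whole column as strings, and the function above does not convert it. The sketch below uses made-up file contents that only mimic the assumed layout (a header line, two metadata rows, then `YYYYMM,value` rows) and shows an explicit conversion with `pd.to_numeric`.

```python
import io

import pandas as pd

# Made-up file contents that mimic the assumed layout; not real NOAA output.
raw = io.StringIO(
    "Title,Average Temperature\n"
    "Units,Degrees Fahrenheit\n"
    "Note,example metadata row\n"
    "200001,49.9\n"
    "200002,51.0\n"
)

# dtype=str keeps 'Date' as text regardless of what the metadata rows contain.
df = pd.read_csv(raw, dtype=str)
df.columns = ["Date", "Value"]
df = df.iloc[2:].reset_index(drop=True)

# Convert explicitly before any arithmetic; unparseable entries become NaN.
df["Value"] = pd.to_numeric(df["Value"], errors="coerce")
print(df.dtypes)
```

Writing strings to CSV and reading them back yields numbers anyway, so this only matters if the frames are used in memory before export.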

In [2]:
# 1-ALAMEDA County Data Cleaning
path = Path('../data/raw/weather/1-ALAMEDA/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'ALAMEDA')

path = Path('../data/raw/weather/1-ALAMEDA/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'ALAMEDA')
path = Path('../data/raw/weather/1-ALAMEDA/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'ALAMEDA')

path = Path('../data/raw/weather/1-ALAMEDA/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'ALAMEDA')

# ALAMEDA County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/1-ALAMEDA.csv', index=False)

df_avg_temp.head()

Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,ALAMEDA,2000,1,49.9,57.0,42.8,5.94
1,ALAMEDA,2000,2,51.0,57.7,44.3,7.24
2,ALAMEDA,2000,3,53.5,63.1,43.8,1.92
3,ALAMEDA,2000,4,58.0,68.9,47.1,1.09
4,ALAMEDA,2000,5,62.1,73.3,51.0,1.0
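
The cell above combines the four frames by assigning columns, which aligns them on the row index and relies on every file listing the same months in the same order. A key-based merge is one alternative that does not depend on row order. The sketch below is not the notebook's method; `KEYS` and `integrate` are hypothetical names, and the two small frames reuse values from the ALAMEDA output above.

```python
from functools import reduce

import pandas as pd

KEYS = ["County", "Year", "Month"]


def integrate(frames):
    """Merge per-parameter frames on County/Year/Month instead of row position."""
    return reduce(
        lambda left, right: left.merge(
            right, on=KEYS, how="outer", validate="one_to_one"
        ),
        frames,
    )


# Tiny stand-ins for the cleaned frames.
avg = pd.DataFrame(
    {"County": ["ALAMEDA"] * 2, "Year": [2000, 2000], "Month": [1, 2], "Avg_Temp": [49.9, 51.0]}
)
# Deliberately out of order to show that the merge aligns on keys, not position.
pcp = pd.DataFrame(
    {"County": ["ALAMEDA"] * 2, "Year": [2000, 2000], "Month": [2, 1], "Precipitation": [7.24, 5.94]}
)

combined = integrate([avg, pcp]).sort_values(KEYS).reset_index(drop=True)
print(combined)
```

In the notebook, the frames passed to `integrate` would come from `clean_weather_data`, called once per file inside a loop over the county folders, which would replace the per-county cells that follow.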


In [3]:
# 2-ALPINE County Data Cleaning
path = Path('../data/raw/weather/2-ALPINE/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'ALPINE')

path = Path('../data/raw/weather/2-ALPINE/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'ALPINE')
path = Path('../data/raw/weather/2-ALPINE/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'ALPINE')

path = Path('../data/raw/weather/2-ALPINE/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'ALPINE')

# ALPINE County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/2-ALPINE.csv', index=False)

df_avg_temp.head()

Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,ALPINE,2000,1,31.7,38.4,25.0,10.06
1,ALPINE,2000,2,32.7,40.9,24.5,9.57
2,ALPINE,2000,3,35.2,46.4,24.0,1.67
3,ALPINE,2000,4,42.1,54.0,30.1,1.78
4,ALPINE,2000,5,47.7,58.8,36.5,2.35


In [4]:
# 3-AMADOR County Data Cleaning
path = Path('../data/raw/weather/3-AMADOR/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'AMADOR')

path = Path('../data/raw/weather/3-AMADOR/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'AMADOR')
path = Path('../data/raw/weather/3-AMADOR/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'AMADOR')

path = Path('../data/raw/weather/3-AMADOR/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'AMADOR')

# AMADOR County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/3-AMADOR.csv', index=False)

df_avg_temp.head()

Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,AMADOR,2000,1,44.3,51.7,36.9,11.28
1,AMADOR,2000,2,46.2,54.2,38.1,12.87
2,AMADOR,2000,3,49.5,60.6,38.4,2.04
3,AMADOR,2000,4,56.0,68.7,43.2,2.42
4,AMADOR,2000,5,60.7,73.6,47.9,3.5


In [5]:
# 4-BUTTE County Data Cleaning
path = Path('../data/raw/weather/4-BUTTE/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'BUTTE')

path = Path('../data/raw/weather/4-BUTTE/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'BUTTE')
path = Path('../data/raw/weather/4-BUTTE/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'BUTTE')

path = Path('../data/raw/weather/4-BUTTE/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'BUTTE')

# BUTTE County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/4-BUTTE.csv', index=False)

df_avg_temp.head()

Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,BUTTE,2000,1,45.5,53.0,38.0,9.81
1,BUTTE,2000,2,47.6,55.0,40.2,16.67
2,BUTTE,2000,3,51.7,63.0,40.4,4.81
3,BUTTE,2000,4,58.5,71.6,45.4,3.05
4,BUTTE,2000,5,63.4,76.1,50.7,1.76


In [7]:
# 6-CONTRA COSTA County Data Cleaning
path = Path('../data/raw/weather/6-CONTRA COSTA/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'CONTRA COSTA')

path = Path('../data/raw/weather/6-CONTRA COSTA/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'CONTRA COSTA')
path = Path('../data/raw/weather/6-CONTRA COSTA/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'CONTRA COSTA')

path = Path('../data/raw/weather/6-CONTRA COSTA/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'CONTRA COSTA')

# CONTRA COSTA County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/6-CONTRA COSTA.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,CONTRA COSTA,2000,1,49.4,56.5,42.3,6.09
1,CONTRA COSTA,2000,2,51.3,58.1,44.4,7.31
2,CONTRA COSTA,2000,3,54.8,64.8,44.6,1.99
3,CONTRA COSTA,2000,4,59.5,71.0,47.9,1.25
4,CONTRA COSTA,2000,5,63.7,75.6,51.8,0.98


In [8]:
# 7-DEL NORTE County Data Cleaning
path = Path('../data/raw/weather/7-DEL NORTE/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'DEL NORTE')

path = Path('../data/raw/weather/7-DEL NORTE/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'DEL NORTE')
path = Path('../data/raw/weather/7-DEL NORTE/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'DEL NORTE')

path = Path('../data/raw/weather/7-DEL NORTE/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'DEL NORTE')

# DEL NORTE County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/7-DEL NORTE.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,DEL NORTE,2000,1,40.8,46.7,35.0,21.32
1,DEL NORTE,2000,2,44.4,50.8,38.0,13.79
2,DEL NORTE,2000,3,43.8,53.9,33.7,4.85
3,DEL NORTE,2000,4,50.2,60.9,39.5,4.34
4,DEL NORTE,2000,5,53.7,64.6,42.8,4.91


In [9]:
# 8-EL DORADO County Data Cleaning
path = Path('../data/raw/weather/8-EL DORADO/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'EL DORADO')

path = Path('../data/raw/weather/8-EL DORADO/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'EL DORADO')
path = Path('../data/raw/weather/8-EL DORADO/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'EL DORADO')

path = Path('../data/raw/weather/8-EL DORADO/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'EL DORADO')

# EL DORADO County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/8-EL DORADO.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,EL DORADO,2000,1,40.9,48.0,33.8,13.36
1,EL DORADO,2000,2,42.5,49.9,35.0,14.59
2,EL DORADO,2000,3,45.5,56.3,34.6,2.3
3,EL DORADO,2000,4,51.9,64.2,39.7,2.53
4,EL DORADO,2000,5,56.8,68.6,44.8,4.12


In [10]:
# 9-FRESNO County Data Cleaning
path = Path('../data/raw/weather/9-FRESNO/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'FRESNO')

path = Path('../data/raw/weather/9-FRESNO/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'FRESNO')
path = Path('../data/raw/weather/9-FRESNO/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'FRESNO')

path = Path('../data/raw/weather/9-FRESNO/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'FRESNO')

# FRESNO County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/9-FRESNO.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,FRESNO,2000,1,43.0,51.9,34.0,5.2
1,FRESNO,2000,2,44.2,53.6,34.7,7.24
2,FRESNO,2000,3,47.0,58.6,35.4,1.67
3,FRESNO,2000,4,54.3,67.5,41.1,1.79
4,FRESNO,2000,5,61.2,74.9,47.5,0.7


In [11]:
# 10-GLENN County Data Cleaning
path = Path('../data/raw/weather/10-GLENN/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'GLENN')

path = Path('../data/raw/weather/10-GLENN/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'GLENN')
path = Path('../data/raw/weather/10-GLENN/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'GLENN')

path = Path('../data/raw/weather/10-GLENN/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'GLENN')

# GLENN County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/10-GLENN.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,GLENN,2000,1,45.9,54.1,37.6,5.77
1,GLENN,2000,2,48.1,54.9,41.2,8.05
2,GLENN,2000,3,51.9,63.4,40.4,2.59
3,GLENN,2000,4,58.5,71.7,45.3,2.53
4,GLENN,2000,5,63.5,76.4,50.6,2.09


In [12]:
# 11-HUMBOLDT County Data Cleaning
path = Path('../data/raw/weather/11-HUMBOLDT/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'HUMBOLDT')

path = Path('../data/raw/weather/11-HUMBOLDT/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'HUMBOLDT')
path = Path('../data/raw/weather/11-HUMBOLDT/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'HUMBOLDT')

path = Path('../data/raw/weather/11-HUMBOLDT/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'HUMBOLDT')

# HUMBOLDT County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/11-HUMBOLDT.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,HUMBOLDT,2000,1,43.7,49.6,37.7,15.67
1,HUMBOLDT,2000,2,46.5,52.9,40.2,13.7
2,HUMBOLDT,2000,3,46.9,56.8,36.9,3.76
3,HUMBOLDT,2000,4,52.7,63.9,41.6,3.8
4,HUMBOLDT,2000,5,55.4,66.1,44.7,2.64


In [13]:
# 12-IMPERIAL County Data Cleaning
path = Path('../data/raw/weather/12-IMPERIAL/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'IMPERIAL')

path = Path('../data/raw/weather/12-IMPERIAL/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'IMPERIAL')
path = Path('../data/raw/weather/12-IMPERIAL/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'IMPERIAL')

path = Path('../data/raw/weather/12-IMPERIAL/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'IMPERIAL')

# IMPERIAL County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/12-IMPERIAL.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,IMPERIAL,2000,1,59.2,72.0,46.4,0.0
1,IMPERIAL,2000,2,60.8,73.4,48.1,0.22
2,IMPERIAL,2000,3,64.2,77.8,50.6,0.21
3,IMPERIAL,2000,4,74.3,89.9,58.8,0.0
4,IMPERIAL,2000,5,82.3,98.0,66.6,0.02


In [14]:
# 13-INYO County Data Cleaning
path = Path('../data/raw/weather/13-INYO/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'INYO')

path = Path('../data/raw/weather/13-INYO/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'INYO')
path = Path('../data/raw/weather/13-INYO/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'INYO')

path = Path('../data/raw/weather/13-INYO/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'INYO')

# INYO County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/13-INYO.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,INYO,2000,1,45.0,55.6,34.5,0.52
1,INYO,2000,2,46.1,56.4,35.8,1.77
2,INYO,2000,3,51.0,63.3,38.7,0.86
3,INYO,2000,4,61.4,75.2,47.6,0.22
4,INYO,2000,5,70.5,84.8,56.3,0.07


In [15]:
# 14-KERN County Data Cleaning
path = Path('../data/raw/weather/14-KERN/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'KERN')

path = Path('../data/raw/weather/14-KERN/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'KERN')
path = Path('../data/raw/weather/14-KERN/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'KERN')

path = Path('../data/raw/weather/14-KERN/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'KERN')

# KERN County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/14-KERN.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,KERN,2000,1,48.8,59.0,38.5,1.19
1,KERN,2000,2,50.4,60.0,40.8,2.94
2,KERN,2000,3,53.1,64.7,41.6,1.51
3,KERN,2000,4,61.1,75.0,47.3,0.74
4,KERN,2000,5,68.5,82.5,54.5,0.17


In [16]:
# 15-LAKE County Data Cleaning
path = Path('../data/raw/weather/15-LAKE/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'LAKE')

path = Path('../data/raw/weather/15-LAKE/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'LAKE')
path = Path('../data/raw/weather/15-LAKE/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'LAKE')

path = Path('../data/raw/weather/15-LAKE/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'LAKE')

# LAKE County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/15-LAKE.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,LAKE,2000,1,44.0,50.9,37.2,9.42
1,LAKE,2000,2,45.5,51.1,40.0,14.08
2,LAKE,2000,3,49.2,60.2,38.3,3.53
3,LAKE,2000,4,54.4,66.8,41.9,3.52
4,LAKE,2000,5,59.2,71.6,46.8,2.97


In [17]:
# 16-LOS ANGELES County Data Cleaning
path = Path('../data/raw/weather/16-LOS ANGELES/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'LOS ANGELES')

path = Path('../data/raw/weather/16-LOS ANGELES/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'LOS ANGELES')
path = Path('../data/raw/weather/16-LOS ANGELES/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'LOS ANGELES')

path = Path('../data/raw/weather/16-LOS ANGELES/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'LOS ANGELES')

# LOS ANGELES County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/16-LOS ANGELES.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,LOS ANGELES,2000,1,52.3,62.4,42.2,1.01
1,LOS ANGELES,2000,2,50.6,59.7,41.5,5.79
2,LOS ANGELES,2000,3,53.8,65.1,42.6,2.12
3,LOS ANGELES,2000,4,60.6,73.2,48.0,2.03
4,LOS ANGELES,2000,5,66.0,78.4,53.6,0.05


In [18]:
# 17-MADERA County Data Cleaning
path = Path('../data/raw/weather/17-MADERA/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'MADERA')

path = Path('../data/raw/weather/17-MADERA/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'MADERA')
path = Path('../data/raw/weather/17-MADERA/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'MADERA')

path = Path('../data/raw/weather/17-MADERA/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'MADERA')

# MADERA County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/17-MADERA.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,MADERA,2000,1,43.4,52.3,34.6,6.98
1,MADERA,2000,2,45.2,54.6,35.7,9.17
2,MADERA,2000,3,47.9,59.7,36.1,1.81
3,MADERA,2000,4,54.8,68.6,41.0,2.02
4,MADERA,2000,5,61.4,75.5,47.3,0.78


In [19]:
# 18-MARIN County Data Cleaning
path = Path('../data/raw/weather/18-MARIN/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'MARIN')

path = Path('../data/raw/weather/18-MARIN/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'MARIN')
path = Path('../data/raw/weather/18-MARIN/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'MARIN')

path = Path('../data/raw/weather/18-MARIN/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'MARIN')

# MARIN County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/18-MARIN.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,MARIN,2000,1,50.0,55.3,44.6,8.33
1,MARIN,2000,2,51.0,55.6,46.4,13.28
2,MARIN,2000,3,52.1,60.0,44.2,2.9
3,MARIN,2000,4,55.1,63.4,46.8,2.4
4,MARIN,2000,5,57.2,65.4,48.9,1.7


In [20]:
# 19-MARIPOSA County Data Cleaning
path = Path('../data/raw/weather/19-MARIPOSA/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'MARIPOSA')

path = Path('../data/raw/weather/19-MARIPOSA/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'MARIPOSA')
path = Path('../data/raw/weather/19-MARIPOSA/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'MARIPOSA')

path = Path('../data/raw/weather/19-MARIPOSA/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'MARIPOSA')

# MARIPOSA County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/19-MARIPOSA.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,MARIPOSA,2000,1,42.2,50.6,33.8,10.33
1,MARIPOSA,2000,2,43.0,52.5,33.4,12.19
2,MARIPOSA,2000,3,46.7,58.4,35.0,2.42
3,MARIPOSA,2000,4,53.2,66.4,40.0,3.04
4,MARIPOSA,2000,5,59.6,73.2,46.0,1.7


In [21]:
# 20-MENDOCINO County Data Cleaning
path = Path('../data/raw/weather/20-MENDOCINO/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'MENDOCINO')

path = Path('../data/raw/weather/20-MENDOCINO/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'MENDOCINO')
path = Path('../data/raw/weather/20-MENDOCINO/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'MENDOCINO')

path = Path('../data/raw/weather/20-MENDOCINO/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'MENDOCINO')

# MENDOCINO County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/20-MENDOCINO.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,MENDOCINO,2000,1,45.0,50.9,39.1,11.62
1,MENDOCINO,2000,2,47.3,53.2,41.4,15.23
2,MENDOCINO,2000,3,48.6,59.5,37.6,3.1
3,MENDOCINO,2000,4,53.2,64.8,41.5,3.25
4,MENDOCINO,2000,5,56.5,67.6,45.3,2.25


In [22]:
# 21-NEVADA County Data Cleaning
path = Path('../data/raw/weather/21-NEVADA/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'NEVADA')

path = Path('../data/raw/weather/21-NEVADA/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'NEVADA')
path = Path('../data/raw/weather/21-NEVADA/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'NEVADA')

path = Path('../data/raw/weather/21-NEVADA/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'NEVADA')

# NEVADA County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/21-NEVADA.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,NEVADA,2000,1,38.8,45.7,31.8,14.31
1,NEVADA,2000,2,40.2,47.2,33.2,18.25
2,NEVADA,2000,3,43.9,54.8,33.0,3.54
3,NEVADA,2000,4,50.2,62.6,37.9,2.83
4,NEVADA,2000,5,55.5,67.4,43.6,3.11


In [23]:
# 22-PLACER County Data Cleaning
path = Path('../data/raw/weather/22-PLACER/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'PLACER')

path = Path('../data/raw/weather/22-PLACER/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'PLACER')
path = Path('../data/raw/weather/22-PLACER/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'PLACER')

path = Path('../data/raw/weather/22-PLACER/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'PLACER')

# PLACER County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/22-PLACER.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,PLACER,2000,1,40.1,47.4,32.9,13.47
1,PLACER,2000,2,42.1,49.4,34.8,16.02
2,PLACER,2000,3,45.6,56.5,34.6,2.67
3,PLACER,2000,4,52.0,64.4,39.5,2.55
4,PLACER,2000,5,57.0,69.1,44.9,3.4


In [24]:
# 23-PLUMAS County Data Cleaning
path = Path('../data/raw/weather/23-PLUMAS/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'PLUMAS')

path = Path('../data/raw/weather/23-PLUMAS/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'PLUMAS')
path = Path('../data/raw/weather/23-PLUMAS/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'PLUMAS')

path = Path('../data/raw/weather/23-PLUMAS/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'PLUMAS')

# PLUMAS County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/23-PLUMAS.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,PLUMAS,2000,1,34.3,41.9,26.6,11.8
1,PLUMAS,2000,2,35.8,42.3,29.3,13.26
2,PLUMAS,2000,3,39.5,50.9,28.2,2.66
3,PLUMAS,2000,4,46.1,59.2,32.8,2.91
4,PLUMAS,2000,5,51.1,64.3,37.9,1.84


In [25]:
# 24-RIVERSIDE County Data Cleaning
path = Path('../data/raw/weather/24-RIVERSIDE/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'RIVERSIDE')

path = Path('../data/raw/weather/24-RIVERSIDE/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'RIVERSIDE')
path = Path('../data/raw/weather/24-RIVERSIDE/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'RIVERSIDE')

path = Path('../data/raw/weather/24-RIVERSIDE/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'RIVERSIDE')

# RIVERSIDE County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/24-RIVERSIDE.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,RIVERSIDE,2000,1,55.3,67.2,43.3,0.2
1,RIVERSIDE,2000,2,55.3,66.6,44.1,1.81
2,RIVERSIDE,2000,3,58.8,71.1,46.5,0.86
3,RIVERSIDE,2000,4,68.5,83.0,54.0,0.37
4,RIVERSIDE,2000,5,76.1,90.9,61.2,0.02


In [26]:
# 25-SACRAMENTO County Data Cleaning
path = Path('../data/raw/weather/25-SACRAMENTO/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'SACRAMENTO')

path = Path('../data/raw/weather/25-SACRAMENTO/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'SACRAMENTO')
path = Path('../data/raw/weather/25-SACRAMENTO/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'SACRAMENTO')

path = Path('../data/raw/weather/25-SACRAMENTO/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'SACRAMENTO')

# SACRAMENTO County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/25-SACRAMENTO.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,SACRAMENTO,2000,1,48.9,57.1,40.8,6.23
1,SACRAMENTO,2000,2,52.2,59.8,44.5,8.62
2,SACRAMENTO,2000,3,55.7,67.2,44.3,1.83
3,SACRAMENTO,2000,4,61.9,75.4,48.3,1.65
4,SACRAMENTO,2000,5,66.8,80.3,53.4,1.48


In [27]:
# 26-SAN BERNARDINO County Data Cleaning
path = Path('../data/raw/weather/26-SAN BERNARDINO/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'SAN BERNARDINO')

path = Path('../data/raw/weather/26-SAN BERNARDINO/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'SAN BERNARDINO')
path = Path('../data/raw/weather/26-SAN BERNARDINO/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'SAN BERNARDINO')

path = Path('../data/raw/weather/26-SAN BERNARDINO/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'SAN BERNARDINO')

# SAN BERNARDINO County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/26-SAN BERNARDINO.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,SAN BERNARDINO,2000,1,51.8,62.6,41.1,0.15
1,SAN BERNARDINO,2000,2,52.6,63.2,42.1,1.68
2,SAN BERNARDINO,2000,3,56.9,68.8,45.1,0.63
3,SAN BERNARDINO,2000,4,67.6,81.5,53.6,0.28
4,SAN BERNARDINO,2000,5,76.1,90.2,62.0,0.01


In [28]:
# 27-SAN DIEGO County Data Cleaning
path = Path('../data/raw/weather/27-SAN DIEGO/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'SAN DIEGO')

path = Path('../data/raw/weather/27-SAN DIEGO/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'SAN DIEGO')
path = Path('../data/raw/weather/27-SAN DIEGO/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'SAN DIEGO')

path = Path('../data/raw/weather/27-SAN DIEGO/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'SAN DIEGO')

# SAN DIEGO County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/27-SAN DIEGO.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,SAN DIEGO,2000,1,54.3,66.7,41.8,0.56
1,SAN DIEGO,2000,2,53.6,64.1,43.0,3.87
2,SAN DIEGO,2000,3,54.4,66.7,42.0,1.34
3,SAN DIEGO,2000,4,61.4,75.5,47.3,0.78
4,SAN DIEGO,2000,5,67.0,80.9,53.2,0.06


In [29]:
# 28-SAN JOAQUIN County Data Cleaning
path = Path('../data/raw/weather/28-SAN JOAQUIN/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'SAN JOAQUIN')

path = Path('../data/raw/weather/28-SAN JOAQUIN/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'SAN JOAQUIN')
path = Path('../data/raw/weather/28-SAN JOAQUIN/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'SAN JOAQUIN')

path = Path('../data/raw/weather/28-SAN JOAQUIN/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'SAN JOAQUIN')

# SAN JOAQUIN County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/28-SAN JOAQUIN.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,SAN JOAQUIN,2000,1,49.8,58.4,41.3,4.57
1,SAN JOAQUIN,2000,2,52.5,60.6,44.5,6.26
2,SAN JOAQUIN,2000,3,55.5,66.9,44.1,1.02
3,SAN JOAQUIN,2000,4,61.4,75.1,47.8,1.21
4,SAN JOAQUIN,2000,5,66.7,80.4,53.0,0.81


In [30]:
# 29-SAN LUIS OBISPO County Data Cleaning
path = Path('../data/raw/weather/29-SAN LUIS OBISPO/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'SAN LUIS OBISPO')

path = Path('../data/raw/weather/29-SAN LUIS OBISPO/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'SAN LUIS OBISPO')
path = Path('../data/raw/weather/29-SAN LUIS OBISPO/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'SAN LUIS OBISPO')

path = Path('../data/raw/weather/29-SAN LUIS OBISPO/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'SAN LUIS OBISPO')

# SAN LUIS OBISPO County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/29-SAN LUIS OBISPO.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,SAN LUIS OBISPO,2000,1,50.1,60.3,39.9,3.35
1,SAN LUIS OBISPO,2000,2,50.7,59.7,41.7,8.29
2,SAN LUIS OBISPO,2000,3,51.8,63.5,40.0,1.61
3,SAN LUIS OBISPO,2000,4,57.7,71.3,44.1,2.56
4,SAN LUIS OBISPO,2000,5,63.2,78.1,48.3,0.13


In [31]:
# 30-SAN MATEO County Data Cleaning
path = Path('../data/raw/weather/30-SAN MATEO/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'SAN MATEO')

path = Path('../data/raw/weather/30-SAN MATEO/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'SAN MATEO')
path = Path('../data/raw/weather/30-SAN MATEO/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'SAN MATEO')

path = Path('../data/raw/weather/30-SAN MATEO/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'SAN MATEO')

# SAN MATEO County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/30-SAN MATEO.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,SAN MATEO,2000,1,50.5,56.9,44.1,8.91
1,SAN MATEO,2000,2,51.6,57.5,45.7,12.07
2,SAN MATEO,2000,3,52.4,61.5,43.3,2.61
3,SAN MATEO,2000,4,55.7,65.3,46.1,2.21
4,SAN MATEO,2000,5,58.3,67.8,48.8,1.4


In [32]:
# 31-SANTA CLARA County Data Cleaning
path = Path('../data/raw/weather/31-SANTA CLARA/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'SANTA CLARA')

path = Path('../data/raw/weather/31-SANTA CLARA/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'SANTA CLARA')
path = Path('../data/raw/weather/31-SANTA CLARA/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'SANTA CLARA')

path = Path('../data/raw/weather/31-SANTA CLARA/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'SANTA CLARA')

# SANTA CLARA County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/31-SANTA CLARA.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,SANTA CLARA,2000,1,48.8,56.2,41.5,7.5
1,SANTA CLARA,2000,2,49.8,57.2,42.4,8.64
2,SANTA CLARA,2000,3,51.6,62.0,41.2,2.38
3,SANTA CLARA,2000,4,56.8,68.3,45.2,1.24
4,SANTA CLARA,2000,5,61.2,73.6,48.9,0.81


In [33]:
# 32-SANTA CRUZ County Data Cleaning
path = Path('../data/raw/weather/32-SANTA CRUZ/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'SANTA CRUZ')

path = Path('../data/raw/weather/32-SANTA CRUZ/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'SANTA CRUZ')
path = Path('../data/raw/weather/32-SANTA CRUZ/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'SANTA CRUZ')

path = Path('../data/raw/weather/32-SANTA CRUZ/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'SANTA CRUZ')

# SANTA CRUZ County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/32-SANTA CRUZ.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,SANTA CRUZ,2000,1,49.9,57.2,42.6,12.26
1,SANTA CRUZ,2000,2,51.4,58.3,44.5,13.77
2,SANTA CRUZ,2000,3,52.2,62.8,41.6,2.83
3,SANTA CRUZ,2000,4,55.8,67.0,44.6,2.07
4,SANTA CRUZ,2000,5,59.5,71.3,47.7,1.13


In [34]:
# 33-SHASTA County Data Cleaning
path = Path('../data/raw/weather/33-SHASTA/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'SHASTA')

path = Path('../data/raw/weather/33-SHASTA/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'SHASTA')
path = Path('../data/raw/weather/33-SHASTA/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'SHASTA')

path = Path('../data/raw/weather/33-SHASTA/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'SHASTA')

# SHASTA County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/33-SHASTA.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,SHASTA,2000,1,40.0,47.1,32.8,10.91
1,SHASTA,2000,2,42.3,49.3,35.3,13.0
2,SHASTA,2000,3,46.6,57.9,35.3,4.1
3,SHASTA,2000,4,53.9,66.7,41.1,4.0
4,SHASTA,2000,5,58.4,71.6,45.1,1.45


In [35]:
# 34-SISKIYOU County Data Cleaning
path = Path('../data/raw/weather/34-SISKIYOU/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'SISKIYOU')

path = Path('../data/raw/weather/34-SISKIYOU/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'SISKIYOU')
path = Path('../data/raw/weather/34-SISKIYOU/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'SISKIYOU')

path = Path('../data/raw/weather/34-SISKIYOU/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'SISKIYOU')

# SISKIYOU County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/34-SISKIYOU.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,SISKIYOU,2000,1,34.7,41.0,28.3,10.72
1,SISKIYOU,2000,2,37.8,44.6,30.9,8.2
2,SISKIYOU,2000,3,39.8,50.8,28.9,2.69
3,SISKIYOU,2000,4,47.5,60.0,35.0,4.25
4,SISKIYOU,2000,5,51.0,63.6,38.4,1.43


In [36]:
# 35-SOLANO County Data Cleaning
path = Path('../data/raw/weather/35-SOLANO/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'SOLANO')

path = Path('../data/raw/weather/35-SOLANO/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'SOLANO')
path = Path('../data/raw/weather/35-SOLANO/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'SOLANO')

path = Path('../data/raw/weather/35-SOLANO/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'SOLANO')

# SOLANO County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/35-SOLANO.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,SOLANO,2000,1,49.5,57.5,41.5,5.57
1,SOLANO,2000,2,51.5,58.9,44.1,8.29
2,SOLANO,2000,3,55.4,66.5,44.3,2.12
3,SOLANO,2000,4,60.9,73.7,48.1,1.34
4,SOLANO,2000,5,65.8,78.6,52.9,1.0


In [37]:
# 36-SONOMA County Data Cleaning
path = Path('../data/raw/weather/36-SONOMA/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'SONOMA')

path = Path('../data/raw/weather/36-SONOMA/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'SONOMA')
path = Path('../data/raw/weather/36-SONOMA/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'SONOMA')

path = Path('../data/raw/weather/36-SONOMA/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'SONOMA')

# SONOMA County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/36-SONOMA.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,SONOMA,2000,1,49.0,55.9,42.0,8.33
1,SONOMA,2000,2,50.3,56.5,44.2,15.29
2,SONOMA,2000,3,52.1,62.7,41.5,3.28
3,SONOMA,2000,4,55.8,67.0,44.5,3.1
4,SONOMA,2000,5,59.3,71.0,47.7,1.86


In [38]:
# 37-STANISLAUS County Data Cleaning
path = Path('../data/raw/weather/37-STANISLAUS/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'STANISLAUS')

path = Path('../data/raw/weather/37-STANISLAUS/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'STANISLAUS')
path = Path('../data/raw/weather/37-STANISLAUS/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'STANISLAUS')

path = Path('../data/raw/weather/37-STANISLAUS/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'STANISLAUS')

# STANISLAUS County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/37-STANISLAUS.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,STANISLAUS,2000,1,49.3,57.8,40.9,4.3
1,STANISLAUS,2000,2,51.7,59.8,43.5,5.87
2,STANISLAUS,2000,3,54.1,65.2,43.1,1.06
3,STANISLAUS,2000,4,60.7,73.8,47.6,1.48
4,STANISLAUS,2000,5,66.2,79.8,52.6,0.83


In [39]:
# 38-TEHAMA County Data Cleaning
path = Path('../data/raw/weather/38-TEHAMA/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'TEHAMA')

path = Path('../data/raw/weather/38-TEHAMA/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'TEHAMA')
path = Path('../data/raw/weather/38-TEHAMA/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'TEHAMA')

path = Path('../data/raw/weather/38-TEHAMA/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'TEHAMA')

# TEHAMA County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/38-TEHAMA.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,TEHAMA,2000,1,43.6,50.9,36.3,8.25
1,TEHAMA,2000,2,45.6,52.4,38.9,11.37
2,TEHAMA,2000,3,50.0,60.8,39.3,3.38
3,TEHAMA,2000,4,56.6,68.7,44.4,3.36
4,TEHAMA,2000,5,61.2,73.5,48.8,1.75


In [40]:
# 39-TRINITY County Data Cleaning
path = Path('../data/raw/weather/39-TRINITY/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'TRINITY')

path = Path('../data/raw/weather/39-TRINITY/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'TRINITY')
path = Path('../data/raw/weather/39-TRINITY/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'TRINITY')

path = Path('../data/raw/weather/39-TRINITY/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'TRINITY')

# TRINITY County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/39-TRINITY.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,TRINITY,2000,1,39.5,45.6,33.5,14.47
1,TRINITY,2000,2,41.2,47.5,35.0,16.34
2,TRINITY,2000,3,43.8,55.0,32.6,4.22
3,TRINITY,2000,4,50.5,63.5,37.4,4.98
4,TRINITY,2000,5,54.1,66.7,41.4,2.69


In [41]:
# 40-TULARE County Data Cleaning
path = Path('../data/raw/weather/40-TULARE/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'TULARE')

path = Path('../data/raw/weather/40-TULARE/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'TULARE')
path = Path('../data/raw/weather/40-TULARE/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'TULARE')

path = Path('../data/raw/weather/40-TULARE/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'TULARE')

# TULARE County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/40-TULARE.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,TULARE,2000,1,41.8,51.3,32.2,4.27
1,TULARE,2000,2,43.0,52.7,33.4,8.49
2,TULARE,2000,3,45.5,56.7,34.3,3.1
3,TULARE,2000,4,53.1,66.0,40.1,1.7
4,TULARE,2000,5,60.0,73.2,46.7,1.02


In [42]:
# 41-TUOLUMNE County Data Cleaning
path = Path('../data/raw/weather/41-TUOLUMNE/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'TUOLUMNE')

path = Path('../data/raw/weather/41-TUOLUMNE/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'TUOLUMNE')
path = Path('../data/raw/weather/41-TUOLUMNE/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'TUOLUMNE')

path = Path('../data/raw/weather/41-TUOLUMNE/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'TUOLUMNE')

# TUOLUMNE County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/41-TUOLUMNE.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,TUOLUMNE,2000,1,38.4,46.4,30.3,12.63
1,TUOLUMNE,2000,2,38.6,47.5,29.8,13.68
2,TUOLUMNE,2000,3,41.4,53.0,29.8,2.69
3,TUOLUMNE,2000,4,48.0,60.5,35.6,3.35
4,TUOLUMNE,2000,5,53.7,66.2,41.2,2.73


In [43]:
# 42-VENTURA County Data Cleaning
path = Path('../data/raw/weather/42-VENTURA/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'VENTURA')

path = Path('../data/raw/weather/42-VENTURA/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'VENTURA')
path = Path('../data/raw/weather/42-VENTURA/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'VENTURA')

path = Path('../data/raw/weather/42-VENTURA/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'VENTURA')

# VENTURA County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/42-VENTURA.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,VENTURA,2000,1,49.9,60.5,39.4,2.05
1,VENTURA,2000,2,48.2,56.9,39.6,9.0
2,VENTURA,2000,3,50.0,61.6,38.4,2.78
3,VENTURA,2000,4,55.3,68.2,42.4,3.4
4,VENTURA,2000,5,61.1,74.0,48.2,0.02


In [44]:
# 43-YOLO County Data Cleaning
path = Path('../data/raw/weather/43-YOLO/data.csv')
df_avg_temp = clean_weather_data(path, 'Avg_Temp', 'YOLO')

path = Path('../data/raw/weather/43-YOLO/data (1).csv')
df_max_temp = clean_weather_data(path, 'Max_Temp', 'YOLO')
path = Path('../data/raw/weather/43-YOLO/data (2).csv')
df_min_temp = clean_weather_data(path, 'Min_Temp', 'YOLO')

path = Path('../data/raw/weather/43-YOLO/data (3).csv')
df_precipitation = clean_weather_data(path, 'Precipitation', 'YOLO')

# YOLO County Data Integration
df_avg_temp['Max_Temp'] = df_max_temp['Max_Temp']
df_avg_temp['Min_Temp'] = df_min_temp['Min_Temp']
df_avg_temp['Precipitation'] = df_precipitation['Precipitation']
df_avg_temp.to_csv('../data/processed/weather/43-YOLO.csv', index=False)

df_avg_temp.head()


Unnamed: 0,County,Year,Month,Avg_Temp,Max_Temp,Min_Temp,Precipitation
0,YOLO,2000,1,48.7,57.4,40.1,5.29
1,YOLO,2000,2,50.8,58.4,43.3,8.25
2,YOLO,2000,3,55.4,67.0,43.7,2.12
3,YOLO,2000,4,61.5,75.2,47.8,1.59
4,YOLO,2000,5,67.1,80.8,53.4,1.15
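

**Note on the integration step.** The per-county cells above copy columns across DataFrames by row position, which is correct as long as all four source files list the same months in the same order. A minimal sketch of a key-based alternative follows. `combine_parameters` and `KEYS` are hypothetical names introduced here for illustration, and the small frames are built by hand from the first two YOLO rows rather than read from the raw files.

```python
from functools import reduce

import pandas as pd

# Key columns, as shown in the preview tables above.
KEYS = ["County", "Year", "Month"]


def combine_parameters(frames):
    """Merge per-parameter frames on County/Year/Month (hypothetical helper)."""
    return reduce(
        # validate="one_to_one" raises if a key is duplicated in either frame.
        lambda left, right: left.merge(right, on=KEYS, how="outer", validate="one_to_one"),
        frames,
    )


# Illustrative frames; the second one is deliberately in a different row order.
avg = pd.DataFrame({"County": ["YOLO", "YOLO"], "Year": [2000, 2000],
                    "Month": [1, 2], "Avg_Temp": [48.7, 50.8]})
mx = pd.DataFrame({"County": ["YOLO", "YOLO"], "Year": [2000, 2000],
                   "Month": [2, 1], "Max_Temp": [58.4, 57.4]})

combined = combine_parameters([avg, mx]).sort_values(KEYS).reset_index(drop=True)
print(combined)
```

Because the merge matches on the key columns, a file with a missing or reordered month would surface as a `NaN` or a validation error instead of silently shifting values into the wrong row.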


In [45]:
# Merge all 43 county weather CSV files into one unified dataset
print('='*60)
print('MERGING ALL COUNTY WEATHER DATA')
print('='*60)

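import glob

# Get all per-county CSV files. The unified outputs are written to this same
# directory, so skip them; otherwise a re-run would load the merged data again
# and duplicate rows.
weather_files = [
    f for f in glob.glob('../data/processed/weather/*.csv')
    if not Path(f).name.startswith('unified_')
]
print(f'Found {len(weather_files)} county weather files')

# Read and concatenate all files
all_weather_data = []
for file in sorted(weather_files):
    # Path.stem drops the directory and the .csv suffix on any OS
    county_name = Path(file).stem
    print(f'Loading {county_name}...')
    
    df = pd.read_csv(file)
    all_weather_data.append(df)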

# Concatenate all dataframes
print(f'\nConcatenating {len(all_weather_data)} county datasets...')
unified_weather = pd.concat(all_weather_data, ignore_index=True)

print('✅ Unified weather dataset created!')
print(f'   Total records: {len(unified_weather):,}')
print(f'   Counties: {unified_weather["County"].nunique()}')
print(f'   Date range: {unified_weather["Year"].min()}-{unified_weather["Year"].max()}')
print(f'   Columns: {list(unified_weather.columns)}')

# Save unified dataset
output_file = '../data/processed/weather/unified_county_weather_2000_2025.csv'
unified_weather.to_csv(output_file, index=False)

print(f'\n✅ Unified weather data saved to: {output_file}')
print(f'   File size: {Path(output_file).stat().st_size / (1024*1024):.2f} MB')

# Show sample of unified data
print('\nSample of unified weather data:')
print(unified_weather.head(10))

print(f'\n🎯 Future weather data loading: Use {output_file}')


MERGING ALL COUNTY WEATHER DATA
Found 45 county weather files
Loading 1-ALAMEDA...
Loading 10-GLENN...
Loading 11-HUMBOLDT...
Loading 12-IMPERIAL...
Loading 13-INYO...
Loading 14-KERN...
Loading 15-LAKE...
Loading 16-LOS ANGELES...
Loading 17-MADERA...
Loading 18-MARIN...
Loading 19-MARIPOSA...
Loading 2-ALPINE...
Loading 20-MENDOCINO...
Loading 21-NEVADA...
Loading 22-PLACER...
Loading 23-PLUMAS...
Loading 24-RIVERSIDE...
Loading 25-SACRAMENTO...
Loading 26-SAN BERNARDINO...
Loading 27-SAN DIEGO...
Loading 28-SAN JOAQUIN...
Loading 29-SAN LUIS OBISPO...
Loading 3-AMADOR...
Loading 30-SAN MATEO...
Loading 31-SANTA CLARA...
Loading 32-SANTA CRUZ...
Loading 33-SHASTA...
Loading 34-SISKIYOU...
Loading 35-SOLANO...
Loading 36-SONOMA...
Loading 37-STANISLAUS...
Loading 38-TEHAMA...
Loading 39-TRINITY...
Loading 4-BUTTE...
Loading 40-TULARE...
Loading 41-TUOLUMNE...
Loading 42-VENTURA...
Loading 43-YOLO...
Loading 5-COLUSA...
Loading 6-CONTRA COSTA...
Loading 7-DEL NORTE...
Loading 8-EL DORA
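
**Note on file discovery.** The log above loads files in lexicographic order (1, 10, 11, ..., 2, 20, ...), and the glob pattern matches every CSV in the processed folder. A minimal sketch of a stricter listing follows, assuming the `<number>-<COUNTY>.csv` naming used in this notebook. `county_files` is a hypothetical helper, and the demo runs against empty files in a temporary directory, not the real data.

```python
import tempfile
from pathlib import Path


def county_files(processed_dir):
    """Return per-county CSV paths ordered by numeric prefix (hypothetical helper)."""
    paths = [
        p for p in Path(processed_dir).glob("*.csv")
        # Keep only names shaped like "<number>-<COUNTY>.csv".
        if p.stem.split("-", 1)[0].isdigit()
    ]
    return sorted(paths, key=lambda p: int(p.stem.split("-", 1)[0]))


# Demo with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["1-ALAMEDA.csv", "10-GLENN.csv", "2-ALPINE.csv",
                 "unified_county_weather_2000_2025.csv"]:
        (Path(tmp) / name).touch()
    ordered = [p.stem for p in county_files(tmp)]

print(ordered)  # ['1-ALAMEDA', '2-ALPINE', '10-GLENN']
```

Filtering on the numeric prefix keeps any merged output stored in the same folder out of the input list, so re-running the merge cell does not read its own result.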

In [None]:
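# Check for zero values in the temperature columns (zeros are treated as
# missing-data markers here). Precipitation is not checked, presumably because
# 0 inches is a valid reading.
weather_columns = ['Avg_Temp', 'Max_Temp', 'Min_Temp']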
zero_counts = {}

print('Checking for zero values in weather data...')
for col in weather_columns:
    zero_count = (unified_weather[col] == 0).sum()
    zero_counts[col] = zero_count
    print(f'   {col}: {zero_count:,} zero values')
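

**Note on validation.** Counting zeros catches one kind of gap. A minimal sketch of a slightly broader check follows, covering zeros, `NaN` values, duplicate County/Year/Month keys, and rows per county. `quality_report` is a hypothetical helper and the sample frame is hand-built for illustration; it is not output from the real dataset.

```python
import pandas as pd

KEYS = ["County", "Year", "Month"]


def quality_report(df, value_cols):
    """Summarise zeros, NaNs, duplicate keys and row counts (hypothetical helper)."""
    return {
        "zeros": df[value_cols].eq(0).sum().to_dict(),
        "missing": df[value_cols].isna().sum().to_dict(),
        "duplicate_keys": int(df.duplicated(subset=KEYS).sum()),
        "rows_per_county": df.groupby("County").size().to_dict(),
    }


# Illustrative frame with one zero and one repeated key on purpose.
sample = pd.DataFrame({
    "County": ["YOLO", "YOLO", "YOLO"],
    "Year": [2000, 2000, 2000],
    "Month": [1, 2, 2],
    "Avg_Temp": [48.7, 0.0, 50.8],
})

report = quality_report(sample, ["Avg_Temp"])
print(report)
```

On the unified dataset, a county whose row count differs from the others, or any non-zero `duplicate_keys` value, would point to a download or merge problem worth inspecting before further use.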

In [47]:
# # Fill missing weather data (zero values) using historical data
# print('='*60)
# print('FILLING MISSING WEATHER DATA (ZERO VALUES)')
# print('='*60)

# # Check for zero values in weather data
# weather_columns = ['Avg_Temp', 'Max_Temp', 'Min_Temp']
# zero_counts = {}

# print('Checking for zero values in weather data...')
# for col in weather_columns:
#     zero_count = (unified_weather[col] == 0).sum()
#     zero_counts[col] = zero_count
#     print(f'   {col}: {zero_count:,} zero values')

# total_zeros = sum(zero_counts.values())
# print(f'\nTotal zero values to fill: {total_zeros:,}')

# if total_zeros > 0:
#     print('\nFilling zero values using historical data...')
    
#     # Create a copy to work with
#     weather_filled = unified_weather.copy()
    
#     # Sort by County, Year, Month for proper filling
#     weather_filled = weather_filled.sort_values(['County', 'Year', 'Month'])
    
#     filled_count = 0
    
#     for col in weather_columns:
#         print(f'\nProcessing {col}...')
        
#         # Find rows with zero values
#         zero_mask = weather_filled[col] == 0
#         zero_indices = weather_filled[zero_mask].index
        
#         for idx in zero_indices:
#             row = weather_filled.loc[idx]
#             county = row['County']
#             month = row['Month']
#             year = row['Year']
            
#             # Look back through previous years for the same county and month
#             for years_back in range(1, 6):  # Look back up to 5 years
#                 lookup_year = year - years_back
#                 if lookup_year < 2000:  # Don't go before our data range
#                     break
                
#                 # Find matching record
#                 match_mask = (
#                     (weather_filled['County'] == county) & 
#                     (weather_filled['Year'] == lookup_year) & 
#                     (weather_filled['Month'] == month)
#                 )
                
#                 if match_mask.any():
#                     match_value = weather_filled.loc[match_mask, col].iloc[0]
#                     if match_value != 0:  # Found a non-zero value
#                         weather_filled.loc[idx, col] = match_value
#                         filled_count += 1
#                         break
    
#     print(f'\n✅ Filled {filled_count:,} zero values with historical data')
    
#     # Check remaining zeros
#     print('\nRemaining zero values after filling:')
#     for col in weather_columns:
#         remaining_zeros = (weather_filled[col] == 0).sum()
#         print(f'   {col}: {remaining_zeros:,} zero values')
    
#     # Update the unified dataset
#     unified_weather = weather_filled
    
#     # Save the filled dataset
#     output_file_filled = '../data/processed/weather/unified_county_weather_2000_2025_filled.csv'
#     unified_weather.to_csv(output_file_filled, index=False)
    
#     print(f'\n✅ Filled weather data saved to: {output_file_filled}')
#     print(f'   File size: {Path(output_file_filled).stat().st_size / (1024*1024):.2f} MB')
    
# else:
#     print('\n✅ No zero values found - weather data is complete!')


FILLING MISSING WEATHER DATA (ZERO VALUES)
Checking for zero values in weather data...
   Avg_Temp: 0 zero values
   Max_Temp: 0 zero values
   Min_Temp: 0 zero values

Total zero values to fill: 0

✅ No zero values found - weather data is complete!
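
**Note on the fill logic.** The commented-out cell above fills a zero with the value from the same county and month in an earlier year, looking back up to 5 years. A vectorized sketch of a similar idea follows. It is not a drop-in replacement: it assumes one row per county, year, and month, it leaves unfillable values as `NaN` instead of 0, and it does not chain through values it has just filled. `fill_zeros_from_prior_years` is a hypothetical name and the demo values are illustrative.

```python
import pandas as pd


def fill_zeros_from_prior_years(df, cols, max_years_back=5):
    """Replace zeros with the last valid value for the same county and month."""
    # Order rows so that, within each (County, Month) group, years ascend.
    out = df.sort_values(["County", "Month", "Year"]).copy()
    for col in cols:
        # Turn zeros into NaN so they count as gaps.
        gaps = out[col].astype(float).mask(out[col] == 0)
        # Forward-fill within each county/month group, at most N rows (years) forward.
        out[col] = gaps.groupby([out["County"], out["Month"]]).ffill(limit=max_years_back)
    return out.sort_values(["County", "Year", "Month"]).reset_index(drop=True)


# Illustrative data: January 2001 can be filled from January 2000,
# while February 2000 has no earlier year to draw on.
demo = pd.DataFrame({
    "County": ["YOLO"] * 4,
    "Year": [2000, 2001, 2000, 2001],
    "Month": [1, 1, 2, 2],
    "Avg_Temp": [48.7, 0.0, 0.0, 51.0],
})

result = fill_zeros_from_prior_years(demo, ["Avg_Temp"])
print(result)
```

Leaving unfilled gaps as `NaN` makes them easy to find with `isna()` afterwards and avoids confusing a missing reading with a genuine value of zero.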
