# Feature Engineering for Climate Change Data

In this notebook, we perform feature engineering on the climate change dataset: we fill missing values, derive date-based features, and build a preprocessing pipeline that scales numerical columns and one-hot encodes categorical ones so the data is ready for modeling.

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

# Load the dataset
data_path = '../data/raw/Global_Climate_Change_Data_2020_2025.csv'
df = pd.read_csv(data_path)
df.head()

In [2]:
# Check for missing values
missing_values = df.isnull().sum()
missing_values[missing_values > 0]
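
The missing-value check above reports raw counts only. As a minimal sketch, the same check can also report the share of missing rows per column, which helps when deciding how to fill them. The `toy` frame and its column names (`Temperature`, `CO2`, `Country`) are hypothetical stand-ins, not columns confirmed to exist in the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the climate dataset.
toy = pd.DataFrame({
    "Temperature": [14.2, np.nan, 15.1, 15.4],
    "CO2": [410.0, 412.5, np.nan, np.nan],
    "Country": ["A", "B", "B", "A"],
})

# Count and fraction of missing values per column.
missing_count = toy.isnull().sum()
missing_share = toy.isnull().mean()

# Keep only columns that actually have gaps, worst first.
summary = pd.DataFrame({"missing": missing_count, "share": missing_share})
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)
print(summary)
```

On the real data, `toy` would be replaced by `df`.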

In [3]:
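# Forward-fill missing values. fillna(method='ffill') is deprecated in
# pandas 2.x, so call ffill() directly. Gaps at the very start of a column
# have no earlier value to copy and stay NaN.
df = df.ffill()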

# Create new features from the date, parsing the column only once
dates = pd.to_datetime(df['Date'])
df['Year'] = dates.dt.year
df['Month'] = dates.dt.month
df['Day'] = dates.dt.day

# Drop original date column
df.drop(columns=['Date'], inplace=True)
df.head()
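
The date decomposition in this cell can be tried in isolation. The sketch below uses a hypothetical three-row frame; the `Temperature` column and the sample dates are made up for illustration. Passing `errors="coerce"` is an optional choice, not something the notebook does: it turns unparseable dates into `NaT` instead of raising.

```python
import pandas as pd

# Hypothetical sample rows; only the 'Date' column name comes from the notebook.
toy = pd.DataFrame({
    "Date": ["2020-01-15", "2021-07-04", "2025-12-31"],
    "Temperature": [14.2, 15.1, 15.4],
})

# Parse once, then derive the calendar parts from the parsed series.
dates = pd.to_datetime(toy["Date"], errors="coerce")

features = toy.drop(columns=["Date"]).assign(
    Year=dates.dt.year,
    Month=dates.dt.month,
    Day=dates.dt.day,
)
print(features)
```

Building the result with `assign` instead of `inplace=True` leaves the original frame untouched, which makes the cell safe to re-run.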

In [4]:
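# Define categorical and numerical features
categorical_features = ['SomeCategoricalColumn']  # Replace with actual categorical columns
# include='number' also catches Year/Month/Day, which are int32 in pandas 2.x.
# The target is excluded because it is dropped from X before fitting.
numerical_features = df.drop(columns=['TargetColumn']).select_dtypes(include='number').columns.tolist()  # Replace with actual target column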

# Create preprocessing pipelines
numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)

In [5]:
# Apply the transformations
X = df.drop(columns=['TargetColumn'])  # Replace with actual target column
y = df['TargetColumn']  # Replace with actual target column

X_transformed = preprocessor.fit_transform(X)
X_transformed.shape

## Conclusion

In this notebook, we performed feature engineering on the climate change dataset. We forward-filled missing values, derived `Year`, `Month`, and `Day` features from the `Date` column, and built a `ColumnTransformer` that imputes and scales numerical columns and one-hot encodes categorical ones. The next step is to train machine learning models on the processed data.