# Data Wrangling Operations Notebook
This notebook performs the full wrangling pipeline described in **Section 5** of the assignment. Created on 2025-07-26.

In [2]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load pre-downloaded ECB exchange rate dataset
ecb_df = pd.read_csv('ecb_exchange_rates (1).csv')

# Inspect initial data
print("Initial data sample:")
print(ecb_df.head())


Initial data sample:
  Currency      Rate        Date
0      USD    1.0932  2025-07-25
1      GBP    0.8456  2025-07-25
2      JPY  141.2100  2025-07-25
3      CHF    0.9678  2025-07-25
4      CAD    1.4783  2025-07-25


## Step 1: Handle Missing Values and Parse Dates

In [4]:

# Convert date column
ecb_df['Date'] = pd.to_datetime(ecb_df['Date'])
ecb_df.set_index('Date', inplace=True)

# Check for missing values
print("Missing values:")
print(ecb_df.isnull().sum())

# Forward fill to handle missing values
ecb_df_filled = ecb_df.ffill()

# Interpolation for smooth filling
ecb_df_interpolated = ecb_df.interpolate()


Missing values:
Currency    0
Rate        0
dtype: int64


  ecb_df_interpolated = ecb_df.interpolate()


## Step 2: Resample and Standardize Time Index

In [11]:
# Step: Clean duplicate dates and prepare for resampling

# Print column types
print("Column data types:")
print(ecb_df_interpolated.dtypes)

# Identify numeric columns only (exclude strings/objects)
numeric_cols = ecb_df_interpolated.select_dtypes(include='number').columns
print("\nNumeric columns detected:")
print(numeric_cols)

# If no numeric columns were found, print warning
if numeric_cols.empty:
    print("\n⚠️ No numeric columns found. Check your dataset structure.")
else:
    # Group by the date index and average only numeric columns
    ecb_df_interpolated = ecb_df_interpolated[numeric_cols].groupby(ecb_df_interpolated.index).mean()

    # Sort the index to ensure chronological order
    ecb_df_interpolated = ecb_df_interpolated.sort_index()

    # Show preview before resampling
    print("\nPreview of numeric data after grouping and sorting:")
    print(ecb_df_interpolated.head())

    # Now resample to daily frequency
    ecb_daily = ecb_df_interpolated.resample('D').ffill()

    print("\nResampled Data (first 5 rows):")
    print(ecb_daily.head())


Column data types:
Rate    float64
dtype: object

Numeric columns detected:
Index(['Rate'], dtype='object')

Preview of numeric data after grouping and sorting:
                Rate
Date                
2025-07-25  29.11898

Resampled Data (first 5 rows):
                Rate
Date                
2025-07-25  29.11898


## Step 3: Normalization

In [21]:
# Min‑max normalization for all numeric columns in ecb_daily, handling constant-series edge case
numeric_cols = ecb_daily.select_dtypes(include='number').columns

for col in numeric_cols:
    col_min = ecb_daily[col].min()
    col_max = ecb_daily[col].max()
    if col_max == col_min:
        # All values identical—set normalized to 0
        ecb_daily[f'Norm_{col}'] = 0.0
    else:
        # Standard min-max normalization
        ecb_daily[f'Norm_{col}'] = (
            ecb_daily[col] - col_min
        ) / (
            col_max - col_min
        )

# Drop only rows where *all* numeric and normalized columns are NaN
ecb_cleaned = ecb_daily.dropna(how='all')

# Preview processed and normalized data
print("Processed and normalized data:")
print(ecb_cleaned.head())


Processed and normalized data:
                Rate  Norm_Rate  Norm_Norm_Rate  Norm_Norm_Norm_Rate
Date                                                                
2025-07-25  29.11898        0.0             0.0                  0.0


## Step 4: Save Final Wrangled DataFrame

In [23]:

# Save to new CSV
ecb_cleaned.to_csv("ecb_cleaned_final.csv")

# Confirm
print("Saved as 'ecb_cleaned_final.csv'")


Saved as 'ecb_cleaned_final.csv'
