# February DS/AL-ML + BIA Data Jam - US Consumer Behavior

## Introduction

The goal of this project is to collaboratively evaluate the claim using real U.S. macroeconomic data from Federal Reserve Economic Data (FRED) and to present a clear, evidence-based conclusion.

## Data Preprocessing

In [1]:
# Import necessary libraries
import re
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
# Load the datasets
credit_owned = pd.read_csv('../data/credit_owned.csv')
personal_expenditure = pd.read_csv('../data/personal_expenditure.csv')
saving_rate = pd.read_csv('../data/saving_rate.csv')

In [3]:
# Display the first few rows of each dataset
print("Credit Owned Dataset:")
display(credit_owned.head())

print("\nPersonal Expenditure Dataset:")
display(personal_expenditure.head())

print("\nSaving Rate Dataset:")
display(saving_rate.head())

Credit Owned Dataset:


| | observation_date | TOTALSL |
|---|---|---|
| 0 | 1943-01-01 | 6577.83 |
| 1 | 1943-02-01 | 6463.04 |
| 2 | 1943-03-01 | 6234.21 |
| 3 | 1943-04-01 | 6125.75 |
| 4 | 1943-05-01 | 5936.26 |



Personal Expenditure Dataset:


| | observation_date | PCEC96 |
|---|---|---|
| 0 | 2007-01-01 | 11181.0 |
| 1 | 2007-02-01 | 11178.2 |
| 2 | 2007-03-01 | 11190.7 |
| 3 | 2007-04-01 | 11201.5 |
| 4 | 2007-05-01 | 11218.0 |



Saving Rate Dataset:


| | observation_date | PSAVERT |
|---|---|---|
| 0 | 1959-01-01 | 11.3 |
| 1 | 1959-02-01 | 10.6 |
| 2 | 1959-03-01 | 10.3 |
| 3 | 1959-04-01 | 11.2 |
| 4 | 1959-05-01 | 10.6 |


In [4]:
# Display the shape (rows, columns) of each dataset
print("Shape of Credit Owned Dataset:", credit_owned.shape)
print("Shape of Personal Expenditure Dataset:", personal_expenditure.shape)
print("Shape of Saving Rate Dataset:", saving_rate.shape)

Shape of Credit Owned Dataset: (995, 2)
Shape of Personal Expenditure Dataset: (227, 2)
Shape of Saving Rate Dataset: (803, 2)
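
The row counts can be cross-checked against the start dates shown earlier. The sketch below is not part of the original notebook; it assumes each series is monthly with no gaps (an assumption, not something the outputs confirm) and computes the last observation date that each row count would imply. Under that assumption, all three series would end in the same month.

```python
import pandas as pd

# (first observation date, row count) as reported in the outputs above
series_info = {
    'credit_owned': ('1943-01-01', 995),
    'personal_expenditure': ('2007-01-01', 227),
    'saving_rate': ('1959-01-01', 803),
}

# Assumption: one row per month with no gaps ('MS' = month-start frequency)
implied_end = {
    name: pd.date_range(start=start, periods=n_rows, freq='MS')[-1]
    for name, (start, n_rows) in series_info.items()
}

for name, end in implied_end.items():
    print(name, end.date())
```

If the actual last dates in the CSV files differ from these, the series contain gaps or are not strictly monthly.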


In [5]:
# Convert a column name to lowercase snake_case
def columns(data):
    # Insert '_' at each lowercase-to-uppercase boundary (e.g. camelCase), then lowercase
    return re.sub(r'(?<=[a-z])(?=[A-Z])', '_', data).lower()
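
To illustrate what this regular expression does, here is a minimal sketch applying the same pattern to a few sample names. `to_snake_case` is a stand-in name for the helper above, and `observationDate` is a hypothetical camelCase name used only for illustration; `TOTALSL` and `observation_date` come from the datasets. All-uppercase FRED series IDs contain no lowercase-to-uppercase boundary, so they are only lowercased.

```python
import re

def to_snake_case(name):
    # Same pattern as the helper above: insert '_' where a lowercase letter
    # is immediately followed by an uppercase letter, then lowercase everything.
    return re.sub(r'(?<=[a-z])(?=[A-Z])', '_', name).lower()

# 'observationDate' is a hypothetical name, included to show the camelCase case
samples = ['TOTALSL', 'observation_date', 'observationDate']
converted = [to_snake_case(s) for s in samples]
print(converted)  # ['totalsl', 'observation_date', 'observation_date']
```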

In [6]:
# Apply the column renaming function to each dataset
credit_owned.columns = [columns(col) for col in credit_owned.columns]
personal_expenditure.columns = [columns(col) for col in personal_expenditure.columns]
saving_rate.columns = [columns(col) for col in saving_rate.columns]

In [7]:
# Display the first few rows again to confirm the renamed columns
print("Credit Owned Dataset:")
display(credit_owned.head())

print("\nPersonal Expenditure Dataset:")
display(personal_expenditure.head())

print("\nSaving Rate Dataset:")
display(saving_rate.head())

Credit Owned Dataset:


| | observation_date | totalsl |
|---|---|---|
| 0 | 1943-01-01 | 6577.83 |
| 1 | 1943-02-01 | 6463.04 |
| 2 | 1943-03-01 | 6234.21 |
| 3 | 1943-04-01 | 6125.75 |
| 4 | 1943-05-01 | 5936.26 |



Personal Expenditure Dataset:


| | observation_date | pcec96 |
|---|---|---|
| 0 | 2007-01-01 | 11181.0 |
| 1 | 2007-02-01 | 11178.2 |
| 2 | 2007-03-01 | 11190.7 |
| 3 | 2007-04-01 | 11201.5 |
| 4 | 2007-05-01 | 11218.0 |



Saving Rate Dataset:


| | observation_date | psavert |
|---|---|---|
| 0 | 1959-01-01 | 11.3 |
| 1 | 1959-02-01 | 10.6 |
| 2 | 1959-03-01 | 10.3 |
| 3 | 1959-04-01 | 11.2 |
| 4 | 1959-05-01 | 10.6 |


In [8]:
# Display informative summary of each dataset
print("Credit Owned Dataset Info:")
credit_owned.info()

print("\nPersonal Expenditure Dataset Info:")
personal_expenditure.info()

print("\nSaving Rate Dataset Info:")
saving_rate.info()

Credit Owned Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 995 entries, 0 to 994
Data columns (total 2 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   observation_date  995 non-null    object 
 1   totalsl           995 non-null    float64
dtypes: float64(1), object(1)
memory usage: 15.7+ KB

Personal Expenditure Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 227 entries, 0 to 226
Data columns (total 2 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   observation_date  227 non-null    object 
 1   pcec96            227 non-null    float64
dtypes: float64(1), object(1)
memory usage: 3.7+ KB

Saving Rate Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 803 entries, 0 to 802
Data columns (total 2 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   observation_date  
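
The `info()` output shows `observation_date` stored as `object`, meaning plain strings. A possible follow-up, not part of the original notebook, is to parse the column with `pd.to_datetime` so that date filtering and resampling behave as expected. The sketch below uses a small inline frame built from the first rows of the credit dataset shown earlier instead of reading the CSV files.

```python
import pandas as pd

# Stand-in for credit_owned, built from the first rows displayed earlier
sample = pd.DataFrame({
    'observation_date': ['1943-01-01', '1943-02-01', '1943-03-01'],
    'totalsl': [6577.83, 6463.04, 6234.21],
})

# Parse ISO-formatted date strings into datetime64 values
sample['observation_date'] = pd.to_datetime(
    sample['observation_date'], format='%Y-%m-%d'
)

print(sample.dtypes)
```

The same call could be applied to each of the three DataFrames before any time-based analysis.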

In [9]:
# Display descriptive statistics of each dataset
print("Credit Owned Dataset Description:")
display(credit_owned.describe())

print("\nPersonal Expenditure Dataset Description:")
display(personal_expenditure.describe())

print("\nSaving Rate Dataset Description:")
display(saving_rate.describe())

Credit Owned Dataset Description:


| | totalsl |
|---|---|
| count | 995.0 |
| mean | 1230814.0 |
| std | 1481781.0 |
| min | 5354.36 |
| 25% | 74641.36 |
| 50% | 480994.5 |
| 75% | 2215879.0 |
| max | 5084831.0 |



Personal Expenditure Dataset Description:


| | pcec96 |
|---|---|
| count | 227.0 |
| mean | 13176.984581 |
| std | 1724.263802 |
| min | 11068.0 |
| 25% | 11555.0 |
| 50% | 12884.0 |
| 75% | 14455.35 |
| max | 16715.4 |



Saving Rate Dataset Description:


| | psavert |
|---|---|
| count | 803.0 |
| mean | 8.404857 |
| std | 3.424809 |
| min | 1.4 |
| 25% | 5.7 |
| 50% | 8.3 |
| 75% | 11.1 |
| max | 31.8 |


In [10]:
# Check for missing values in each dataset

print("Missing Values in Credit Owned Dataset:")
print(credit_owned.isnull().sum())

print("\nMissing Values in Personal Expenditure Dataset:")
print(personal_expenditure.isnull().sum())

print("\nMissing Values in Saving Rate Dataset:")
print(saving_rate.isnull().sum())

Missing Values in Credit Owned Dataset:
observation_date    0
totalsl             0
dtype: int64

Missing Values in Personal Expenditure Dataset:
observation_date    0
pcec96              0
dtype: int64

Missing Values in Saving Rate Dataset:
observation_date    0
psavert             0
dtype: int64


In [11]:
# Check for duplicates in the datasets
print("Duplicate Rows in Credit Owned Dataset:", credit_owned.duplicated().sum())
print("Duplicate Rows in Personal Expenditure Dataset:", personal_expenditure.duplicated().sum())
print("Duplicate Rows in Saving Rate Dataset:", saving_rate.duplicated().sum())

Duplicate Rows in Credit Owned Dataset: 0
Duplicate Rows in Personal Expenditure Dataset: 0
Duplicate Rows in Saving Rate Dataset: 0
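
The three series start in different years (1943, 2007, and 1959), so any joint analysis needs them restricted to a common date range. One way to do this, offered as a sketch and not as the notebook's own method, is to find the latest start date and trim each frame to it. `trim_to_start` is a hypothetical helper introduced here for illustration, and the inline frames reuse the first rows shown in the outputs above.

```python
import pandas as pd

# First observation dates as shown in the head() outputs
starts = pd.to_datetime(pd.Series({
    'credit_owned': '1943-01-01',
    'personal_expenditure': '2007-01-01',
    'saving_rate': '1959-01-01',
}))

# The common window can only begin once every series has data
common_start = starts.max()

def trim_to_start(df, start, date_col='observation_date'):
    """Hypothetical helper: keep rows dated on or after `start`."""
    dates = pd.to_datetime(df[date_col])
    return df.loc[dates >= start].reset_index(drop=True)

# Small stand-ins built from the displayed head() rows
pce_head = pd.DataFrame({
    'observation_date': ['2007-01-01', '2007-02-01'],
    'pcec96': [11181.0, 11178.2],
})
credit_head = pd.DataFrame({
    'observation_date': ['1943-01-01', '1943-02-01'],
    'totalsl': [6577.83, 6463.04],
})

pce_trimmed = trim_to_start(pce_head, common_start)
credit_trimmed = trim_to_start(credit_head, common_start)

print(common_start.date())
print(len(pce_trimmed), len(credit_trimmed))
```

On the full datasets, the trimmed frames could then be combined with `DataFrame.merge(..., on='observation_date', how='inner')` so that each row holds all three measures for the same month.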


## Data Analysis