# US Mortgage Market Analysis

## Introduction

This project is an exploratory analysis of the loan-level mortgage data that banks and other financial institutions in the United States report to regulators. The [Home Mortgage Disclosure Act](https://www.consumerfinance.gov/data-research/hmda/) (HMDA) requires certain banks and institutions in the US to report this information.

## Exploratory Data Analysis

### Importing Libraries

In [131]:
import numpy as np
import pandas as pd

### Loading Data

In [132]:
# Read only the first 10,000 rows of the 2021 file to keep loading fast
mortgage_df = pd.read_csv("./datasets/year_2021.csv", nrows=10000)

#### Random Sampling

In [133]:
# Draw 1,000 rows at random (without replacement) from the rows loaded above
mortgage_df = mortgage_df.sample(n=1000)
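
Because `sample` is called without a seed, each run of the notebook draws a different subset, so the figures reported further down will vary slightly between runs. A minimal sketch of a reproducible draw follows. It assumes a toy DataFrame in place of the HMDA file, and the seed value `42` is arbitrary and not taken from the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 10,000 loaded rows (hypothetical data, not HMDA records).
toy_df = pd.DataFrame({"loan_amount": np.arange(10_000)})

# Passing random_state fixes the random draw, so repeated calls
# return exactly the same rows.
SEED = 42  # arbitrary illustrative value
sample_a = toy_df.sample(n=1000, random_state=SEED)
sample_b = toy_df.sample(n=1000, random_state=SEED)

print(sample_a.index.equals(sample_b.index))
print(len(sample_a))
```

Note also that the sample is random only within the first 10,000 rows read by `nrows`, not across the whole file.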

### Filtering Missing Data

In [134]:
# Record the sample's dimensions before any filtering
num_rows, num_cols = mortgage_df.shape

In [135]:
# Drop rows in which every value is missing
mortgage_df.dropna(how="all", inplace=True)

# Drop columns with more than 95% missing values
# (thresh is the minimum number of non-missing values a column needs to be kept)
mortgage_df.dropna(thresh=num_rows*0.05, axis=1, inplace=True)

# Drops columns that are all missing values
#mortgage_df.dropna(how="all", axis=1, inplace=True)

# Reset index values
mortgage_df.reset_index(drop=True, inplace=True)
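
The `thresh` argument is easy to misread: it counts non-missing values, so a 95% missing-value cutoff translates to a minimum of 5% non-missing entries. The sketch below illustrates this on a small toy DataFrame. The column names and values are hypothetical, and `round` is used here only to get an integer threshold.

```python
import numpy as np
import pandas as pd

n_rows = 40

# Hypothetical columns with different shares of missing values.
toy_df = pd.DataFrame({
    "complete": np.arange(n_rows, dtype=float),            # 0% missing
    "sparse_kept": [1.0, 2.0] + [np.nan] * (n_rows - 2),   # 95% missing
    "sparse_dropped": [1.0] + [np.nan] * (n_rows - 1),     # 97.5% missing
})

# A column survives only if it has at least this many non-missing values.
min_non_na = round(n_rows * 0.05)  # 2 for 40 rows

filtered = toy_df.dropna(thresh=min_non_na, axis=1)
print(list(filtered.columns))  # ['complete', 'sparse_kept']
```

A column that is exactly 95% missing is kept, since it still meets the minimum of 5% non-missing values.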

In [136]:
# Frequency table of missing values per column
na_counts = mortgage_df.isna().sum().sort_values(ascending=False)

# Keep only columns that have at least one missing value
na_counts = na_counts[na_counts > 0]

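# Convert to a relative frequency table (share of rows missing per column).
# Use the current row count, since num_rows was taken before all-NA rows were dropped.
na_freq_tab = na_counts / len(mortgage_df)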

In [143]:
print("Relative Frequency Table of Missing Values:")
print(na_freq_tab)

Relative Frequency Table of Missing Values:
applicant_ethnicity-2        0.895
lender_credits               0.819
discount_points              0.781
co-applicant_age_above_62    0.580
rate_spread                  0.389
debt_to_income_ratio         0.325
loan_to_value_ratio          0.324
origination_charges          0.249
total_loan_costs             0.249
interest_rate                0.202
property_value               0.174
dtype: float64
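
The same kind of table can be computed in one step, because the mean of a boolean mask is the share of `True` values. The sketch below shows this on a toy DataFrame. It borrows three column names from the output above, but the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical values; only the column names come from the HMDA data.
toy_df = pd.DataFrame({
    "interest_rate": [3.1, np.nan, 2.9, np.nan],
    "property_value": [250_000, 310_000, np.nan, 405_000],
    "loan_amount": [200_000, 280_000, 150_000, 390_000],
})

# isna() gives a boolean mask; its column-wise mean is the share of missing rows.
na_share = toy_df.isna().mean()

# Keep only columns with missing values, largest share first.
na_share = na_share[na_share > 0].sort_values(ascending=False)
print(na_share)
```

This form divides by the DataFrame's current length, so it stays correct even if rows were dropped earlier.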


In [129]:
mortgage_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 73 columns):
 #   Column                                    Non-Null Count  Dtype  
---  ------                                    --------------  -----  
 0   activity_year                             1000 non-null   int64  
 1   lei                                       1000 non-null   object 
 2   derived_msa-md                            1000 non-null   int64  
 3   state_code                                1000 non-null   object 
 4   county_code                               1000 non-null   int64  
 5   census_tract                              1000 non-null   int64  
 6   conforming_loan_limit                     1000 non-null   object 
 7   derived_loan_product_type                 1000 non-null   object 
 8   derived_dwelling_category                 1000 non-null   object 
 9   derived_ethnicity                         1000 non-null   object 
 10  derived_race                         

In [138]:
# Frequency table of column data types
mortgage_df.dtypes.value_counts()

int64      48
object     13
float64    12
dtype: int64
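
The counts above show how many columns of each type exist, but not which columns they are. One way to list them, sketched here on a hypothetical three-column DataFrame rather than the real data, is `select_dtypes`:

```python
import pandas as pd

# Hypothetical rows; the column names mirror three HMDA fields.
toy_df = pd.DataFrame({
    "activity_year": [2021, 2021],
    "state_code": ["CA", "TX"],
    "interest_rate": [3.1, 2.9],
})

# Split the columns into numeric and non-numeric groups.
numeric_cols = toy_df.select_dtypes(include="number").columns.tolist()
non_numeric_cols = toy_df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)      # ['activity_year', 'interest_rate']
print(non_numeric_cols)  # ['state_code']
```

Many HMDA fields stored as `int64` are numeric codes for categories (for example `action_taken`), so a numeric dtype does not by itself mean the column is a quantity.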

In [145]:
mortgage_df.columns

Index(['activity_year', 'lei', 'derived_msa-md', 'state_code', 'county_code',
       'census_tract', 'conforming_loan_limit', 'derived_loan_product_type',
       'derived_dwelling_category', 'derived_ethnicity', 'derived_race',
       'derived_sex', 'action_taken', 'purchaser_type', 'preapproval',
       'loan_type', 'loan_purpose', 'lien_status', 'reverse_mortgage',
       'open-end_line_of_credit', 'business_or_commercial_purpose',
       'loan_amount', 'loan_to_value_ratio', 'interest_rate', 'rate_spread',
       'hoepa_status', 'total_loan_costs', 'origination_charges',
       'discount_points', 'lender_credits', 'loan_term',
       'negative_amortization', 'interest_only_payment', 'balloon_payment',
       'other_nonamortizing_features', 'property_value', 'construction_method',
       'occupancy_type', 'manufactured_home_secured_property_type',
       'manufactured_home_land_property_interest', 'total_units', 'income',
       'debt_to_income_ratio', 'applicant_credit_score_type',
