**Home Mortgage Disclosure Act (HMDA)**


The Home Mortgage Disclosure Act (HMDA) was enacted by Congress in 1975 and implemented by the Federal Reserve Board's Regulation C. On July 21, 2011, the rule-writing authority for Regulation C was transferred to the Consumer Financial Protection Bureau (CFPB). Regulation C requires lending institutions to publicly report loan data.

The primary purposes of the Home Mortgage Disclosure Act and Regulation C are to monitor the geographic targets of mortgage lenders, to provide a mechanism for identifying predatory lending practices, and to supply the government with statistics on the mortgage market. HMDA supports community investment initiatives sponsored by government programs by contributing to their oversight through statistical reporting. It also helps government officials identify predatory lending practices that may be affecting mortgage loan issuance. Finally, HMDA submissions provide a means of analyzing government resource allocations and of ensuring that resources are appropriately allocated to fund community initiatives. (Source: Investopedia)

"""
If interested, follow the link below for the HMDA glossary:

https://www.ffiec.gov/hmda/glossary.htm

"""


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
sns.set(font_scale=1.5)

In [None]:
hmda = pd.read_csv('../input/Washington_State_HDMA-2016.csv')

In [None]:
hmda.head()

In [None]:
hmda.info()

In [None]:
hmda.describe()

In [None]:
hmda['action_taken_name'].value_counts()

In [None]:
#Plotting a bar graph - Loan application status in Washington state.
Loan_status = hmda['action_taken_name'].value_counts()
plt.figure(figsize=(10,8))
sns.barplot(x=Loan_status.index,y=Loan_status.values,alpha=.8)
plt.title('Loan Application Status in Washington')
plt.ylabel('Number of applications', fontsize=20)
plt.xlabel('Loan status', fontsize=20)
plt.xticks(rotation=90)
plt.show()

In [None]:
#Plotting a pie chart - Loan application status in Washington state.
plt.figure(figsize=(20,10),dpi = 100)
labels = hmda['action_taken_name'].value_counts().index
sizes = hmda['action_taken_name'].value_counts().values
plt.pie(sizes,labels=labels,autopct='%1.1f%%',startangle=90)
plt.legend(labels,loc="best")
# Equal aspect ratio so the pie is drawn as a circle
# (plt.axis does not accept a fontsize argument)
plt.axis('equal')
# Fit the layout and view the plot
plt.tight_layout()
plt.show()

In [None]:
hmda[['applicant_income_000s','loan_amount_000s']].groupby(hmda['action_taken_name']).describe()


Find out whether applicant income and loan amount played a role in loan origination.

In [None]:
hmda[['applicant_income_000s','loan_amount_000s']].corr()

In [None]:
hmda[['hud_median_family_income','applicant_income_000s','loan_amount_000s']].corr()

In [None]:
# create tabular correlation matrix
mean_income_loanamt = hmda[['hud_median_family_income','applicant_income_000s','loan_amount_000s']].corr()
_, ax = plt.subplots(figsize=(10,10)) 

# graph correlation matrix
_ = sns.heatmap(mean_income_loanamt, ax=ax,
                xticklabels=mean_income_loanamt.columns.values,
                yticklabels=mean_income_loanamt.columns.values,
               cmap="coolwarm")
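
The correlations above relate income to loan amount, not to the outcome of the application. One way to look at the outcome itself is to compare origination rates across income bands. The following is only a sketch on a small invented frame (`toy` and its values are hypothetical; only the column names `applicant_income_000s` and `action_taken_name` come from the dataset), and it splits applicants into just two bands for readability.

```python
import pandas as pd

# Toy stand-in for the HMDA frame; the values are invented for illustration.
toy = pd.DataFrame({
    "applicant_income_000s": [30, 45, 60, 75, 90, 120, 150, 200],
    "action_taken_name": [
        "Application denied by financial institution",
        "Application denied by financial institution",
        "Loan originated",
        "Application denied by financial institution",
        "Loan originated",
        "Loan originated",
        "Loan originated",
        "Loan originated",
    ],
})

# Flag originated loans, then split applicants into two equal-sized income bands.
toy["originated"] = toy["action_taken_name"].eq("Loan originated")
toy["income_band"] = pd.qcut(
    toy["applicant_income_000s"], q=2, labels=["lower", "higher"]
)

# The mean of a boolean column is the share of True values,
# i.e. the origination rate within each band.
rate_by_band = toy.groupby("income_band", observed=True)["originated"].mean()
print(rate_by_band)
```

On the real data, the same pattern could be applied to `hmda` with more bands (for example `q=4` for quartiles); rows with missing income would need to be handled first.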

Let's find the number of loans that were finally originated in Washington.


In [None]:
loans_orig = hmda[hmda['action_taken_name'] == 'Loan originated']


In [None]:
loans_orig.shape

In [None]:
loans_orig.head()

**Of the originated loans, how many are of each loan type?**

**Looks like conventional loans are topping the charts!**


In [None]:
#Plotting a pie chart - share of each loan type among the originated loans.
plt.figure(figsize=(20,10),dpi = 100)
labels = loans_orig['loan_type_name'].value_counts().index
sizes = loans_orig['loan_type_name'].value_counts().values
#colors = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue']
# fontsize=12
#explode=(0, 0, 0, 0, 0.15)
plt.pie(sizes,labels=labels,autopct='%1.1f%%',startangle=90)
plt.legend(labels,loc="best")
# Equal aspect ratio so the pie is drawn as a circle
# (plt.axis does not accept a fontsize argument)
plt.axis('equal')
# Fit the layout and view the plot
plt.tight_layout()
plt.show()

In [None]:
#Plotting a bar chart - number of originated loans of each loan type
Loan_type = loans_orig['loan_type_name'].value_counts()
plt.figure(figsize=(10,8))
sns.barplot(x=Loan_type.index,y=Loan_type.values,alpha=.8)
plt.title('Loan Type in Washington')
plt.xlabel('Loan Type of the Loan Originated', fontsize=20)
plt.xticks(rotation=308)
plt.show()

In [None]:
#Plotting a bar graph - Loans originated by loan purpose
Loan_purpose = loans_orig['loan_purpose_name'].value_counts()
plt.figure(figsize=(10,8))
sns.barplot(x=Loan_purpose.index,y=Loan_purpose.values,alpha=.8)
plt.title('Loan Purpose in Washington')
plt.xlabel('Loan Purpose of the Loans Originated', fontsize=20)
plt.xticks(rotation=308)
plt.show()

In [None]:
loans_orig['applicant_sex_name'].value_counts()


In [None]:
#Plotting a bar graph - Loans Originated by Gender
Loan_gender =loans_orig['applicant_sex_name'].value_counts()
plt.figure(figsize=(10,8))
sns.barplot(x=Loan_gender.index,y=Loan_gender.values,alpha=.8)
plt.title('Originated loans by applicant sex in Washington')
plt.xticks(rotation=90)
plt.show()

**Lien Status**
For HMDA reporting purposes, lenders are required to report lien status for loans they originate and applications that do not result in originations (Codes 1 through 3 are used for these loans; Code 4 is used for purchased loans). Lien status is determined by reference to the best information readily available to the lender at the time final action is taken and to the lender's own procedures. Lien status aids in the interpretation of price data. For more information on lien status, see the HMDA Price Data Frequently Asked Questions (FAQs) section of the following link: http://www.federalreserve.gov/newsevents/press/bcreg/20060403a.htm

**First lien secured loans. In the event of a bankruptcy or liquidation, the assets used by the company as security would first be provided to the first lien secured lenders as repayment of their borrowings.**

**Second-lien debt has a subordinated claim to collateral pledged to secure a loan. In a forced liquidation, a second lien may receive proceeds from the sale of the assets pledged to secure the loan, but only after senior debt holders have been paid.**

**Not secured by lien - With unsecured debts, lenders do not have rights to any collateral for the debt. If you fall behind on your payments, they generally cannot claim your assets for the debt.**
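
The priority rule described above can be illustrated with a small worked example. This is a simplified sketch with invented figures (the function `distribute_proceeds` and all amounts are hypothetical); it ignores sale costs and every legal detail, and only shows that proceeds from the collateral go to the first lien before the second lien receives anything.

```python
def distribute_proceeds(proceeds, claims):
    """Pay claims in priority order and return what each claim recovers.

    `claims` is a list of (name, amount_owed) pairs, most senior first.
    """
    recovered = {}
    remaining = proceeds
    for name, owed in claims:
        # A claim is paid in full if enough is left, otherwise it gets the rest.
        paid = min(owed, remaining)
        recovered[name] = paid
        remaining -= paid
    return recovered


# Hypothetical figures: the collateral sells for 250, with 200 owed on the
# first lien and 80 owed on the second lien.
claims = [("first lien", 200), ("second lien", 80)]
result = distribute_proceeds(250, claims)
print(result)
```

In this example the first lien is repaid in full and the second lien recovers only what is left. A lender whose loan is not secured by a lien has no claim on the collateral at all, so it would not appear in `claims`.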

In [None]:
#Plotting a bar graph - Loans Originated by lien status
lien_status_name =loans_orig['lien_status_name'].value_counts()
plt.figure(figsize=(10,8))
sns.barplot(x=lien_status_name.index,y=lien_status_name.values,alpha=.8)
plt.title('Lien status in Washington')
plt.xticks(rotation=308)
plt.show()

In [None]:

#Plotting a bar graph - Loans Originated by applicant_ethnicity_name
loan_ethnicity = loans_orig['applicant_ethnicity_name'].value_counts()
plt.figure(figsize=(10,8))
sns.barplot(x=loan_ethnicity.index,y=loan_ethnicity.values,alpha=.8)
plt.title('Originated loans by applicant ethnicity in Washington')
plt.xticks(rotation=90)
plt.show()

**tract_to_msamd_income - The percentage of the median family income for the tract compared to the median family income for the MSA/MD, rounded to two decimal places.**
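
As a worked illustration of that definition, the ratio could be computed as below. The helper name and the income figures are hypothetical and chosen only for the example; the dataset already ships the computed column.

```python
def tract_to_msamd_income_pct(tract_median, msamd_median):
    """Tract median family income as a percentage of the MSA/MD median,
    rounded to two decimal places."""
    return round(100 * tract_median / msamd_median, 2)


# Hypothetical medians in dollars.
below_area_median = tract_to_msamd_income_pct(61_500, 72_300)
above_area_median = tract_to_msamd_income_pct(90_000, 72_000)
print(below_area_median, above_area_median)
```

A value under 100 means the tract's median family income is below that of the surrounding MSA/MD; a value over 100 means it is above.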

In [None]:
#loans originated from different counties in Washington

loans_orig['county_name'].unique()


In [None]:
len(loans_orig['county_name'].unique())

In [None]:
loans_county=loans_orig.sort_values(by='loan_amount_000s',ascending=False).groupby(loans_orig['county_name'])


In [None]:
loans_county['county_name'].value_counts()

In [None]:
#Plotting a bar graph - Loans Originated by county
# Count directly on the column: value_counts() on the grouped object returns
# a two-level index that barplot cannot use as x labels.
county = loans_orig['county_name'].value_counts()
plt.figure(figsize=(10,8))
sns.barplot(x=county.index,y=county.values,alpha=.8)
plt.title('Loans Originated by counties in Washington')
plt.xticks(rotation=90)
plt.show()




In [None]:
# counties and hud median salary and loan amounts
# fig, ax = plt.subplots(figsize=(20,10))
# sns.catplot(y="county_name",hue="hud_median_family_income", kind="count", palette="pastel", edgecolor=".8", data=loans_orig, ax=ax)

In [None]:
hmda[['tract_to_msamd_income','loan_amount_000s']].groupby(hmda['tract_to_msamd_income']).describe()

In [None]:
# Drop as_of_year: the file covers a single year (2016), so the column is constant
hmda.drop('as_of_year',axis=1,inplace=True)

In [None]:
# create tabular correlation matrix
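# Restrict to numeric columns; recent pandas versions raise on text columns
corr = hmda.select_dtypes(include='number').corr()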
_, ax = plt.subplots(figsize=(13,10)) 

# graph correlation matrix
_ = sns.heatmap(corr, ax=ax,
                xticklabels=corr.columns.values,
                yticklabels=corr.columns.values,
               cmap="YlGnBu")

**Summary**

Loan originated                                        263712
Application denied by financial institution             64177
Application withdrawn by applicant                      60358
Loan purchased by the institution                       48356
File closed for incompleteness                          18176
Application approved but not accepted                   11735
Preapproval request denied by financial institution        35
Preapproval request approved but not accepted              17
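
The share of each outcome can be checked directly from the counts listed above. The snippet below is a small sketch using only those numbers; the variable names are illustrative.

```python
# Counts copied from the summary of action_taken_name above.
counts = {
    "Loan originated": 263712,
    "Application denied by financial institution": 64177,
    "Application withdrawn by applicant": 60358,
    "Loan purchased by the institution": 48356,
    "File closed for incompleteness": 18176,
    "Application approved but not accepted": 11735,
    "Preapproval request denied by financial institution": 35,
    "Preapproval request approved but not accepted": 17,
}

total = sum(counts.values())

# Percentage of all records for each outcome, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name:<55}{pct:>6.1f}%")
```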


**About 4% of loan applications were closed for incompleteness. Some of these might have reached "Loan originated" status had the files been complete. If this process could be automated and carried through to loan closing, it could be a good opportunity to tap into that market share.**

Originated loans by applicant sex:
Male                                                                                 172650
Female                                                                                65579
Information not provided by applicant in mail, Internet, or telephone application     22433
Not applicable                                                                         3050

**Among originated loans there is a clear gap between male and female applicants. Is this because of the pay gap, or is it due to other practices? Something to think about.**
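
Raw counts of originated loans also reflect how many people applied, so one way to probe this further is to compare origination rates per group. The following is a sketch on an invented frame (`toy` and its rows are hypothetical; the column names `applicant_sex_name` and `action_taken_name` are the dataset's), built so that the counts differ while the rates do not.

```python
import pandas as pd

# Invented data: 6 applications from men, 2 from women.
toy = pd.DataFrame({
    "applicant_sex_name": ["Male"] * 6 + ["Female"] * 2,
    "action_taken_name": (
        ["Loan originated"] * 3
        + ["Application denied by financial institution"] * 3
        + ["Loan originated", "Application denied by financial institution"]
    ),
})

# True where the application ended in an originated loan.
originated = toy["action_taken_name"].eq("Loan originated")

# Per group: number of applications, number originated, and origination rate.
summary = originated.groupby(toy["applicant_sex_name"]).agg(
    applications="count", originated="sum", rate="mean"
)
print(summary)
```

In this toy example men hold three times as many originated loans as women, yet both groups have the same origination rate. Whether that is the case in the Washington data would have to be checked by running the same aggregation on `hmda`.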

**Of the 4 loan types, applicants favored conventional loans (easier to secure with a good credit score and employment) over the government-backed VA (veterans) and FHA (Federal Housing Administration) loans. An interesting point is that refinancing is the most common loan purpose (lower interest rates? ARM loans converted to fixed-rate loans?).**

**Of the 40 distinct `county_name` values (Washington itself has 39 counties, so one value is presumably missing data), King County seems to be the "king", securing the highest loan amount (maybe average incomes are higher there?).**
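
The county bar chart counts loans rather than summing their amounts, so a claim about loan amounts is better checked with an aggregation that reports both. A minimal sketch on invented rows follows (`toy` and its values are hypothetical; `county_name` and `loan_amount_000s` are the dataset's column names).

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    "county_name": [
        "King County", "King County", "King County",
        "Pierce County", "Pierce County", "Spokane County",
    ],
    "loan_amount_000s": [450, 520, 380, 300, 280, 210],
})

# Per county: number of loans, total amount and median amount (in $000s),
# sorted so the county with the largest total comes first.
by_county = (
    toy.groupby("county_name")["loan_amount_000s"]
    .agg(loans="count", total_amount_000s="sum", median_amount_000s="median")
    .sort_values("total_amount_000s", ascending=False)
)
print(by_county)
```

Applied to `loans_orig`, the same aggregation would show whether a county leads because it has more loans, larger loans, or both.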

**Among originated loans, most went to applicants recorded as "Not Hispanic or Latino"; that is, ethnicity groups other than Hispanic or Latino secured the bulk of the loans.**

**All originated loans were secured by a first lien. If a borrower misses mortgage payments, the loan can become delinquent and later end in bankruptcy or foreclosure. In a foreclosure the borrower loses the house, because the first-lien lender has the first claim on it.**

**Last but not least, there is a clear correlation between applicant income and loan amount: the higher the applicant's income, the greater the chance of securing a loan.**

