# Fannie Mae analysis

(just started)

This notebook contains some Python code to analyse mortgage data from Fannie Mae.
See the following link for how to download the data and for [more details](https://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html).

For each quarter there is an Acquisition data set and a Performance data set. See the [details here](https://www.fanniemae.com/resources/file/fundmarket/pdf/webinar-101.pdf).


In [1]:
#### using the datatable package from h2o....  super package!
import pandas as pd
import datatable as dt

## Import acquisition and performance data

The mortgage data is split by quarter. For example, the file Acquisition_2010Q1.txt contains the mortgages that Fannie Mae acquired in Q1 2010 (some were originated in late 2009, as the ORIG_DTE values below show); each row is one mortgage.

The performance of the mortgages in the acquisition file is in the file Performance_2010Q1.txt. Here several rows correspond to one mortgage: for every mortgage there is one row per month, from its start until the loan is terminated (for example, paid off) or until December 2019.
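A toy sketch of this one-to-many layout, with invented values rather than real Fannie Mae records: the acquisition table holds one row per LOAN_ID and the performance table holds several, so counting rows per LOAN_ID gives the number of monthly records for each mortgage.

```python
import pandas as pd

# Toy stand-ins for the two files; all values are invented for illustration.
acq = pd.DataFrame({"LOAN_ID": [1, 2], "ORIG_AMT": [284000, 87000]})
perf = pd.DataFrame({
    "LOAN_ID": [1, 1, 1, 2, 2],
    "Loan_Age": [0, 1, 2, 0, 1],
})

# Acquisition data: LOAN_ID should be unique (one row per mortgage).
assert acq["LOAN_ID"].is_unique

# Performance data: several monthly rows per mortgage.
rows_per_loan = perf.groupby("LOAN_ID").size()
print(rows_per_loan.to_dict())  # {1: 3, 2: 2}
```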

In [18]:
%%time

#### import Acquisition data
Acquisitions_Variables = [
    "LOAN_ID", "ORIG_CHN", "Seller_Name", "ORIG_RT", "ORIG_AMT", "ORIG_TRM", "ORIG_DTE",
    "FRST_DTE", "OLTV", "OCLTV", "NUM_BO", "DTI", "CSCORE_B", "FTHB_FLG", "PURPOSE", "PROP_TYP",
    "NUM_UNIT", "OCC_STAT", "STATE", "ZIP_3", "MI_PCT", "Product_Type", "CSCORE_C", "MI_TYPE", "RELOCATION_FLG"
]

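# note: the variable is named AQ2018Q1, but it holds the 2010Q1 acquisition file
AQ2018Q1 = dt.fread(
    "data/Acquisition_2010Q1.txt",
    sep="|",
    header=False,  # the file has no header row; names come from `columns`
    columns=Acquisitions_Variables
)

# convert the datatable Frame to a pandas DataFrame
AQ2018Q1 = AQ2018Q1.to_pandas()
AQ2018Q1.shape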

CPU times: user 3.88 s, sys: 355 ms, total: 4.24 s
Wall time: 840 ms


(323174, 25)
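The same kind of file can also be read with pandas alone. This is a minimal sketch, assuming only the first four acquisition columns and two rows modelled on the `head()` output of the next cell; on the full file it is likely to be slower than datatable.

```python
import io
import pandas as pd

# Two pipe-delimited rows with no header line, mimicking the acquisition
# file layout (only the first four columns, for brevity).
raw = io.StringIO(
    "100010079393|C|WELLS FARGO BANK, N.A.|4.875\n"
    "100013622306|R|JPMORGAN CHASE BANK, NATIONAL ASSOCIATION|4.75\n"
)

cols = ["LOAN_ID", "ORIG_CHN", "Seller_Name", "ORIG_RT"]

# header=None: the first line is data; names= supplies the column names.
# Commas inside Seller_Name are harmless because the separator is "|".
acq = pd.read_csv(raw, sep="|", header=None, names=cols)
print(acq.shape)  # (2, 4)
```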

In [19]:
### First five records
AQ2018Q1.head(5)

|   | LOAN_ID | ORIG_CHN | Seller_Name | ORIG_RT | ORIG_AMT | ORIG_TRM | ORIG_DTE | FRST_DTE | OLTV | OCLTV | ... | PROP_TYP | NUM_UNIT | OCC_STAT | STATE | ZIP_3 | MI_PCT | Product_Type | CSCORE_C | MI_TYPE | RELOCATION_FLG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100010079393 | C | WELLS FARGO BANK, N.A. | 4.875 | 284000 | 360 | 01/2010 | 03/2010 | 80 | 80.0 | ... | PU | 1 | P | TX | 787 |  | FRM |  |  | N |
| 1 | 100013622306 | R | JPMORGAN CHASE BANK, NATIONAL ASSOCIATION | 4.75 | 87000 | 180 | 12/2009 | 02/2010 | 63 | 63.0 | ... | SF | 1 | P | CA | 932 |  | FRM | 785.0 |  | N |
| 2 | 100019943199 | R | OTHER | 5.0 | 417000 | 360 | 11/2009 | 01/2010 | 43 | 43.0 | ... | PU | 1 | S | FL | 342 |  | FRM | 808.0 |  | N |
| 3 | 100022098429 | R | OTHER | 5.25 | 461000 | 360 | 01/2010 | 03/2010 | 61 | 61.0 | ... | SF | 2 | P | NY | 112 |  | FRM |  |  | N |
| 4 | 100023088745 | R | WELLS FARGO BANK, N.A. | 5.25 | 100000 | 360 | 11/2009 | 01/2010 | 80 | 80.0 | ... | CO | 1 | P | OH | 446 |  | FRM |  |  | N |
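ORIG_DTE and FRST_DTE are stored as MM/YYYY strings. Before doing date arithmetic it can help to parse them. This sketch uses three (ORIG_DTE, FRST_DTE) pairs from the rows above and computes the number of months between origination and first payment; the column name `months_to_first_pmt` is made up for illustration.

```python
import pandas as pd

# ORIG_DTE and FRST_DTE as "MM/YYYY" strings, taken from the rows above.
acq = pd.DataFrame({
    "ORIG_DTE": ["01/2010", "12/2009", "11/2009"],
    "FRST_DTE": ["03/2010", "02/2010", "01/2010"],
})

for col in ["ORIG_DTE", "FRST_DTE"]:
    # An explicit format avoids ambiguous guessing; the day defaults to the 1st.
    acq[col] = pd.to_datetime(acq[col], format="%m/%Y")

# Hypothetical helper column: whole months from origination to first payment.
acq["months_to_first_pmt"] = (
    (acq["FRST_DTE"].dt.year - acq["ORIG_DTE"].dt.year) * 12
    + (acq["FRST_DTE"].dt.month - acq["ORIG_DTE"].dt.month)
)
print(acq["months_to_first_pmt"].tolist())  # [2, 2, 2]
```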


In [20]:
%%time

#### Import performance data
Performance_Variables = [
    "LOAN_ID", "Monthly_Rpt_Prd", "Servicer_Name", "LAST_RT", "LAST_UPB", "Loan_Age", "Months_To_Legal_Mat",
    "Adj_Month_To_Mat", "Maturity_Date", "MSA", "Delq_Status", "MOD_FLAG", "Zero_Bal_Code", "ZB_DTE", "LPI_DTE",
    "FCC_DTE","DISP_DT", "FCC_COST", "PP_COST", "AR_COST", "IE_COST", "TAX_COST", "NS_PROCS", "CE_PROCS", "RMW_PROCS",
    "O_PROCS", "NON_INT_UPB", "PRIN_FORG_UPB_FHFA", "REPCH_FLAG", "PRIN_FORG_UPB_OTH", "TRANSFER_FLG"
]

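# note: the variable is named PERF2018Q1, but it holds the 2010Q1 performance file
PERF2018Q1 = dt.fread(
    "data/Performance_2010Q1.txt",
    sep="|",
    header=False,  # the file has no header row; names come from `columns`
    columns=Performance_Variables
)

# convert the datatable Frame to a pandas DataFrame
PERF2018Q1 = PERF2018Q1.to_pandas()
PERF2018Q1.shape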

CPU times: user 1min 5s, sys: 27.4 s, total: 1min 33s
Wall time: 50.8 s


(18634553, 31)
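The performance file is by far the larger one (about 18.6 million rows and roughly 50 s to read), while the analysis below keeps only a handful of its 31 columns. One option is to parse only the columns you need. A sketch with pandas' `usecols`, assuming a tiny in-memory sample in the same pipe-delimited, header-less layout (only the first four fields are shown):

```python
import io
import pandas as pd

# First four performance columns; the real file has 31.
performance_cols = ["LOAN_ID", "Monthly_Rpt_Prd", "Servicer_Name", "LAST_RT"]
wanted = ["LOAN_ID", "Monthly_Rpt_Prd"]

# Rows modelled on the head() output above (first four fields only).
raw = io.StringIO(
    "100010079393|02/01/2010|WELLS FARGO BANK, N.A.|4.875\n"
    "100010079393|03/01/2010||4.875\n"
)

perf = pd.read_csv(
    raw,
    sep="|",
    header=None,
    names=performance_cols,
    usecols=wanted,  # parse only these columns, skipping the rest
)
print(perf.columns.tolist())  # ['LOAN_ID', 'Monthly_Rpt_Prd']
```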

In [21]:
#### first 5 records
PERF2018Q1.head(5)

|   | LOAN_ID | Monthly_Rpt_Prd | Servicer_Name | LAST_RT | LAST_UPB | Loan_Age | Months_To_Legal_Mat | Adj_Month_To_Mat | Maturity_Date | MSA | ... | TAX_COST | NS_PROCS | CE_PROCS | RMW_PROCS | O_PROCS | NON_INT_UPB | PRIN_FORG_UPB_FHFA | REPCH_FLAG | PRIN_FORG_UPB_OTH | TRANSFER_FLG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100010079393 | 02/01/2010 | WELLS FARGO BANK, N.A. | 4.875 |  | 0 | 360 | 360.0 | 02/2040 | 12420 | ... |  |  |  |  |  |  |  |  |  | N |
| 1 | 100010079393 | 03/01/2010 |  | 4.875 |  | 1 | 359 | 358.0 | 02/2040 | 12420 | ... |  |  |  |  |  |  |  |  |  | N |
| 2 | 100010079393 | 04/01/2010 |  | 4.875 |  | 2 | 358 | 358.0 | 02/2040 | 12420 | ... |  |  |  |  |  |  |  |  |  | N |
| 3 | 100010079393 | 05/01/2010 |  | 4.875 |  | 3 | 357 | 357.0 | 02/2040 | 12420 | ... |  |  |  |  |  |  |  |  |  | N |
| 4 | 100010079393 | 06/01/2010 |  | 4.875 |  | 4 | 356 | 355.0 | 02/2040 | 12420 | ... |  |  |  |  |  |  |  |  |  | N |


## Start with a simple analysis

This is the easiest case in terms of data preparation: we look only at mortgages from one specific quarter, so we can simply join the acquisition and performance files on LOAN_ID.


In [6]:
%%time
test = (
    AQ2018Q1
    .merge(
        PERF2018Q1,
        how="left",
        on="LOAN_ID"  # the key has the same name in both frames
    )
    .filter([  # keep only the columns needed for this analysis
        "LOAN_ID", "ORIG_DTE", "FRST_DTE",
        "Monthly_Rpt_Prd", "Loan_Age", "Seller_Name", "ORIG_RT", "ORIG_AMT",
        "Zero_Bal_Code", "Delq_Status", "ZB_DTE", "LPI_DTE"
    ])
)

CPU times: user 57.4 s, sys: 1min 4s, total: 2min 2s
Wall time: 2min 13s
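Because the join is one acquisition row to many performance rows, pandas can check that assumption while merging. A minimal sketch on toy frames with invented values: `validate="one_to_many"` raises a `MergeError` if LOAN_ID repeats on the left, and `indicator=True` adds a `_merge` column that exposes acquisition loans with no performance rows at all.

```python
import pandas as pd

# Toy frames; values are invented for illustration.
acq = pd.DataFrame({"LOAN_ID": [1, 2, 3], "ORIG_RT": [4.875, 4.75, 5.0]})
perf = pd.DataFrame({"LOAN_ID": [1, 1, 2], "Loan_Age": [0, 1, 0]})

merged = acq.merge(
    perf,
    how="left",
    on="LOAN_ID",
    validate="one_to_many",  # raises MergeError if LOAN_ID repeats in acq
    indicator=True,          # adds _merge: 'both' or 'left_only' here
)

# Loans present in the acquisition data but without any performance rows.
no_perf = merged.loc[merged["_merge"] == "left_only", "LOAN_ID"].tolist()
print(no_perf)  # [3]
```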


In [7]:
test

|   | LOAN_ID | ORIG_DTE | FRST_DTE | Monthly_Rpt_Prd | Loan_Age | Seller_Name | ORIG_RT | ORIG_AMT | Zero_Bal_Code | Delq_Status | ZB_DTE | LPI_DTE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100010079393 | 01/2010 | 03/2010 | 02/01/2010 | 0 | WELLS FARGO BANK, N.A. | 4.875 | 284000 |  | 0 |  |  |
| 1 | 100010079393 | 01/2010 | 03/2010 | 03/01/2010 | 1 | WELLS FARGO BANK, N.A. | 4.875 | 284000 |  | 0 |  |  |
| 2 | 100010079393 | 01/2010 | 03/2010 | 04/01/2010 | 2 | WELLS FARGO BANK, N.A. | 4.875 | 284000 |  | 0 |  |  |
| 3 | 100010079393 | 01/2010 | 03/2010 | 05/01/2010 | 3 | WELLS FARGO BANK, N.A. | 4.875 | 284000 |  | 0 |  |  |
| 4 | 100010079393 | 01/2010 | 03/2010 | 06/01/2010 | 4 | WELLS FARGO BANK, N.A. | 4.875 | 284000 |  | 0 |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18634548 | 999999167522 | 01/2010 | 03/2010 | 10/01/2011 | 20 | OTHER | 5.250 | 417000 |  | 0 |  |  |
| 18634549 | 999999167522 | 01/2010 | 03/2010 | 11/01/2011 | 21 | OTHER | 5.250 | 417000 |  | 0 |  |  |
| 18634550 | 999999167522 | 01/2010 | 03/2010 | 12/01/2011 | 22 | OTHER | 5.250 | 417000 |  | 0 |  |  |
| 18634551 | 999999167522 | 01/2010 | 03/2010 | 01/01/2012 | 23 | OTHER | 5.250 | 417000 |  | 0 |  |  |


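The column Delq_Status holds the loan delinquency status. Codes 0 to 9 have the following meaning (the value counts further down also contain X and codes well above 9, so this list is not exhaustive):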

* 0 - "Current or less than 30 days past due"
* 1 - "30 - 59 days past due"
* 2 - "60 - 89 days past due"
* 3 - "90 - 119 days past due"
* 4 - "120 - 149 days past due"
* 5 - "150 - 179 days past due"
* 6 - "180 Day Delinquency"
* 7 - "210 Day Delinquency"
* 8 - "240 Day Delinquency"
* 9 - "270 Day Delinquency" / "270+ Day Delinquency"
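To turn the codes into readable labels, a plain dictionary lookup is enough. A sketch, assuming Delq_Status is stored as text (it contains X, and the query further down compares it with the string '5'); codes outside 0 to 9 are deliberately left unlabeled rather than guessed, and the placeholder text "not in list" is made up.

```python
import pandas as pd

# Codes 0-9 and their meaning, copied from the list above.
DELQ_LABELS = {
    "0": "Current or less than 30 days past due",
    "1": "30 - 59 days past due",
    "2": "60 - 89 days past due",
    "3": "90 - 119 days past due",
    "4": "120 - 149 days past due",
    "5": "150 - 179 days past due",
    "6": "180 Day Delinquency",
    "7": "210 Day Delinquency",
    "8": "240 Day Delinquency",
    "9": "270 Day Delinquency",
}

# Toy sample; assumes Delq_Status is a string column.
status = pd.Series(["0", "1", "X", "5", "12"])

# Codes not in the dictionary (e.g. "X", "12") get a placeholder label.
labels = status.map(DELQ_LABELS).fillna("not in list")
print(labels.tolist())
```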

In [22]:
test.Delq_Status.value_counts()

0     18135400
X       264341
1       127953
2        29554
3        13632
        ...   
99           1
98           1
95           1
96           1
97           1
Name: Delq_Status, Length: 102, dtype: int64
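Since the status also contains X (presumably "unknown") and values far above 9, a numeric copy makes comparisons easier. A sketch on an invented toy panel: non-numeric codes become NaN via `pd.to_numeric(..., errors="coerce")`, and a loan is flagged if its worst status is 3 or higher, i.e. 90+ days past due according to the list above. The column name `delq_num` is hypothetical.

```python
import pandas as pd

# Toy panel; Delq_Status is a string column that may contain "X".
df = pd.DataFrame({
    "LOAN_ID": [1, 1, 1, 2, 2, 3],
    "Delq_Status": ["0", "1", "3", "0", "X", "0"],
})

# "X" (and any other non-numeric code) becomes NaN instead of raising.
df["delq_num"] = pd.to_numeric(df["Delq_Status"], errors="coerce")

# max() skips NaN; flag loans that were ever 90+ days past due (code >= 3).
ever_90 = df.groupby("LOAN_ID")["delq_num"].max().ge(3)
print(ever_90.to_dict())  # {1: True, 2: False, 3: False}
```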

In [23]:
test.query("Delq_Status == '5'")

|   | LOAN_ID | ORIG_DTE | FRST_DTE | Monthly_Rpt_Prd | Loan_Age | Seller_Name | ORIG_RT | ORIG_AMT | Zero_Bal_Code | Delq_Status | ZB_DTE | LPI_DTE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1995 | 100091653134 | 12/2009 | 02/2010 | 07/01/2018 | 102 | OTHER | 5.375 | 312000 |  | 5 |  |  |
| 1996 | 100091653134 | 12/2009 | 02/2010 | 08/01/2018 | 103 | OTHER | 5.375 | 312000 |  | 5 |  |  |
| 1997 | 100091653134 | 12/2009 | 02/2010 | 09/01/2018 | 104 | OTHER | 5.375 | 312000 |  | 5 |  |  |
| 6564 | 100312066648 | 01/2010 | 03/2010 | 03/01/2014 | 49 | OTHER | 5.250 | 160000 |  | 5 |  |  |
| 8863 | 100446401716 | 12/2009 | 02/2010 | 09/01/2011 | 20 | JPMORGAN CHASE BANK, NATIONAL ASSOCIATION | 5.125 | 244000 |  | 5 |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18631508 | 999865000263 | 01/2010 | 03/2010 | 11/01/2013 | 45 | OTHER | 5.125 | 60000 |  | 5 |  |  |
| 18631519 | 999865000263 | 01/2010 | 03/2010 | 10/01/2014 | 56 | OTHER | 5.125 | 60000 |  | 5 |  |  |
| 18631539 | 999865000263 | 01/2010 | 03/2010 | 06/01/2016 | 76 | OTHER | 5.125 | 60000 |  | 5 |  |  |
| 18632769 | 999919586679 | 12/2009 | 02/2010 | 01/01/2017 | 84 | WELLS FARGO BANK, N.A. | 5.000 | 427000 |  | 5 |  |  |
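Loan 999865000263 shows status 5 in several non-consecutive months, so a loan can leave and re-enter a delinquency bucket. To get the first month a loan reached status 5, take the minimum reporting period per loan, but only after parsing Monthly_Rpt_Prd: as MM/DD/YYYY text, "06/01/2016" would sort before "11/01/2013". A sketch using a few rows from the output above (in the notebook, `hits` would be the result of `test.query("Delq_Status == '5'")`):

```python
import pandas as pd

# A few (LOAN_ID, Monthly_Rpt_Prd) pairs copied from the output above.
hits = pd.DataFrame({
    "LOAN_ID": [100091653134, 100091653134, 100091653134,
                999865000263, 999865000263, 999865000263],
    "Monthly_Rpt_Prd": ["07/01/2018", "08/01/2018", "09/01/2018",
                        "11/01/2013", "10/01/2014", "06/01/2016"],
})

# Parse the MM/DD/YYYY strings so that min() compares dates, not text.
hits["rpt_date"] = pd.to_datetime(hits["Monthly_Rpt_Prd"], format="%m/%d/%Y")

# Earliest month in which each loan was reported with status "5".
first_hit = hits.groupby("LOAN_ID")["rpt_date"].min()
print(first_hit.dt.strftime("%Y-%m").to_dict())
# {100091653134: '2018-07', 999865000263: '2013-11'}
```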


In [25]:
test.query("LOAN_ID == 100091653134")

|   | LOAN_ID | ORIG_DTE | FRST_DTE | Monthly_Rpt_Prd | Loan_Age | Seller_Name | ORIG_RT | ORIG_AMT | Zero_Bal_Code | Delq_Status | ZB_DTE | LPI_DTE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1893 | 100091653134 | 12/2009 | 02/2010 | 01/01/2010 | 0 | OTHER | 5.375 | 312000 |  | 0 |  |  |
| 1894 | 100091653134 | 12/2009 | 02/2010 | 02/01/2010 | 1 | OTHER | 5.375 | 312000 |  | 0 |  |  |
| 1895 | 100091653134 | 12/2009 | 02/2010 | 03/01/2010 | 2 | OTHER | 5.375 | 312000 |  | 0 |  |  |
| 1896 | 100091653134 | 12/2009 | 02/2010 | 04/01/2010 | 3 | OTHER | 5.375 | 312000 |  | 0 |  |  |
| 1897 | 100091653134 | 12/2009 | 02/2010 | 05/01/2010 | 4 | OTHER | 5.375 | 312000 |  | 0 |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2008 | 100091653134 | 12/2009 | 02/2010 | 08/01/2019 | 115 | OTHER | 5.375 | 312000 |  | 0 |  |  |
| 2009 | 100091653134 | 12/2009 | 02/2010 | 09/01/2019 | 116 | OTHER | 5.375 | 312000 |  | 1 |  |  |
| 2010 | 100091653134 | 12/2009 | 02/2010 | 10/01/2019 | 117 | OTHER | 5.375 | 312000 |  | 1 |  |  |
| 2011 | 100091653134 | 12/2009 | 02/2010 | 11/01/2019 | 118 | OTHER | 5.375 | 312000 |  | 1 |  |  |
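Loan 100091653134 moves between statuses over time (status 5 in 2018, back to 0, then 1 in late 2019). A natural next step is to count these month-to-month transitions, which in credit risk are often summarised as roll rates. A minimal sketch on an invented single-loan history, assuming the rows are already sorted by LOAN_ID and reporting month: `groupby(...).shift(1)` gives the previous status of the same loan, and `pd.crosstab` tabulates previous versus current status. The column name `prev_status` is hypothetical.

```python
import pandas as pd

# Invented monthly history for one loan, already in reporting order.
# On the real data, sort by LOAN_ID and the parsed Monthly_Rpt_Prd first.
hist = pd.DataFrame({
    "LOAN_ID": [7] * 6,
    "Delq_Status": ["0", "0", "1", "2", "0", "1"],
})

# Status of the same loan in the previous month (NaN for its first month).
hist["prev_status"] = hist.groupby("LOAN_ID")["Delq_Status"].shift(1)

# Rows = previous status, columns = current status; rows with a missing
# previous status are left out of the counts.
transitions = pd.crosstab(hist["prev_status"], hist["Delq_Status"])
print(transitions)
```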
