# Fannie Mae loan performance data
Source: [Single-Family Fixed Rate Mortgage Dataset](https://loanperformancedata.fanniemae.com/lppub/index.html#Portfolio)

Three files:
1. An **acquisitions file** with static loan data for mortgages in the historical dataset that Fannie Mae originally acquired and that were later refinanced through HARP (Home Affordable Refinance Program).
2. A **performance file** that provides monthly performance data. 
3. A **loan mapping file**, which enables users to map the loan refinanced through HARP to its corresponding original loan in the historical research dataset.

# Questions
1. How can the acquisition data be mapped geographically? It only has anonymized zip codes, whereas the performance data includes the MSA (Metropolitan Statistical Area) code.
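
One possible way to approach this question is to carry the MSA code over from the performance data by joining on the loan identifier. The sketch below is only an illustration on made-up rows: the performance column names (`reportingPeriod`, `msa`) are hypothetical, and it assumes each loan reports a single MSA across its monthly records.

```python
import pandas as pd

# Toy stand-ins for the two files; all values are placeholders
acq = pd.DataFrame({
    "loanIdentifier": [100001, 100002, 100003],
    "zipCode": [300, 945, 100],
})
perf = pd.DataFrame({
    "loanIdentifier": [100001, 100001, 100002],
    "reportingPeriod": ["01/2019", "02/2019", "01/2019"],  # hypothetical name
    "msa": [12060, 12060, 41860],                          # hypothetical name
})

# Performance data has one row per loan per month, so keep one MSA per loan
loan_msa = perf.drop_duplicates("loanIdentifier")[["loanIdentifier", "msa"]]

# Left join keeps every acquisition row; loans without performance rows get NaN
acq_with_msa = acq.merge(
    loan_msa, on="loanIdentifier", how="left", validate="one_to_one"
)
```

The `validate="one_to_one"` argument makes the merge fail loudly if either side has duplicate loan identifiers, which guards against silently multiplying rows.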

In [3]:
import csv
from pathlib import Path
import pandas as pd
import glob
import os
import getpass

data_dir = "data/Acquisition"
print(os.listdir(data_dir))

['Acquisition_2019Q1.csv', 'Acquisition_2019Q1.txt', 'Fannie10Q1_refi.csv', 'Fannie19Q1_refi2.csv', 'Fannie19Q1_refi_avgfico.csv']


In [23]:
headerline = [
    'loanIdentifier'
    ,'origChannel'
    ,'sellerName'
    ,'origIntRate'
    ,'origUPB'
    ,'origLoanTerm'
    ,'origDate'
    ,'firstPmtDate'
    ,'origLTV'
    ,'origCLTV'
    ,'numBorrowers'
    ,'origDebtIncRatio'
    ,'borrCreditScore'
    ,'firstTHBI'
    ,'loanPurp'
    ,'propType'
    ,'numUnits'
    ,'occType'
    ,'propState'
    ,'zipCode'
    ,'pMIperct'
    ,'prodType'
    ,'coborrCreditScore'
    ,'mortInsType'
    ,'relocMortInd'
]

In [30]:
# Convert each pipe-delimited *.txt file to a comma-separated file
# with the same name and a .csv suffix
for in_path in Path(data_dir).glob('*.txt'):
    out_path = in_path.with_suffix('.csv')
    #print(f'in path: {in_path}')
    with in_path.open('r') as fin, out_path.open('w', newline='') as fout:
        # The source file has no header row, so supply the column names
        reader = csv.DictReader(fin, fieldnames=headerline, delimiter="|")
        #print(f'reader fieldnames: {reader.fieldnames}')
        writer = csv.DictWriter(fout, reader.fieldnames, delimiter=",")

        writer.writeheader()
        writer.writerows(reader)
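
As an alternative to the intermediate CSV step, pandas can read pipe-delimited text directly. This is a minimal sketch, assuming the file has no header row (as the conversion above implies) and using made-up sample rows. Only the first four names from `headerline` are used to keep it short, and reading `loanIdentifier` as a string is a suggestion rather than something this notebook does.

```python
import io
import pandas as pd

# First four names from headerline, for brevity
cols = ["loanIdentifier", "origChannel", "sellerName", "origIntRate"]

# Made-up rows standing in for a pipe-delimited acquisition file
sample = io.StringIO("100001|R|OTHER|4.25\n100002|C|OTHER|3.875\n")

# header=None tells pandas the first line is data; names supplies the columns
df_sample = pd.read_csv(
    sample,
    sep="|",
    names=cols,
    header=None,
    dtype={"loanIdentifier": str},  # keep identifiers as text, not floats
)
```

With a real file, the `io.StringIO` object would be replaced by the path to the `.txt` file.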

In [35]:
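frames = []

# Note: this picks up every *.csv in data_dir, not only the files converted above
for in_path in Path(data_dir).glob('*.csv'):
    frames.append(pd.read_csv(in_path, index_col=None, header=0))

# Stack all files row-wise; columns missing from a file are filled with NaN
df = pd.concat(frames, axis=0, ignore_index=True, sort=False)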
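
Because the glob above matches every CSV in the folder, it can help to check whether the files share a header before concatenating them. The combined frame reports 52 columns against 25 names in `headerline`, which suggests they may not, though that is an inference from the output rather than something verified here. The helper below is a hypothetical sketch (`group_by_header` is not part of the notebook), demonstrated on throwaway files with made-up contents.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

def group_by_header(folder):
    """Map each distinct header (tuple of column names) to the CSV files using it."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as f:
            # Read only the first row; an empty file yields an empty header
            header = tuple(next(csv.reader(f), ()))
        groups[header].append(path.name)
    return dict(groups)

# Demo on a temporary folder with made-up files
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("loanIdentifier,origUPB\n1,100\n")
    Path(tmp, "b.csv").write_text("loanIdentifier,origUPB\n2,200\n")
    Path(tmp, "c.csv").write_text("LOAN_ID,avg_fico\n3,700\n")
    demo_groups = group_by_header(tmp)
```

If more than one group comes back, a narrower glob pattern or separate concatenation per group would keep unrelated schemas apart.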

In [40]:
df.head()  # not rendered: a cell only displays its last expression
print(df.shape)

(384278, 52)


In [41]:
# index=False avoids writing the RangeIndex as an extra unnamed column
df.to_csv('FannieAcq2000-19Q1.csv', index=False)

In [42]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 384278 entries, 0 to 384277
Data columns (total 52 columns):
loanIdentifier         297452 non-null float64
origChannel            355330 non-null object
sellerName             355330 non-null object
origIntRate            355330 non-null float64
origUPB                355330 non-null float64
origLoanTerm           297452 non-null float64
origDate               355330 non-null object
firstPmtDate           355330 non-null object
origLTV                355330 non-null float64
origCLTV               355330 non-null float64
numBorrowers           355330 non-null float64
origDebtIncRatio       355234 non-null float64
borrCreditScore        354945 non-null float64
firstTHBI              297452 non-null object
loanPurp               355330 non-null object
propType               297452 non-null object
numUnits               297452 non-null float64
occType                297452 non-null object
propState              355330 non-null object
zipCo