In [1]:
import pandas as pd
import numpy as np

This notebook shows some basic statistics about securities labeled NC1.

## Load Data

In [2]:
data = pd.read_csv('trimmed_2005_v2.csv')

In [3]:
# Select all rows labeled NC1(failing)
nc1 = data.loc[data['Label'] == 'NC1(failing)']

## Data Validation

In [4]:
print('we have {} rows of data'.format(data.shape[0]))
print('we have {} rows of NC1'.format(nc1.shape[0]))
print('percentage: {:.2f}%'.format(nc1.shape[0]/data.shape[0]*100))

we have 5015 rows of data
we have 1402 rows of NC1
percentage: 27.96%


In [5]:
# Group by label and count the unique CUSIPs under each label
print('List of all labels count:')
data.groupby('Label')['CUSIP'].nunique()

List of all labels count:


Label
0                                59
1.4                              99
FE                              374
IOfailing                         1
IOpassMED                        46
IOpassMEY                        97
MED                             965
MEY                            1108
NC1(failing)                   1388
NC2(z>1 nsf)                     36
NC3(z>1, not paid off, nsf)      56
NMEm                            696
NMEs                             34
Name: CUSIP, dtype: int64

In [6]:
valid = data.groupby('Label')['CUSIP'].nunique()['NC1(failing)'] == nc1.shape[0]
print('Does the NC1 row count equal the number of unique NC1 CUSIPs? {}'.format(valid))

Does the NC1 row count equal the number of unique NC1 CUSIPs? False
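
The check above comes back False because there are 1402 NC1 rows but only 1388 unique NC1 CUSIPs, so 14 rows repeat a CUSIP that already appears. A minimal sketch of how those rows could be listed is below. It runs on a small made-up frame (`toy` and its values are hypothetical, not taken from `trimmed_2005_v2.csv`); in this notebook the same calls would presumably be applied to `nc1`.

```python
import pandas as pd

# Toy stand-in for `nc1`; the CUSIP and Prospectus values are invented for illustration.
toy = pd.DataFrame({
    'CUSIP': ['AAA111', 'BBB222', 'AAA111', 'CCC333', 'BBB222', 'AAA111'],
    'Prospectus': ['P1', 'P1', 'P2', 'P2', 'P3', 'P3'],
})

# keep=False flags every row whose CUSIP occurs more than once, not only the repeats.
dupes = toy[toy['CUSIP'].duplicated(keep=False)].sort_values('CUSIP')

# Rows beyond the first occurrence of each CUSIP: this is the gap between
# the row count and the unique-CUSIP count.
extra_rows = int(toy['CUSIP'].duplicated().sum())

# Number of rows for each repeated CUSIP.
rows_per_cusip = dupes.groupby('CUSIP').size()

print(dupes)
print('extra rows:', extra_rows)
```

Looking at which columns differ between the repeated rows (for example `Prospectus` or `Class`) may hint at why the CUSIP appears more than once.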


## Explore NC1

In [7]:
count = nc1.groupby('Prospectus')['CUSIP'].nunique().count()
print('There are {} prospectuses that have NC1'.format(count))
print('On average, {:.2f} NC1 rows per prospectus that has NC1'.format(nc1.shape[0]/count))

There are 263 prospectuses that have NC1
On average, 5.33 NC1 rows per prospectus that has NC1
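
Note that the average above divides the NC1 row count (1402) by the number of prospectuses, so a CUSIP that appears on more than one row is counted more than once. The sketch below shows one way to compare a row-based average with one based on unique CUSIPs per prospectus. The frame `toy` and its values are hypothetical; the two averages differ only when CUSIPs repeat within a prospectus.

```python
import pandas as pd

# Toy stand-in for `nc1`; CUSIP 'B' appears twice under prospectus 'P1'.
toy = pd.DataFrame({
    'Prospectus': ['P1', 'P1', 'P1', 'P2', 'P2', 'P3'],
    'CUSIP':      ['A',  'B',  'B',  'C',  'D',  'E'],
})

# Unique CUSIPs under each prospectus.
per_prospectus = toy.groupby('Prospectus')['CUSIP'].nunique()

# Row-based average (what the cell above computes): rows / prospectuses.
avg_by_rows = len(toy) / per_prospectus.count()

# CUSIP-based average: mean number of unique CUSIPs per prospectus.
avg_by_cusip = per_prospectus.mean()

print('rows per prospectus:   {:.2f}'.format(avg_by_rows))
print('CUSIPs per prospectus: {:.2f}'.format(avg_by_cusip))
```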


In [8]:
show_top = 5
print('Top {} of the MTG_TRANCHE_TYP_LONG among NC1 are:'.format(show_top))
nc1.groupby('MTG_TRANCHE_TYP_LONG')['CUSIP'].nunique().sort_values(ascending=False).head(show_top)

Top 5 of the MTG_TRANCHE_TYP_LONG among NC1 are:


MTG_TRANCHE_TYP_LONG
SUB,CSTR,NAS        474
SUB,NAS             133
MEZ,FLT,STEP        130
MEZ,FLT,STEP,IRC     89
SUB,FLT,STEP,IRC     55
Name: CUSIP, dtype: int64

In [9]:
print('Top {} of the MTG_TRANCHE_TYP_LONG among non-NC1s are:'.format(show_top))
data.loc[data['Label'] != 'NC1(failing)'].groupby('MTG_TRANCHE_TYP_LONG')['CUSIP'].nunique().sort_values(ascending=False).head(show_top)

Top 5 of the MTG_TRANCHE_TYP_LONG among non-NC1s are:


MTG_TRANCHE_TYP_LONG
MEZ,FLT,STEP,IRC    237
FLT,STEP,IRC        221
FLT,STEP            186
MEZ,FLT,STEP        172
SEQ,AS              158
Name: CUSIP, dtype: int64

In 2004, SUB (Subordinated), CSTR (Collateral Strip Rate), and NAS (Non-Accelerated Security) seem to be the top MTG_TRANCHE_TYP_LONG components across the NC1 and non-NC1 data. <br>

In 2005, FLT (Floater), STEP (Stepped Rate Bond), and MEZ (Mezzanine) seem to be the top MTG_TRANCHE_TYP_LONG components across the NC1 and non-NC1 data. <br>

For 2004, I couldn't see a big difference between the NC1 and non-NC1 groups.<br>
For 2005, there is a big difference between the two groups: the most common NC1 type is SUB,CSTR,NAS (474 CUSIPs), which does not appear in the non-NC1 top 5, where MEZ,FLT,STEP,IRC leads (237 CUSIPs).<br>

[See here for more info about MTG_TRANCHE_TYP](https://docs.google.com/spreadsheets/d/1MOwPnTr2owqPoJNy73U7UEc3z1RvtzELOCM0ZFxBJU8/edit?usp=sharing)
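
The comparison above is about individual components (SUB, FLT, and so on), while the two top-5 tables count whole comma-separated combinations. One way to count at the component level is to split `MTG_TRANCHE_TYP_LONG` on commas and count unique CUSIPs per component in each group. This is only a sketch on a made-up frame (`toy` and its rows are hypothetical); the column names and the `'NC1(failing)'` label follow the notebook's data.

```python
import pandas as pd

# Toy stand-in for `data`; rows are invented for illustration.
toy = pd.DataFrame({
    'CUSIP': ['A', 'B', 'C', 'D', 'E'],
    'Label': ['NC1(failing)', 'NC1(failing)', 'MED', 'MEY', 'MED'],
    'MTG_TRANCHE_TYP_LONG': ['SUB,CSTR,NAS', 'SUB,NAS', 'MEZ,FLT,STEP',
                             'FLT,STEP,IRC', 'SEQ,AS'],
})

# One row per (security, component): split the type string, then explode the lists.
components = (
    toy.assign(
        component=toy['MTG_TRANCHE_TYP_LONG'].str.split(','),
        group=toy['Label'].eq('NC1(failing)').map({True: 'NC1', False: 'non-NC1'}),
    )
    .explode('component')
)

# Unique CUSIPs per component, with NC1 and non-NC1 side by side.
counts = (
    components.groupby(['group', 'component'])['CUSIP']
    .nunique()
    .unstack('group', fill_value=0)
)

print(counts.sort_values('NC1', ascending=False))
```

Putting both groups in one table makes it easier to see which components are concentrated in NC1 than comparing two separate top-5 lists.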

In [10]:
nc1.columns

Index(['Year', 'PID', 'Prospectus', 'Class', 'norm_class', 'Name',
       'Current_Balance', 'Zero-Balance Payment Period Number',
       'Sum Principle Paid', 'MTG ORIG AMT', 'Maturity', 'CUSIP',
       'MTG_TRANCHE_TYP_LONG', 'Moody Rating', 'Initial Moody Rating',
       'Bloomberg Composite', 'HCLB', 'MTG INT SHRTFLL', 'HIST INTRST SHRTFLL',
       'Label', 'NL_fail'],
      dtype='object')

In [11]:
total = nc1['MTG ORIG AMT'].sum()
print('Sum of MTG ORIG AMT among NC1 = {:.2f}'.format(total))

Sum of MTG ORIG AMT among NC1 = 10417.03


In [12]:
print('Description of MTG ORIG AMT among NC1:')
nc1['MTG ORIG AMT'].describe()

Description of MTG ORIG AMT among NC1:


count    1402.000000
mean        7.430118
std         8.223513
min         0.000100
25%         2.160000
50%         5.038000
75%        10.025500
max       151.617000
Name: MTG ORIG AMT, dtype: float64

## To do
- Look into Bloomberg (paydown information?)
- Why is CUSIP duplicated in 2005?
