In [1]:
import pandas as pd
import numpy as np

%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina')

This notebook shows some basic statistics about securities labeled NC1.

## Load Data

In [2]:
data = pd.read_csv('trimmed_2004_v2.csv')

In [3]:
# Get all rows labeled NC1(failing)
nc1 = data.loc[data['Label'] == 'NC1(failing)']

## Data Validation

In [4]:
print('we have {} rows of data'.format(data.shape[0]))
print('we have {} rows of NC1'.format(nc1.shape[0]))
print('percentage: {:.2f}%'.format(nc1.shape[0]/data.shape[0]*100))

we have 4450 rows of data
we have 461 rows of NC1
percentage: 10.36%
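The same percentage can be computed for every label at once. This is a minimal sketch, assuming the `Label` column used above; `label_percentages` is a hypothetical helper name. It counts rows rather than unique CUSIPs, so any duplicated CUSIPs are counted once per row.

```python
import pandas as pd

def label_percentages(df, label_col='Label'):
    """Percentage of rows carrying each label, largest first (counts rows, not unique CUSIPs)."""
    return df[label_col].value_counts(normalize=True).mul(100).round(2)

# Hypothetical usage in this notebook: label_percentages(data)['NC1(failing)']
```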


In [5]:
# Group by label and count the unique CUSIPs in each label
print('List of all labels count:')
data.groupby('Label')['CUSIP'].nunique()

List of all labels count:


Label
0                                41
1.4                             133
FE                              397
IOfailing                         1
IOpassMED                       114
IOpassMEY                        79
MED                            1220
MEY                            1251
NC1(failing)                    461
NC2(z>1 nsf)                     24
NC3(z>1, not paid off, nsf)      24
NMEm                            638
NMEs                              7
Name: CUSIP, dtype: int64

In [6]:
valid = data.groupby('Label')['CUSIP'].nunique()['NC1(failing)'] == nc1.shape[0]
print('Does the number of NC1 rows equal the number of unique NC1 CUSIPs? {}'.format(valid))

Does the number of NC1 rows equal the number of unique NC1 CUSIPs? True


## Explore NC1

In [7]:
count = nc1['Prospectus'].nunique()
print('There are {} prospectuses that have NC1'.format(count))
print('On average {:.2f} NC1 per prospectus that has NC1'.format(nc1.shape[0]/count))

There are 150 prospectuses that have NC1
On average 3.07 NC1 per prospectus that has NC1
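An average of 3.07 does not show how NC1 securities are spread across prospectuses. A minimal sketch of a per-prospectus count, assuming the `Prospectus` and `CUSIP` columns used above; `nc1_per_prospectus` is a hypothetical helper. Calling `.describe()` on its result would show whether a few prospectuses account for most of the NC1 securities.

```python
import pandas as pd

def nc1_per_prospectus(nc1):
    """Number of unique NC1 CUSIPs in each prospectus, largest first."""
    return (nc1.groupby('Prospectus')['CUSIP']
               .nunique()
               .sort_values(ascending=False))

# Hypothetical usage in this notebook: nc1_per_prospectus(nc1).describe()
```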


In [8]:
show_top = 5
print('Top {} MTG_TRANCHE_TYP_LONG values among NC1:'.format(show_top))
nc1.groupby('MTG_TRANCHE_TYP_LONG')['CUSIP'].nunique().sort_values(ascending=False).head(show_top)

Top 5 MTG_TRANCHE_TYP_LONG values among NC1:


MTG_TRANCHE_TYP_LONG
SUB,CSTR,NAS    281
SUB,NAS         126
MEZ,FLT,STEP     10
SUB,CSTR,AS       7
SUB,CSTR          4
Name: CUSIP, dtype: int64

In [9]:
print('Top {} MTG_TRANCHE_TYP_LONG values among non-NC1:'.format(show_top))
data.loc[data['Label'] != 'NC1(failing)'].groupby('MTG_TRANCHE_TYP_LONG')['CUSIP'].nunique().sort_values(ascending=False).head(show_top)

Top 5 MTG_TRANCHE_TYP_LONG values among non-NC1:


MTG_TRANCHE_TYP_LONG
SUB,CSTR,NAS        441
SEQ,AS              262
SUB,NAS             238
MEZ,FLT,STEP,IRC    227
CSTR,PT,AS          189
Name: CUSIP, dtype: int64

In 2004, SUB (Subordinated), CSTR (Collateral Strip Rate), and NAS (Non-Accelerated Security) appear to be the top MTG_TRANCHE_TYP_LONG components in both the NC1 and non-NC1 sets. <br>

In 2005, FLT (Floater), STEP (Stepped Rate Bond), and MEZ (Mezzanine) appear to be the top MTG_TRANCHE_TYP_LONG components in both the NC1 and non-NC1 sets. <br>

For NC1 in 2004, I could not see a big difference between the two groups.<br>
For NC1 in 2005, there is a big difference between the two groups.<br>
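One way to make the "top components" comparison more precise is to split each MTG_TRANCHE_TYP_LONG value at the commas and compute, for each component, the share of securities in each group that carry it. This is a minimal sketch, assuming the column names used above; `component_shares` is a hypothetical helper. It keeps one row per (Label, CUSIP) pair so duplicated CUSIPs are not double counted.

```python
import numpy as np
import pandas as pd

def component_shares(df, target_label='NC1(failing)'):
    """Share of securities in each group whose MTG_TRANCHE_TYP_LONG contains each component."""
    # Keep one row per (Label, CUSIP) so duplicated CUSIPs are not double counted
    uniq = df.drop_duplicates(subset=['Label', 'CUSIP'])
    group = np.where(uniq['Label'] == target_label, 'NC1', 'non-NC1')
    # "SUB,CSTR,NAS" -> ['SUB', 'CSTR', 'NAS'] (spaces removed first)
    components = (uniq['MTG_TRANCHE_TYP_LONG']
                  .str.replace(' ', '', regex=False)
                  .str.split(','))
    tmp = uniq.assign(group=group, component=components)
    # Number of securities in each group (the denominator)
    sizes = tmp['group'].value_counts()
    # One row per (security, component) pair, then count per component and group
    exploded = tmp.explode('component')
    counts = exploded.groupby(['component', 'group']).size().unstack(fill_value=0)
    return counts.div(sizes, axis=1)

# Hypothetical usage in this notebook:
# component_shares(data).sort_values('NC1', ascending=False).head(10)
```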

[See here for more info about MTG_TRANCHE_TYP](https://docs.google.com/spreadsheets/d/1MOwPnTr2owqPoJNy73U7UEc3z1RvtzELOCM0ZFxBJU8/edit?usp=sharing)

In [10]:
total = nc1['MTG ORIG AMT'].sum()
print('Sum of MTG ORIG AMT among NC1 = {:.2f}'.format(total))

Sum of MTG ORIG AMT among NC1 = 2113.59


In [11]:
print('Description of MTG ORIG AMT among NC1:')
nc1['MTG ORIG AMT'].describe()

Description of MTG ORIG AMT among NC1:


count    461.000000
mean       4.584784
std       29.141494
min        0.048000
25%        0.644000
50%        1.200900
75%        2.273000
max      560.470000
Name: MTG ORIG AMT, dtype: float64
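The mean (4.58) is well above the median (1.20), and the maximum (560.47) is far above the 75th percentile (2.27), so the NC1 amounts are strongly right-skewed and the median is likely the more representative typical value. A minimal sketch for putting NC1 next to all other labels, assuming the `Label` and `MTG ORIG AMT` columns; `orig_amt_by_group` is a hypothetical helper. It works on rows, so duplicated CUSIPs in non-NC1 labels are counted more than once unless the frame is de-duplicated first.

```python
import pandas as pd

def orig_amt_by_group(df, target_label='NC1(failing)'):
    """Summary statistics of MTG ORIG AMT for NC1 vs. all other labels (row-based)."""
    group = (df['Label'] == target_label).map({True: 'NC1', False: 'non-NC1'})
    return df.groupby(group)['MTG ORIG AMT'].agg(['count', 'sum', 'mean', 'median', 'max'])

# Hypothetical usage in this notebook:
# orig_amt_by_group(data.drop_duplicates(subset=['Label', 'CUSIP']))
```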

## To do
- Look into Bloomberg (paydown information?)
- Find out why payments stop suddenly instead of decreasing gradually and then stopping
- Why are CUSIPs duplicated?


## Why are CUSIPs duplicated?
The following shows that there are duplicate CUSIPs in the data set: for several labels, the row count is larger than the number of unique CUSIPs.

In [12]:
data.groupby('Label')['CUSIP'].count()

Label
0                                41
1.4                             136
FE                              400
IOfailing                         1
IOpassMED                       114
IOpassMEY                        81
MED                            1232
MEY                            1285
NC1(failing)                    461
NC2(z>1 nsf)                     24
NC3(z>1, not paid off, nsf)      24
NMEm                            644
NMEs                              7
Name: CUSIP, dtype: int64

In [13]:
data.groupby('Label')['CUSIP'].nunique()

Label
0                                41
1.4                             133
FE                              397
IOfailing                         1
IOpassMED                       114
IOpassMEY                        79
MED                            1220
MEY                            1251
NC1(failing)                    461
NC2(z>1 nsf)                     24
NC3(z>1, not paid off, nsf)      24
NMEm                            638
NMEs                              7
Name: CUSIP, dtype: int64

Each False marks a label whose row count differs from its number of unique CUSIPs, i.e. a label that contains duplicated CUSIPs.

In [14]:
data.groupby('Label')['CUSIP'].count() == data.groupby('Label')['CUSIP'].nunique()

Label
0                               True
1.4                            False
FE                             False
IOfailing                       True
IOpassMED                       True
IOpassMEY                      False
MED                            False
MEY                            False
NC1(failing)                    True
NC2(z>1 nsf)                    True
NC3(z>1, not paid off, nsf)     True
NMEm                           False
NMEs                            True
Name: CUSIP, dtype: bool
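To start answering why CUSIPs are duplicated, it can help to see which columns actually differ between the repeated rows: exact duplicates suggest a data-loading issue, while differing values point to a real difference between the records. This is a minimal sketch, assuming a `CUSIP` column; `differing_columns` is a hypothetical helper.

```python
import pandas as pd

def differing_columns(df, key='CUSIP'):
    """For each duplicated key, list the columns whose values differ across its rows."""
    # keep=False marks every occurrence of a duplicated key, not just the repeats
    dups = df[df.duplicated(subset=key, keep=False)]
    result = {}
    for value, rows in dups.groupby(key):
        # More than one distinct value (NaN counted as a value) means the column varies
        counts = rows.nunique(dropna=False)
        result[value] = list(counts[counts > 1].index)
    return result

# Hypothetical usage in this notebook: differing_columns(data)
# An empty list means the rows for that CUSIP are exact duplicates.
```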

The following lists how many times each CUSIP appears within its label, most frequent first; any count above 1 is a duplicated CUSIP.

In [15]:
data.groupby(by=['Label','CUSIP']).size().sort_values(ascending=False)

Label      CUSIP    
MEY        126671Y75    3
           126671Z74    3
MED        126671Y67    3
           1266712F2    3
           1266712B1    3
           1266712A3    3
MEY        126671Y83    3
MED        126671Z90    3
           126671Z82    3
MEY        126671Z66    3
           126671Z58    3
           126671Y91    3
           126671Z25    3
           126671Z33    3
           126671Z41    3
           576433QQ2    2
1.4        073879LU0    2
           073879LV8    2
MEY        576433QV1    2
FE         073879LT3    2
           57643MEZ3    2
IOpassMEY  57643MEJ9    2
1.4        57643MFA7    2
MEY        576433QS8    2
           576433QT6    2
           576433QU3    2
           576433QW9    2
           576433QR0    2
FE         576433RD0    2
MEY        576433RB4    2
                       ..
           61748HHG9    1
           61748HHF1    1
           61748HHE4    1
           61748HHD6    1
           61748HHC8    1
           61748HHB0    1
           61748H