## Introduction

In this notebook, we collect the unique headers across all the CSV files that we are going to merge. We then use this list of headers to build a master comma-separated values (CSV) file.

## Data Preprocessing

### Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np
import glob
import csv
import os

### Setting up the parent directory and subdirectories

In [2]:
parent_dir = "../Data/"
filelist = []

def makefilelist(parent_dir):
    # Immediate subdirectories of parent_dir; names avoid shadowing the built-in dir().
    subject_dirs = [os.path.join(parent_dir, d) for d in os.listdir(parent_dir) if os.path.isdir(os.path.join(parent_dir, d))]
    filelist = []
    for subject_dir in subject_dirs:
        # Keep only regular files ending in .csv; 'name' avoids shadowing the csv module.
        csv_files = [os.path.join(subject_dir, name) for name in os.listdir(subject_dir) if os.path.isfile(os.path.join(subject_dir, name)) and name.endswith('.csv')]
        filelist.extend(csv_files)

    return filelist
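
The directory walk above can also be written with `glob`, which this notebook imports but does not otherwise use. The following is a minimal sketch, not the notebook's own method: `make_file_list_glob` is a hypothetical name, and it assumes the CSV files sit exactly one level below `parent_dir`. Unlike `os.listdir`, the `*` wildcard skips names that start with a dot.

```python
import glob
import os

def make_file_list_glob(parent_dir):
    """Return CSV paths located exactly one directory below parent_dir."""
    # The first '*' matches each immediate subdirectory; '*.csv' matches its CSV files.
    pattern = os.path.join(parent_dir, "*", "*.csv")
    # Sort so the order is reproducible across runs and platforms.
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```

Calling `make_file_list_glob("../Data/")` should return the same paths as `makefilelist`, in sorted order, provided no file or directory name starts with a dot.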

### Read headers from CSV files

In [3]:
def readCSV(fileList):
    master_csv_headers = []
    for filename in fileList:
        with open(filename, 'r') as f:
            d_reader = csv.DictReader(f)
            # fieldnames is None for an empty file, so fall back to an empty list.
            headers = d_reader.fieldnames or []
            master_csv_headers.extend(headers)

    return master_csv_headers
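
One of the headers this notebook collects starts with `\ufeff`, a byte-order mark left at the beginning of a file. The sketch below shows one way to read only the header row while dropping that mark. It assumes the files are UTF-8 encoded, which the notebook does not state, and `read_header_row` is a hypothetical helper name.

```python
import csv

def read_header_row(filename):
    """Return the first row of a CSV file as a list of column names."""
    # 'utf-8-sig' drops a leading byte-order mark if one is present;
    # newline='' lets the csv module handle newlines inside quoted fields.
    with open(filename, "r", encoding="utf-8-sig", newline="") as f:
        # next(..., []) yields an empty list for an empty file.
        return next(csv.reader(f), [])
```

Because only the first row is parsed, this avoids reading the rest of each file just to learn its column names.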

In [4]:
filelist = makefilelist(parent_dir)
csv_headers = readCSV(filelist)

Now we keep only the unique headers that we might need for our master CSV, and define a helper that strips newline characters from them.

In [5]:
def uniqueHeaders(csv_headers):
    return list(set(csv_headers))

def chomp(list1):
    list1 = [x.replace('\n', '') for x in list1]
    return list1
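
`chomp` removes only `\n`, so headers that differ in non-breaking spaces (`\xa0`), byte-order marks, or repeated spaces still count as distinct. A possible stricter cleanup is sketched below. The function names are hypothetical, and whether whitespace-only differences should be merged is an assumption about the intended master header list.

```python
import re

def clean_header(header):
    """Normalize whitespace in a single header string."""
    # Remove a byte-order mark and treat non-breaking spaces as ordinary spaces.
    header = header.replace("\ufeff", "").replace("\xa0", " ")
    # Collapse runs of whitespace (including newlines and tabs) and trim the ends.
    return re.sub(r"\s+", " ", header).strip()

def unique_clean_headers(headers):
    """Clean each header, drop empty ones, and deduplicate in first-seen order."""
    cleaned = (clean_header(h) for h in headers)
    # dict.fromkeys keeps insertion order, unlike set().
    return list(dict.fromkeys(h for h in cleaned if h))
```

Keeping first-seen order makes the resulting list stable between runs, which `list(set(...))` does not guarantee.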

Check the unique header values and take a look at the kind of values we have.

In [6]:
final_headers = uniqueHeaders(csv_headers)
lol = chomp(final_headers)
print(lol)

['', 'UNREALIZED\xa0DEPRECIATION\xa0  ', 'Payments\xa0Received\xa0by\xa0Fund', '\ufeffReference Entity/Swap Counterparty', 'Reference Entity/Obligation', 'Notional Amount(1)', 'Rating of Reference Debt Obligation*', 'Notional amount\xa0', 'Premium(Received)Paid', 'Upfront Payments', 'Notional Amount1', 'Expiration Date', 'UNREALIZEDAPPRECIATION(DEPRECIATION)(000s)', 'Value($)', '(000 s)', 'Rate Paid by the Fund', 'Unrealized Depreciation ($)', 'Fixed Rate', '\xa0 NOTIONAL AMOUNT\xa0 \xa0 ', 'ImpliedCreditSpread\xa0atJune\xa030,2015(%)(d)', 'CreditSpread', 'Receive Fixed\xa0Rate', 'BUY/SELL\xa0 PROTECTION ', 'CurrentCreditSpread\xa0(10)', 'Notional Amount(c)', 'Payments made by the Fund', 'Bilateral Credit Default Swap Agreements (unrealized depreciation)', 'Termination\xa0 date\xa0', 'Underlying Debt\xa0Obligation', 'FixedRate', 'UnamortizedUp\xa0FrontPremiumPaid/(Received)', 'Notional  Amount   (000)', 'Maturity Date\xa0', 'Fixed         Payments         Made by         the Fund', 'Ex

In [7]:
for header in lol:
    if '0' in header:
        print(header)

UNREALIZEDAPPRECIATION(DEPRECIATION)(000s)
(000 s)
ImpliedCreditSpread atJune 30,2015(%)(d)
CurrentCreditSpread (10)
Notional  Amount   (000)
Notional Amount (000 s)
NOTIONALAMOUNT(000 S)
NotionalAmount(000 s)b
UnrealizedAppreciation(Depreciation)(000)
Notional amount (000)
ImpliedCreditSpread at06/30/20168
Value atMarch 31,2009(U.S. Dollars)
NotionalAmount(000)
Value(000)
AMOUNT (000 S)
NotionalAmount(000)(4)
PREMIUMSPAID/(RECEIVED)(000s)
NotionalAmount (000)
Notional Amount(000)
NOTIONALAMOUNT(000)
Notional Amount(000 s)
AMOUNT(000 S)
Amount (000)
Implied Credit Spread at December 31, 2016(b)
Upfront  Payments  (000)
ImpliedCreditSpread atJune 30,2014(%)(d)
NotionalAmount(000)
NOTIONALAMOUNT(000's)
Amount(000)
CreditReceivedat4/30/2015
UpfrontPaymentPaid(Received)(000)
Notional Amount (000s) 
Amount(000)
Credit Spread (10)
Amount(000s)
Notional Amount(000s)
ImpliedCreditSpread atJune 30,2014(%)(d)
(107,967
ImpliedCreditSpread at5/31/201510
Amount(000 s)
Notional Amount (000s)
AMOUNT 
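
The check `'0' in header` also prints headers that only contain a date or a footnote number, such as `CreditReceivedat4/30/2015` or `Credit Spread (10)`. A narrower test for the "in thousands" marker could look like the sketch below. The pattern is an assumption based on the variants visible in the output above (`(000)`, `(000s)`, `(000 S)`, `(000's)`) and may miss other spellings; `is_in_thousands` is a hypothetical name.

```python
import re

# Matches "(000)", "(000s)", "(000 s)", "(000 S)" and "(000's)", ignoring case.
THOUSANDS_MARKER = re.compile(r"\(\s*000(?:\s*'?\s*s)?\s*\)", re.IGNORECASE)

def is_in_thousands(header):
    """Return True if the header carries a parenthesized thousands marker."""
    return THOUSANDS_MARKER.search(header) is not None
```

For example, `is_in_thousands("NOTIONALAMOUNT(000's)")` is true, while `is_in_thousands("Credit Spread (10)")` is false.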

In [26]:
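def normalizeValues(filelist):
    for filename in filelist:
        # read_csv opens the file itself, so no separate open() is needed.
        # on_bad_lines='warn' (pandas >= 1.3) replaces error_bad_lines=False, which
        # pandas 2.0 removed: rows with too many fields are skipped and reported.
        df = pd.read_csv(filename, on_bad_lines='warn')
        headers = list(df)
        for header in headers:
            # A '000' in the header marks a column reported in thousands.
            if '000' in header:
                # Appends the characters '000' to each value's text; it does not multiply.
                df[header] = df[header].astype(str) + '000'
        print(df)
        print("Filename", filename)
        # Overwrites the source file, so running this cell twice appends '000' twice.
        df.to_csv(filename, index=False, header=True)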
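
Appending the text `'000'` works only when a cell is a clean integer. The printed output contains values such as `\n    1,540\n000`, where the suffix lands after a trailing newline, and a missing value would become `nan000`. One possible numeric alternative is sketched below. It is not the notebook's method: `scale_thousands` is a hypothetical helper, and it assumes that a leading `(` marks a negative amount, as in accounting notation.

```python
import pandas as pd

def scale_thousands(series):
    """Parse strings such as '1,540' or '(352' and multiply them by 1000."""
    text = series.astype(str).str.strip()
    # Assumption: a leading '(' marks a negative amount.
    negative = text.str.startswith("(")
    # Keep digits and the decimal point only; this drops '$', ',', ')' and whitespace.
    digits = text.str.replace(r"[^0-9.]", "", regex=True)
    # Cells with no parseable number (e.g. 'N/A') become NaN instead of raising.
    values = pd.to_numeric(digits, errors="coerce")
    return values.where(~negative, -values) * 1000
```

Writing the result to a new column or a separate output file, rather than over the source CSV, would keep the step safe to rerun.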

In [27]:
normalizeValues(filelist)

      Counterparty\n  \nNotional amount \nTermination date  \
0  Deutsche\nBank AG            2110000           06/20/16   

  \nUnrealized appreciation(depreciation)  
0                                $152,115  
Filename ../Data/0000880943_N-CSRS_0001209286-07-000236/table2.csv
       Swap Counterparty and\n  Reference Obligation Buy/Sell\n  Protection  \
0  Barclays Capital\n  Dow Jones CDX North Americ...                    Buy   
1  Goldman Sachs\n  Dow Jones CDX North America I...                   Sell   
2  JPMorgan Chase Dow\n  Jones CDX North America ...                   Sell   
3  JPMorgan Chase Dow\n  Jones CDX North America ...                   Sell   
4  JPMorgan Chase Dow\n  Jones CDX North America ...                   Sell   
5  Barclays Capital\n  Dow Jones CDX North Americ...                    Buy   
6  Goldman Sachs\n  Dow Jones CDX North America I...                   Sell   

  Notional\n  Amount \n  (000) Pay/Receive\n  Fixed Rate Termination\n  Date  \
0      

Filename ../Data/0000927972_N-CSRS_0001193125-11-183930/table13.csv
    NotionalAmount ExpirationDate                  Counterparty  \
0          4020000        6/20/16  Credit Suisse Securities LLC   
1          1444564        7/25/45  Credit Suisse Securities LLC   
2         17000000       10/12/52           Goldman Sachs & Co.   
3          5000000       10/12/52           Goldman Sachs & Co.   
4          2500000       10/12/52           Goldman Sachs & Co.   
5          4175000       10/12/52           Goldman Sachs & Co.   
6          5100000       10/12/52           Goldman Sachs & Co.   
7          4825000       10/12/52           Goldman Sachs & Co.   
8          4825000       10/12/52           Goldman Sachs & Co.   
9          9650000       10/12/52           Goldman Sachs & Co.   
10         1625000        6/20/16         JP Morgan Chase Bank    
11         3000000        6/20/16         JP Morgan Chase Bank    

   Buy/SellProtection Receive(Pay)FixedRate        Deliverab

Filename ../Data/0000772129_N-CSRS_0001104659-09-062391/table57.csv
  \nNotional Amount \nExpiration Date       \nCounterparty \nReceive (Pay)  \
0      \n21,000,000       \n3/20/2014          \nDeutsche          \n(Pay)   
1      \n15,000,000       \n3/20/2019  \nDeutsche Bank AG        \nReceive   

   \nAnnual Premium  \nImplied Credit Spread (1) \nDeliverable on Default  \
0              1.70                         0.71           \nRepublic of    
1              1.66                         0.77      \nRepublic of Italy   

  \nMaximum Potential Amount of Future Payments by the Fund Under the Contract (2)  \
0                                              \nN/A                                 
1                                       \n15,000,000                                 

  \nMarket Value  
0     \n(960,257  
1    \n1,109,783  
Filename ../Data/0000772129_N-CSRS_0001104659-09-062391/table86.csv
  \nNotional Amount \nExpiration Date \nCounterparty \nReceive (Pay)  \
0      \n

0   5.000%  22,528  47,500     18,306  
Filename ../Data/0001214511_N-CSRS_0001193125-13-079747/table11.csv
  \nSWAPCOUNTERPARTY &REFERENCEOBLIGATION \nBUY/SELLPROTECTION  \
0              \nBarclays BankAlcoa, Inc.                \nBuy   
1            \nJPMorgan ChaseCDX.NA.IG.20               \nSell   
2            \nJPMorgan ChaseCDX.NA.IG.20               \nSell   
3      \nJPMorgan ChaseKohl's Corporation                \nBuy   

  \nNOTIONALAMOUNT(000)  \nINTERESTRATE \nTERMINATIONDATE  \
0                100000             1.0         \n9/20/18   
1                 45000             1.0         \n6/20/18   
2                200000             1.0         \n6/20/18   
3                125000             1.0         \n6/20/18   

  \nUNREALIZEDAPPRECIATION(DEPRECIATION) \nUPFRONTPAYMENTS   \nVALUE  \
0                                 \n(352          \n10,456  \n10,104   
1                                 \n(240             \n528     \n288   
2                                  \n17

    \nCounterparty\n             \nReference Entity\n  \
0  \nCitibank N.A.\n  \nCommonwealth of Puerto Rico\n   

  \nBuy/Sell Protection (9)\n \nCredit Spread (10)\n \nNotional Amount\n  \
0                     \nBuy\n              \n25.5%\n       \n1,810,000\n   

  \nFixed Rate (Annualized)\n \nTermination Date\n  
0                  \n5.000%\n         \n12/20/19\n  
Filename ../Data/0000897424_N-CSRS_0000891804-15-000102/table2.csv
   \nNotional amount \nTermination dates \nPayments made by the Fund  \
0       \n45,000,000          \n05/09/15                    \n2.3725   
1       \n20,000,000          \n05/19/16                    \n3.1850   
2       \n25,000,000          \n10/27/15                    \n2.2675   
3       \n20,000,000          \n12/19/15                    \n2.5600   
4       \n45,000,000           \n03/2/16                    \n2.5625   
5       \n10,000,000          \n07/05/16                     \n3.180   
6       \n67,000,000          \n09/18/16               

Filename ../Data/0001279014_N-CSRS_0001379491-08-000050/table2.csv
       \nAmount \n\t     \nCounterparty \n\t \nInterest Rate \n\t  \
0  \n$6,642,712 \n\t  \nLehman Brothers \n\t         \n4.94% \n\t   

         \nDate \n\t  
0  \n04/09/2008 \n\t  
Filename ../Data/0001279014_N-CSRS_0001379491-08-000050/table1.csv
                     \nCounterparty\n      \nReference Entity\n  \
0  \n\n    Bank of America, N.A. \n\n  \n    Sealed Air Corp.\n   

  \nBuy/Sell Protection\n  \nFixed Rate\n \nExpiration Date\n  \
0             \n    Buy\n            1.12    \n    03/20/18\n   

   \nAmount (000)\n  
0  \n    1,540\n000  
Filename ../Data/0000080832_N-CSRS_0000950123-09-038593/table3.csv
                          \nCounterparty\n         \nReference Entity\n  \
0        \n\n    Bank of America, N.A.\n\n       \n    Carnival Corp.\n   
1        \n\n    Bank of America, N.A.\n\n     \n    CenturyTel, Inc.\n   
2        \n\n    Bank of America, N.A.\n\n  \n    Toll Brothers, Inc.\n   
3   

13                              (17,200                 95000    77800  
Filename ../Data/0000927972_N-CSRS_0001193125-12-295005/table2.csv
                \n\nUnderlying Instrument\n\n  \
0              \n\n    Republic of Brazil\n\n   
1                \n\n    Brunswick Corp. \n\n   
2                \n\n    Brunswick Corp. \n\n   
3                \n\n    Brunswick Corp. \n\n   
4    \n\n    Capital One Financial Corp. \n\n   
5    \n\n    Capital One Financial Corp. \n\n   
6                   \n\n    Centex Corp. \n\n   
7                   \n\n    Centex Corp. \n\n   
8                    \n\n    FedEx Corp. \n\n   
9              \n\n    Gannett Co., Inc. \n\n   
10             \n\n    Gannett Co., Inc. \n\n   
11             \n\n    Gannett Co., Inc. \n\n   
12  \n\n    Marriott International, Inc. \n\n   
13                   \n\n    Masco Corp. \n\n   
14                   \n\n    Masco Corp. \n\n   
15    \n\n    Simon Property Group, Inc. \n\n   
16           \n\n    Toll B

Filename ../Data/0001137761_N-CSRS_0001387131-07-000089/table2.csv
                             \nUnderlying Instrument    Counterparty  \
0  \nCDX North America HighYield Index Swap Agree...  \nGOLDMANSACHS   

  \nMaturityDate  \nImplied Credit Spread at December 31, 2016(b)  \
0       12/20/21                                             3.56   

  \nNotional Amount(c)  \nFixed Deal Receive Rate     Value  \
0            5,000,000                         5.0  318,333   

  \nPremiums Paid/(Received) \nUnrealized Gain  
0                    160,875           157,458  
Filename ../Data/0000802716_N-CSRS_0001104659-17-015470/table2.csv
                             \nUnderlying Instrument    Counterparty  \
0  \nCDX North America HighYield Index Swap Agree...  \nGOLDMANSACHS   

  \nMaturityDate  \nImplied Credit Spread at December 31, 2016(b)  \
0       12/20/21                                             3.56   

  \nNotional Amount(c)  \nFixed Deal Receive Rate     Value  \
0         

b'Skipping line 4: expected 7 fields, saw 8\nSkipping line 5: expected 7 fields, saw 8\nSkipping line 6: expected 7 fields, saw 8\n'
b'Skipping line 4: expected 7 fields, saw 8\nSkipping line 5: expected 7 fields, saw 8\nSkipping line 6: expected 7 fields, saw 8\nSkipping line 7: expected 7 fields, saw 8\nSkipping line 8: expected 7 fields, saw 8\nSkipping line 9: expected 7 fields, saw 8\n'


  \nCounterparty/\nReference Entity\n \nNotional \nAmount\n  \
0           Bank of America N.A./CDX         \n$4,400,000\n   
1           Bank of America N.A./CDX         \n$4,400,000\n   

  \nBuy/Sell \nProtection\n \nInterest \nRate\n \nTermination \nDate\n  \
0                   \nBuy\n           \n1.00%\n            \n6/20/16\n   
1                   \nBuy\n           \n1.00%\n            \n6/20/16\n   

  \nPremiums \nPaid \n(Received)\n  \
0                     \n$(5,086)\n   
1                     \n$(5,086)\n   

  \nUnrealized \nAppreciation\n(Depreciation)\n  
0                                 \n$(26,109)\n  
1                                 \n$(26,109)\n  
Filename ../Data/0000100334_N-CSRS_0001437749-12-006716/combined.csv
  \nNotionalAmount \nExpirationDate  \
0      \n1,500,000      \n3/20/2010   

                                       \nDescription  \
0  \nAgreement with Lehman Brothers dated 3/15/20...   

  \nNet UnrealizedAppreciation  
0                     \n50,4

Filename ../Data/0000887210_N-CSRS_0001193125-16-643434/table2.csv
          Reference Obligation/ Counterparty Buy/Sell Credit Protection  \
0  North America High Yield 22/Deutsche Bank                       Sell   

  Average Credit Rating (a)  Fixed Deal Pay/Receive Rate Notional Value  \
0                        B+                          5.0     $1,980,000   

  Fair Value Upfront Premiums Paid Expiration Date Unrealized Appreciation  
0   $172,197              $141,250   June 20, 2019                  38,672  
Filename ../Data/0000356476_N-CSRS_0000356476-14-000228/table1.csv
                                                          Buy/Sell   \
Counterparty (Issuer)                Reference Entity   Protection    
Barclays Bank PLC                    CDX.NA.HY.15              Buy    
                                     CDX.NA.HY.15              Buy    
Morgan Stanley Capital Services Inc  CDX.NA.HY.15              Buy    
                                     CDX.NA.HY.15      

4           \n    370\n000  
Filename ../Data/0000005094_N-CSRS_0000950123-10-017042/table7.csv
                               Swap Reference Entity  \
0                      Capmark Financial Group, Inc.   
1  CDX North America Investment Grade Index, Seri...   
2                       Countrywide Home Loans, Inc.   
3                                         Inco Ltd.:   
4                                         Inco Ltd.:   
5                         Merrill Lynch & Co., Inc.:   
6                         Merrill Lynch & Co., Inc.:   
7                                     Vale Overseas:   
8                                     Vale Overseas:   
9                                                NaN   

                             Counterparty Buy/SellCredit Protection  \
0                  Goldman Sachs Bank USA                      Sell   
1                        Deutsche Bank AG                       Buy   
2   Morgan Stanley Capital Services, Inc.                      Sell   
3  

Filename ../Data/0001028621_N-CSRS_0001193125-09-248145/table10.csv
        \nSWAP COUNTERPARTY & REFERENCE OBLIGATION\n \nBUY/SELL PROTECTION\n  \
0  \n    Goldman Sachs International\n    Dow Jon...            \n    Sell\n   
1    \n    Barclays Bank Plc \n    Dow Jones Index\n            \n    Sell\n   

  \nAMOUNT (000 S)\n  \nINTEREST RATE\n       \nTERMINATION DATE\n  \
0             800000               0.35      \n    June 20, 2012\n   
1             700000               0.60  \n    December 20, 2012\n   

  \nUNREALIZED DEPRECIATION\n  
0             \n    (41,150\n  
1             \n    (19,366\n  
Filename ../Data/0000913534_N-CSRS_0000950123-09-000433/table2.csv
       Counterparty Reference\n  Obligation Buy/\nSell(1)   \
0  Credit\n  Suisse                    LCDX           Sell   

  (Pay)/\n  Receive\n  Rate Expiration\n  Date Notional\n  Amount  \
0                   (2.25%)         12/20/2012         35,000,000   

  Unrealized\n  Appreciation/\n  (Depreciation)  
0  