# RegTech Challenge

This notebook reads reverse repo data from a JSON file, downloads the Investment Firm Regulation (IFR)
and implements a K-TCD calculator in Python.

The task is to get a rough understanding of the objectives of the Investment Firm Regulation and implement
a calculator in Python.

For the purpose of this exercise, we will limit the implementation to the Trading Counterparty Default Risk
(K-TCD) for a Reverse Repo transaction.

No resource is needed other than the definitions in Chapter 4 Section 1 "Trading Counterparty default" in the
Investment Firm Regulation (Articles 26 to 32). <a href="https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32019R2033">IFR link.</a>
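
For orientation, the quantity computed in the last cell of this notebook can be written compactly. The symbols mirror the variable names used in that cell (`alpha`, `EV`, `rf`, `cva`, `rc`, `pfe`, `C`); the values quoted are the ones the notebook sets for this reverse repo, not a general statement about every transaction type.

$$
\text{K-TCD} = \alpha \cdot EV \cdot RF \cdot CVA,
\qquad
EV = \max\left(0,\; RC + PFE - C\right)
$$

with $\alpha = 1.2$, $PFE = 0$ and $CVA = 1$ in this exercise.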

The code should accept as an input the two legs of the SFT as a JSON file (.json) that conforms to the
securities.json schema of the FIRE Data Format and return the K-TCD for that transaction.
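
As a rough illustration of the expected input, the sketch below checks that a payload carries exactly one cash leg and one asset leg before any calculation is attempted. The helper name `split_legs` and the constant `REQUIRED_MOVEMENTS` are hypothetical, and the sample is trimmed to a few of the fields shown in the data cell of this notebook; it does not validate against the full FIRE schema.

```python
REQUIRED_MOVEMENTS = {"cash", "asset"}  # assumption: one leg per movement


def split_legs(payload):
    """Return the legs of an SFT keyed by their 'movement' field.

    `payload` is assumed to be a dict with a 'data' list, as in data.json.
    """
    legs = {}
    for leg in payload["data"]:
        movement = leg.get("movement")
        if movement in legs:
            raise ValueError(f"duplicate leg for movement {movement!r}")
        legs[movement] = leg
    missing = REQUIRED_MOVEMENTS - legs.keys()
    if missing:
        raise ValueError(f"missing legs: {sorted(missing)}")
    return legs


# Trimmed sample: only a few fields of each leg are reproduced here.
sample = {
    "data": [
        {"id": "rev_repo_cash_leg", "movement": "cash", "balance": -1500},
        {"id": "rev_repo_asset_leg", "movement": "asset", "mtm_dirty": 1400},
    ]
}

legs = split_legs(sample)
print(legs["cash"]["balance"], legs["asset"]["mtm_dirty"])
```

Failing early with a `ValueError` keeps a malformed file from surfacing later as a `KeyError` in the middle of the calculation.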

Install the required libraries. If they are already installed, comment out these lines:

In [9]:
# If you have already installed these libraries, comment out these lines:
!pip install numpy
!pip install pdfplumber



import libraries

In [10]:
# import libraries
import os, json
import urllib.request
import numpy as np
import pdfplumber

define functions

In [11]:
# define functions
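def download_file(download_url, filename):
    """
    Download a PDF from download_url and save it as <filename>.pdf in the current directory.
    Returns the HTTP status code, or None if the download failed.
    """
    try:
        # context managers close the connection and the file even if an error occurs
        with urllib.request.urlopen(download_url) as response:
            with open(filename + ".pdf", 'wb') as file:
                file.write(response.read())
            status = response.status
        print("\n PDF downloaded from url")
        return status
    except OSError as err:
        # URLError and HTTPError are subclasses of OSError;
        # `response` may not exist here, so it must not be referenced
        print("\n Error while downloading the PDF from url:", err)
        return None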

def extract_text_fromPDFReader(pdf):
    '''
    this function returns the text from the pdf
    '''
    txt = []
    for i in range(len(pdf.pages)):
        
        # creating a page object
        pageObj = pdf.pages[i]
     
        # extracting text from page
        txt.append(pageObj.extract_text())
    return txt

def extract_text_fromPDFReader_headerless(pdf):
    '''
    this function returns the text from the pdf removing the header contained in the first line
    '''
    txt_headerless = []
    for i in range (len(pdf.pages)):
        
        # creating a page object
        pageObj = pdf.pages[i]
        
        # fall back to an empty string if a page has no extractable text
        tx = pageObj.extract_text() or ''
        tx = tx.split('\n')
        tx = tx[1:]
        tx = '\n'.join(tx)
        txt_headerless.append(tx)
    return txt_headerless

def get_dict_from_regulation(flat_headerless_text):
    '''
    This function returns a dictionary ordering the parts, titles, chapters, articles and points
    from the regulation written in the PDF.
    '''
    # adjust Articles' syntax
    for i in range(1,66):
        flat_headerless_text=flat_headerless_text.replace('\nArticle {} \n'.format(i),'\nArticle-{} \n'.format(i))
        
    # adjust Points' syntax
    for i in range(1,20):
        flat_headerless_text=flat_headerless_text.replace('\n{}. '.format(i),'\n-{}. '.format(i))

    # Parts
    parts = flat_headerless_text.split(" \nPART ")
    parts = { i : parts[i] for i in range(0, len(parts) ) }
    for i in parts.keys():
        parts[i]={ 'text' : parts[i]}

    # Titles
    for i in parts.keys():
        if ' \nTITLE ' in parts[i]['text']:
            parts[i]['titles'] = { j : parts[i]['text'].split(" \nTITLE ")[j] for j in range(0, len(parts[i]['text'].split(" \nTITLE ")) ) }
        else:
            parts[i]['titles'] = {}
    for i in parts.keys():
        for j in parts[i]['titles'].keys():
            parts[i]['titles'][j]={ 'text' : parts[i]['titles'][j]}
            
    # Chapters
    for i in parts.keys():
        if len(parts[i]['titles'])>0:
            for k in parts[i]['titles'].keys():
                if ' \nCHAPTER ' in parts[i]['titles'][k]['text']:
                    parts[i]['titles'][k]['chapters'] = { j : parts[i]['titles'][k]['text'].split(" \nCHAPTER ")[j] for j in range(0, len(parts[i]['titles'][k]['text'].split(" \nCHAPTER ")) ) }
                else:
                    parts[i]['titles'][k]['chapters'] = {}
        else:
            parts[i]['titles']['chapters'] = {} 
    for i in parts.keys():
        for j in parts[i]['titles'].keys():
            if (len(parts[i]['titles']) > 0) and (len(parts[i]['titles'].keys()) > 1):
                for k in parts[i]['titles'][j]['chapters'].keys():
                    parts[i]['titles'][j]['chapters'][k]={ 'text' : parts[i]['titles'][j]['chapters'][k]}
                    
    # Articles
    for i in parts.keys():
        # if part but no titles
        if len(parts[i]['titles']) < 2:
            lst = { j : parts[i]['text'].split(" \nArticle-")[j] for j in range(0, len(parts[i]['text'].split(" \nArticle-")) ) }
            
            if len(lst)>1:
                parts[i]['titles']['articles'] = lst
            else:
                parts[i]['titles']['articles'] = {}

        else:
            for k in parts[i]['titles'].keys():
                # if part and titles but no chapters        
                if parts[i]['titles'][k]['chapters'] == {}:
                    lst = { j : parts[i]['titles'][k]['text'].split(" \nArticle-")[j] for j in range(0, len(parts[i]['titles'][k]['text'].split(" \nArticle-")) ) }
                    
                    if len(lst)>1:
                        parts[i]['titles'][k]['articles'] = lst
                    else:
                        parts[i]['titles'][k]['articles'] = {}
                else:
                    # if part and titles and chapters 
                    for l in parts[i]['titles'][k]['chapters'].keys():
                        lst = { j : parts[i]['titles'][k]['chapters'][l]['text'].split(" \nArticle-")[j] for j in range(0, len(parts[i]['titles'][k]['chapters'][l]['text'].split(" \nArticle-")) ) }
                        
                        if len(lst)>1:
                            parts[i]['titles'][k]['chapters'][l]['articles'] = lst
                        else:
                            parts[i]['titles'][k]['chapters'][l]['articles'] = {}
    for i in parts.keys():
        if ('articles' in parts[i]['titles'].keys()) and (parts[i]['titles']['articles'] != {}):
            for k in parts[i]['titles']['articles'].keys():
                parts[i]['titles']['articles'][k]={ 'text' : parts[i]['titles']['articles'][k]}
        else:
            for j in parts[i]['titles'].keys():
                if ('articles' not in parts[i]['titles'].keys()) and ('articles' in parts[i]['titles'][j]) and (parts[i]['titles'][j]['articles'] != {}):
                    for k in parts[i]['titles'][j]['articles'].keys():
                        parts[i]['titles'][j]['articles'][k]={ 'text' : parts[i]['titles'][j]['articles'][k]}
                elif ('articles' not in parts[i]['titles'].keys()) and ('articles' not in parts[i]['titles'][j]) and (parts[i]['titles'][j]['chapters'] != {}):
                    for k in parts[i]['titles'][j]['chapters'].keys():
                        if parts[i]['titles'][j]['chapters'][k]['articles'] != {}:
                            for l in parts[i]['titles'][j]['chapters'][k]['articles'].keys():
                                parts[i]['titles'][j]['chapters'][k]['articles'][l]={ 'text' : parts[i]['titles'][j]['chapters'][k]['articles'][l]}
                         
    # Points
    for i in parts.keys():
        # if part but no titles
        if ('articles' in parts[i]['titles'].keys()) and (parts[i]['titles']['articles'] == {}):
            parts[i]['titles']['points'] = {}
        elif ('articles' in parts[i]['titles'].keys()) and (parts[i]['titles']['articles'] != {}):
            for a in parts[i]['titles']['articles']:
                lst = { j : parts[i]['titles']['articles'][a]['text'].split(" \n-")[j] for j in range(0, len(parts[i]['titles']['articles'][a]['text'].split(" \n-")) ) }
                
                if len(lst)>1:
                    parts[i]['titles']['articles'][a]['points'] = lst
                else:
                    parts[i]['titles']['articles'][a]['points'] = {}
        
        else:
            for k in parts[i]['titles'].keys():
                # if part and titles but no chapters        
                if parts[i]['titles'][k]['chapters'] == {}:
                    if parts[i]['titles'][k]['articles'] == {}:
                        parts[i]['titles'][k]['points'] = {} 
                    else:
                        for a in parts[i]['titles'][k]['articles']:
                            lst = { j : parts[i]['titles'][k]['articles'][a]['text'].split(" \n-")[j] for j in range(0, len(parts[i]['titles'][k]['articles'][a]['text'].split(" \n-")) ) }
                            
                            if len(lst)>1:
                                parts[i]['titles'][k]['articles'][a]['points'] = lst
                            else:
                                parts[i]['titles'][k]['articles'][a]['points'] = {}
                else:
                    # if part and titles and chapters 
                    for l in parts[i]['titles'][k]['chapters'].keys():
                        if parts[i]['titles'][k]['chapters'][l]['articles'] == {}:
                            parts[i]['titles'][k]['chapters'][l]['points'] = {} 
                        else:
                            for a in parts[i]['titles'][k]['chapters'][l]['articles']:
                                lst = { j : parts[i]['titles'][k]['chapters'][l]['articles'][a]['text'].split(" \n-")[j] for j in range(0, len(parts[i]['titles'][k]['chapters'][l]['articles'][a]['text'].split(" \n-")) ) }
                            
                                if len(lst)>1:
                                    parts[i]['titles'][k]['chapters'][l]['articles'][a]['points'] = lst
                                else:
                                    parts[i]['titles'][k]['chapters'][l]['articles'][a]['points'] = {}
    
    return parts

def get_risk_factor(issuer_type,rf1,rf2,rf3):
    if issuer_type in rf1 or issuer_type in rf2:
        rf=0.016
    elif issuer_type in rf3:
        rf=0.08
    else:
        rf=np.nan
        print("\n Issuer type not found")
    return rf
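
The same lookup can also be expressed as a single mapping from type label to risk factor, which keeps labels and factors side by side. This is only a sketch: `RISK_FACTORS` and `lookup_risk_factor` are hypothetical names, and the labels and factors are copied from `rf1`, `rf2`, `rf3` and `get_risk_factor` in this notebook rather than re-derived from the regulation.

```python
import math

# Labels as listed in rf1, rf2 and rf3; factors as used in get_risk_factor.
RISK_FACTORS = {
    "central_bank": 0.016,
    "central_govt": 0.016,
    "investment_firm": 0.016,
    "credit_institution": 0.016,
    "other": 0.08,
    "other_financial": 0.08,
}


def lookup_risk_factor(entity_type, table=RISK_FACTORS):
    """Return the risk factor for entity_type, or NaN if the label is unknown."""
    return table.get(entity_type, math.nan)


print(lookup_risk_factor("central_govt"))
print(lookup_risk_factor("other"))
```

Returning NaN for an unknown label mirrors the behaviour of `get_risk_factor`, so a missing label propagates visibly into the final result instead of silently defaulting to a factor.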

<li> 0. Parameters </li>

In [12]:
# regulatory documentation, pdf path
pdf_path = "https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32019R2033"
pdf_name = "Regulation"
pdf_name_backup = "CELEX_32019R2033_EN_TXT"
rf1=['central_bank','central_govt']
rf2=['investment_firm','credit_institution']
rf3=['other','other_financial']

<li>1. Read JSON data</li>

In [13]:
# directory from which to read the JSON file
path_to_json = os.getcwd()

# name of the JSON file with the data
data_json = 'data.json'

# os.path.join keeps the path portable; the with-block closes the file
with open(os.path.join(path_to_json, data_json)) as f:
    jl = json.load(f)

for i in range(len(jl['data'])):
    jl[jl['data'][i]['movement']]=jl['data'][i]

jl.pop('data')

[{'id': 'rev_repo_cash_leg',
  'date': '2021-06-01T00:00:00Z',
  'currency_code': 'GBP',
  'end_date': '2021-07-01T00:00:00Z',
  'balance': -1500,
  'movement': 'cash',
  'sft_type': 'rev_repo',
  'start_date': '2021-06-01T00:00:00Z',
  'type': 'cash',
  'trade_date': '2021-07-01T00:00:00Z',
  'customer': {'type': 'regional_govt'}},
 {'id': 'rev_repo_asset_leg',
  'date': '2021-06-01T00:00:00Z',
  'currency_code': 'GBP',
  'end_date': '2021-07-01T00:00:00Z',
  'mtm_dirty': 1400,
  'movement': 'asset',
  'sft_type': 'rev_repo',
  'start_date': '2021-06-01T00:00:00Z',
  'type': 'bond',
  'trade_date': '2021-07-01T00:00:00Z',
  'customer': {'type': 'regional_govt'},
  'issuer': {'type': 'central_govt'}}]

<li>2. Download Regulation file</li>

In [14]:
resp=download_file(pdf_path, pdf_name)
    
# open the downloaded PDF, or fall back to the local backup copy
if resp == 200:
    file = open(pdf_name + ".pdf", 'rb')
else:
    file = open(pdf_name_backup + ".pdf", 'rb')


 PDF downloaded from url


<li>3. Read (extract) text from PDF</li>

In [15]:
# creating a pdf reader object
pdf = pdfplumber.open(file)

# printing number of pages in pdf file
print("\n number of pages in pdf file: ",len(pdf.pages))

#extract all text
txt=extract_text_fromPDFReader(pdf)

#remove header of each page to keep only text
txt_headerless=extract_text_fromPDFReader_headerless(pdf)

flat_headerless_text='\n'.join(txt_headerless)   

# dictionary of the regulation
d = get_dict_from_regulation(flat_headerless_text)


 number of pages in pdf file:  63
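
The idea behind `get_dict_from_regulation` is to split the flattened text on structural markers (`" \nPART "`, `" \nTITLE "`, `" \nCHAPTER "`, `" \nArticle-"`) and nest the pieces. The sketch below shows that idea on a toy string with a small recursive helper. `split_levels` is a hypothetical, simplified stand-in and does not reproduce the exact dictionary layout built by the function above.

```python
def split_levels(text, markers):
    """Split text on each marker in turn and return nested dicts.

    Each node keeps its own 'text'; nodes above the last level also get
    'children', indexed by position like the dictionaries in the notebook.
    """
    if not markers:
        return {"text": text}
    head, *rest = markers
    pieces = text.split(head)
    return {
        "text": text,
        "children": {i: split_levels(p, rest) for i, p in enumerate(pieces)},
    }


# Toy input, not real regulation text.
toy = (
    "Preamble \nPART ONE intro \nArticle-1 \nScope"
    " \nArticle-2 \nDefinitions \nPART TWO \nArticle-3 \nLevels"
)

tree = split_levels(toy, [" \nPART ", " \nArticle-"])
print(len(tree["children"]))                        # pieces at the PART level
print(repr(tree["children"][1]["children"][1]["text"]))
```

As in the notebook, index 0 at each level holds whatever precedes the first marker (here the preamble), so the first real part or article sits at index 1.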


<li>4. Calculating K-TCD</li>

In [17]:
# Article 26        
alpha=1.2

# Table 2, risk factor
rf = get_risk_factor(jl['asset']['issuer']['type'], rf1, rf2, rf3)

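# Article 28, replacement cost: the cash lent, hence the sign flip on the cash leg balance
rc = -jl['cash']['balance']
# Article 29, potential future exposure (applies to derivative contracts, so 0 for this SFT)
pfe = 0

# Article 30, collateral after a volatility adjustment of 0.707 %
adjustment = 0.00707
C = (1-adjustment)*jl['asset']['mtm_dirty']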

# Article 27, exposure value
EV = max(0,rc+pfe-C)

# Article 32, credit valuation adjustment (set to 1 for this securities financing transaction)
cva = 1

# Article 26, Own funds requirement

own_funds_requirement = alpha*EV*rf*cva

print("\n Own Funds Requirement for ",data_json," is ",round(own_funds_requirement,2),jl['asset']['currency_code'])
print ("\n End")


 Own Funds Requirement for  data.json  is  2.11 GBP

 End
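
The calculation cell can be wrapped in a function so that it is reusable and easy to check by hand. This is a sketch under the same assumptions as the cell above: `k_tcd_reverse_repo` is a hypothetical name, and the defaults (`haircut=0.00707`, `alpha=1.2`, `cva=1.0`, `pfe=0.0`) are simply the values used in this notebook.

```python
def k_tcd_reverse_repo(cash_balance, collateral_mtm, risk_factor,
                       haircut=0.00707, alpha=1.2, cva=1.0, pfe=0.0):
    """Return K-TCD for one reverse repo, following the notebook's steps."""
    rc = -cash_balance                          # cash lent, as a positive amount
    collateral = (1 - haircut) * collateral_mtm  # collateral after haircut
    ev = max(0.0, rc + pfe - collateral)         # exposure value, floored at 0
    return alpha * ev * risk_factor * cva


# Same inputs as data.json: balance -1500, mtm_dirty 1400, risk factor 0.016
result = k_tcd_reverse_repo(cash_balance=-1500, collateral_mtm=1400, risk_factor=0.016)
print(round(result, 2))  # 2.11

# Over-collateralised case: the exposure value floors at zero
print(k_tcd_reverse_repo(cash_balance=-1000, collateral_mtm=1400, risk_factor=0.016))
```

Worked by hand: the collateral is 0.99293 × 1400 = 1390.102, the exposure value is 1500 − 1390.102 = 109.898, and 1.2 × 109.898 × 0.016 × 1 ≈ 2.11, which matches the output printed above.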
