# How to... validate banking IDs

This notebook shows how to use the **financial_entity_cleaner.id.banking** module to validate IDs such as LEI, ISIN and SEDOL. You can use this module in three different ways:
1. [by validating string values, one by one](#string_values)
2. [by validating ID attributes as columns in a pandas dataframe](#df)
3. [by validating a csv file that contains ID column(s)](#csv)

Whichever approach you choose, you will need to import the **BankingIdCleaner()** class from the **financial_entity_cleaner.id.banking** module and create an object from it. This notebook also shows how to customize the behaviour of this class to adapt the cleaning to your own needs.

In [1]:
# Sets up the location of the financial-entity-cleaner library relative to this notebook 
import sys
sys.path.append('../../')

In [2]:
# Import the BankingIdCleaner() class for ID validation
from financial_entity_cleaner.id.banking import BankingIdCleaner

In [3]:
# Create an object based on BankingIdCleaner() class to perform validation over string values, dataframe or .csv file
id_cleaner_obj = BankingIdCleaner()

In [4]:
# Check the IDs supported by the library
id_cleaner_obj.get_id_types()

['lei', 'isin', 'sedol']

## 1. Basic usage

By default, the API assumes that the value passed as a parameter is an ISIN code.
The API returns:
- None if the value is not a string or has no characters in it.
- True if the value is a valid ID of the specified type
- False if the value is not a valid ID of the specified type
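To make the `None`/`True`/`False` contract above concrete, here is a minimal sketch of an ISIN check written from scratch. It is not the library's implementation: `is_valid_isin` and `luhn_is_valid` are hypothetical helpers, and upper-casing the input is an assumption. It uses the standard ISIN rule (2-letter country code, 9 alphanumeric characters, 1 check digit; letters are converted to numbers A=10 ... Z=35 and the resulting digit string must pass the Luhn check). Note that the notebook run below shows the library itself raising a `TypeError` for the non-string `12345`, whereas this sketch returns `None` as the contract describes.

```python
import re

# ISIN layout: 2-letter country code, 9 alphanumeric characters, 1 check digit
ISIN_PATTERN = re.compile(r"^[A-Z]{2}[A-Z0-9]{9}[0-9]$")


def luhn_is_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) check."""
    total = 0
    # Walk from the rightmost digit and double every second digit
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def is_valid_isin(value):
    """Hypothetical helper: None for non-strings/empty strings, else True/False."""
    if not isinstance(value, str) or not value.strip():
        return None
    # Assumption: surrounding whitespace and letter case are normalised first
    candidate = value.strip().upper()
    if not ISIN_PATTERN.match(candidate):
        return False
    # Letters become two-digit numbers: A=10, B=11, ..., Z=35
    digits = "".join(str(int(ch, 36)) for ch in candidate)
    return luhn_is_valid(digits)


print(is_valid_isin("GB00B1YW4409"))   # True
print(is_valid_isin("tttt0B1YW4409"))  # False (13 characters instead of 12)
print(is_valid_isin(12345))            # None
```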

In [6]:
id_cleaner_obj.id_type

'isin'

In [7]:
# Testing a value that is not a string
id_cleaner_obj.is_valid_id(12345)

TypeError: 'NoneType' object is not subscriptable

In [8]:
# Testing a valid ISIN code
id_cleaner_obj.is_valid_id('GB00B1YW4409')

True

In [9]:
# Testing an invalid ISIN code
id_cleaner_obj.is_valid_id('tttt0B1YW4409')

False

## 2. Working with other ID types (LEI and SEDOL)

In [10]:
id_cleaner_obj.id_type='lei'

In [11]:
# Testing a valid LEI code
id_cleaner_obj.is_valid_id('969500DPKGC9JE9F0820')

True
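LEI check digits follow ISO 17442, which uses the ISO 7064 MOD 97-10 rule: replace each letter with its two-digit value (A=10 ... Z=35) and check that the resulting number leaves a remainder of 1 when divided by 97. The sketch below applies that rule; `is_valid_lei` is a hypothetical helper written for illustration, not the library's code, and upper-casing the input is an assumption.

```python
import re

# ISO 17442 layout: 18 alphanumeric characters followed by 2 check digits
LEI_PATTERN = re.compile(r"^[A-Z0-9]{18}[0-9]{2}$")


def is_valid_lei(value):
    """Hypothetical LEI check using ISO 7064 MOD 97-10."""
    if not isinstance(value, str) or not value.strip():
        return None
    candidate = value.strip().upper()
    if not LEI_PATTERN.match(candidate):
        return False
    # Replace each letter with its two-digit value (A=10 ... Z=35)
    numeric = "".join(str(int(ch, 36)) for ch in candidate)
    # A valid LEI leaves a remainder of 1 when divided by 97
    return int(numeric) % 97 == 1


print(is_valid_lei("969500DPKGC9JE9F0820"))  # True
print(is_valid_lei("96XX00DPKGC9JE9F0820"))  # False
```

Python integers have arbitrary precision, so the 28+ digit number can be reduced modulo 97 directly without a chunked remainder loop.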

In [12]:
id_cleaner_obj.id_type='sedol'

In [13]:
# Testing a valid SEDOL code
id_cleaner_obj.is_valid_id('2595708')

True
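A SEDOL is 6 alphanumeric characters (vowels are not used) followed by a check digit computed from a weighted sum with weights 1, 3, 1, 7, 3, 9. The sketch below shows that rule; `is_valid_sedol` is a hypothetical helper, not part of the library. For `'2595708'` the weighted sum is 2·1 + 5·3 + 9·1 + 5·7 + 7·3 + 0·9 = 82, so the check digit is (10 − 82 mod 10) mod 10 = 8.

```python
import re

# 6 alphanumeric characters (no vowels) plus 1 numeric check digit
SEDOL_PATTERN = re.compile(r"^[0-9BCDFGHJKLMNPQRSTVWXYZ]{6}[0-9]$")
SEDOL_WEIGHTS = (1, 3, 1, 7, 3, 9)


def is_valid_sedol(value):
    """Hypothetical SEDOL check: None for non-strings/empty strings, else True/False."""
    if not isinstance(value, str) or not value.strip():
        return None
    candidate = value.strip().upper()
    if not SEDOL_PATTERN.match(candidate):
        return False
    # Weighted sum of the first six characters (letters map to 10..35)
    total = sum(int(ch, 36) * w for ch, w in zip(candidate[:6], SEDOL_WEIGHTS))
    expected_check = (10 - total % 10) % 10
    return int(candidate[6]) == expected_check


print(is_valid_sedol("2595708"))  # True
print(is_valid_sedol("2595709"))  # False
```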

The library throws an exception if the ID type is not supported.

In [14]:
id_cleaner_obj.id_type='test'

IndexError: tuple index out of range
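The `IndexError` above is raised from inside the library and does not say what went wrong. One possible design, sketched here with a hypothetical class that is not `BankingIdCleaner`, is to check the value against the supported types as soon as it is assigned and raise a `ValueError` that lists the accepted options.

```python
SUPPORTED_ID_TYPES = ("lei", "isin", "sedol")


class IdTypeHolder:
    """Hypothetical stand-in (not BankingIdCleaner) showing an id_type guard."""

    def __init__(self, id_type="isin"):
        self.id_type = id_type  # goes through the property setter below

    @property
    def id_type(self):
        return self._id_type

    @id_type.setter
    def id_type(self, value):
        normalized = str(value).lower()
        if normalized not in SUPPORTED_ID_TYPES:
            raise ValueError(
                f"Unsupported ID type {value!r}; expected one of {SUPPORTED_ID_TYPES}"
            )
        self._id_type = normalized


holder = IdTypeHolder()
holder.id_type = "lei"  # accepted
try:
    holder.id_type = "test"
except ValueError as err:
    print(err)
```

Validating in the setter means the error surfaces at the line that assigned the bad value, rather than later when an ID is validated.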

## 3. Cleaning and Validating

The library can also clean and validate an ID in a single call. In this case, `get_clean_id()` returns a dictionary where `id_cleaned` holds the cleaned ID and `id_validated` indicates whether the ID is valid.
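To illustrate the shape of that result, here is a minimal sketch of a clean-then-validate helper for LEI codes. `get_clean_lei` is a hypothetical function, not part of the library, and the cleaning it applies (removing whitespace and upper-casing) is an assumption; the library's own cleaning rules are not documented in this notebook. Only the dictionary keys mirror the outputs shown below.

```python
import re

LEI_PATTERN = re.compile(r"[A-Z0-9]{18}[0-9]{2}")


def get_clean_lei(raw_id):
    """Hypothetical helper returning {'id_cleaned': ..., 'id_validated': ...}."""
    # Assumed cleaning: remove all whitespace and upper-case the value
    cleaned = "".join(str(raw_id).split()).upper()
    # Format check first, then the ISO 7064 MOD 97-10 check (A=10 ... Z=35)
    validated = bool(LEI_PATTERN.fullmatch(cleaned)) and (
        int("".join(str(int(ch, 36)) for ch in cleaned)) % 97 == 1
    )
    return {"id_cleaned": cleaned, "id_validated": validated}


print(get_clean_lei(" 969500dpkgc9je9f0820 "))
# {'id_cleaned': '969500DPKGC9JE9F0820', 'id_validated': True}
```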

In [15]:
id_cleaner_obj.id_type='lei'

In [17]:
# Cleaning a valid LEI code
clean_lei = id_cleaner_obj.get_clean_id('969500DPKGC9JE9F0820')
clean_lei

{'id_cleaned': '969500DPKGC9JE9F0820', 'id_validated': True}

By default, if the ID is invalid, `get_clean_id()` sets `id_validated` to `False`:

In [18]:
# Cleaning an invalid LEI code
clean_lei = id_cleaner_obj.get_clean_id('96XX00DPKGC9JE9F0820')
clean_lei

{'id_cleaned': '96XX00DPKGC9JE9F0820', 'id_validated': False}

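If you need invalid IDs replaced with null instead, set the attribute `set_null_for_invalid_ids = True`. The cell below sets it to `False`, which keeps the invalid value in `id_cleaned`.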

In [21]:
id_cleaner_obj.set_null_for_invalid_ids = False

In [22]:
# Cleaning an invalid LEI code
clean_lei = id_cleaner_obj.get_clean_id('96XX00DPKGC9JE9F0820')
clean_lei

{'id_cleaned': '96XX00DPKGC9JE9F0820', 'id_validated': False}
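Judging from its name, `set_null_for_invalid_ids` decides whether the cleaned value of an invalid ID is replaced with null. The sketch below shows that post-processing step on a result dictionary with the same keys as above; `apply_null_policy` is a hypothetical helper, and the exact behaviour of the library's attribute is an assumption.

```python
def apply_null_policy(result, set_null_for_invalid_ids):
    """Hypothetical post-processing of a get_clean_id()-style result."""
    # Work on a copy so the original result dictionary is left untouched
    output = dict(result)
    if set_null_for_invalid_ids and not output["id_validated"]:
        output["id_cleaned"] = None
    return output


invalid = {"id_cleaned": "96XX00DPKGC9JE9F0820", "id_validated": False}
print(apply_null_policy(invalid, set_null_for_invalid_ids=True))
# {'id_cleaned': None, 'id_validated': False}
print(apply_null_policy(invalid, set_null_for_invalid_ids=False))
# {'id_cleaned': '96XX00DPKGC9JE9F0820', 'id_validated': False}
```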

## 4. Cleaning a dataframe

In [None]:
import pandas as pd

In [None]:
input_filename = '../../tests/data/test_cleaner_ids.csv'

In [None]:
df_original = pd.read_csv(input_filename, sep=',', encoding='utf-8')

In [None]:
df_original

In [None]:
# Set the letter case of the cleaned IDs
id_cleaner_obj.output_lettercase='upper'

In [None]:
id_cleaner_obj.id_type='lei'

In [None]:
df_cleaner = id_cleaner_obj.apply_cleaner_to_df(df_original, 'ID', 'clean', 'valid')

In [None]:
df_cleaner

In [None]:
# Keep invalid IDs instead of replacing them with null
id_cleaner_obj.set_null_for_invalid_ids = False

In [None]:
df_cleaner = id_cleaner_obj.apply_cleaner_to_df(df_original, 'ID', 'clean', 'valid')

In [None]:
df_cleaner
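The notebook does not spell out the arguments of `apply_cleaner_to_df`; from the call `apply_cleaner_to_df(df_original, 'ID', 'clean', 'valid')` it appears that `'ID'` is the input column and `'clean'`/`'valid'` name the new output columns. The pandas sketch below reproduces that idea with a hypothetical `add_clean_columns` function (not the library's code). The cleaning steps and the LEI check are assumptions chosen for illustration.

```python
import pandas as pd


def lei_is_valid(value):
    """True/False for non-empty strings, None for missing or non-string values."""
    if not isinstance(value, str) or not value:
        return None
    if len(value) != 20 or not value.isascii() or not value.isalnum():
        return False
    # ISO 7064 MOD 97-10: letters become 10..35, a valid LEI leaves remainder 1
    return int("".join(str(int(ch, 36)) for ch in value.upper())) % 97 == 1


def add_clean_columns(df, id_col, clean_col, valid_col, set_null_for_invalid_ids=False):
    """Hypothetical analogue of apply_cleaner_to_df(df, 'ID', 'clean', 'valid')."""
    out = df.copy()  # leave the caller's dataframe unchanged
    # Assumed cleaning: strip whitespace and upper-case; missing values become None
    out[clean_col] = out[id_col].map(
        lambda v: v.strip().upper() if isinstance(v, str) else None
    )
    out[valid_col] = out[clean_col].map(lei_is_valid)
    if set_null_for_invalid_ids:
        # Only rows explicitly marked False are nulled; missing (None) rows are left alone
        out.loc[out[valid_col].eq(False), clean_col] = None
    return out


df = pd.DataFrame({"ID": [" 969500dpkgc9je9f0820 ", "96XX00DPKGC9JE9F0820", None]})
print(add_clean_columns(df, "ID", "clean", "valid", set_null_for_invalid_ids=True))
```

Returning a copy rather than modifying `df` in place keeps `df_original` available for a second run with different settings, as the notebook does above.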

## 5. Cleaning a csv file with AutoCleaner

In [None]:
# Import the auto_cleaner module, which cleans a csv file using the settings in a JSON file
from financial_entity_cleaner.auto_cleaner import auto_cleaner

In [None]:
# Create an AutoCleaner object
auto_cleaner_obj = auto_cleaner.AutoCleaner()

In [None]:
input_filename = '../../tests/data/test_cleaner_ids.csv'

In [None]:
setup_cleaning_filename = '../../tests/data/test_cleaner_ids.json'

In [None]:
output_filename = '../../tests/data/test_cleaner_ids_result.csv'

In [None]:
auto_cleaner_obj.clean_csv_file(input_filename, setup_cleaning_filename, output_filename)
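`clean_csv_file()` reads its cleaning settings from the JSON file, whose format this notebook does not show, so the sketch below does not try to reproduce it. It only illustrates the general read, clean, validate, write flow with Python's standard `csv` module. `clean_csv_stream`, its default column names and the cleaning steps are hypothetical; the LEI check uses the ISO 7064 MOD 97-10 rule. It works on file-like objects so it runs without files on disk.

```python
import csv
import io


def lei_is_valid(value):
    """True/False for non-empty strings, None otherwise (ISO 7064 MOD 97-10)."""
    if not value:
        return None
    if len(value) != 20 or not value.isascii() or not value.isalnum():
        return False
    return int("".join(str(int(ch, 36)) for ch in value.upper())) % 97 == 1


def clean_csv_stream(source, target, id_column, clean_column="clean", valid_column="valid"):
    """Hypothetical read-clean-write loop; not AutoCleaner's API or config format."""
    reader = csv.DictReader(source)
    fieldnames = list(reader.fieldnames or []) + [clean_column, valid_column]
    writer = csv.DictWriter(target, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        # Assumed cleaning: strip whitespace and upper-case the raw ID
        cleaned = (row.get(id_column) or "").strip().upper()
        row[clean_column] = cleaned
        row[valid_column] = lei_is_valid(cleaned)
        writer.writerow(row)


source = io.StringIO("ID,name\n969500DPKGC9JE9F0820,A\n96XX00DPKGC9JE9F0820,B\n")
target = io.StringIO()
clean_csv_stream(source, target, "ID")
print(target.getvalue())
```

With real files, open both with `newline=""`, as the `csv` module documentation recommends.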