# RegTech Session: KYC, KYB, Sanctions, and Profiling

This notebook illustrates the implementation of KYC, KYB, and sanctions regulation using a risk-based approach that integrates several risk indicators. We consider four risk indicators returning values $(r_1,r_2,r_3,r_4)$. We develop several tools to estimate risk based on various data sources.

Data sources:

- Sanction lists
- Blockchain address black lists
- Databases of corporate entities

Tools used to estimate risk:

- Private API
- Fuzzy Matching
- Country level risk assessment


In [92]:
#!pip install fuzzywuzzy
#!pip install Levenshtein 
import pandas as pd
import urllib3
import requests
from fuzzywuzzy import fuzz

Let us consider a new client who is interested in becoming our customer:

In [93]:
clientData = {
    'name': 'Sukanto Tanoto',
    'residence': 'Indonesia',
    'blockchain_address': "0x104865E1987F25df47554F99ee304038eaae8888",
    'chain': "ETH-SEPOLIA"
}

## Risk Indicator 1: Private API for Blockchain address screening 

We will use the [Circle API](https://api.circle.com), maintained by [Circle](https://www.circle.com/), to showcase the development of a blockchain address screening tool that can be used as a risk indicator.

In [None]:
def check_blockchain_address(address, chain):
    url = "https://api.circle.com/v1/w3s/compliance/screening/addresses"
    headers = {
        "Content-Type": "application/json", 
        'Authorization': 'Bearer '
    }

    payload = {
        "idempotencyKey": "a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11",
        "address": address, #"0x104865E1987F25df47554F99ee304038eaae8888", #"0x33314ad8Cfd12Becb448B4Aaf4d5aE4Ca87e9999",
        "chain": chain #"ETH-SEPOLIA"
    }
    response = requests.post(url, json=payload, headers=headers)
    return response
        
response = check_blockchain_address(clientData['blockchain_address'], clientData['chain'])
response.json()

{'data': {'result': 'DENIED',
  'decision': {'ruleName': "Circle's Sanctions Blocklist",
   'actions': ['DENY', 'FREEZE_WALLET', 'REVIEW'],
   'reasons': [{'source': 'ADDRESS',
     'sourceValue': '0x33314ad8Cfd12Becb448B4Aaf4d5aE4Ca87e9999',
     'riskScore': 'BLOCKLIST',
     'riskCategories': ['SANCTIONS'],
     'type': 'OWNERSHIP',
     'signalSource': None}],
   'screeningDate': '2025-04-30T09:55:28Z'},
  'id': 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11',
  'address': '0x33314ad8Cfd12Becb448B4Aaf4d5aE4Ca87e9999',
  'chain': 'ETH-SEPOLIA',
  'details': [],
  'alertId': '664a53be-d225-4c6c-9ae2-0e07383e356d'}}
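The request above reuses one hard-coded idempotency key and leaves the bearer token empty. A minimal sketch of how the headers and payload could be assembled instead, assuming the API expects a fresh UUID per screening request and that the key is kept in an environment variable (the name `CIRCLE_API_KEY` and the helper `build_screening_request` are hypothetical, not part of Circle's API):

```python
import os
import uuid

def build_screening_request(address, chain, api_key=None):
    """Return (headers, payload) for an address-screening request."""
    # CIRCLE_API_KEY is a hypothetical environment variable name.
    if api_key is None:
        api_key = os.environ.get("CIRCLE_API_KEY", "")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {
        # Assumption: a new UUID per call keeps separate screenings
        # from being treated as retries of the same request.
        "idempotencyKey": str(uuid.uuid4()),
        "address": address,
        "chain": chain,
    }
    return headers, payload

headers, payload = build_screening_request(
    "0x104865E1987F25df47554F99ee304038eaae8888", "ETH-SEPOLIA", api_key="demo-key"
)
```

The resulting `headers` and `payload` can be passed to `requests.post(url, json=payload, headers=headers)` exactly as in the cell above.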

In [95]:
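def assign_score_to_circle_response(response):
    # Map the categorical riskScore labels in the response to a value in [0, 1].
    response_data = response.json().get('data') or {}
    decision = response_data.get('decision') or {}
    # Initialised up front so the list exists even when no decision is returned.
    riskScores = []
    for reason in decision.get('reasons') or []:
        # Compare case-insensitively so 'high' and 'HIGH' are treated alike.
        riskScore = str(reason.get('riskScore', '')).upper()
        if riskScore in ('BLOCKLIST', 'SEVERE'):
            riskScores.append(1.0)
        elif riskScore == 'HIGH':
            riskScores.append(0.8)
    # The most severe reason determines the score; no reasons means 0.0.
    return max(riskScores, default=0.0)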

circle_response_score = assign_score_to_circle_response(response)
circle_response_score

1.0

## Risk Indicator 2: Sanction lists screening


### Relevant sources of sanction lists

* [Office of Foreign Assets Control (OFAC)](https://sanctionssearch.ofac.treas.gov/Details.aspx?id=13087)
* [EU sanction map](https://www.sanctionsmap.eu)
* [UK sanction list](https://www.gov.uk/government/publications/the-uk-sanctions-list)
* private sources, e.g., [www.opensanctions.org](https://www.opensanctions.org/)

Let us use one of these sources to download a list of sanctioned entities.

[See names.txt of this link](https://www.opensanctions.org/datasets/default/)

In [96]:
target_url = 'https://data.opensanctions.org/datasets/20250130/default/names.txt?v=20250130065302-gpf'

http = urllib3.PoolManager()
response = http.request('GET', target_url)
lines = response.data.decode('utf-8')
lines = lines.split('\n')

### Identity matching software

- **Levenshtein Distance:** This algorithm measures the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another.

- **Token Sort Ratio:** This string similarity metric compares two texts by first splitting them into individual words (tokens), sorting these tokens alphabetically, and then recombining them into new strings. The similarity is then calculated on these sorted strings using a standard ratio method, such as one based on Levenshtein distance. This approach makes the metric insensitive to word order, so strings with the same words in different orders will score highly.
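To make the two metrics concrete, here is a small pure-Python sketch of both ideas. The normalisation in `similarity` is a simplification chosen for illustration and is not guaranteed to reproduce the exact scores returned by `fuzzywuzzy`; all function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]

def similarity(a: str, b: str) -> int:
    """Similarity in [0, 100]: edit distance scaled by the longer string (a simplification)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein(a, b) / longest))

def token_sort_similarity(a: str, b: str) -> int:
    """Sort whitespace-separated tokens before comparing, so word order is ignored."""
    def sort_tokens(s):
        return " ".join(sorted(s.split()))
    return similarity(sort_tokens(a), sort_tokens(b))

distance = levenshtein("kitten", "sitting")
reordered = token_sort_similarity("Kim Jong Un", "Un Jong Kim")
```

Here `distance` is 3 (two substitutions and one insertion), and `reordered` is 100 because both names contain the same tokens.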

In [97]:
def check_name(name, risk_threshold = 85):
  # Return (list entry, similarity score, metric) tuples that score above risk_threshold.
  suspected_entities = []

  # An exact match short-circuits the fuzzy search.
  if name in lines:
    return [(name, 100, 'is in the list')]

  for line in lines:
    # Similarity ratio (0-100) of the raw strings.
    levenshtein_ratio = fuzz.ratio(name, line)
    if levenshtein_ratio > risk_threshold:
      suspected_entities.append((line, levenshtein_ratio, 'lev-ratio'))

    # Same ratio after sorting the tokens, so word order does not matter.
    token_sort_ratio = fuzz.token_sort_ratio(name, line, full_process=False)
    if token_sort_ratio > risk_threshold:
      suspected_entities.append((line, token_sort_ratio, 'tok-ratio'))

  return suspected_entities

In [5]:
#here, enter name of your favorite dictator/terrorist/criminal
check_name('Kim Jong Un')

[('Jong Man Kim', 87, 'tok-ratio'),
 ('Kim Jong Man', 87, 'lev-ratio'),
 ('Kim Jong Man', 87, 'tok-ratio'),
 ('Kim Jong Eun', 87, 'lev-ratio'),
 ('Kim Džong Un', 87, 'lev-ratio'),
 ('Kim Džong Un', 87, 'tok-ratio'),
 ('Kim Dzong Un', 87, 'lev-ratio'),
 ('Kim Dzong Un', 87, 'tok-ratio'),
 ('Kim Yong Un', 91, 'lev-ratio'),
 ('Jong Un Kim', 100, 'tok-ratio'),
 ('Kim Jong Gun', 87, 'lev-ratio'),
 ('Kim Un Jon', 95, 'tok-ratio'),
 ('Kim Un Jong', 100, 'tok-ratio'),
 ('Kim Jung Un', 91, 'lev-ratio'),
 ('Kim Jung Un', 91, 'tok-ratio'),
 ('Un Jong Kim', 100, 'tok-ratio'),
 ('Un Gyong Kim', 87, 'tok-ratio')]

In [98]:
#here, enter name of someone who is probably not a terrorist
check_name('Peter Fratric')

[]

In [99]:
#decrease the risk threshold and see that you get more matches
check_name('Peter Fratric', risk_threshold = 75)

[('Peter Frick', 83, 'lev-ratio'),
 ('Peter Frick', 83, 'tok-ratio'),
 ('Peter Frølich', 77, 'lev-ratio'),
 ('Peter Frølich', 77, 'tok-ratio'),
 ('Frantisek Peter', 79, 'tok-ratio'),
 ('František Peter', 79, 'tok-ratio'),
 ('Peter Friedrich', 79, 'lev-ratio'),
 ('Peter Friedrich', 79, 'tok-ratio'),
 ('Friedrich, Peter', 76, 'tok-ratio'),
 ('Peter Frolich', 77, 'lev-ratio'),
 ('Peter Frolich', 77, 'tok-ratio'),
 ('Peter Forster', 77, 'lev-ratio'),
 ('Peter Forster', 77, 'tok-ratio'),
 ('Peter Ferraro', 77, 'lev-ratio'),
 ('Peter Ferraro', 77, 'tok-ratio'),
 ('Peter Francis', 77, 'lev-ratio'),
 ('Peter Francis', 77, 'tok-ratio'),
 ('Peter Ferrara', 77, 'lev-ratio'),
 ('Peter Ferrara', 77, 'tok-ratio'),
 ('Peter Fitzpatrick', 80, 'lev-ratio'),
 ('Peter Fitzpatrick', 80, 'tok-ratio'),
 ('Fitzpatrick, Peter', 77, 'tok-ratio'),
 ('Peter Gration', 77, 'lev-ratio'),
 ('Peter Gration', 77, 'tok-ratio')]

In [100]:
check_name(clientData['name'])

[]
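The screening returns a list of candidate matches rather than a number, so it still has to be turned into a risk value. One possible mapping, assuming the tuple layout `(entry, score, metric)` produced by `check_name` and taking the best match as decisive (the function name `score_from_matches` is hypothetical):

```python
def score_from_matches(matches):
    """Map a list of (entry, score, metric) tuples to a risk value in [0, 1]."""
    if not matches:
        return 0.0  # no candidate above the threshold
    best = max(score for _, score, _ in matches)
    return best / 100

no_match = score_from_matches([])
exact = score_from_matches([("Kim Yong Un", 91, "lev-ratio"),
                            ("Jong Un Kim", 100, "tok-ratio")])
```

An empty result maps to 0.0 and a list containing a perfect match maps to 1.0. In practice, a near match would typically trigger a manual review rather than feed directly into an automated decision.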

## Risk Indicator 3: Intermediaries and ultimate beneficiaries 

Certain individuals can have complex links to various companies or off-shore trusts. This means that although an entity might not be on any sanction lists, they still might play a role as an intermediary or beneficiary as part of a structure where sanctioned companies are present. This indicates that a higher risk value should be assigned.


### Relevant Data

- [https://opencorporates.com](https://opencorporates.com) collects and updates information about the ownership structure of companies.
- The database of the International Consortium of Investigative Journalists, freely accessible at [https://offshoreleaks.icij.org/](https://offshoreleaks.icij.org/pages/database)

According to [https://offshoreleaks.icij.org/nodes/168118](https://offshoreleaks.icij.org/nodes/168118), Mr. Sukanto Tanoto is a beneficial owner of Pec-Tech Limited.

In [54]:
check_name("Pec-Tech Limited")

[]

Pec-Tech Limited is not a sanctioned entity. However, Secorp Limited and Trustcorp Limited are entities related to Pec-Tech Limited.

In [53]:
check_name('Secorp Limited')

[('Secorp Limited', 100, 'is in the list')]

In [58]:
check_name("Trustcorp Limited")

[('Trustcorp Limited', 100, 'is in the list')]

Both are found on the sanction list.

> What risk value should we assign?

> How would you design a risk scoring function based on the entity relationships?
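One possible answer, offered as a sketch rather than a recommended policy: treat entities as nodes of a graph and let the risk contributed by a sanctioned entity decay with the number of relationship hops. The decay factor of 0.5, the function name `relationship_risk`, and the undirected edges below (built from the relationships described above) are all assumptions.

```python
from collections import deque

def relationship_risk(graph, start, sanctioned, decay=0.5):
    """Risk of `start` based on its distance to the nearest sanctioned entity.

    A sanctioned entity scores 1.0; each additional hop multiplies by `decay`.
    """
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node in sanctioned:
            # Breadth-first search reaches the nearest sanctioned node first.
            return decay ** hops
        for neighbour in graph.get(node, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, hops + 1))
    return 0.0  # no sanctioned entity reachable

graph = {
    "Sukanto Tanoto": ["Pec-Tech Limited"],
    "Pec-Tech Limited": ["Sukanto Tanoto", "Secorp Limited", "Trustcorp Limited"],
    "Secorp Limited": ["Pec-Tech Limited"],
    "Trustcorp Limited": ["Pec-Tech Limited"],
}
sanctioned = {"Secorp Limited", "Trustcorp Limited"}
r3 = relationship_risk(graph, "Sukanto Tanoto", sanctioned)
```

With these assumptions the client sits two hops from a sanctioned entity, so `r3` is 0.25. A more careful design might also weight edges by relationship type (owner, director, intermediary) or by ownership share.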

### Relevance for Secondary and Sectoral Sanctions

- **Sectoral sanctions**: If a country commits acts of military aggression and is sanctioned by the international community, sanctions on individual entities might not be sufficient, because any company in the country can contribute to the aggression. At the same time, exports of certain products from the sanctioned country can be beneficial for the world, or we might simply not want to punish groups of people who have no direct influence. This calls for sectoral sanctions, which target specific sectors of the economy rather than individual entities or the country as a whole.

References: See https://ofac.treasury.gov/sanctions-programs-and-country-information/ukraine-russia-related-sanctions

- **Secondary sanctions**: A non-sanctioned intermediary entity, often a shell company, can be created to import dual-use goods into a third country and pass them on to the sanctioned country. Under secondary sanctions, you may still end up prosecuted for dealing with such an intermediary. Hence, you need to do due diligence on your supply chains and on the customers of your customers.


## Risk Indicator 4: Country level risk indicators

Certain jurisdictions might be less trustworthy regarding the implementation and enforcement of regulatory standards. Such jurisdictions need to be assigned a higher risk value.

Reference: see e.g. [https://www.fatf-gafi.org/en/countries/black-and-grey-lists.html](https://www.fatf-gafi.org/en/countries/black-and-grey-lists.html)


In [101]:
def check_country_fatf_status(country):
    High_Risk = ["Democratic People's Republic of Korea", "Iran", "Myanmar"]
    Increased_Monitoring = ["Algeria", "Angola", "Bulgaria", "Burkina Faso", "Cameroon", "Côte d'Ivoire", "Croatia", "Democratic Republic of the Congo", "Haiti", "Kenya", "Lao PDR", "Lebanon", "Mali", "Monaco", "Mozambique", "Namibia", "Nepal", "Nigeria", "South Africa", "South Sudan", "Syria", "Tanzania", "Venezuela", "Vietnam", "Yemen"]
    if country in High_Risk:
        return 1.0
    elif country in Increased_Monitoring:
        return 0.5
    else:
        return 0.0

Alternatively, one can also consider different country level statistics related to corruption, arms trafficking, money laundering etc.

Reference: see e.g. [https://v-dem.net/](https://v-dem.net/)

We can use the API of [ourworldindata.org](https://ourworldindata.org) to obtain an estimate of the corruption index and calculate our risk score based on these values.

In [102]:
# Fetch the data.
df = pd.read_csv("https://ourworldindata.org/grapher/political-corruption-index.csv?v=1&csvType=full&useColumnShortNames=true", storage_options = {'User-Agent': 'Our World In Data data fetch/1.0'})
#select year 2024
df = df.loc[df['Year'] == 2024]
# Fetch the metadata
metadata = requests.get("https://ourworldindata.org/grapher/political-corruption-index.metadata.json?v=1&csvType=full&useColumnShortNames=true").json()

In [110]:
def country_risk_from_corruption_index(df, country):
    if country not in df['Entity'].tolist():
        print("Warning: Country not found")
        return None
    corruption_index = df.loc[df['Entity'] == country]['corruption_vdem__estimate_best__aggregate_method_average']
    risk = float(corruption_index.values[0])
    return risk

country_risk_from_corruption_index(df, 'Belgium')

0.031

In [108]:
country_risk_from_corruption_index(df, clientData['residence'])

0.756
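The individual indicators $(r_1,r_2,r_3,r_4)$ still need to be integrated into one overall risk value. A minimal sketch of one way to do this, assuming each indicator lies in $[0,1]$ and using a noisy-OR rule, $1-\prod_i (1-w_i r_i)$, so that a single severe indicator dominates the result. The choice of rule, the weights, and the placeholder value for $r_3$ are assumptions, not something the notebook prescribes.

```python
def combine_risk(indicators, weights=None):
    """Noisy-OR combination of indicator values in [0, 1]."""
    if weights is None:
        weights = [1.0] * len(indicators)
    remaining = 1.0  # probability-like mass not yet "claimed" by any indicator
    for r, w in zip(indicators, weights):
        remaining *= 1.0 - w * r
    return 1.0 - remaining

# r1: address screening, r2: name screening, r4: corruption index (values from above);
# r3 is a hypothetical placeholder for the relationship-based indicator.
overall = combine_risk([1.0, 0.0, 0.25, 0.756])
```

Because the address screening returned 1.0, `overall` is 1.0 regardless of the other indicators. Lowering a weight below 1 prevents that indicator from saturating the score on its own.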

## Reporting: Briefly on Travel Rules

> The Travel Rule is a regulation that mandates the sharing of specific personally identifiable information (PII) between Virtual Asset Service Providers (VASPs) when transferring digital assets. This regulation applies to both the originator VASP and the beneficiary VASP in any onchain transaction. **The goal is to prevent illegal activities like money laundering and terrorism financing by ensuring greater transparency in digital asset transfers.**

> VASPs must exchange information such as the name, address, and account details for the originator and the beneficiary of the transaction, ensuring regulatory compliance at every step of the transaction.

See the Circle documentation at [https://developers.circle.com/circle-mint/travel-rule-on-chain](https://developers.circle.com/circle-mint/travel-rule-on-chain), or open a developer account.
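As an illustration of the information exchange described above, the sketch below models a message carrying the name, address, and account details of both parties and checks that nothing is missing before a transfer. The class and field names are hypothetical and do not follow Circle's schema or any specific messaging standard.

```python
from dataclasses import dataclass, asdict

@dataclass
class Party:
    name: str
    address: str
    account: str  # e.g. a wallet address or an account number

@dataclass
class TravelRuleMessage:
    originator: Party
    beneficiary: Party

    def missing_fields(self):
        """Return 'role.field' labels for every empty field."""
        missing = []
        for role in ("originator", "beneficiary"):
            for field, value in asdict(getattr(self, role)).items():
                if not value.strip():
                    missing.append(f"{role}.{field}")
        return missing

message = TravelRuleMessage(
    originator=Party("Alice Example", "1 Example Street", "0xabc"),
    beneficiary=Party("Bob Example", "", "0xdef"),
)
gaps = message.missing_fields()
```

Here `gaps` contains `"beneficiary.address"`, signalling that the originator VASP should not proceed until the beneficiary's address has been supplied.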