In [2]:
import pandas as pd
import numpy as np
import csv

# KYC Preprocessing

The purpose of this notebook is to add occupation-specific indicators to the KYC data. Each indicator is a binary flag that records whether a client's occupation belongs to a specific subset of occupations. These subsets were chosen based on Fintrac and financial crime recommendations, mainly the following two:

Per the [Fintrac Operational Alert](https://fintrac-canafe.canada.ca/intel/operation/oai-wildlife-eng), one indicator is "An individual is the owner, operator, employee or associated with an industry that could be used to facilitate illegal wildlife trade (e.g., import/export of goods, fisheries wholesaler, pet store, freight company, animal control)." (**G**)

The [Financial Crime Academy](https://financialcrimeacademy.org/wildlife-trade-risk-indicators-financial/?fbclid=IwAR1XSw09Vtl4mjOOQj_eTFuqZ_GKqM-SPsCJwQKcyFb-XWU4O6nO8zBo3JU) adds that another "indicator relates to activity involving politically exposed persons and wealthy businessmen/women, particularly those with environmental, game, or forestry oversight or environmental or wildlife-related businesses." (**G**)

We therefore define the following indicators based on the KYC occupation data:
- `occ_wealth` 
    - binary 
    - 1 if the client's occupation involves frequent exposure to wealthy people, 0 otherwise.
- `occ_animal` 
    - binary 
    - 1 if the client's occupation involves working with animals, 0 otherwise.
- `occ_int` 
    - binary 
    - 1 if the client works in international trade, 0 otherwise.
- `occ_shipping`
    - binary
    - 1 if the client works in shipping/postal/cargo services, 0 otherwise.
    - *there are no examples of this in our data*
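As a rough illustration of how these flags attach to client records, the sketch below joins a small occupation lookup table onto a few KYC rows. The occupation names, the `cust_id` column, and the flag values are made up for the example. It also assumes the lookup file has one row per occupation with the four flag columns, which this notebook does not confirm.

```python
import pandas as pd

# Hypothetical lookup table: one row per occupation, one column per flag.
# The real occupation_list.csv may use different occupations and values.
occ_example = pd.DataFrame({
    "Occupation": ["Pet Store Owner", "Import/Export Broker", "Jeweller"],
    "occ_wealth": [0, 0, 1],
    "occ_animal": [1, 0, 0],
    "occ_int": [0, 1, 0],
    "occ_shipping": [0, 0, 0],
})

# Hypothetical KYC rows; "cust_id" is an illustrative column name.
kyc_example = pd.DataFrame({
    "cust_id": ["C1", "C2", "C3"],
    "Occupation": ["Jeweller", "Pet Store Owner", "Teacher"],
})

flag_cols = ["occ_wealth", "occ_animal", "occ_int", "occ_shipping"]

# A left join keeps every client, even when the occupation is not in the lookup.
flagged = kyc_example.merge(occ_example, how="left", on="Occupation")

# Occupations missing from the lookup come back as NaN; this sketch treats them as 0.
flagged[flag_cols] = flagged[flag_cols].fillna(0).astype(int)
```

Filling unmatched occupations with 0 is a design choice made for this sketch. Leaving them as missing values may be preferable if "unknown occupation" should stay distinct from "not in the subset".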

In [1]:
# Adding the occupation-based indicators to raw data
raw = pd.read_csv('../../data/raw/kyc.csv')
occ = pd.read_csv('../../data/occupation_list.csv')

# Left join so that every KYC row is kept, even if its occupation has no match
merged = raw.merge(occ, how='left', on='Occupation')

# Re-order columns
cols = list(merged.columns)
cols.append(cols.pop(cols.index('label')))
merged = merged[cols]

# Export
display(merged.head(3))
merged.to_csv('kyc_adj.csv', index=False)

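One way to sanity-check the merge step is to ask pandas to verify that the lookup has at most one row per occupation and to report which occupations found no match. The sketch below uses the `validate` and `indicator` parameters of `DataFrame.merge` on small made-up frames. The sample rows are hypothetical and only mirror the column names used in this notebook (`Occupation`, `label`).

```python
import pandas as pd

# Hypothetical stand-ins for kyc.csv and occupation_list.csv.
raw_example = pd.DataFrame({
    "Occupation": ["Jeweller", "Teacher", "Jeweller"],
    "label": [1, 0, 0],
})
occ_lookup = pd.DataFrame({
    "Occupation": ["Jeweller"],
    "occ_wealth": [1],
})

# validate="many_to_one" raises MergeError if the lookup repeats an occupation,
# which would otherwise silently duplicate KYC rows.
# indicator=True adds a "_merge" column showing where each row was found.
checked = raw_example.merge(
    occ_lookup,
    how="left",
    on="Occupation",
    validate="many_to_one",
    indicator=True,
)

# Occupations present in the KYC data but absent from the lookup.
unmatched = sorted(
    checked.loc[checked["_merge"] == "left_only", "Occupation"].unique()
)

# A many-to-one left join should never change the number of KYC rows.
row_count_preserved = len(checked) == len(raw_example)
```

Dropping the `_merge` column before export keeps the output file limited to the original columns plus the indicators.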