* The total amount transferred between financial institutions in different countries is **35.68 billion** over the period 2000-2017.
* **2013** saw the highest yearly amount (**7.87 billion**) transferred between financial institutions in different countries.
* **Russia** received the most money: **20.4%** of the **35.68 billion** transferred over 2000-2017.
  - Top 10 beneficiary countries
     - Russia
     - Latvia
     - Switzerland
     - Singapore
     - Hong Kong
     - United Kingdom
     - United States
     - Netherlands
     - Cayman Islands
     - Cyprus 
* In 2011, Russia received its highest yearly amount, **3.22 billion**.
* **Rosbank** received the highest amount, **3.55 billion**, over 2000-2017.
  - Top 10 beneficiary banks
      - Rosbank
      - Credit Suisse AG
      - Rigensis Bank AS
      - ING Netherland NV
      - Deutsche Bank AG -- London Branch
      - JPMorgan
      - Societe Generale Bank And Trust Singapore SA
      - Hong Kong And Shanghai Banking Corp
      - Bank Soyuz
      - Caledonian Bank Ltd
* **Amsterdam Trade Bank NV** transferred the highest amount, **3.15 billion**, over 2000-2017.
  - Top 10 originator banks
      - Amsterdam Trade Bank NV
      - AS Expobank
      - Deutsche Bank AG
      - ING Netherland NV
      - Rigensis Bank AS
      - Rosbank
      - JPMorgan Chase Bank
      - Gazprombank
      - Caledonian Bank Ltd
      - Societe Generale Private Banking


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import matplotlib.ticker as mtick
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler


In [None]:
data=pd.read_csv("/kaggle/input/the-fincen-files/transactions_map.csv",index_col='id')
data.dataframeName = 'transactions_map.csv'


In [None]:
data.shape

In [None]:
data.head()

**Fields description**
- id = transaction identification number generated by ICIJ
- icij_sar_id = SAR ID number generated by ICIJ that shows all the transactions that were extracted from the same report.
- filer_org_name_id = financial institution id that filed the report with FinCEN
- filer_org_name = financial institution that filed the report with FinCEN
- begin_date = date on which the first transaction in the set reported by the filer (transactions with the same originator and beneficiary) took place
- end_date = date on which the last transaction in the set reported by the filer (transactions with the same originator and beneficiary) took place
- originator_bank_id = id of the bank where the transaction(s) originated
- originator_bank = bank where the transaction(s) originated
- originator_bank_country = location country of the originator bank
- originator_iso = originator bank ISO code of the bank location country
- beneficiary_bank_id = id of the bank where the transaction(s) were received
- beneficiary_bank = bank where the transaction(s) were received
- beneficiary_bank_country = location country of the beneficiary bank
- beneficiary_iso = beneficiary bank ISO code of the bank location country
- number_transactions = number of transactions
- amount_transactions = total amount of the transactions

Set up datetime format and currency format

In [None]:
# Total transaction amount 35.68 billion from 2000-2017 
data["amount_transactions"].sum()

In [None]:
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ncols=2
data['begin_date']=pd.to_datetime(data['begin_date'])
data['end_date']=pd.to_datetime(data['end_date'])
data['year']=data['begin_date'].dt.year
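
As a small illustration of the two formats set up above, the sketch below fills the same `'${x:,.0f}'` template with plain `str.format` (the formatter fills the `x` field with each tick value) and extracts a year from a date string. The sample amount and date are made up for illustration.

```python
import pandas as pd

fmt = '${x:,.0f}'                          # same template as in the cell above
label = fmt.format(x=1234567.891)          # hypothetical tick value
year = pd.to_datetime("2013-05-17").year   # hypothetical date string

print(label, year)
```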

In [None]:
data.head()

In [None]:
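# Count missing values per column; print explicitly because only a cell's last expression is displayed
print(data.isnull().sum(axis=0))
df = data
df.dropna().shape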

The `number_transactions` column (number of suspicious transactions, ST) has many missing values, so I count the occurrences of ST (one per row) instead. The function below works with any dataset that has the same structure and variable names.
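
A minimal sketch of the counting idea, on a made-up `year` array rather than the real data: rows with a missing year are dropped and each remaining row counts as one occurrence.

```python
import numpy as np

# Hypothetical years; NaN stands for a row whose begin_date is missing.
years = np.array([2011.0, 2013.0, 2011.0, np.nan, 2013.0, 2013.0])

# Drop NaN, then count how often each year appears.
labels, counts = np.unique(years[~np.isnan(years)], return_counts=True)
per_year = {int(y): int(c) for y, c in zip(labels, counts)}
print(per_year)
```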

In [None]:
def SumNumberYearlyST (data):
    labels_year, frequencies_year = np.unique(data['year'][~np.isnan(data['year'])], return_counts = True)
    #labels=labels_year.astype(int)
    plt.figure(figsize = (10,10))
    plt.plot(labels_year,frequencies_year)
    plt.title('Number of Suspicious Transactions per Year')
    plt.xticks(np.arange(min(labels_year), max(labels_year)+2, 2))
    plt.yticks(np.arange(min(frequencies_year)-1,max(frequencies_year),50))
    plt.xlabel('Year')
    plt.ylabel('Number of ST')
    plt.show()

In [None]:
SumNumberYearlyST (data)

# Analysis based on location of transactions

In [None]:
data["year"] = pd.to_datetime(data["begin_date"], errors='coerce').dt.year
temp = data.groupby(["year"])["amount_transactions"].sum().to_frame().reset_index()
temp["year"] = temp["year"].apply(lambda x: int(x))
temp["amount_transactions"] = temp["amount_transactions"].apply(lambda x: round(x/1000000000,2))
ax = temp.plot.bar(x="year", y="amount_transactions", figsize=(15,5), title="Transferred amount every year between financial institutions in different countries")
x_offset = -0.03
y_offset = 0.02
for p in ax.patches:
    b = p.get_bbox()
    val = p.get_height()  # bar height is the plotted value
    ax.annotate(val, ((b.x0 + b.x1)/2 + x_offset, b.y1 + y_offset))
del temp

In [None]:
#data["amount_transactions"] = data["amount_transactions"].apply(lambda x : round(x/1000000000.0,4))
country_total_received_amount = data.groupby(["beneficiary_bank_country"])["amount_transactions"].sum().to_frame().reset_index()
country_total_received_amount.sort_values(["amount_transactions"], ascending=False, inplace=True)
x = country_total_received_amount["beneficiary_bank_country"].tolist()[:10]
y = country_total_received_amount["amount_transactions"].tolist()[:10]
others_amount = sum(country_total_received_amount["amount_transactions"].tolist()[10:])
x.append("others")
y.append(others_amount)
del country_total_received_amount
fig1, ax1 = plt.subplots()
ax1.pie(y, labels=x, autopct='%1.1f%%',
        shadow=True, startangle=90)
# ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()

del x
del y
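
The percentages on the pie chart are each slice's share of the overall total. A minimal sketch of the "top N plus others" split on made-up totals (the country names and amounts here are hypothetical, and `top_n` is 2 instead of 10 to keep the example short):

```python
import pandas as pd

# Made-up totals per country, in dollars.
totals = pd.Series(
    {"A": 50.0, "B": 30.0, "C": 15.0, "D": 5.0}
).sort_values(ascending=False)

top_n = 2                                   # the notebook uses 10
top = totals.iloc[:top_n]
others = totals.iloc[top_n:].sum()          # everything outside the top N
shares = pd.concat([top, pd.Series({"others": others})])
shares = (shares / shares.sum() * 100).round(1)   # percent of the grand total
print(shares.to_dict())
```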

* **Russia** received the most money: **20.4%** of the **35.68 billion** transferred over 2000-2017.

In [None]:
temp = data.loc[data["beneficiary_bank_country"]=="Russia"].groupby(["year"])["amount_transactions"].sum().to_frame().reset_index()
temp["year"] = temp["year"].apply(lambda x: int(x))
temp["amount_transactions"] = temp["amount_transactions"].apply(lambda x: round(x/1000000000,2))
ax = temp.plot.bar(x="year", y="amount_transactions", figsize=(15,5), title="Russia received amount every year")
x_offset = -0.03
y_offset = 0.02
for p in ax.patches:
    b = p.get_bbox()
    val = p.get_height()  # bar height is the plotted value
    ax.annotate(val, ((b.x0 + b.x1)/2 + x_offset, b.y1 + y_offset))
del temp

* In 2011, Russia received its highest yearly amount, **3.22 billion**.

`dire` is the direction of the transactions, either 'originator' or 'beneficiary'; `top` (int) is the number of countries with the most ST occurrences to show.

In [None]:
def NumberST(data, dire, top): 
    labels_org_con, frequencies_org_con =np.unique(data[dire+'_iso'], return_counts = True)
    count_sort_ind = np.argsort(-frequencies_org_con)
    plt.figure(figsize = (15,15))
    plt.bar(labels_org_con[count_sort_ind][:top],frequencies_org_con[count_sort_ind][:top])
    plt.title(f'Number of Suspicious Transactions in {dire} Country'.title())
    plt.yticks(np.arange(min(frequencies_org_con[count_sort_ind][:top])-20,max(frequencies_org_con),100))
    plt.xlabel(f'Top {top} {dire} Country'.title())
    plt.ylabel('Number of ST')
    plt.show()

In [None]:
NumberST(data,'originator', 5)    
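
The ranking step inside `NumberST` can be sketched on its own, without the plot. The ISO codes below are made up; the pattern is to count each code with `np.unique` and then sort the counts in descending order with `np.argsort`.

```python
import numpy as np

# Hypothetical ISO codes, one per reported transaction row.
iso = np.array(["GBR", "USA", "LVA", "USA", "GBR", "USA"])

labels, counts = np.unique(iso, return_counts=True)
order = np.argsort(-counts)      # indices from most to least frequent
top = 2
ranking = list(zip(labels[order][:top].tolist(), counts[order][:top].tolist()))
print(ranking)
```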

The highest ST amount

In [None]:
def HighestST(data, dire, top):
    data1=(data.groupby([dire +'_iso'])['amount_transactions'].sum()).round(0).nlargest(top).reset_index()
    ax=data1.plot.bar(dire +'_iso', 'amount_transactions', fontsize=15,
                    figsize = (15,15),title=f'Amount of Suspicious Transactions in {dire} Country'.title(),
                    xlabel=f'Top {top} {dire} Country'.title(),ylabel='Amount of ST')
    ax.yaxis.set_major_formatter(tick) 

In [None]:
HighestST(data,'beneficiary',5 )

The yearly highest ST amount

In [None]:
##by transaction amount over year
def STyearly(data,dire, top):
    data1=(data.groupby([dire +'_iso'])['amount_transactions'].sum()).round(0).nlargest(top).reset_index()
    toplist=list(data1[dire+'_iso'])
    data2=data[['year', dire+'_iso','amount_transactions']].loc[data[dire+'_iso'].isin(toplist)]
    top_year=(data2.groupby([dire+'_iso','year'])['amount_transactions'].sum()).round(2).reset_index()
    top_year.set_index('year', inplace=True)
    top_year.index = top_year.index.astype(int)  # astype returns a new index, so assign it back
    grouped = top_year.groupby(dire+'_iso')
    nrows = int(np.ceil(grouped.ngroups/ncols))
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(15,15), sharex=True, sharey=True)
    for (key, ax) in zip(grouped.groups.keys(), axes.flatten()):
        grouped.get_group(key).plot(ax=ax)
        ax.legend([key],loc='upper right', frameon=False)
        ax.yaxis.set_major_formatter(tick) 
    ax.xaxis.set_major_locator(MaxNLocator(integer=True)) 
    plt.show() 

In [None]:
STyearly(data,'beneficiary', 10)

Import geographical analysis packages

In [None]:
##worldmaps of occurrence##
import geopandas as gpd
import geoplot as gplt
import geoplot.crs as gcrs
import mapclassify as mc

Obtain world map

In [None]:
world = gpd.read_file(gplt.datasets.get_path('world'))
world = world[world.id !='-99']

Setting up central points for countries

In [None]:
world_points=world.copy()
world_points['centroid'] = world_points.centroid
world_points = world_points.set_geometry('centroid')

Generate multipoints for ST flow plots

In [None]:
from shapely.geometry import MultiPoint
# Scale dollar amounts to millions so they match the "million" legend labels in CountrySTFlow
country_sum=(data.groupby(['originator_iso','beneficiary_iso'])['amount_transactions'].sum()/1000000).reset_index()
map_network = world_points.merge(country_sum, left_on="id", right_on="originator_iso")
map_network = world_points.merge(map_network, left_on="id", right_on="beneficiary_iso")
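
Because both merges bring in a `centroid` column, pandas renames the clashing columns with `_x` (left frame) and `_y` (right frame) suffixes. A minimal sketch with plain pandas frames standing in for the GeoDataFrames (the point labels and the amount are made up) shows which suffix ends up on which side of the flow:

```python
import pandas as pd

# Stand-ins for the world table (one point per country) and the flow table.
points = pd.DataFrame({"id": ["USA", "RUS"], "centroid": ["P_usa", "P_rus"]})
flows = pd.DataFrame({"originator_iso": ["RUS"], "beneficiary_iso": ["USA"],
                      "amount_transactions": [12.5]})

# First merge attaches the originator's point, second the beneficiary's.
step1 = points.merge(flows, left_on="id", right_on="originator_iso")
step2 = points.merge(step1, left_on="id", right_on="beneficiary_iso")
row = step2.iloc[0]
print(row["centroid_y"], "->", row["centroid_x"])   # originator -> beneficiary
```

So `centroid_y` holds the originator's point and `centroid_x` the beneficiary's.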

Basic map showing the sum of ST by country

In [None]:
def mapST(data,dire):
    country_sum=data.groupby([dire+'_iso'])['amount_transactions'].agg(sum)/100000
    map_st = world.merge(country_sum, left_on="id", right_on=dire+'_iso')
    scheme = mc.UserDefined(map_st['amount_transactions'], bins=[2500, 5000, 10000, 50000])
    gplt.choropleth(
        map_st, hue='amount_transactions',
        edgecolor='lightgray', linewidth=1,
        cmap='rainbow', legend=True, legend_kwargs={'loc': 'lower left', 'fontsize':15},
        scheme=scheme, figsize=(15,15),
        # amounts are in units of $100,000 (see the division above), so a bin edge of 2,500 is $250 million
        legend_labels=['< $250 million','$250-500 million', '$500-1,000 million', '$1,000-5,000 million',
             '> $5,000 million'])
    plt.title("Sum of ST by Country between 2000 and 2017",fontsize=20)
    plt.show()

In [None]:
mapST(data,'beneficiary')
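
To see how fixed bin edges sort values into colour classes, here is a rough sketch using `np.digitize`. It assumes each edge is an inclusive upper bound, which is my understanding of how `mc.UserDefined` treats its `bins`; the values are made up.

```python
import numpy as np

bins = [2500, 5000, 10000, 50000]             # same edges as in mapST
values = np.array([100, 2500, 7000, 60000])   # made-up scaled amounts

# right=True puts a value equal to an edge into the lower class
classes = np.digitize(values, bins, right=True)
print(classes.tolist())
```

Four edges give five classes (0 to 4), which is why the legend needs five labels.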

Flow map showing ST sent out of and coming into a country

In [None]:
###ST Flows###    
def CountrySTFlow(data, dire, country, top):
    
    ###ST flowing in or out from a Country###
    data2=data[data[dire+'_iso']==country].copy()  # copy so the new column is not added to a view of the input
    data2['multi'] = [MultiPoint([x, y]) for x, y in zip(data2.centroid_y, data2.centroid_x)]
    data2=data2.set_geometry('multi')
    data2=data2.nlargest(top, columns=['amount_transactions'])
    scheme = mc.JenksCaspall(data2['amount_transactions'])
    lc=[f'${x:,.0f} million' for x in data2['amount_transactions']] 
    ax = gplt.sankey(data2, projection=gcrs.WebMercator(),
                hue='amount_transactions', scheme=scheme, cmap='Dark2', 
                legend=True,legend_kwargs={'loc': 'lower left', 'fontsize':10},
                legend_labels=lc, figsize=(12,12), linestyles=':')
    gplt.polyplot(world, ax=ax, facecolor='lightgray', edgecolor='white')
    plt.title(f"Sum of top {top} ST of {country} {dire.title()} between 2000 and 2017",fontsize=15)
    plt.show()

In [None]:
CountrySTFlow(map_network, 'beneficiary', 'USA', 5)

# Analysis based on banks

In [None]:
#data["amount_transactions"] = data["amount_transactions"].apply(lambda x : round(x/1000000000.0,4))
beneficiary_bank_received_amount = data.groupby(["beneficiary_bank"])["amount_transactions"].sum().to_frame().reset_index()
beneficiary_bank_received_amount.sort_values(["amount_transactions"], ascending=False, inplace=True)
beneficiary_bank_received_amount["amount_transactions"] = beneficiary_bank_received_amount["amount_transactions"].apply(lambda x : round(x/1000000000.0,4))
ax = beneficiary_bank_received_amount[:10].plot.bar(x="beneficiary_bank", y="amount_transactions", figsize=(15,5), title="Top 10 Beneficiary banks")
x_offset = -0.03
y_offset = 0.02
for p in ax.patches:
    b = p.get_bbox()
    val = p.get_height()  # bar height is the plotted value
    ax.annotate(val, ((b.x0 + b.x1)/2 + x_offset, b.y1 + y_offset))


* **Rosbank** received the highest amount, **3.55 billion**, over 2000-2017.

In [None]:
originator_bank_received_amount = data.groupby(["originator_bank"])["amount_transactions"].sum().to_frame().reset_index()
originator_bank_received_amount.sort_values(["amount_transactions"], ascending=False, inplace=True)
originator_bank_received_amount["amount_transactions"] = originator_bank_received_amount["amount_transactions"].apply(lambda x : round(x/1000000000.0,4))
ax = originator_bank_received_amount[:10].plot.bar(x="originator_bank", y="amount_transactions", figsize=(15,5), title="Top 10 Originator banks")
x_offset = -0.03
y_offset = 0.02
for p in ax.patches:
    b = p.get_bbox()
    val = p.get_height()  # bar height is the plotted value
    ax.annotate(val, ((b.x0 + b.x1)/2 + x_offset, b.y1 + y_offset))

* **Amsterdam Trade Bank NV** transferred the highest amount, **3.15 billion**, over 2000-2017.
