# ICIJ FinCEN Files Visualization

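This notebook demonstrates GFQL, Graphistry's graph query language, on the ICIJ FinCEN Files dataset: it loads bank-to-bank transaction records, filters them with node and edge predicates, and visualizes the resulting subgraphs.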

Based on: https://hub.graphistry.com/docs/GFQL/gfql/

## Setup and Registration

First, import Graphistry. The `register` call below is commented out; uncomment it and fill in your credentials before calling `plot()`.

In [None]:
import graphistry

# graphistry.register(api=3, protocol="https", server="hub.graphistry.com",
#                     username="...", password="...")

## Download and Load Data

Download the ICIJ FinCEN Files dataset and create a graph.

In [10]:
import requests
import zipfile
import pandas as pd

# download data; fail early on an HTTP error instead of saving an error page as a zip
resp = requests.get("https://media.icij.org/uploads/2020/09/download_data_fincen_files.zip")
resp.raise_for_status()
with open("download_data_fincen_files.zip", "wb") as f:
    f.write(resp.content)
with zipfile.ZipFile("download_data_fincen_files.zip","r") as zip_ref:
    zip_ref.extract("download_transactions_map.csv")

# read csv into pandas dataframe and change type of time columns in data to datetime
df_e = pd.read_csv("download_transactions_map.csv")
df_e["begin_date"] = pd.to_datetime(df_e["begin_date"])
df_e["end_date"] = pd.to_datetime(df_e["end_date"])

# create graph
g = graphistry.edges(df_e, "originator_bank_id", "beneficiary_bank_id").materialize_nodes()

# rename id col in nodes to nodeId
df_n = g._nodes.rename(columns={'id': 'nodeId'})
g = g.nodes(df_n, 'nodeId')
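
Re-running the download cell fetches and extracts the archive every time. The sketch below shows one way to make the extraction step repeatable using only the standard library: it reuses an already-extracted file and checks that the archive actually contains the requested member. The helper name `extract_member` is hypothetical and not part of the original notebook.

```python
import zipfile
from pathlib import Path


def extract_member(zip_path, member, dest_dir="."):
    """Extract a single member from a zip archive (hypothetical helper).

    Returns the path of the extracted file. If the file already exists in
    dest_dir, it is reused and the archive is not opened.
    """
    target = Path(dest_dir) / member
    if target.exists():
        return target
    with zipfile.ZipFile(zip_path, "r") as zf:
        # Fail with a clear message if the archive layout is not what we expect
        if member not in zf.namelist():
            raise FileNotFoundError(f"{member!r} not found in {zip_path}")
        zf.extract(member, path=dest_dir)
    return target
```

With the file names used above, the call would be `extract_member("download_data_fincen_files.zip", "download_transactions_map.csv")`, and the returned path can be passed straight to `pd.read_csv`.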

## Node-list sample

View a sample of the nodes in the graph.

In [3]:
g._nodes.head()

Unnamed: 0,nodeId
0,cimb-bank-berhad
1,barclays-bank-plc-ho-uk
2,natwest-offshore
3,evrofinance-mosnarbank
4,latvian-trade-commercial-bank


## Edge-list sample

View a sample of the edges in the graph.

In [4]:
g._edges.head()

Unnamed: 0,id,icij_sar_id,filer_org_name_id,filer_org_name,begin_date,end_date,originator_bank_id,originator_bank,originator_bank_country,originator_iso,beneficiary_bank_id,beneficiary_bank,beneficiary_bank_country,beneficiary_iso,number_transactions,amount_transactions
0,223254,3297,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2015-03-25,2015-09-25,cimb-bank-berhad,CIMB Bank Berhad,Singapore,SGP,barclays-bank-plc-london-england-gbr,Barclays Bank Plc,United Kingdom,GBR,68.0,56898520.0
1,223255,3297,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2015-03-30,2015-09-25,cimb-bank-berhad,CIMB Bank Berhad,Singapore,SGP,barclays-bank-plc-london-england-gbr,Barclays Bank Plc,United Kingdom,GBR,118.0,116238400.0
2,223258,2924,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2012-07-05,2012-07-05,barclays-bank-plc-ho-uk,Barclays Bank Plc Ho UK,United Kingdom,GBR,skandinaviska-enskilda-banken-stockholm-sweden...,Skandinaviska Enskilda Banken,Sweden,SWE,,5000.0
3,223259,2924,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2012-06-20,2012-06-20,barclays-bank-plc-ho-uk,Barclays Bank Plc Ho UK,United Kingdom,GBR,skandinaviska-enskilda-banken-stockholm-sweden...,Skandinaviska Enskilda Banken,Sweden,SWE,,9990.0
4,223260,2924,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2012-05-31,2012-05-31,barclays-bank-plc-ho-uk,Barclays Bank Plc Ho UK,United Kingdom,GBR,skandinaviska-enskilda-banken-stockholm-sweden...,Skandinaviska Enskilda Banken,Sweden,SWE,,12000.0


## Visualization Setup

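Define a base seven-color palette and a helper that stretches it nonlinearly by repeating colors, so that more of the continuous color scale is allotted to one end of the palette.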

In [5]:
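def nonlinear_palette_generator(v, linear_palette):
    """Repeat each color round(n**v) + 1 times, where n is its index in linear_palette.

    For v >= 1, later colors get more repeats; for v < 1 the counts are
    reversed, so earlier colors get at least as many repeats as later ones.
    """
    num_palette = len(linear_palette)
    num_palette_repetitions = [round(n**v) + 1 for n in range(num_palette)]
    if v < 1:
        num_palette_repetitions.reverse()
    out = []
    for color, count in zip(linear_palette, num_palette_repetitions):
        out.extend([color] * count)
    return out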

palette = ["#46327e", "#365c8d", "#277f8e", "#1fa187", "#4ac16d", "#a0da39", "#fde724"]
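
To see what the exponent `v` does, the sketch below counts how often each color is repeated for the two values used in this notebook (1.05 and 1.2). The function is restated so the cell runs on its own; the logic is meant to match the helper above.

```python
from collections import Counter


def nonlinear_palette_generator(v, linear_palette):
    # Same logic as the helper above, restated so this cell is self-contained
    counts = [round(n**v) + 1 for n in range(len(linear_palette))]
    if v < 1:
        counts.reverse()
    out = []
    for color, count in zip(linear_palette, counts):
        out.extend([color] * count)
    return out


palette = ["#46327e", "#365c8d", "#277f8e", "#1fa187", "#4ac16d", "#a0da39", "#fde724"]

for v in (1.05, 1.2):
    expanded = nonlinear_palette_generator(v, palette)
    repeats = Counter(expanded)
    # Print the exponent, the expanded palette length, and repeats per color
    print(v, len(expanded), [repeats[color] for color in palette])
# 1.05 29 [1, 2, 3, 4, 5, 6, 8]
# 1.2 35 [1, 2, 3, 5, 6, 8, 10]
```

A larger exponent gives the colors at the end of the palette a larger share of the expanded list.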

Visualize the full graph, coloring edges by transaction amount.

In [6]:
# render graph of entire dataset
g_out = g.bind(edge_label="amount_transactions")
g_out = g_out.encode_edge_color("amount_transactions",
                                nonlinear_palette_generator(1.2, palette),
                                as_continuous=True)
g_out = g_out.settings(
    height=800,
    url_params={
        "pointOpacity": 0.6 if len(g_out._nodes) > 1500 else 0.9,
        "edgeOpacity": 0.3 if len(g_out._edges) > 1500 else 0.9,
        "play": 2000})
g_out.plot()

## Caribbean havens subgraph

Find transactions involving Caribbean tax havens using GFQL chain operations.

In [6]:
from graphistry.compute.predicates.is_in import is_in
from graphistry.compute.ast import n, e_forward

carib_havens = ["British Virgin Islands", "Cayman Islands", "Bahamas"]

### Outgoing transactions from Caribbean havens

In [11]:
chain_operations = [
    n(name="is_carib_bank_origin"),
    e_forward(hops=1, edge_match={"originator_bank_country": is_in(options=carib_havens)}),
]
g_carib_out = g.gfql(chain_operations)

In [10]:
g_carib_out._edges.head()

Unnamed: 0,id,icij_sar_id,filer_org_name_id,filer_org_name,begin_date,end_date,originator_bank_id,originator_bank,originator_bank_country,originator_iso,beneficiary_bank_id,beneficiary_bank,beneficiary_bank_country,beneficiary_iso,number_transactions,amount_transactions
0,225260,3213,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2013-07-18,2013-07-23,caledonian-bank-limited,Caledonian Bank Limited,Cayman Islands,CYM,merrill-lynch-new-york-ny-usa,Merrill Lynch,United States,USA,2.0,985000.0
1,225261,3213,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2013-08-16,2013-08-16,caledonian-bank-limited,Caledonian Bank Limited,Cayman Islands,CYM,t-bank-julius-baer-and-co-ag-zurich-switzerlan...,T Bank Julius Baer And Co. AG,Switzerland,CHE,1.0,500000.0
2,225262,3213,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2013-08-28,2013-09-05,caledonian-bank-limited,Caledonian Bank Limited,Cayman Islands,CYM,barclays-capital-inc-new-york-usa-usa,Barclays Capital Inc,United States,USA,2.0,450000.0
3,225264,3213,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2013-03-21,2013-07-31,gonet-bank-and-trust-limited,Gonet Bank And Trust Limited,Bahamas,BHS,caledonian-bank-limited-georgetown-cayman-isla...,Caledonian Bank Limited,Cayman Islands,CYM,2.0,400000.0
4,225883,4076,the-northern-trust-company,The Northern Trust Company,2015-02-04,2015-02-04,dms-bank-trust-ltd,DMS Bank & Trust Ltd,Cayman Islands,CYM,hsbc-hong-kong-hkg,HSBC,Hong Kong,HKG,1.0,101000.0
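
As a rough mental model, the `is_in` edge filter keeps every edge whose column value is one of the listed options. The plain-Python sketch below reproduces only that row-level effect on a few toy edges; it is not how GFQL is implemented, and the helper `match_is_in` and the toy rows are made up for illustration.

```python
# Toy edge rows: column names mirror the dataset, values are illustrative only
edges = [
    {"id": 1, "originator_bank_country": "Cayman Islands", "beneficiary_bank_country": "United States"},
    {"id": 2, "originator_bank_country": "Switzerland", "beneficiary_bank_country": "Cayman Islands"},
    {"id": 3, "originator_bank_country": "Bahamas", "beneficiary_bank_country": "Cayman Islands"},
    {"id": 4, "originator_bank_country": "Latvia", "beneficiary_bank_country": "Russia"},
]
carib_havens = ["British Virgin Islands", "Cayman Islands", "Bahamas"]


def match_is_in(rows, column, options):
    """Keep rows whose value in `column` is one of `options` (hypothetical helper)."""
    allowed = set(options)
    return [row for row in rows if row[column] in allowed]


outgoing = match_is_in(edges, "originator_bank_country", carib_havens)
print([row["id"] for row in outgoing])  # [1, 3]
```

Filtering on `beneficiary_bank_country` instead would select edges 2 and 3, so an edge between two havens (edge 3 here) satisfies both directions.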


### Incoming transactions to Caribbean havens

In [12]:
chain_operations = [
    e_forward(hops=1, edge_match={"beneficiary_bank_country": is_in(options=carib_havens)}),
    n(name="is_carib_bank_destination")
]
g_carib_in = g.gfql(chain_operations)

In [13]:
g_carib_in._edges.head()

Unnamed: 0,id,icij_sar_id,filer_org_name_id,filer_org_name,begin_date,end_date,originator_bank_id,originator_bank,originator_bank_country,originator_iso,beneficiary_bank_id,beneficiary_bank,beneficiary_bank_country,beneficiary_iso,number_transactions,amount_transactions
0,225255,3213,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2013-08-27,2013-08-27,pictet-and-cie,Pictet And Cie,Switzerland,CHE,caledonian-bank-limited-georgetown-cayman-isla...,Caledonian Bank Limited,Cayman Islands,CYM,1.0,300015.0
1,225256,3213,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2013-08-16,2013-08-16,banco-de-santander-sa,Banco De Santander S.A.,Uruguay,URY,caledonian-bank-limited-georgetown-cayman-isla...,Caledonian Bank Limited,Cayman Islands,CYM,1.0,200000.0
2,225257,3213,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2013-08-01,2013-09-06,jpmorgan-chase-bank-na,JPMorgan Chase Bank Na,United States,USA,caledonian-bank-limited-georgetown-cayman-isla...,Caledonian Bank Limited,Cayman Islands,CYM,3.0,388000.0
3,225263,3213,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2013-07-24,2013-07-25,emirates-nbd-bank-pjsc,Emirates Nbd Bank PJSC,United Arab Emirates,ARE,caledonian-bank-limited-georgetown-cayman-isla...,Caledonian Bank Limited,Cayman Islands,CYM,2.0,200000.0
4,225264,3213,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2013-03-21,2013-07-31,gonet-bank-and-trust-limited,Gonet Bank And Trust Limited,Bahamas,BHS,caledonian-bank-limited-georgetown-cayman-isla...,Caledonian Bank Limited,Cayman Islands,CYM,2.0,400000.0


### Visualize Caribbean transactions

Combine the incoming and outgoing edges into a single graph and color the edges by transaction amount.

In [None]:
g_carib = graphistry.edges(pd.concat([g_carib_in._edges, g_carib_out._edges], ignore_index=True), 
                           "originator_bank_id", "beneficiary_bank_id").materialize_nodes()

g_carib_styled = (
    g_carib
    .encode_edge_color('amount_transactions',
                      palette=nonlinear_palette_generator(1.05, palette),
                      as_continuous=True)
    .settings(url_params={'play': 2000})
)

g_carib_styled.plot()
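
Edge `225264` (Bahamas to Cayman Islands) appears in both sample outputs above, so concatenating the two edge tables can contain the same transaction twice. The sketch below shows the deduplication idea in plain Python, assuming `id` uniquely identifies a transaction record; the helper `concat_unique` is hypothetical. On the concatenated pandas frame, `drop_duplicates(subset="id")` would have the same effect.

```python
# A few rows taken from the sample outputs above, reduced to two columns
incoming = [
    {"id": 225263, "amount_transactions": 200000.0},
    {"id": 225264, "amount_transactions": 400000.0},
]
outgoing = [
    {"id": 225260, "amount_transactions": 985000.0},
    {"id": 225264, "amount_transactions": 400000.0},
]


def concat_unique(*tables, key="id"):
    """Concatenate row lists, keeping the first row seen for each key (hypothetical helper)."""
    seen = {}
    for table in tables:
        for row in table:
            seen.setdefault(row[key], row)
    return list(seen.values())


combined = concat_unique(incoming, outgoing)
print([row["id"] for row in combined])  # [225263, 225264, 225260]
```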

## Single transaction subgraph

Start broad: find all transactions from Latvian banks to Russian banks. The next subsection narrows this pattern to a specific amount range.

In [13]:
from graphistry import contains

chain_operations = [
    e_forward(hops=1, edge_match={"originator_bank_country": "Latvia", "beneficiary_bank_country": "Russia"}),
    # An empty pattern matches every nodeId, so this step only names the destination nodes
    n({"nodeId": contains(pat="")}, name="is_rus_beneficiary"),
]
g_lva_rus = g.gfql(chain_operations)

In [16]:
g_lva_rus._edges.head()

Unnamed: 0,id,icij_sar_id,filer_org_name_id,filer_org_name,begin_date,end_date,originator_bank_id,originator_bank,originator_bank_country,originator_iso,beneficiary_bank_id,beneficiary_bank,beneficiary_bank_country,beneficiary_iso,number_transactions,amount_transactions
0,223935,2359,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2011-02-10,2011-02-10,latvian-trade-commercial-bank,Latvian Trade Commercial Bank,Latvia,LVA,transcredit-bank-moscow-russia-rus,Transcredit Bank,Russia,RUS,1.0,3485.0
1,223936,2359,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2011-02-03,2011-02-03,latvian-trade-commercial-bank,Latvian Trade Commercial Bank,Latvia,LVA,vnesheconombank-moscow-russia-rus,Vnesheconombank,Russia,RUS,1.0,599088.0
2,223937,2359,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2011-02-02,2011-02-02,latvian-trade-commercial-bank,Latvian Trade Commercial Bank,Latvia,LVA,vnesheconombank-moscow-russia-rus,Vnesheconombank,Russia,RUS,1.0,1400000.0
3,223939,2359,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2011-02-28,2011-02-28,latvian-trade-commercial-bank,Latvian Trade Commercial Bank,Latvia,LVA,ojsc-nomos-bank-moscow-russia-rus,Ojsc 'Nomos-Bank',Russia,RUS,1.0,205.48
4,223940,2359,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2011-02-28,2011-02-28,latvian-trade-commercial-bank,Latvian Trade Commercial Bank,Latvia,LVA,ojsc-nomos-bank-moscow-russia-rus,Ojsc 'Nomos-Bank',Russia,RUS,1.0,790753.42


### Find Oleg Deripaska's transaction

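Combine `contains()` on the node IDs with `between()` on the edge amount to isolate a single transfer: from a bank whose ID contains "expo" to a bank whose ID contains "soyuz", with an amount between 15.8 million and 16 million.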

In [14]:
from graphistry import between, contains

chain_operations = [
    n({"nodeId": contains(pat="expo")}),
    e_forward(hops=1, edge_match={
        "originator_bank_country": "Latvia",
        "beneficiary_bank_country": "Russia",
        "amount_transactions": between(15800000, 16000000)}),
    n({"nodeId": contains(pat="soyuz")}, name="is_soyuz")
]
g_od = g.gfql(chain_operations)

In [15]:
g_od._edges

Unnamed: 0,id,icij_sar_id,filer_org_name_id,filer_org_name,begin_date,end_date,originator_bank_id,originator_bank,originator_bank_country,originator_iso,beneficiary_bank_id,beneficiary_bank,beneficiary_bank_country,beneficiary_iso,number_transactions,amount_transactions
0,239549,2718,the-bank-of-new-york-mellon-corp,The Bank of New York Mellon Corp.,2013-08-15,2013-08-15,as-expobank,AS Expobank,Latvia,LVA,bank-soyuz-moscow-russia-rus,Bank Soyuz,Russia,RUS,1.0,15900000.0
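
The three-step pattern above can be read as a conjunction of conditions on the source node, the edge, and the destination node. The plain-Python sketch below spells that out on three toy edges. It is an illustration rather than GFQL's implementation: the helpers `str_contains` and `in_range` are hypothetical, `contains` is treated as a plain substring test, and the range bounds are assumed to be inclusive.

```python
edges = [
    # Matches the row shown in the output above
    {"src": "as-expobank", "dst": "bank-soyuz-moscow-russia-rus",
     "originator_bank_country": "Latvia", "beneficiary_bank_country": "Russia",
     "amount_transactions": 15900000.0},
    # Latvia to Russia, but different banks and amount (from the earlier sample)
    {"src": "latvian-trade-commercial-bank", "dst": "vnesheconombank-moscow-russia-rus",
     "originator_bank_country": "Latvia", "beneficiary_bank_country": "Russia",
     "amount_transactions": 1400000.0},
    # Hypothetical row: right banks, amount outside the range
    {"src": "as-expobank", "dst": "bank-soyuz-moscow-russia-rus",
     "originator_bank_country": "Latvia", "beneficiary_bank_country": "Russia",
     "amount_transactions": 5000.0},
]


def str_contains(value, pat):
    """Plain substring test (hypothetical stand-in for the contains predicate)."""
    return pat in value


def in_range(value, lower, upper):
    """Inclusive range test (assumed semantics for the between predicate)."""
    return lower <= value <= upper


matches = [
    e for e in edges
    if str_contains(e["src"], "expo")                              # first n(...)
    and e["originator_bank_country"] == "Latvia"                   # e_forward edge_match
    and e["beneficiary_bank_country"] == "Russia"
    and in_range(e["amount_transactions"], 15800000, 16000000)
    and str_contains(e["dst"], "soyuz")                            # final n(...)
]
print(len(matches), matches[0]["amount_transactions"])  # 1 15900000.0
```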


### Visualize the specific transaction

In [None]:
g_od_styled = (
    g_od
    .encode_edge_color('amount_transactions',
                      palette=nonlinear_palette_generator(1.05, palette),
                      as_continuous=True)
    .settings(url_params={'play': 2000})
)

g_od_styled.plot()

## In closing

This notebook demonstrated the key GFQL operations:

- **Node filtering**: `n()` with attribute matching and predicates
- **Edge traversal**: `e_forward()` with hop counts and edge matching
- **Chaining operations**: `g.gfql([...])` to run a list of operations as a single pattern
- **Predicates**:
  - `is_in()` for matching multiple values
  - `contains()` for substring matching
  - `between()` for numeric range filtering

### Additional Resources

- [Python GFQL API Documentation](https://pygraphistry.readthedocs.io/en/latest/graphistry.compute.html#chain)
- [GFQL REST API Documentation](https://hub.graphistry.com/docs/GFQL/gfql-api/)
- [How GFQL Chain Works](https://hub.graphistry.com/docs/GFQL/)