<div style="width: 30%; float: right; margin: 10px; margin-right: 5%;">
    <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/FHNW_Logo.svg/2560px-FHNW_Logo.svg.png" width="500" style="float: left; filter: invert(50%);"/>
</div>


<h1 style="text-align: left; margin-top: 10px; float: left; width: 60%;">
    SAN Project:<br> Swiss Offshore Companies
</h1>


<p style="clear: both; text-align: left;">
    Prepared by Florin Barbisch, Gabriel Torres Gamez, and Tobias Buess in the spring semester 2024 (FS 2024).
</p>

We are conducting a preliminary analysis for the Swiss Federal Statistical Office (Bundesamt für Statistik) to examine the recent leaks from the Offshore Papers.


This analysis aims to determine the extent and nature of the connections within Swiss offshore structures. To do so, we use data from the [Offshore Leaks Database](https://offshoreleaks.icij.org/) to uncover possible patterns and key individuals that could be of interest to tax authorities or regulatory bodies.


Our work includes a detailed review of the affected entities. This will enable the Federal Statistical Office to make informed decisions about further investigation and possible measures.

## Imports and Settings

In [1]:
# Python internal modules
import os
import sys

# Project modules
import utils

# External modules
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib
import matplotlib.pyplot as plt

# Use the GPU-accelerated cuGraph backend when NetworkX has it registered
BACKEND = None  # use default
if "cugraph" in nx.utils.backends.backends:
    import nx_cugraph as nxcg
    BACKEND = "cugraph"

print("Python Environment:")
print(f" | Python version: {sys.version}")
print(f" | Numpy version: {np.__version__}")
print(f" | Pandas version: {pd.__version__}")
print(f" | Matplotlib version: {matplotlib.__version__}")
print(f" | NetworkX version: {nx.__version__}")
print(f" | NetworkX backend: {BACKEND}")
print(
    f" | CuGraph version: {nxcg.__version__}"
    if BACKEND == "cugraph"
    else " | CuGraph not installed, for better performance install it like this:\n\tpip install cugraph-cu12 --extra-index-url=https://pypi.ngc.nvidia.com"
)
print()
print("Resources:")
print(f" | CPU: {os.cpu_count()} cores")

PAPERS = "Pandora Papers"
GRAPH_PATH = f"./data/{PAPERS.lower().replace(' ', '_')}_graph.gexf"

Python Environment:
 | Python version: 3.12.2 (tags/v3.12.2:6abddd9, Feb  6 2024, 21:26:36) [MSC v.1937 64 bit (AMD64)]
 | Numpy version: 1.26.4
 | Pandas version: 2.2.2
 | Matplotlib version: 3.9.0
 | NetworkX version: 3.3
 | NetworkX backend: None
 | CuGraph not installed, for better performance install it like this:
	pip install cugraph-cu12 --extra-index-url=https://pypi.ngc.nvidia.com

Resources:
 | CPU: 12 cores
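
The `BACKEND` variable set in the cell above is not used in the cells shown here. As a hedged sketch of how it could be applied: NetworkX functions that support dispatching accept a `backend` keyword, so the detected backend can be forwarded to an algorithm call. The names `registry`, `backend_kwargs`, and `G_demo` are illustrative assumptions and not part of the project.

```python
import networkx as nx

# Same detection idea as in the cell above, but tolerant of NetworkX
# versions that do not ship the backend registry (assumption: such
# versions should simply fall back to the default implementation).
registry = getattr(getattr(nx.utils, "backends", None), "backends", {})
BACKEND = "cugraph" if "cugraph" in registry else None

# Pass the keyword only when a backend was actually detected.
backend_kwargs = {"backend": BACKEND} if BACKEND else {}

# Small demo graph: a path 0 - 1 - 2 - 3 - 4.
G_demo = nx.path_graph(5)
scores = nx.betweenness_centrality(G_demo, **backend_kwargs)

# The middle node lies on the most shortest paths.
most_central = max(scores, key=scores.get)
print(most_central)
```

Building the keyword arguments conditionally keeps the call identical on machines with and without a GPU backend.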


## Load Data

In [2]:
G = utils.get_graph(GRAPH_PATH, PAPERS)

print(f"Number of nodes: {len(G.nodes)}")
G = utils.merge_duplicate_nodes(
    G, exclude_attributes=["label", "countries", "sourceID", "valid_until", "note"]
)
print(f"Number of nodes after removing duplicates: {len(G.nodes)}")

print(f"Number of edges: {len(G.edges)}")
G = utils.remove_duplicate_edges(G)
print(f"Number of edges after removing duplicates: {len(G.edges)}")

Number of nodes: 108053
Number of nodes after removing duplicates: 89015
Number of edges: 126762
Number of edges after removing duplicates: 111962
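
The implementation of `utils.merge_duplicate_nodes` is not shown in this notebook. The following is a minimal sketch of what such a step might look like, under the assumption that two nodes count as duplicates when all attributes outside `exclude_attributes` are equal. The function `merge_duplicates_sketch`, the toy graph, and its attribute values are hypothetical and use an undirected graph for simplicity.

```python
from collections import defaultdict

import networkx as nx


def merge_duplicates_sketch(graph, exclude_attributes):
    """Hypothetical stand-in for utils.merge_duplicate_nodes (not the project's code)."""
    excluded = set(exclude_attributes)

    # Group nodes whose remaining attributes are identical.
    groups = defaultdict(list)
    for node, attrs in graph.nodes(data=True):
        key = tuple(sorted((k, str(v)) for k, v in attrs.items() if k not in excluded))
        groups[key].append(node)

    merged = graph.copy()
    for nodes in groups.values():
        keep, *duplicates = nodes
        for dup in duplicates:
            # Re-attach the duplicate's edges to the kept node, then drop it.
            for neighbor in list(merged.neighbors(dup)):
                if neighbor != keep:
                    merged.add_edge(keep, neighbor)
            merged.remove_node(dup)
    return merged


# Toy data: nodes 1 and 2 differ only in the excluded "sourceID" attribute.
toy = nx.Graph()
toy.add_node(1, name="ACME LTD", node_type="Entity", sourceID="A")
toy.add_node(2, name="ACME LTD", node_type="Entity", sourceID="B")
toy.add_node(3, name="Jane Doe", node_type="Officer", sourceID="A")
toy.add_node(4, name="John Roe", node_type="Officer", sourceID="B")
toy.add_edges_from([(3, 1), (4, 2)])

merged_toy = merge_duplicates_sketch(toy, exclude_attributes=["sourceID"])
print(merged_toy.number_of_nodes(), merged_toy.number_of_edges())
```

In the toy graph both officers end up attached to the single remaining entity node, which mirrors how the node count in the real graph drops from 108053 to 89015 after merging.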


## Find Entity Clusters with Strong Swiss Connections

In [16]:
officers = utils.filter_nodes(G, query="node_type == 'Officer'")
entities = utils.filter_nodes(G, query="node_type == 'Entity'")
officers_entities_subgraph = G.subgraph(set(officers) | set(entities))

print(f"original graph: {len(G.nodes)} nodes, {len(G.edges)} edges")
print(
    f"officers_entities_subgraph: {len(officers_entities_subgraph.nodes)} nodes, {len(officers_entities_subgraph.edges)} edges"
)

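# Keep a reference to the projection; the original line discarded the result.
# The variable name `entities_graph` is our choice.
entities_graph = nx.projected_graph(officers_entities_subgraph, entities)
entities_graph.name = "Entities"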

original graph: 89015 nodes, 111962 edges
officers_entities_subgraph: 60872 nodes, 54437 edges
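
To illustrate what the projection in the cell above produces, here is a small sketch on a made-up officer/entity graph. It calls `projected_graph` through `networkx.algorithms.bipartite`; all node names are hypothetical. In the projected graph, two entities are connected when they share at least one officer.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy bipartite graph: officers on one side, entities on the other.
# All node names are made up for illustration.
B = nx.Graph()
officers = ["officer_a", "officer_b"]
entities = ["entity_1", "entity_2", "entity_3"]
B.add_nodes_from(officers, node_type="Officer")
B.add_nodes_from(entities, node_type="Entity")
B.add_edges_from(
    [
        ("officer_a", "entity_1"),
        ("officer_a", "entity_2"),
        ("officer_b", "entity_2"),
        ("officer_b", "entity_3"),
    ]
)

# Project onto the entity side: the result contains only entity nodes.
entity_projection = bipartite.projected_graph(B, entities)
entity_projection.name = "Entities"

# Normalise edge orientation so the printed list is deterministic.
projected_edges = sorted(tuple(sorted(edge)) for edge in entity_projection.edges)
print(projected_edges)
```

Here `entity_1` and `entity_3` stay unconnected because no officer links them, while `entity_2` is tied to both through its two officers.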
