
## Diving deep into the Panama Papers

### Abstract:

The Panama Papers are arguably the largest leak of confidential data to date. They provide a wealth of knowledge by exposing previously hidden ties between corporations and their ecosystems. This paper presents a social network analysis of the companies and individuals involved in the leak. We identified the countries in which the key players were operating, examined how the involvement of a country could be predicted, and uncovered the underlying relationships between the parties involved. We shed light on the central actors of the network and investigated their connections further.

### Introduction:

The Panama Papers are leaked documents that detail the internal operations of Mossack Fonseca, one of the world's largest incorporators of offshore entities. The leak made public the financial and attorney-client information that this Panamanian law firm and corporate service provider held for more than 210,000 offshore entities. While offshore business entities are legal, reporters found that some of the Mossack Fonseca shell corporations were used for illegal purposes, namely fraud, tax evasion, and evading international sanctions. One may therefore argue that exploring the Panama Papers to extract insights about the key players involved is in the public and state interest. We intend to discover, analyse and explain the underlying relationships inside the Panama Papers.

### Related work:

One could argue that the most difficult part of investigating leaked data is preprocessing it and making it understandable to the human eye. The International Consortium of Investigative Journalists (ICIJ) did remarkable work in this respect: the leaks have been made available as a database built on the Neo4j graph database. Moreover, the ICIJ has already carried out a great deal of investigative journalism, which we reproduce and complement in the first part of our work.

### Data Collection and Description:

As stated above, we used the Panama Papers section of the ICIJ Offshore Leaks Database, which is available online. In the second part of our work, we used multiple data sources to extract economic and financial indicators for the countries present in the dataset.

The database is distributed across five CSV files:
- Entity: A company, trust or fund created by an agent. 213634 entries
- Officer: A person or company who plays a role in an offshore entity. 238402 entries
- Intermediary: A go-between for someone seeking an offshore corporation and an offshore service provider. 14110 entries
- Address: Contact postal address as it appears in the original databases obtained by the ICIJ. 93454 entries
- Edges: Relationship between two nodes and the nature of the link. 674102 entries
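The entry counts above can be checked with a short script. The sketch below is a dependency-free version using Python's `csv` module, which copes with quoted fields containing commas or newlines. The file names in `TABLES` are hypothetical placeholders, not necessarily the names used in the ICIJ export; with pandas the equivalent per file would be `len(pd.read_csv(path))`.

```python
import csv
from pathlib import Path

# Hypothetical file names: rename them to match your local copy of the ICIJ export.
TABLES = {
    "Entity": "entities.csv",
    "Officer": "officers.csv",
    "Intermediary": "intermediaries.csv",
    "Address": "addresses.csv",
    "Edges": "edges.csv",
}


def count_rows(data_dir):
    """Return the number of data rows (header excluded) in each table."""
    counts = {}
    for table, filename in TABLES.items():
        path = Path(data_dir) / filename
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header row
            counts[table] = sum(1 for _ in reader)
    return counts


# Usage (assuming the five files sit in ./data):
# print(count_rows("data"))
```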

### Methodology, Models and Methods:

The primary goals of our project were threefold:

- Present the evolution of the number of incorporated offshore companies over the last 50 years and explain it in light of historical events. Identify the countries and jurisdictions with the most offshore entities, intermediaries and officers. Test whether the number of entities is linked to the number of intermediaries and officers.

- Predict the involvement of a country from economic and financial indicators, that is, find the correlates of evasion and discuss the relevance of commonly used metrics.

- Quantify the notion of importance in the papers and identify the key actors. Describe the communities embedded in the network and investigate their constituents.

To obtain key figures about the evolution and situation of Mossack Fonseca's clients, we used the Python library pandas; this mostly involved DataFrame manipulations.
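The yearly incorporation counts and the per-jurisdiction rankings boil down to grouping and counting. The sketch below does this with the standard library. It assumes, without having confirmed it against the export, that incorporation dates are strings such as `23-MAR-2006`; `incorporations_per_year` and `top_k` are illustrative helper names. In pandas, a comparable count would be `pd.to_datetime(col, format="%d-%b-%Y", errors="coerce").dt.year.value_counts().sort_index()`, where `col` is the (assumed) incorporation date column.

```python
from collections import Counter
from datetime import datetime

# Assumed format of the incorporation dates, e.g. "23-MAR-2006".
DATE_FORMAT = "%d-%b-%Y"


def incorporations_per_year(dates, fmt=DATE_FORMAT):
    """Count entities per incorporation year, skipping missing or malformed dates."""
    years = Counter()
    for raw in dates:
        if not raw:
            continue  # missing date
        try:
            years[datetime.strptime(raw.strip(), fmt).year] += 1
        except ValueError:
            continue  # string that does not match the assumed format
    return dict(sorted(years.items()))


def top_k(values, k=10):
    """The k most frequent non-missing values, e.g. jurisdictions or countries."""
    return Counter(v for v in values if v).most_common(k)
```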

Moreover, we measured the strength and direction of the monotonic relationships between the numbers of entities, intermediaries and officers with Spearman's rank-order correlation coefficient, which is the Pearson correlation coefficient between the ranked variables:
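$$r_s = \rho(R_X, R_Y) = \frac{\operatorname{cov}(R_X, R_Y)}{\sigma_{R_X}\,\sigma_{R_Y}}$$
where $R_X$ and $R_Y$ are the ranked variables and $\rho$ denotes the Pearson correlation coefficient.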
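As a worked illustration of this formula, the sketch below ranks each variable (tied values receive the average of the ranks they span) and then takes the Pearson correlation of the ranks. It is a dependency-free sketch with illustrative function names; in practice `scipy.stats.spearmanr` returns the same coefficient together with a p-value.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations (the 1/n factors cancel)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's r_s: the Pearson correlation of the ranked variables."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks matter, any strictly increasing relationship (for example $y = x^2$ on positive values) yields $r_s = 1$ even though it is not linear.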

We identified the correlates of evasion by fitting one linear regression per financial indicator while controlling for GDP and population. Each regression had the form:
$$\text{Number of entities} = a \cdot \text{Indicator of interest} + b \cdot \text{Population} + c \cdot \text{GDP} + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)$$
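A minimal sketch of one such regression, fitted by ordinary least squares through the normal equations $(X^\top X)\beta = X^\top y$. It follows the formula literally, without an intercept; appending a column of ones to each row would add one. The helper names and the column order `[indicator, population, GDP]` are assumptions of this illustration. On real data, standardising population and GDP first improves numerical conditioning, and a library such as statsmodels (`statsmodels.api.OLS`) also reports standard errors and p-values.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            raise ValueError("singular system: the regressors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """OLS coefficients for y = X beta + eps via the normal equations."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical usage: one row [indicator, population, GDP] per country and
# `entities` holding the number of entities per country.
# a, b, c = ols(rows, entities)
```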

We then moved on to social network analysis with the Python library NetworkX, with the objective of zooming in on communities and actors inside the network.

We quantified node importance based on centrality measures:

- The PageRank algorithm simulates a random surfer who follows a randomly chosen edge or, with a small probability, jumps to a random node; these random jumps make the walk ergodic. It outputs the stationary probability of each node, that is, the long-run probability that the random surfer is on that node.

- Betweenness centrality is based on shortest paths:
$$\operatorname{bet}(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
where the sum runs over every pair of nodes $s, t$ distinct from $v$, $\sigma_{st}$ is the number of shortest paths between $s$ and $t$, and $\sigma_{st}(v)$ is the number of those paths that go through $v$. The higher the betweenness, the more a node acts as a bridge between other nodes.

- Degree centrality counts the edges incident to each node: the more neighbours a node has, the more important it may be in the network.
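The three centrality measures can be sketched on a graph stored as an adjacency dictionary `{node: set_of_neighbours}`. This is an illustrative, dependency-free version: PageRank by power iteration with an assumed damping factor of 0.85, and betweenness with Brandes' algorithm for unweighted, undirected graphs. NetworkX provides `nx.pagerank` and `nx.betweenness_centrality`, whose results should agree with these up to normalisation choices.

```python
from collections import deque


def degree(adj):
    """Number of neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def pagerank(adj, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration; alpha is the probability of following an edge (assumed 0.85)."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(max_iter):
        # Rank held by isolated (dangling) nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in adj if not adj[v])
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in adj}
        for v, nbrs in adj.items():
            for u in nbrs:
                new[u] += alpha * rank[v] / len(nbrs)
        converged = sum(abs(new[v] - rank[v]) for v in adj) < tol
        rank = new
        if converged:
            break
    return rank


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised)."""
    bet = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths in sigma.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        pred = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bet[w] += delta[w]
    # Every unordered pair {s, t} was visited from both ends, so halve the totals.
    return {v: b / 2 for v, b in bet.items()}
```

On a star graph whose hub is linked to four leaves (a toy stand-in for an intermediary serving several entities), the hub has the highest degree and PageRank, and its betweenness is 6, one for each pair of leaves.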

Prior to community detection we assessed the global structure of the graph:

- The clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. The global clustering coefficient is
$$G = \frac{\text{number of closed triplets}}{\text{number of open or closed triplets}}$$
The clustering coefficient can also be computed for each node individually.

- The density is a measure of how connected a network is:
$$d = \frac{2m}{n(n-1)}$$
where $m$ is the number of edges and $n$ the number of nodes. That is, the observed sum of degrees, $2m$, divided by the maximum possible sum of degrees, $n(n-1)$.
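Both structural measures can be sketched on a graph stored as an adjacency dictionary `{node: set_of_neighbours}` (an assumption of this illustration). Counting triplets at their centre node means each triangle contributes three closed triplets, which matches the global definition above; NetworkX exposes the same quantities as `nx.transitivity`, `nx.clustering` and `nx.density`.

```python
from itertools import combinations


def closed_triplets_at(adj, v):
    """Number of pairs of neighbours of v that are themselves connected."""
    return sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])


def local_clustering(adj, v):
    """Share of v's neighbour pairs that are connected (0 for degree < 2)."""
    k = len(adj[v])
    return closed_triplets_at(adj, v) / (k * (k - 1) / 2) if k > 1 else 0.0


def global_clustering(adj):
    """Closed triplets / all connected triplets, each counted at its centre node."""
    closed = sum(closed_triplets_at(adj, v) for v in adj)
    triplets = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return closed / triplets if triplets else 0.0


def density(adj):
    """d = 2m / (n(n-1)) for a simple undirected graph."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0
```

A graph without triangles, such as a star in which a hub connects otherwise unconnected leaves, has a global clustering coefficient of exactly zero.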

Finally, we applied community detection algorithms to specific countries:

A community is a group of nodes that are more densely connected to each other than to the rest of the network.
The Louvain method extracts communities by greedily maximising modularity. Initially, every node forms its own community. Each node is then moved to the neighbouring community that yields the largest increase in modularity, if any, until no single move improves it; the resulting communities are then merged into single nodes and the procedure is repeated on this aggregated graph. The modularity is defined as
$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{\deg(i)\,\deg(j)}{2m} \right] \delta(c_i, c_j)$$
where the sum runs over every pair of nodes $i, j$, $A_{ij}$ is 1 if $i$ and $j$ are adjacent and 0 otherwise, $m$ is the number of edges, and the delta function is 1 if $i$ and $j$ belong to the same community ($c_i = c_j$) and 0 otherwise.
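To make the objective concrete, the sketch below computes $Q$ from per-community totals, using the algebraically equivalent form $Q = \sum_c \left[ L_c/m - (d_c/2m)^2 \right]$, where $L_c$ is the number of edges inside community $c$ and $d_c$ the sum of its degrees. It implements only the first (local moving) phase of the Louvain method, by brute force: each node is tried in every neighbouring community and kept where $Q$ increases most. The aggregation phase is omitted, so this is a teaching sketch rather than the full algorithm; NetworkX ships a complete implementation as `nx.community.louvain_communities`.

```python
from collections import defaultdict


def modularity(adj, community_of):
    """Q = sum over communities c of L_c/m - (d_c / 2m)^2."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    internal = defaultdict(float)    # L_c: edges with both ends in c
    degree_sum = defaultdict(float)  # d_c: sum of degrees of the nodes in c
    for v, nbrs in adj.items():
        c = community_of[v]
        degree_sum[c] += len(nbrs)
        for u in nbrs:
            if community_of[u] == c:
                internal[c] += 0.5  # each internal edge is seen from both ends
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def local_moving(adj):
    """First Louvain phase only: move nodes between neighbouring communities while Q increases."""
    community_of = {v: v for v in adj}  # every node starts in its own community
    improved = True
    while improved:
        improved = False
        for v in adj:
            best_c, best_q = community_of[v], modularity(adj, community_of)
            for c in {community_of[u] for u in adj[v]} - {community_of[v]}:
                trial = dict(community_of)
                trial[v] = c
                q = modularity(adj, trial)
                if q > best_q + 1e-12:  # accept strict improvements only
                    best_c, best_q = c, q
            if best_c != community_of[v]:
                community_of[v] = best_c
                improved = True
    return community_of
```

On two triangles joined by a single edge, the local moving phase recovers the two triangles, with $Q = 5/14 \approx 0.357$; putting every node in one community gives $Q = 0$.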

Communities were detected and visualised in Gephi, an open-source graph visualisation platform. Node size was a function of centrality measures, while the colour code indicated communities. In addition, the members of each community were investigated and visualised in Neo4j.

### Results and findings

We obtained the evolution of the number of incorporated offshore companies over the last 50 years:

PLOT NUMBER OF INCORPORATED OFFSHORE COMPANIES PER YEAR

We placed this graph in a historical timeline:

- Before 1970: For decades, offshore finance had a relatively modest profile in Panama.
- 1975-2000: It took off in the 1970s as world oil prices surged. During this period, the Republic of Panama passed legislation entrenching corporate and individual financial secrecy.
- 1985-1990: A political crisis culminated in an invasion by the USA from 1989 to 1990.
- 2000-2005: The 2000 OECD report "Towards Global Tax Co-operation" included a list of 35 jurisdictions found to meet the tax haven criteria set out in an earlier report issued in 1998. Panama was on this list.
- 2007-2015: The subprime crisis and the years that followed. We observe a clear decline in the number of incorporated offshore companies during this period. The records stop at the point where the papers were leaked.

The following charts present the jurisdictions and countries with the most offshore entities, intermediaries and officers.

CHART ENTITY COUNTRY/JURISDICTION, INTERMEDIARY, OFFICER
Footnote: nan indicates that the country of the intermediary could not be retrieved.

The most represented jurisdictions are tax havens themselves, which act as intermediary countries for tax evasion.

We confirm the existence of a strong positive monotonic relationship between the number of entities and the number of intermediaries and officers. The statistics are reported below:

TABLE: SPEARMAN TEST RESULTS WITH P-VALUES
Footnote: the data was inspected for outliers and further testing still suggested a strong positive monotonic relationship

Having identified the key countries, we explained their involvement using economic and financial indicators.

GRAPH: COEFFICIENT FOR EACH COVARIATE
EXPLANATION BASED ON THE GRAPH. WHICH COVARIATES ARE RELEVANT?

We extracted the most important actors based on centrality measures:
Footnote: for the rest of the analysis, we focused on the giant component, that is, the largest connected component of the network.
TABLE DEGREE
TABLE PAGERANK
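Restricting the analysis to the giant component can be sketched as a breadth-first search over an adjacency dictionary `{node: set_of_neighbours}` (an assumed representation), keeping the largest component found. With NetworkX, the same step is `G.subgraph(max(nx.connected_components(G), key=len))`.

```python
from collections import deque


def connected_components(adj):
    """Yield the node set of each connected component (breadth-first search)."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    component.add(u)
                    queue.append(u)
        yield component


def giant_component(adj):
    """Subgraph induced by the largest connected component."""
    nodes = max(connected_components(adj), key=len)
    return {v: adj[v] & nodes for v in nodes}
```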

We observed that the degree distribution approximately follows a power law, characterised by a heavy tail. Intermediaries appear to be the cause of this distribution, as suggested by a small median-to-mean degree ratio.
GRAPH: POWER LAW
GRAPH: DEGREE DISTRIBUTION BY NODE TYPE
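A small median-to-mean ratio signals a right-skewed distribution: a few very high-degree nodes pull the mean far above the median, while a symmetric distribution gives a ratio close to 1. The sketch below computes the ratio per node type; the `degrees` and `node_type` mappings are hypothetical inputs (for example node identifiers mapped to degrees and to "entity", "officer" or "intermediary").

```python
from collections import defaultdict
from statistics import mean, median


def median_to_mean_by_type(degrees, node_type):
    """Median/mean degree ratio per node type; values far below 1 indicate a heavy right tail."""
    by_type = defaultdict(list)
    for v, d in degrees.items():
        by_type[node_type[v]].append(d)
    return {t: median(ds) / mean(ds) for t, ds in by_type.items() if mean(ds) > 0}
```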

Moreover, we concluded that the network is not a small world, since its clustering coefficient is zero. Its density, $d = 5.37 \times 10^{-6}$, further supports the picture of a highly dispersed network. This left us with community detection and investigation.

Every country we worked on displayed a community structure made of independent clusters of nodes, as indicated by a high modularity. We present the results we obtained on COUNTRY's ego graph.
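An ego graph is the subgraph induced by a centre node and every node within a given number of hops of it. How the COUNTRY ego graph was centred is not specified here, so the sketch below simply assumes a single centre node and a hypothetical `radius` parameter, on an adjacency dictionary `{node: set_of_neighbours}`. NetworkX offers the same operation as `nx.ego_graph(G, n, radius=...)`.

```python
from collections import deque


def ego_graph(adj, centre, radius=1):
    """Subgraph induced by all nodes within `radius` hops of `centre`."""
    dist = {centre: 0}
    queue = deque([centre])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue  # do not expand beyond the radius
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    nodes = set(dist)
    return {v: adj[v] & nodes for v in nodes}
```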

GRAPH AND INVESTIGATION FOR COUNTRY.

### Conclusions

In this paper, we identified the countries in which the key players were operating, most often tax havens themselves such as the British Virgin Islands or Switzerland. We saw that the distributions of entities, intermediaries and officers across countries were interdependent and that this distribution could be inferred from financial secrecy indexes. Moreover, we shed light on intermediaries as the main drivers of the network, a network that is mostly dispersed and organised into independent communities whose structure we investigated further.

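### References

- ICIJ Offshore Leaks Database: https://offshoreleaks.icij.org/
- pandas: https://pandas.pydata.org/
- NetworkX: https://networkx.github.io/
- Gephi: https://gephi.org/
- Neo4j: https://neo4j.com/
- Encyclopaedia Britannica, "Panama": https://www.britannica.com/place/Panama
- Page, Brin, Motwani and Winograd, "The PageRank Citation Ranking: Bringing Order to the Web", Stanford InfoLab technical report: http://ilpubs.stanford.edu:8090/422/
- Stanford CS224W course resources: http://web.stanford.edu/class/cs224w/#resources
- Blondel, Guillaume, Lambiotte and Lefebvre, "Fast unfolding of communities in large networks": https://arxiv.org/abs/0803.0476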
