# Key Insights:

Despite the enormous number of records in the data, relatively few named individuals are the actual beneficiaries of offshore accounts. Just 13,800 unique names appear as beneficiaries of entities (out of 2160,000 named officers), of which I estimate 10-20% are themselves shell companies.

Although China has the most beneficiaries in absolute terms, the highest rates relative to population (among countries with at least 10 million residents) are in Peru, Ecuador, the UK, Guatemala, Argentina, the Dominican Republic, and Russia.

Relative to their populations, most wealthy countries have few beneficiaries of offshore accounts. Britain is a striking exception. Most of the named beneficiaries appear to be real people, and a large percentage have Anglophone names, so neither shell companies nor notional residency can fully explain this discrepancy. It's not clear why Brits make such frequent use of offshore accounts. It may be that the particular companies whose information was leaked tend to serve more Brits (and Peruvians, etc.). Or it may be that the UK has an unusual tax evasion problem.

import pandas as pd
import numpy as np
import os
import plotly.offline
plotly.offline.init_notebook_mode()  # render plots inline in the notebook

In [7]:
path = 'offshore_leaks_csvs-20160621/'
files = os.listdir(path)

In [87]:
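tables = {}
for name in files:
    if name.endswith('.csv'):
        # Key each table by its file name without the .csv extension
        tables[name[:-4]] = pd.read_csv(os.path.join(path, name), index_col=False)
# Expose each table (e.g. all_edges, Officers) as a notebook-level variable
globals().update(tables)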

In [88]:
for key, v in tables.items():
    # all_edges is the edge list; only the node tables are indexed by node_id
    if key != 'all_edges':
        v.set_index('node_id', inplace=True)

In [91]:
benef_edges = all_edges[all_edges.rel_type.str.contains('benef')]

In [244]:
print("Number of unique beneficiaries in the data: {}".format(len(benef_edges.node_1.unique())))

Number of unique beneficiaries in the data: 13112


In [249]:
beneficiaries = Officers.loc[benef_edges.node_1.unique()]
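
The Key Insights estimate that 10-20% of named beneficiaries are themselves shell companies. The notebook does not show how that figure was reached; one rough way to approximate it is to flag names that end in a corporate suffix. The sketch below is an illustration only: the suffix list and the `looks_like_company` helper are assumptions, not part of the original analysis, and the sample names are made up.

```python
import re

# Hypothetical list of corporate suffixes; extend it after inspecting the data.
COMPANY_SUFFIXES = ("ltd", "limited", "inc", "corp", "corporation",
                    "s.a.", "sa", "llc", "gmbh", "trust", "foundation")

# Match a suffix as the final token of the name, ignoring case and a trailing dot.
_SUFFIX_RE = re.compile(
    r"(?:^|[\s,])(?:" + "|".join(re.escape(s) for s in COMPANY_SUFFIXES) + r")\.?\s*$",
    re.IGNORECASE,
)

def looks_like_company(name):
    """Return True if `name` ends in one of the assumed corporate suffixes."""
    if not isinstance(name, str):
        return False  # missing names (e.g. NaN) are not counted as companies
    return bool(_SUFFIX_RE.search(name))

# Made-up names, used only to exercise the heuristic.
sample = ["John Smith", "Acme Holdings Ltd.", "Maria Lopez", "Blue Sky Trading S.A."]
share = sum(looks_like_company(n) for n in sample) / float(len(sample))
```

On the notebook's data this could be applied as `beneficiaries['name'].map(looks_like_company).mean()`. A suffix match will miss companies with unusual names and misclassify some people, so the result is a rough indication rather than a measurement.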

In [246]:
# Number of named beneficiaries per country_codes value
country_beneficiaries = beneficiaries.groupby('country_codes')['name'].count()
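
One caveat of grouping on the raw `country_codes` string: if an officer is linked to more than one country and the field stores those codes as a single delimited string (assumed here to be semicolon-separated, e.g. `GBR;VGB`; this should be checked against the CSVs), each combination becomes its own group and will not match a three-letter code in the later population merge. A minimal sketch of counting each listed country separately, using plain Python and made-up rows (the `count_by_country` helper is hypothetical):

```python
from collections import Counter

def count_by_country(code_strings, sep=";"):
    """Count beneficiaries per country, crediting every code listed in a row."""
    counts = Counter()
    for codes in code_strings:
        if not isinstance(codes, str):
            continue  # skip missing values
        # A set avoids double-counting a code repeated within one row
        for code in set(c.strip() for c in codes.split(sep) if c.strip()):
            counts[code] += 1
    return counts

# Made-up rows: single codes, multi-country rows, and a missing value.
rows = ["GBR", "GBR;VGB", "PER", None, "VGB;GBR;GBR"]
counts = count_by_country(rows)
```

Crediting every listed country counts a multi-country beneficiary more than once, so the per-country totals can sum to more than the number of beneficiaries.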

In [202]:
population = pd.read_csv('population.csv')
population = population[population.Year == 2014].set_index('Country Code')

df = pd.DataFrame(country_beneficiaries)
# A right merge keeps every country in the population table
df = df.merge(population, how='right', left_index=True, right_index=True)
# Beneficiaries per million residents
df['normalized'] = df['name'] / df['Value'] * 10**6
df['normalized'] = df['normalized'].fillna(0)
# Zero out countries with fewer than 10 million residents
df.loc[df['Value'] < 10**7, 'normalized'] = 0
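
To make the normalization concrete, the same rule can be written as a small standalone function. The name `per_million` and the figures below are hypothetical, chosen only to show the arithmetic and the 10-million-resident cutoff; they are not taken from the data.

```python
def per_million(beneficiaries, population, min_population=10**7):
    """Beneficiaries per million residents; 0 for missing counts or small countries."""
    if beneficiaries is None or population < min_population:
        return 0.0
    return beneficiaries * 10**6 / float(population)

large = per_million(300, 30 * 10**6)     # 300 beneficiaries among 30 million residents
small = per_million(50, 2 * 10**6)       # below the cutoff, so zeroed out
missing = per_million(None, 50 * 10**6)  # no beneficiaries recorded for this country
```

Here `large` works out to 10 beneficiaries per million, while `small` and `missing` are both zero, mirroring the `fillna(0)` step and the population filter in the cell above.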

In [243]:
print('Most Offshore Beneficiaries:')
print(df['normalized'][df['normalized'] > 5])

Most Offshore Beneficiaries:
Country Code
ARG    7.750617
DOM    6.078477
ECU    9.197530
GTM    7.818552
PER    9.360047
RUS    5.986668
GBR    8.448253
Name: normalized, dtype: float64


In [188]:
import plotly.offline as ply

In [241]:

data = [ dict(
        type = 'choropleth',
        locations = df.index.values,
        z = df['normalized'],
        text = df.index.values,
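        # Colorscale positions must lie in [0, 1]: the intended breakpoints
        # (0, 0.5, 1, 1.5, 3, 5 per million) are divided by zmax = 5, so
        # countries above 5 per million share the darkest color.
        zmin = 0,
        zmax = 5,
        colorscale = [[0,"rgb(5, 10, 172)"],[0.1,"rgb(40, 60, 190)"],[0.2,"rgb(70, 100, 245)"],
            [0.3,"rgb(90, 120, 245)"],[0.6,"rgb(106, 137, 247)"],[1.0,"rgb(220, 220, 220)"]],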
        autocolorscale = False,
        reversescale = True,
        marker = dict(
            line = dict (
                color = 'rgb(0,0,0)',
                width = 0.5
            ) ),
        colorbar = dict(
            autotick = False,
            tickprefix = '',
            title = 'Number of Beneficiaries per Million Residents'),
      ) ]
layout = dict(
    title = 'Panama Papers Offshore Beneficiaries',
    geo = dict(
        showframe = False,
        showcoastlines = False,
        projection = dict(
            type = 'mercator'
        )
    )
)

fig = dict( data=data, layout=layout )
ply.iplot( fig, validate=False, filename='d3-world-map' )