# Analysis of the FTM dataset on EU elections 2019
Data source: https://www.ftm.nl/artikelen/eurosceptici-donaties

Data licence, according to the above page: https://creativecommons.org/licenses/by-sa/4.0/

Given the licence, the data files are available on GitHub, to make it easier to extend this analysis.

In [1]:
import pandas as pd
import ipysankeywidget as sw
import ipywidgets as ipw

all_data = pd.read_stata('data/EUdonations1418.dta')
countries = pd.read_csv('data/CountryCodesAndNames.csv')
#Add the country names to the table for easier manipulation
all_data = all_data.merge(countries, how='left', left_on='country', right_on='CountryID')
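
A left merge keeps every donation row, but rows whose country code has no match in the lookup table end up with an empty `CountryName`. The sketch below shows one way to spot such rows using the `indicator` option of `merge`. The two small tables are made-up stand-ins, not the real data files, and the code 99.0 is a hypothetical unmatched value.

```python
import pandas as pd

# Toy stand-ins for the real tables; all values are invented for illustration.
donations = pd.DataFrame({'country': [28.0, 33.0, 99.0],
                          'don': [595.86, 1914.5, 100.0]})
countries = pd.DataFrame({'CountryID': [28.0, 33.0],
                          'CountryName': ['UK', 'Georgia']})

# indicator=True adds a '_merge' column telling where each row came from.
merged = donations.merge(countries, how='left', left_on='country',
                         right_on='CountryID', indicator=True)

# Rows marked 'left_only' had no matching country code in the lookup table.
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['country', 'don']])
```

If the same check on `all_data` returns no rows, every donation has a country name and nothing is silently lost in the grouping steps further down.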

In [2]:
#Let's have a quick look at the data
all_data.head()

Unnamed: 0,party,partynr,thinktank,year,don,name,country,org,sec1,sec2,det,sum_pcoun,sum_coun,sum_org,sum_year,id,CountryID,CountryName
0,ACRE (ND),1,0,2014,595.86,Geoffrey Clifton-Brown,28.0,5.0,23.0,,Conservative MP UK,164493.875,378392.5,198809.359375,241829.015625,1.0,28.0,UK
1,ACRE (ND),1,0,2014,1914.5,George Rukhadze,33.0,3.0,27.0,,Vice-president ECPM / founder Georgian Strateg...,2514.5,28310.5,143904.734375,241829.015625,2.0,33.0,Georgia
2,ACRE (ND),1,1,2014,2000.0,Bendukidze Stichting,20.0,2.0,13.0,,Foundation of Olga Yuriyevna Novikova co-owner...,32100.0,426397.625,378524.0625,241829.015625,3.0,20.0,Netherlands
3,ACRE (ND),1,0,2014,4116.0,SIA Contex,16.0,2.0,37.0,24.0,Tabacco products / computer programming,19651.550781,61771.550781,378524.0625,241829.015625,4.0,16.0,Latvia
4,ACRE (ND),1,1,2014,12000.0,Manfred Kastner,1.0,1.0,18.0,7.0,CEO Austrian oilcompany + founder Microfinance...,74000.0,400097.8125,328750.375,241829.015625,5.0,1.0,Austria


# Source of the money by country
Here we sum up all the money coming from a specific country and see how much of it goes to which party.

In [3]:
#Let's prepare the data.
links = all_data.filter(items=['CountryName','party','don'])

#Group the data by country and by party, and sum up the groups
links = links.groupby(by=['CountryName','party']).sum()

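#Convert the dataframe to a list of dicts, the format that the SankeyWidget needs
sankeyList = []
#The index of each row is the (CountryName, party) pair created by the groupby
for (country, party), row in links.iterrows():
    d = {
        'source':country,
        'target':party,
        'value':row['don'],
    }
    sankeyList.append(d)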
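
As an alternative to the loop, the same list of `source` / `target` / `value` dicts can be built without iterating over rows. This is only a sketch on a made-up three-row table; the names `toy`, `totals` and `links_list` are illustrative and not part of the notebook.

```python
import pandas as pd

# Made-up donations, using the same column names as the real data.
toy = pd.DataFrame({
    'CountryName': ['UK', 'UK', 'Latvia'],
    'party': ['ACRE (ND)', 'ACRE (ND)', 'ACRE (ND)'],
    'don': [100.0, 50.0, 25.0],
})

# as_index=False keeps the grouping keys as ordinary columns.
totals = toy.groupby(['CountryName', 'party'], as_index=False)['don'].sum()

# Rename the columns to the keys the Sankey links use, then emit one dict per row.
links_list = (totals
              .rename(columns={'CountryName': 'source',
                               'party': 'target',
                               'don': 'value'})
              .to_dict('records'))
print(links_list)
```

The result has one dict per country and party pair, with the donations of that pair summed into `value`.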

In [4]:
layout = ipw.Layout(width="1000", height="2500")
sankey = sw.SankeyWidget(links=sankeyList, margins=dict(top=0, bottom=0, left=150, right=100), layout=layout)

# Source of the money by country
From which countries does the funding of EU parties come?

In [5]:
sankey

A Jupyter Widget

# Newly politicized money

Here we show only the money that comes from individuals and companies. Money that is already clearly political is removed by keeping only organisation classes 1 and 2.

Organisation classes:
* 1 = individual
* 2 = company
* 3 = NGO
* 4 = government(al organization)
* 5 = political party / ideological affiliation (e.g. thinktank)

In [6]:
#Keep only the rows where org is 1 (individual) or 2 (company)
countries_with_only_companies_and_individuals = all_data.query('org < 3').filter(items=['CountryName','party','don'])

#Group the data by country and by party, and sum up the groups
countries_with_only_companies_and_individuals = countries_with_only_companies_and_individuals.groupby(by=['CountryName','party']).sum()

#Convert the dataframe to a list of dicts, the format that the SankeyWidget needs
sankeyList = []
#The index of each row is the (CountryName, party) pair created by the groupby
for (country, party), row in countries_with_only_companies_and_individuals.iterrows():
    d = {
        'source':country,
        'target':party,
        'value':row['don'],
    }
    sankeyList.append(d)
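
One detail of the `org < 3` filter is worth keeping in mind: a comparison with a missing value is never true, so donations without an organisation class are dropped together with classes 3 to 5. The sketch below illustrates this on a made-up table; whether the real dataset contains rows with a missing `org` is not checked here.

```python
import numpy as np
import pandas as pd

# Made-up rows: two kept classes, one political class, one missing class.
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'org': [1.0, 2.0, 5.0, np.nan],
    'don': [10.0, 20.0, 30.0, 40.0],
})

# Same filter as in the notebook: keeps org 1 and 2 only.
kept = toy.query('org < 3')

# Rows without an organisation class, which the filter also excludes.
missing_org = toy[toy['org'].isna()]

print(kept['name'].tolist())
print(missing_org['name'].tolist())
```

Running `all_data['org'].isna().sum()` on the real table would show how many donations, if any, are affected.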

In [7]:
layout = ipw.Layout(width="1000", height="2500")
sankey = sw.SankeyWidget(links=sankeyList, margins=dict(top=0, bottom=0, left=150, right=100), layout=layout)

# Newly politicized money
From which countries does the money of companies and individuals come?

In [8]:
sankey

A Jupyter Widget