Obtaining Data
==============

All data is obtained from http://parltrack.euwiki.org, where complete [raw data dumps](http://parltrack.euwiki.org/dumps) of all the votes are compiled daily from [European Parliament data](http://www.europarl.europa.eu/tools/disclaimer/default_en.htm).  
More specifically, the data used in this project is extracted from:
 * http://parltrack.euwiki.org/dumps/ep_meps_current.json.xz : details about the MEPs (past and current)
 * http://parltrack.euwiki.org/dumps/ep_votes.json.xz : all the votes that happened over (at least) the last 5 years
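
Both dumps are xz-compressed JSON, so Python's standard library is enough to fetch and read them. The following is a minimal sketch, assuming each dump holds a single JSON document; the helper names `fetch_dump` and `load_xz_json` and the local file name are illustrative and not part of this project:

```python
import json
import lzma
import urllib.request


def fetch_dump(url, dest):
    """Download a dump to a local file and return the local path."""
    # urlretrieve streams the response to disk instead of holding it in memory.
    urllib.request.urlretrieve(url, dest)
    return dest


def load_xz_json(path):
    """Decompress an .xz file and parse its content as JSON."""
    # In text mode, lzma.open decompresses on the fly, so the uncompressed
    # dump never has to be written to disk.
    with lzma.open(path, mode="rt", encoding="utf-8") as xz_file:
        return json.load(xz_file)
```

A typical call would be `load_xz_json(fetch_dump("http://parltrack.euwiki.org/dumps/ep_meps_current.json.xz", "ep_meps_current.json.xz"))`.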

License
-------
According to the http://parltrack.euwiki.org website, the original data mentioned above is published under [ODbL v1.0](http://www.opendatacommons.org/licenses/odbl/).  
The derived data produced as part of this work is therefore also published under ODbL.

Pre-processing
--------------
The raw [MEP data](http://parltrack.euwiki.org/dumps/ep_meps_current.json.xz) is processed to extract a summary for each MEP. The details of how this is done can be found in [this notebook](ep_meps_extract.ipynb); the result is a set of records of the form:


In [11]:
import json

# read MEP details
with open('computed/meps_summary.json') as json_file:
    meps_details = json.load(json_file)

# print the first record
print(json.dumps(next(iter(meps_details.values())), indent=2))

{
  "picture": "http://www.europarl.europa.eu/mepphoto/1.jpg",
  "surname": "Georg",
  "mep_id": 1,
  "eu_homepage": "http://www.europarl.europa.eu/meps/en/1/_history.html",
  "active": false,
  "current_constituency": "Christlich Demokratische Union Deutschlands",
  "name": "JARZEMBOWSKI",
  "current_group": "PPE-DE",
  "gender": "M",
  "birthdate": -722995200000.0,
  "country": "Germany",
  "email": null
}
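
The `birthdate` field in the record above looks like a count of milliseconds since 1970-01-01 UTC, which is negative for anyone born before 1970. This reading is inferred from the sample value, not stated by the notebook. Under that assumption, a small sketch of the conversion (the helper `from_epoch_ms` is hypothetical):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(ms):
    """Convert milliseconds since 1970-01-01 UTC to a timezone-aware datetime."""
    # Adding a timedelta, rather than calling datetime.fromtimestamp, also
    # handles negative values (dates before 1970) on every platform.
    return EPOCH + timedelta(milliseconds=ms)


birthdate = from_epoch_ms(-722995200000.0)
print(birthdate.date().isoformat())  # 1947-02-03
```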



Likewise, the [raw votes data](http://parltrack.euwiki.org/dumps/ep_votes.json.xz) is processed as described in [this notebook](ep_votes_extract.ipynb).  
This results in two datasets:
 * the votes summary file, containing a summary of each vote, in the form:

In [34]:
# read vote details
with open('computed/votes_summary.json') as json_file:
    votes_details = json.load(json_file)

# print the first record
print(json.dumps(next(iter(votes_details.values())), indent=2))

{
  "voteid": "60387",
  "title": "Modification de l'ordre du jour",
  "issue_type": null,
  "url": "http://www.europarl.europa.eu/RegData/seance_pleniere/proces_verbal/2015/11-11/liste_presence/P8_PV(2015)11-11(RCV)_XC.xml",
  "report": null,
  "ts": 1447255116000.0
}
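
The `ts` field can be used to select the votes held on a given day. The sketch below assumes that `ts` is expressed in milliseconds since 1970-01-01 UTC and that the summary is a dictionary keyed by `voteid`; both points are guesses based on the sample record, and `votes_on` is a hypothetical helper:

```python
from datetime import date, datetime, timezone


def votes_on(votes, day):
    """Return the vote records whose timestamp falls on `day` (UTC)."""
    return [
        vote
        for vote in votes.values()
        # ts is assumed to be in milliseconds, hence the division by 1000
        if datetime.fromtimestamp(vote["ts"] / 1000, tz=timezone.utc).date() == day
    ]


# The sample record shown above, keyed by its voteid (assumed layout)
sample = {
    "60387": {
        "voteid": "60387",
        "title": "Modification de l'ordre du jour",
        "ts": 1447255116000.0,
    }
}

for vote in votes_on(sample, date(2015, 11, 11)):
    print(vote["voteid"], vote["title"])
```

The date obtained this way, 2015-11-11, matches the `2015/11-11` segment of the record's `url`, which supports the millisecond interpretation.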


 * the MEP votes file, containing a table with one row per MEP and one column per vote:

In [33]:
import pandas

# read the CSV of all votes (about 3000 MEPs x 17000 votes)
votes_data = pandas.read_csv('computed/meps_votes.csv')

# display a sample, replacing missing values with empty strings for readability
votes_data.iloc[55:60, 0:10].fillna('')

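| Unnamed: 0 | mep_id | group | country | active | 73468 | 98690 | 73396 | 103692 | 73394 | 101767 |
|---|---|---|---|---|---|---|---|---|---|---|
| 55 | 124759 | PPE | Lithuania | False | | | | | | |
| 56 | 124758 | ENF | France | True | 1.0 | 1.0 | -1.0 | 1.0 | 0.0 | 1.0 |
| 57 | 124757 | ENF | France | True | 1.0 | 1.0 | -1.0 | 1.0 | 0.0 | 1.0 |
| 58 | 124756 | Verts/ALE | Croatia | True | -1.0 | -1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 59 | 124755 | ENF | France | True | 1.0 | 1.0 | -1.0 | 1.0 | 0.0 | 1.0 |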
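
One way to use this table is to measure how often two MEPs vote alike. The sketch below uses only the standard library and a few rows copied from the sample above, with the unnamed index column left out. The meaning of the values is not documented here; a plausible reading is 1 for, -1 against, 0 abstention and empty for no participation, but the computation only relies on equal values meaning equal votes. The helper names `load_votes` and `agreement` are illustrative:

```python
import csv
import io

# Rows copied from the sample above; the real data lives in computed/meps_votes.csv
SAMPLE = """mep_id,group,country,active,73468,98690,73396,103692,73394,101767
124759,PPE,Lithuania,False,,,,,,
124758,ENF,France,True,1.0,1.0,-1.0,1.0,0.0,1.0
124757,ENF,France,True,1.0,1.0,-1.0,1.0,0.0,1.0
124756,Verts/ALE,Croatia,True,-1.0,-1.0,1.0,1.0,1.0,1.0
"""

# Columns describing the MEP rather than a vote; "" covers an unnamed
# index column, should the file contain one (assumption).
META_COLUMNS = {"", "mep_id", "group", "country", "active"}


def load_votes(csv_file):
    """Return {mep_id: {vote_id: value}}, keeping only the votes actually cast."""
    votes = {}
    for row in csv.DictReader(csv_file):
        votes[row["mep_id"]] = {
            column: float(value)
            for column, value in row.items()
            if column not in META_COLUMNS and value != ""
        }
    return votes


def agreement(votes, mep_a, mep_b):
    """Share of the votes both MEPs took part in where they voted the same way."""
    common = votes[mep_a].keys() & votes[mep_b].keys()
    if not common:
        return None  # no vote in common, so agreement is undefined
    same = sum(votes[mep_a][v] == votes[mep_b][v] for v in common)
    return same / len(common)


votes = load_votes(io.StringIO(SAMPLE))
print(agreement(votes, "124758", "124757"))  # 1.0
print(agreement(votes, "124758", "124756"))  # 0.3333333333333333
print(agreement(votes, "124758", "124759"))  # None
```

Restricting the comparison to votes both MEPs took part in keeps absences from being counted as disagreement.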
