# Scraping of http://www.parlament.ch

First, we need to scrape some information from the website http://parlament.ch. This notebook collects that information and stores it in the folder *data*. If you have just cloned the repo and need the data, run this notebook to scrape all of it.

For the scraping, we use the library `requests`. The website exposes its data through an OData service, whose metadata can be browsed with XOData. We build the URLs with the help of XOData, fetch the XML with `requests`, and convert the XML into JSON-like Python dictionaries with the library `xmltodict`.

URL of the metadata: https://ws.parlament.ch/odata.svc/$metadata
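To make the XML-to-records step concrete, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` (already imported in this notebook) instead of `xmltodict`, and it parses a hand-written sample rather than a real response. The feed layout and the namespaces are assumptions based on the usual OData Atom format; `parse_feed` and `SAMPLE_FEED` are hypothetical names and are not part of `loader`.

```python
import xml.etree.ElementTree as ET

# Namespaces of the usual OData Atom format (assumed to match this service).
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}

# Hand-written sample shaped like an OData Atom feed; not a real response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:ID>1</d:ID>
        <d:Language>FR</d:Language>
        <d:LastName>Aguet</d:LastName>
        <d:DateOfDeath m:null="true" />
      </m:properties>
    </content>
  </entry>
</feed>"""


def parse_feed(xml_text):
    """Return one dict per <entry>, mapping property name to its text (or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for props in root.iterfind("atom:entry/atom:content/m:properties", NS):
        record = {}
        for child in props:
            # Tags look like "{namespace}Name"; keep only "Name".
            name = child.tag.split("}", 1)[-1]
            record[name] = child.text
        records.append(record)
    return records


records = parse_feed(SAMPLE_FEED)
print(records)
```

All values come back as strings (or `None` for empty elements), so type conversion would still have to happen afterwards, for example when building a DataFrame.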

In [20]:
# Import some useful libraries
%matplotlib inline
import pandas as pd
import urllib
import xml.etree.ElementTree as ET
import loader
import numpy as np
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Scrape


Tables: Party, Person, MemberCouncil

In [22]:
df_party = loader.party()
df_person = loader.person()
df_member_council = loader.member_council()



https://ws.parlament.ch/odata.svc/Party?$filter=Language%20eq%20'FR'
https://ws.parlament.ch/odata.svc/Person?$filter=Language%20eq%20'FR'%20and%20ID%20ge%200%20and%20ID%20lt%201000
https://ws.parlament.ch/odata.svc/Person?$filter=Language%20eq%20'FR'%20and%20ID%20ge%201000%20and%20ID%20lt%202000
https://ws.parlament.ch/odata.svc/Person?$filter=Language%20eq%20'FR'%20and%20ID%20ge%202000%20and%20ID%20lt%203000
https://ws.parlament.ch/odata.svc/Person?$filter=Language%20eq%20'FR'%20and%20ID%20ge%203000%20and%20ID%20lt%204000
https://ws.parlament.ch/odata.svc/Person?$filter=Language%20eq%20'FR'%20and%20ID%20ge%204000%20and%20ID%20lt%205000
https://ws.parlament.ch/odata.svc/Person?$filter=Language%20eq%20'FR'%20and%20ID%20ge%205000%20and%20ID%20lt%206000
https://ws.parlament.ch/odata.svc/MemberCouncil?$filter=Language%20eq%20'FR'%20and%20ID%20ge%200%20and%20ID%20lt%201000
https://ws.parlament.ch/odata.svc/MemberCouncil?$filter=Language%20eq%20'FR'%20and%20ID%20ge%201000%20and%20ID%20lt%20

We know that there are 4211 persons registered in the database, but the API only returns 1000 records at a time.
Let's scrape!
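The URLs printed by `loader` suggest that the 1000-record limit is handled by filtering on consecutive ID ranges. A minimal sketch of how such URLs could be generated is shown here; `batch_urls` is a hypothetical helper (the real `loader` code is not shown in this notebook), and the upper bound of 6000 is simply the largest ID seen in the printed URLs.

```python
from urllib.parse import quote

BASE_URL = "https://ws.parlament.ch/odata.svc/"


def batch_urls(table_name, max_id, step=1000, language="FR"):
    """Build one filtered URL per ID range [start, start + step)."""
    urls = []
    for start in range(0, max_id, step):
        odata_filter = (
            f"Language eq '{language}' and ID ge {start} and ID lt {start + step}"
        )
        # Percent-encode the spaces but keep the single quotes, as in the printed URLs.
        encoded = quote(odata_filter, safe="'")
        urls.append(BASE_URL + table_name + "?$filter=" + encoded)
    return urls


for url in batch_urls("Person", 6000):
    print(url)
```

Because IDs are not necessarily contiguous, a given range may hold fewer than 1000 records, so the number of requests depends on the largest ID rather than on the number of rows.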

In [49]:
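import urllib.request  # a bare `import urllib` does not guarantee that urllib.request is loaded

def count(table_name):
    """Return the number of French-language rows in the given table."""
    url = "https://ws.parlament.ch/odata.svc/" + table_name + "/$count?$filter=Language%20eq%20'FR'"
    with urllib.request.urlopen(url) as response:
        n = response.read()
    # The body is the count as plain text: decode the bytes, then convert
    return int(n.decode("utf-8").strip())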
    
    
print(count("Council"))
print(count("Person"))
print(count("MemberCouncil"))
a = count("MemberCouncil")

3
3525
3514
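With a row count in hand, an alternative to ID ranges would be to page with the standard OData options `$skip` and `$top`. The sketch below only builds the URLs; whether this service honours these options is an assumption that has not been checked here, and `page_urls` is a hypothetical helper.

```python
import math

BASE_URL = "https://ws.parlament.ch/odata.svc/"
PAGE_SIZE = 1000  # per-request limit mentioned in this notebook


def page_urls(table_name, total, page_size=PAGE_SIZE):
    """Build $skip/$top URLs that cover `total` rows in pages of `page_size`."""
    n_pages = math.ceil(total / page_size)
    return [
        f"{BASE_URL}{table_name}?$filter=Language%20eq%20'FR'"
        f"&$skip={page * page_size}&$top={page_size}"
        for page in range(n_pages)
    ]


# 3525 is the Person count printed above.
for url in page_urls("Person", 3525):
    print(url)
```

Paging by offset is only reliable if the server returns rows in a stable order, so in practice an explicit `$orderby` would probably be needed as well.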


In [19]:
person.head(10)

Unnamed: 0,DateOfBirth,DateOfDeath,FirstName,GenderAsString,ID,Language,LastName,MaritalStatus,MaritalStatusText,MilitaryRank,...,Modified,NativeLanguage,NumberOfChildren,OfficialName,PersonIdCode,PersonNumber,PlaceOfBirthCanton,PlaceOfBirthCity,Title,TitleText
0,1938-03-02T00:00:00,,Pierre,m,1,FR,Aguet,2.0,marié(e),5.0,...,2015-05-17T21:18:19.387,F,,Aguet,2200,1,Vaud,Pompaples,,
1,1928-02-22T00:00:00,,Heinz,m,2,FR,Allenspach,,,,...,2015-05-17T21:18:19.387,D,,Allenspach,2002,2,,,,
2,1931-01-27T00:00:00,,Manfred,m,6,FR,Aregger,2.0,marié(e),7.0,...,2015-05-17T21:18:19.387,D,5.0,Aregger,2004,6,Lucerne,Hasle,9.0,dipl. Bauing. HTL
3,1928-03-04T00:00:00,,Geneviève,f,7,FR,Aubry,,,,...,2015-05-17T21:18:19.387,F,,Aubry Geneviève,2005,7,,,,
4,1947-12-01T00:00:00,,Rosmarie,f,8,FR,Bär,,,,...,2015-05-17T21:18:19.387,D,,Bär,2008,8,,,,
5,1947-11-11T00:00:00,,Ruedi,m,9,FR,Baumann,2.0,marié(e),,...,2015-05-17T21:18:19.387,D,2.0,Baumann Ruedi,2268,9,Berne,Suberg,10.0,dipl. Ing. Agr. ETH
6,1942-09-03T00:00:00,,Peter,m,10,FR,Baumberger,2.0,marié(e),11.0,...,2015-05-17T21:18:19.387,D,2.0,Baumberger Peter,2269,10,Zurich,Winterthur,6.0,Dr. iur.
7,1938-05-12T00:00:00,,Ursula,f,11,FR,Bäumlin,2.0,marié(e),,...,2015-05-17T21:18:19.387,D,2.0,Bäumlin Ursula,2011,11,Berne,Berne,12.0,lic. phil. I
8,1953-03-26T00:00:00,,Christine,f,12,FR,Beerli,2.0,marié(e),,...,2015-05-17T21:18:19.387,D,,Beerli,2335,12,Berne,Bienne,115.0,lic. iur.
9,1947-12-02T00:00:00,,Thierry,m,13,FR,Béguin,2.0,marié(e),2.0,...,2015-05-17T21:18:19.387,F,4.0,Béguin Thierry,2202,13,Neuchâtel,La Chaux-de-Fonds,3.0,lic. en droit


In [13]:
member_council = loader.member_council()
member_council.head()

https://ws.parlament.ch/odata.svc/MemberCouncil?$filter=Language%20eq%20'FR'%20and%20ID%20ge%200%20and%20ID%20lt%201000
https://ws.parlament.ch/odata.svc/MemberCouncil?$filter=Language%20eq%20'FR'%20and%20ID%20ge%201000%20and%20ID%20lt%202000
https://ws.parlament.ch/odata.svc/MemberCouncil?$filter=Language%20eq%20'FR'%20and%20ID%20ge%202000%20and%20ID%20lt%203000
https://ws.parlament.ch/odata.svc/MemberCouncil?$filter=Language%20eq%20'FR'%20and%20ID%20ge%203000%20and%20ID%20lt%204000
https://ws.parlament.ch/odata.svc/MemberCouncil?$filter=Language%20eq%20'FR'%20and%20ID%20ge%204000%20and%20ID%20lt%205000
https://ws.parlament.ch/odata.svc/MemberCouncil?$filter=Language%20eq%20'FR'%20and%20ID%20ge%205000%20and%20ID%20lt%206000


Unnamed: 0,Active,AdditionalActivity,AdditionalMandate,BirthPlace_Canton,BirthPlace_City,Canton,CantonAbbreviation,CantonName,Citizenship,Council,...,ParlGroupAbbreviation,ParlGroupFunction,ParlGroupFunctionText,ParlGroupName,ParlGroupNumber,Party,PartyAbbreviation,PartyName,PersonIdCode,PersonNumber
0,False,Réducteur TSV de 1971 à 1983,"Prés. Féd. Romande des Sociolistes ch., Prés. ...",,Pompaples,22,VD,Vaud,"Sullens (VD),Lutry (VD)",1,...,,,,,,12.0,PSS,Parti socialiste suisse,2200,1
1,False,,,,,1,ZH,Zurich,"Kreuzlingen (TG),Fällanden (ZH)",1,...,,,,,,15.0,PLR,PLR.Les Libéraux-Radicaux,2002,2
2,False,Zentralpräsident Schweiz. Skliverband 1985 bis...,,,Hasle,3,LU,Lucerne,Hasle (LU),1,...,,,,,,15.0,PLR,PLR.Les Libéraux-Radicaux,2004,6
3,False,,,,,2,BE,Berne,Tavannes (BE),1,...,,,,,,,,,2005,7
4,False,,,,,2,BE,Berne,"Siselen (BE),Richterswil (ZH)",1,...,,,,,,,,,2008,8
