# Mapping Cantons to their Codes

This helper file scrapes the French Wikipedia page on Swiss cantons to build a dictionary mapping each canton name to its two-letter code. The codes are needed to match the ids used when drawing the choropleth maps of Switzerland.

The method is inspired by [this code](https://adesquared.wordpress.com/2013/06/16/using-python-beautifulsoup-to-scrape-a-wikipedia-table/), adapted to fit our needs.

In [1]:
#We start with our most important imports
import urllib3
import pickle as pkl
from bs4 import BeautifulSoup

In [2]:
#We set the important variables
http = urllib3.PoolManager()
urllib3.disable_warnings()
WIKI = "https://fr.wikipedia.org/wiki/Canton_(Suisse)"
header = {'User-Agent': 'Mozilla/5.0'} #Needed to prevent 403 error on Wikipedia

In [3]:
#We fetch the page and find the table to work with
r = http.request('GET', WIKI, headers = header)
soup = BeautifulSoup(r.data, "lxml")
table = soup.find("table", { "class" : "wikitable sortable" })

To locate our data, we inspected the page source to identify the elements we needed. In each canton row, the code is the first of two `th` cells and the canton name is the first of ten `td` cells. We then loop over the rows of the table and append these values to our lists.

In [4]:
code = []
canton = []

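for row in table.find_all("tr"):
    #Canton rows carry exactly 2 header cells; the first one holds the code
    code_cells = row.find_all("th")
    if len(code_cells) == 2:
        code.append(str(code_cells[0].find(string=True)))
    #Canton rows carry exactly 10 data cells; the first one holds the name
    canton_cells = row.find_all("td")
    if len(canton_cells) == 10:
        canton.append(str(canton_cells[0].find(string=True)))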
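
To make the row filtering easier to follow without a network call, here is a minimal sketch that applies the same idea to a tiny inline table. The HTML snippet, the `parse_rows` helper, and its `n_th` / `n_td` parameters are hypothetical and only mimic the shape of the Wikipedia table (two `th` cells per canton row, three `td` cells here instead of ten). It also uses the built-in `html.parser` backend rather than `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the table: the header row has only <th> cells,
# each canton row has 2 <th> cells followed by 3 <td> cells.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Code</th><th>Arms</th><th>Canton</th><th>Col A</th><th>Col B</th></tr>
  <tr><th>ZH</th><th>-</th><td>Zurich</td><td>...</td><td>...</td></tr>
  <tr><th>BE</th><th>-</th><td>Berne</td><td>...</td><td>...</td></tr>
</table>
"""


def parse_rows(html, n_th=2, n_td=3):
    """Return (codes, names) taken from rows with the expected cell counts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    codes, names = [], []
    for row in table.find_all("tr"):
        th_cells = row.find_all("th")
        if len(th_cells) == n_th:  # skips the header row (5 <th> cells)
            codes.append(th_cells[0].get_text(strip=True))
        td_cells = row.find_all("td")
        if len(td_cells) == n_td:  # skips rows without the expected data cells
            names.append(td_cells[0].get_text(strip=True))
    return codes, names


codes, names = parse_rows(SAMPLE_HTML)
print(codes, names)
```

Counting cells per row is a fragile filter: if the layout of the Wikipedia table changes, the counts 2 and 10 used in the cell above would have to be adjusted.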

As a sanity check, we display the first five entries of each list.

In [5]:
code[:5]

['ZH', 'BE', 'LU', 'UR', 'SZ']

In [6]:
canton[:5]

['Zurich', 'Berne', 'Lucerne', 'Uri', 'Schwytz']
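
Looking at the head of each list does not show whether the two lists stay aligned all the way down. As a complementary check, one could verify lengths and uniqueness before pairing them. The sketch below is only a suggestion: `check_aligned` is a hypothetical helper, and the sample values are the five entries displayed above.

```python
def check_aligned(codes, names):
    """Raise ValueError if the lists cannot be paired one-to-one."""
    if len(codes) != len(names):
        raise ValueError(
            f"{len(codes)} codes but {len(names)} canton names"
        )
    if len(set(codes)) != len(codes):
        raise ValueError("duplicate canton codes found")
    if any(not value.strip() for value in list(codes) + list(names)):
        raise ValueError("empty code or canton name found")
    return True


# The first five entries shown in the outputs above
sample_code = ['ZH', 'BE', 'LU', 'UR', 'SZ']
sample_canton = ['Zurich', 'Berne', 'Lucerne', 'Uri', 'Schwytz']
print(check_aligned(sample_code, sample_canton))
```

This matters because `zip` silently stops at the shorter list, so a missed row would otherwise go unnoticed when the dictionary is built.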

Two Swiss cantons, Schwytz and Saint-Gall, are still spelled differently in the data retrieved from "amstat". We therefore rename them manually to ease the computations in the main notebook.

In [7]:
canton = ['Schwyz' if c == 'Schwytz' else c for c in canton]
canton = ['St-Gall' if c == 'Saint-Gall' else c for c in canton]
canton[:5] #We can already see from here 'Schwytz' was changed to 'Schwyz'

['Zurich', 'Berne', 'Lucerne', 'Uri', 'Schwyz']
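
If more spellings ever needed harmonising, the two list comprehensions could be replaced by a single lookup table. This is a minimal sketch of that alternative, not the approach used in this notebook; `RENAMES` and `harmonise` are hypothetical names, and the two entries are the ones handled above.

```python
# Wikipedia spelling -> spelling used in the "amstat" data
RENAMES = {
    'Schwytz': 'Schwyz',
    'Saint-Gall': 'St-Gall',
}


def harmonise(names, renames=RENAMES):
    """Return a new list where known spellings are replaced, others kept."""
    return [renames.get(name, name) for name in names]


sample = ['Zurich', 'Schwytz', 'Saint-Gall']
print(harmonise(sample))
```

Keeping the renames in one dictionary makes each added spelling a one-line change and leaves the original list untouched.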

In [8]:
dico = dict(zip(canton, code))
#We save the dictionary; the Data/ directory must already exist
with open('Data/map_cantons.pkl', 'wb') as f:
    pkl.dump(dico, f)
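
To confirm that the saved file can be read back, for instance from the main notebook, a round trip like the following could be used. This is a sketch under assumptions: `save_mapping` and `load_mapping` are hypothetical helpers, the sample dictionary holds only the five pairs displayed earlier, and a temporary directory stands in for `Data/` so that the example does not touch real files.

```python
import os
import pickle
import tempfile


def save_mapping(mapping, path):
    """Pickle the canton -> code dictionary to the given path."""
    with open(path, 'wb') as f:
        pickle.dump(mapping, f)


def load_mapping(path):
    """Load the canton -> code dictionary back from the given path."""
    with open(path, 'rb') as f:
        return pickle.load(f)


sample = {'Zurich': 'ZH', 'Berne': 'BE', 'Lucerne': 'LU', 'Uri': 'UR', 'Schwyz': 'SZ'}

# A temporary directory replaces Data/ for the purpose of this example
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'map_cantons.pkl')
    save_mapping(sample, path)
    restored = load_mapping(path)

print(restored == sample)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.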