# Choropleth Map

***Summary***
- [Load Data](#load_data)
- [Data Cleansing](#data-cleansing)
- [Display Result](#display-result)

A choropleth map is a thematic map that uses colors to visualize an aggregate summary of a geographic characteristic within spatial units. Choropleth maps provide an easy way to visualize how a variable varies across a geographic area or show the level of variability within a region.

In this Jupyter Notebook we will visualize the election turnout for the 2019 National Council elections (Nationalratswahlen) in each municipality of the canton of St. Gallen.
The geometric / geographic data is provided by [seantis gmbh](https://www.seantis.ch/). The TopoJSON file with all municipalities of the canton of St. Gallen can be downloaded [here](https://github.com/OneGov/onegov-cloud/tree/master/src/onegov/election_day/static/mapdata).
The 2019 election results can be downloaded [here](https://wab.sg.ch/election/erneuerungswahl-des-nationalrates-2/data).

As a first step, the TopoJSON file has to be converted to a GeoJSON file because plotly only works with GeoJSON data.
Next, the election data needs to be cleansed and merged with the geographic data.

Choropleth maps can be created with [GeoPandas](https://geopandas.org/en/stable/index.html) and with [plotly](https://plotly.com/), which is a more interactive form of visualization.

First work through the example and try to understand the code. Then you can create other choropleth maps, with different data or for other cantons.

In [None]:
!pip install geopandas

In [None]:
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px

from google.colab import files
import io

<a id='load_data'></a>
## I. Load Data
The geo data is provided as a JSON file and the election results as a CSV file.
The files are available on the script server or you can download them under the following links ([geo data](https://github.com/OneGov/onegov-cloud/tree/master/src/onegov/election_day/static/mapdata) and [election data](https://wab.sg.ch/election/erneuerungswahl-des-nationalrates-2/data)).

### Geo Data
First, let's load the geo data into a GeoDataFrame and get familiar with its content.
The GeoDataFrame consists of three columns: the municipality id, the municipality name, and a representation of the municipality's outline, called POLYGON or MULTIPOLYGON.
Based on this GeoDataFrame, we figure out the northernmost, southernmost, easternmost, and westernmost municipality of St. Gallen.

In [None]:
# Upload sg.json as soon as the `Choose Files` button appears
uploaded_gdf = files.upload()
gdf_l = gpd.GeoDataFrame.from_file(io.BytesIO(uploaded_gdf['sg.json']))

In [None]:
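# Find the extreme municipalities from the per-row bounding boxes.
# The raw map is upside down (y grows downward), so the smallest y is the northernmost point.
# idxmin/idxmax return index labels, which is what .loc expects.
north = gdf_l.loc[gdf_l.bounds.miny.idxmin(),'name']
south = gdf_l.loc[gdf_l.bounds.maxy.idxmax(),'name']
east = gdf_l.loc[gdf_l.bounds.maxx.idxmax(),'name']
west = gdf_l.loc[gdf_l.bounds.minx.idxmin(),'name']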

print(gdf_l.head(), end='\n\n')

print('North: {:s}, South: {:s}, East: {:s}, West: {:s}'.format(north, south, east, west))
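
To see why the northernmost municipality is the one with the smallest `miny`, here is a minimal sketch in plain Python. The names and bounding boxes are made up for illustration and the helper `extremes` is hypothetical; the only assumption is that the y-axis of the raw data points downward, which is what the upside-down plot further below suggests.

```python
# Hypothetical bounding boxes (minx, miny, maxx, maxy) in raw file coordinates,
# where y grows downward (screen-style coordinates).
boxes = {
    "A": (10.0, 0.0, 20.0, 8.0),    # touches the top edge -> northernmost
    "B": (0.0, 5.0, 12.0, 15.0),    # touches the left edge -> westernmost
    "C": (18.0, 12.0, 30.0, 25.0),  # touches the right and bottom edges
}

def extremes(boxes):
    """Return the names of the N, S, E, W extreme boxes for a y-down axis."""
    return {
        "north": min(boxes, key=lambda n: boxes[n][1]),  # smallest miny
        "south": max(boxes, key=lambda n: boxes[n][3]),  # largest maxy
        "east": max(boxes, key=lambda n: boxes[n][2]),   # largest maxx
        "west": min(boxes, key=lambda n: boxes[n][0]),   # smallest minx
    }

result = extremes(boxes)
print(result)
```

If the data used a conventional y-up axis instead, the `north` and `south` rules would have to be swapped.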

### Election Data
Next, we load the election data into a DataFrame and take a look at the column names and the DataFrame content.
Clearly, not all columns are needed to calculate the turnout. Moreover, some rows are not assigned to any municipality, and each municipality occurs in several rows. This calls for data cleansing.

In [None]:
# Upload election_data.csv as soon as the `Choose Files` button appears
uploaded_data = files.upload()
data = pd.read_csv(io.BytesIO(uploaded_data['election_data.csv']))

print(data.columns)

data.head()

<a id='data-cleansing'></a>
## II. Data Cleansing
Next, we need to extract the data we need for the choropleth map and merge both DataFrames into a single DataFrame.

### Geo Data
Using the `plot` method of a GeoDataFrame object, we can display the map that results from the POLYGON and MULTIPOLYGON shapes.
In doing so, it turns out that the map is upside down.

Furthermore, the GeoDataFrame only contains geometric shapes but no geo-reference (the vertices are not provided as longitude/latitude positions).
This is a problem for plotly choropleth maps, since plotly only works with geographic coordinates.
To fix this, we look up the coordinates of the northernmost, southernmost, easternmost, and westernmost points of St. Gallen and transform the geometric shapes accordingly.

In [None]:
# Flip map: mirror at the x-axis, then shift the lower-left corner to the origin
gdf_l.geometry = gdf_l.geometry.scale(1, -1, origin=(0,0))
gdf_l.geometry = gdf_l.geometry.translate(-gdf_l.total_bounds[0], -gdf_l.total_bounds[1])

_, ax = plt.subplots(1,1)
ax.set_title('Municipalities of St. Gallen (not georeferenced)')
gdf_l.plot('id', ax=ax)
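
The flip can be followed by hand on a single outline. The sketch below is plain Python with a made-up rectangle; the helper `flip_and_align` is hypothetical and only mimics the two steps of the cell above (mirror `y` to `-y`, then shift so that the smallest x and y become 0).

```python
def flip_and_align(points):
    """Mirror points at the x-axis (y -> -y), then shift so min x and min y are 0."""
    mirrored = [(x, -y) for x, y in points]
    min_x = min(x for x, _ in mirrored)
    min_y = min(y for _, y in mirrored)
    return [(x - min_x, y - min_y) for x, y in mirrored]

# Made-up outline in y-down coordinates: the first point is the top-left corner
outline = [(2.0, 1.0), (6.0, 1.0), (6.0, 4.0), (2.0, 4.0)]

flipped = flip_and_align(outline)
print(flipped)
```

The former top-left corner ends up with the largest y value, so it is drawn at the top once the y-axis points upward.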

In [None]:
# Approx. linear transformation to geographic coordinates (degrees).
# Latitude runs north-south, longitude runs east-west.
LAT_N = 47.531943
LAT_S = 46.872883
LON_E = 9.674830
LON_W = 8.795622

# After the flip the map starts at the origin, so total_bounds[2] and
# total_bounds[3] are its width and height.
x_scale = (LON_E - LON_W) / gdf_l.total_bounds[2]
y_scale = (LAT_N - LAT_S) / gdf_l.total_bounds[3]

gdf_g = gdf_l.copy()
gdf_g.geometry = gdf_l.geometry.scale(x_scale, y_scale, origin=(0,0))
gdf_g.geometry = gdf_g.geometry.translate(LON_W, LAT_S)
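
The transformation is a linear rescaling of each axis followed by a shift. As a worked sketch in plain Python: the corner coordinates are the approximate values used above, while the function `to_lon_lat` and the map extent of 1000 by 800 units are hypothetical and chosen only for illustration.

```python
# Approximate corner coordinates of St. Gallen in degrees (values from the notebook)
LAT_N, LAT_S = 47.531943, 46.872883   # northern / southern latitude
LON_E, LON_W = 9.674830, 8.795622     # eastern / western longitude

def to_lon_lat(x, y, width, height):
    """Map a local point (x, y) with 0 <= x <= width and 0 <= y <= height
    linearly onto the longitude/latitude rectangle defined above."""
    lon = LON_W + x * (LON_E - LON_W) / width
    lat = LAT_S + y * (LAT_N - LAT_S) / height
    return lon, lat

# Hypothetical extent of the flipped, origin-aligned map
width, height = 1000.0, 800.0

sw = to_lon_lat(0.0, 0.0, width, height)                   # lower-left corner
ne = to_lon_lat(width, height, width, height)              # upper-right corner
centre = to_lon_lat(width / 2, height / 2, width, height)  # midpoint of the map
print(sw, ne, centre)
```

This treats degrees as a flat grid and ignores the projection of the source data, which is probably acceptable for displaying a canton-sized area but should not be used for measuring distances or areas.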

In [None]:
# Sort rows by municipality names
gdf_g.sort_values('name', inplace=True)
gdf_g.reset_index(drop=True, inplace=True)

# Drop useless id column
gdf_g.drop(['id'], inplace=True, axis=1)
gdf_g.head()

### Election Data
Using the election DataFrame, we calculate the election turnout for each municipality in St. Gallen and append the result as a column to the GeoDataFrame.

In [None]:
# Extract only relevant columns
data_1 = data.loc[:,['entity_name','entity_eligible_voters','entity_received_ballots']]

# Drop all rows which contain a NaN
data_1.dropna(inplace=True)

# Drop duplicate rows
data_1.drop_duplicates(inplace=True)

# Sort rows by municipality names
data_1.sort_values('entity_name', inplace=True)
data_1.reset_index(drop=True, inplace=True)

# Calculate election turnout and add new column
data_1['entity_turnout'] = data_1['entity_received_ballots'] / data_1['entity_eligible_voters'] * 100

data_1.head()
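
To see why these steps leave one row per municipality, the following sketch runs the same cleansing on a tiny made-up table. The municipality names, the numbers, and the `list_name` column are hypothetical; only the three `entity_*` column names are taken from the election data. The two assertions at the end are sanity checks that may also be worth running on the real data.

```python
import pandas as pd

# Made-up rows: one row per (municipality, list), so entity-level values repeat
raw = pd.DataFrame({
    "entity_name": ["Municipality A", "Municipality A",
                    "Municipality B", "Municipality B", None],
    "list_name": ["X", "Y", "X", "Y", "X"],  # hypothetical extra column
    "entity_eligible_voters": [1000, 1000, 400, 400, 50],
    "entity_received_ballots": [450, 450, 100, 100, 20],
})

cols = ["entity_name", "entity_eligible_voters", "entity_received_ballots"]
clean = (
    raw[cols]
    .dropna()            # removes the row without a municipality
    .drop_duplicates()   # collapses the repeated entity-level rows
    .sort_values("entity_name")
    .reset_index(drop=True)
)
clean["entity_turnout"] = (
    clean["entity_received_ballots"] / clean["entity_eligible_voters"] * 100
)

# Sanity checks: one row per municipality, turnout is a valid percentage
assert clean["entity_name"].is_unique
assert clean["entity_turnout"].between(0, 100).all()
print(clean)
```

If the uniqueness check fails on real data, some municipality has rows with differing voter or ballot counts, and `drop_duplicates` alone is not enough.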

In [None]:
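# Add turnout column to GeoDataFrame.
# Note: the assignment aligns on the row index, so it relies on both frames
# being sorted by municipality name and containing the same municipalities.
gdf_g['turnout'] = data_1['entity_turnout']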
gdf_g.set_index('name', inplace=True)
gdf_g.head()
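
Assigning the column row by row works only as long as both tables list exactly the same municipalities in the same order. A more defensive alternative, sketched here with made-up stand-in tables rather than the real data, is to join on the municipality name and let pandas report mismatches. The names and values below are hypothetical.

```python
import pandas as pd

# Made-up stand-ins for the GeoDataFrame and the turnout table
geo = pd.DataFrame({
    "name": ["Alpha", "Beta", "Gamma"],
    "geometry": ["poly_a", "poly_b", "poly_c"],  # placeholders for real shapes
})
turnout = pd.DataFrame({
    "entity_name": ["Alpha", "Gamma", "Delta"],
    "entity_turnout": [41.5, 38.0, 52.3],
})

merged = geo.merge(
    turnout,
    how="left",              # keep every municipality that has a shape
    left_on="name",
    right_on="entity_name",
    validate="one_to_one",   # raises if a name occurs more than once on either side
    indicator=True,          # adds a `_merge` column: 'both' or 'left_only'
)

# Municipalities with a shape but without election data
unmatched = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print(merged[["name", "entity_turnout", "_merge"]])
print("Municipalities without turnout:", unmatched)
```

With the real data, calling `merge` on the GeoDataFrame itself (rather than on the election DataFrame) should keep the result a GeoDataFrame; spelling differences between the two sources would then show up in `unmatched` instead of silently shifting values.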

<a id='display-result'></a>
## III. Display Result
Now that the data is prepared, we create an interactive plot using plotly.
Try to work out for yourself how to adapt and improve this plotly choropleth map (see [here](https://plotly.com/python/choropleth-maps/) and [here](https://plotly.github.io/plotly.py-docs/generated/plotly.express.choropleth.html)).

In [None]:
fig = px.choropleth(gdf_g,
                    geojson=gdf_g.geometry,
                    locations=gdf_g.index,
                    color='turnout',
                    hover_name=gdf_g.index,
                    hover_data=['turnout'],
                    color_continuous_scale='greens',
                    # range_color=(30, 60),
                    projection='mercator',
                    title='Election Turnout St. Gallen')
fig.update_geos(fitbounds='locations', visible=False)
fig.update_layout(margin={"r":0,"t":50,"l":0,"b":0})
fig.show()
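
If some municipalities stay blank, a likely cause is that a value in `locations` has no matching feature in the GeoJSON. By default plotly compares `locations` with the `id` of each feature (another key can be selected with `featureidkey`). The sketch below illustrates that matching in plain Python on a hand-written GeoJSON with two made-up squares; the helper `missing_locations` is hypothetical and not part of plotly.

```python
# Hand-written GeoJSON with two made-up square "municipalities"
geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "Alpha",
            "properties": {},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[9.0, 47.0], [9.1, 47.0], [9.1, 47.1],
                                 [9.0, 47.1], [9.0, 47.0]]],
            },
        },
        {
            "type": "Feature",
            "id": "Beta",
            "properties": {},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[9.1, 47.0], [9.2, 47.0], [9.2, 47.1],
                                 [9.1, 47.1], [9.1, 47.0]]],
            },
        },
    ],
}

def missing_locations(locations, geojson):
    """Return the locations that have no feature with a matching `id`."""
    feature_ids = {str(feature["id"]) for feature in geojson["features"]}
    return [loc for loc in locations if str(loc) not in feature_ids]

locations = ["Alpha", "Beta", "Gamma"]
missing = missing_locations(locations, geojson)
print(missing)
```

In the notebook the feature ids come from the GeoDataFrame index, which is why the municipality name is set as the index before plotting.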