# STA 141B Data & Web Technologies for Data Analysis

### Lecture 19, 12/5/24, Interactive Visualization: Cartography

### Announcements 

- 

### Today's topics
- Cartography
    - Choropleth maps

Choropleth maps are similar to heat maps, except that the units of display are (usually) political entities. They were first introduced in France in the 19th century to color _départements_, administrative units of roughly equal size. 

<div>
    <center>
<img src="https://upload.wikimedia.org/wikipedia/commons/3/38/Carte_figurative_de_l%27instruction_populaire_de_la_France.jpg" width="1000"/>
</center>
    </div>

The preceding map of the proportion of the literate population is a textbook example of a choropleth map for unclassed data: the gradient ranges from low to high. 

Classed maps color political entities by categorical features. The following example shows the party of the winner in each constituency in the 2019 United Kingdom general election. 

<div>
    <center>
<img src="https://upload.wikimedia.org/wikipedia/commons/e/e2/2019UKElectionMap.svg" width="1000"/>
</center>
</div>

This map is misleading: each constituency corresponds to exactly one seat, yet the larger (rural) constituencies take up most of the area and thereby overstate the success of the Conservative Party (blue). 

<div>
    <center>
<img src="https://miro.medium.com/v2/resize:fit:1400/1*hfA55y_xlYTs5v3k-_AxCA.png" width="1000"/>
</center>
</div>

This is an example of preferring regular shapes over accurate constituency boundaries. All constituencies are drawn at the same size, since each corresponds to one seat in parliament. Such a map conveys the election result more truthfully than one in which constituencies scale with area. 

Another issue with categorical data is that we cannot tell in which areas each party did particularly well: we only know which party did better than all the others. Next, we will create choropleth maps with one gradient layer per party on the geographical constituencies to explore where certain parties are particularly strong (and weak). 

We can scrape the election results from Wikipedia. Some data processing is in order. 

In [None]:
import pandas as pd

In [None]:
elections = pd.read_html('https://en.wikipedia.org/wiki/Results_of_the_2019_United_Kingdom_general_election') 

In [None]:
# England 
england = elections[0].iloc[1:534,]
england.columns = [i[1] for i in england.columns.to_flat_index()]
england = england.rename(columns = {'Lab[b][c]': 'Lab'})
england = england[['Constituency', 'Con', 'Lab', 'LD', 'Grn', 'Total']]

In [None]:
# Scotland 
scotland = elections[2].iloc[1:60,]
scotland.columns = [i[1] for i in scotland.columns.to_flat_index()]
scotland = scotland.rename(columns = {'Lab[b]': 'Lab'})
scotland = scotland[['Constituency', 'Con', 'Lab', 'LD', 'Grn', 'Total']]

In [None]:
# Wales 
wales = elections[3].iloc[1:41,]
wales.columns = [i[1] for i in wales.columns.to_flat_index()]
wales = wales.rename(columns = {'Lab[b]': 'Lab'})
wales = wales[['Constituency', 'Con', 'Lab', 'LD', 'Grn', 'Total']]

In [None]:
election = pd.concat([england, scotland, wales]).set_index('Constituency').fillna(0)
election.head()

In [None]:
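# Convert vote counts to vote shares. The totals are stored separately so that
# the result does not depend on 'Total' being the last column.
total = election['Total'].astype(int)
election = election.drop('Total', axis = 1).astype(int).div(total, axis = 0)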

In [None]:
election.head()

Some constituency names contain non-ASCII characters (e.g. Ynys Môn). They will not be matched correctly unless we standardize them. 

In [None]:
election.index

In [None]:
import re
from unidecode import unidecode

In [None]:
standardize = lambda x: unidecode(re.sub(',', '', x))
election.index = [standardize(i) for i in election.index]

In [None]:
election.index

In [None]:
election.index[508] # given as Weston-Super-Mare in boundaries! 

In [None]:
election = election.rename(index = {'Weston-super-Mare': 'Weston-Super-Mare'})

Any remaining mismatches between the data and the GeoJSON file that contains the polygons will have to be dealt with later.  
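One way to find the remaining mismatches is to standardize both sets of names and compare them. The following is a minimal sketch under stated assumptions: the two name lists are hypothetical stand-ins for `election.index` and the `PCON13NM` values, and the standard library's `unicodedata` is used in place of `unidecode` to strip accents.

```python
import re
import unicodedata

def standardize(name):
    """Strip commas and accents so that names compare equal across sources."""
    name = re.sub(',', '', name)
    # NFKD splits an accented letter into base letter + combining mark;
    # encoding to ASCII with errors='ignore' then drops the marks.
    decomposed = unicodedata.normalize('NFKD', name)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

# Hypothetical name lists (not the real data)
data_names = ['Ynys Môn', 'Birmingham, Edgbaston', 'Weston-super-Mare']
geo_names = ['Ynys Mon', 'Birmingham, Edgbaston', 'Weston-Super-Mare']

data_set = {standardize(n) for n in data_names}
geo_set = {standardize(n) for n in geo_names}

# Names that appear in only one source still need a manual fix
only_in_data = sorted(data_set - geo_set)
only_in_geo = sorted(geo_set - data_set)
print(only_in_data, only_in_geo)
```

Differences in capitalization, such as the Weston-super-Mare case, survive this standardization and show up in both difference sets.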

We want to color the map according to how well each party did in the constituency. 

In [None]:
election = dict(election)

In [None]:
election['Con']['Aldershot'] 

Let's assign each party a color. `branca.colormap.LinearColormap` creates a linear interpolation between two colors. 

In [None]:
import branca.colormap as cmp

In [None]:
colors = {party: cmp.LinearColormap(['white', color], vmin=0, vmax=max(election[party])) \
          for party, color in zip(election.keys(), ['#3a85d6', '#ed4224', '#e8ca54', '#6cbd6c'])}

In [None]:
colors['Grn']

In [None]:
colors['Lab'](election['Lab']['Aldershot'])
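To make the idea of a linear colormap concrete, here is a rough sketch of interpolating between white and a party color in plain Python. It is not `branca`'s implementation: the helper names `hex_to_rgb` and `linear_color` are made up for illustration, clamping to the value range is my own choice, and `branca`'s output format may differ.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to a tuple of three integers in 0..255."""
    color = color.lstrip('#')
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

def linear_color(value, vmin, vmax, low='#ffffff', high='#ed4224'):
    """Interpolate linearly between `low` (at vmin) and `high` (at vmax)."""
    # Clamp the value to [vmin, vmax], then rescale it to t in [0, 1]
    t = (min(max(value, vmin), vmax) - vmin) / (vmax - vmin)
    start, end = hex_to_rgb(low), hex_to_rgb(high)
    rgb = [round(s + t * (e - s)) for s, e in zip(start, end)]
    return '#{:02x}{:02x}{:02x}'.format(*rgb)

# A vote share of 0 maps to white, the maximum share to the full party color
print(linear_color(0.0, 0.0, 0.8))
print(linear_color(0.8, 0.0, 0.8))
```

Because `vmax` is set per party, the darkest shade means "this party's best result", not the same vote share across layers.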

The custom coloring function `get_color` takes the constituency name from the GeoJSON, removes commas (to deal with another mismatch: 'Birmingham, Edgbaston' vs. 'Birmingham Edgbaston') and, if data is available for that polygon, colors it according to the vote share.  

In [None]:
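def get_color(feature, party):
    # Constituency name as stored in the GeoJSON, with commas removed
    name = re.sub(',', '', feature['properties']['PCON13NM'])
    share = election[party].get(name)
    if share is None:
        # No election data for this polygon: fall back to a neutral gray
        return '#cccccc'
    return colors[party](share)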

The geographical information on the constituencies is available online as GeoJSON. 

In [None]:
import requests
boundaries = requests.get('https://github.com/martinjc/UK-GeoJSON/blob/master/json/electoral/gb/wpc.json?raw=true').json()

In [None]:
boundaries['features'][0]#['properties']

In [None]:
boundaries['features'][0]['properties']['PCON13NM']

Let's create a map. We set `tiles` to `None` to remove the standard OpenStreetMap tiles. Instead, let's use the world terrain map as background. 

In [None]:
import folium 
m = folium.Map(location=[52, 0.0], zoom_start=7, 
               width=1200, height=1000, 
               tiles = None)
folium.TileLayer(
    tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Terrain_Base/MapServer/tile/{z}/{y}/{x}',
    attr='Esri',
    name='Esri Satellite', overlay=True, control=False
).add_to(m)

In [None]:
m.save("map.html")

In [None]:
!open ./map.html

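Unfortunately, creating the layers in a loop does not work as expected with `folium`, most likely because a `lambda` defined in a loop only sees the final value of the loop variable when it is eventually called. We therefore add each layer explicitly. Note that we pass `get_color` to the `style_function` argument. The additional parameters govern the boundaries and opacity, and `overlay=False` ensures that each layer is given a radio button, not a checkbox. 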

In [None]:
folium.GeoJson(
        boundaries,
        name=str('Con'),
        style_function=lambda feature: {
            "fillColor": get_color(feature, 'Con'),
            "color": "gray",
            "weight": 1,
            "dashArray": "1",
            "fillOpacity": 1,
        }, overlay=False, 
    ).add_to(m)
folium.GeoJson(
        boundaries,
        name=str('Lab'),
        style_function=lambda feature: {
            "fillColor": get_color(feature, 'Lab'),
            "color": "gray",
            "weight": 1,
            "dashArray": "1",
            "fillOpacity": 1,
        }, overlay=False,
    ).add_to(m)

In [None]:
folium.GeoJson(
        boundaries,
        name=str('LD'),
        style_function=lambda feature: {
            "fillColor": get_color(feature, 'LD'),
            "color": "gray",
            "weight": 1,
            "dashArray": "1",
            "fillOpacity": 1,
        }, overlay=False, 
    ).add_to(m)
folium.GeoJson(
        boundaries,
        name=str('Grn'),
        style_function=lambda feature: {
            "fillColor": get_color(feature, 'Grn'),
            "color": "gray",
            "weight": 1,
            "dashArray": "1",
            "fillOpacity": 1,
        }, overlay=False,  
    ).add_to(m)
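The four near-identical calls above suggest a loop. One plausible reason a loop misbehaves is Python's late binding of closures: a `lambda` looks up the loop variable when it is called, not when it is defined. The sketch below shows the effect and the usual workaround (binding the value as a default argument) with plain functions and no `folium`. Whether this resolves the `folium` case is an assumption, not something verified here.

```python
parties = ['Con', 'Lab', 'LD', 'Grn']

# Late binding: each lambda looks up `party` only when it is called,
# which happens after the loop has finished.
late = []
for party in parties:
    late.append(lambda feature: party)

# Workaround: a default argument is evaluated when the lambda is defined,
# so each function keeps its own party name.
bound = []
for party in parties:
    bound.append(lambda feature, party=party: party)

print([f(None) for f in late])
print([f(None) for f in bound])
```

With the second pattern, a style function of the form `lambda feature, party=party: {...}` can still be called with a single argument.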

In [None]:
folium.LayerControl(collapsed=False).add_to(m)
m

In [None]:
m.save("map.html")

In [None]:
!open ./map.html

Even though this map does not use regular shapes to map each constituency, we learn, e.g., that the Tories do better in rural areas, while Labour underperforms there. With notable exceptions, the LibDems are stronger in the rural south. 

While gradual color schemes are most appropriate for choropleth maps, they can only show a single feature at a time. 

Another problem with choropleth maps is that, by filling large blocks uniformly, they do not accurately depict how the data is distributed over geographic space. 

Dasymetric maps address this issue. They use auxiliary information to portray the data more accurately: geographical objects are intersected to filter out areas that do not contribute to the data. 

<div>
    <center>
<img src="https://upload.wikimedia.org/wikipedia/commons/7/7e/Utah_Valley_dasymetric_map.png" width="1000"/>
</center>
</div>
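A small worked example may help to see why masking out irrelevant area matters. All numbers below are made up for illustration: a choropleth spreads the population over the whole administrative unit, whereas a dasymetric map assigns it only to the part that remains after uninhabited land has been removed.

```python
# Hypothetical district (made-up numbers)
population = 10_000
total_area_km2 = 100.0      # whole administrative unit
inhabited_area_km2 = 20.0   # what remains after masking out lakes, mountains, etc.

# Choropleth: density averaged over the entire unit
choropleth_density = population / total_area_km2

# Dasymetric: the same population, restricted to the inhabited area
dasymetric_density = population / inhabited_area_km2

print(choropleth_density, dasymetric_density)
```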

Dot maps are another popular map format. Consider the following map from the 1931 Polish census. 

<div>
    <center>
<img src="https://upload.wikimedia.org/wikipedia/commons/2/25/GUS_languages1931_Poland.jpg" width="1000" />
        </center>
</div>

Let's give this map a modern touch! We will draw on [Paul Dziemiela's](https://dziemiela.com/personal/interwar_poland.html) geographical boundaries and census results. 

In [None]:
import requests
r = requests.get('https://www.dziemiela.com/personal/Interwar_Poland_1934_20142.json', headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
})
topoJSON = r.json() # this is in topoJSON format!

In [None]:
topoJSON['objects']['Palatinates']['geometries']

In [None]:
import folium
m = folium.Map(width=1300, height=800, tiles = None,
               location=[53, 23], zoom_start=5)
folium.TileLayer(
    tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Terrain_Base/MapServer/tile/{z}/{y}/{x}',
    attr='Esri',
    name='Esri Satellite'
).add_to(m)

In [None]:
#topoJSON['objects']['Districts']['geometries']#[0]['properties']['GEOID']

In [None]:
folium.TopoJson(topoJSON,
    object_path='objects.Districts', 
    style_function=lambda feature: {
        "fillColor": None,
        "fillOpacity": 0.0,
        "color": "lightgray",
        "weight": 1,
        "dashArray": "1",
    }, overlay=True, control=False).add_to(m)

In [None]:
folium.LayerControl().add_to(m)
m

Let's retrieve the census data from the same source.

In [None]:
import requests, zipfile, io

r = requests.get('https://www.dziemiela.com/personal/Interwar_Poland_1934_20142.zip', headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
})
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("../data/polish_census")

`fiona` is a module for reading geospatial file formats such as GeoPackage. The file contains data for the 1921 and 1931 censuses and a school census of 1926. We are only interested in the 1931 census. 

In [None]:
import fiona
fiona.listlayers('../data/polish_census/Interwar_Poland_1934.gpkg')

In [None]:
import geopandas
districts = geopandas.read_file("../data/polish_census/Interwar_Poland_1934.gpkg", 
                                layer='Census_1931_Districts') 
districts.head(3)

In [None]:
print('\n'.join(districts.columns))

Let's craft the data set that is used to plot the dots. 

In [None]:
import numpy as np
import pandas as pd
data = districts[['GEOID', 'POLISH', 'UKRAINIAN', 'RUSKI', 
                    'BELARUSIAN', 'LITHUANIAN', 'GERMAN', 'YIDDISH', 'HEBREW']].set_index('GEOID').dropna()
#data.head()

In [None]:
# One dot per 10,000 speakers, rounded down
data = data.apply(lambda x: np.floor(x / 10000).astype(int), axis = 1)

In [None]:
data.head()
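Dividing by 10,000 and rounding down means that each dot stands for 10,000 speakers and that any group below that threshold in a district gets no dot at all. The sketch below illustrates this with made-up counts. `PEOPLE_PER_DOT` and the district figures are hypothetical.

```python
PEOPLE_PER_DOT = 10_000

# Hypothetical speaker counts for one district (made-up numbers)
speakers = {'POLISH': 123_456, 'UKRAINIAN': 48_000, 'GERMAN': 9_999}

# Number of dots per language: integer division rounds down
dots = {lang: count // PEOPLE_PER_DOT for lang, count in speakers.items()}

# Speakers who are not represented by any dot
unrepresented = {lang: count % PEOPLE_PER_DOT for lang, count in speakers.items()}

print(dots)
print(unrepresented)
```

A smaller divisor shows more of the minorities, at the cost of many more markers on the map.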

As with the UK election, we choose a color for each category. 

In [None]:
colorpicker = {lang: color for lang, color in zip(data.columns, 
    ['#de3e16', '#f7d914', '#1cbd87', '#36a334', '#b569e0', '#64a8ed', '#b9d676', '#f781b2'])}

In [None]:
import matplotlib.pyplot as plt

y = [0, 1]
x = [1, 1]

fig, axes = plt.subplots(ncols=4,nrows=2, sharex=True, sharey=True,
                         figsize=(5,2), subplot_kw={'xticks': [], 'yticks': []})

for ax, key in zip(axes.flat, colorpicker.keys()):
    ax.plot(x, y)
    ax.fill_betweenx(y, 0, 1, facecolor=colorpicker[key])
    ax.set_xlim(0, 0.1)
    ax.set_ylim(0, 1)
    ax.set_title(str(key))

plt.tight_layout()
plt.show()

Even though TopoJSON is a more economical data format, we want to generate random points in each geometric object. To do so, we need to convert the TopoJSON into GeoJSON format. 

In [None]:
from pytopojson import feature
feature_ = feature.Feature()
geojson = feature_(topoJSON, 'Districts')
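To see why the conversion is needed, it helps to look at how TopoJSON stores coordinates. In a quantized topology, each arc is a list of integer positions in which every position after the first is a delta from its predecessor, and the `transform` (scale and translate) maps them back to real coordinates. The sketch below decodes one arc under that assumption. The toy `transform` and `arc` are made up, and this is not how `pytopojson` is implemented internally.

```python
def decode_arc(arc, scale, translate):
    """Turn one quantized, delta-encoded arc into absolute coordinates."""
    x = y = 0
    points = []
    for dx, dy in arc:
        # Accumulate the deltas, then undo the quantization
        x += dx
        y += dy
        points.append((x * scale[0] + translate[0],
                       y * scale[1] + translate[1]))
    return points

# Hypothetical toy topology (made-up numbers)
transform = {'scale': [0.5, 0.5], 'translate': [20.0, 50.0]}
arc = [[0, 0], [10, 0], [0, 10]]

coords = decode_arc(arc, transform['scale'], transform['translate'])
print(coords)
```

GeoJSON, in contrast, stores each polygon's absolute coordinates directly, which is what geometry libraries expect.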

In [None]:
geojson['features'][0] # navigate through... / do not print

In [None]:
gdf = geopandas.GeoDataFrame.from_features(geojson['features'])
gdf.head(2)

In [None]:
gdf['geometry'][2].bounds

In [None]:
gdf['geometry'][2]

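Random points, uniformly distributed on the Cartesian plane, are generated within each object: we draw candidates from the object's bounding box and keep only those that fall inside it. 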

In [None]:
import shapely, random
def generate_random_points(number, GEOID):

    # Select list entry of given object
    polygon = gdf[gdf['GEOID'] == GEOID]['geometry']#[0]
    # Extract bounding box (extent) from the GeoDataFrame
    minx, miny, maxx, maxy = polygon.bounds.squeeze()
    
    # Generate random points within the bounding box
    random_points = []
    while len(random_points) < number:
        random_point = shapely.geometry.Point(random.uniform(minx, maxx), random.uniform(miny, maxy))
        # Keep the point only if it lies inside the selected district's geometry
        if all(random_point.intersects(polygon)):
            random_points.append(random_point)

    return geopandas.GeoDataFrame(geometry=random_points)['geometry']

In [None]:
generate_random_points(2, 'P1613')

In [None]:
generate_random_points(1, 'P1613')[0]
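The function above is an instance of rejection sampling. The sketch below shows the same idea without `shapely` or `geopandas`: a hand-written ray-casting test stands in for `intersects`, and a triangle stands in for a district polygon. Both helper functions and the triangle are illustrative assumptions.

```python
import random

def point_in_polygon(x, y, vertices):
    """Ray casting: count crossings of a ray going right from (x, y)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def sample_points(number, vertices):
    """Draw uniform points from the bounding box, keep those inside the polygon."""
    xs, ys = zip(*vertices)
    minx, maxx, miny, maxy = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < number:
        x, y = random.uniform(minx, maxx), random.uniform(miny, maxy)
        if point_in_polygon(x, y, vertices):
            points.append((x, y))
    return points

# Hypothetical triangle standing in for a district polygon
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
random.seed(0)
dots = sample_points(5, triangle)
print(dots)
```

The points are uniform in longitude and latitude, not in true surface area, which is usually acceptable for small regions.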

Finally, let's add the dots to the map. 

In [None]:
m = folium.Map(width=1300, height=800, tiles = None,
               location=[53, 23], zoom_start=5)
tile = folium.TileLayer(
    tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Terrain_Base/MapServer/tile/{z}/{y}/{x}',
    attr='Esri',
    name='Esri Satellite'
).add_to(m)

folium.TopoJson(topoJSON,
    object_path='objects.Districts', 
    style_function=lambda feature: {
        "fillColor": None,
        "fillOpacity": 0.0,
        "color": "lightgray",
        "weight": 1,
        "dashArray": "1",
    }, overlay=True, control=False).add_to(m)

for lang, countsvector in dict(data).items():
    color = colorpicker[lang]
    fg = folium.FeatureGroup(name=lang).add_to(m)
    for GEOID, counts in dict(countsvector).items(): 
        for point in generate_random_points(counts, GEOID): 
            folium.CircleMarker(location=[point.y, point.x], 
                    stroke=False,
                    fill=True,
                    color=color, 
                    fill_opacity=1,
                    radius=2).add_to(fg)

In [None]:
folium.LayerControl(collapsed = False).add_to(m)
m 

<div>
    <center>
<img src="https://upload.wikimedia.org/wikipedia/commons/2/25/GUS_languages1931_Poland.jpg" width="1000"/>
</center>
    </div>

So why did the Polish census agency opt for a dot map? Let's create a plurality map for comparison. 

In [None]:
district_colors = districts[['GEOID', 'POLISH', 'UKRAINIAN', 'RUSKI', 
                            'BELARUSIAN', 'LITHUANIAN', 'GERMAN', 'YIDDISH', 'HEBREW']].set_index('GEOID').dropna().idxmax(axis=1)
district_colors

In [None]:
colorpicker

Let's add the palatinates as well. 

In [None]:
palatinates = geopandas.read_file("../data/polish_census/Interwar_Poland_1934.gpkg", layer='Census_1931_Palatinates')
palatinate_colors = palatinates[['GEOID', 'POLISH', 'UKRAINIAN', 'RUSKI', 
                                 'BELARUSIAN', 'LITHUANIAN', 'GERMAN', 'YIDDISH', 'HEBREW']].set_index('GEOID').dropna().idxmax(axis=1)

In [None]:
m = folium.Map(width=800, height=800, tiles = None,
               location=[53, 23], zoom_start=5)
base_map = folium.FeatureGroup(name='Basemap', overlay=True, control=False)
folium.TileLayer(
    tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Terrain_Base/MapServer/tile/{z}/{y}/{x}',
    attr='Esri',
    name='Esri Satellite'
).add_to(base_map)
base_map.add_to(m)

folium.TopoJson(topoJSON,
    name = "Districts",
    object_path='objects.Districts', 
    style_function=lambda feature: {
        "fillColor": colorpicker[district_colors[feature['properties']['GEOID']]],
        "fillOpacity": 0.8,
        "color": "lightgray",
        "weight": 1,
        "dashArray": "1",
    }, overlay=False).add_to(m)

folium.TopoJson(topoJSON,
    name = 'Palatinates',
    object_path='objects.Palatinates', 
    style_function=lambda feature: {
        "fillColor": colorpicker[palatinate_colors[feature['properties']['GEOID']]],
        "fillOpacity": 0.8,
        "color": "lightgray",
        "weight": 1,
        "dashArray": "1",
    }, overlay=False).add_to(m)

In [None]:
folium.LayerControl(collapsed = False).add_to(m)
m

The actual map from the census considered only the categories 'Polish' and 'Other'. 

In [None]:
district_colors = districts[['GEOID', 'POLISH', 'UKRAINIAN', 'RUSKI', 
                            'BELARUSIAN', 'LITHUANIAN', 'GERMAN', 'YIDDISH', 'HEBREW']].set_index('GEOID').dropna()

district_colors = pd.DataFrame({"POLISH": district_colors['POLISH'], 
                                "OTHER": district_colors.drop('POLISH', axis=1).sum(axis=1)}).idxmax(axis=1)

In [None]:
palatinates = geopandas.read_file("../data/polish_census/Interwar_Poland_1934.gpkg", layer='Census_1931_Palatinates')
palatinate_colors = palatinates[['GEOID', 'POLISH', 'UKRAINIAN', 'RUSKI', 
                                 'BELARUSIAN', 'LITHUANIAN', 'GERMAN', 'YIDDISH', 'HEBREW']].set_index('GEOID').dropna()

palatinate_colors = pd.DataFrame({"POLISH": palatinate_colors['POLISH'], 
                                  "OTHER": palatinate_colors.drop('POLISH', axis=1).sum(axis=1)}).idxmax(axis=1)

In [None]:
colorpicker["OTHER"] = '#b9d676'
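The two classifications used so far (plurality language versus 'Polish' or 'Other') can color the same district differently. The sketch below illustrates this in plain Python with made-up counts for two hypothetical districts. `plurality` plays the role of `idxmax(axis=1)`.

```python
# Hypothetical speaker counts for two districts (made-up numbers)
counts = {
    'A': {'POLISH': 40_000, 'UKRAINIAN': 35_000, 'YIDDISH': 25_000},
    'B': {'POLISH': 70_000, 'UKRAINIAN': 10_000, 'YIDDISH': 20_000},
}

def plurality(row):
    """Return the category with the largest count."""
    return max(row, key=row.get)

def polish_or_other(row):
    """Collapse all non-Polish categories into 'OTHER', then take the plurality."""
    other = sum(count for lang, count in row.items() if lang != 'POLISH')
    return plurality({'POLISH': row['POLISH'], 'OTHER': other})

by_language = {district: plurality(row) for district, row in counts.items()}
two_class = {district: polish_or_other(row) for district, row in counts.items()}

print(by_language)
print(two_class)
```

District A is colored 'POLISH' on the plurality map but 'OTHER' on the two-class map, although the underlying counts are identical.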

In [None]:
m = folium.Map(width=800, height=800, tiles = None,
               location=[53, 23], zoom_start=5)
base_map = folium.FeatureGroup(name='Basemap', overlay=True, control=False)
folium.TileLayer(
    tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Terrain_Base/MapServer/tile/{z}/{y}/{x}',
    attr='Esri',
    name='Esri Satellite'
).add_to(base_map)
base_map.add_to(m)

folium.TopoJson(topoJSON,
    name = "Districts",
    object_path='objects.Districts', 
    style_function=lambda feature: {
        "fillColor": colorpicker[district_colors[feature['properties']['GEOID']]],
        "fillOpacity": 0.8,
        "color": "lightgray",
        "weight": 1,
        "dashArray": "1",
    }, overlay=False).add_to(m)

folium.TopoJson(topoJSON,
    name = 'Palatinates',
    object_path='objects.Palatinates', 
    style_function=lambda feature: {
        "fillColor": colorpicker[palatinate_colors[feature['properties']['GEOID']]],
        "fillOpacity": 0.8,
        "color": "lightgray",
        "weight": 1,
        "dashArray": "1",
    }, overlay=False).add_to(m)

In [None]:
folium.LayerControl(collapsed = False).add_to(m)
m 