# Country maps prototype V2

### Summary
The polygon geometries of all countries are clustered by distance using a two-pass algorithm. The resulting list of geometry groups is then visualized in an ipyleaflet map, where each group can be labelled. The list is saved as a pickle file on every navigation step, so any changes made to it are persisted.

#### Setup
For dependency handling, create a new virtual environment from the requirements.txt, then add it as a jupyter kernel to use it in this notebook:
1. Assuming you have virtualenvwrapper installed, create a new virtual environment using *mkvirtualenv venv_name*.
2. Enter the virtual environment using *workon venv_name*.
3. Install the dependencies from the requirements file using *pip install -r requirements.txt*.
4. Install the Jupyter kernel using *ipython kernel install --user --name=venv_name*.
5. Select the newly installed kernel in Jupyter Notebook via *Kernel -> Change kernel*.
6. (optional) To uninstall the kernel later, use *jupyter kernelspec uninstall kernel_name*.

In [None]:
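# Hidden dependency of geopandas: descartes
import geopandas as gpd
import matplotlib.pyplot as plt
import pickle
# Import only the ipyleaflet names used in this notebook instead of a star import.
from ipyleaflet import Map, Marker, WidgetControl, basemaps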
import ipywidgets as widgets

### Preprocess Datasets
The datasets come from [naturalearthdata](https://www.naturalearthdata.com/downloads/10m-cultural-vectors/) and are in the public domain, so they are free for anyone to use. For countries, the **Admin 0 – Countries** dataset is used (the cell below reads the 1:50m version).

In [None]:
all_countries_4326 = gpd.read_file('data/ne_50m_admin_0_countries/ne_50m_admin_0_countries.shp')

### Algorithm
The following algorithm performs a distance-based clustering on the polygons of each country in order to facilitate plotting. A two-pass approach is applied: in the first pass, all polygons within a smaller radius around the largest polygon are grouped together. In the second pass, all polygons within a larger radius around the group from pass 1 are merged in as well. All polygons merged in pass 1 and pass 2 are then removed, and the process repeats until no polygons are left. The two-pass approach aims to reduce artifacts that arise when a country consists of several large polygons close together and some smaller polygon groups further away (e.g. Japan).

### Output
The output is a list of dictionaries, each describing a single group generated by the algorithm. Each country therefore has one or more dictionaries in the list. Each dictionary contains the following:
- The country name and ISO code that this dictionary belongs to.
- A polygon or multipolygon geometry.
- A name for the polygon group. The first group, which contains the largest polygon, is assumed to be the country itself and takes its name. All other groups are labelled "Unknown X", where X is a running number.
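
As a minimal sketch of this record layout, the helper below builds the dictionaries for one country. The function name `build_records` and the string placeholders used as geometries are hypothetical and only for illustration; in the notebook the geometries are geopandas objects.

```python
def build_records(country, iso_a3, groups):
    """Build one dictionary per geometry group (hypothetical helper).

    `groups` is a list of geometries with the main group first.
    Any object can stand in for a geometry in this sketch.
    """
    records = []
    for i, geometry in enumerate(groups):
        # The first group takes the country name, the rest are numbered from 0.
        name = country if i == 0 else f'Unknown {i - 1}'
        records.append({
            'country': country,
            'ISO_A3': iso_a3,
            'name': name,
            'geometry': geometry,
        })
    return records


# Placeholder strings instead of real polygons.
records = build_records('Portugal', 'PRT', ['mainland', 'group_a', 'group_b'])
for record in records:
    print(record['country'], record['ISO_A3'], record['name'])
```

The "Unknown X" names are the ones meant to be replaced during annotation.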

### Pseudo Algorithm
0. Read countries and project to EPSG 3857 (pseudomercator).
1. Iterate over all countries, and perform the following...
2. Split the country geometry into single polygons.
3. Pass 1: Find the largest polygon, create a buffer around it, and use the bounding box of that buffer to find all polygons whose centroid lies within it. Merge all of them into a multipolygon.
4. Pass 2: Using the resulting geometry of step 3, create a larger buffer, again find all polygons whose centroid lies within the resulting bounding box, and merge them.
5. Remove the resulting multipolygon from the two passes from the dataset and add it to a geometry list.
6. If there still are polygons left, go to step 3.
7. If the country consists of a single polygon, skip steps 3-6 and directly add it to the geometry list.
8. Convert the resulting geometries from the geometry list back to EPSG 4326 and add them to the result list.
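
The grouping loop in steps 3 to 6 can be sketched without any GIS dependency. This is a simplified illustration under stated assumptions, not the notebook's implementation: each polygon is replaced by a hypothetical axis-aligned box (`Part`), and the names `buffered_bbox`, `centroid_in` and `group_parts` are invented for this example. Growing a bounding box by the buffer distance gives the same box as the envelope of a buffered geometry, so the centroid test mirrors the one described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """Hypothetical stand-in for one polygon: an axis-aligned box."""
    name: str
    minx: float
    miny: float
    maxx: float
    maxy: float

    @property
    def area(self):
        return (self.maxx - self.minx) * (self.maxy - self.miny)

    @property
    def centroid(self):
        return ((self.minx + self.maxx) / 2, (self.miny + self.maxy) / 2)


def buffered_bbox(parts, distance):
    """Bounding box of all parts, grown by `distance` on every side."""
    return (
        min(p.minx for p in parts) - distance,
        min(p.miny for p in parts) - distance,
        max(p.maxx for p in parts) + distance,
        max(p.maxy for p in parts) + distance,
    )


def centroid_in(part, bbox):
    x, y = part.centroid
    minx, miny, maxx, maxy = bbox
    return minx <= x <= maxx and miny <= y <= maxy


def group_parts(parts, buffer_step1, buffer_step2):
    remaining = list(parts)
    groups = []
    while remaining:
        # Pass 1: small buffer around the largest remaining part.
        largest = max(remaining, key=lambda p: p.area)
        bbox1 = buffered_bbox([largest], buffer_step1)
        pass1 = [p for p in remaining if centroid_in(p, bbox1)]
        # Pass 2: larger buffer around everything collected in pass 1.
        bbox2 = buffered_bbox(pass1, buffer_step2)
        pass2 = [p for p in remaining if centroid_in(p, bbox2)]
        groups.append(pass2)
        # Remove the grouped parts and repeat until nothing is left.
        remaining = [p for p in remaining if p not in pass2]
    return groups


parts = [
    Part('mainland', 0, 0, 100, 100),
    Part('near', 120, 40, 130, 50),
    Part('mid', 170, 40, 180, 50),
    Part('far', 1000, 1000, 1010, 1010),
]
groups = group_parts(parts, buffer_step1=30, buffer_step2=60)
print([[p.name for p in g] for g in groups])
```

In this toy data, `mid` is too far from `mainland` to be caught in pass 1 but falls inside the larger pass 2 box around `mainland` and `near`, which is the situation the second pass is meant to handle.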

In [None]:
country_list = []
dissolve_buffer_m_step1 = 500000
dissolve_buffer_m_step2 = 1000000
# 0. Read countries and project to EPSG 3857 (pseudomercator).
# Project layer to EPSG 3857, because shapely only supports buffer calculations on cartesian plane.
all_countries_3857 = all_countries_4326.to_crs(epsg=3857)
# 1. Iterate over all countries, and perform the following...
for index, country in all_countries_3857.iterrows():
    print(f'Processing: {country["ADMIN"]} ({country["ISO_A3"]})')
    geometry_groups = []
    geoseries = []
    if country.geometry.geom_type == 'Polygon':
        # Process single polygon
        # 7. If the country consists of a single polygon, skip steps 3-6 and directly add it to the geometry list.
        geometry_groups.append(country.geometry)
    else:
        # Process multipolygon
        # 2. Split the country geometry into single polygons.
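        # Use `.geoms` to list the parts explicitly: multipolygons are not directly iterable in Shapely 2.x.
        country_parts = gpd.GeoDataFrame({'geometry': list(country.geometry.geoms)}, crs=all_countries_3857.crs)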
        while country_parts.shape[0] > 0:
            # 3. Pass 1: Find largest polygon, create a buffer around it and use the resulting bounding box to find all polygons that have their centroid within and merge all of them into a multipolygon.
            primary_step1 = country_parts[country_parts.area == country_parts.area.max()]
            primary_buffered_bbox_step1 = primary_step1.buffer(distance=dissolve_buffer_m_step1).envelope.iloc[0]
            within_primary_bool_step1 = country_parts.centroid.within(primary_buffered_bbox_step1)
            merged_primary_step1 = country_parts.loc[within_primary_bool_step1].unary_union
            if country_parts.shape[0] == 0:
                # If no more geometries are left, add primary from step 1 and break.
                geometry_groups.append(merged_primary_step1)
                break
            # 4. Pass 2: Using the resulting geometry of step 3, create a larger buffer and again find all polygons with their centroid within the resulting bounding box and merge them.
            primary_buffered_bbox_step2 = merged_primary_step1.buffer(distance=dissolve_buffer_m_step2).envelope
            within_primary_bool_step2 = country_parts.centroid.within(primary_buffered_bbox_step2)
            merged_primary_step2 = country_parts.loc[within_primary_bool_step2].unary_union
            # 5. Remove the resulting multipolygon from the two passes from the dataset and add it to a geometry list.
            country_parts = country_parts.loc[~within_primary_bool_step2]
            # APPEND PRIMARY FOUND IN PASS 2
            geometry_groups.append(merged_primary_step2)
            # 6. If there still are polygons left, go to step 3.

    # 8. Convert the resulting geometries from the geometry list back to EPSG 4326 and add them to the result list.
    # Create a list of country parts that allows easy navigation for subsequent name entry. Assume the first group, which contains the largest polygon, carries the country name.
    geometry_groups = [gpd.GeoSeries(geom, crs=3857).to_crs(epsg=4326).geometry for geom in geometry_groups]
    country_list.append({
        'country': country['ADMIN'],
        'ISO_A3': country['ISO_A3'],
        'name': country['ADMIN'],
        'geometry': geometry_groups.pop(0),
    })
    for i, geometry in enumerate(geometry_groups):
        country_list.append({
            'country': country['ADMIN'],
            'ISO_A3': country['ISO_A3'],
            'name': f'Unknown {i}',
            'geometry': geometry
        })

### Annotation of country parts
The following two cells initialize an ipyleaflet map that lets you annotate the country parts interactively. Use the following controls:
- Zoom via the slider OR the **mouse wheel**.
- Click the **Next** and **Previous** buttons to jump from country part to country part.
- The **Country** text field shows which country the current country part belongs to. It is not editable.
- Enter a fitting name (e.g. Alaska, Azores) for the country part into the **Name** text field in the upper right.
- **Note:** Each time you navigate from one country part to another, your changes are saved automatically, both in memory and to a pickle file of the list.
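
The save-on-navigate behaviour and the wrap-around of the **Next** and **Previous** buttons can be sketched in isolation. This is a minimal sketch, assuming hypothetical helper names (`save_country_list`, `load_country_list`, `step_index`) and placeholder records without geometries; the notebook itself writes to `country_list.p`.

```python
import pickle
import tempfile
from pathlib import Path


def save_country_list(country_list, path):
    """Write the annotated list to disk (hypothetical helper)."""
    with open(path, 'wb') as f:
        pickle.dump(country_list, f)


def load_country_list(path):
    """Read a previously saved list back, e.g. to resume annotating."""
    with open(path, 'rb') as f:
        return pickle.load(f)


def step_index(index, step, length):
    """Move forwards (step=1) or backwards (step=-1), wrapping at both ends."""
    return (index + step) % length


# Placeholder records; the real ones also carry a 'geometry' entry.
records = [
    {'country': 'A', 'ISO_A3': 'AAA', 'name': 'A'},
    {'country': 'A', 'ISO_A3': 'AAA', 'name': 'Unknown 0'},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'country_list.p'
    index = 0
    # "Previous" on the first entry wraps around to the last one.
    index = step_index(index, -1, len(records))
    records[index]['name'] = 'Island group'
    save_country_list(records, path)
    restored = load_country_list(path)

print(index, restored[index]['name'])
```

Loading the pickle file at the start of a session is one way to continue an earlier annotation run. Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.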

In [None]:
m = Map(center=(0,0), zoom=9, scroll_wheel_zoom=True, zoom_control=False, basemap=basemaps.Esri.WorldStreetMap)
marker = Marker(location=(0,0), draggable=False)
m.add_layer(marker)

zoom_slider = widgets.IntSlider(description='Zoom level:', min=0, max=15, value=7)
widgets.jslink((zoom_slider, 'value'), (m, 'zoom'))
widget_control_zoom = WidgetControl(widget=zoom_slider, position='topleft')
m.add_control(widget_control_zoom)

country_text = widgets.Text(
    value='',
    placeholder='',
    description='Country:',
    disabled=True,
    layout={'width': '500px'}
)

text_input = widgets.Text(
    value='',
    placeholder='',
    description='Name:',
    disabled=False,
    layout={'width': '500px'}
)

goto_next = widgets.Button(
    description='Next',
    disabled=False,
    button_style='',
    tooltip='Next geometry',
    icon=''
)

goto_pevious = widgets.Button(
    description='Previous',
    disabled=False,
    button_style='',
    tooltip='Previous geometry',
    icon=''
)

In [None]:
index = 0

def goto_index(index):
    text_input.value = country_list[index]['name']
    country_text.value = country_list[index]['country']
    centroid = country_list[index]['geometry'].centroid.iloc[0]
    center = (centroid.y, centroid.x)
    m.center = center
    marker.location = center
    
def next_clicked(button):
    global index
    country_list[index]['name'] = text_input.value
    # Use a context manager so the file handle is closed after every save.
    with open('country_list.p', 'wb') as f:
        pickle.dump(country_list, f)
    index += 1
    if index == len(country_list):
        index = 0
    goto_index(index)
    
def previous_clicked(button):
    global index
    country_list[index]['name'] = text_input.value
    # Use a context manager so the file handle is closed after every save.
    with open('country_list.p', 'wb') as f:
        pickle.dump(country_list, f)
    index -= 1
    if index < 0:
        index = len(country_list) - 1
    goto_index(index)
    
goto_next.on_click(next_clicked)
goto_pevious.on_click(previous_clicked)

widget_control_country = WidgetControl(widget=country_text, position='topright')
m.add_control(widget_control_country)
widget_control_input = WidgetControl(widget=text_input, position='topright')
m.add_control(widget_control_input)
widget_control_next = WidgetControl(widget=goto_next, position='bottomright')
m.add_control(widget_control_next)
widget_control_previous = WidgetControl(widget=goto_pevious, position='bottomleft')
m.add_control(widget_control_previous)

goto_index(index)
m

Running the cell above should display a map. If it does not, try restarting your Jupyter notebook.