# 03 GDV Classification Schemes
---

Choropleth map classification scheme examples.

In [None]:
import geopandas as gpd
import mapclassify
import matplotlib.pyplot as plt
import seaborn as sns

### Create GeoDataFrame from GeoPackage (GPKG)

In [None]:
# Create a GeoPandas GeoDataFrame from a GeoPackage (GPKG)
osogs_pop_est_lad_merged = gpd.read_file(
    filename="../../data/lad-os-open-greenspace-area-per-head.gpkg"
)

In [None]:
# List GeoDataFrame columns
osogs_pop_est_lad_merged.columns

In [None]:
# Return head of GeoDataFrame
osogs_pop_est_lad_merged.head()

### Data classification
---

Categorising data into groups for plotting can be seen as a classification problem. The method chosen should reflect the goal of the map (e.g. highlighting outliers vs depicting the overall distribution of values). For guidance on choosing class breaks, see this [lesson on visualisation and cartographic practice](https://censusgis.wordpress.com/students/lesson-5-visualisation-cartographic-practice/) and this [guide to choropleth map data classification](https://gisgeography.com/choropleth-maps-data-classification/).

The [`mapclassify`](https://github.com/pysal/mapclassify) Python package can be used for choropleth map classification. Its classification schemes can be selected via the `scheme` keyword argument in either a GeoDataFrame [`plot`](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html) or [`explore`](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html) call.

The content below looks at four commonly used classification schemes:

* Equal Interval
* Jenks (Natural Breaks)
* Quantiles
* Standard Deviation

### Equal Interval

The boundaries between classes are at regular intervals across the range of the data. It is simple and easy to interpret. However, because the class width is set by the minimum and maximum values, a few extreme values can stretch the range and leave one or more classes sparse or empty.
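
The rule can be sketched by hand to show how one extreme value empties the middle classes. This is a minimal illustration in plain Python, not mapclassify's implementation: `equal_interval_breaks` and the `sample` values are hypothetical, and upper bounds are assumed to be inclusive.

```python
def equal_interval_breaks(values, k):
    """Return k upper class bounds spaced evenly between min and max (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The last bound is set to the maximum so every observation is covered
    return [lo + width * i for i in range(1, k)] + [hi]


# Made-up areas per head with one extreme value (illustration only)
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 101.0]
breaks = equal_interval_breaks(sample, k=5)

# Count observations per class, treating each upper bound as inclusive
counts = [0] * len(breaks)
for value in sample:
    class_index = next(i for i, bound in enumerate(breaks) if value <= bound)
    counts[class_index] += 1

print(breaks)  # [21.0, 41.0, 61.0, 81.0, 101.0]
print(counts)  # [9, 0, 0, 0, 1]
```

Nine of the ten observations land in the first class and three classes are empty, which is the sparsity problem described above.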

In [None]:
# Equal Interval scheme
ei5 = mapclassify.EqualInterval(osogs_pop_est_lad_merged["area2population"], k=5)

ei5

### Jenks Natural Breaks

The classes are defined according to natural groupings of data values. The scheme minimises differences between values within a class and maximises differences between classes, so similar values are clustered together.
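
The objective can be illustrated with a brute-force search over a tiny sample. This is a sketch of the idea only, not the Fisher-Jenks algorithm that mapclassify uses: it assumes within-class difference is measured as the sum of squared deviations from each class mean, and the helper names and `sample` values are hypothetical. Trying every split is only feasible for very small inputs.

```python
from itertools import combinations


def within_class_ssd(classes):
    """Sum of squared deviations from each class mean (assumed measure)."""
    total = 0.0
    for members in classes:
        class_mean = sum(members) / len(members)
        total += sum((v - class_mean) ** 2 for v in members)
    return total


def brute_force_natural_breaks(values, k):
    """Try every split of the sorted values into k contiguous classes (hypothetical helper)."""
    ordered = sorted(values)
    best_classes, best_score = None, float("inf")
    # Choose k - 1 cut positions between consecutive sorted values
    for cuts in combinations(range(1, len(ordered)), k - 1):
        edges = (0, *cuts, len(ordered))
        classes = [ordered[a:b] for a, b in zip(edges, edges[1:])]
        score = within_class_ssd(classes)
        if score < best_score:
            best_classes, best_score = classes, score
    return best_classes


# Made-up values with three visible clusters (illustration only)
sample = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 30.0, 31.0]
natural_classes = brute_force_natural_breaks(sample, k=3)

# Upper bound of each class, comparable to a list of class bins
natural_bins = [members[-1] for members in natural_classes]

print(natural_classes)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0], [30.0, 31.0]]
print(natural_bins)     # [3.0, 12.0, 31.0]
```

The breaks fall in the gaps between clusters rather than at fixed intervals or fixed counts.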

In [None]:
# Fisher Jenks scheme
fj5 = mapclassify.FisherJenks(osogs_pop_est_lad_merged["area2population"], k=5)

fj5

### Quantiles

Splits the data into classes so that each class contains the same number of data values. Because the number of observations per class is fixed, the result can be misleading: the class boundaries may place almost identical values in adjacent classes, or put widely different values in the same class.
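
Both pitfalls show up in a small hand-rolled example. This sketch assumes quantiles are found by linear interpolation between sorted values and that upper bounds are inclusive; `quantile_breaks` and the `sample` values are hypothetical, and mapclassify may compute its breaks differently.

```python
def quantile_breaks(values, k):
    """Return k upper class bounds at the i/k quantiles (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, k + 1):
        # Fractional position of the i/k quantile in the sorted values
        position = (n - 1) * i / k
        lower = int(position)
        upper = min(lower + 1, n - 1)
        fraction = position - lower
        # Linear interpolation between the two neighbouring values
        breaks.append(ordered[lower] + (ordered[upper] - ordered[lower]) * fraction)
    return breaks


# Made-up values: tightly packed, plus two much larger ones (illustration only)
sample = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 90.0, 500.0]
breaks = quantile_breaks(sample, k=5)

# Count observations per class, treating each upper bound as inclusive
counts = [0] * len(breaks)
for value in sample:
    class_index = next(i for i, bound in enumerate(breaks) if value <= bound)
    counts[class_index] += 1

print(breaks)  # [12.0, 14.0, 16.0, 18.0, 500.0]
print(counts)  # [3, 2, 2, 2, 2]
```

The counts are near equal, yet 12.0 and 13.0 fall in different classes while 90.0 and 500.0 share one.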

In [None]:
# Quantiles scheme
q5 = mapclassify.Quantiles(osogs_pop_est_lad_merged["area2population"], k=5)

q5

### Standard Deviation

Classifies each observation by its distance from the mean, with class breaks placed at multiples of the standard deviation above and below the mean. It can be good for identifying outliers.
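
A minimal sketch of how such breaks could be derived follows. The multiples `(-2, -1, 1, 2)`, the use of the population standard deviation, and closing the top class at the maximum are all assumptions made for illustration; `std_mean_breaks` and the `sample` values are hypothetical, so check the mapclassify documentation for the library's actual defaults.

```python
from statistics import mean, pstdev


def std_mean_breaks(values, multiples=(-2, -1, 1, 2)):
    """Return class bounds at mean + m * standard deviation (hypothetical helper)."""
    centre = mean(values)
    spread = pstdev(values)  # population standard deviation (assumption)
    breaks = [centre + m * spread for m in multiples]
    # Close the top class at the maximum if it lies beyond the last break
    if max(values) > breaks[-1]:
        breaks.append(max(values))
    return breaks


# Made-up values with a single high outlier (illustration only)
sample = [3.0] * 9 + [13.0]
breaks = std_mean_breaks(sample)

# Observations more than two standard deviations above the mean
outliers = [v for v in sample if v > breaks[3]]

print(breaks)    # [-2.0, 1.0, 7.0, 10.0, 13.0]
print(outliers)  # [13.0]
```

Here the mean is 4.0 and the standard deviation is 3.0, so only the value 13.0 lies beyond the `mean + 2 * sd` break. The lowest breaks can fall below the smallest observed value when the data are skewed, as they do here.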

In [None]:
# Standard deviation and mean scheme
msd = mapclassify.StdMean(osogs_pop_est_lad_merged["area2population"])

msd

In [None]:
# Create figure and axes objects
f, axs = plt.subplots(nrows=2, ncols=2, figsize=(15, 10))

# Set title
f.suptitle("Classification Scheme Profiles", fontsize=16, y=1.01)

# Flatten axes object
axs = axs.flatten()

# Iterate over schemes
for i, scheme in enumerate([ei5, fj5, q5, msd]):
    # Plot Kernel Density Estimate (KDE) plot showing distribution of areas per head
    sns.kdeplot(osogs_pop_est_lad_merged["area2population"], ax=axs[i], fill=True)

    # Add vertical lines at scheme class divisions
    for cut in scheme.bins:
        axs[i].axvline(cut, color="grey", linewidth=0.75)

    # Draw ticks along the x-axis showing area per head observations
    sns.rugplot(
        osogs_pop_est_lad_merged["area2population"], height=0.05, color="red", ax=axs[i]
    )

    # Set title
    axs[i].set_title(f"{scheme.name}")