# Let's make a Choropleth

A choropleth is a data-driven map: each region is shaded or styled according to a value associated with it.

A familiar example is the red and blue shading of states on US election maps.
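
As a minimal sketch of the idea, a choropleth boils down to a function from a region's value to a fill color. The thresholds, colors, region ids and the `colorFor` helper below are illustrative assumptions, not part of any library used in this notebook:

```javascript
// Hypothetical color ramp: each step has an upper bound and the fill used below it.
const ramp = [
  { below: 40, fill: '#d7191c' },
  { below: 60, fill: '#fdae61' },
  { below: 80, fill: '#a6d96a' },
  { below: Infinity, fill: '#1a9641' }
];

// Regions without a usable value get a neutral fill instead of being dropped.
const colorFor = (value) => (value === null || value === undefined || Number.isNaN(value))
  ? 'darkgrey'
  : ramp.find((step) => value < step.below).fill;

// Made-up regions and values, for illustration only.
const regions = [
  { id: 'AAA', value: 35 },
  { id: 'BBB', value: 72 },
  { id: 'CCC', value: undefined }
];

// Attach a fill to a copy of each region.
const shaded = regions.map((region) => ({ ...region, fill: colorFor(region.value) }));
console.log(shaded);
```

The rest of this notebook does the same thing, with the charting library handling the color scale for us.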

## Configure the Map

A single map doesn't show year-over-year change very well, so we pick one year and one metric to render.

To make these easy to find, here is the configuration:

In [None]:
currentYear = 1955;

In [None]:
currentMetric = 'pop';

# Datasets

There are a number of datasets you can choose from when trying to render a choropleth, and most depend on:

* the format (ex: shapefile, geojson, topojson)
* the level of detail (ex: 110m has LESS detail than looking at a map at the 10m level)
* the features of data (ex: countries, counties, rivers - and for the right area)

## Natural Earth

[Natural Earth](https://www.naturalearthdata.com/downloads/) is a public domain map dataset available at 1:10m, 1:50m, and 1:110m scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software.

Natural Earth was built through a collaboration of many volunteers and is supported by NACIS (North American Cartographic Information Society), and is free for use in any type of project ([see their terms of use](https://www.naturalearthdata.com/about/terms-of-use/)).

## TopoJSON

[TopoJSON](https://github.com/topojson/topojson-specification) is an open format that extends the [GeoJSON](#geojson) format, and can be converted to and from GeoJSON.

TopoJSON has two notable advantages over some other GIS formats:

* it can additionally encode non-geographical data
* it eliminates redundancy - resulting in potentially 80% reduction in file sizes.

For example, the shared boundary between California and Nevada is represented only once, rather than being duplicated for both states.
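
To make the shared-boundary idea concrete, here is a hand-written and heavily simplified topology with two square regions that share one edge. The region ids and coordinates are made up, and the `resolveArc` and `ringFor` helpers are our own sketch rather than anything from a TopoJSON library:

```javascript
// Arc 0 is the shared border; arcs 1 and 2 are the rest of each outline.
const topology = {
  type: 'Topology',
  arcs: [
    [[1, 0], [1, 1]],                  // arc 0: shared border, stored once
    [[1, 1], [0, 1], [0, 0], [1, 0]],  // arc 1: remainder of the left square
    [[1, 0], [2, 0], [2, 1], [1, 1]]   // arc 2: remainder of the right square
  ],
  objects: {
    regions: {
      type: 'GeometryCollection',
      geometries: [
        { type: 'Polygon', id: 'left', arcs: [[0, 1]] },
        // ~0 (that is, -1) means "arc 0, traversed in reverse"
        { type: 'Polygon', id: 'right', arcs: [[2, ~0]] }
      ]
    }
  }
};

// Resolve an arc index to coordinates, reversing a copy when the index is negative.
const resolveArc = (index) => (index >= 0
  ? topology.arcs[index]
  : [...topology.arcs[~index]].reverse());

// Stitch arcs into a ring, dropping the duplicated joint between consecutive arcs.
const ringFor = (arcIndexes) => arcIndexes.flatMap(
  (index, position) => resolveArc(index).slice(position === 0 ? 0 : 1)
);

const [leftGeometry, rightGeometry] = topology.objects.regions.geometries;
const leftRing = ringFor(leftGeometry.arcs[0]);
const rightRing = ringFor(rightGeometry.arcs[0]);
console.log(JSON.stringify(leftRing));
console.log(JSON.stringify(rightRing));
```

Both rings pass through the points of arc 0, but those points exist only once in the file.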

## GapMinder Life Expectancy Study

The [GapMinder Life Expectancy Study](https://www.gapminder.org/answers/how-does-income-relate-to-life-expectancy/) is a fascinating dataset and writeup by the GapMinder group, including Professor Hans Rosling.

We'll access this through the [vega-datasets](https://github.com/vega/vega-datasets) library.

It provides: 

Property    | Type   | Description
--          | --     | --
year        | Number | The year of the sample
country     | String | Name of the country
pop         | Number | Population of the country
life_expect | Number | Expected Lifespan within that country at that time
fertility   | Number | Fertility rate (average number of children per woman)

**NOTE: the country names are not standardized** - so we'll need to address that.

## Country ISO Codes - 3166

We will use the [i18n-iso-countries](https://www.npmjs.com/package/i18n-iso-countries) library to help us correlate countries by looking them up to the ISO 3166 standard.

[ISO 3166](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes) specifies the numeric, 2-character, and 3-character country codes, which will allow us to relate the countries to their geometry.

# Libraries

We will use the following libraries:

In [None]:
utils = require('jupyter-ijavascript-utils');
geographyDatastore = require('sane-topojson');
countryISO = require('i18n-iso-countries');
topojson = require('topojson-client');
['utils', 'geographyDatastore', 'countryISO', 'topojson'];

## topojson-client - `topojson`

The [topojson-client](https://github.com/topojson/topojson-client) library provides a way to:

* convert TopoJSON objects into GeoJSON features
* access geographic features

## sane-topojson - `geographyDatastore`

In our case, we'll be using the [sane-topojson](https://www.npmjs.com/package/sane-topojson) library as it provides a 'cleaned version' of the Natural Earth GIS data that can be accessed directly within node.

(As opposed to the [world-atlas](https://github.com/topojson/world-atlas) library that is only accessible through CDNs)

We'll be using this to:

* access the country geographies that we will render

## i18n-iso-countries - `countryISO`

The [i18n-iso-countries](https://www.npmjs.com/package/i18n-iso-countries) library will allow us to:

* identify 3 character iso codes for country names (joining)
* verify country names that need manual alignment

Next we want to pull the latest gapminder data.

(As an async method, we can use await to fetch the data)

In [None]:
utils.ijs.await(async ($$, console) => {
    gapMinder = await utils.datasets.fetch('gapminder.json');
    return ['gapMinder'];
});

# Understanding the data

The ultimate goal is to marry the gapMinder data to the geography data.

## GapMinder

First we will want to understand the data in gapMinder:

In [None]:
utils.object.getObjectPropertyTypes(gapMinder)

Next, let's see which years are available.

In [None]:
utils.agg.unique(gapMinder, 'year');

Next, let's check whether each country appears only once within each year.

In [None]:
utils.group.by(gapMinder, 'year')
    .reduce((recordsWithinYear) => ({ isCountryUnique: utils.aggregate.isUnique(recordsWithinYear, 'country') }));
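
For readers who don't have the helper library at hand, the per-year uniqueness check can be sketched in plain JavaScript. The sample records are made up for illustration, and this is not a description of how `utils.group.by` is implemented:

```javascript
// Made-up sample records; 1960 deliberately contains a duplicate country.
const sampleRecords = [
  { year: 1955, country: 'Afghanistan' },
  { year: 1955, country: 'Argentina' },
  { year: 1960, country: 'Afghanistan' },
  { year: 1960, country: 'Afghanistan' }
];

// Group the country names by year.
const countriesByYear = new Map();
for (const { year, country } of sampleRecords) {
  if (!countriesByYear.has(year)) countriesByYear.set(year, []);
  countriesByYear.get(year).push(country);
}

// A year is "unique" when de-duplicating its countries does not shrink the list.
const uniquenessByYear = Array.from(countriesByYear, ([year, countries]) => ({
  year,
  isCountryUnique: new Set(countries).size === countries.length
}));
console.log(uniquenessByYear);
```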

## Distribution of Countries

So let's take a look at how many countries there are in each year:

In [None]:
//-- prints all countries, but harder to read
// utils.group.by(gapMinder, 'year')
//     .reduce((yearRecords) => ({ countries: utils.agg.unique(yearRecords, 'country')}))

utils.group.by(gapMinder, 'year')
     .reduce((yearRecords) => ({ numCountries: utils.agg.unique(yearRecords, 'country').length }))

How do countries change year over year?

In [None]:
gapMinderCountries = new Set(utils.agg.unique( gapMinder.filter(r => r.year === 1955), 'country'));

utils.group.by(gapMinder, 'year')
    .reduce((yearRecords) => ({
        countriesDiff: utils.aggregate.notIn(
            yearRecords,
            'country',
            gapMinderCountries
        )}))

Looks like the same countries are available every year.

# Translate Countries to ISO Codes

Ultimately, we need to translate the countries in the GapMinder set to those supported by the map.

(We'll come back to this under the [World Geography organization - geographyDatastore section](#World-Geography-organization---geographyDatastore) below)

In [None]:
topojson.feature(geographyDatastore.world_50m, 'countries').features.map(r => r.id)

## Country Codes

In particular, notice the `id` field on each feature.
In this case the ids are [ISO 3166 country codes](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes).

For example:

Country name | Official state name                 | Sovereignty     | Alpha-2 code | Alpha-3 code | Numeric code | Subdivision code links | Internet ccTLD
--           | --                                  | --              | --           | --           | --           | --                     | --
Afghanistan  | The Islamic Republic of Afghanistan | UN member state | AF           | AFG          | 004          | ISO 3166-2:AF          | .af

Notice there are three main codes to understand:

* Alpha-3 Code - a 3 letter code for the country - ex: 'AFG'
* Alpha-2 Code - a 2 letter code for the country - ex: 'AF'
* Numeric Code - a numeric code for the country - ex: '004' or just '4'

`sane-topojson` uses the three letter `Alpha-3 code` as the feature `id`, while other sources, like the [topojson/topojson](https://github.com/topojson/topojson) library, use the Numeric code instead.
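
Since a numeric code can show up as `'004'` in one dataset and as `4` in another, joins on numeric codes are safer after normalizing both sides. A minimal sketch, where `normalizeNumericCode` is a hypothetical helper of our own and not part of any library mentioned here:

```javascript
// Normalize a numeric country code to a zero-padded, 3-character string.
// Accepts numbers or numeric strings; anything non-numeric yields null.
const normalizeNumericCode = (code) => {
  const asNumber = Number(code);
  return Number.isInteger(asNumber) && asNumber >= 0
    ? String(asNumber).padStart(3, '0')
    : null;
};

console.log(normalizeNumericCode(4));      // numeric input
console.log(normalizeNumericCode('004'));  // already padded
console.log(normalizeNumericCode('AFG'));  // not a numeric code
```

We don't need this for `sane-topojson`, since we join on the Alpha-3 code.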

In [None]:
utils.array.peekFirst(gapMinder).country;

We can translate that through `countryISO.getSimpleAlpha3Code(countryName, supportedLanguage)`

In [None]:
countryISO.getSimpleAlpha3Code(
    utils.array.peekFirst(gapMinder).country,
    'en'
)

Are there any countries that cannot be translated?

In [None]:
countriesToTranslateManually = [...gapMinderCountries].filter(
    //-- find the ones where there is no iso translation
    (gapMinderCountryName) => !countryISO.getSimpleAlpha3Code(gapMinderCountryName, 'en')
);

Looks like all countries can be translated to ISO.
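
Had any names failed to translate, one option would be a small manual override table consulted as a fallback. The sketch below is purely illustrative: `lookupIsoCode` is a stand-in for the real library call, and the override entry is a placeholder rather than a real country or code.

```javascript
// Stand-in for the library lookup (assumption: returns undefined for unknown names).
const knownCodes = new Map([
  ['Afghanistan', 'AFG'],
  ['Argentina', 'ARG']
]);
const lookupIsoCode = (countryName) => knownCodes.get(countryName);

// Hypothetical manual overrides for names the lookup does not recognize.
const manualOverrides = new Map([
  ['Example Republic', 'EXR'] // placeholder entry, not a real ISO code
]);

// Try the lookup first, then the overrides, and return null when both fail.
const toIsoCode = (countryName) => lookupIsoCode(countryName)
  ?? manualOverrides.get(countryName)
  ?? null;

console.log(['Afghanistan', 'Example Republic', 'Nowhere'].map(toIsoCode));
```

Returning `null` for unresolved names keeps them easy to find with a simple filter afterwards.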

In [None]:
gapMinder = gapMinder.map((record) => ({
    ...record,
    //-- add on the property countryISO 
    countryISO: countryISO.getSimpleAlpha3Code(record.country, 'en')
}));

utils.array.peekFirst(gapMinder);

Looks like they were all assigned; let's just verify that every record was translated:

In [None]:
gapMinder.filter(r => !r.countryISO)

In [None]:
// utils.ijs.markdown(`
console.log(`
## World Geography organization - geographyDatastore

Now, let's look at the geography data available.

The data for \`sane-topojson\` is stored as follows:

* \[top level\]
  * document
    * feature
      * geometries

### Document

The documents can be found by \`Object.keys(geographyDatastore)\` and are as follows:

\`${Object.keys(geographyDatastore).join(', ')}\`

Each represents a dataset (like the world or asia) and a detail level (50m having more detail than 110m, for example)

`);

### Features Available

The Features available are under `geographyDatastore.[document].objects.[feature name]`

Different documents can have different features available.

In the case of `sane-topojson`, this is the breakdown
(it seems fairly even across documents)

In [None]:
new utils.TableGenerator(
    Object.keys(geographyDatastore).map((documentName) => ({
        document: documentName,
        featuresSupported: Object.keys(geographyDatastore[documentName].objects)
    }))
)
    .render()

## Countries

However, instead of accessing the objects directly, we recommend using the `topojson` library to access these features:

ex: `topojson.feature(geographyDatastore.world_50m, 'countries')`

That looks like this:

In [None]:
console.log(
utils.format.ellipsify(
    `topojson.feature(geographyDatastore.world_50m, 'countries'):\n` +
    JSON.stringify(
        topojson.feature(geographyDatastore.world_50m, 'countries'), null, 2
    ), 500
)
)

## Making a simple Map

We can use the geography data to create a simple map

In [None]:
utils.vega.svgFromSpec({
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "width": 500,
  "height": 300,
  "data": {
    values: geographyDatastore.world_50m,
    //-- note the feature is specific to countries - one of the features of the dataset.
    "format": {"type": "topojson", "feature": "countries"}
  },
  //-- projection type from one of the following:
  "projection": {"type": 'naturalEarth1'},
  "mark": {"type": "geoshape", "fill": "lightgray", "stroke": "gray"}
});

//-- other projection types:
// albers,albersUsa,azimuthalEqualArea,azimuthalEquidistant,conicConformal,
// conicEqualArea,conicEquidistant,equalEarth,equirectangular,gnomonic,mercator,
// naturalEarth1,orthographic,stereographic,transverseMercator

What we want to do is change the color of the country based on the metric.

# Map Countries by ISO 3166 Code

In our case, the `id` property on those features gives us the ISO 3166 Alpha-3 code for the country.

In [None]:
topojson.feature(geographyDatastore.world_50m, 'countries').features.map(r => r.id)

We want to extend those countries with the gapMinder data, so first let's check for features whose `id` is not a valid ISO code:

In [None]:
topojson.feature(geographyDatastore.world_50m, 'countries').features.filter(feature => !countryISO.isValid(feature.id))

In [None]:
countriesWithIsoCode = topojson.feature(geographyDatastore.world_50m, 'countries').features
    .filter((feature) => feature.id);
countriesWithIsoCode.length;

Let's make a map of the countries by their isoCode

In [None]:
countriesByIsoCode = utils.group.index(
    countriesWithIsoCode, 'id'
)
countriesByIsoCode.size

## Verify GapMinder Aligns to Countries

Now, let's verify that the gapMinder data aligns to those countries.

Let's make a set of the iso codes from within the geography

In [None]:
geographyCountryIsoCodes = new Set(Array.from(countriesByIsoCode.values()).map(r => r.id));
geographyCountryIsoCodes.size

And a set of the iso codes from within gap minder

In [None]:
gapMinderIsoCodes = new Set(utils.aggregate.unique(gapMinder, 'countryISO'));
gapMinderIsoCodes.size

And see if there are any iso codes used in gap minder that are not found in the geography:

In [None]:
utils.set.findItemsNotContained(geographyCountryIsoCodes, gapMinderIsoCodes);

Lastly, are there any gapMinder records that do not map to any country?

In [None]:
gapMinder.filter((gapMinderRecord) => {
    //-- named isoCode to avoid shadowing the countryISO library
    const isoCode = gapMinderRecord.countryISO;
    const countryGeography = countriesByIsoCode.get(isoCode);
    return !countryGeography
})

Nope. Looks like we're good to go.
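
The set comparison used above can be sketched in plain JavaScript. This assumes, as the calls above suggest, that the helper returns the items of the second argument that are absent from the first; the `itemsNotContained` function and the sample codes are our own illustration:

```javascript
// Items of `candidates` that are not present in `referenceSet`.
const itemsNotContained = (referenceSet, candidates) => new Set(
  [...candidates].filter((item) => !referenceSet.has(item))
);

// Made-up sample codes: 'XXX' exists in the data but not in the geography.
const sampleGeographyCodes = new Set(['AFG', 'ARG', 'AUS']);
const sampleDataCodes = new Set(['AFG', 'ARG', 'XXX']);

const missingFromGeography = itemsNotContained(sampleGeographyCodes, sampleDataCodes);
console.log([...missingFromGeography]);
```

Note that the comparison is not symmetric: swapping the arguments answers the opposite question (which geography codes have no data).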

### 110m vs 50m level countries

**NOTE: the number of countries DOES change between the 110m and the 50m level**

In [None]:
countryCodes110m = new Set(
    topojson.feature(geographyDatastore.world_110m, 'countries').features.map(feature => feature.id)
);

utils.set.findItemsNotContained(geographyCountryIsoCodes, countryCodes110m);

There are some countries at the 110m level that DO NOT have an id

In [None]:
topojson.feature(geographyDatastore.world_110m, 'countries').features.filter(feature => !feature.id).length

And there are 63 countries at the 50m level NOT in the 110m level

In [None]:
Array.from(
    utils.set.findItemsNotContained(countryCodes110m, geographyCountryIsoCodes)
)

# Making maps

As a refresher, we can make a simple map like so:

In [None]:
utils.vega.svgFromSpec({
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "width": 500,
  "height": 300,
  "data": {
    values: geographyDatastore.world_50m,
    //-- note the feature is specific to countries - one of the features of the dataset.
    "format": {"type": "topojson", "feature": "countries"}
  },
  //-- projection type from one of the following:
  "projection": {"type": 'naturalEarth1'},
  "mark": {"type": "geoshape", "fill": "lightgray", "stroke": "gray"}
});

//-- other projection types:
// albers,albersUsa,azimuthalEqualArea,azimuthalEquidistant,conicConformal,
// conicEqualArea,conicEquidistant,equalEarth,equirectangular,gnomonic,mercator,
// naturalEarth1,orthographic,stereographic,transverseMercator

## Merge the Data

For simplicity's sake, we will update the records on the Geography to have a `mapValue` property.

(There are ways to do the transformations within Vega, but they are complex and difficult to troubleshoot,
so we will handle them in a different doc, with an example below just for demonstration).

### Transformation function

A function that looks up a metric for a given year and ISO country code:

In [None]:
getCountryValue = (metric, year, countryISO) => utils.array.peekFirst(
        gapMinder.filter((r) => r.year === year && r.countryISO === countryISO),
        {}
    )[metric];

Let's verify it gets us a value for a specific year and country

In [None]:
getCountryValue('life_expect', 1955, 'AFG');

Next, let's verify that no value is returned (`undefined`) if the record cannot be found

In [None]:
getCountryValue('life_expect', 1955, null) === undefined
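
The filter-based lookup scans every record for every country. That is fine for a dataset this size, but if it ever becomes slow, one alternative is to index the records once by year and ISO code. A minimal sketch with made-up records; `keyFor`, `recordIndex` and `getIndexedValue` are hypothetical names of our own:

```javascript
// Made-up records for illustration only.
const sampleGapMinder = [
  { year: 1955, countryISO: 'AAA', life_expect: 50, pop: 1000 },
  { year: 1960, countryISO: 'AAA', life_expect: 52, pop: 1100 },
  { year: 1955, countryISO: 'BBB', life_expect: 60, pop: 2000 }
];

// Composite key combining year and ISO code.
const keyFor = (year, countryISO) => `${year}|${countryISO}`;

// Build the index once.
const recordIndex = new Map(
  sampleGapMinder.map((record) => [keyFor(record.year, record.countryISO), record])
);

// Look up a metric; yields undefined when no record matches.
const getIndexedValue = (metric, year, countryISO) =>
  recordIndex.get(keyFor(year, countryISO))?.[metric];

console.log(getIndexedValue('life_expect', 1955, 'BBB'));
console.log(getIndexedValue('life_expect', 1955, 'ZZZ'));
```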

## Create the Choropleth Data

Now let's create a specific version of the data we can use for charting.

(Note - in an immutable manner to avoid race conditions between cells)

In [None]:
generateMapData = (metric, year) => topojson.feature(geographyDatastore.world_50m, 'countries')
    .features
    .map((entry) => ({ mapValue: getCountryValue(metric, year, entry.id), ...entry }));
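
Two details of the object spread used here are worth a quick sketch: the original feature is left untouched, and because `mapValue` comes before `...entry`, a feature that already carried its own `mapValue` would win. The objects below are made up for illustration:

```javascript
// A frozen stand-in for a geography feature.
const sampleFeature = Object.freeze({ type: 'Feature', id: 'AAA', properties: {} });

// Spreading creates a new object; the source is not modified.
const withValue = { mapValue: 42, ...sampleFeature };
console.log('mapValue' in sampleFeature, withValue.mapValue);

// Order matters: later keys override earlier ones.
const featureWithOwnValue = { id: 'BBB', mapValue: 'from feature' };
const valueFirst = { mapValue: 42, ...featureWithOwnValue };
const valueLast = { ...featureWithOwnValue, mapValue: 42 };
console.log(valueFirst.mapValue, valueLast.mapValue);

// The copy is shallow: nested objects such as `properties` are still shared.
console.log(withValue.properties === sampleFeature.properties);
```

The features from `sane-topojson` do not appear to carry a `mapValue` of their own, so either order should behave the same here.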

Then we check if it worked

In [None]:
utils.array.peekFirst(gapMinder)

and then check the country for that year

In [None]:
generateMapData('pop', 1955)
    .filter(entry => entry.id === 'AFG')

It looks like the two numbers match, so let's run the chart.

## Create the Choropleth

We can create the map for each of the countries that have values, and for a specific year.

In [None]:
utils.vega.svgFromSpec({
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "mark": {
    "type": "geoshape",
    "stroke": "white"
  },
  "data": {
      "values": generateMapData('life_expect', 1955)
  },
  "encoding": {
    "color": {
      "field": "mapValue",
      "type": "quantitative",
      "scale": {
        "scheme": "spectral"
      }
    }
  },
  "projection": {
    "type": "naturalEarth1",
  },
  "width": 900,
  "height": 500
});

## Why are the other countries missing?

Vega-Lite removes records with null values. [See issue #3261 for more](https://github.com/vega/vega-lite/issues/3261)

Instead, we want to show those countries, but have them show up as grey.

To show the null values you must add in the following `config`:

```
  "config": {
    "mark": {"invalid": null}
  }
```

We also want to show the null values as our own color of our choosing,
so we add a conditional to explicitly set the color:

```
{ "condition": {
    "test": { not: "isDefined(datum.mapValue)" },
    "value": "darkgrey"
  }
}
```

Changing the `color` attribute to:

```
    "color": {
      "condition": {
        "test": { not: "isDefined(datum.mapValue)" },
        "value": "darkgrey"
      },
      "field": "mapValue",
      "type": "quantitative",
      "scale": {
        "scheme": "spectral"
      }
    }
```

With the full spec as follows:

In [None]:
utils.vega.svgFromSpec({
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "mark": {
    "type": "geoshape",
    "stroke": "white"
  },
  "data": {
      "values": generateMapData('life_expect', 1955)
  },
  "encoding": {
    "color": {
      "condition": {
        "test": { not: "isDefined(datum.mapValue)" },
        "value": "darkgrey"
      },
      "field": "mapValue",
      "type": "quantitative",
      "scale": {
        "scheme": "spectral"
      }
    }
  },
  "projection": {
    "type": "naturalEarth1",
  },
  "width": 900,
  "height": 500,
  "config": {
    "mark": {"invalid": null}
  }
});

# Bonus


While we said we wouldn't get into it further in this document,
here is an example that instead aligns the gapMinder values within the Vega-Lite specification:

* data from `geographyDatastore` is loaded the same as above
* it is then transformed through the [lookup](https://vega.github.io/vega-lite-v3/docs/lookup.html)
  * copying the `life_expect`, `pop` and `fertility` fields over
* the field we want can be 'parameterized' under params as the 'chartField' variable
* we then calculate a new value called `chartValue` based on the param
* default colors are set the same as above
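
Conceptually, the lookup followed by the calculate behaves like a left join and a derived column. Here is a rough plain JavaScript sketch with made-up rows; it illustrates the idea only and is not how Vega-Lite implements it. It assumes unmatched rows receive `null` for the copied fields, which is Vega-Lite's documented default for lookup.

```javascript
// Made-up geography rows and data rows for illustration.
const geoRows = [{ id: 'AAA' }, { id: 'BBB' }];
const dataRows = [
  { countryISO: 'AAA', life_expect: 50, pop: 1000, fertility: 3 }
];
const fieldsToCopy = ['life_expect', 'pop', 'fertility'];

// Index the secondary data by its key ("key": "countryISO" in the spec).
const dataByKey = new Map(dataRows.map((row) => [row.countryISO, row]));

// Left join on geoRow.id ("lookup": "id"), filling misses with null.
const joinedRows = geoRows.map((geoRow) => {
  const match = dataByKey.get(geoRow.id);
  const copied = Object.fromEntries(
    fieldsToCopy.map((field) => [field, match ? match[field] : null])
  );
  return { ...geoRow, ...copied };
});

// The "calculate" step: pick the field named by the chartField parameter.
const chartField = 'life_expect';
const rowsWithChartValue = joinedRows.map((row) => ({
  ...row,
  chartValue: row[chartField]
}));
console.log(rowsWithChartValue);
```

This is why the condition in the spec tests `chartValue` against `null`: rows with no matching record still exist, they just carry a `null` value.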

In [None]:
utils.vega.svgFromSpec({
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "params": [
    { "name": "chartField", "value": "life_expect"}
  ],
  "width": 900,
  "height": 500,
  "data": {
    "values": geographyDatastore.world_50m,
    "format": {
      "type": "topojson",
      "feature": "countries"
    }
  },

  "transform": [{
    "lookup": "id",
    "from": {
      "data": {
        "values": gapMinder.filter(r => r.year === 1955)
      },
      "fields": ["life_expect", "pop", "fertility"],
      "key": "countryISO"
    }
  },
  {
    "calculate": "datum[chartField]",
    "as": "chartValue",
  }],
  "projection": {
    "type": "naturalEarth1"
  },
  "mark": "geoshape",
  "encoding": {
    "color": {
      "condition": {
        "test": "datum.chartValue === null",
        "value": "darkgrey"
      },
      "field": "chartValue",
      "type": "quantitative",
      "scale": {
        "scheme": "spectral"
      },
    }
  },
  "config": {
    "mark": {"invalid": null}
  }
});

For comparison, here is the same lookup approach applied to US counties and unemployment rates, using the Vega-Lite example datasets:

In [None]:
utils.vega.svgFromSpec({
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "width": 500,
  "height": 300,
  "data": {
    "url": "https://vega.github.io/vega-lite/examples/data/us-10m.json",
    "format": {
      "type": "topojson",
      "feature": "counties"
    }
  },
  "transform": [{
    "lookup": "id",
    "from": {
      "data": {
        "url": "https://vega.github.io/vega-lite/examples/data/unemployment.tsv"
      },
      "key": "id",
      "fields": ["rate"]
    }
  }],
  "projection": {
    "type": "albersUsa"
  },
  "mark": "geoshape",
  "encoding": {
    "color": {
      "field": "rate",
      "type": "quantitative"
    }
  }
});


# Appendix

## Other Formats

### ShapeFile

One of the more heavily standardized formats is the ShapeFile, meant as a way to spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature. The format is meant to provide a standard for interoperability between ESRI systems and other GIS software.

The [ShapeFile](https://en.wikipedia.org/wiki/Shapefile) format is a semi-open standard designed and regulated by [Environmental Systems Research Institute - ESRI](https://en.wikipedia.org/wiki/Esri) - an international supplier of [GIS - GeoCoded data](https://en.wikipedia.org/wiki/Geographic_information_system).

### GeoJSON

An alternative format is [GeoJSON](https://geojson.org/), an [open standard format - rfc7946](https://tools.ietf.org/html/rfc7946) based on JSON and designed for representing simple geographical features along with their non-spatial attributes.
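
As a small illustration of the format, here is a hand-written GeoJSON Feature wrapped in a FeatureCollection. The id, name and coordinates are made up, and `ringIsClosed` is a helper of our own:

```javascript
// A unit square as a GeoJSON Feature; polygon rings must end where they start.
const squareFeature = {
  type: 'Feature',
  id: 'AAA',
  geometry: {
    type: 'Polygon',
    coordinates: [
      [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]] // exterior ring, counterclockwise
    ]
  },
  // Non-spatial attributes live under `properties`.
  properties: { name: 'Example region' }
};

const exampleCollection = { type: 'FeatureCollection', features: [squareFeature] };

// Check that a ring's first and last positions match.
const ringIsClosed = (ring) => {
  const first = ring[0];
  const last = ring[ring.length - 1];
  return first[0] === last[0] && first[1] === last[1];
};

console.log(ringIsClosed(squareFeature.geometry.coordinates[0]));

// Being plain JSON, the collection survives a serialize / parse round trip.
const roundTripped = JSON.parse(JSON.stringify(exampleCollection));
console.log(roundTripped.features[0].properties.name);
```

This is the same shape that `topojson.feature(...)` gave us earlier: a FeatureCollection whose features carry an `id`, a `geometry` and `properties`.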

Various providers, like [Natural Earth](https://www.naturalearthdata.com/) 