# Geospatial data visualisation

This lab will demonstrate how to generate exploratory and interactive visualisations of geospatial data including multispectral satellite images, point-based samples within a field, global maps of crop yield, and crop types in orchards and plantations. This lab will provide an introduction to:

* raster and vector geospatial data
* generating interactive visualisations of vector datasets
* generating static and interactive visualisations of raster data

This week the focus will be on quick, exploratory, and interactive visualisations of geospatial data. For a more detailed discussion of visualising geospatial data please read <a href="https://clauswilke.com/dataviz/geospatial-data.html" target="_blank">Wilke (2019)</a>. For a focus on generating cartographic quality static maps please see the open course <a href="https://courses.spatialthoughts.com/python-dataviz.html#matplotlib-basics" target="_blank">Mapping and Data Visualization with Python</a>.

## Setup

### Run the labs

You can run the labs locally on your machine or you can use cloud environments provided by Google Colab. **If you're working with Google Colab be aware that your sessions are temporary and you'll need to take care to save, back up, and download your work.**

<a href="https://colab.research.google.com/github/geog3300-agri3003/coursebook/blob/main/docs/notebooks/week-2_2.ipynb" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Download data

If you need to download the data for this lab, run the following code snippet. 

In [None]:
import os
import subprocess

if "data_lab-2_2" not in os.listdir(os.getcwd()):
    subprocess.run('wget "https://github.com/geog3300-agri3003/lab-data/raw/main/data_lab-2_2.zip"', shell=True, capture_output=True, text=True)
    subprocess.run('unzip "data_lab-2_2.zip"', shell=True, capture_output=True, text=True)
    if "data_lab-2_2" not in os.listdir(os.getcwd()):
        print("Has a directory called data_lab-2_2 been downloaded and placed in your working directory? If not, try re-executing this code chunk")
    else:
        print("Data download OK")

### Install packages

If you're working in Google Colab, you'll need to install the required packages that don't come with the Colab environment.

In [None]:
if 'google.colab' in str(get_ipython()):
    !pip install rioxarray
    !pip install mapclassify
    !pip install rasterio

## Geospatial data

Geospatial data represents geographic features or phenomena as data in a computer program or file. 

There are two main types of geospatial data: 

* **vector data** - point, line, or polygon geometries
* **raster data** - images and arrays

There are two components to geospatial data:

* **Positional information** describing location, shape, and extent (e.g. an `(x, y)` coordinate pair representing the location of a weather station)
* **Attribute information** describing characteristics of the phenomenon or entity (e.g. a name:value pair recording the name of the weather station `name:'Perth Airport'`)
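
As a minimal sketch of these two components, the weather station example could be written as a plain Python dictionary. The coordinate values and the `elevation_m` attribute are illustrative assumptions, not values taken from the lab data.

```python
# a hypothetical weather station: the coordinates and elevation are made-up values
station = {
    # positional information: an (x, y) coordinate pair (longitude, latitude)
    "geometry": {"type": "Point", "coordinates": (115.97, -31.94)},
    # attribute information: name:value pairs describing the station
    "properties": {"name": "Perth Airport", "elevation_m": 15.0},
}

x, y = station["geometry"]["coordinates"]
print(f"{station['properties']['name']} is located at x={x}, y={y}")
```

Keeping the positional and attribute information side by side for each feature is the same idea used by the vector data structures introduced later in this lab.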

## Import modules

In [None]:
import os

import rioxarray as rxr
import xarray as xr
import plotly.express as px
import numpy as np
import geopandas as gpd

import plotly.io as pio

# setup renderer
if 'google.colab' in str(get_ipython()):
    pio.renderers.default = "colab"
else:
    pio.renderers.default = "jupyterlab"

In [None]:
data_path = os.path.join(os.getcwd(), "data_lab-2_2")

## Raster data in Python

### Raster data model

Raster data breaks the Earth's surface up into a grid of cells (pixels). Each pixel is assigned a value that corresponds to the geographic feature or phenomenon of interest. For example, pixels in a raster precipitation dataset would be assigned a numeric value that represents the amount of precipitation that fell at that location. Pixels in a land cover map would have an integer value that corresponds to a land cover class label. The values assigned to pixels in a raster dataset are the attribute information. 

![](https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/week-2-raster-data-lc.png)
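
To make the idea of pixel values as attribute information concrete, here is a small sketch of a land cover raster stored as a grid of integer class labels. The class codes and names are hypothetical and chosen only for illustration.

```python
import numpy as np

# hypothetical class labels: the integer codes and names are illustrative only
class_names = {1: "cropland", 2: "forest", 3: "water"}

# a tiny 3 x 4 land cover raster; each pixel stores one integer class label
land_cover = np.array([
    [1, 1, 2, 2],
    [1, 1, 2, 3],
    [1, 2, 3, 3],
])

# count how many pixels are assigned to each class
labels, counts = np.unique(land_cover, return_counts=True)
for label, count in zip(labels, counts):
    print(f"{class_names[int(label)]}: {count} pixels")
```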

The size of the pixels relative to their position on the Earth's surface determines the spatial detail that can be resolved. A land cover map with pixels that represent a 1 km x 1 km portion of the Earth's surface will not be able to identify features such as individual buildings.

The figure below shows the 2018 European Space Agency (ESA) Climate Change Initiative (CCI) land cover map. Each pixel represents a 300 m x 300 m area on the Earth’s land surface and a pixel can only represent a single land cover type. If you look at the bottom two zoomed in maps you can see some limitations of representing land cover using 300 m x 300 m spatial resolution pixels. The shape of land cover features is poorly represented by the “block-like” arrangement of pixels and there is variation in land cover within a single pixel (a mixed pixel problem).

![](https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/week-2-raster-data-model.png)

Raster data represents geographic features and variables (e.g. elevation, reflectance in UAV images) as a grid of values (pixels). In Python, a data structure called an array is used to store and organise pixels in a raster dataset. Typically, <a href="https://numpy.org/doc/stable/user/absolute_beginners.html#what-is-an-array" target="_blank">NumPy `ndarray`</a> objects are used for storing raster data in Python programs. 

### Arrays: NumPy `ndarray`s

NumPy is a library used for scientific and numerical computing and is based around an N-dimensional `ndarray` object. An `ndarray` is a grid of elements of the same data type. The dimensions of a NumPy `ndarray` are called axes. NumPy `ndarray`s can be created from sequences of values (e.g. stored in lists, tuples, other `ndarray`s).

![](https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/week-2-numpy-ndarray.jpg)

We can create a simple 2-dimensional `ndarray` using the `array()` function. A `ndarray` with 2-dimensions is a matrix with rows arranged on the 0 axis and columns arranged on the 1 axis.

In [None]:
# create a 2D ndarray
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
arr2d

The rank (or number of dimensions) of a `ndarray` is the number of axes.

In [None]:
# the rank (ndim) of an ndarray is the number of axes
print(f"the rank of the ndarray is {arr2d.ndim}")

The `shape` of an `ndarray` tells us the size of each axis (how many elements are arranged along that axis). 

In [None]:
# the shape of the ndarray 
print(f"the shape of the ndarray is {arr2d.shape}")

`ndarray`s can be multidimensional. They can have more than two dimensions. Remote sensing images typically comprise multiple 2-dimensional arrays with each array corresponding to a raster of reflectance measured in a particular wavelength. This 3-dimensional raster data structure can be represented as a NumPy `ndarray` with the bands dimension on axis 0 (each band is a raster for a given wavelength), rows (height of each raster) on axis 1, and columns (width of each raster) on axis 2. 

Let's create a `ndarray` with 3-dimensions. 

In [None]:
# create a 3D ndarray
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d

In [None]:
print(f"the rank of the ndarray is {arr3d.ndim}")
print(f"the shape of the ndarray is {arr3d.shape}")

The concept of N-dimensional arrays can be extended further. For example, a 4-dimensional `ndarray` could store a sequence of 3-dimensional `ndarray`s where the fourth dimension is time and the object represents remote sensing images captured across multiple dates. 
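
A minimal sketch of such a 4-dimensional `ndarray` is shown below. The axis order (dates, bands, rows, columns) and the sizes are assumptions made for illustration, and the array is filled with zeros rather than real reflectance values.

```python
import numpy as np

# hypothetical image stack: 3 dates, 4 bands, 5 rows, 6 columns
n_dates, n_bands, n_rows, n_cols = 3, 4, 5, 6
arr4d = np.zeros((n_dates, n_bands, n_rows, n_cols))

print(f"the rank of the ndarray is {arr4d.ndim}")
print(f"the shape of the ndarray is {arr4d.shape}")

# indexing the first axis returns the 3D image (bands, rows, columns) for one date
first_date = arr4d[0]
print(f"the shape of one image is {first_date.shape}")
```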

### Subsetting NumPy ndarrays

A subsetting operation is when you select values from a NumPy `ndarray` object based on their index locations. These operations are generally referred to as indexing and slicing when working with NumPy `ndarray` objects.

![](https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/week-2-subset-numpy-ndarray.jpg)

We can extract a value from a NumPy `ndarray` based on its index location. For example, the first element of a 2-Dimensional `ndarray` is at location `[0, 0]` (i.e. the 0th row and 0th column). 

In [None]:
first_element = arr2d[0, 0]
print(first_element)

We can use the `:` symbol to specify slices of a NumPy `ndarray` to subset. For example, the following are three different ways of slicing the first two rows.

Note that the slice is not inclusive of the index location after the `:` symbol. So, `arr2d[0:2, ]` would select the first two rows of `arr2d` - row 0 and row 1 (remember Python indexes from 0).

In [None]:
two_rows_1 = arr2d[0:2, ]
print(two_rows_1)

two_rows_2 = arr2d[0:2]
print(two_rows_2)

two_rows_3 = arr2d[:2]
print(two_rows_3)

We can use multiple slices for different axes. For example, if we wanted to subset values from a selection of rows and columns. 

In [None]:
two_rows_cols = arr2d[:2, 1:]
print(two_rows_cols)

It is important to remember that subsetting a NumPy `ndarray` using index locations and slicing only considers the location of elements within an array. You will need to consider and understand how locations within an array relate to "real world" dimensions such as geographic location or time.
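
The sketch below illustrates one way index locations can be related to geographic locations for a north-up raster. The corner coordinates, pixel size, and the `pixel_centre()` helper are all hypothetical; real datasets store this georeferencing information in their metadata.

```python
import numpy as np

# assumed georeferencing for a north-up raster (illustrative values only)
x_min, y_max = 500000.0, 6500000.0  # coordinates of the top-left corner
pixel_size = 10.0                    # pixel width and height in metres

arr = np.arange(12).reshape(3, 4)    # 3 rows, 4 columns

def pixel_centre(row, col):
    """Return the (x, y) coordinate of the centre of the pixel at [row, col]."""
    x = x_min + (col + 0.5) * pixel_size
    y = y_max - (row + 0.5) * pixel_size  # y decreases as the row index increases
    return x, y

row, col = 1, 2
print(f"value {arr[row, col]} at index [{row}, {col}] is centred on {pixel_centre(row, col)}")
```

Doing this bookkeeping by hand is error prone, which is one motivation for using data structures that keep coordinates attached to the array values.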

### Xarray

<a href="https://docs.xarray.dev/en/stable/" target="_blank">Xarray</a> is a Python package that builds on top of NumPy's array-based data structures, but provides extra tools and functions that are useful for working with geospatial and Earth Science datasets. For example, `xarray.DataArray` data structures are objects that store multidimensional arrays of raster values and also store metadata information that describe the raster values. 

`xarray` also provides convenient functions for reading raster data from geospatial data files on disk into memory as `xarray.DataArray` objects which we can use in our Python programs while retaining geographic and temporal information about the raster values stored in the array.

Specifically, while a NumPy `ndarray` stores just the raster values and has some properties such as the `shape` (number of elements along each axis) and `ndim` (the number of dimensions of the array), it does not explicitly store any geospatial, temporal, or other geographic metadata. `xarray` solves this problem by reading raster data into an `xarray.DataArray` object with:

* `values`: the multidimensional array of raster values
* `dims`: a tuple of names for the dimensions of the array (e.g. instead of axis 0 describing the 0th (row) dimension of an array, that dimension can have a descriptive label such as latitude)
* `coords`: a `dict`-like container of array-like objects that describe the location of an array element along each dimension (e.g. a 1D array of latitude values describing the location on the Earth's surface for each row in the array)
* `attrs`: a `dict` of metadata attributes describing the dataset
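
As a minimal sketch of how these pieces fit together, the example below builds an `xarray.DataArray` by hand from a NumPy `ndarray`. The values, coordinate labels, and attributes are made up for illustration; in practice they are usually read from a file.

```python
import numpy as np
import xarray as xr

# made-up raster values and coordinates for illustration
values = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

da = xr.DataArray(
    values,
    dims=("latitude", "longitude"),
    coords={
        "latitude": [-31.0, -32.0],          # one value per row
        "longitude": [116.0, 117.0, 118.0],  # one value per column
    },
    attrs={"units": "mm", "description": "hypothetical precipitation"},
)

print(da.dims)
print(da.coords["longitude"].values)
print(da.attrs)
```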

`xarray.DataArray` objects can be stored within a larger container called `xarray.Dataset`. An `xarray.Dataset` can store many `xarray.DataArray` objects that share `dims` and `coordinates`. This is useful if you have different arrays of different `Variables` that correspond to the same locations and time-periods (e.g. you could have a separate array for temperature and precipitation values organised within a single `xarray.Dataset`).

![Schematic of an xarray.Dataset (source: xarray Getting Started)](https://docs.xarray.dev/en/stable/_images/dataset-diagram.png)
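
The following sketch shows one way an `xarray.Dataset` holding two variables on a shared grid could be constructed. The variable names, dimension names, and values are assumptions for illustration only.

```python
import numpy as np
import xarray as xr

# made-up values: 2 time steps on a 2 x 3 grid
shape = (2, 2, 3)
temperature = np.full(shape, 25.0)
precipitation = np.zeros(shape)

ds = xr.Dataset(
    data_vars={
        "temperature": (("time", "latitude", "longitude"), temperature),
        "precipitation": (("time", "latitude", "longitude"), precipitation),
    },
    coords={
        "time": np.array(["2020-01-01", "2020-01-02"], dtype="datetime64[ns]"),
        "latitude": [-31.0, -32.0],
        "longitude": [116.0, 117.0, 118.0],
    },
)

# each variable is an xarray.DataArray that shares the Dataset's dims and coords
print(list(ds.data_vars))
print(ds["temperature"].dims)
```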

**Why is `xarray` useful for geospatial data?** 

* The `dims` and `coordinates` of an `xarray.DataArray` mean we can subset values from an array using latitude, longitude, time, or whatever a coordinate describes; we're not just restricted to subsetting values based on their index location within an array
* `xarray.Dataset` objects provide a container to store multidimensional arrays (e.g. many variables and time points) that are common in geography, Earth Sciences, meteorology, and agriculture (e.g. multispectral satellite images of the same location over time, or arrays of different meteorological variables)
* useful functions for reading, analysing and visualising raster or array-like geospatial data that are common across many spatial data science workflows
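
The difference between index-based and coordinate-based subsetting can be sketched with a small, made-up `xarray.DataArray`. The dimension names and coordinate values below are illustrative assumptions; `isel()` selects by index location and `sel()` selects by coordinate label.

```python
import numpy as np
import xarray as xr

# made-up 3 x 3 raster with latitude and longitude coordinates
da = xr.DataArray(
    np.arange(9).reshape(3, 3),
    dims=("latitude", "longitude"),
    coords={"latitude": [-31.0, -32.0, -33.0], "longitude": [116.0, 117.0, 118.0]},
)

# index-based subsetting: row 1, column 2
by_index = da.isel(latitude=1, longitude=2)

# label-based subsetting: the pixel nearest to a location of interest
by_label = da.sel(latitude=-32.1, longitude=117.9, method="nearest")

print(by_index.item(), by_label.item())
```

Both selections return the same pixel here, but the second one is expressed in terms of where the pixel is on the Earth's surface rather than where it sits in the array.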

### Data input

The `rioxarray` package provides tools for reading and writing raster geospatial data files into `xarray.DataArray` objects.

Let's pass the path to a GeoTIFF file of raster data into the `rioxarray` `open_rasterio()` function:

In [None]:
s2_summer_path = os.path.join(data_path, "week-2-s2-summer-2020.tif")
rds = rxr.open_rasterio(s2_summer_path)

We have used `rioxarray` to read raster data stored in a GeoTIFF file into our program as an `xarray.DataArray` object referenced by the variable `rds`. We can print the `xarray.DataArray` object to inspect its metadata. 

In [None]:
rds

The raster data stored in the GeoTIFF file is a satellite image of a field in the Wheatbelt. The data is captured by the <a href="https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED" target="_blank">Sentinel-2</a> satellite and it measures reflectance in many spectral bands (e.g. blue light, green light, red light, and near infrared light ...). This raster dataset has 23 bands, including some ancillary bands providing information about image quality and atmospheric conditions when the image was captured (we want to know if clouds are obscuring the satellite's view of the land surface).  

The attributes property of `rds` stores geospatial metadata such as the coordinate reference system (CRS) and the extent of the dataset. We can access this information via the `rds` object's `rio` accessor:

In [None]:
print('CRS:', rds.rio.crs)
print('Resolution:', rds.rio.resolution())
print('Bounds:', rds.rio.bounds())
print('Width:', rds.rio.width)
print('Height:', rds.rio.height)

The raster data values are stored as arrays within the `xarray.DataArray` object and can be accessed via the `values` property.

In [None]:
arr = rds.values
print(f"the shape of the array is {arr.shape}")
arr

In subsequent labs we will explore `xarray` and raster data formats in more detail. For now, let's focus on visualising raster data stored in `xarray.DataArray` objects.

### Static images

`xarray.DataArray` objects have a <a href="https://docs.xarray.dev/en/stable/generated/xarray.plot.imshow.html" target="_blank">`plot.imshow()`</a> method that will render array based data as an image. Band 4 of the `xarray.DataArray` stores reflectance of red light off the Earth's surface. Let's plot it: 

In [None]:
rds.sel(band=4).plot.imshow()

The default visualisation using `plot.imshow()` returns a static image with `xarray.DataArray` labels and coordinates plotted. The default colour palette is viridis (yellow-green-blue shades). However, we can change this to a colour palette that relates to red reflectance using the `cmap` parameter of the `plot.imshow()` method. Let's use the `"Reds"` colour palette here. 

In [None]:
rds.sel(band=4).plot.imshow(cmap="Reds")

#### Recap quiz

**The blue band (storing reflectance of blue light) in Sentinel-2 images is band 2 in the `xarray.DataArray` object referenced by `rds`. Can you plot blue band reflectance and select a sensible colour palette for visualising the spatial variation in blue reflectance?**

**Hint: you can find a list of colour palettes <a href="https://matplotlib.org/stable/users/explain/colors/colormaps.html#classes-of-colormaps" target="_blank">here</a>.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>
    
```python
rds.sel(band=2).plot.imshow(cmap="Blues")
```
</details>

**The plots that you have generated so far have the `xarray.DataArray` labels attached to them (e.g. plot titles and all the band names listed by the colourbar). Can you look at the different parameters of the <a href="https://docs.xarray.dev/en/stable/generated/xarray.plot.imshow.html" target="_blank"><code>plot.imshow()</code></a> method and identify which parameter we can use to stop labels being rendered? Generate a plot of green reflectance (band 3) without labels and using the `"Greens"` colour palette.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

Here, we need to use the <code>add_labels</code> parameter and pass in False as an argument. 

```python
rds.sel(band=3).plot.imshow(cmap="Greens", add_labels=False)
```
</details>

### Colour and composite images

A particular colour is defined by the intensity of light in different parts of the visible spectrum (e.g. yellow is a mixture of light in red and green wavelengths).

Colour is represented by combinations (addition) of red, green, and blue light. Red, green, and blue are primary colours and combine to form white. An absence of red, green, and blue is black. Secondary colours can be formed by the addition of primary colours of varying intensities (e.g. yellow is the addition of red and green, magenta is the addition of red and blue, and cyan is the addition of green and blue). 

Computer displays consist of red, green, and blue sub-pixels, which when activated with different intensities, are perceived as different colours. The range of colours that can be displayed on a computer display is called the gamut. Colour in computer programs is represented as a three byte hexadecimal number with byte 1 corresponding to red, byte 2 corresponding to green, and byte 3 corresponding to blue. Each byte can take the range of 0 to 255 in decimal. 0 indicates the absence of a colour and 255 indicates saturation of that colour:

* white: 255 255 255
* black: 0 0 0
* red: 255 0 0 
* green: 0 255 0
* blue: 0 0 255
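
The three byte representation can be sketched in a few lines of Python. The `rgb_to_hex()` helper below is hypothetical and written only to illustrate how three 0 to 255 intensities map to a six digit hexadecimal colour code.

```python
def rgb_to_hex(red, green, blue):
    """Convert three 0-255 intensities into a hexadecimal colour string."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each intensity must be between 0 and 255")
    # each byte is written as two hexadecimal digits
    return f"#{red:02x}{green:02x}{blue:02x}"

print(rgb_to_hex(255, 255, 255))  # white
print(rgb_to_hex(255, 0, 0))      # red
print(rgb_to_hex(255, 255, 0))    # yellow: red + green
```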

![Additive and subtractive colour models (source: CRCSI (2017))](https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/week-2-colour-models-crcsi.png)

Computer displays represent colour through varying the intensity of sub-pixel displays of red, green, and blue light. Variability in data values in multiband rasters can be visualised by relating data values in one band to the intensity of one of the primary colours on the computer display. Visualising a multiband raster in this way creates an additive RGB or colour composite image - it is called a composite image because each pixel is a composite of red, green, and blue light.

Above we rendered the red, green, and blue band reflectance from the Sentinel-2 image separately. However, if we combine these reflectance measures into a composite image (e.g. where red reflectance is represented by sub-pixel intensity of red light) we can create a true colour image as if we were looking down on the Earth's surface with our eyes.

We can select multiple bands from the `xarray.DataArray` object that correspond to red, green, and blue reflectance to render a true colour image. We need to pass a `list` of bands into the `sel()` method of the `xarray.DataArray` object referenced by the variable `rds`.

In [None]:
rds.sel(band=[4, 3, 2]).plot.imshow(vmin=0, vmax=0.4, add_labels=False)
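
The `vmin` and `vmax` arguments set the reflectance values that are mapped to the minimum and maximum colour intensity. The sketch below shows a roughly equivalent linear stretch written with NumPy; it uses random numbers as a stand-in for reflectance data, so the array and its shape are assumptions rather than the lab's Sentinel-2 image.

```python
import numpy as np

# stand-in for reflectance data with shape (bands, rows, columns); values are random
rng = np.random.default_rng(0)
rgb_bands = rng.uniform(0.0, 0.6, size=(3, 4, 5))

vmin, vmax = 0.0, 0.4

# linear stretch: vmin maps to 0, vmax maps to 1, values outside the range are clipped
stretched = np.clip((rgb_bands - vmin) / (vmax - vmin), 0.0, 1.0)

# move the band axis to the end to get the (rows, columns, 3) layout used for RGB images
rgb_image = np.moveaxis(stretched, 0, -1)

print(rgb_image.shape)
print(rgb_image.min(), rgb_image.max())
```

Lowering `vmax` brightens the image because smaller reflectance values reach full colour intensity, at the cost of saturating the brightest pixels.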

### Interactive raster visualisations



Using the <a href="https://plotly.com/python-api-reference/generated/plotly.express.imshow.html" target="_blank">Plotly Express</a> `imshow()` function we can create interactive visualisations of raster data stored in `xarray.DataArray` objects. 

The `px.imshow()` function takes in a 2D or 3D (for RGB images) NumPy `ndarray` as its first argument. The `values` of a `xarray.DataArray` are stored in NumPy `ndarray` objects, so we can select a band and pass it into `px.imshow()`. Let's visualise red band reflectance as an interactive image (hovering your cursor over the image will return a text popup with the red reflectance value at that location). 

In [None]:
px.imshow(rds.sel(band=4).values)

#### Recap quiz

**Above, we have an interactive image of red band reflectance. However, the image is quite small and the colour palette could be more intuitive to indicate it's an image of red light reflectance. Can you look at the parameters in the <a href="https://plotly.com/python-api-reference/generated/plotly.express.imshow.html" target="_blank">`px.imshow()` docs</a> that we could use to visualise reflectance using a red colour palette and increase the height of the plot?**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

The <code>height</code> parameter lets you set the figure height in pixels.

The <code>color_continuous_scale</code> parameter lets you set the colour palette.
    
```python
px.imshow(rds.sel(band=4).values, color_continuous_scale="Reds", height=500)
```
</details>

### Plotting raster time series

In many cases, we'll have raster data that covers the same location on the Earth's surface but is captured on different dates. A good example of this is remotely sensed satellite data which captures spectral reflectance data for the same location each time the satellite overpasses a location. In this instance, each band (or 2D array) might represent an observation for a variable on different dates. For example, green normalised difference vegetation index (GNDVI) values use green and near infrared reflectance to represent the greenness of a location. We can visualise GNDVI through time to represent vegetation growth dynamics (e.g. the green up of a crop after planting). 
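
GNDVI is commonly computed as the normalised difference of near infrared and green reflectance. The sketch below applies that formula to made-up reflectance values; how the GNDVI file used in this lab was generated (including any fitting or smoothing) is not shown here.

```python
import numpy as np

# made-up reflectance values for a 2 x 2 patch of pixels
green = np.array([[0.10, 0.08], [0.12, 0.05]])
nir = np.array([[0.30, 0.40], [0.12, 0.45]])

# GNDVI = (NIR - green) / (NIR + green); values range from -1 to 1
gndvi = (nir - green) / (nir + green)
print(gndvi.round(2))
```

Higher values indicate pixels where near infrared reflectance is much greater than green reflectance, which is typical of dense, healthy vegetation.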

Let's read in a raster dataset of GNDVI values for different dates.

In [None]:
s2_gndvi_path = os.path.join(data_path, "gndvi_2020_bf66_fitted.tif")
gndvi_rds = rxr.open_rasterio(s2_gndvi_path)

In [None]:
gndvi_rds

If we inspect the dataset, we can see that it has 52 bands. Each band is an array of GNDVI corresponding to a week of the year. In 2020, canola was grown in this field. 

In a previous lab we introduced the concept of a subplot. This is a good use case for a faceted figure. We can represent each week as a subplot, align the subplots sequentially through time, and use the same mapping of data values to colour to make comparisons of greenness across weeks easy.

The `xarray.DataArray` `plot.imshow()` method has `col` and `col_wrap` parameters. The `col` parameter can be passed a `dim` for which the faceted subplots are created. Passing in the `"band"` `dim` here will generate a separate image for each of the arrays along the dimension specified. Here, that will create a new GNDVI image for each week. The `col_wrap` parameter specifies how many images are laid out along one row on the display. 

In [None]:
gndvi_rds.plot.imshow(col="band", col_wrap=10)

You should be able to see the green up and green down dynamics of vegetative crop growth in this facet plot.

#### Recap quiz

**There is a GeoTIFF file `ndyi_2020_bf66_fitted.tif` in the `data_lab-2_2` folder. It stores normalised difference yellowness index (NDYI) values for each week of the year. Can you read this file into your program and plot each week's NDYI values as an image to visualise change in yellowness of the canola crop canopy through the growing season? Generate this figure as a faceted plot and select a suitable colour map to visualise change in yellowness.**

**Hint: you can find a list of colour palettes <a href="https://matplotlib.org/stable/users/explain/colors/colormaps.html#classes-of-colormaps" target="_blank">here</a>.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>
    
```python
s2_ndyi_path = os.path.join(data_path, "ndyi_2020_bf66_fitted.tif")
ndyi_rds = rxr.open_rasterio(s2_ndyi_path)
ndyi_rds.plot.imshow(col="band", col_wrap=10, cmap="afmhot")
```
</details>

## Vector data in Python

### Vector data model

Vector data uses point, line, or polygon geometries to represent geographic features. 

**Coordinate pairs:** point locations or the vertices in lines and polygons are represented using coordinate pairs. The coordinate pairs indicate where that feature is located on the Earth's surface (relative to an origin); longitude and latitude are commonly used as coordinate pairs. 

**Attribute information:** vector data also stores non-spatial attribute information which describe characteristics of the geographic phenomenon or entity represented by the geometry feature.

![](https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/week-2-vector-data.jpg)

### GeoPandas GeoDataFrame

A GeoPandas `GeoDataFrame` is a tabular data structure for storing vector geospatial data and is based on a regular pandas `DataFrame`. 

A `GeoDataFrame` consists of columns of non-spatial attributes similar to a pandas `DataFrame`. However, a `GeoDataFrame` also has a `geometry` column which is a `GeoSeries` of geometries for the spatial data associated with each row.

In Python, geometries are represented as <a href="https://shapely.readthedocs.io/en/stable/geometry.html" target="_blank">Shapely</a> `Geometry` objects. The `geometry` column in a GeoPandas `GeoDataFrame` is a `GeoSeries` of Shapely `Geometry` objects. Printing a <a href="https://shapely.readthedocs.io/en/stable/geometry.html" target="_blank">Shapely</a> `Geometry` object returns a Well Known Text (WKT) string description of the geometry (e.g. `POINT (0 1)`). The `geometry` column of a `GeoDataFrame` (or a `GeoSeries`) can be viewed as a sequence of Shapely `Geometry` objects:

```
a_geoseries = [POINT (0 1), POINT (0 2), POINT (2 3)]
```
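
A small sketch of creating Shapely point geometries and printing their WKT representation is shown below. The coordinates are arbitrary values chosen for illustration.

```python
from shapely.geometry import Point

# create point geometries from (x, y) coordinate pairs
points = [Point(0, 1), Point(0, 2), Point(2, 3)]

for point in points:
    # printing a geometry returns its WKT representation
    print(point)

# the x and y coordinates of a point can be accessed as properties
print(points[0].x, points[0].y)
```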

Shapely provides tools for representing geometries in Python programs. It does not provide tools for reading geometry data from disk or handling attribute data. GeoPandas `GeoDataFrame` and `GeoSeries` combine Shapely's functionality for handling geometries with tools for reading and writing vector data, handling attributes, and visualisation. Therefore, we will focus on using `GeoDataFrame`s in these labs.  

Let's read in a GeoJSON file storing the elevation of points sampled across the same field in Western Australia that we have been exploring using raster data.

In [None]:
elev_gdf_path = os.path.join(os.getcwd(), "data_lab-2_2", "week-2-bf66-elevation.geojson")
elev_gdf = gpd.read_file(elev_gdf_path)
elev_gdf.head()

Printing out the `head()` of the `GeoDataFrame` `elev_gdf` clearly illustrates the tabular structure for representing vector data. Attributes are stored in columns, the locational information which is a `POINT` geometry object is stored in a `geometry` column, and each row corresponds to one geographic feature.
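
To see how this tabular structure is assembled, the sketch below builds a small `GeoDataFrame` by hand. The coordinates, elevation values, and the choice of `EPSG:4326` as the coordinate reference system are assumptions for illustration and are not taken from the lab's GeoJSON file.

```python
import geopandas as gpd
from shapely.geometry import Point

# made-up sample points; the coordinates and elevations are illustrative only
gdf = gpd.GeoDataFrame(
    {"Elevation": [250.1, 251.4, 249.8]},
    geometry=[Point(116.00, -31.00), Point(116.01, -31.00), Point(116.00, -31.01)],
    crs="EPSG:4326",
)

print(gdf.head())
print(type(gdf.geometry))      # the geometry column is a GeoSeries
print(gdf.geom_type.unique())  # the geometry type of the features
```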

### Interactive vector visualisations

`GeoDataFrame`s have a helpful <a href="https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html" target="_blank">`explore()` method</a> for rendering spatial data on a "slippy" web map.

In [None]:
elev_gdf.explore()

This clearly shows the location of points within the field. However, it is not very informative about each point's elevation value. We can change the colour of each point to represent the variability in elevation across the field. To do this we need to use the `column` parameter of the `explore()` method to specify the column in the `GeoDataFrame` that we wish to represent using colour. We can also specify a colour palette to use with the `cmap` parameter.

Executing the following code will render the elevation data with low elevations in blue shades and higher locations in yellow shades. 

In [None]:
elev_gdf.explore(column="Elevation", cmap="cividis")

### Choropleth mapping

<a href="https://clauswilke.com/dataviz/geospatial-data.html#choropleth-mapping" target="_blank">Choropleth maps</a> use a feature's fill colour to visualise spatial variation in a variable. A continuous colour palette is used to represent variation in the values of attributes of a vector spatial dataset. For a more detailed review of choropleth maps please see <a href="https://geographicdata.science/book/notebooks/05_choropleth.html#choropleth-mapping" target="_blank">Rey et al. (2020)</a> and <a href="https://clauswilke.com/dataviz/geospatial-data.html#choropleth-mapping" target="_blank">Wilke (2019)</a>.

Let's create a choropleth map of wheat crop yields at the national level in 2020 downloaded from <a href="https://www.fao.org/faostat/en/#home" target="_blank">FAOSTAT</a>.  

In [None]:
gdf_wheat_yield_2020 = gpd.read_file(os.path.join(data_path, "fao_wheat_crop_yield_2020.geojson"))

In [None]:
gdf_wheat_yield_2020.head()

We'll get a simple choropleth map with the default viridis colour palette if we pass in the column label for wheat yield `"yield_100g_ha"` as an argument to the `column` parameter.

In [None]:
gdf_wheat_yield_2020.explore(column="yield_100g_ha")

You might find that the thick borders for country outlines make it harder to spot spatial patterns and trends in wheat crop yields. Let's remove the borders. If you look at the docs for <a href="https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html" target="_blank">`explore()`</a> you will see there is a `style_kwds` parameter to which we can pass a `dict` of styling configurations. 

A `dict` is a data structure of key:value pairs. The `dict` passed to `style_kwds` contains keys that describe a visual element of the display we wish to adjust and values that specify how it should be adjusted. For example, the `stroke` key determines how the border of geometric features on the map is represented. We can set this to `False` to remove the border. Note, `dict` objects are specified by enclosing key:value pairs in braces `{}`.

```python
{"stroke": False}
```
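
If `dict` objects are new to you, the short sketch below shows how to create one, look up a value, and add or update key:value pairs. The `"color"` and `"weight"` keys are used here only to demonstrate `dict` syntax; check the `explore()` docs for the styling keys that `style_kwds` accepts.

```python
# a dict of styling options; "color" and "weight" are used here only to show dict syntax
style = {"stroke": False, "color": "black"}

# look up a value using its key
print(style["stroke"])

# add a new key:value pair or update an existing one
style["weight"] = 1
style["color"] = "grey"

print(style)
```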

In [None]:
gdf_wheat_yield_2020.explore(column="yield_100g_ha", style_kwds={"stroke": False})

#### Recap quiz

**The opacity of the fill colour on the choropleth maps is set to 0.5. Can you find a parameter in the <a href="https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html" target="_blank">`explore()` docs</a> to change the `fillOpacity` to 0.75?**

**Hint: you can use search tools and find tools to find words on the <a href="https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html" target="_blank">`explore()` docs</a> page.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>
    
```python
gdf_wheat_yield_2020.explore(column="yield_100g_ha", style_kwds={"stroke": False, "fillOpacity": 0.75})
```
</details>

### Categorical vector maps

Sometimes the variable that we wish to visualise on a map using colour is not continuous but is categorical. An example could be the crop type associated with a polygon feature of field boundaries. In these cases we're mapping a categorical value to a colour and a change in colour does not represent an increase or decrease in a numeric value, but a change in group or class. 

In these cases qualitative (or discrete) colour scales should be used to represent groups (i.e. data where there is no logical ordering). Thus, qualitative colour scales should not represent gradients of light to dark or use colours that can be interpreted as having an implied ordering. Often, it is sensible to select colours that relate to the category (e.g. on land cover maps using green for vegetated categories, blue for water etc.). 
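
A qualitative colour scale can be thought of as a lookup from category to colour. The sketch below uses hypothetical land cover categories and hand-picked hexadecimal colours to illustrate the idea; it is not the colour scheme used by any particular mapping library.

```python
# hypothetical categories mapped to colours that relate to each category
category_colours = {"crop": "#ffd92f", "forest": "#228b22", "water": "#1f78b4"}

# made-up category labels for four features
fields = ["crop", "water", "crop", "forest"]

# features in the same category get the same colour; there is no implied ordering
field_colours = [category_colours[category] for category in fields]
print(field_colours)
```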

Let's make a categorical map of the crop type in a field for a selection of fields near Carnarvon in Western Australia. The data is derived from the <a href="https://www.agriculture.gov.au/abares/aclump/catchment-scale-land-use-of-australia-update-december-2020" target="_blank">Catchment scale land use of Australia</a> product.

In [None]:
clum_carnarvon_path = os.path.join(data_path, "clum_land_use_carnarvon.geojson")
clum_gdf = gpd.read_file(clum_carnarvon_path)

In [None]:
clum_gdf.head()

#### Recap quiz

**Can you make a categorical interactive map of the crop type for each field? You will need to use the <a href="https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html" target="_blank">`explore()` docs</a> to identify which parameter to use to let `explore()` know it is visualising categorical data. You should also pass in a qualitative colour palette as an argument to the `cmap` parameter. A list of <a href="https://matplotlib.org/stable/users/explain/colors/colormaps.html#qualitative" target="_blank">qualitative colour palettes can be found here</a>.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>
    
```python
clum_gdf.explore(column="Commod_dsc", categorical=True, cmap="Set2")
```
</details>