# Overview

Many climate and meteorological datasets come as gridded rasters in data formats such as NetCDF and GRIB. We will use [XArray](http://xarray.pydata.org/) to read, process and visualize a gridded raster dataset.

Xarray builds on NumPy and is inspired by libraries like pandas, bringing labeled dimensions and coordinates to raster datasets. It is particularly suited for working with multi-dimensional time-series raster datasets. It also integrates tightly with Dask, which allows one to scale raster data processing using parallel computing. XArray provides [Plotting Functions](https://xarray.pydata.org/en/stable/user-guide/plotting.html) based on Matplotlib.

In this section, we will take the [Gridded Monthly Temperature Anomaly Data](https://data.giss.nasa.gov/gistemp/) from 1880-present from GISTEMP and visualize the temperature anomaly for the year 2021.

## Setup and Data Download

The following blocks of code will import the required packages and download the dataset to your Colab environment.

In [46]:
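import os
# CartoPy supplies the map projections used later (install it first if your environment lacks it)
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import xarray as xr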

In [5]:
data_folder = 'data'
output_folder = 'output'

if not os.path.exists(data_folder):
    os.mkdir(data_folder)
if not os.path.exists(output_folder):
    os.mkdir(output_folder)

In [6]:
def download(url):
    filename = os.path.join(data_folder, os.path.basename(url))
    if not os.path.exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)

filename = 'gistemp1200_GHCNv4_ERSSTv5.nc'
data_url = 'https://github.com/spatialthoughts/python-dataviz-web/raw/main/data/gistemp/'

download(data_url + filename)

Downloaded data/gistemp1200_GHCNv4_ERSSTv5.nc


## XArray Basics

By convention, XArray is imported as `xr`. We use Xarray's `open_dataset()` function to read the gridded raster. The result is an `xarray.Dataset` object.


In [8]:
file_path = os.path.join(data_folder, filename)
ds = xr.open_dataset(file_path)

The NetCDF file contains a grid of values for each month from 1880-2021 at a spatial resolution of 2 degrees. Let's understand what is contained in a Dataset.

* *Variables*: This is similar to a band in a raster dataset. We have 2 variables in this dataset: `tempanomaly` and `time_bnds`. Each variable contains an array of values.
* *Dimensions*: This is similar to the number of array axes. The dataset has 4 dimensions: a 2D grid of pixels (`lat` and `lon`), multiple time steps (`time`), and `nv`, which indexes the start and end bounds stored in `time_bnds`.
* *Coordinates*: These are the labels for values in each dimension. We have labels for `lat`, `lon` and `time`.
* *Attributes*: This is the metadata associated with the dataset.
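
To make these four terms concrete, here is a minimal sketch that builds a tiny Dataset with the same layout. The grid size, the values and the attribute text are invented for illustration and are not the GISTEMP data; the name `toy` is used so that it does not overwrite `ds`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical miniature grid: 3 monthly time steps, 2 latitudes, 4 longitudes
time = pd.to_datetime(['2021-01-15', '2021-02-15', '2021-03-15'])
lat = np.array([-45.0, 45.0])
lon = np.array([-135.0, -45.0, 45.0, 135.0])

# Made-up values, one per (time, lat, lon) cell
values = np.arange(24, dtype='float32').reshape(3, 2, 4)

toy = xr.Dataset(
    data_vars={'tempanomaly': (('time', 'lat', 'lon'), values)},
    coords={'time': time, 'lat': lat, 'lon': lon},
    attrs={'title': 'Synthetic example grid'},
)

print(list(toy.data_vars))  # variable names
print(dict(toy.sizes))      # dimension names and their lengths
print(list(toy.coords))     # coordinate labels
print(toy.attrs)            # dataset-level metadata
```

The same four properties (`data_vars`, `sizes`, `coords`, `attrs`) can be inspected on the `ds` object loaded above.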


In [43]:
ds

A Dataset consists of one or more `xarray.DataArray` objects. A DataArray holds a single variable along with its dimension names, coordinates and attributes. You can access each variable using the `dataset.variable_name` syntax.

Let's see the `time_bnds` variable. This contains a 2D array which holds the starting and ending time of each averaging period.

In [48]:
ds.time_bnds
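
As a rough sketch of how such a bounds variable is laid out, the example below builds a two-period Dataset by hand. The dates are invented, and the layout (a `time` by `nv` array with start in column 0 and end in column 1) is an assumption modeled on the description above rather than read from the file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical bounds for two monthly averaging periods:
# column 0 holds the start, column 1 holds the end
bounds = np.array(
    [['2021-01-01', '2021-02-01'],
     ['2021-02-01', '2021-03-01']],
    dtype='datetime64[ns]',
)
toy = xr.Dataset(
    {'time_bnds': (('time', 'nv'), bounds)},
    coords={'time': pd.to_datetime(['2021-01-15', '2021-02-15'])},
)

# Attribute-style and key-style access return the same DataArray
same = toy.time_bnds.identical(toy['time_bnds'])

# Start and end of the first averaging period
start, end = toy.time_bnds.isel(time=0).values
print(same, start, end)
```

Key-style access (`toy['time_bnds']`) is the safer choice when a variable name clashes with an existing Dataset attribute or method.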

The main variable of interest is `tempanomaly`, which contains the grid of temperature anomaly values at different times. Let's select that variable and store it as `da`.

In [50]:
da = ds.tempanomaly
da

## Selecting Data

XArray provides a powerful way to select subsets of data, using a framework similar to Pandas. Similar to Pandas' `loc` and `iloc` methods, XArray provides `sel` and `isel` methods. Since DataArray dimensions have names, these methods allow you to specify which dimension to query.

Let's select the temperature anomaly values for the last time step. Since we know the index (-1) of the data, we can use the `isel` method.

In [58]:
da.isel(time=-1)

We can also specify a value to query using the `sel()` method.

In [60]:
da.sel(time='2021-12-15')
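
The difference between the two methods is easier to see on a small array. This is a minimal sketch with made-up values; `toy_da` is a hypothetical stand-in for `da`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical series with three monthly time steps
time = pd.to_datetime(['2021-10-15', '2021-11-15', '2021-12-15'])
toy_da = xr.DataArray(
    np.array([0.1, 0.2, 0.3]),
    coords={'time': time},
    dims=['time'],
    name='tempanomaly',
)

by_position = toy_da.isel(time=-1)        # integer position along 'time'
by_label = toy_da.sel(time='2021-12-15')  # coordinate label along 'time'

print(float(by_position), float(by_label))
```

Both calls return the same element here because the last position happens to carry the label `2021-12-15`.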

We can specify multiple dimensions to query for a subset. Let's extract the temperature anomaly at `lat=49`, `lon=-123` and `time='2021-06-15'`. This region experienced abnormally high temperatures in June 2021.

In [63]:
da.sel(lat=49, lon=-123, time='2021-06-15')

The `sel()` method also supports nearest neighbor lookups. This is useful when you do not know the exact label of the dimension, but want to find the closest one.

> Tip: You can use `interp()` instead of `sel()` to interpolate the value rather than take the closest one.

In [65]:
da.sel(lat=28.6, lon=77.2, time='2021-05-01', method='nearest')

In [67]:
da.interp(lat=28.6, lon=77.2, time='2021-05-15')
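
The sketch below contrasts the two ideas on a made-up 1-D profile. The nearest lookup uses `sel()` as above; the interpolated value is computed with plain `np.interp` as a stand-in for linear interpolation, so it illustrates the concept rather than calling `interp()` itself.

```python
import numpy as np
import xarray as xr

# Hypothetical profile sampled every 2 degrees of latitude
toy_da = xr.DataArray(
    np.array([0.0, 20.0, 40.0]),
    coords={'lat': [0.0, 2.0, 4.0]},
    dims=['lat'],
)

# Nearest lookup snaps the query to the closest grid label (lat=2.0)
nearest = toy_da.sel(lat=2.6, method='nearest')

# Linear interpolation weights the two neighbours (lat=2.0 and lat=4.0)
interpolated = float(np.interp(2.6, toy_da.lat.values, toy_da.values))

print(float(nearest.lat), float(nearest), interpolated)
```

The nearest lookup returns the value stored at an existing grid cell, while interpolation produces a value that may not exist anywhere in the grid.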

## Aggregating Data

A powerful feature of XArray is the ability to easily aggregate data across dimensions, which makes it well suited to many remote sensing analyses. We can aggregate the data to yearly time-steps using the `resample()` method, reducing the `time` dimension.

In [70]:
yearly = da.resample(time='Y').mean(dim='time')
yearly

In [68]:
da.mean(dim='time')
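
The two cells above behave differently: resampling keeps one value per year, while a plain mean over `time` collapses all time steps into one. Here is a small sketch on invented data. It uses the `'YS'` (year-start) frequency alias instead of `'Y'`; both group by calendar year but label the result with a different date.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical series: 24 monthly values, 1.0 throughout 2020 and 3.0 throughout 2021
time = pd.date_range('2020-01-01', periods=24, freq='MS')
toy_da = xr.DataArray(
    np.repeat([1.0, 3.0], 12),
    coords={'time': time},
    dims=['time'],
)

# One mean per calendar year; the 'time' dimension shrinks to 2 entries
yearly_mean = toy_da.resample(time='YS').mean(dim='time')

# One mean over the whole record; the 'time' dimension disappears
overall_mean = toy_da.mean(dim='time')

print(yearly_mean.values, float(overall_mean))
```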

Since `yearly` is already a DataArray of the `tempanomaly` variable, we use the indexing method `isel()` to extract the latest time slice.

In [None]:
# yearly is a DataArray, so it is indexed directly rather than by variable name
anomaly2021 = yearly.isel(time=-1)

We can now plot this data using the `imshow()` method from xarray.

In [None]:
from xarray.plot import imshow
imshow(anomaly2021)

To create a more informative map visualization, we need to reproject this grid to another projection. CartoPy supports a wide range of projections and can plot them using matplotlib. CartoPy creates a [GeoAxes](https://scitools.org.uk/cartopy/docs/latest/reference/generated/cartopy.mpl.geoaxes.GeoAxes.html) object and replaces the default `Axes` with it. This allows you to plot the data on a specified projection.

Reference: [CartoPy List of Projections](https://scitools.org.uk/cartopy/docs/latest/reference/crs.html?highlight=list#list-of-projections)

In [None]:
ax = plt.axes(projection=ccrs.Orthographic(0, 40))
ax.coastlines()
fig = plt.gcf()
fig.set_size_inches(5,5)
plt.show()

We can create a GeoAxes with a custom Orthographic projection and plot the temperature anomaly data on it. The `transform` argument specifies the CRS of the original dataset.

In [None]:
ax = plt.axes(projection=ccrs.Orthographic(0, 30))
ax.coastlines()
anomaly2021.plot.imshow(ax=ax,
    vmin=-4, vmax=4, cmap='coolwarm',
    transform=ccrs.PlateCarree())

fig = plt.gcf()
fig.set_size_inches(5,5)
plt.tight_layout()
plt.show()

We can further customize the map by adjusting the colorbar. 

Reference: [matplotlib.pyplot.colorbar](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.colorbar.html)

In [None]:
cbar_kwargs = {
    'orientation':'horizontal',
    'location': 'bottom',
    'fraction': 0.025,
    'pad': 0.05,
    'extend':'neither'
}

ax = plt.axes(projection=ccrs.Orthographic(0, 30))
ax.coastlines()
anomaly2021.plot.imshow(
    ax=ax,
    vmin=-4, vmax=4, cmap='coolwarm',
    transform=ccrs.PlateCarree(),
    add_labels=False,
    cbar_kwargs=cbar_kwargs)

fig = plt.gcf()
fig.set_size_inches(10,10)
plt.title('Temperature Anomaly in 2021 (°C)', fontsize = 14)

output_folder = 'output'
output_path = os.path.join(output_folder, 'anomaly.jpg')
plt.savefig(output_path, dpi=300)
plt.show()

## Exercise

Display the map in an Equal Earth projection.