<font color="red"><h1>JOINT METEODATA \& PY4CA WEBINAR SERIES ON ```Handling NetCDF Data In Python```</h1></font>


<img width = "300px" align="right" src="img/flyer.jpg" />

<p>
    <font color='red'>YouTube:</font> <a href="https://www.youtube.com/channel/UCIjBRO4kq2a8sGOjZ1qM0wg">https://www.youtube.com/@meteodata</a> <br />
<font color='red'>Google Scholar:</font> <a href="https://scholar.google.com/citations?user=awGdXUsAAAAJ&hl=en">https://scholar.google.com/citations?user=awGdXUsAAAAJ&hl=en</a> <br />
<font color='red'>ResearchGate:</font> <a href="https://www.researchgate.net/profile/Jeffrey-Aryee">https://www.researchgate.net/profile/Jeffrey-Aryee</a> <br />
<font color='red'>ORCID:</font> <a href="https://orcid.org/0000-0002-4481-1441">https://orcid.org/0000-0002-4481-1441</a> <br />
    <font color='red'>Facebook:</font> <a href="https://www.facebook.com/honourable.gyerph.7/">https://www.facebook.com/honourable.gyerph.7/</a>  <br/>
<font color='red'>LinkedIn:</font> <a href="https://www.linkedin.com/in/jnaaryee/">https://www.linkedin.com/in/jnaaryee/</a> <br /><br /><br />
<font color='red'>E-mail:</font> jnaaryee@knust.edu.gh / metdata.knust@gmail.com
</p>
<hr>

<h2> <a href="https://github.com/jeffjay88/Joint_MeteoData_PY4CA_Webinar_Series"><font color='blue'>Notebook is accessible here</font></a></h2>

<br /><br />


<h3> Day 1 Outline</h3>

<ul>
    <li> Basic Terminologies </li>
    <li> Import Xarray </li>
    <li> Open and Close .NC File(s) </li>
    <li> Datasets & DataArrays </li>
    <li> Data Attributes </li>
    <li> Data Variables Selection</li>
    <li> Data Slicing Along Dimension(s) </li>
    <li> Basic Data Visualization Across Specific Dimension(s)</li>
    <li> Question Time </li>
    
</ul>


<br /><br />



# What is a NetCDF file?

NetCDF (Network Common Data Form) is a file format for storing multidimensional scientific data (variables) that combine:

* <b>Spatial information:</b> Location on the surface of the Earth.
* <b>Time information:</b> The time of day and year at which the measurements were taken.
* <b>Scientific values:</b> Quantities such as temperature and rainfall.

Together, these features make the datasets large, so the format needs to be scalable and appendable (new data is generated every day). NetCDF is designed for this: it stores latitudes, longitudes, time, and the scientific measurements in a single, easy-to-read structure. NetCDF is both a data format and a set of software libraries created to support the scientific community, particularly the geosciences. The primary source of information about netCDF data is the <a href="https://www.unidata.ucar.edu/software/netcdf/">Unidata community</a>. <br /><br />


<b><font color="red">Data in netCDF format is:</font></b>

* <b>Self-Describing:</b> A netCDF file includes information about the data it contains.
* <b>Portable:</b> A netCDF file can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.
* <b>Scalable:</b> Small subsets of large datasets in various formats may be accessed efficiently through netCDF interfaces, even from remote servers.
* <b>Appendable:</b> Data may be appended to a properly structured netCDF file without copying the dataset or redefining its structure.
* <b>Sharable:</b> One writer and multiple readers may simultaneously access the same netCDF file.
* <b>Archivable:</b> Access to all earlier forms of netCDF data will be supported by current and future versions of the software.<br /><br />
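
To make the "self-describing" property concrete, here is a minimal sketch that builds a tiny dataset in memory with Xarray. The variable name `toy`, the grid, and all attribute values are made up for illustration and are not taken from the CRU file used later.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy grid: 3 months x 2 latitudes x 2 longitudes
time = pd.date_range("2020-01-01", periods=3, freq="MS")
lat = [5.0, 6.0]
lon = [-1.0, 0.0]

toy = xr.Dataset(
    data_vars={
        # (dimension names, values, attributes describing the variable)
        "pre": (
            ("time", "lat", "lon"),
            np.ones((3, 2, 2)),
            {"units": "mm/month", "long_name": "precipitation"},
        ),
    },
    coords={"time": time, "lat": lat, "lon": lon},
    attrs={"title": "Toy precipitation dataset (illustrative only)"},
)

# The printed summary lists dimensions, coordinates, variables, and attributes
print(toy)
print(toy["pre"].attrs["units"])
```

The metadata (units, names, title) travels with the values, which is what lets a reader interpret a netCDF file without separate documentation.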

<img align="center" width="900px" src="img/netCDF.png" />

<br /> <br />    
In Python, several packages can read and write NetCDF files, for example:

- Xarray
- netCDF4

# Xarray

<img src="img/xarray.png" />

# Details on Xarray: <a href="https://earth-env-data-science.github.io/lectures/xarray/xarray_intro.html"> https://earth-env-data-science.github.io/lectures/xarray/xarray_intro.html </a>

# Remote Access To The Working Data (Hosted on My Google Drive)

***use gdown for the data download***

![image.png](attachment:image.png)

In [None]:
try:
    import gdown
except ImportError:
    # gdown is not installed yet: install it, then import it
    !pip install gdown
    import gdown

In [None]:
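import os

# Shareable Google Drive link to the working dataset
url = 'https://drive.google.com/file/d/1oJI2GUL3S4aPEzmtlaQb4tJ1LHhH9rM3/view?usp=share_link'
FILE_ID = url.split('/')[-2]  # the file ID is the second-to-last path segment
download_id = 'https://drive.google.com/uc?id=' + FILE_ID

# Download only if the file is not already in the working directory
if not os.path.exists('Africa_cru_data.nc'):
    gdown.download(download_id)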

# Import the Xarray package

In [None]:
import xarray as xr
from warnings import filterwarnings
filterwarnings('ignore')

# Open and Close NetCDF File(s)
<br />
You can read a <b>single .nc file</b> with the <b>open_dataset</b> function, and <b>multiple .nc files</b> with <b>open_mfdataset</b>.

In [None]:
file = 'Africa_cru_data.nc'
ds = xr.open_dataset(file)

In [None]:
# Close Dataset
ds.close()

# Variable Selection

In [None]:
ds_Data = xr.open_dataset('Africa_cru_data.nc')

In [None]:
# Option 1
ds_Data.pre

In [None]:
# Option 2
ds_Data['pre']

# Datasets (ds) & DataArrays (da)


***DataArray***
- A multi-dimensional array with labeled or named dimensions. DataArray objects add metadata such as dimension names, coordinates, and attributes to underlying “unlabeled” data structures such as numpy and Dask arrays. If its optional name property is set, it is a named DataArray. <br /><br />

***Dataset***
- A dict-like collection of DataArray objects with aligned dimensions. Thus, most operations that can be performed on the dimensions of a single DataArray can be performed on a dataset. Datasets have data variables, dimensions, coordinates, and attributes.

In [None]:
type(ds_Data)

In [None]:
type(ds_Data.pre)
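
The relationship between the two objects can also be seen without any file. The sketch below builds a small DataArray by hand and places it in a Dataset; the names `temp` and `toy_ds` and all the values are illustrative assumptions only.

```python
import numpy as np
import xarray as xr

# A DataArray: values plus named dimensions, coordinates, and attributes
temp = xr.DataArray(
    np.array([[25.0, 26.0], [27.0, 28.0]]),
    dims=("lat", "lon"),
    coords={"lat": [5.0, 6.0], "lon": [-1.0, 0.0]},
    name="tmp",
    attrs={"units": "degC"},
)

# A Dataset: a dict-like container of DataArrays that share coordinates
toy_ds = xr.Dataset({"tmp": temp, "pre": temp * 4})

print(type(temp).__name__)           # DataArray
print(type(toy_ds).__name__)         # Dataset
print(list(toy_ds.data_vars))        # ['tmp', 'pre']
print(type(toy_ds["pre"]).__name__)  # DataArray
```

Pulling a variable out of a Dataset, either as `toy_ds["pre"]` or `toy_ds.pre`, always gives back a DataArray.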

# Check Dataset Properties

***Check Data Variables***

```Data Variables: ``` data_vars

In [None]:
ds_Data.data_vars

***Check Shape and Size of Data Variables***

In [None]:
ds_Data.pre.shape

In [None]:
ds_Data.pre.size

***Co-ordinates & Dimensions***

In [None]:
ds_Data.coords

In [None]:
ds_Data.dims

***Data Attributes***

In [None]:
ds_Data.attrs

In [None]:
ds_Data.attrs['Conventions']

***Checking Attributes of A DataArray***

In [None]:
ds_Data.pre.attrs

In [None]:
ds_Data.pre.attrs['units']

# Selection / Subsetting Data

***Selections or slicing can be performed along any dimension.***

In [None]:
da = ds_Data.pre

***Selection along a single dimension***

In [None]:
da.sel(time='2020-01')

***Point Selection***

In [None]:
da.sel(lon=-3.1, lat=-12, method='nearest')

In [None]:
da.sel(lon=-3.1, lat=-12, method='ffill')
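
The two `method` options can return different grid points for the same request. Here is a small sketch on a made-up one-dimensional longitude axis (the array `toy` and its values are assumptions for illustration):

```python
import numpy as np
import xarray as xr

lon = np.array([-4.0, -3.0, -2.0])
toy = xr.DataArray(np.array([10.0, 20.0, 30.0]), dims="lon", coords={"lon": lon})

# 'nearest' picks the closest grid point to -3.1, which is lon = -3.0
nearest = toy.sel(lon=-3.1, method="nearest")

# 'ffill' picks the last grid point at or below -3.1, which is lon = -4.0
ffill = toy.sel(lon=-3.1, method="ffill")

print(float(nearest.lon), float(nearest))
print(float(ffill.lon), float(ffill))
```

Without a `method` argument, `sel` requires an exact coordinate match and raises a `KeyError` for a value such as -3.1 that is not on the grid.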

**Areal Selection/Slicing**

In [None]:
da.sel(lon=slice(-4,5), time=slice('2019-01','2020-12'))
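
Label-based slices in `sel` include both end points, and a partial date string such as `'2020-12'` covers the whole month. The sketch below checks this on a synthetic array; the grid and the name `toy` are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2018-01-01", periods=48, freq="MS")  # Jan 2018 to Dec 2021
lon = np.arange(-5.0, 7.0)                                 # -5, -4, ..., 6

toy = xr.DataArray(
    np.zeros((time.size, lon.size)),
    dims=("time", "lon"),
    coords={"time": time, "lon": lon},
)

subset = toy.sel(lon=slice(-4, 5), time=slice("2019-01", "2020-12"))

print(subset.shape)                                   # 24 months x 10 longitudes
print(float(subset.lon[0]), float(subset.lon[-1]))    # both end points are kept
```

Note that the slice bounds must follow the order of the coordinate: if a coordinate is stored in descending order (as latitude sometimes is), the bounds need to be given from high to low.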

In [None]:
da.lon

# Changing Longitude Convention (0 to 360 and -180 to 180)

<img align="left"  src="img/change_long.png" />

***Converting to the 0 to 360 range***

In [None]:
ds_Data.coords['lon'] = (ds_Data.coords['lon'] + 360) % 360 
ds_Data = ds_Data.sortby(ds_Data.lon)

***Converting to the -180 to 180 range***

In [None]:
ds_Data.coords['lon'] = (ds_Data.coords['lon'] + 180) % 360 - 180
ds_Data = ds_Data.sortby(ds_Data.lon)
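
To see what the two formulas do, here is a worked example on four made-up longitudes. It uses `assign_coords` instead of in-place assignment so that the original array is left untouched; the array `toy` is an assumption for illustration.

```python
import numpy as np
import xarray as xr

toy = xr.DataArray(
    np.array([1.0, 2.0, 3.0, 4.0]),
    dims="lon",
    coords={"lon": [-20.0, -10.0, 0.0, 10.0]},
)

# -180..180 to 0..360: negative longitudes wrap round to 340 and 350
toy_360 = toy.assign_coords(lon=(toy.lon + 360) % 360).sortby("lon")
print(toy_360.lon.values)   # lon becomes 0, 10, 340, 350

# 0..360 back to -180..180: 340 and 350 return to -20 and -10
toy_180 = toy_360.assign_coords(lon=(toy_360.lon + 180) % 360 - 180).sortby("lon")
print(toy_180.lon.values)   # lon becomes -20, -10, 0, 10
```

The `sortby` step matters: after the arithmetic the longitudes are no longer in increasing order, and slicing with `sel` expects a monotonic coordinate.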

# Groupby & Resampling

***Time-based Groupings***

***Producing Annual Totals / Annual Means***

In [None]:
da.groupby('time.year').sum('time')

In [None]:
da.groupby('time.year').mean('time')

***Monthly Climatologies***

In [None]:
# A monthly climatology is the long-term mean for each calendar month
da.groupby('time.month').mean('time')

***Seasonal Groupings***

In [None]:
da.groupby('time.season').mean('time')
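
The groupings above can be checked by hand on a short synthetic series. The sketch below uses 24 made-up monthly values (0 to 23) so that the results are easy to verify; the name `toy` is an assumption for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of synthetic monthly values: Jan 2019 = 0, ..., Dec 2020 = 23
time = pd.date_range("2019-01-01", periods=24, freq="MS")
toy = xr.DataArray(np.arange(24, dtype=float), dims="time", coords={"time": time})

annual_total = toy.groupby("time.year").sum("time")      # one value per year
monthly_clim = toy.groupby("time.month").mean("time")    # one value per calendar month
seasonal_mean = toy.groupby("time.season").mean("time")  # DJF, MAM, JJA, SON

print(annual_total.values)                  # totals for 2019 and 2020
print(float(monthly_clim.sel(month=1)))     # mean of the two Januaries
print(float(seasonal_mean.sel(season="DJF")))
```

Note that `groupby('time.season')` pools every December, January, and February in the record, regardless of which year they belong to.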

***Resampling*** <br />
Handles both downsampling and upsampling. The resampled dimension must be a datetime-like coordinate. If any intervals contain no values from the original object, they will be given the value NaN.

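In [None]:
# 'YS' (year start) is used because the plain 'Y' alias is deprecated in recent pandas releases
da.resample(time='YS').mean('time')

In [None]:
# 3-month bins, labelled by the first month of each bin
da.resample(time='3MS').mean('time')

***Seasonal Resampling***

In [None]:
# Quarters anchored to start in December, i.e. DJF, MAM, JJA, SON
da.resample(time='QS-DEC').mean('time')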
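
The NaN behaviour for empty intervals can be demonstrated with a short synthetic series that has no data in 2020. The values and the name `toy` are made up for illustration, and the `'YS'` (year start) frequency alias is used here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic series with observations in 2019 and 2021 only
time = pd.to_datetime(["2019-01-01", "2019-07-01", "2021-01-01", "2021-07-01"])
toy = xr.DataArray([1.0, 3.0, 5.0, 7.0], dims="time", coords={"time": time})

# Resampling builds one bin per year between the first and last timestamp,
# including 2020, which contains no values
annual_mean = toy.resample(time="YS").mean("time")

print(annual_mean.time.dt.year.values)  # 2019, 2020, 2021
print(annual_mean.values)               # the 2020 entry is NaN
```

This is the main practical difference from `groupby('time.year')`, which only returns the years that actually occur in the data.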

# Simple Visualizations

Xarray's <b>.plot()</b> chooses the plot type from the number of dimensions in the data: <br />
***1D    - line plots*** <br />
***2D    - Spatial / contour plots*** <br />
***3D ++ - Histogram*** <br />



In [None]:
ds_Data.pre.sel(lon=5, lat=12, method='nearest').plot()

In [None]:
ds_Data.pre.sel(lon=5, lat=12, method='nearest').resample(time='YS').sum('time').plot()

In [None]:
ds_Data.pre.sel(time='2020-01').plot()

In [None]:
ds_Data.pre.sel(time=slice('2020-01','2020-12')).plot()

# Simple Tasks

<ol>
    <li> Split the CRU data into 4 different climate regimes (CR): 1901-1930, 1931-1960, 1961-1990, and 1991-2020. </li>
    <li> Visualize the long-term climatological means and standard deviations of the annual totals for each climate regime. </li>
    <li> Create a latitude-by-month Hovmoller plot for each climate regime. </li>
    <li> Using CR1 as reference, estimate the magnitude of change for each CR, relative to CR1. </li>
</ol>

***Task 1*** <br />
***Split the CRU data into 4 different climate regimes (CR).*** <br />
***(1901-1930, 1931-1960, 1961-1990, 1991-2020)***

In [None]:
da_CR1 = da.sel(time=slice('1901','1930'))
da_CR2 = da.sel(time=slice('1931','1960'))
da_CR3 = da.sel(time=slice('1961','1990'))
da_CR4 = da.sel(time=slice('1991','2020'))


***Task 2*** <br />
***Visualize the long-term climatological means and standard deviations of the annual totals for each climate regime.***

In [None]:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(ncols=4, nrows=2, figsize=(18,8))
plt.subplots_adjust(right=0.87, wspace=0.5, hspace=0.5)

# Means
da_CR1.groupby('time.year').sum('time').mean('year').plot(ax=axes[0, 0], add_colorbar=False, vmax=1500, cmap='terrain_r')
da_CR2.groupby('time.year').sum('time').mean('year').plot(ax=axes[0, 1], add_colorbar=False, vmax=1500, cmap='terrain_r')
da_CR3.groupby('time.year').sum('time').mean('year').plot(ax=axes[0, 2], add_colorbar=False, vmax=1500, cmap='terrain_r')
cb = da_CR4.groupby('time.year').sum('time').mean('year').plot(ax=axes[0, 3], add_colorbar=False, vmax=1500, cmap='terrain_r')

cax = fig.add_axes([0.9, 0.55, 0.01, 0.35])
fig.colorbar(cb, cax=cax, orientation='vertical', extend='max')



# Standard Deviations (never negative, so use a range starting at 0 and a sequential colormap)
da_CR1.groupby('time.year').sum('time').std('year').plot(ax=axes[1, 0], add_colorbar=False, vmin=0, vmax=200, cmap='viridis')
da_CR2.groupby('time.year').sum('time').std('year').plot(ax=axes[1, 1], add_colorbar=False, vmin=0, vmax=200, cmap='viridis')
da_CR3.groupby('time.year').sum('time').std('year').plot(ax=axes[1, 2], add_colorbar=False, vmin=0, vmax=200, cmap='viridis')
cb = da_CR4.groupby('time.year').sum('time').std('year').plot(ax=axes[1, 3], add_colorbar=False, vmin=0, vmax=200, cmap='viridis')

cax = fig.add_axes([0.9, 0.1, 0.01, 0.35])
fig.colorbar(cb, cax=cax, orientation='vertical', extend='max')




***Task 3*** <br />
***Create a latitude-by-month Hovmoller plot for each climate regime.***

In [None]:
fig, axes = plt.subplots(ncols=2, nrows=2, figsize=(12,9))
plt.subplots_adjust(right=0.87, wspace=0.5, hspace=0.5)

da_CR1.groupby('time.month').mean(['lon','time']).plot.contourf(y='lat', ax=axes[0,0], add_colorbar=False, vmin=50, vmax=200, cmap='terrain_r')
axes[0,0].set_title('CR1', fontsize=20)

da_CR2.groupby('time.month').mean(['lon','time']).plot.contourf(y='lat', ax=axes[0,1], add_colorbar=False, vmin=50, vmax=200, cmap='terrain_r')
axes[0,1].set_title('CR2', fontsize=20)

da_CR3.groupby('time.month').mean(['lon','time']).plot.contourf(y='lat', ax=axes[1,0], add_colorbar=False, vmin=50, vmax=200, cmap='terrain_r')
axes[1,0].set_title('CR3', fontsize=20)

cb = da_CR4.groupby('time.month').mean(['lon','time']).plot.contourf(y='lat', ax=axes[1,1], add_colorbar=False, vmin=50, vmax=200, cmap='terrain_r')
axes[1,1].set_title('CR4', fontsize=20)

# Add A Single Colorbar
cax = fig.add_axes([0.9, 0.1, 0.025, 0.75])
fig.colorbar(cb, cax=cax, label='Rainfall (mm/month)', orientation='vertical', extend='both')


***Task 4*** <br />
***Using CR1 as reference, estimate the magnitude of change for each CR, relative to CR1.***

In [None]:
da_clim_CR1 = da_CR1.groupby('time.year').sum('time')
da_clim_CR2 = da_CR2.groupby('time.year').sum('time')
da_clim_CR3 = da_CR3.groupby('time.year').sum('time')
da_clim_CR4 = da_CR4.groupby('time.year').sum('time')

In [None]:
fig, axes = plt.subplots(ncols=4, figsize=(18,4))
plt.subplots_adjust(bottom=0.25, wspace=0.5, hspace=0.5)

cb = da_clim_CR1.mean('year').plot(ax = axes[0], add_colorbar=False, vmax=1500, cmap='terrain_r')
axes[0].set_title('CR1')

# Add A Single Colorbar 
cax = fig.add_axes([0.1, 0.075, 0.2, 0.025])
fig.colorbar(cb, cax=cax, label='Rainfall (mm/year)', orientation='horizontal', extend='max')



( da_clim_CR2.mean('year') - da_clim_CR1.mean('year') ).plot(ax = axes[1], vmin=-50, vmax=50, cmap='RdBu', add_colorbar=False)
( da_clim_CR3.mean('year') - da_clim_CR1.mean('year') ).plot(ax = axes[2], vmin=-50, vmax=50, cmap='RdBu', add_colorbar=False)
cb = ( da_clim_CR4.mean('year') - da_clim_CR1.mean('year') ).plot(ax = axes[3], vmin=-50, vmax=50, cmap='RdBu', add_colorbar=False)
axes[1].set_title('CR2')
axes[2].set_title('CR3')
axes[3].set_title('CR4')


# Add A Single Colorbar For The Differences
cax = fig.add_axes([0.35, 0.075, 0.5, 0.025])
fig.colorbar(cb, cax=cax, label='Difference (mm/year)', orientation='horizontal', extend='both')
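
The maps above show the absolute change. One possible extension, not part of the original task solution, is to express the change as a percentage of the reference regime. The sketch below uses two small made-up fields, `ref` and `new`, standing in for the CR1 climatology and a later one.

```python
import numpy as np
import xarray as xr

# Hypothetical annual-total climatologies on a 2 x 2 grid
ref = xr.DataArray([[100.0, 200.0], [0.0, 400.0]], dims=("lat", "lon"))  # reference (e.g. CR1)
new = xr.DataArray([[110.0, 180.0], [5.0, 400.0]], dims=("lat", "lon"))  # later regime

abs_change = new - ref

# Mask cells where the reference is zero to avoid dividing by zero;
# those cells come out as NaN in the percentage field
pct_change = 100 * abs_change / ref.where(ref != 0)

print(abs_change.values)
print(pct_change.values)
```

With the webinar data, the same two lines could be applied to `da_clim_CR2.mean('year')` and `da_clim_CR1.mean('year')`, assuming those variables are defined as in the cells above.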


# RECAP (ITEMS COVERED)

- Basic Terminologies
- Import Xarray
- Open and Close .NC File(s)
- Datasets & DataArrays
- Data Attributes
- Data Variables Selection
- Data Slicing Along Dimension(s)
- Basic Data Visualization Across Specific Dimension(s)

# THANK YOU

# QUESTIONS?