![image](https://hydro-jules.org/sites/default/files/Hydro-JULES_Logo_Positive.png)
# Hydro-JULES - netCDF access examples

This notebook provides a very basic walkthrough of how to use a Datalabs Python notebook, and how to open and read netCDF files in Python using the netCDF4 package. It shows how the spatial and time coordinates are commonly used within the netCDF files we have created for Hydro-JULES (which are relatively simple).

For an overview of netCDF files, this blog is a useful place to start: https://cslocumwx.github.io/blog/2015/01/19/python-netcdf-part1/


Contents (note: to run cells later in the notebook, the first few cells must be run in order to import packages and initialise variables):
- [Import packages required for this work](#import_packages)
- [Check what files are in the local store](#import_packages)
- [Browsing the file store and uploading files](#browse_files)
- [Quick file access to show how easy it is](#quick_file_access)
- [Explore the variables in a netCDF file](#explore_variables)
- [A closer look at the air temperature variable and data](#closer_look)
- [Explore the spatial variables: latitude/longitude, and X/Y](#explore_spatial)
- [Find the nearest cell for a given easting and northing, or latitude and longitude](#find_cell)
- [Explore the time variable](#explore_time)
- [Read a netCDF dataset over the web](#explore_web)


<a id='import_packages'></a>
### Import packages required for this work

In [None]:
from netCDF4 import Dataset

import numpy as np

import matplotlib.pyplot as plt

import os
from datetime import date
from datetime import datetime
from datetime import timedelta
import time
from calendar import monthrange

In [None]:
# The following line allows matplotlib plots to be shown in the notebook
%matplotlib inline

# Set a default figure size for matplotlib plots
plt.rcParams['figure.figsize'] = [15,10]

# Initialise a file name variable used throughout these examples
file = '/data/example-data/chess-met_tas_gb_1km_daily_20151201-20151231.nc'

#### Check what files are in the local data store

In [None]:
print("Current working directory is:")
print(os.getcwd()+"\n")

print("This directory contains this notebook and other notebooks in this data lab")
print(str(os.listdir('./'))+"\n")

print("Root directory contains:")
print(str(os.listdir('/'))+"\n")

print("The '/data' directory contains the principal folders for storing files relevant to this data lab")
print(str(os.listdir('/data'))+"\n")

print("The '/data/example-data' folder contains an example netCDF file")
print(str(os.listdir('/data/example-data'))+"\n")


<a id="browse_files"></a>
### Browsing the file store and uploading files

To view the files within the data storage in the browser, go back to the Datalabs window. When you are in the Hydro-JULES project, click "Storage" on the left, and select the "initialhj" storage. This will open a new window (titled "Minio browser").

![image](images/file-store-image-2.png)

This points to the /data folder.

Within the /data folder:
- /conda contains files for the conda environments for this project
- /notebooks contains these notebook files
- /example-data contains data files used in this notebook

You can create new folders in the /example-data folder and upload files.

<a id="quick_file_access"></a>
### Quick netCDF file access to show how easy it is

In [None]:
# Get a month's worth of CHESS air temperature data from a local file
file = '/data/example-data/chess-met_tas_gb_1km_daily_20151201-20151231.nc'

# Open the file as a netCDF4 "Dataset"
my_dataset = Dataset(file)

# Get the air temperature for the cell y=300, x=400, for all time intervals
tasData = my_dataset.variables['tas'][:,300,400]
# Get air temperature for all cells for the 1st time interval
tasMap = my_dataset.variables['tas'][0,:,:]
# Print this data
print(tasData)

# Close the dataset
my_dataset.close()

# Quick plot of time series and map
fig, axes = plt.subplots(ncols=2, nrows=1, figsize=(10, 5))
axes[0].plot(tasData)
# origin must be the string 'lower' (not a set) so that y increases upwards
mp = axes[1].imshow(tasMap, origin='lower')
# Mark the cell used for the time series (x=400, y=300)
axes[1].scatter(400,300,s=50)
fig.colorbar(mp, ax=axes[1])


<a id="explore_variables"></a>
### Explore the variables in a netCDF file

In [None]:
# Write out the variables in a netCDF file
file = '/data/example-data/chess-met_tas_gb_1km_daily_20151201-20151231.nc'
my_dataset = Dataset(file)

print("Loop through variables and print variable name:")
for v in my_dataset.variables:
    print(v)
print("\n")

print("Select one variable by its name and print one metadata attribute ('units'), \
 and the dimensions and shape of the variable:")

print("Units : "+my_dataset.variables['tas'].units)
print("Dimensions : "+str(my_dataset.variables['tas'].dimensions))
print("Shape : "+str(my_dataset.variables['tas'].shape))
print("\n")

print("Print full metadata for all variables:")
print("\n")
print(my_dataset.variables)

my_dataset.close()  

<a id="closer_look"></a>
### A closer look at the air temperature variable and data

In [None]:
my_dataset = Dataset(file)

print("First get the air temperature 'tas' variable, and show its dimensions")
tas_var = my_dataset.variables['tas']
print("Dimensions : "+str(tas_var.dimensions))
print("\n")

print("Then get the air temp from this variable and check its type and shape:")
tas=tas_var[:]
print("Type : "+str(type(tas)))
print("Shape : "+str(tas.shape))
print("\n")

print("It's a 3D masked array (array of data values, plus another array of mask values) \
with a grid of X-Y values for each day of December 2015 ")
print("\n")

print("First, look at the mask")
print("It has the same shape as the values:")
tas_mask = tas.mask
print("Type : "+str(type(tas_mask)))
print("Shape : "+str(tas_mask.shape))
print("Unique values : "+str(np.unique(tas_mask)))
print("\n")

print("Make a quick plot of the mask for the first time step (note it is upside down):")
plt.imshow(tas_mask[0,:,:])

my_dataset.close() 


In [None]:
print("Note that even though the file is closed the data is available through the previously used python variable")
print("But the netCDF4 variable is not accessible now the file is closed")
print("\n")

try:
    print(tas_var)
except RuntimeError as err:
    print("Accessing the netCDF variable gives runtime error: {0}".format(err))
print("\n")

print("Select a small area of the mask to look at it in more detail")
fig,axs = plt.subplots(2)
axs[0].imshow(tas_mask[1,400:500,480:580])
axs[1].imshow(tas_mask[1,420:430,535:545])
plt.show()

print("Mask data for second image (False means values are not masked, True means they are masked):")
print(tas_mask[1,420:430,535:545])
print("\n")

tas_data = tas.data
print(tas_data[1,420:430,535:545])
print("\n")

print("Now look at the air temperature data (the right way up this time)")
print("Plot the mask, the data values only, and the entire masked array:")
plt.rcParams['figure.figsize'] = [15,10]
fig,axs = plt.subplots(1,3)
# origin='lower' puts row 0 at the bottom, so north is at the top of each plot
mp = axs[0].imshow(tas_mask[1,:,:], origin='lower')
axs[0].set_title('Mask')
fig.colorbar(mp,ax=axs[0],fraction=0.07, pad=0.04)
mp = axs[1].imshow(tas_data[1,:,:], origin='lower')
axs[1].set_title('Data')
fig.colorbar(mp,ax=axs[1],fraction=0.07, pad=0.04)
mp = axs[2].imshow(tas[1,:,:], origin='lower')
axs[2].set_title('Masked array')
fig.colorbar(mp,ax=axs[2],fraction=0.07, pad=0.04)
print("Plotting the whole masked array allows matplotlib to use the mask and understand the range of the values")
plt.show()
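
The behaviour of masked arrays described above can be tried out without opening a file. The following is a minimal sketch using a small synthetic array; the fill value and the names `raw`, `tas_demo` and `land_values` are made up for illustration and are not taken from the CHESS file.

```python
import numpy as np

# Synthetic 2x3 "temperature" grid; -99999.0 is an assumed fill value for this demo
fill_value = -99999.0
raw = np.array([[280.0, 282.0, fill_value],
                [fill_value, 284.0, 286.0]])

# Build a masked array: True in the mask marks cells to ignore (e.g. sea cells)
tas_demo = np.ma.masked_values(raw, fill_value)

print(tas_demo.mask)          # boolean array with the same shape as the data
print(tas_demo.data.mean())   # mean of the raw values, fill values included
print(tas_demo.mean())        # mean of the unmasked cells only

# compressed() returns just the unmasked values as a plain 1D array
land_values = tas_demo.compressed()
print(land_values)
```

Statistics computed on the masked array skip the masked cells, whereas statistics computed on `.data` include whatever fill values are stored there.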



<a id="explore_spatial"></a>
### Explore the spatial variables: latitude/longitude, and X/Y

In [None]:
# Find the lat long of the grid cell at x = 400, y=500
my_dataset = Dataset(file)

print("First get the x and y variables and show their dimensions and shape")
x_var = my_dataset.variables['x']
print("Dimensions : "+str(x_var.dimensions))
print("Shape : "+str(x_var[:].shape))
y_var = my_dataset.variables['y']
print("Dimensions : "+str(y_var.dimensions))
print("Shape : "+str(y_var[:].shape))
print("\n")

print("They are 1D arrays of the British National Grid easting and northing coordinate values")
print("of the centre points of the grid cells, in the same order as the cells in the data array")
print("\n")

print("For example, the easting and northing of the cell at index [y=500, x=400] are:")
print("Easting : "+str(x_var[400]))
print("Northing : "+str(y_var[500]))
print("\n")

print("There is also a variable containing the minimum and maximum easting and northing for each grid cell")
x_bnd_var = my_dataset.variables['x_bnd']
print("Dimensions : "+str(x_bnd_var.dimensions))
print("Shape : "+str(x_bnd_var[:].shape))
y_bnd_var = my_dataset.variables['y_bnd']
print("Dimensions : "+str(y_bnd_var.dimensions))
print("Shape : "+str(y_bnd_var[:].shape))
print("\n")

print("The boundaries of the cell at index [y=500, x=400] are:")
print("Easting : "+str(x_bnd_var[400,:]))
print("Northing : "+str(y_bnd_var[500,:]))
print("\n")

print("Next get the lat variable, and show its dimensions")
lat_var = my_dataset.variables['lat']
print("Dimensions : "+str(lat_var.dimensions))
print("\n")

print("Then get the latitude from this variable and check its type and shape:")
lat=lat_var[:]
print("Type : "+str(type(lat)))
print("Shape : "+str(lat.shape))
print("\n")

print("It's a 2D masked array (array of values, plus another array of mask values)")
print("The dimensions are Y and X : there is a different latitude value for each combination of Y and X")
print("\n")

print("But look at the shape of the mask - it's different from the values:")
lat_mask = lat.mask
print("lat mask type : "+str(type(lat_mask)))
print("lat mask shape : "+str(lat_mask.shape))
print("lat mask : "+str(lat_mask))
print("\n")

lat_data=lat.data
print("lat data type : "+str(type(lat_data)))
print("lat data shape : "+str(lat_data.shape))
print("\n")
print("The mask is just a single value 'False'. \
netCDF4 variables will be masked arrays, but we cannot expect the mask to be the same size as the data.")
print("Though often, particularly in Hydro-JULES datasets, the data variables will have a mask to enable \
the points with data (land points) to be readily identified")
print("\n")

print("The data is a numpy array, and we can simply select the value at the indices of interest.")
grid_lat = lat_data[500,400]
print("\n")

# Get the longitude directly in one line
grid_lon = my_dataset.variables['lon'][:].data[500,400]

print("Lat and Long at y=500, x=400 is:")
print([grid_lat,grid_lon])

my_dataset.close()  

<a id="find_cell"></a>
### Find the nearest cell for a given easting and northing, or latitude and longitude

In [None]:
easting = 529010
northing = 180220

my_dataset = Dataset(file)

# Get the easting and northing variables from the dataset
x = my_dataset.variables['x'][:]
y = my_dataset.variables['y'][:]

# Get the index of the closest value to the easting and northing (the index of the cell we want)
idx_e = (np.abs(x - easting)).argmin()
idx_n = (np.abs(y - northing)).argmin()
print('Original easting and northing : {}, {} \n'.format( *[ easting, northing ] ))
print('Nearest cell at easting {} , northing {} within index [{},{}] \n'.format( *[ x[idx_e] , y[idx_n] , idx_e, idx_n ]))

my_dataset.close()
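
The nearest-index lookup above can be checked on a small synthetic grid. This is only a sketch: the coordinate values, the helper `nearest_index` and the array `grid` are hypothetical and stand in for the real `x`, `y` and data variables.

```python
import numpy as np

# Synthetic 1 km grid of cell-centre coordinates (illustrative values only)
x_demo = np.arange(500.0, 10500.0, 1000.0)   # 500, 1500, ..., 9500
y_demo = np.arange(500.0, 8500.0, 1000.0)    # 500, 1500, ..., 7500

def nearest_index(coords, value):
    """Return the index of the entry in a 1D coordinate array closest to value."""
    return int(np.abs(coords - value).argmin())

idx_x = nearest_index(x_demo, 4210.0)   # easting of interest
idx_y = nearest_index(y_demo, 6890.0)   # northing of interest

# Data variables in this notebook are indexed (time, y, x), so the y index comes first
grid = np.arange(y_demo.size * x_demo.size).reshape(y_demo.size, x_demo.size)
print(idx_y, idx_x, grid[idx_y, idx_x])
```

Because `x` and `y` are independent 1D arrays, each index can be found separately, which is not the case for the 2D latitude and longitude arrays.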



The lat and lon variables are 2-dimensional arrays giving the value at each cell (defined by easting and northing).
This means identifying the nearest cell for a given lat and long is not so simple:
it is not just a matter of finding the nearest latitude and the nearest longitude separately.
The following code identifies the X and Y indices of the nearest cell for a given lat and long.

In [None]:
# Latitude and longitude of interest:
my_lat = 54.39
my_lon = -1.99

# Open the dataset
my_dataset = Dataset(file)
# retrieve the lat and long arrays
lat = my_dataset.variables['lat'][:]
lon = my_dataset.variables['lon'][:]
# Create new arrays of the distance for each cell from the given latitude and the given longitude
lat_dist = abs(lat-my_lat)
lon_dist = abs(lon-my_lon)
# Create an array of the absolute distance (in degrees)
dist_diff = np.sqrt( np.square(lat_dist) + np.square(lon_dist) )
# Find the cell where this distance is a minimum
idx = np.where( dist_diff == np.min(dist_diff) )

# The lat and lon arrays are ordered (y, x), so the first index is y and the second is x
my_y = idx[0][0]
my_x = idx[1][0]

# Get the air temp data for this location
temp = my_dataset.variables['tas'][ :, my_y, my_x ]

my_dataset.close()

print("Nearest grid cell (y, x) is : "+str(idx))
print("\n")

# Show the grids around this point
grid_size=2
minx = my_x-grid_size
maxx = my_x+grid_size+1
miny = my_y-grid_size
maxy = my_y+grid_size+1

with np.printoptions(precision=7):
    print("Latitude values nearby")
    print(lat[miny:maxy,minx:maxx])
    print("\n")
    print("Longitude values nearby")
    print(lon[miny:maxy,minx:maxx])
    print("\n")
    print("Distances from given lat/long nearby ")
    print(np.round(dist_diff[miny:maxy,minx:maxx],5))
    print("\n")
    print("Minimum distance:")
    print (np.min(dist_diff))
    print("\n")
print("Data for point [x={}, y={}]".format( my_x, my_y ))
plt.plot(temp)
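
The search above measures distance in degrees, which treats a degree of longitude as the same length as a degree of latitude. As a possible refinement, the sketch below uses the haversine great-circle distance instead, and `np.unravel_index` to recover the (y, x) indices. The function `great_circle_km`, the Earth radius value and the synthetic `lat_demo`/`lon_demo` arrays are assumptions made for illustration; the real CHESS arrays are not a regular latitude/longitude mesh.

```python
import numpy as np

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance in km; inputs in degrees, NumPy arrays or scalars."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# Synthetic 2D lat/lon arrays with shape (y, x); illustrative values only
lat_demo, lon_demo = np.meshgrid(np.array([54.0, 54.5, 55.0]),
                                 np.array([-3.0, -2.0, -1.0, 0.0]),
                                 indexing="ij")

# Distance from every cell to the point of interest
dist_km = great_circle_km(lat_demo, lon_demo, 54.39, -1.99)

# argmin works on the flattened array; unravel_index converts it back to (row, column)
iy, ix = np.unravel_index(dist_km.argmin(), dist_km.shape)
print(iy, ix, lat_demo[iy, ix], lon_demo[iy, ix])
```

For a single nearby point on a 1 km grid the two approaches will usually pick the same cell; the difference matters more at high latitudes or on coarser grids.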

<a id="explore_time"></a>
### Explore the time variable

In [None]:
# Use the time variable to identify the correct indices for dates of interest
start_date = date ( year=2015, month=12, day=3)        # 03/12/2015
end_date = date ( year=2015, month=12, day=18)         # 18/12/2015

var = 'tas'                                            # Air temperature

# Open the dataset
my_dataset = Dataset(file)

# Get the time variable from the dataset
print('Time variable units:')
print(my_dataset.variables['time'].units+':\n')
my_dates = my_dataset.variables['time'][:]

# Get the index of the start and end date 
# (the array "my_dates" holds the dates expressed as the number of days since 1 January 1961)
s_days = ( start_date - date( year=1961, month=1, day=1 ) ).days
idx_start = (np.abs(my_dates - s_days)).argmin()
e_days = ( end_date - date( year=1961, month=1, day=1 ) ).days
idx_end = (np.abs(my_dates - e_days)).argmin()

# Get the data (air temp) variable
tas = my_dataset.variables[var]

# Write out the metadata
print('Air temperature variable information:')
print(tas)
print("\n")

# Get the data between the start and end date index, at the cell indices (my_y, my_x) found in the previous section
# (the slice stops before idx_end; use idx_end+1 to include the end date itself)
ts_data = tas[ idx_start:idx_end, my_y, my_x ]

# Close the dataset
my_dataset.close()

print("Data for index {} to {} :".format( *[ idx_start, idx_end ] ))
print(ts_data)
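
The conversion between calendar dates and "days since 1961" used above can be sketched with the standard library alone. The helper names, the `EPOCH` constant and the synthetic `time_demo` axis are hypothetical, and the sketch assumes a standard calendar with one whole-day value per time step; check the `units` and `calendar` attributes of the real time variable before relying on it.

```python
from datetime import date, timedelta

# Reference date assumed from the "days since 1961" convention used in this notebook
EPOCH = date(1961, 1, 1)

def date_to_days(d, epoch=EPOCH):
    """Convert a calendar date to whole days since the epoch."""
    return (d - epoch).days

def days_to_date(n, epoch=EPOCH):
    """Convert whole days since the epoch back to a calendar date."""
    return epoch + timedelta(days=int(n))

# Synthetic time axis for December 2015 (one value per day)
first = date_to_days(date(2015, 12, 1))
time_demo = [first + i for i in range(31)]

demo_start = time_demo.index(date_to_days(date(2015, 12, 3)))
demo_end = time_demo.index(date_to_days(date(2015, 12, 18)))

# Slices exclude the end index, so add 1 to include 18 December
selected = [days_to_date(n) for n in time_demo[demo_start:demo_end + 1]]
print(demo_start, demo_end, selected[0], selected[-1], len(selected))
```

Converting the indices back to dates in this way is a quick check that the slice covers the intended period.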

<a id="explore_web"></a>
### Read a netCDF dataset over the web
netCDF files are commonly made available over the web using a THREDDS server.
Datasets made accessible in this way can be readily queried using exactly the same code as described above,
but reading from a THREDDS API end point instead of a file.

This example reads air temperature data from the CHESS dataset hosted by UKCEH's EIDC.
In this case the entire dataset is available, but the web-based access is considerably slower than file access.

In [None]:
# Instead of a file, we are now going to access data over the web
# The interface to the dataset, made accessible through a THREDDS server, is at:
url = "https://eip.ceh.ac.uk/thredds/dodsC/public-chess/driving_data/aggregation/tas_aggregation"

# We access the netCDF dataset object in the same way as before
my_dataset = Dataset(url)

for v in my_dataset.variables:
    print(v)
print("\n")

print(my_dataset)
print("\n")
print(my_dataset.dimensions)
print("\n")
print(my_dataset.variables)
print("\n")

print(my_dataset.variables["tas"])
print("\n")

# Get the time variable from the dataset
print("Time variable units:")
print(my_dataset.variables["time"].units+":\n")

# Whole time series from this aggregation is >50 years, so select an arbitrary subset for an arbitrary cell
my_dates = my_dataset.variables["time"][500:800]
ts_data = my_dataset.variables["tas"][500:800,400,500]

plt.plot(ts_data)


my_dataset.close()
