# How to read a netCDF file with Python

netCDF is a popular data format used in Earth science to store gridded datasets. It is particularly well suited to model outputs and gridded observational datasets. One key advantage of netCDF is that it is self-describing: the information needed to describe the stored data (called the *metadata*) is available directly in the same file in the form of *attributes*.

The standard module to read and write netCDF files in Python is called netCDF4. Complete documentation can be found here: http://unidata.github.io/netcdf4-python/netCDF4/index.html

A good tutorial on netCDF4 can be found here:
https://www.earthinversion.com/utilities/reading-NetCDF4-data-in-python/

There are also more advanced modules to read and write netCDF files, such as the module *xarray*. While these modules provide high-level functions to manipulate netCDF files, they tend to hide important details about the structure of these files. It is therefore recommended to know how to use a low-level module such as *netCDF4* before trying to use more advanced and powerful modules such as *xarray*.

## Install netCDF4 module

The module *netCDF4* must be installed on your system before you can use it.

To check whether it is already installed, simply import the module. If the module is installed, the *import* command runs without error. Otherwise, an error is raised.

In [None]:
# load netCDF4 module
import numpy as np
import netCDF4 as nc 

# if an error is issued, you need to install the module.
# In that case, open a terminal and execute the following command: pip install --upgrade netCDF4
# Then restart your Kernel, and try again to execute the import command. No error should be issued anymore.
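
The check described above can also be done without triggering an error. The sketch below uses `importlib.util.find_spec` from the standard library; the helper name `is_installed` is made up for this illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate `module_name` (hypothetical helper)."""
    return importlib.util.find_spec(module_name) is not None


# report the status of the two modules used in this notebook
for name in ("numpy", "netCDF4"):
    if is_installed(name):
        print(f"{name}: installed")
    else:
        print(f"{name}: missing, try 'pip install --upgrade {name}' in a terminal")
```

This only tells you whether the module can be found; importing it remains the definitive test.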

## Check what is in netCDF4 and access the help

In [None]:
print("Content of the module netCDF4:")
print(dir(nc))

In [None]:
# IPython/Jupyter shortcut to display the help of an object
nc.Dataset?

In [None]:
help(nc.Dataset)

## Import data from the climatology MIMOC

During this lab, we will work with the ocean climatology MIMOC (https://www.pmel.noaa.gov/mimoc/). 

To get the data files, you can either copy them from the *share* directory (path: shared/mav110/projekt/MIMOC), or import them from Canvas.

### To get it from the *share* directory, open a terminal and type the following command from your home directory:
- cp -r shared/mav110/projekt/MIMOC .

Important: do not forget the final dot *.*, which sets the destination to the folder in which you currently are (i.e. your home directory in this case).

Important 2: the option *-r* means 'recursively', i.e. the folder is copied together with its entire content. If you try to copy a folder without this option, you will get an error.

### To import it from Canvas, you must:
- download the file MIMOC_ML_v2.2_CT_SA.zip to your computer,
- upload the zip file to jupyterhub, and
- unzip its content in your working directory.

You should have twelve files, one for each month: MIMOC_ML_v2.2_CT_SA_MLP_monthXX.nc with XX a number from 1 to 12.

To unzip the file, open a terminal (using the + sign in the File Browser), move the zip file into a directory of your choice (e.g. MIMOC, using the Linux command mv), then uncompress the zip file (using the command unzip). Here is the list of commands to enter in the terminal from your home directory:
- mkdir MIMOC
- mv MIMOC_ML_v2.2_CT_SA.zip MIMOC
- cd MIMOC
- unzip MIMOC_ML_v2.2_CT_SA.zip
- cd

Important: The last command *cd* brings you back to your home directory.

We can now load the module *netCDF4* and then load the dataset using the command *nc.Dataset*.
Use the command *print* to see what the netCDF file contains.

In [None]:
# check that MIMOC files are accessible
import netCDF4 as nc
nc_file = 'MIMOC_ML_v2.2_CT_SA_MLP_month01.nc'
nc_fid =  nc.Dataset(nc_file)
print("Content of the netCDF file: ", nc_fid)
nc_fid.close()

## Structure of netCDF files

In [None]:
import netCDF4 as nc

nc_file = 'MIMOC_ML_v2.2_CT_SA_MLP_month01.nc'

# the 'with' statement can be used to open a file temporarily and close it automatically when leaving the code block
with nc.Dataset(nc_file) as nc_fid:
    print(nc_fid)

Every netCDF file contains metadata about the data stored in the file. The content of a file is organized into variables, dimensions, and attributes.
* Variables. Variables contain data stored in the NetCDF file. This data is typically in the form of a multidimensional array. Scalar values are stored as 0-dimension arrays.
* Dimensions. Dimensions can be used to describe physical space (latitude, longitude, height, and time) or indices of other quantities (e.g. weather station identifiers).
* Attributes. Attributes are modifiers for variables and dimensions. Attributes act as ancillary data to help provide context. An example of an attribute would be a variable's units or fill/missing values.

Look at the information printed above for your netCDF file. The file has two dimensions (LAT, LONG).
Variables that give the values for each element of a dimension are called *coordinates*. Here LATITUDE and LONGITUDE are the two coordinates.
The file also contains three variables of dimension (LAT, LONG), i.e. they are 2D tables. The size of these tables is thus (341, 720).

This can be checked by using the .shape attribute:


In [None]:
with nc.Dataset(nc_file) as nc_fid:
    print(f"Dim of Mixed-Layer Temperature array = {nc_fid['CONSERVATIVE_TEMPERATURE_MIXED_LAYER'].shape}")
    print(f"Dim of Mixed-Layer Salinity array = {nc_fid['ABSOLUTE_SALINITY_MIXED_LAYER'].shape}")
    print(f"Dim of Mixed-Layer Depth array = {nc_fid['DEPTH_MIXED_LAYER'].shape}")

### List of dimensions

Below, we create a dictionary whose keys are the names of the dimensions and whose values are the sizes of these dimensions.

In [None]:
nc_file = 'MIMOC_ML_v2.2_CT_SA_MLP_month01.nc'
with nc.Dataset(nc_file) as nc_fid:
    dict_dimensions = {}
    for dim in nc_fid.dimensions:
        dict_dimensions[dim] = nc_fid.dimensions[dim].size
    
print(dict_dimensions)

### List of variables

In [None]:
nc_file = 'MIMOC_ML_v2.2_CT_SA_MLP_month01.nc'
with nc.Dataset(nc_file) as nc_fid:
    dict_variables = {}
    for var in nc_fid.variables:
        dict_variables[var] = [nc_fid.variables[var].shape, nc_fid.variables[var].dtype]
    
print(dict_variables)

## Access data in a variable

To look at the data, you must use the syntax nc_fid['VARIABLE'][:]. The final '[:]' means that you want to read all the values in the variable.

* To look only at the first value, use the index '[0]'.
* To look only at the last value, use the index '[-1]'.
* To look at every second value, use the index '[::2]'.
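
The indexing forms listed above behave as follows once the values are held in a NumPy array. This is a small sketch on a made-up array standing in for a coordinate such as LATITUDE.

```python
import numpy as np

# toy stand-in for a coordinate variable (values invented for illustration)
lat = np.arange(-90, 91, 30)  # -90, -60, -30, 0, 30, 60, 90

first = lat[0]                # first value
last = lat[-1]                # last value
every_second = lat[::2]       # every second value, starting from the first
first_and_last = lat[[0, -1]] # a list of indices selects several values at once

print(first, last)
print(every_second)
print(first_and_last)
```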


In [None]:
nc_file = 'MIMOC_ML_v2.2_CT_SA_MLP_month01.nc'
with nc.Dataset(nc_file) as nc_fid:
    LATITUDE = nc_fid['LATITUDE'][:] # read all the latitude values

print('LATITUDE = ', LATITUDE)
print('First value = ', LATITUDE[0])
print('Last value = ', LATITUDE[-1])
print('Print every second values = ', LATITUDE[::2])

In [None]:
nc_file = 'MIMOC_ML_v2.2_CT_SA_MLP_month01.nc'
with nc.Dataset(nc_file) as nc_fid:
    LATITUDE = nc_fid['LATITUDE'][:] # read all the latitude values
    LONGITUDE = nc_fid['LONGITUDE'][:] # read all the longitude values
    
# print first and last value separately
print(LONGITUDE[0], LONGITUDE[-1])

# select the first and last values of Longitude in one go, using a list of indices
print(LONGITUDE[[0,-1]])

Let's read all the data from the netCDF file:

In [None]:
# load netcdf module
import netCDF4 as nc

# load latitude, longitude and data in python variable data
nc_file = 'MIMOC_ML_v2.2_CT_SA_MLP_month01.nc'
with nc.Dataset(nc_file) as nc_fid:
    LATITUDE = nc_fid['LATITUDE'][:] # read all the latitude values
    LONGITUDE = nc_fid['LONGITUDE'][:] # read all the longitude values
    TEMPERATURE = nc_fid['CONSERVATIVE_TEMPERATURE_MIXED_LAYER'][:,:] # note that it's a 2-dim variable
    SALINITY = nc_fid['ABSOLUTE_SALINITY_MIXED_LAYER'][:,:] # note that it's a 2-dim variable
    DEPTH_MIXED_LAYER = nc_fid['DEPTH_MIXED_LAYER'][:,:] # note that it's a 2-dim variable

print(TEMPERATURE)
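
Depending on how missing values are flagged in a file (for instance over land in an ocean dataset), netCDF4 may return a NumPy *masked array* rather than a plain array containing NaN. The sketch below, on a made-up 2x2 array, shows how the two representations relate; whether the MIMOC variables come back masked is something to check on your own output.

```python
import numpy as np

# toy field with one missing value (values invented for illustration)
raw = np.array([[1.0, np.nan],
                [3.0, 5.0]])

# turn NaN entries into masked entries
masked = np.ma.masked_invalid(raw)

n_missing = int(np.ma.count_masked(masked))  # number of masked entries
mean_valid = float(masked.mean())            # masked entries are ignored
as_nan = masked.filled(np.nan)               # back to a plain array with NaN

print(masked)
print("missing values:", n_missing)
print("mean of valid values:", mean_valid)
```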

## Some statistics and plots...

In [None]:
# print global mean and standard deviation of temperature (use nanmean and nanstd to ignore NaN values)
import numpy as np
print('mean TEMPERATURE:',np.nanmean(TEMPERATURE))
print('std TEMPERATURE:',np.nanstd(TEMPERATURE))
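
Note that `np.nanmean` gives every grid cell the same weight, whereas on a regular latitude-longitude grid the cells shrink towards the poles. A common approximation is to weight each cell by the cosine of its latitude. The sketch below illustrates the idea on a made-up 3x2 array held as a plain NumPy array with NaN for missing values; it is one possible approach, not the method used to build MIMOC.

```python
import numpy as np

# toy grid: 3 latitudes x 2 longitudes (values invented for illustration)
lat = np.array([-60.0, 0.0, 60.0])
temp = np.array([[0.0, 2.0],
                 [28.0, np.nan],
                 [4.0, 6.0]])

# weight of each cell, proportional to cos(latitude)
weights_1d = np.cos(np.deg2rad(lat))
weights_2d = np.broadcast_to(weights_1d[:, np.newaxis], temp.shape)

# keep only the cells with valid data
valid = ~np.isnan(temp)
weighted_mean = np.sum(temp[valid] * weights_2d[valid]) / np.sum(weights_2d[valid])
plain_mean = np.nanmean(temp)

print("unweighted mean   :", plain_mean)
print("area-weighted mean:", weighted_mean)
```

In this toy case the tropical value counts twice as much as each high-latitude value, so the weighted mean is higher than the unweighted one.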

In [None]:
# print the maximum temperature and indicate at which location the maximum temperature is reached
import numpy as np
print('max TEMPERATURE:',np.nanmax(TEMPERATURE))
index_flattened = np.nanargmax(TEMPERATURE)
index_2d = np.unravel_index(index_flattened, TEMPERATURE.shape )
print('index_flattened = ',index_flattened)
print('index_2d = ',index_2d)
print('latitude_max = ',LATITUDE[index_2d[0]])
print('longitude_max = ',LONGITUDE[index_2d[1]])

print('check that max is reached at position given by index_2d: TEMPERATURE[index_2d]=',TEMPERATURE[index_2d])
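
The conversion from a flattened index to a pair of (row, column) indices can be easier to follow on a small example. The array below is made up for illustration.

```python
import numpy as np

# toy 2x3 field with one missing value
field = np.array([[1.0, np.nan, 3.0],
                  [7.0, 2.0, 0.0]])

# position of the maximum in the flattened array, ignoring NaN
flat_index = np.nanargmax(field)

# convert the flattened position back to (row, column)
row, col = np.unravel_index(flat_index, field.shape)

print("flat index:", flat_index)
print("2D index  :", (row, col))
print("value     :", field[row, col])
```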

In [None]:
# plot temperature along 30W (crossing the Atlantic basin)
print('Longitude at index 720-30*2 = 660 : ', LONGITUDE[660])
import matplotlib.pyplot as plt
plt.plot(LATITUDE,TEMPERATURE[:,660])
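
Instead of computing the index 660 by hand, the column closest to a given longitude can be searched for. The sketch below assumes longitudes running from 0 to 359.5 in steps of 0.5 degree, which matches the 720 points and the index used above; check the LONGITUDE values of your file before relying on it.

```python
import numpy as np

# assumed longitude axis: 0, 0.5, ..., 359.5 degrees east (720 points)
lon = np.arange(0.0, 360.0, 0.5)

# 30W expressed as degrees east in the range [0, 360)
target = -30.0 % 360.0

# index of the grid point closest to the target longitude
idx = int(np.argmin(np.abs(lon - target)))

print("target longitude:", target)
print("nearest index   :", idx, "-> longitude", lon[idx])
```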

In [None]:
# plot temperature along different meridians
import matplotlib.pyplot as plt
for lon in range(0,360,60):
    plt.plot(LATITUDE,TEMPERATURE[:,lon*2],label = lon)
plt.legend()
plt.xlabel('latitude')
plt.ylabel('temperature [degC]')
plt.title('MIMOC surface temperature')
plt.savefig('MIMOC_surface_temperature_meridional_profiles.png')

## Try it yourself

* What is the global mean surface temperature in June?
* What is the global mean surface salinity in January? in June?
* Can you plot the meridional section of temperature at 30W for the twelve months on the same figure?
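
As a possible starting point for the questions involving several months, the twelve file names can be generated in a loop. The sketch assumes the month number is zero-padded to two digits, as in `month01` used throughout this notebook.

```python
# build the list of the twelve monthly file names (zero-padded month assumed)
file_names = [f"MIMOC_ML_v2.2_CT_SA_MLP_month{month:02d}.nc" for month in range(1, 13)]

for name in file_names:
    print(name)
```

Each name can then be passed to `nc.Dataset` inside a loop, exactly as done above for January.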