# How to read a netCDF file with Python

The netCDF format is a popular data format used in Earth science to store gridded datasets. It is particularly well suited to model outputs and gridded observational datasets. One key advantage of netCDF is that it is self-describing: the information needed to describe the stored data (called the *metadata*) is available directly in the same file, in the form of *attributes*.

The standard module to read and write netCDF files in Python is called netCDF4. Its complete documentation can be found here: http://unidata.github.io/netcdf4-python/netCDF4/index.html

There are also more advanced modules to read and write netCDF files, such as *xarray*. While these modules provide high-level functions to manipulate netCDF files, they tend to hide important details about the structure of these files. It is therefore recommended to learn a low-level module such as *netCDF4* before moving on to more advanced and powerful modules such as *xarray*.

In [5]:
# load netCDF4 module
import netCDF4 as nc 

# If an error is raised, the netCDF4 module needs to be installed.
# In that case, open a terminal and run the following command: pip install netCDF4
# Then restart the kernel and run the import command again. No error should be raised anymore.

In [6]:
# check that the netCDF file is accessible
import netCDF4 as nc
nc_file = 'cmems_obs-ins_glo_phy-temp-sal_my_cora-oa_P1M_1734384081745.nc'
nc_fid = nc.Dataset(nc_file)
print("Content of the netCDF file: ", nc_fid)
nc_fid.close()

Content of the netCDF file:  <class 'netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    Conventions: CF-1.11
    title: Global Ocean - Coriolis Observation Re-Analysis CORA5.2 
    institution: OceanScope
    source: ISAS-V8
    history: 20240604T073040L : Creation
    references: Szekely et al. 2020, doi: 10.17882/46219 
    comment: V8.0 reference climatology and analysis parameters
    subset:source: ARCO data downloaded from the Marine Data Store using the MyOcean Data Portal
    subset:productId: INSITU_GLO_PHY_TS_OA_MY_013_052
    subset:datasetId: cmems_obs-ins_glo_phy-temp-sal_my_cora-oa_P1M_202411
    subset:date: 2024-12-16T21:21:21.745Z
    dimensions(sizes): time(1), depth(1), latitude(1671), longitude(720)
    variables(dimensions): float64 time(time), float32 depth(depth), float32 latitude(latitude), float32 longitude(longitude), float32 PSAL(time, depth, latitude, longitude), float32 TEMP(time, depth, latitude, longitude)
    groups: 


## Structure of netCDF files

In [7]:
import netCDF4 as nc

nc_file = 'cmems_obs-ins_glo_phy-temp-sal_my_cora-oa_P1M_1734384081745.nc'

# the 'with' statement can be used to open a file temporarily and close it automatically when leaving the code block
with nc.Dataset(nc_file) as nc_fid:
    print(nc_fid)

<class 'netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    Conventions: CF-1.11
    title: Global Ocean - Coriolis Observation Re-Analysis CORA5.2 
    institution: OceanScope
    source: ISAS-V8
    history: 20240604T073040L : Creation
    references: Szekely et al. 2020, doi: 10.17882/46219 
    comment: V8.0 reference climatology and analysis parameters
    subset:source: ARCO data downloaded from the Marine Data Store using the MyOcean Data Portal
    subset:productId: INSITU_GLO_PHY_TS_OA_MY_013_052
    subset:datasetId: cmems_obs-ins_glo_phy-temp-sal_my_cora-oa_P1M_202411
    subset:date: 2024-12-16T21:21:21.745Z
    dimensions(sizes): time(1), depth(1), latitude(1671), longitude(720)
    variables(dimensions): float64 time(time), float32 depth(depth), float32 latitude(latitude), float32 longitude(longitude), float32 PSAL(time, depth, latitude, longitude), float32 TEMP(time, depth, latitude, longitude)
    groups: 


Every netCDF file contains metadata about the data it stores. The content of the file is organized into variables, dimensions, and attributes.
* Variables. Variables contain the data stored in the netCDF file, typically as multidimensional arrays. Scalar values are stored as 0-dimensional arrays.
* Dimensions. Dimensions describe the axes of these arrays. They can represent physical space and time (latitude, longitude, height, and time) or indices of other quantities (e.g. weather station identifiers).
* Attributes. Attributes are metadata attached either to a variable or to the file as a whole (global attributes, such as Conventions or title in the output above). They provide context, for example a variable's units or its fill/missing value.

Look at the output of print(nc_fid) above for your netCDF file. The file has four dimensions: time, depth, latitude and longitude.
Variables that give the values along a dimension (and carry the same name as that dimension) are called *coordinates*. Here time, depth, latitude and longitude are the four coordinates.
The file then contains two data variables, PSAL (salinity) and TEMP (temperature), both of dimensions (time, depth, latitude, longitude), i.e. they are 4D arrays. Since time and depth each have size 1, the shape of these arrays is (1, 1, 1671, 720).

This can be checked with the .shape attribute:


In [8]:
with nc.Dataset(nc_file) as nc_fid:
    print(f"Dim of Surface Temperature array = {nc_fid['TEMP'].shape}")
    print(f"Dim of Surface Salinity array = {nc_fid['PSAL'].shape}")

Dim of Surface Temperature array = (1, 1, 1671, 720)
Dim of Surface Salinity array = (1, 1, 1671, 720)


### List of dimensions

Below, we create a dictionary whose keys are the dimension names and whose values are the dimension sizes.

In [9]:
nc_file = 'cmems_obs-ins_glo_phy-temp-sal_my_cora-oa_P1M_1734384081745.nc'
with nc.Dataset(nc_file) as nc_fid:
    dict_dimensions = {}
    for dim in nc_fid.dimensions:
        dict_dimensions[dim] = nc_fid.dimensions[dim].size
    
print(dict_dimensions)

{'time': 1, 'depth': 1, 'latitude': 1671, 'longitude': 720}


### List of variables

In [10]:
nc_file = 'cmems_obs-ins_glo_phy-temp-sal_my_cora-oa_P1M_1734384081745.nc'
with nc.Dataset(nc_file) as nc_fid:
    dict_variables = {}
    for var in nc_fid.variables:
        dict_variables[var] = [nc_fid.variables[var].shape, nc_fid.variables[var].dtype]
    
print(dict_variables)

{'time': [(1,), dtype('float64')], 'depth': [(1,), dtype('float32')], 'latitude': [(1671,), dtype('float32')], 'longitude': [(720,), dtype('float32')], 'PSAL': [(1, 1, 1671, 720), dtype('float32')], 'TEMP': [(1, 1, 1671, 720), dtype('float32')]}


## Access data in a variable

To read the data of a variable, use the syntax nc_fid['VARIABLE'][:]. The final '[:]' selects all the values of the variable and loads them into memory as a NumPy array (by default a masked array, in which fill values are masked).

* To look only at the first value, use the index '[0]'.
* To look only at the last value, use the index '[-1]'.
* To look at every second value, use the index '[::2]'.
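These indexing rules are the standard NumPy ones, so they can be tried without the file. The sketch below uses a made-up latitude vector and a made-up 4D array laid out like TEMP, (time, depth, latitude, longitude). Indexing with `[0, 0, :, :]` drops the size-1 time and depth axes; with netCDF4, applying such an index directly to the variable (e.g. `nc_fid['TEMP'][0, 0, :, :]`) reads only the requested part of the file.

```python
import numpy as np

# made-up latitude values, for illustration only
latitude = np.array([-77.0, -76.5, -76.0, -75.5, -75.0])

first = latitude[0]            # first value
last = latitude[-1]            # last value
every_second = latitude[::2]   # every second value

# a made-up array with the same axis order as TEMP: (time, depth, latitude, longitude)
temp = np.arange(1 * 1 * 5 * 4, dtype=float).reshape(1, 1, 5, 4)

# keep the single time and depth level to get a 2D (latitude, longitude) map
surface_map = temp[0, 0, :, :]

print(first, last, every_second, surface_map.shape)
```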


In [None]:
nc_file = 'cmems_obs-ins_glo_phy-temp-sal_my_cora-oa_P1M_1734384081745.nc'
with nc.Dataset(nc_file) as nc_fid:
    latitude = nc_fid['latitude'][:] # read all the latitude values

print('LATITUDE = ', latitude)
print('First value = ', latitude[0])
print('Last value = ', latitude[-1])
print('Every second value = ', latitude[::2])

In [None]:
nc_file = 'cmems_obs-ins_glo_phy-temp-sal_my_cora-oa_P1M_1734384081745.nc'
with nc.Dataset(nc_file) as nc_fid:
    latitude = nc_fid['latitude'][:] # read all the latitude values
    longitude = nc_fid['longitude'][:] # read all the longitude values

# print the first and last longitude values separately
print(longitude[0], longitude[-1])

# select the first and last longitude values at once with a list of indices
print(longitude[[0,-1]])
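In practice one often knows a coordinate value rather than an index. A common approach, sketched here on made-up, regularly spaced coordinates and random data, is to take the index of the nearest grid point with `np.argmin(np.abs(...))` and then slice along the other axis. The target latitude of 45.0 is an arbitrary example, and the nearest-point search does not require the grid to be regularly spaced.

```python
import numpy as np

# made-up coordinates on a regular 0.5-degree grid
longitude = np.arange(-180.0, 180.0, 0.5)   # 720 values
latitude = np.arange(-77.0, 90.0, 0.5)      # 334 values

# made-up field with the TEMP axis order: (time, depth, latitude, longitude)
rng = np.random.default_rng(0)
temp = rng.normal(10.0, 5.0, size=(1, 1, latitude.size, longitude.size))

target_lat = 45.0  # arbitrary example latitude
i_lat = int(np.argmin(np.abs(latitude - target_lat)))  # index of the nearest grid point

# zonal section: all longitudes at that single latitude
section = temp[0, 0, i_lat, :]

print(i_lat, latitude[i_lat], section.shape)
```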

Let's read all the data from the netCDF file:

In [14]:
# load the netCDF4 module
import netCDF4 as nc

# read latitude, longitude, temperature and salinity into Python variables
nc_file = 'cmems_obs-ins_glo_phy-temp-sal_my_cora-oa_P1M_1734384081745.nc'
with nc.Dataset(nc_file) as nc_fid:
    LATITUDE = nc_fid['latitude'][:] # read all the latitude values
    LONGITUDE = nc_fid['longitude'][:] # read all the longitude values
    TEMPERATURE = nc_fid['TEMP'][:] # 4D variable: (time, depth, latitude, longitude)
    SALINITY = nc_fid['PSAL'][:] # 4D variable: (time, depth, latitude, longitude)

print(TEMPERATURE)

[[[[-0.9125576019287109 -0.9214654564857483 -0.92674320936203 ...
    -0.9547644257545471 -0.9644861817359924 -0.9725785255432129]
   [-0.917874276638031 -0.9258949160575867 -0.9338327050209045 ...
    -0.9715983867645264 -0.9786617755889893 -0.9876413941383362]
   [-0.9257180094718933 -0.934576153755188 -0.9425758123397827 ...
    -0.9865783452987671 -0.993578314781189 -1.0034362077713013]
   ...
   [-- -- -- ... -- -- --]
   [-- -- -- ... -- -- --]
   [-- -- -- ... -- -- --]]]]
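The `--` entries in the output above are masked values: netCDF4 returns a NumPy masked array in which elements equal to the variable's fill value (here, presumably land points) are masked. The sketch below reproduces this on a made-up five-element array and shows how to count the valid values, compute a mean that ignores the masked ones, and convert to a plain array with NaN in place of masked values.

```python
import numpy as np

# made-up temperatures; the last two values stand for land points
data = np.array([12.5, 14.0, 15.5, 0.0, 0.0])
land = np.array([False, False, False, True, True])
temp = np.ma.masked_array(data, mask=land)

n_valid = temp.count()          # number of unmasked (ocean) values
mean_valid = temp.mean()        # masked values are ignored
as_nan = temp.filled(np.nan)    # plain ndarray with NaN where values were masked

print(temp)
print(n_valid, mean_valid, np.nanmean(as_nan))
```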


## Some statistics and plots...

In [15]:
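# print the mean and standard deviation of temperature over all grid points (unweighted)
# TEMPERATURE is a masked array whose missing values (e.g. land points) are masked; nanmean and nanstd also skip NaN values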
import numpy as np
print('mean TEMPERATURE:',np.nanmean(TEMPERATURE))
print('std TEMPERATURE:',np.nanstd(TEMPERATURE))

mean TEMPERATURE: 14.050055
std TEMPERATURE: 11.69037


In [16]:
# print the maximum temperature and indicate at which location the maximum temperature is reached
import numpy as np
print('max TEMPERATURE:',np.nanmax(TEMPERATURE))
index_flattened = np.nanargmax(TEMPERATURE)
index_2d = np.unravel_index(index_flattened, TEMPERATURE.shape )
print('index_flattened = ',index_flattened)
print('index_2d = ',index_2d)
print('latitude_max = ',LATITUDE[index_2d[2]]) # index_2d holds the (time, depth, latitude, longitude) indices
print('longitude_max = ',LONGITUDE[index_2d[3]])

print('check that max is reached at position given by index_2d: TEMPERATURE[index_2d]=',TEMPERATURE[index_2d])

max TEMPERATURE: 30.936134
index_flattened =  480131
index_2d =  (0, 0, 666, 611)
check that max is reached at position given by index_2d: TEMPERATURE[index_2d]= 30.936134
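Because TEMPERATURE has four axes, `np.unravel_index` returns four indices, and the latitude and longitude indices are the third and fourth entries. The sketch below checks this bookkeeping on a made-up field with a maximum placed at a known position and one masked row standing in for land; it uses the masked-array method `argmax`, which skips masked values, instead of `np.nanargmax`.

```python
import numpy as np

# made-up coordinates and a made-up field shaped (time, depth, latitude, longitude)
latitude = np.linspace(-60.0, 60.0, 5)      # [-60, -30, 0, 30, 60]
longitude = np.linspace(-180.0, 135.0, 8)   # 45-degree spacing
temp = np.ma.zeros((1, 1, 5, 8))
temp[0, 0, 3, 6] = 30.0           # place a known maximum
temp[0, 0, 0, :] = np.ma.masked   # mask one row, as for land points

flat_index = int(temp.argmax())   # masked values are never selected
t, d, i_lat, i_lon = np.unravel_index(flat_index, temp.shape)

# latitude and longitude come from the 3rd and 4th indices, not the 1st and 2nd
print(latitude[i_lat], longitude[i_lon], temp[t, d, i_lat, i_lon])
```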


## Try it yourself

* What is the global mean surface temperature?
* Can you plot a meridional section of surface temperature along a longitude crossing the central Baltic Proper?