# Introduction

In this example we will work with multidimensional data. I will first create a dummy dataset, so you understand exactly what the data are and how they are shaped. Then we will convert this dummy dataset to NetCDF-CF.

# Import modules

Firstly, let's import the modules that we will use in this example.

In [67]:
import xarray as xr # For creating a NetCDF dataset

import pandas as pd # For reading in data (CSV, xlsx etc) to a dataframe

from datetime import datetime as dt # Handling dates and times

import uuid # Creating a UUID for the dataset

import numpy as np # Good for working with multidimensional arrays and mathematical functions

from matplotlib import pyplot as plt # For plotting data

# Loading and checking the data

Let's look at the data we are working with first. Here, we are working with a multidimensional dataset: a grid of sea surface temperatures at different longitudes and latitudes, with measurements repeated every 3 days. This is not real data, but values I made up using a random number generator. I have no idea if they are realistic, but you should get the idea!

In [68]:
data = pd.read_excel('multidimensional_sea_water_temperature_variables.xlsx', sheet_name='Data')
data

Unnamed: 0,Date,Day,Latitude,Longitude,Sea water temperature (degC)
0,2022-07-02 12:00:00,0,78,30,2.48
1,2022-07-02 12:00:00,0,78,31,2.48
2,2022-07-02 12:00:00,0,78,32,2.49
3,2022-07-02 12:00:00,0,79,30,2.46
4,2022-07-02 12:00:00,0,79,31,1.52
5,2022-07-02 12:00:00,0,79,32,2.16
6,2022-07-05 12:00:00,3,78,30,2.33
7,2022-07-05 12:00:00,3,78,31,1.62
8,2022-07-05 12:00:00,3,78,32,2.02
9,2022-07-05 12:00:00,3,79,30,2.41


# Restructuring the data

First, let's make arrays for each of our dimensions.

In [69]:
latitude = sorted(list(set(data['Latitude'])))
longitude = sorted(list(set(data['Longitude'])))
time = sorted(list(set(data['Day'])))
latitude, longitude, time

([78, 79], [30, 31, 32], [0, 3, 6, 9, 12])

Now, let's create a multidimensional grid for our sea_surface_skin_temperature variable. We need to be a bit careful with the order here. The dataframe is ordered first by time (5 times), then by latitude (2 latitudes), then by longitude (3 longitudes). We should mirror that order.

In [70]:
sea_surface_skin_temperature = np.array(data['Sea water temperature (degC)']).reshape(5,2,3)
sea_surface_skin_temperature

array([[[2.48, 2.48, 2.49],
        [2.46, 1.52, 2.16]],

       [[2.33, 1.62, 2.02],
        [2.41, 2.23, 1.99]],

       [[1.64, 1.8 , 1.6 ],
        [1.8 , 1.92, 2.22]],

       [[2.35, 2.18, 2.47],
        [2.4 , 2.29, 2.36]],

       [[2.3 , 1.57, 2.13],
        [1.75, 2.37, 2.38]]])
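The reshape only gives the right grid because the rows are already ordered by time, then latitude, then longitude. If you are not sure of the row order in your own file, one option is to sort the dataframe explicitly before reshaping. The sketch below uses a small made-up dataframe (`df`, with rows shuffled on purpose and a made-up `Temp` column) rather than the spreadsheet used in this example.

```python
import numpy as np
import pandas as pd

# A tiny made-up dataset: 2 times x 2 latitudes x 2 longitudes, rows shuffled on purpose
df = pd.DataFrame({
    'Day':       [3, 0, 0, 3, 0, 3, 3, 0],
    'Latitude':  [78, 79, 78, 79, 78, 78, 79, 79],
    'Longitude': [31, 30, 30, 30, 31, 30, 31, 31],
    'Temp':      [2.5, 1.3, 1.1, 2.7, 1.2, 2.4, 2.8, 1.4],
})

# Sort so that longitude varies fastest, then latitude, then time.
# This matches the row-major order that reshape assumes.
df_sorted = df.sort_values(['Day', 'Latitude', 'Longitude'])

# Take the grid size from the data instead of typing it in
n_time = df['Day'].nunique()
n_lat = df['Latitude'].nunique()
n_lon = df['Longitude'].nunique()

grid = df_sorted['Temp'].to_numpy().reshape(n_time, n_lat, n_lon)
print(grid)
```

Note that this only works for a complete grid, where every combination of time, latitude and longitude appears exactly once.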

Let's check if all the values have gone where they are supposed to. For example, let's look at the 4th time, 1st latitude and 2nd longitude. Remember in Python we count from 0, so we need to subtract 1 from each of these values.

In [71]:
sea_surface_skin_temperature[3,0,1]

2.18
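The same check can be done with a bit of arithmetic. With a shape of (5, 2, 3) and NumPy's default row-major ordering, the element at `[t, i, j]` comes from position `t*2*3 + i*3 + j` in the original column. A small sketch, using the 30 values shown in the array above (`values`, `grid` and `flat_index` are names chosen for illustration):

```python
import numpy as np

# The 30 temperatures in the order they appear in the dataframe
values = np.array([
    2.48, 2.48, 2.49, 2.46, 1.52, 2.16,
    2.33, 1.62, 2.02, 2.41, 2.23, 1.99,
    1.64, 1.80, 1.60, 1.80, 1.92, 2.22,
    2.35, 2.18, 2.47, 2.40, 2.29, 2.36,
    2.30, 1.57, 2.13, 1.75, 2.37, 2.38,
])
grid = values.reshape(5, 2, 3)

# 4th time, 1st latitude, 2nd longitude (counting from 0)
t, i, j = 3, 0, 1

# Position of that element in the flat column
flat_index = t * (2 * 3) + i * 3 + j  # 19

# NumPy can do the same calculation for us
same_index = np.ravel_multi_index((t, i, j), grid.shape)

print(values[flat_index], grid[t, i, j])
```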

# Creating an xarray dataset

With xarray, it is easy to create an xarray dataset and convert it to a NetCDF dataset.

We have to consider what components make up a NetCDF file. It has dimensions (time, latitude, longitude) that define the shape, or grid, of the data. It also has variables (sea_surface_skin_temperature) that sit on each point in the grid. The *data_vars* in an xarray dataset are analogous to variables in a NetCDF file, whilst the *coords* are analogous to the dimensions (strictly, to the coordinate variables that hold the values along each dimension).

In [72]:
xrds = xr.Dataset(
    data_vars = dict(
        sea_surface_skin_temperature = (["time", "latitude", "longitude"], sea_surface_skin_temperature)
    ),
    coords = dict(
        longitude = longitude, # These coordinate names are compliant with CF conventions (all lower case)
        latitude = latitude,
        time = time
    )
)

xrds # Checking it looks okay
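Once the data are in an xarray dataset, you can select values by coordinate rather than by position. The sketch below rebuilds a small dataset from only the first two time steps of the array above, so that it runs on its own; `demo`, `point` and `same_point` are names made up for illustration.

```python
import numpy as np
import xarray as xr

# First two time steps of the temperature grid (time, latitude, longitude)
temperature = np.array([
    [[2.48, 2.48, 2.49], [2.46, 1.52, 2.16]],
    [[2.33, 1.62, 2.02], [2.41, 2.23, 1.99]],
])

demo = xr.Dataset(
    data_vars=dict(
        sea_surface_skin_temperature=(["time", "latitude", "longitude"], temperature)
    ),
    coords=dict(time=[0, 3], latitude=[78, 79], longitude=[30, 31, 32]),
)

# Select by coordinate value: day 3, 79 degrees north, 31 degrees east
point = demo['sea_surface_skin_temperature'].sel(time=3, latitude=79, longitude=31)

# The equivalent selection by position
same_point = demo['sea_surface_skin_temperature'].isel(time=1, latitude=1, longitude=1)

print(float(point), float(same_point))
```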

# Global attributes

Global attributes describe the dataset as a whole. A list of what global attributes must be included can be found here:

https://adc.met.no/node/4

These are based on the ACDD conventions, that you can find here:

https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3

We state which conventions the file adheres to (including the version) in the 'Conventions' global attribute.

Additional global attributes can also be included, defined by the user. Make sure that the attribute names you select are understandable.

To save you from having to write them all out, I have written them in a separate file that you can load in as below.

In [79]:
global_attributes = pd.read_excel('multidimensional_sea_water_temperature_global_attributes.xlsx', index_col=0)

global_attributes

Unnamed: 0_level_0,Content
Attribute name,Unnamed: 1_level_1
title,Sea surface skin temperature measurements from...
naming_authority,University Centre in Svalbard (UNIS)
id,554b5b10-8675-500c-9ecd-9b23998c0b74
summary,ANALAGOUS TO AN ABSTRACT IN A PAPER
keywords,Earth Science > Oceans > Ocean Temperature > S...
keywords_vocabulary,GCMD
geospatial_lat_min,78
geospatial_lat_max,79
geospatial_lon_min,30
geospatial_lon_max,32
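The `id` attribute in the table is a UUID, and the `uuid` module imported at the start can create one for you. How you generate it is up to you; below are two options from Python's standard library. The `name` string passed to `uuid5` is a made-up placeholder, and this is not necessarily how the id in this example was created.

```python
import uuid

# Option 1: a random UUID, different every time you run it
random_id = str(uuid.uuid4())

# Option 2: a repeatable UUID derived from a name.
# The name below is a hypothetical placeholder for illustration only.
name = 'https://example.org/datasets/multidimensional_sea_water_temperature'
repeatable_id = str(uuid.uuid5(uuid.NAMESPACE_URL, name))

print(random_id)
print(repeatable_id)
```

A repeatable id can be handy if you regenerate the same file several times and want it to keep the same identifier.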


Let's flip the columns and rows the other way round

In [80]:
global_attributes_transposed = global_attributes.transpose()
global_attributes_transposed

Attribute name,title,naming_authority,id,summary,keywords,keywords_vocabulary,geospatial_lat_min,geospatial_lat_max,geospatial_lon_min,geospatial_lon_max,...,creator_url,creator_name,publisher_name,publisher_url,publisher_email,publisher_type,project,license,metadata_link,acknowledgements
Content,Sea surface skin temperature measurements from...,University Centre in Svalbard (UNIS),554b5b10-8675-500c-9ecd-9b23998c0b74,ANALAGOUS TO AN ABSTRACT IN A PAPER,Earth Science > Oceans > Ocean Temperature > S...,GCMD,78,79,30,32,...,https://www.unis.no/staff/ola-nordmann/,Ola Nordmann,NIRD Research Data Archive,https://archive.norstore.no/,archive.manager@norstore.no,institution,The Nansen Legacy (RCN # 276730),https://creativecommons.org/licenses/by/4.0/,DOI PROVIDED BY DATA CENTRE – THAT LINKS TO TH...,Funded by the Research Council of Norway. John...


Now we can convert our dataframe to a dictionary that we can easily write to the xarray dataset (which will become our NetCDF file).

In [81]:
global_attributes_dic = global_attributes_transposed.to_dict('records')[0]
xrds.attrs=global_attributes_dic
xrds.attrs

{'title': 'Sea surface skin temperature measurements from the Northern Barents Sea in July 2020',
 'naming_authority': 'University Centre in Svalbard (UNIS)',
 'id': '554b5b10-8675-500c-9ecd-9b23998c0b74',
 'summary': 'ANALAGOUS TO AN ABSTRACT IN A PAPER',
 'keywords': 'Earth Science > Oceans > Ocean Temperature > Sea Surface Temperature > Sea Surface Skin Temperature',
 'keywords_vocabulary': 'GCMD',
 'geospatial_lat_min': 78,
 'geospatial_lat_max': 79,
 'geospatial_lon_min': 30,
 'geospatial_lon_max': 32,
 'time_coverage_start': '2022-07-02T12:00:00Z',
 'time_coverage_end': '2022-07-14T12:00:00Z',
 'Conventions': 'ACDD-1.3; CF-1.8',
 'history': 'File created using Python’s xarray at INSERT TIMESTAMP HERE',
 'source': 'Satellite estimates of sea surface skin temperature',
 'processing_level': 'raw',
 'date_created': 'INSERT TIMESTAMP HERE',
 'creator_type': 'person',
 'creator_institution': 'The University Centre in Svalbard',
 'creator_email': 'olan@unis.no',
 'creator_url': 'https:/
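A shorter route to the same dictionary, if you prefer to skip the transpose, is to take the single column as a Series and call `to_dict()` on it. The sketch uses a three-row stand-in for the spreadsheet (`attrs_df` is a made-up name), with the column name `Content` and index name `Attribute name` taken from the table above.

```python
import pandas as pd

# A small stand-in for the global attributes spreadsheet
attrs_df = pd.DataFrame(
    {'Content': ['GCMD', 78, 79]},
    index=pd.Index(
        ['keywords_vocabulary', 'geospatial_lat_min', 'geospatial_lat_max'],
        name='Attribute name',
    ),
)

# The approach used in this example: transpose, then take the first record
via_transpose = attrs_df.transpose().to_dict(orient='records')[0]

# Alternative: go straight from the column to a dictionary
via_series = attrs_df['Content'].to_dict()

print(via_series)
```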

In [82]:
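from datetime import timezone # The trailing 'Z' means UTC, so the timestamp must be taken in UTC

timestamp = dt.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") # Computed once so both attributes match
xrds.attrs['date_created'] = timestamp
xrds.attrs['history'] = f'File create at {timestamp} using xarray in Python'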
xrds.attrs

{'title': 'Sea surface skin temperature measurements from the Northern Barents Sea in July 2020',
 'naming_authority': 'University Centre in Svalbard (UNIS)',
 'id': '554b5b10-8675-500c-9ecd-9b23998c0b74',
 'summary': 'ANALAGOUS TO AN ABSTRACT IN A PAPER',
 'keywords': 'Earth Science > Oceans > Ocean Temperature > Sea Surface Temperature > Sea Surface Skin Temperature',
 'keywords_vocabulary': 'GCMD',
 'geospatial_lat_min': 78,
 'geospatial_lat_max': 79,
 'geospatial_lon_min': 30,
 'geospatial_lon_max': 32,
 'time_coverage_start': '2022-07-02T12:00:00Z',
 'time_coverage_end': '2022-07-14T12:00:00Z',
 'Conventions': 'ACDD-1.3; CF-1.8',
 'history': 'File create at 2022-01-28T12:12:01Z using xarray in Python',
 'source': 'Satellite estimates of sea surface skin temperature',
 'processing_level': 'raw',
 'date_created': '2022-01-28T12:12:01Z',
 'creator_type': 'person',
 'creator_institution': 'The University Centre in Svalbard',
 'creator_email': 'olan@unis.no',
 'creator_url': 'https://w
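Attributes such as `time_coverage_start` and `geospatial_lat_min` describe the data, so they can also be computed from the data instead of typed by hand, which helps keep them in sync. A sketch, assuming a dataframe with the same `Date`, `Latitude` and `Longitude` columns as the data table at the start (only four stand-in rows are used here, and `derived` is a made-up name):

```python
import pandas as pd

# Four stand-in rows with the same column names as the example data
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2022-07-02 12:00:00', '2022-07-05 12:00:00',
        '2022-07-14 12:00:00', '2022-07-08 12:00:00',
    ]),
    'Latitude': [78, 79, 78, 79],
    'Longitude': [30, 31, 32, 30],
})

# Assumption: the dates in the spreadsheet are in UTC, hence the trailing 'Z'
time_format = '%Y-%m-%dT%H:%M:%SZ'

derived = {
    'time_coverage_start': df['Date'].min().strftime(time_format),
    'time_coverage_end': df['Date'].max().strftime(time_format),
    'geospatial_lat_min': int(df['Latitude'].min()),
    'geospatial_lat_max': int(df['Latitude'].max()),
    'geospatial_lon_min': int(df['Longitude'].min()),
    'geospatial_lon_max': int(df['Longitude'].max()),
}

print(derived)
```

Dictionaries like this can be merged into the dataset attributes with `xrds.attrs.update(derived)`.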

# Variable attributes

Variable attributes describe each variable. Let's add some attributes for our variables.

The *standard_name* should be selected from here: http://cfconventions.org/standard-names.html. Standard names are commonly accepted parameter names with descriptions. By selecting appropriate standard names for your variables, you make it clear to the data user exactly what the data represent.

The *units* should match what is provided for the standard name as listed above. You may need to convert your data.

The *long_name* is more descriptive and can be in your own words.

The *coverage_content_type* describes what type of data the variable contains.

Some help on these variable attributes can be found here: https://commons.esipfed.org/acdd_1-3_references

In [83]:
xrds['time'].attrs = {
'standard_name': 'time',
'long_name':'time',
'units': 'days since 2022-07-02T12:00:00Z'
}

xrds['latitude'].attrs = {
'standard_name': 'latitude',
'long_name':'decimal latitude in degrees north',
'units': 'degrees_north'
}

xrds['longitude'].attrs = {
'standard_name': 'longitude',
'long_name':'decimal longitude in degrees east',
'units': 'degrees_east'
}

xrds['sea_surface_skin_temperature'] += 273.15 # Converting from degrees celsius to kelvin

xrds['sea_surface_skin_temperature'].attrs = {
'standard_name':'sea_surface_skin_temperature',
'long_name':'Temperature of the sea water directly below the surface',
'units': 'K',
'coverage_content_type': 'physicalMeasurement'
}
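One thing to watch: `+=` changes the values in place, so running that cell a second time adds 273.15 again and the temperatures end up far too high. A sketch of a conversion that is safe to re-run, because it always starts from the original Celsius values (`celsius` and `to_kelvin` are made-up names):

```python
import numpy as np

# Three of the original temperatures in degrees Celsius
celsius = np.array([2.48, 1.52, 2.16])

def to_kelvin(values_celsius):
    """Return a new array in kelvin; the input is left unchanged."""
    return np.asarray(values_celsius) + 273.15

kelvin = to_kelvin(celsius)
kelvin_again = to_kelvin(celsius)  # Re-running gives the same answer

print(kelvin)
```

In this example, that would mean converting from the `sea_surface_skin_temperature` NumPy array created earlier, instead of updating the dataset variable in place.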


Make sure that the data are written to NetCDF in the correct form.
Data served through THREDDS Data Servers cannot store time values as
int64; they must be int32. The encoding of missing values and
compression are also specified in this step.

# Converting to NetCDF-CF

First, we specify the encoding, then we convert the data to a NetCDF file. The encoding includes a fill value, which is what gets written wherever a value is missing. This should be a nonsensical value that could never be mistaken for a real measurement. In this case, we use -999, which is impossible for a temperature in kelvin.

In [84]:
myencoding = {
            'time': {
                'dtype': 'int32',
                '_FillValue': None # Coordinate variables should not have fill values.
                },
            'latitude': {
                'dtype': 'float32',
                '_FillValue': None # Coordinate variables should not have fill values.
                },
            'longitude': {
                'dtype': 'float32',
                '_FillValue': None # Coordinate variables should not have fill values.
                },
            'sea_surface_skin_temperature': {
                '_FillValue': -999,
                'zlib': False
                }
            }
        
xrds.to_netcdf('multidimensional_sea_water_temperature.nc',encoding=myencoding)

And that is it! 
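To make the role of `_FillValue` concrete: missing values (NaN) are written to the file as the fill value, and software that reads the file turns the fill value back into missing. The NumPy sketch below imitates that round trip by hand, with made-up data containing one gap. It illustrates the idea and is not how xarray implements it.

```python
import numpy as np

fill_value = -999

# Made-up temperatures in kelvin with one missing value
data = np.array([275.63, np.nan, 275.31])

# What ends up in the file: NaN replaced by the fill value
on_disk = np.where(np.isnan(data), fill_value, data)

# What a reader gives back: the fill value masked out again
restored = np.where(on_disk == fill_value, np.nan, on_disk)

print(on_disk)
print(restored)
```

This is also why the fill value must never be a value that could occur in real data: any genuine measurement equal to it would be read back as missing.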

# QC the dataset

Now we can check that the file is okay by loading it in again.

In [85]:
myfile = xr.load_dataset('multidimensional_sea_water_temperature.nc')

In [86]:
myfile

In [87]:
myfile.attrs

{'title': 'Sea surface skin temperature measurements from the Northern Barents Sea in July 2020',
 'naming_authority': 'University Centre in Svalbard (UNIS)',
 'id': '554b5b10-8675-500c-9ecd-9b23998c0b74',
 'summary': 'ANALAGOUS TO AN ABSTRACT IN A PAPER',
 'keywords': 'Earth Science > Oceans > Ocean Temperature > Sea Surface Temperature > Sea Surface Skin Temperature',
 'keywords_vocabulary': 'GCMD',
 'geospatial_lat_min': 78,
 'geospatial_lat_max': 79,
 'geospatial_lon_min': 30,
 'geospatial_lon_max': 32,
 'time_coverage_start': '2022-07-02T12:00:00Z',
 'time_coverage_end': '2022-07-14T12:00:00Z',
 'Conventions': 'ACDD-1.3; CF-1.8',
 'history': 'File create at 2022-01-28T12:12:01Z using xarray in Python',
 'source': 'Satellite estimates of sea surface skin temperature',
 'processing_level': 'raw',
 'date_created': '2022-01-28T12:12:01Z',
 'creator_type': 'person',
 'creator_institution': 'The University Centre in Svalbard',
 'creator_email': 'olan@unis.no',
 'creator_url': 'https://w
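Reading through the attributes by eye works for a small file, but a few automated checks are easy to add. The sketch below checks a plain dictionary for some attribute names. The `expected` list is an illustrative subset chosen from the attributes shown above, not the official list of required attributes, and `missing_attributes` is a made-up helper.

```python
def missing_attributes(attrs, expected):
    """Return the expected attribute names that are absent or empty."""
    return [
        name for name in expected
        if name not in attrs or attrs[name] in ('', None)
    ]

# A few of the attributes from the file, with 'summary' left empty on purpose
attrs = {
    'title': 'Sea surface skin temperature measurements from the Northern Barents Sea in July 2020',
    'id': '554b5b10-8675-500c-9ecd-9b23998c0b74',
    'summary': '',
    'Conventions': 'ACDD-1.3; CF-1.8',
    'geospatial_lat_min': 78,
}

# Illustrative subset only; see the links in the global attributes section for the full requirements
expected = ['title', 'id', 'summary', 'Conventions', 'license']

problems = missing_attributes(attrs, expected)
print(problems)
```

With the file loaded as above, the same helper could be called as `missing_attributes(myfile.attrs, expected)`.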

In [88]:
myfile.data_vars

Data variables:
    sea_surface_skin_temperature  (time, latitude, longitude) float64 275.6 ....

In [89]:
myfile['time']
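The time coordinate is stored in the file as a number of days since a reference time. Decoding that by hand is simple arithmetic, sketched below with Python's standard library. The reference time used here is the Day 0 timestamp from the data table at the start (2022-07-02 12:00, assumed to be UTC), which is an assumption about what the `units` attribute of the time variable should point to.

```python
from datetime import datetime, timedelta, timezone

# Assumed reference time: the timestamp of Day 0 in the data table
reference = datetime(2022, 7, 2, 12, 0, 0, tzinfo=timezone.utc)

# The day offsets stored in the time coordinate
days = [0, 3, 6, 9, 12]

# Add each offset to the reference time
decoded = [reference + timedelta(days=d) for d in days]
labels = [t.strftime('%Y-%m-%dT%H:%M:%SZ') for t in decoded]

for day, label in zip(days, labels):
    print(day, label)
```

The first and last decoded times should agree with the `time_coverage_start` and `time_coverage_end` global attributes, which makes this a useful consistency check.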