# Import modules

Let's import the modules that we will use.

In [23]:
import xarray as xr  # For reading and working with NetCDF datasets

# Introducing the data

In this example, we will load a depth profile of chlorophyll a data. However, the same steps should apply to depth profiles of any variable.

Nansen Legacy data can be found via the SIOS data access portal. All Nansen Legacy datasets should be returned when filtering using the 'AeN' collection. Please contact data.nleg@unis.no if you have any problems finding or accessing data.

I have downloaded the file 'AR_PR_CT_58US_2021710.nc' into my working directory.

# Loading the data

In [28]:
data = xr.open_dataset('AR_PR_CT_58US_2021710.nc')

# Overview of the file

Firstly, let's have a look at the entire dataset.

In [27]:
data

To look at all of the global attributes:

In [29]:
data.attrs

{'title': 'Arctic Ocean - In Situ Observation Copernicus',
 'qc_manual': 'Recommendations for in-situ data Near Real Time Quality Control https://doi.org/10.13155/36230',
 'contact': 'cmems-service@imr.no',
 'format_version': '1.4',
 'distribution_statement': 'These data follow Copernicus standards; they are public and free of charge. User assumes all risk for use of data. User must display citation in any publication or product using data. User must contact PI prior to any commercial use of data.',
 'citation': 'These data were collected and made freely available by the Copernicus project and the programs that contribute to it ',
 'naming_authority': 'Copernicus Marine In Situ',
 'data_assembly_center': 'IMR',
 'update_interval': 'void',
 'area': 'Arctic Ocean',
 'author': '',
 'Conventions': 'CF-1.6 Copernicus-InSituTAC-FormatManual-1.42 Copernicus-InSituTAC-SRD-1.5 Copernicus-InSituTAC-ParametersList-3.2.0 ACDD-1.3',
 'data_mode': 'R',
 'comment': '',
 'history': '',
 'references': 

The 'Conventions' attribute is important: it tells us which standards were followed when the file was created. If you are not sure what an attribute such as 'creator_name' means, you can look up its definition in those standards.

The ACDD-1.3 standard defines discovery metadata (metadata that help someone find the data), and can be found here:

The CF-1.6 conventions define use metadata (metadata that help someone use the data), and can be found here:

To look at individual attributes:

In [36]:
data.attrs['Conventions']

'CF-1.6 Copernicus-InSituTAC-FormatManual-1.42 Copernicus-InSituTAC-SRD-1.5 Copernicus-InSituTAC-ParametersList-3.2.0 ACDD-1.3'
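Because 'Conventions' is a single space-separated string, it can be split into a list to check which standards a file claims to follow. A minimal sketch using the string printed above; with the file loaded, data.attrs['Conventions'].split() gives the same list.

```python
# The 'Conventions' string printed above, copied verbatim
conventions = ("CF-1.6 Copernicus-InSituTAC-FormatManual-1.42 "
               "Copernicus-InSituTAC-SRD-1.5 "
               "Copernicus-InSituTAC-ParametersList-3.2.0 ACDD-1.3")

# Split on whitespace: one entry per standard
standards = conventions.split()
print(standards)

# Check whether the file claims to follow CF and ACDD (any version)
follows_cf = any(s.startswith("CF-") for s in standards)
follows_acdd = any(s.startswith("ACDD-") for s in standards)
print(follows_cf, follows_acdd)
```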

To see all the data variables:

In [32]:
data.data_vars

Data variables: (12/17)
    TIME_QC      (TIME) float32 ...
    POSITION_QC  (POSITION) float32 ...
    DIRECTION    (TIME) object ...
    PRES         (TIME, DEPTH) float32 ...
    PRES_QC      (TIME, DEPTH) float32 ...
    TEMP         (TIME, DEPTH) float64 ...
    ...           ...
    TEMP_QC      (TIME, DEPTH) float32 ...
    PSAL_QC      (TIME, DEPTH) float32 ...
    FLU2_QC      (TIME, DEPTH) float32 ...
    CNDC_QC      (TIME, DEPTH) float32 ...
    SVEL_QC      (TIME, DEPTH) float32 ...
    CCOMD003_QC  (TIME, DEPTH) float32 ...
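Many measured variables in this file have a companion quality-flag variable with a '_QC' suffix (for example TEMP and TEMP_QC). A sketch that pairs them up by name; the small dataset below is made up so the example runs without the file, and with the real file you would use data in place of ds. Variables without a flag partner (such as DIRECTION in the listing above) are simply left out of the mapping.

```python
import numpy as np
import xarray as xr

# Made-up stand-in for the real file: two measurements plus their QC flags
ds = xr.Dataset(
    {
        "TEMP": (("TIME", "DEPTH"), np.array([[5.5, 5.7], [5.6, 5.8]])),
        "TEMP_QC": (("TIME", "DEPTH"), np.ones((2, 2), dtype="float32")),
        "PSAL": (("TIME", "DEPTH"), np.array([[34.8, 34.9], [34.6, 34.7]])),
        "PSAL_QC": (("TIME", "DEPTH"), np.ones((2, 2), dtype="float32")),
    }
)

# Split the variable names into QC flags and everything else
qc_vars = [name for name in ds.data_vars if name.endswith("_QC")]
measured = [name for name in ds.data_vars if not name.endswith("_QC")]

# Map each measured variable to its QC variable, when one exists
qc_for = {name: f"{name}_QC" for name in measured if f"{name}_QC" in ds.data_vars}
print(qc_for)
```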

To see an individual data variable:

In [44]:
data['PSAL']

Each variable also has its own attributes. The 'standard_name' attribute gives the name of the variable from a controlled vocabulary, the CF standard name table used by the CF conventions. We can find a definition for this variable by looking up its standard_name in that table.

The 'long_name' attribute is provided by the data creator and describes the variable in their own words.
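Variable attributes live in each variable's .attrs dictionary, just like the global attributes. A sketch on a made-up dataset: the attribute values below are illustrative, not copied from the file, so check data['PSAL'].attrs for the real ones. Dataset.filter_by_attrs can then select every variable with a given standard_name.

```python
import numpy as np
import xarray as xr

# Illustrative attributes only; the real file's values may differ
ds = xr.Dataset(
    {
        "PSAL": (
            ("TIME", "DEPTH"),
            np.array([[34.878, 34.967]]),
            {"standard_name": "sea_water_salinity", "long_name": "Practical salinity"},
        ),
        "TEMP": (
            ("TIME", "DEPTH"),
            np.array([[5.572, 5.784]]),
            {"standard_name": "sea_water_temperature", "long_name": "Sea temperature"},
        ),
    }
)

# Read a single variable attribute
print(ds["PSAL"].attrs["standard_name"])

# .get() avoids a KeyError for attributes that may be missing
print(ds["PSAL"].attrs.get("units", "no units attribute"))

# Select all variables whose standard_name matches
salinity = ds.filter_by_attrs(standard_name="sea_water_salinity")
print(list(salinity.data_vars))
```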

# Exporting to Excel or CSV

Some people prefer to work with the data in a format that they're more familiar with. To export to CSV or XLSX, we first convert the variables we need into a pandas DataFrame:

In [62]:
import pandas as pd

# Convert one variable to a DataFrame indexed by (TIME, DEPTH)
df = data['TEMP'].to_dataframe()

We can create a DataFrame for a second variable and join it to the first. To add many variables that share the same dimensions, we can do this in a loop:

In [63]:
# Join each variable's DataFrame on the shared (TIME, DEPTH) index
for var in ['PSAL', 'PRES', 'SVEL']:
    df_to_add = data[var].to_dataframe()
    df = df.join(df_to_add)

df

| TIME | DEPTH | TEMP | PSAL | PRES | SVEL |
|---|---|---|---|---|---|
| 2021-08-26 16:28:23 | 0 | 5.572 | 34.878 | 5.0 | 1472.96 |
| 2021-08-26 16:28:23 | 1 | 5.784 | 34.967 | 6.0 | 1473.94 |
| 2021-08-26 16:28:23 | 2 | 5.678 | 34.599 | 7.0 | 1473.06 |
| 2021-08-26 16:28:23 | 3 | 5.709 | 33.862 | 8.0 | 1472.27 |
| 2021-08-26 16:28:23 | 4 | 5.644 | 33.623 | 9.0 | 1471.73 |
| ... | ... | ... | ... | ... | ... |
| 2021-09-22 04:32:56 | 4358 | NaN | NaN | NaN | NaN |
| 2021-09-22 04:32:56 | 4359 | NaN | NaN | NaN | NaN |
| 2021-09-22 04:32:56 | 4360 | NaN | NaN | NaN | NaN |
| 2021-09-22 04:32:56 | 4361 | NaN | NaN | NaN | NaN |
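As an alternative to joining one DataFrame at a time, selecting several variables that share the same dimensions returns a smaller Dataset, and Dataset.to_dataframe() turns it into one DataFrame in a single call. A sketch on a made-up two-profile dataset; with the real file you would use data[['TEMP', 'PSAL', 'PRES', 'SVEL']]. The dim_order argument pins the index to (TIME, DEPTH) so it matches the per-variable DataFrames.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up stand-in: 2 profiles (TIME) x 3 depth levels (DEPTH)
times = pd.to_datetime(["2021-08-26 16:28:23", "2021-09-22 04:32:56"])
ds = xr.Dataset(
    {
        "TEMP": (("TIME", "DEPTH"), np.array([[5.572, 5.784, 5.678], [4.1, 4.0, 3.9]])),
        "PSAL": (("TIME", "DEPTH"), np.array([[34.878, 34.967, 34.599], [34.2, 34.3, 34.4]])),
        "PRES": (("TIME", "DEPTH"), np.array([[5.0, 6.0, 7.0], [5.0, 6.0, 7.0]])),
    },
    coords={"TIME": times, "DEPTH": [0, 1, 2]},
)

# One call: each selected variable becomes a column, indexed by (TIME, DEPTH)
df_all = ds[["TEMP", "PSAL", "PRES"]].to_dataframe(dim_order=["TIME", "DEPTH"])

# The same table built with the join loop
df_loop = ds["TEMP"].to_dataframe()
for var in ["PSAL", "PRES"]:
    df_loop = df_loop.join(ds[var].to_dataframe())

print(df_all.equals(df_loop))
```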

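The last rows above are empty: the profiles appear to be padded with missing values down to the deepest level in the file, so many (TIME, DEPTH) rows hold no data at all. Dropping rows where every column is missing can make the exported file much smaller. A sketch on a small made-up DataFrame with the same index layout:

```python
import numpy as np
import pandas as pd

# Made-up frame with the same (TIME, DEPTH) layout; the last two rows are all NaN
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2021-08-26 16:28:23"]), range(4)], names=["TIME", "DEPTH"]
)
df = pd.DataFrame(
    {
        "TEMP": [5.572, 5.784, np.nan, np.nan],
        "PSAL": [34.878, 34.967, np.nan, np.nan],
    },
    index=index,
)

# Keep a row if at least one column has a value
df_trimmed = df.dropna(how="all")
print(len(df), len(df_trimmed))
```

Rows where only some columns are missing are kept; use how="any" instead if you want to keep only complete rows.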

Now let's write that DataFrame out to an XLSX file (this needs an Excel writer engine such as openpyxl to be installed):

In [67]:
df.to_excel('/home/lukem/ctd_data.xlsx')
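The same DataFrame can be written to CSV with DataFrame.to_csv, for example df.to_csv('/home/lukem/ctd_data.csv'). A sketch that writes a small made-up frame to a temporary directory and reads it back, rebuilding the (TIME, DEPTH) index from the first two columns:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up frame with the same (TIME, DEPTH) index as the CTD DataFrame
t0 = pd.Timestamp("2021-08-26 16:28:23")
index = pd.MultiIndex.from_tuples([(t0, 0), (t0, 1)], names=["TIME", "DEPTH"])
df = pd.DataFrame({"TEMP": [5.572, 5.784], "PSAL": [34.878, 34.967]}, index=index)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ctd_data.csv"
    # The index levels are written as the first two columns
    df.to_csv(path)
    # Read it back, turning those two columns into the index again
    df_back = pd.read_csv(path, index_col=["TIME", "DEPTH"], parse_dates=["TIME"])

print(df_back)
```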

In [64]:
lat = data['LATITUDE'].to_dataframe()
lon = data['LONGITUDE'].to_dataframe()
# Compare the number of positions with the number of (TIME, DEPTH) rows in df.
# Is there a quick way to tie LATITUDE and LONGITUDE to the other variables?
len(lon), len(df)

(44, 191972)
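The two lengths differ because LATITUDE and LONGITUDE have one value per position (44 here), while the combined DataFrame has one row per (TIME, DEPTH) pair (191972 rows). If each position corresponds to exactly one profile, i.e. POSITION and TIME have the same length and the same order (an assumption to check against the file's metadata before relying on it), one option is to attach latitude and longitude as coordinates along TIME before converting. to_dataframe() then repeats them on every depth row. A sketch on a made-up three-profile dataset:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up stand-in: 3 profiles (TIME) x 4 depth levels (DEPTH),
# with one latitude/longitude per profile stored along POSITION
times = pd.to_datetime(["2021-08-26", "2021-08-27", "2021-08-28"])
ds = xr.Dataset(
    {
        "TEMP": (("TIME", "DEPTH"), np.arange(12, dtype=float).reshape(3, 4)),
        "LATITUDE": ("POSITION", [78.1, 78.5, 79.0]),
        "LONGITUDE": ("POSITION", [15.0, 16.2, 17.5]),
    },
    coords={"TIME": times},
)

# Assumption: POSITION i belongs to TIME i, so the lengths must match
if ds.sizes["POSITION"] != ds.sizes["TIME"]:
    raise ValueError("POSITION and TIME differ in length; cannot pair them")

# Move LATITUDE/LONGITUDE onto the TIME dimension as coordinates
lat = ds["LATITUDE"].values
lon = ds["LONGITUDE"].values
ds = ds.drop_vars(["LATITUDE", "LONGITUDE"]).assign_coords(
    LATITUDE=("TIME", lat), LONGITUDE=("TIME", lon)
)

# Non-index coordinates become extra columns, repeated on every depth row
df = ds["TEMP"].to_dataframe()
print(df.head())
```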