In [1]:
import xarray as xr

# Open NetCDF datasets via OPeNDAP requests

**Dataset title**: Detailed observations of the spatial and temporal distribution of rainfall and drizzle in Lopik, Netherlands

[Link to dataset 1](https://data.4tu.nl/articles/_/12696887/1) 

Comments: On the dataset landing page, scroll to the OPeNDAP data service at the bottom, follow the link, and open the folder 2019/01/01.

In [3]:
## Open the NetCDF dataset and inspect it

## Exploring the NetCDF structure

A classic NetCDF file like this one can be broken down into three components: dimensions, variables, and global attributes.

The variables can be further divided into coordinate variables and data variables. Sometimes they are displayed separately, as here, but if you open a NetCDF file with different software, the coordinate variables and data variables might be displayed together.
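The three components can be illustrated with a small in-memory dataset. This is a minimal sketch, not the rainfall file itself: the variable name `rain_rate`, the `range` dimension, and all values are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for a NetCDF file: 4 time steps, 3 range gates.
times = pd.date_range("2019-01-01", periods=4, freq="D")
ds = xr.Dataset(
    data_vars={
        "rain_rate": (("time", "range"), np.zeros((4, 3)), {"units": "mm h-1"}),
    },
    coords={"time": times, "range": [100.0, 200.0, 300.0]},
    attrs={"Conventions": "CF-1.4", "title": "Illustrative dataset"},
)

print(dict(ds.sizes))      # dimensions and their lengths
print(list(ds.coords))     # coordinate variables
print(list(ds.data_vars))  # data variables
print(ds.attrs)            # global attributes
```

Printing `ds` itself shows the same components in one overview, which is usually the quickest first inspection.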

## Attributes

Conventions is probably the most important global attribute because it tells you (and a machine) how to interpret the rest of the file. CF-1.4 refers to version 1.4 of the CF conventions, which you can find here:

https://cfconventions.org/Data/cf-conventions/cf-conventions-1.4/build/cf-conventions.html
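Global attributes are exposed by xarray as a plain dictionary, so the declared convention can be read like any other key. The sketch below uses an empty dataset with made-up attributes rather than the real file.

```python
import xarray as xr

# Hypothetical dataset carrying only global attributes.
ds = xr.Dataset(attrs={"Conventions": "CF-1.4", "title": "Illustrative dataset"})

# .get() avoids a KeyError when a file does not declare a convention.
conventions = ds.attrs.get("Conventions", "not declared")
print(conventions)
```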

In [4]:
## Looking at the attributes of the dataset

## Dimensions

Dimensions are given as a `FrozenDict`, an immutable dictionary that xarray uses for safety, consistency, and performance.

You cannot modify it directly, which helps prevent accidental dimension changes in your datasets.

In other words, dimensions are fixed:

- You can’t accidentally change their length.

- You have to explicitly create new datasets or slices if you want different dimensions.
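A short sketch of this behaviour, using a made-up one-dimensional dataset. It reads the lengths through `ds.sizes`, which is the read-only mapping of dimension names to lengths; the variable name `rain_rate` is an assumption for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset with a single dimension of length 6.
ds = xr.Dataset({"rain_rate": ("time", np.arange(6.0))})

# Attempting to change a dimension length in place is rejected.
try:
    ds.sizes["time"] = 3
    modified = True
except TypeError:
    modified = False

# Slicing returns a new dataset; the original keeps its length.
subset = ds.isel(time=slice(0, 3))
print(modified, ds.sizes["time"], subset.sizes["time"])
```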

In [None]:
## Looking at the dimensions of the dataset

## Coordinate and data variables

### Coordinate variables 

Coordinate variables in xarray (and NetCDF, following the CF conventions) are variables used to label and index data along each dimension clearly.

- Coordinate variables give context to the dimensions.

They usually represent:

- time points (e.g., timestamps)

- spatial locations (latitude, longitude, altitude, depth, distance)

- other meaningful numeric or categorical indexes
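Because coordinate variables label the dimensions, values can be selected by label instead of by integer position. The following is a minimal sketch with invented data; `rain_rate` and the dates are placeholders.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily series with a time coordinate.
times = pd.date_range("2019-01-01", periods=4, freq="D")
ds = xr.Dataset(
    {"rain_rate": ("time", np.array([0.0, 1.5, 0.2, 3.1]))},
    coords={"time": times},
)

# Select by coordinate label (a timestamp) rather than by index.
value = ds["rain_rate"].sel(time=pd.Timestamp("2019-01-02")).item()
print(value)
```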

In [None]:
## Looking at the coordinate variables of the dataset 

In [None]:
## Looking at the attributes of the dataset's coordinate variables

### Data variables

- Data variables are the primary measurements or observations stored in your dataset.

- They are the main scientific or observational values you're analyzing.

- Each data variable is associated with one or more dimensions (and thus coordinates)
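The link between data variables and dimensions can be listed directly. This sketch assumes two invented variables, `rain_rate` and `temperature`, and collects the dimensions and units of each.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: one 2-D and one 1-D data variable.
ds = xr.Dataset(
    {
        "rain_rate": (("time", "range"), np.zeros((4, 3)), {"units": "mm h-1"}),
        "temperature": ("time", np.full(4, 5.0), {"units": "degC"}),
    }
)

# Map each data variable to its dimensions and its units attribute.
summary = {
    name: (var.dims, var.attrs.get("units"))
    for name, var in ds.data_vars.items()
}
print(summary)
```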

In [None]:
## Looking at the data variables of the dataset 

In [None]:
## Looking at the attributes of the dataset's data variables

In [None]:
## Plotting some of the data variables  

# An undocumented NetCDF dataset

DENlab wind data

Measuring instrument: DENlab wind speed meter. Standard anemometer as commonly used in water sports.

[Link to the dataset](https://data.4tu.nl/articles/_/12708080/1)

Comments: On the dataset landing page, scroll to the OPeNDAP data service at the bottom, follow the link, and open the file wind-2008.nc.

In [None]:
## Open the file and inspect it 

In [None]:
## Look at the attributes. What do you see? What is missing?

In [None]:
## Look at the data variables. Which dimensions do they have? Can you plot them?

**Why is NOT following the CF conventions bad practice?**

1. Limited interpretability: without clear metadata (e.g., units, descriptions, reference systems), data is ambiguous, and users can misunderstand or misinterpret it.

2. Poor interoperability: tools that rely on conventions (e.g., visualization software, OPeNDAP servers) expect clearly defined attributes and structures, so non-standard files cannot easily be integrated into data-processing workflows.

3. Reduced reusability and reproducibility: when standard conventions are missing, reusing the data or verifying results takes extra effort, and others may avoid unclear or non-standard data.

4. Reduced findability: convention-based metadata improves searchability in data catalogs and repositories.

**Benefits of the CF (Climate and Forecast) conventions**

Following the CF conventions gives you:

- Clearly labeled dimensions and coordinates (e.g., latitude, longitude, height, time).

- Documented `units`, `standard_name`, and `long_name` attributes.

- Variables structured in standardized ways that common tools (xarray, Panoply, ncview) can read easily.

- Better interoperability, which makes it easier to share and reuse data within scientific communities.

**Minimal CF-compliance checklist**

| CF Element                        | Required?     | Example                                  |
|-----------------------------------|---------------|------------------------------------------|
| **Conventions global attribute**  | ✅ Mandatory  | `Conventions = "CF-1.10"`                |
| **Dimensions clearly defined**    | ✅ Mandatory  | `time`, `latitude`, `longitude`, `height`|
| **Variable attributes: units**    | ✅ Mandatory  | `units = "m s-1"`                        |
| **Variable attributes: standard_name** | ✅ Recommended | `standard_name = "northward_wind"`     |
| **Coordinate attributes: axis**   | ✅ Recommended | `axis = "T"` (for time)                 |
| **Global attributes: metadata**   | ✅ Recommended | `title`, `institution`, `history`, `source`|
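Part of this checklist can be checked programmatically. The helper below, `check_minimal_cf`, is a hypothetical function written for this notebook, not part of xarray or of any official CF tooling. It only covers the `Conventions`, `units`, and `standard_name`/`long_name` rows, and it is a rough sketch rather than a full compliance check.

```python
import numpy as np
import pandas as pd
import xarray as xr


def check_minimal_cf(ds):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if "CF-" not in str(ds.attrs.get("Conventions", "")):
        problems.append("global attribute 'Conventions' is missing or not CF")
    for name, var in ds.data_vars.items():
        if "units" not in var.attrs:
            problems.append(f"{name}: no 'units' attribute")
        if "standard_name" not in var.attrs and "long_name" not in var.attrs:
            problems.append(f"{name}: no 'standard_name' or 'long_name'")
    return problems


# Hypothetical undocumented dataset, similar in spirit to the wind file.
times = pd.date_range("2023-01-01", periods=3, freq="D")
bare = xr.Dataset(
    {"v_wind": ("time", np.array([1.0, 2.0, 3.0]))},
    coords={"time": times},
)
bare_problems = check_minimal_cf(bare)

# The same data with the missing metadata filled in.
documented = xr.Dataset(
    {"v_wind": ("time", np.array([1.0, 2.0, 3.0]),
                {"units": "m s-1", "standard_name": "northward_wind"})},
    coords={"time": times},
    attrs={"Conventions": "CF-1.10"},
)

print(bare_problems)
print(check_minimal_cf(documented))
```

Returning a list of messages instead of raising an error makes it easy to see every gap at once before fixing the metadata.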


## Adjust this dataset to adhere to the CF conventions

### Example

In [4]:
# Step 1: Import libraries and create a dummy dataset

import xarray as xr
import numpy as np
import pandas as pd

times = pd.date_range("2023-01-01", periods=10, freq='h')
v_wind = np.random.rand(10) * 10  # Example data

ds = xr.Dataset(
    data_vars={
        "v_wind": ("time", v_wind, {
            "units": "m s-1"
        }),
    },
    coords={
        "time": times,
    },
    attrs={
        "title": "DENlab wind data, raw, 2008"
    }
)

# Step 2: Add latitude, longitude, and height as scalar coordinates
# Assumption: measurements are taken at one fixed location, i.e. the station does not move over time.
# If the sensor moved, or if there were multiple locations, you would define these coordinates as arrays.

ds = ds.assign_coords(
    latitude=52.0,
    longitude=4.3,
    height=10.0
)

# Step 3: Add CF-compliant attributes explicitly
# Latitude attributes: latitude and longitude identify the geographical location of the measurements.
ds["latitude"].attrs.update({
    "units": "degrees_north",
    "standard_name": "latitude"
})

# Longitude attributes
ds["longitude"].attrs.update({
    "units": "degrees_east",
    "standard_name": "longitude"
})

# Height attributes: height describes the vertical position of the measurement (e.g., sensor height above ground).
ds["height"].attrs.update({
    "units": "m",
    "standard_name": "height",
    "positive": "up"
})


# Time attributes
ds["time"].attrs.update({
    "standard_name": "time",
    "axis": "T",
    "long_name": "Time of measurement"
})

# Data variable attributes
ds["v_wind"].attrs.update({
    "standard_name": "northward_wind",
    "long_name": "Northward component of wind velocity"
})

# Global attributes for CF-compliance
ds.attrs.update({
    "Conventions": "CF-1.10",
    "institution": "Delft University of Technology",
    "source": "DENlab Wind Sensor",
    "history": f"Converted to CF conventions on {pd.Timestamp.now()}"
})

# Step 4: Save to a CF-compliant NetCDF file
ds.to_netcdf("denlab_wind_cf.nc")

print("CF-compliant NetCDF file created: 'denlab_wind_cf.nc'")

CF-compliant NetCDF file created: 'denlab_wind_cf.nc'
