# Exercise 7: NetCDF4

## Aim: Introduce the netCDF4 library in Python to read and create NetCDF4 Files.

### Issues covered:
- Importing netCDF4
- Reading a NetCDF file as a Dataset instance
- Accessing dimensions, variables and attributes
- Defining dimensions, variables and attributes
- Writing a NetCDF file as a Dataset

## Creating/opening/closing netCDF files

Import the `netCDF4` library

In [1]:
import netCDF4

Let's create a new NetCDF file called "test.nc" in append mode ('a') with the NETCDF4 format. This mode will allow us to edit the dataset later.

In [2]:
new_file = netCDF4.Dataset("test.nc", "a", format="NETCDF4")

Inspect the new file to see what its `data_model` is.

In [3]:
new_file.data_model

'NETCDF4'

## Groups, dimensions, variables and attributes

### Groups

Groups act as containers for variables, dimensions and attributes. Let's add a group to the dataset we just made called "forecasts".

In [4]:
group1 = new_file.createGroup("forecasts")

List the groups of your dataset using `.groups`

In [5]:
new_file.groups

{'forecasts': <class 'netCDF4._netCDF4.Group'>
 group /forecasts:
     dimensions(sizes): 
     variables(dimensions): 
     groups: }

Create a new group within forecasts called `model1` then print the groups to see your new group.

In [6]:
group2 = new_file.createGroup("/forecasts/model1")
new_file.groups

{'forecasts': <class 'netCDF4._netCDF4.Group'>
 group /forecasts:
     dimensions(sizes): 
     variables(dimensions): 
     groups: model1}

What happens if you do `group3 = new_file.createGroup("/analyses/model2")`?

In [7]:
# It creates the 'analyses' group then adds the 'model2' group to it.
group3 = new_file.createGroup("/analyses/model2")
new_file.groups

{'forecasts': <class 'netCDF4._netCDF4.Group'>
 group /forecasts:
     dimensions(sizes): 
     variables(dimensions): 
     groups: model1,
 'analyses': <class 'netCDF4._netCDF4.Group'>
 group /analyses:
     dimensions(sizes): 
     variables(dimensions): 
     groups: model2}

What happens if you do `group4 = new_file.createGroup("analyses")`?

In [8]:
# Nothing - it returns the existing group.
group4 = new_file.createGroup("analyses")
new_file.groups

{'forecasts': <class 'netCDF4._netCDF4.Group'>
 group /forecasts:
     dimensions(sizes): 
     variables(dimensions): 
     groups: model1,
 'analyses': <class 'netCDF4._netCDF4.Group'>
 group /analyses:
     dimensions(sizes): 
     variables(dimensions): 
     groups: model2}

### Dimensions

Create some dimensions for the `new_file` dataset:
- time dimension with unlimited size
- level dimension with unlimited size
- lat dimension with unlimited size
- lon dimension with unlimited size

In [9]:
time = new_file.createDimension('time', None)
level = new_file.createDimension('level', None)
lat = new_file.createDimension('lat', None)
lon = new_file.createDimension('lon', None)

Print out the dimensions you just created.

In [10]:
new_file.dimensions

{'time': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0,
 'level': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0,
 'lat': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'lat', size = 0,
 'lon': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'lon', size = 0}

Check the length of the latitude dimension. It should be 0, because `lat` is unlimited and no data has been written to it yet.

In [11]:
print(len(lat))

0


Check that the level dimension is unlimited.

In [12]:
print(level.isunlimited())

True


Let's take a look at an overview using 
```
for dim in new_file.dimensions.values():
    print(dim)
```

In [13]:
for dim in new_file.dimensions.values():
    print(dim)

<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'lat', size = 0
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'lon', size = 0


### Variables

Remember that the data types are as follows:
- `f4`: 32-bit floating point 
- `f8`: 64-bit floating point 
- `i4`: 32-bit signed integer 
- `i2`: 16-bit signed integer
- `i8`: 64-bit signed integer
- `i1`: 8-bit signed integer
- `u1`: 8-bit unsigned integer
- `u2`: 16-bit unsigned integer
- `u4`: 32-bit unsigned integer
- `u8`: 64-bit unsigned integer
- `S1`: single-character string
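These codes are the same strings NumPy uses for its dtypes, so one quick way to check what a code means is to ask NumPy. This is only a sketch using `np.dtype`; the `summary` dictionary is a name introduced for the example.

```python
import numpy as np

codes = ["f4", "f8", "i1", "i2", "i4", "i8", "u1", "u2", "u4", "u8", "S1"]

# Map each code to (kind, size in bits).
# kind: "f" = float, "i" = signed int, "u" = unsigned int, "S" = byte string.
summary = {}
for code in codes:
    dtype = np.dtype(code)
    summary[code] = (dtype.kind, dtype.itemsize * 8)
    print(code, dtype.name, summary[code])
```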

Create a scalar variable called `times` with the type set to `f8`. 

In [14]:
times = new_file.createVariable('times', 'f8')

Create a scalar variable called `levels` but this time set the type to `np.float64`. (You'll need to import numpy as np.)

In [15]:
import numpy as np
levels = new_file.createVariable('levels', np.float64)

Print out the variables using `new_file.variables`. What do you notice about the types?

In [16]:
# The types are the same - both float64. Sometimes people will use np.float64 as it is more clear than f8. 
print(new_file.variables)

{'times': <class 'netCDF4._netCDF4.Variable'>
float64 times()
unlimited dimensions: 
current shape = ()
filling on, default _FillValue of 9.969209968386869e+36 used, 'levels': <class 'netCDF4._netCDF4.Variable'>
float64 levels()
unlimited dimensions: 
current shape = ()
filling on, default _FillValue of 9.969209968386869e+36 used}


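Create a variable in the model2 group we made earlier called `temp`, with the float64 type and this time give it dimensions: (`time`, `level`, `lat`, `lon`). Then print out `new_file.variables` and note that `temp` is not listed, because it lives in the `/analyses/model2` group rather than in the root group.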

In [17]:
temp = new_file.createVariable("/analyses/model2/temp", np.float64, ("time", "level", "lat", "lon",))
print(new_file.variables)

{'times': <class 'netCDF4._netCDF4.Variable'>
float64 times()
unlimited dimensions: 
current shape = ()
filling on, default _FillValue of 9.969209968386869e+36 used, 'levels': <class 'netCDF4._netCDF4.Variable'>
float64 levels()
unlimited dimensions: 
current shape = ()
filling on, default _FillValue of 9.969209968386869e+36 used}


Create two variables: 
- longitudes with the name "lon", type float64 and dimension lon
- latitudes with the name "lat", type float64 and dimension lat

In [18]:
longitudes = new_file.createVariable("lon", np.float64, ("lon",))
latitudes = new_file.createVariable("lat", np.float64, ("lat",))

### Attributes

Let's create a global attribute. Create an attribute on the new_file object called `.description` with the value `This is a test description.`.

In [19]:
new_file.description = "This is a test description."

Let's create a variable attribute. Create an attribute on the `times` variable called `units` and put `hours`.

In [20]:
times.units = "hours"

Take a look at the attrs on `new_file` using `new_file.ncattrs()`. What does this show?

In [21]:
#This just shows the name of the global attrs.
new_file.ncattrs()

['description']

To get the name AND value of each attribute, use the following loop:
```
for name in new_file.ncattrs():
    print(name, ":", getattr(new_file, name))
```

In [22]:
for name in new_file.ncattrs():
    print(name, ":", getattr(new_file, name))

description : This is a test description.


There is an easier way of doing this - using `new_file.__dict__`. Try it out!

In [23]:
new_file.__dict__

{'description': 'This is a test description.'}

## Writing data to and retrieving data from netCDF variables

Create an array to populate the lats with using `lats = np.arange(-90, 91, 5)` and an array to populate the lons with using `lons = np.arange(-180, 180, 5)`.

In [24]:
lats = np.arange(-90, 91, 5)
lons = np.arange(-180, 180, 5)

Print out latitudes and longitudes to see what they look like before we populate these variables.

In [25]:
print(latitudes)
print(longitudes)

<class 'netCDF4._netCDF4.Variable'>
float64 lat(lat)
unlimited dimensions: lat
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
unlimited dimensions: lon
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used


Populate the two variables with our data using `latitudes[:] = lats` and the same for longitudes.

In [26]:
latitudes[:] = lats
longitudes[:] = lons

Print the data out and take a look.

In [27]:
print("latitudes =\n{}".format(latitudes[:]))
print("longitudes =\n{}".format(longitudes[:]))

latitudes =
[-90. -85. -80. -75. -70. -65. -60. -55. -50. -45. -40. -35. -30. -25.
 -20. -15. -10.  -5.   0.   5.  10.  15.  20.  25.  30.  35.  40.  45.
  50.  55.  60.  65.  70.  75.  80.  85.  90.]
longitudes =
[-180. -175. -170. -165. -160. -155. -150. -145. -140. -135. -130. -125.
 -120. -115. -110. -105. -100.  -95.  -90.  -85.  -80.  -75.  -70.  -65.
  -60.  -55.  -50.  -45.  -40.  -35.  -30.  -25.  -20.  -15.  -10.   -5.
    0.    5.   10.   15.   20.   25.   30.   35.   40.   45.   50.   55.
   60.   65.   70.   75.   80.   85.   90.   95.  100.  105.  110.  115.
  120.  125.  130.  135.  140.  145.  150.  155.  160.  165.  170.  175.]


- Extend `new_file` with a `pressure` dimension of fixed size 10. The unlimited `time` dimension already exists.
- Define a 4D variable `temperature` with dimensions (`time`, `pressure`, `lat`, `lon`).
- Generate random temperature data for a subset of time and pressure values and assign it to `temperature`. Use an array of shape (10, 3, 37, 72), written to time indices 0 to 9 and the first three pressure indices.
- After assigning the data, print the shape of the `temperature` variable. It reports 10 pressure levels because `pressure` has a fixed size of 10; the levels that were not written hold the fill value.

In [28]:
import numpy as np

new_file.createDimension("pressure", 10)

temperature = new_file.createVariable("temperature", "f4", ("time", "pressure", "lat", "lon",))

nlats = len(new_file.dimensions["lat"])
nlons = len(new_file.dimensions["lon"])

temperature[0:10, 0:3, :, :] = np.random.uniform(size=(10, 3, nlats, nlons))

print("temp shape after adding data = {}".format(temperature.shape))

temp shape after adding data = (10, 10, 37, 72)


- Define the `pressure` variable with values [1000, 850, 700, 500, 300, 250, 200, 150, 100, 50].
- Populate the `pressure` variable in the netCDF dataset.
- Use fancy indexing to slice the temperature variable: select every second time step (0, 2, 4, 6 and 8). Use pressure levels [850, 500, 200] and select only positive latitudes and longitudes.
- Print the shape of the resulting subset array.

In [29]:
pressure = new_file.createVariable("pressure", "f4", ("pressure",))

pressure[:] = [1000., 850., 700., 500., 300., 250., 200., 150., 100., 50.]

temperature = new_file.variables["temperature"]
latitudes = new_file.variables["lat"][:]
longitudes = new_file.variables["lon"][:]

tempdat = temperature[::2, [1, 3, 6], latitudes > 0, longitudes > 0]
print("shape of fancy temp slice = {}".format(tempdat.shape))

shape of fancy temp slice = (5, 3, 18, 35)
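Note that netCDF4 variables apply each index independently along its own axis (orthogonal indexing), whereas plain NumPy arrays try to broadcast several index arrays together. The sketch below reproduces the same selection on an in-memory NumPy array, where `np.ix_` is needed to get the orthogonal behaviour. The random `data` array is a stand-in for the temperature variable, not the exercise data.

```python
import numpy as np

lats = np.arange(-90, 91, 5)
lons = np.arange(-180, 180, 5)

# Stand-in for the temperature variable: (time, pressure, lat, lon).
data = np.random.uniform(size=(10, 10, lats.size, lons.size))

time_idx = np.arange(0, 10, 2)        # every second time step
level_idx = np.array([1, 3, 6])       # positions of 850, 500 and 200 hPa
lat_idx = np.flatnonzero(lats > 0)    # positive latitudes
lon_idx = np.flatnonzero(lons > 0)    # positive longitudes

# np.ix_ builds an open mesh so each index applies to its own axis,
# which mirrors what a netCDF4 variable does with the same indices.
subset = data[np.ix_(time_idx, level_idx, lat_idx, lon_idx)]

print(subset.shape)
```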


## Time-coordinates

Most metadata standards specify that time should be measured relative to a fixed date, with units such as `hours since YYYY-MM-DD hh:mm:ss`. We can convert values to and from calendar dates using `num2date` and `date2num` from the `cftime` library. Two other helpful classes are `datetime` and `timedelta` from the `datetime` module.

- Let's generate a list of date and time values: create a list called `dates` containing date and time values, starting from January 1st 2022, and incrementing by 6 hours for a total of 5 entries. 
- Use `date2num` to convert your list of dates to numeric values using: `units="hours since 2022-01-01 00:00:00"` and `calendar="gregorian"`. Store these in an array called `times`.
- Print the numeric times values to confirm the numeric representation.
- Use `num2date` to convert times back to datetime objects using the same units and calendar. Store these in a list called `converted_dates`
- Print the converted dates to verify they match the original dates list. 

In [33]:
from datetime import datetime, timedelta
from cftime import num2date, date2num

# Step 1: Generate dates list
dates = [datetime(2022, 1, 1) + n * timedelta(hours=6) for n in range(5)]
print("Original dates:", dates)

# Step 2: Convert dates to numeric time values
units = "hours since 2022-01-01 00:00:00"
calendar = "gregorian"
times = date2num(dates, units=units, calendar=calendar)

# Step 3: Print numeric time values
print("Numeric time values (in units '{}'):\n{}".format(units, times))

# Step 4: Convert numeric time values back to calendar dates
converted_dates = num2date(times, units=units, calendar=calendar)

# Step 5: Print converted dates
print("Dates corresponding to numeric time values:\n", converted_dates)

Original dates: [datetime.datetime(2022, 1, 1, 0, 0), datetime.datetime(2022, 1, 1, 6, 0), datetime.datetime(2022, 1, 1, 12, 0), datetime.datetime(2022, 1, 1, 18, 0), datetime.datetime(2022, 1, 2, 0, 0)]
Numeric time values (in units 'hours since 2022-01-01 00:00:00'):
[ 0  6 12 18 24]
Dates corresponding to numeric time values:
 [cftime.DatetimeGregorian(2022, 1, 1, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(2022, 1, 1, 6, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(2022, 1, 1, 12, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(2022, 1, 1, 18, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(2022, 1, 2, 0, 0, 0, 0, has_year_zero=False)]


## Multi-file datasets

## Compression of variables

## Compound data types

## Variable-length data types

## Enum data type

## Extension