In [25]:
import pandas as pd
from scipy.io import netcdf
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

[Guide to NetCDF Data](https://towardsdatascience.com/read-netcdf-data-with-python-901f7ff61648)

From the article above: Network common data form (NetCDF) is commonly used to store multidimensional geographic data. Some examples of these data are temperature, precipitation, and wind speed. Variables stored in NetCDF are often measured multiple times per day over large (continental) areas. With multiple measurements per day, data values accumulate quickly and become unwieldy to work with. When each value is also assigned to a geographic location, data management is further complicated. NetCDF provides a solution for these challenges.

In [26]:
from netCDF4 import Dataset

# Open the NetCDF file for reading
# nc_file = Dataset('b.e21.BHISTcmip6.f09_g17.LE2-1001.001.cam.h0.SST.185001-185912.nc', 'r')

# lat_data = nc_file.variables['lat'][:]

A NetCDF file has three basic parts: metadata, dimensions and variables. Variables contain both metadata and data.

In [27]:
SST1_file = Dataset('/Users/michaelvaden/Downloads/b.e21.BHISTcmip6.f09_g17.LE2-1001.001.cam.h0.SST.196001-196912.nc', 'r')

File-level metadata (the global attributes) can be viewed by printing the dataset or by reading it as a dictionary, like so:

In [28]:
SST1_file.__dict__

{'Conventions': 'CF-1.0',
 'source': 'CAM',
 'case': 'b.e21.BHISTcmip6.f09_g17.LE2-1001.001',
 'logname': 'sunseon',
 'host': 'mom2',
 'initial_file': 'b.e21.B1850.f09_g17.CMIP6-piControl.001.cam.i.1001-01-01-00000.nc',
 'topography_file': '/mnt/lustre/share/CESM/cesm_input/atm/cam/topo/fv_0.9x1.25_nc3000_Nsw042_Nrs008_Co060_Fi001_ZR_sgh30_24km_GRNL_c170103.nc',
 'model_doi_url': 'https://doi.org/10.5065/D67H1H0V',
 'time_period_freq': 'month_1'}

Dimensions are accessed much like file metadata. Each dimension is stored as a `Dimension` object that holds its name and size. Looping over `dimensions` yields the dimension names, while looping over `dimensions.values()` yields the objects themselves, like so.

In [29]:
for dim in SST1_file.dimensions:
    print(dim)

for dim in SST1_file.dimensions.values():
    print(dim)

lat
lon
zlon
nbnd
time
chars
lev
ilev
<class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 192
<class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 288
<class 'netCDF4._netCDF4.Dimension'>: name = 'zlon', size = 1
<class 'netCDF4._netCDF4.Dimension'>: name = 'nbnd', size = 2
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 120
<class 'netCDF4._netCDF4.Dimension'>: name = 'chars', size = 8
<class 'netCDF4._netCDF4.Dimension'>: name = 'lev', size = 32
<class 'netCDF4._netCDF4.Dimension'>: name = 'ilev', size = 33


And the same thing can be done for variable metadata:

In [30]:
for var in SST1_file.variables.values():
    print(var)

<class 'netCDF4._netCDF4.Variable'>
float64 lat(lat)
    _FillValue: -900.0
    long_name: latitude
    units: degrees_north
unlimited dimensions: 
current shape = (192,)
filling on
<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
    _FillValue: -900.0
    long_name: longitude
    units: degrees_east
unlimited dimensions: 
current shape = (288,)
filling on
<class 'netCDF4._netCDF4.Variable'>
float64 gw(lat)
    _FillValue: -900.0
    long_name: latitude weights
unlimited dimensions: 
current shape = (192,)
filling on
<class 'netCDF4._netCDF4.Variable'>
float64 zlon(zlon)
    _FillValue: -900.0
    long_name: longitude
    units: degrees_east
    bounds: zlon_bnds
unlimited dimensions: 
current shape = (1,)
filling on
<class 'netCDF4._netCDF4.Variable'>
float64 zlon_bnds(zlon, nbnd)
    _FillValue: -900.0
    long_name: zlon bounds
    units: degrees_east
unlimited dimensions: 
current shape = (1, 2)
filling on
<class 'netCDF4._netCDF4.Variable'>
float64 lev(lev)
    _FillValue: -9

In [31]:
print(SST1_file['lat'])

<class 'netCDF4._netCDF4.Variable'>
float64 lat(lat)
    _FillValue: -900.0
    long_name: latitude
    units: degrees_north
unlimited dimensions: 
current shape = (192,)
filling on


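You can get the actual data values by indexing a variable, and a NumPy array is returned (nice). By default netCDF4 hands back a masked array (`numpy.ma.MaskedArray`), with any values equal to `_FillValue` masked out.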
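To see what the fill value means for the returned array, here is a small NumPy-only sketch of masking. It mimics the default behavior of netCDF4 as I understand it, where entries equal to `_FillValue` come back masked; the array values are made up, and only the `-900.0` fill value is taken from the printout above.

```python
import numpy as np

FILL_VALUE = -900.0  # the _FillValue reported for lat above

# Made-up raw values with one missing entry stored as the fill value.
raw = np.array([-90.0, FILL_VALUE, 0.0, 90.0])

# Mask every entry equal to the fill value.
masked = np.ma.masked_equal(raw, FILL_VALUE)

# Statistics on a masked array skip the masked entries.
valid_mean = masked.mean()

# Convert back to a plain ndarray, replacing masked entries with NaN.
plain = masked.filled(np.nan)

print(masked)
print(valid_mean)
print(plain)
```

Calling `.filled(np.nan)` before handing the data to pandas makes the missing entries show up as `NaN` rather than as the raw fill value.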

In [32]:
pd.DataFrame(SST1_file['lat'][:]).rename({0: "latitude"}, axis=1)

Unnamed: 0,latitude
0,-90.000000
1,-89.057592
2,-88.115183
3,-87.172775
4,-86.230366
...,...
187,86.230366
188,87.172775
189,88.115183
190,89.057592
