# Introduction
Scientific geospatial datasets, such as climatological or oceanographic model results, can grow quickly in complexity and volume. When data is simulated over large geographic regions or generated several times per day, the sheer amount of output becomes challenging to work with. The NetCDF format addresses this by storing datasets in a well-organized structure that keeps the data and its descriptive metadata together, which makes analysis more manageable.
## But how to work with NetCDF?
Analyzing NetCDF Data with Python is a beginner-friendly guide to the basics of reading and analyzing NetCDF files. 
All sections of this four-part tutorial are available as Jupyter Notebooks utilizing the Python programming language and an example NetCDF file from the ADCIRC ocean circulation model. 



# Tutorial


## Part 1: Getting Started with NetCDF


### 1.1 - What is NetCDF?


#### About NetCDF
Network Common Data Form (NetCDF) is a data format that is widely used to store multi-dimensional, array-oriented information such as geographic, meteorological or oceanographic data. 
NetCDF often handles huge datasets that hold spatial information and data values sampled several times a day or collected over large geographic regions. 
The NetCDF file format supports this by providing the data values as named arrays that can also hold metadata describing the dataset.


#### NetCDF in Python
Python provides the netCDF4 and NumPy libraries, which together offer everything we need to work with a NetCDF file.

### 1.2 - Software Installation


We don't have anything for this yet

### 1.3 - Tutorial example NetCDF file

In our exercise, we will use an example NetCDF file produced by the oceanographic ADCIRC model. The file contains the underlying mesh topology and the maximum water elevation computed at each mesh node. 
You can find the file here: [data/maxele.63.nc](data/maxele.63.nc)

## Part 2: Reading a NetCDF File and Understanding its Structure


### 2.1 - Opening a NetCDF file

In [2]:
# importing libraries
import netCDF4

To read in the NetCDF file, we pass its path to ``netCDF4.Dataset``.

In [3]:
# reading in the NetCDF file
myNetCDF = netCDF4.Dataset("data/maxele.63.nc")

Now that we have loaded our file in Python, we can see how we can get the desired information out of it. For that, let’s have a look at the general structure of a NetCDF file first.

### 2.2 - NetCDF file structure


#### A NetCDF file has three basic components:



**Attributes** (metadata) describe the dataset as a whole and the data variables or dimensions it contains. 
- “Global attributes” describe the entire dataset with metadata like project title, institution, contact information, program version etc.
- “Variable attributes” describe properties of the data variables like units, scaling factors, offsets etc.


**Dimensions** define the shape of the data variables (arrays) in the NetCDF file.

**Variables** hold the actual data values.
- Variables store model output data like water heights, wind velocities, pressures etc. but also information like latitudes, longitudes, times etc. 
- Each variable can also have attributes (metadata) that describe the data.
- The NetCDF format stores variables as arrays that are defined by unique variable names, data types, and array dimensions. 


## Part 3: Accessing NetCDF Attributes

### 3.1 - Dataset attributes (metadata)

When we print the entire NetCDF dataset (myNetCDF), we will get an overview of the file, including:
- the **global attributes** describing the entire dataset
- the **dimensions** defining the shape of the contained data arrays
- the names of all data **variables** contained in the NetCDF file


In [4]:
# getting the attributes (metadata) of the file
print(myNetCDF)


<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    _FillValue: -99999.0
    model: ADCIRC
    version: 51.52.29
    grid_type: Triangular
    description: Shinnecock Inlet V20051108               ! UPTO 32 CHARACTER ALPHANUMERIC RUN D
    agrid: Shinacock Inlet Coarse Grid
    title: adcirc.org netcdf examples project
    institution: UNC CH Institute of Marine Sciences
    source: adcirc.org examples page
    history: based on Shinnecock Inlet but with netcdf output
    references: http://adcirc.org/home/documentation/example-problems/shinnecock-inlet-ny-with-t
    comments: netcdf4 format was used (fully compatible with hdf5)
    host: adcirc.org
    convention: CF
    Conventions: UGRID-0.9.0
    contact: cfulcher@email.unc.edu
    creation_date: 2015-12-14  4:18:28 -05:00
    modification_date: 2015-12-14  4:18:28 -05:00
    fort.15: ==== Input File Parameters (below) ====
    dt: 6.0
    ihot: 0
    ics: 2
    nolibf: 2
    nolifa: 2


With this information we can explore the variables and their dimensions in more depth. To understand how that works, let's look at how the netCDF4 library represents a NetCDF file in Python.

### 3.2 - Utilizing the Python dictionary structure for working with NetCDF


When we read a NetCDF file with netCDF4, the NetCDF ``variables`` and ``dimensions`` are represented as Python **dictionaries**. 
- The keys are the variable names.
- The values are ``netCDF4.Variable`` objects that give access to the data and any attributes. Indexing such an object (for example with ``[:]``) returns the data as a NumPy array. 


All NetCDF variables are stored in a **dictionary** with a **[key/value]** structure, and the data behind each value is read as a NumPy array.

How do we access the variables and dimensions in our NetCDF file?
- ``myNetCDF.variables`` is a dictionary containing the variables
- ``myNetCDF.dimensions`` is a dictionary containing the dimensions of the variables

As we have already seen in 3.1 - Dataset attributes (metadata), we can get a list of all variable names contained in the NetCDF file by printing the entire dataset.


In [5]:
print(myNetCDF)

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    _FillValue: -99999.0
    model: ADCIRC
    version: 51.52.29
    grid_type: Triangular
    description: Shinnecock Inlet V20051108               ! UPTO 32 CHARACTER ALPHANUMERIC RUN D
    agrid: Shinacock Inlet Coarse Grid
    title: adcirc.org netcdf examples project
    institution: UNC CH Institute of Marine Sciences
    source: adcirc.org examples page
    history: based on Shinnecock Inlet but with netcdf output
    references: http://adcirc.org/home/documentation/example-problems/shinnecock-inlet-ny-with-t
    comments: netcdf4 format was used (fully compatible with hdf5)
    host: adcirc.org
    convention: CF
    Conventions: UGRID-0.9.0
    contact: cfulcher@email.unc.edu
    creation_date: 2015-12-14  4:18:28 -05:00
    modification_date: 2015-12-14  4:18:28 -05:00
    fort.15: ==== Input File Parameters (below) ====
    dt: 6.0
    ihot: 0
    ics: 2
    nolibf: 2
    nolifa: 2


Alternatively, we can access all variable names by calling ``keys()`` on the variables dictionary. 


In [6]:
# get all variable names contained in the file by printing the dictionary keys
all_variable_names = myNetCDF.variables.keys()
print(all_variable_names)

dict_keys(['time', 'x', 'y', 'element', 'adcirc_mesh', 'neta', 'nvdll', 'max_nvdll', 'ibtypee', 'nbdv', 'nvel', 'nvell', 'max_nvell', 'ibtype', 'nbvv', 'depth', 'zeta_max', 'time_of_zeta_max'])


### 3.3 - Variable attributes (metadata)


The NetCDF variables hold the actual data values of the dataset. Each variable can also contain associated attributes that describe the variable.

Let’s first get an overview of all variable attributes that our NetCDF file has.


#### List of all variable attributes

In [7]:
# getting a list of all attributes (metadata) of all variables in the file
for var in myNetCDF.variables.values():
    print(var)


<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
    long_name: model time
    standard_name: time
    units: seconds since 2015-12-14 00:00:00 UTC
    base_date: 2015-12-14 00:00:00 UTC
unlimited dimensions: time
current shape = (1,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
float64 x(node)
    long_name: longitude
    standard_name: longitude
    units: degrees_east
    positive: east
unlimited dimensions: 
current shape = (3070,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
float64 y(node)
    long_name: latitude
    standard_name: latitude
    units: degrees_north
    positive: north
unlimited dimensions: 
current shape = (3070,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
int32 element(nele, nvertex)
    long_name: element
    cf_role: face_node_connectivity
    start_index: 1
    units: nondimensional
unlimited dimen

#### Attributes of a specific variable 


If we are interested in a specific variable and want to explore the associated attributes, we can do so by calling the **variable name**.

In [8]:
# access the attributes (metadata) of the variable 'x' (longitudes)
print(myNetCDF.variables['x'])


<class 'netCDF4._netCDF4.Variable'>
float64 x(node)
    long_name: longitude
    standard_name: longitude
    units: degrees_east
    positive: east
unlimited dimensions: 
current shape = (3070,)
filling on, default _FillValue of 9.969209968386869e+36 used


### 3.4 - Dimension attributes (metadata)


To find out the array structure in which our NetCDF variables are stored, we can inspect the **dimensions**. 

As with the variables, let’s first get an overview of all dimension attributes that our NetCDF file has.


#### List of all dimension attributes


In [9]:
# getting a list of all attributes (metadata) of all dimensions in the file
for dim in myNetCDF.dimensions.values():
    print(dim)

<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 1
<class 'netCDF4._netCDF4.Dimension'>: name = 'node', size = 3070
<class 'netCDF4._netCDF4.Dimension'>: name = 'nele', size = 5780
<class 'netCDF4._netCDF4.Dimension'>: name = 'nvertex', size = 3
<class 'netCDF4._netCDF4.Dimension'>: name = 'nope', size = 1
<class 'netCDF4._netCDF4.Dimension'>: name = 'max_nvdll', size = 75
<class 'netCDF4._netCDF4.Dimension'>: name = 'nbou', size = 1
<class 'netCDF4._netCDF4.Dimension'>: name = 'max_nvell', size = 570
<class 'netCDF4._netCDF4.Dimension'>: name = 'mesh', size = 1
