# Introduction to netCDF4

netCDF (Network Common Data Form) files are a way of storing multidimensional data so that it can be shared by scientists using different computers, operating systems, and programming languages. In this module, we'll take a look at why the format is so widely used by oceanographers and climate scientists today.

Let's first install the netCDF4 module, which reads netCDF (.nc) files in Python.

1. Open Terminal
1. Check if the netcdf4 module is available for installation using ```conda search netcdf4```. This should give you a list of versions of the module.
1. Type in ```conda install netcdf4``` to install the latest version.
1. Check if the module has been installed using ```conda list```.
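
The last step can also be checked from inside Python. This is a minimal sketch that uses only the standard library; the helper name `module_available` is made up for this example and is not part of netCDF4.

```python
import importlib.util


def module_available(name):
    """Return True if Python can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None


# Should print True once the conda install above has succeeded
# in the environment this notebook is running in.
print("netCDF4 installed:", module_available("netCDF4"))
```

If this prints `False`, the notebook is probably running in a different conda environment from the one where the module was installed.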

We will be working with satellite measurements of [sea-surface temperature from NASA](https://neo.sci.gsfc.nasa.gov/view.php?datasetId=MYD28M). Here is a quick video on how the Aqua satellite collects this data: https://www.youtube.com/watch?v=unlfchZaRo0

![AQUA satellite](https://sealevel.nasa.gov/system/missions/images/2_aqua_deploy.1.jpg)



Let's import this dataset for the month of August, 2019 using the function Dataset(**file_name**) from the netCDF4 package.

In [4]:
from netCDF4 import Dataset #import Dataset from the netCDF4 package
data = Dataset("A20192132019243.L3m_MO_SST_sst_4km.nc") # SST = sea surface temperature
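
The file name itself appears to encode the time period. The sketch below assumes the name starts with a platform letter (`A`), followed by a start date and an end date, each written as a four-digit year plus a three-digit day of the year. That reading is an assumption, not something stated in this tutorial, although it agrees with the `time_coverage_start` value in the metadata. The function name `parse_coverage` is made up for this example.

```python
from datetime import datetime

file_name = "A20192132019243.L3m_MO_SST_sst_4km.nc"


def parse_coverage(name):
    """Read the start and end dates from the first block of the file name.

    Assumes the layout: one letter + YYYYDDD (start) + YYYYDDD (end).
    """
    stamp = name.split(".")[0]  # "A20192132019243"
    start = datetime.strptime(stamp[1:8], "%Y%j").date()  # year + day of year
    end = datetime.strptime(stamp[8:15], "%Y%j").date()
    return start, end


start, end = parse_coverage(file_name)
print(start, end)  # 2019-08-01 2019-08-31
```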

If you were working with this dataset, you would want to know who collected it, when the data was acquired, what methods were used, and so on. This is called "metadata," which is basically data about the data. In a netCDF file, the metadata is stored as attributes. We can see them by simply evaluating our data variable.

In [5]:
data

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    product_name: A20192132019243.L3m_MO_SST_sst_4km.nc
    instrument: MODIS
    title: MODISA Level-3 Standard Mapped Image
    project: Ocean Biology Processing Group (NASA/GSFC/OBPG)
    platform: Aqua
    temporal_range: month
    processing_version: 2014.0.1
    date_created: 2019-09-18T20:24:53.000Z
    history: l3mapgen par=A20192132019243.L3m_MO_SST_sst_4km.nc.param 
    l2_flag_names: LAND,HISOLZEN
    time_coverage_start: 2019-08-01T00:35:01.000Z
    time_coverage_end: 2019-09-01T02:59:59.000Z
    start_orbit_number: 91712
    end_orbit_number: 92164
    map_projection: Equidistant Cylindrical
    latitude_units: degrees_north
    longitude_units: degrees_east
    northernmost_latitude: 90.0
    southernmost_latitude: -90.0
    westernmost_longitude: -180.0
    easternmost_longitude: 180.0
    geospatial_lat_max: 90.0
    geospatial_lat_min: -90.0
    geospatial_lon_max: 180.0
    geospatia

To access a particular attribute, type the name you've given your dataset followed by a period (".") and the attribute name. Find the temporal range of this dataset, i.e., what time period the data was collected over.

In [17]:
#example:
print(data.creator_email) 

data@oceancolor.gsfc.nasa.gov
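
When the attribute name is held in a string, or you want several attributes at once, Python's built-in `getattr` does the same job as the dot notation. The sketch below uses a small stand-in object so that it runs without the data file; the stand-in holds three values copied from the output above, and the helper name `read_attributes` is made up for this example.

```python
from types import SimpleNamespace


def read_attributes(dataset, names, default=None):
    """Look up several attributes by name; missing ones get `default`."""
    return {name: getattr(dataset, name, default) for name in names}


# Stand-in for `data` (NOT a real netCDF4 Dataset), with values
# copied from the metadata printed earlier.
demo = SimpleNamespace(
    instrument="MODIS",
    platform="Aqua",
    map_projection="Equidistant Cylindrical",
)

info = read_attributes(demo, ["instrument", "platform", "not_an_attribute"])
print(info)
# {'instrument': 'MODIS', 'platform': 'Aqua', 'not_an_attribute': None}
```

With the file open, `read_attributes(data, [...])` should behave the same way, assuming the dataset raises `AttributeError` for names it does not have. netCDF4 datasets also provide `data.ncattrs()`, which returns the list of global attribute names.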


Our global sea surface temperature data is saved as a two-dimensional array. 

Take a break and play the Battleship game: https://www.battleshiponline.org. You are given a grid (i.e., a two-dimensional array) and you have to identify where your enemy's ships are by selecting an x and a y coordinate on the grid.

Similarly, each value in our data is located by two coordinates: latitude ('lat') and longitude ('lon'). On a map like the one below, longitude runs along the x-axis and latitude along the y-axis. The sea surface temperature is like the location of the ships: it provides additional information for each grid point selected.

![lat and long grid](https://www.ncl.ucar.edu/Applications/Images/mapgrid_1_lg.png)
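
The Battleship analogy can be sketched with a tiny nested list. All numbers here are made up for illustration and the grid is far smaller than the real one. The first index picks a latitude row and the second picks a longitude column, which is assumed to match the order of the 'lat' and 'lon' dimensions in our file.

```python
# Toy 3 x 4 grid: one row per latitude, one column per longitude.
# These values are invented for the example, not taken from the dataset.
lats = [10.0, 0.0, -10.0]
lons = [-20.0, -10.0, 0.0, 10.0]
sst = [
    [27.1, 27.4, 27.9, 28.2],
    [28.5, 28.8, 29.0, 29.3],
    [26.0, 26.3, 26.7, 27.0],
]


def value_at(grid, row, col):
    """Return the value stored at one (row, column) position of the grid."""
    return grid[row][col]


row, col = 1, 2  # like calling out a square in Battleship
print(lats[row], lons[col], value_at(sst, row, col))  # 0.0 0.0 29.0
```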

The dimensions tell you the size of the dataset. You can access the dimensions of the dataset by calling **dataset.dimensions**. Notice that the output is a dictionary. 

There are also some dimensions that do not have any physical meaning, namely 'rgb' and 'eightbitcolor'. These will become useful when mapping the data, so we will ignore them for now.

In [18]:
data.dimensions

OrderedDict([('lat',
              <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 4320),
             ('lon',
              <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 8640),
             ('rgb',
              <class 'netCDF4._netCDF4.Dimension'>: name = 'rgb', size = 3),
             ('eightbitcolor',
              <class 'netCDF4._netCDF4.Dimension'>: name = 'eightbitcolor', size = 256)])

We can see the "keys," or dimension names, with **dataset.dimensions.keys()**

In [19]:
data.dimensions.keys()

odict_keys(['lat', 'lon', 'rgb', 'eightbitcolor'])
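
Because the dimensions behave like a dictionary, the usual dictionary tools apply: `len()`, `in`, and looping over the keys. The sketch below uses a plain dictionary as a stand-in for `data.dimensions` so that it runs without the file; the helper name `missing_dimensions` is made up for this example.

```python
def missing_dimensions(dimensions, required):
    """Return the names in `required` that are not keys of `dimensions`."""
    return [name for name in required if name not in dimensions]


# Stand-in for data.dimensions: same keys as the output above,
# with placeholder values instead of real Dimension objects.
demo_dimensions = dict.fromkeys(["lat", "lon", "rgb", "eightbitcolor"])

print(len(demo_dimensions))  # 4
missing = missing_dimensions(demo_dimensions, ["lat", "lon", "time"])
print(missing)  # ['time']
```

A check like this is handy before running code that expects certain dimensions to exist. With the file open, `missing_dimensions(data.dimensions, [...])` should work the same way.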

If you want to see a specific dimension, add brackets with the dimension name in quotes, e.g. **dataset.dimensions['lat']**.

In [20]:
data.dimensions['lat']

<class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 4320

Like you did earlier, to pull out a particular attribute of this dimension (in this case, its size), you would type **dataset.dimensions['lat'].size**. Create a tuple that gives the size of the grid, i.e., the sizes of the latitude and longitude dimensions.

In [None]:
latitudeSize = 0
longitudeSize = 0
gridSize = (latitudeSize,longitudeSize)
print(gridSize)

Now, we are ready to look at the variables we're playing with using **dataset.variables."something"**. First output the names of these variables. 

Hint: refer back to our steps for looking at the dimensions of the dataset. 

Now display the variables themselves. This is a lot of information, so try displaying just the variable 'sst' instead.

Hint: again refer back to our steps for looking at the dimensions of the dataset. 

Look over the attributes of this variable, like its name, units, etc. Are there any that don't make sense? Note them down and we'll discuss them together.

We can access any one of these attributes by calling it directly. Just add a period at the end of your call to a variable and add in the attribute name.

What is the shape (or size) of this variable? Does it match the grid size we figured out earlier? Test this out using code!

Hint: **dataset.variables['sst'].shape** gives you the size of the variable.

Now working with a partner, draw out the structure of the dataset we looked at today. 

Can you imagine packing all of this information into a list or an Excel sheet? This is why netCDF files are so useful!