# Introduction to netCDF4

netCDF (Network Common Data Form) files are a way of storing multidimensional data so that it can be shared by scientists working on different computers, with different operating systems, and in different programming languages. In this module, we'll take a look at why the format is so widely used by oceanographers and climate scientists today.

Let's first install the netCDF4 module, which reads netCDF (.nc) files in Python.

1. Open Terminal
1. Check if the netcdf4 module is available for installation using ```conda search netcdf4```. This should give you a list of versions of the module.
1. Type in ```conda install netcdf4``` to install the latest version.
1. Check if the module has been installed using ```conda list```.
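
If you would rather confirm the installation from inside Python, the check below is one possible sketch. The helper name `is_installed` is made up for this example; `importlib.util.find_spec` is part of the Python standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find a module with this name."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once the conda install above has succeeded
print("netCDF4 available:", is_installed("netCDF4"))
```

If this prints `False`, make sure the notebook is running in the same conda environment where you installed the module.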

We will be working with satellite measurements of [sea-surface temperature from NASA](https://neo.sci.gsfc.nasa.gov/view.php?datasetId=MYD28M). This is a quick video on how the Aqua satellite collects this data: https://www.youtube.com/watch?v=unlfchZaRo0.

![AQUA satellite](https://sealevel.nasa.gov/system/missions/images/2_aqua_deploy.1.jpg)



Let's import this dataset for the month of August 2019 using the function Dataset(**file_name**) from the netCDF4 package. You might need to change the path of your file. The path tells Python where the file is stored on your computer.

In [None]:
from netCDF4 import Dataset #import Dataset from the netCDF4 package
data = Dataset("A20192132019243.L3m_MO_SST_sst_4km.nc") # SST = sea surface temperature
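
If Python reports that it cannot find the file, it can help to see the full path it is looking at. The sketch below uses only the standard library; the helper name `describe_path` is made up for this example, and the file name is the one used above.

```python
from pathlib import Path


def describe_path(file_name):
    """Return the absolute path Python would use and whether a file exists there."""
    path = Path(file_name).expanduser().resolve()
    return path, path.is_file()


path, found = describe_path("A20192132019243.L3m_MO_SST_sst_4km.nc")
print(path)
print("found" if found else "not found: move the file or edit the path")
```

A bare file name is resolved relative to the folder the notebook is running in, so the simplest fix is often to place the .nc file next to the notebook.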

Let's just call the data variable and see what we get.

In [None]:
data

You probably notice that the **data** variable is ordered in this way:
* attributes of the data: who collected the data, when it was acquired, what methods were used, etc.
* dimensions: size of the dataset
* variables: the data itself, i.e. sea surface temperature, latitude, longitude, etc.

We will look at each of these in turn.

## Attributes
To look at a particular attribute, type in **dataset.attribute_name** like so:

In [None]:
data.date_created

To look at just the names of the attributes, you would use the function ncattrs().

In [None]:
data.ncattrs()

What are the start and end times of the dataset, i.e. the time_coverage?

In [None]:
print(data.time_coverage_start)
print(data.time_coverage_end)
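
These attributes are plain strings. If you want to do arithmetic with them, such as counting the days covered, one option is to parse them with the standard `datetime` module. This is a sketch that assumes the timestamps look like `2019-08-01T00:00:00.000Z`; the two strings below are hypothetical stand-ins, so substitute `data.time_coverage_start` and `data.time_coverage_end` and adjust the format string if your output looks different.

```python
from datetime import datetime, timezone


def parse_utc(timestamp):
    """Parse a string such as '2019-08-01T00:00:00.000Z' as a UTC datetime."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


# Hypothetical values, for illustration only
start = parse_utc("2019-08-01T00:00:00.000Z")
end = parse_utc("2019-09-01T00:00:00.000Z")

coverage = end - start  # a timedelta object
print(coverage.days, "days")
```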

## Dimensions

The dimensions tell you the size of the dataset. You can access the dimensions of the dataset by calling **dataset.dimensions**. Notice that the output is a dictionary. 

There are also some dimensions that do not have any physical meaning, namely 'rgb' and 'eightbitcolor'. These will become useful when mapping the data, so we will ignore them for now.

In [None]:
data.dimensions

We can see the "keys," or dimension names, with **dataset.dimensions.keys()**

In [None]:
data.dimensions.keys()

What are the dimensions of this dataset? Does it make sense?

If you want to see a specific dimension, you can do so by adding brackets and the dimension name in quotes, i.e. **dataset.dimensions['lat']**.

In [None]:
data.dimensions['lat']

As you did earlier, to pull out a particular attribute of this dimension (in this case its size), you would type in **dataset.dimensions['lat'].size**. Create a tuple which gives you the size of the latitude and longitude dimensions.

In [None]:
latitudeSize = 0 #insert value here
longitudeSize = 0 #insert value here
gridSize = (latitudeSize,longitudeSize)
print(gridSize)

## Variables

Our global sea surface temperature data is saved as a two-dimensional array. 

Take a break and play the Battleship game: https://www.battleshiponline.org. You are given a grid (i.e. a two-dimensional array) and you have to identify where your enemy's ships are by selecting an x and a y coordinate on the grid.

Similarly, for our data, each row of the grid corresponds to a latitude ('lat') and each column to a longitude ('lon'). The sea surface temperature is like the location of the ships: it provides additional information for each grid point selected.

![lat and long grid](https://www.ncl.ucar.edu/Applications/Images/mapgrid_1_lg.png)
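
To make the grid idea concrete, here is a small sketch in plain Python. All of the coordinates and temperatures are made up for illustration, and the names `toy_lat`, `toy_lon`, `toy_sst` and `temperature_at` are hypothetical. The first index picks a latitude and the second picks a longitude, just like calling out a square in Battleship.

```python
# Made-up coordinates and temperatures, for illustration only
toy_lat = [45.0, 0.0, -45.0]            # one latitude per row
toy_lon = [-120.0, -60.0, 0.0, 60.0]    # one longitude per column
toy_sst = [
    [12.1, 14.3, 13.8, 11.9],   # row 0: latitude 45
    [27.4, 28.0, 26.9, 27.7],   # row 1: latitude 0
    [11.2, 10.8, 12.5, 13.0],   # row 2: latitude -45
]


def temperature_at(row, col):
    """Return (latitude, longitude, temperature) for one grid cell."""
    return toy_lat[row], toy_lon[col], toy_sst[row][col]


# Second row, third column: latitude 0, longitude 0
print(temperature_at(1, 2))
```

Notice that the grid has as many rows as there are latitudes and as many columns as there are longitudes.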

Now, we are ready to look at the variables we're playing with using **dataset.variables**. First output the names of these variables. 

Hint: refer back to our steps for looking at the dimensions of the dataset. 

Output just the variable 'sst'.

Hint: again refer back to our steps for looking at the dimensions of the dataset. 

Look over the attributes of this variable, like its name, units, etc. Are there any that don't make sense? Note them down and we'll discuss them together.

Get the size of the 'sst' variable. 

Hint: This is called the **shape** of the variable. Because 'sst' is two-dimensional, its shape contains two numbers.

Does it match up with the grid size we figured out earlier? Test this out using code!

Now working with a partner, draw out the structure of the dataset we looked at today. 

Can you imagine packing all of this information into a list or an Excel sheet? This is why netCDF files are so useful!

## Accessing data

Notice that so far we haven't seen the data itself. We can see it by typing in **dataset**.variables['**variable_name**'][:]. The data is saved as a masked array, which hides certain elements in the data. We'll see why that is useful when we plot the data.

In [None]:
data.variables['sst'][:]

To get the unmasked data, type in **dataset**.variables['**variable_name**'][:].data

Notice that the fill value of -32767 mentioned above appears wherever the mask was True in the masked array.

In [None]:
data.variables['sst'][:].data
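
To see how the mask and the fill value relate, it can help to build a tiny masked array by hand. This is a sketch using NumPy's `numpy.ma` module on a made-up 2 x 3 grid. Only the fill value -32767 comes from the dataset; the temperatures and the names `raw` and `masked` are hypothetical.

```python
import numpy as np

FILL_VALUE = -32767  # the fill value reported for 'sst'

# Made-up grid; FILL_VALUE marks cells with no measurement
raw = np.array([[20.5, FILL_VALUE, 21.5],
                [FILL_VALUE, 10.5, 11.0]])

# Hide every cell that equals the fill value
masked = np.ma.masked_equal(raw, FILL_VALUE)

print(masked)          # hidden cells are printed as --
print(masked.mask)     # True where a cell is hidden
print(masked.data)     # the underlying values, fill value included
print(masked.count())  # number of cells that are NOT hidden
print(masked.mean())   # statistics skip the hidden cells
```

Because the hidden cells are skipped, the mean is not dragged down by -32767. Plotting benefits from the mask in the same way.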

We want to make sure that not every element in the dataset is masked. To do so, let's play around with the indexing of the array. Skip the next section if you feel comfortable with indexing.

### Refresher on indexing with lists:

In [None]:
A = [0,1,2,3,4,5,6,7,8,9]
print('for entire list: ', A[:])
print('fifth element: ', A[4])
print('first 3 elements: ', A[:3])
print('last element: ', A[9], 'or ', A[-1])

In [None]:
# Output the last three elements in A


In [None]:
print('even numbers: ', A[0:10:2])
# note: the slice stops before the end index, i.e. 10 is not included

# A[10] raises an IndexError because the list has no 11th element
A[10]
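
If you want your code to keep running when an index is out of range, you can catch the error instead. This is a minimal sketch; the helper name `safe_get` is made up for this example.

```python
A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


def safe_get(values, index):
    """Return values[index], or None if the index is out of range."""
    try:
        return values[index]
    except IndexError:
        print(f"index {index} is out of range for a list of length {len(values)}")
        return None


print(safe_get(A, 9))   # the last element
print(safe_get(A, 10))  # out of range, so the helper returns None
```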

### Challenge:

Plot the 2160th row of the 'sst' grid against longitude. Use the masked data. What does this row approximately show?

Hint: find out what the 2160th element in ```data.variables['lat']``` is. 