# The Network Common Data Format: netCDF

NetCDF is one of the most common formats for distributing geoscience data. It was developed at Unidata in the late 1980s specifically to deal with the challenges associated with multidimensional arrays.

Much of the climate/earth/ocean/atmosphere data that you can access will be in the form of netCDF files. They typically have the extension `.nc`, for example `ocean_temps.nc`.

NetCDF files are machine independent, meaning that Macs, PCs, Linux machines, you name it, can all read the files.

Also, netCDF files are self-contained, i.e. they carry all the information about the data they contain with them. So they are 'self-describing', like the Datasets and DataArrays we have been building.

In fact, Xarray is basically a package devoted to reading, writing, and manipulating netCDF files. This means it's a super easy and useful way to work with geophysical data from nearly anywhere.

In this lesson we are going to use Xarray to load some Sea Surface Temperature data from a netCDF file. We will see how easy it is to make calculations and plots of these big data sets using Xarray.

### Credit

This lesson is from Abernathey's book: https://earth-env-data-science.github.io/lectures/xarray/xarray_intro.html


# Loading netCDF datasets

The primary tool in the Xarray library that we will use with netCDF files is `xr.open_dataset()`. This will read in a netCDF file and create one of our Datasets.

In this example we are going to read in a Sea Surface Temperature dataset created by NOAA that goes back to the 1800s. You can learn more about the data here: https://www.ncdc.noaa.gov/data-access/marineocean-data/extended-reconstructed-sea-surface-temperature-ersst-v5

First, let's do our normal import statements that we need to access the libraries in this new notebook:


Let's load in the data using `xr.open_dataset()` and take a look at it:

Did that work? There is a lot of information there. Let's go through all of it to make sure we understand what our Dataset looks like. 

Draw on the board and answer the following:
* What are the dimensions of the data?
* What is the data itself?
* What do the coordinates of the dimensions look like?
* Draw a schematic of the data and label all the 'sides'.
* What is the stuff in the attributes?
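To check your answers in code, here is a minimal sketch of how each piece of a Dataset can be inspected. It builds a small synthetic stand-in for the SST file (the grid, values, and attribute text are all made up), so only the layout resembles the real data:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the SST file: same layout (time, lat, lon), made-up values.
time = pd.date_range("2000-01-01", periods=24, freq="MS")  # monthly, dated the 1st
lat = np.arange(-80.0, 81.0, 20.0)   # 9 latitudes
lon = np.arange(0.0, 360.0, 30.0)    # 12 longitudes
values = np.zeros((time.size, lat.size, lon.size))

ds = xr.Dataset(
    data_vars={"sst": (("time", "lat", "lon"), values, {"units": "degC"})},
    coords={"time": time, "lat": lat, "lon": lon},
    attrs={"title": "synthetic SST example"},
)

print(dict(ds.sizes))      # dimension names and their lengths
print(list(ds.data_vars))  # the data variables stored in the Dataset
print(ds.sst.dims)         # order of the dimensions of the sst variable
print(ds.lat.values)       # the coordinate values along one dimension
print(ds.attrs)            # Dataset-level attributes
print(ds.sst.attrs)        # attributes attached to the variable itself
```

The same calls work unchanged on a Dataset returned by `xr.open_dataset()`.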


# Plotting netCDF data

Next let's make some plots to look at the data. We have lat, lon, Sea Surface Temperature data over a range of times. Maybe let's start with a simple plot of the SST all over the globe on one particular day. What is a good day?

*Note:* if you look at the time dimension, you will see that the data is reported as monthly means dated on the first of each month, so let's pick the first day of a month.
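As a minimal sketch of picking one month, assuming a synthetic stand-in for the real data (the grid and values below are made up), an exact date can be passed to `.sel()`:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the SST data (made-up values on a coarse, hypothetical grid).
time = pd.date_range("2000-01-01", periods=24, freq="MS")  # monthly, dated the 1st
lat = np.arange(-80.0, 81.0, 20.0)
lon = np.arange(0.0, 360.0, 30.0)
pattern = 20.0 * np.cos(np.deg2rad(lat))  # warm equator, cold poles
values = np.zeros((time.size, lat.size, lon.size)) + pattern[None, :, None]
sst = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# Select the map for one date that exists in the time coordinate.
one_month = sst.sel(time="2000-01-01")
print(one_month.dims)  # time is gone; a 2D (lat, lon) field is left
```

Calling `.plot()` on the 2D result draws the map, provided matplotlib is installed.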

What if we pick a different day of the month?


We got an error because we asked for a specific day that isn't in the dataset. Luckily, we can get around this sort of thing!


### Nearest point indexing, or 'nearest neighbor lookups'

In the case above we input an exact date that is available in our data. What if we didn't know all the exact dates? Try putting a random date into the plot call. What if we want to get the time closest to some date we care about? Xarray can handle this if we give `.sel()` an extra argument, `method='nearest'`:
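Here is a small sketch of both behaviours, again on made-up data with monthly time stamps (an assumption standing in for the real file). The exact lookup for a mid-month date fails, while the nearest-neighbor lookup returns the closest month:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the SST data (made-up values on a coarse, hypothetical grid).
time = pd.date_range("2000-01-01", periods=24, freq="MS")  # monthly, dated the 1st
lat = np.arange(-80.0, 81.0, 20.0)
lon = np.arange(0.0, 360.0, 30.0)
values = np.zeros((time.size, lat.size, lon.size))
sst = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# An exact lookup for a date that is not in the time coordinate raises KeyError.
try:
    sst.sel(time="2000-01-15")
    exact_lookup_failed = False
except KeyError:
    exact_lookup_failed = True

# method="nearest" returns the entry whose time is closest to the requested date.
nearest = sst.sel(time="2000-01-15", method="nearest")
print(exact_lookup_failed)
print(nearest.time.values)
```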

Ok, so we can pretty easily make a plot of global SST on a single day. That is pretty cool. 

We can use this dataset to see some amazing things without doing a lot of hard work thanks to the people who developed xarray (and the people who created/collected the data!!!!!).




### Let's make a simple plot to see how global average sea surface temperature has changed over time. Do you think we will be able to see a warming signal?

To do this we want to use xarray's `.mean()` function. But we need to tell it what kind of mean we want. In other words, we need to define the dimensions over which to take the mean. If we are interested in making a plot that shows globally averaged sea surface temperature over time, what are the dimensions to average over?

We are going to do something like `ds.sst.mean( dim = ('dim1', ...) ).plot()`. Fill in the blanks:
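One way to fill in the blanks is sketched here on synthetic data (made-up values with a small warming trend built in, not the NOAA record). Averaging over `lat` and `lon` leaves a single value per time step. Note that this is a plain average over grid points, with no weighting by grid-cell area:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the SST data: a latitude pattern plus a made-up linear trend.
time = pd.date_range("1980-01-01", periods=240, freq="MS")  # 20 years of months
lat = np.arange(-80.0, 81.0, 20.0)
lon = np.arange(0.0, 360.0, 30.0)
pattern = 20.0 * np.cos(np.deg2rad(lat))
trend = 0.01 * np.arange(time.size)  # hypothetical warming of 0.01 degrees per month
values = pattern[None, :, None] + trend[:, None, None] + np.zeros(lon.size)[None, None, :]
sst = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# Average over both horizontal dimensions; only time remains.
global_mean = sst.mean(dim=("lat", "lon"))
print(global_mean.dims)
```

`global_mean.plot()` then gives the time series as a line plot.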

What about just plotting the time average map of SST? What dimensions are we going to average over here?

What about the average temperature as a function of latitude? We want to make a line plot that shows how temperature depends on latitude only, how would we do that?
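Both questions come down to choosing which dimensions to average away. A minimal sketch on synthetic data (made-up values, hypothetical grid):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the SST data (made-up values on a coarse, hypothetical grid).
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.arange(-80.0, 81.0, 20.0)
lon = np.arange(0.0, 360.0, 30.0)
pattern = 20.0 * np.cos(np.deg2rad(lat))  # warm equator, cold poles
values = np.zeros((time.size, lat.size, lon.size)) + pattern[None, :, None]
sst = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# Time-average map: average over time, keep lat and lon.
time_mean_map = sst.mean(dim="time")

# Temperature as a function of latitude only: average over time and lon.
by_latitude = sst.mean(dim=("time", "lon"))

print(time_mean_map.dims)
print(by_latitude.dims)
```

Calling `.plot()` on the first result gives a map, and on the second a line plot against latitude.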


How about a timeseries of temperature at a single point? Let's make a plot of the SST at 45 degrees north and 230 degrees east. How do we do that? Recall the `.sel()` method and its `method='nearest'` argument.
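A sketch of the point selection, using a synthetic grid that is an assumption (the real file has its own grid spacing). Because the requested point is not exactly on this grid, `method="nearest"` picks the closest grid cell:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the SST data (made-up values on a coarse, hypothetical grid).
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.arange(-80.0, 81.0, 20.0)   # ..., 40, 60, ...
lon = np.arange(0.0, 360.0, 30.0)    # ..., 210, 240, ...
values = np.zeros((time.size, lat.size, lon.size))
sst = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# Pick the grid cell closest to 45N, 230E; only the time dimension remains.
point = sst.sel(lat=45, lon=230, method="nearest")
print(point.dims)
print(float(point.lat), float(point.lon))  # the grid cell that was actually used
```

`point.plot()` then draws the timeseries at that location.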

That is a mess. Let's adjust the axis so we can see what is happening in that blue mess. Let's pick 20 years of data, from 1980 to 2000, and zoom in. We can do this by setting the range of the x axis. We are going to build up a lot of tricks to make plots look the way we want. This is one.

Huh. That's cool. What are we seeing here?

Let's plot two different latitudes, one at high latitude and one on the equator:
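One possible way to get both timeseries at once is to pass a list of latitudes to `.sel()`. This is a sketch on made-up data, and the chosen latitudes and longitude are only illustrative:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the SST data (made-up values on a coarse, hypothetical grid).
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.arange(-80.0, 81.0, 20.0)
lon = np.arange(0.0, 360.0, 30.0)
pattern = 20.0 * np.cos(np.deg2rad(lat))  # warm equator, cold poles
values = np.zeros((time.size, lat.size, lon.size)) + pattern[None, :, None]
sst = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# A list of latitudes keeps lat as a dimension of length 2; the scalar lon is dropped.
two_lats = sst.sel(lat=[0.0, 60.0], lon=230.0, method="nearest")
print(two_lats.dims)
print(two_lats.lat.values)
```

`two_lats.plot.line(x="time")` draws one line per latitude on the same axes.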

# Groupby
Yep, we can do groupby here too.

Let's groupby month and apply a mean. This will give us a climatology of SST from the past couple hundred years at every point on the globe:
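A minimal sketch of the monthly climatology, using synthetic data with a made-up seasonal cycle (two years only, so it is a stand-in for the long record):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the SST data with a hypothetical seasonal cycle.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.arange(-80.0, 81.0, 20.0)
lon = np.arange(0.0, 360.0, 30.0)
months = time.month.values
seasonal = -np.cos(2.0 * np.pi * (months - 1) / 12.0)  # peaks in July, lowest in January
values = (
    15.0
    + 3.0 * (lat / 80.0)[None, :, None] * seasonal[:, None, None]
    + np.zeros(lon.size)[None, None, :]
)
sst = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# Group all Januaries together, all Februaries together, etc., then average each group.
sst_clim = sst.groupby("time.month").mean(dim="time")
print(sst_clim.sizes)
print(sst_clim.month.values)
```

The `time` dimension is replaced by a `month` dimension of length 12.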

Climatology at a specific point in the North Atlantic:

Plot the July minus January differences:
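Both steps can be sketched on synthetic data as follows. The seasonal cycle is made up, and the point at 60N, 330E is just an illustrative choice on this hypothetical grid:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the SST data with a hypothetical seasonal cycle.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.arange(-80.0, 81.0, 20.0)
lon = np.arange(0.0, 360.0, 30.0)
months = time.month.values
seasonal = -np.cos(2.0 * np.pi * (months - 1) / 12.0)  # peaks in July, lowest in January
values = (
    15.0
    + 3.0 * (lat / 80.0)[None, :, None] * seasonal[:, None, None]
    + np.zeros(lon.size)[None, None, :]
)
sst = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)
sst_clim = sst.groupby("time.month").mean(dim="time")

# Seasonal cycle at one point: 12 values, one per month.
point_clim = sst_clim.sel(lat=60.0, lon=330.0)

# Map of the July minus January difference.
july_minus_jan = sst_clim.sel(month=7) - sst_clim.sel(month=1)

print(point_clim.dims)
print(july_minus_jan.dims)
```

`point_clim.plot()` gives the seasonal cycle as a line, and `july_minus_jan.plot()` gives a map of the summer-winter contrast.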

### Remove a time mean

Let's look more clearly at the long-term SST trend by removing the seasonal climatology:

Timeseries of SST anomaly at a certain point:
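Removing the climatology and then looking at one point can be sketched like this. The data is synthetic (a made-up seasonal cycle plus a made-up trend), and the point at 60N, 330E is only illustrative:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the SST data: seasonal cycle plus a hypothetical linear trend.
time = pd.date_range("1980-01-01", periods=240, freq="MS")  # 20 years of months
lat = np.arange(-80.0, 81.0, 20.0)
lon = np.arange(0.0, 360.0, 30.0)
months = time.month.values
seasonal = -np.cos(2.0 * np.pi * (months - 1) / 12.0)
trend = 0.01 * np.arange(time.size)
values = (
    15.0
    + 3.0 * (lat / 80.0)[None, :, None] * seasonal[:, None, None]
    + trend[:, None, None]
    + np.zeros(lon.size)[None, None, :]
)
sst = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# Monthly climatology, then subtract each month's mean from the matching time steps.
gb = sst.groupby("time.month")
sst_anom = gb - gb.mean(dim="time")

# Anomaly timeseries at a single point.
point_anom = sst_anom.sel(lat=60.0, lon=330.0)
print(sst_anom.dims)
print(point_anom.dims)
```

With the seasonal cycle removed, `point_anom.plot()` shows only what is left over, which in this synthetic case is the trend.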

## Saving data to netCDF

Suppose we are always working with the mean surface temperature. Here calculating the mean is fast, but suppose it were very slow... It would be useful to save the mean data so we don't have to repeat the calculation.

Xarray makes that very easy. In general it works like this: 

```python
name = "whatever.nc"
some_dataset.to_netcdf(name)
```

So let's try that for our data:
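Here is a sketch of a full save-and-reload round trip. It uses synthetic data and a temporary directory so that nothing is left on disk. The file name `mean_sst.nc` is hypothetical, and the code assumes a netCDF backend (such as netCDF4 or scipy) is installed alongside xarray:

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the SST data (made-up values on a coarse, hypothetical grid).
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.arange(-80.0, 81.0, 20.0)
lon = np.arange(0.0, 360.0, 30.0)
pattern = 20.0 * np.cos(np.deg2rad(lat))
values = np.zeros((time.size, lat.size, lon.size)) + pattern[None, :, None]
sst = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# The result we want to keep: the time-mean map.
mean_sst = sst.mean(dim="time")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mean_sst.nc")  # hypothetical file name
    mean_sst.to_netcdf(path)                    # write the DataArray to disk
    with xr.open_dataset(path) as saved:        # read it back as a Dataset
        reloaded = saved["sst"].load()          # pull values into memory before closing

print(reloaded.dims)
```

In a real workflow you would write to a permanent path instead of a temporary directory, then open that file in later sessions rather than recomputing the mean.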

The end...

# Breakout / exercise 03