# Activity 1 - A Quickstart to Playing with OOI Data 
*Written by Sage Lichtenwalner, Rutgers University, May 30, 2019*

This example was developed for the **June 2019 OOI Ocean Data Labs Workshop**.

## Introduction
In this Python notebook, we will demonstrate how to quickly access and work with data from the Ocean Observatories Initiative (OOI). 

This example was designed to run on Google's Colaboratory platform, though it should also work on any Jupyter notebook platform, assuming the required libraries are installed.  

In this notebook, we will demonstrate the following **Data Discovery** steps:
3. Loading Data
4. Exporting Datasets for use in other software tools
5. Quick Plotting

We will use data from the **30m Dissolved Oxygen** sensor on the **[Global Irminger Sea Flanking Mooring A](https://oceanobservatories.org/site/gi03flma/)**, also known as **GI03FLMA-RIS01-03-DOSTAD000**.  You can find out more information about this instrument on the [OOI Website](https://oceanobservatories.org/instrument-class/do2/), the [OOI Data Portal](https://ooinet.oceanobservatories.org/data_access/?search=GI03FLMA-RIS01-03-DOSTAD000), or on the new [Rutgers OOI Data Review portal](https://datareview.marine.rutgers.edu/instruments/view/GI03FLMA-RIS01-03-DOSTAD000).

<img src="https://oceanobservatories.org/wp-content/uploads/2015/09/CEV-OOI-Global-Irminger-Sea.jpg" alt="OOI Irminger Sea Array" width="600px">

## 3. Loading Data

Okay, so let's start coding... 

The first thing we need to do is load several Python libraries that will help us load and work with the NetCDF files.  Google Colaboratory comes with many libraries already installed, like [numpy](https://www.numpy.org), [pandas](https://pandas.pydata.org) and [matplotlib](https://matplotlib.org).  But we also need to import [xarray](http://xarray.pydata.org/en/stable/) and install the netcdf4 library so we can load data files from the OOI.

In [0]:
import xarray as xr
!pip install netcdf4==1.5.0

### Loading a single NetCDF data file
When you request data from the OOI Data Portal, you often get quite a few files in the output directory.  

For the purposes of this activity, we will load only one **NetCDF data file** to make this example easy.  We will cover how to request data from the OOI Data Portal later.  Please refer to the *Accessing OOI Data Reference Guide* for more information on how to find the right link to use.

Let's add the URL to our datafile here as a variable.

In [0]:
single_file = 'https://opendap.oceanobservatories.org/thredds/dodsC/ooi/sage-marine-rutgers/20190307T155319-GI03FLMA-RIS01-03-DOSTAD000-recovered_host-dosta_abcdjm_sio_instrument_recovered/deployment0001_GI03FLMA-RIS01-03-DOSTAD000-recovered_host-dosta_abcdjm_sio_instrument_recovered_20140912T201501-20150818T103001.nc'

We can now easily load this file using xarray's `open_dataset()` function.

In [0]:
# Load the data file
ds = xr.open_dataset(single_file)

Remember, Python notebooks will always display the value of the last line of a code cell if that line is just a variable name.

In [0]:
# Let's find out what's in the file
ds

As you can see, there are a number of coordinates, variables and attributes (i.e. metadata) in the dataset we loaded.  

**Important note:** By default, OOI datasets use the 'obs' variable as the index.  But obs is just an array of integers (i.e. 1, 2, 3, etc.).  Time is more convenient for our purposes, so let's make time the default dimension.  In general, this is something you will always want to do when first loading OOI datasets.

In [0]:
# Swap the dimensions
ds = ds.swap_dims({'obs': 'time'})
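
To see what the swap does, here is a minimal sketch on a small synthetic dataset. The `demo` name and all of the values are made up for illustration and are not OOI data. Once `time` is the dimension, xarray can select by date rather than by integer position.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for an OOI file: 'obs' is the dimension, 'time' is only a coordinate
times = pd.date_range('2014-09-12', periods=6, freq='D')
demo = xr.Dataset(
    {'dissolved_oxygen': ('obs', np.array([300.0, 301.0, 302.0, 303.0, 304.0, 305.0]))},
    coords={'obs': np.arange(6), 'time': ('obs', times)},
)

# Make time the dimension, just as we did for ds above
demo = demo.swap_dims({'obs': 'time'})

# Now we can select a date range instead of a range of integer indexes
subset = demo.sel(time=slice('2014-09-13', '2014-09-15'))
print(subset['dissolved_oxygen'].values)
```

Label-based slices in xarray include both end points, so this selection should return the three values from September 13 through 15.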

### Selecting Variables and Metadata
Thanks to xarray, we can easily access individual variables, the global metadata, and the metadata for individual variables. There are actually two ways you can do this...
* `ds['dissolved_oxygen']`  <-- The preferred way (fewer errors)
* `ds.dissolved_oxygen`

You can refer to the full list of variables and attributes output above.

Here are a few examples.

In [0]:
# A variable
ds['dissolved_oxygen']

In [0]:
# The units metadata field for a variable
ds['dissolved_oxygen'].units

In [0]:
# A global metadata field
ds.source
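
Both kinds of metadata are also available as plain dictionaries through `.attrs`, which can help when an attribute name is not a valid Python name or clashes with a method. The following is a minimal sketch on a synthetic dataset. The `demo` name, the units string and the source text are invented for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic dataset with one variable, one variable attribute and one global attribute
demo = xr.Dataset(
    {'dissolved_oxygen': ('time', np.array([300.0, 301.0]), {'units': 'umol kg-1'})},
    attrs={'source': 'synthetic example'},
)

# Dictionary-style access to metadata
units = demo['dissolved_oxygen'].attrs['units']  # same value as demo['dissolved_oxygen'].units
source = demo.attrs['source']                    # same value as demo.source

# A list of all the data variable names in the dataset
variable_names = list(demo.data_vars)

print(units, '|', source, '|', variable_names)
```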

In the next box, try accessing other variables or metadata fields.  Remember, you can use tab-complete to more easily find available items in the dataset.

In [0]:
# Add your code here

## 4. Exporting Data

* Xarray **Datasets** are great for loading and exporting NetCDF data, which are often multi-dimensional
* Pandas **DataFrames** are great for doing the same with CSV datasets.  Think of an Excel spreadsheet containing columns of data, each with a header.

So, to export our dataset so we can use it in another tool, we will first convert our xarray Dataset into a pandas DataFrame.

In [0]:
# Convert the xarray Dataset to a pandas DataFrame
df = ds.to_dataframe()

Let's take a look at the new variable we created to see how a Pandas DataFrame is different from the Xarray Dataset we loaded above.  

To do so, we could just print out the variable *df*, but Pandas also has a nice function called `head()` that just gives us the first 5 rows.

In [0]:
df.head()

So now we can use the `.to_csv()` method to easily create a CSV file.  Once it's created, in Google Colab, you can view and download files in the left sidebar.


In [0]:
# Create a CSV file with the raw dataset
df.to_csv('output.csv') 
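
The full DataFrame can contain many columns you may not need. As a minimal sketch, assuming you only want a couple of variables, you could select those columns before exporting and then read the file back to check it. The `demo_df` DataFrame, its values and the `output_subset.csv` filename are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for df, indexed by time like the DataFrame above
demo_df = pd.DataFrame(
    {
        'dissolved_oxygen': [300.0, 301.0, 302.0],
        'practical_salinity': [34.8, 34.9, 35.0],
        'obs': [0, 1, 2],
    },
    index=pd.date_range('2014-09-12', periods=3, freq='D', name='time'),
)

# Keep only the columns we care about, then write them out
columns = ['dissolved_oxygen', 'practical_salinity']
demo_df[columns].to_csv('output_subset.csv')

# Read the file back to confirm it round-trips with a time index
check = pd.read_csv('output_subset.csv', index_col='time', parse_dates=True)
print(check.head())
```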

## 5. Quick Plots

And now we can really start having some fun... 

To start, we can use the built-in matplotlib plotting routines in xarray or pandas to make some plots.

In [0]:
df['ctdmo_seawater_temperature'].plot();

In the next box, try plotting another variable.

In [0]:
# Add your code here

### We can also make a quick histogram

In [0]:
ds['ctdmo_seawater_temperature'].plot.hist(bins=100);

In [0]:
# Try another histogram here

### And we can plot a bunch of variables at once

To do this, we also need to import the matplotlib library directly (even though xarray and pandas already use it behind the scenes) so we can create subplots.

In [0]:
import matplotlib.pyplot as plt

In [0]:
fig, (ax1,ax2,ax3) = plt.subplots(3,1, sharex=True, figsize=(10,6))
df['ctdmo_seawater_temperature'].plot(ax=ax1)
df['practical_salinity'].plot(ax=ax2)
df['dissolved_oxygen'].plot(ax=ax3);

Now it's your turn.  Try recreating the above plot, only this time, add a 4th subplot that includes the pressure variable.  (Note, it's not called pressure.)

In [0]:
# Add your code here

### Going further...

Modify your code above to include the following:

1) You can add y-axis labels using the following line (shown here for the salinity subplot, `ax2`)
```
ax2.set_ylabel(df['practical_salinity'].name);
```

2) The salinity has some outliers.  We can change the y-limits to exclude them by adding the following line, replacing `min` and `max` with the limits you want.
```
ax2.set_ylim(min, max);
```

3) You can also change the plot style.  Try adding the following arguments to the plot functions.
```
linestyle='None', marker='.', markersize=1
```
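
Put together, the three tweaks might look like the following sketch. It runs on synthetic data rather than the OOI file. The `demo_df` values, the fake outlier and the y-limits of 34.5 to 35.5 are all assumptions chosen for this illustration, so pick limits that suit the real data when you adapt it.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for df; the numbers are invented for illustration
rng = np.random.default_rng(0)
time = pd.date_range('2014-09-12', periods=200, freq='D', name='time')
demo_df = pd.DataFrame(
    {
        'ctdmo_seawater_temperature': 5 + rng.normal(0, 0.2, 200),
        'practical_salinity': 34.9 + rng.normal(0, 0.02, 200),
        'dissolved_oxygen': 300 + rng.normal(0, 2, 200),
    },
    index=time,
)
demo_df.loc[time[50], 'practical_salinity'] = 20.0  # a fake outlier

# Tip 3: plot points instead of lines
style = dict(linestyle='None', marker='.', markersize=1)

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(10, 6))
for ax, name in zip(axes, demo_df.columns):
    demo_df[name].plot(ax=ax, **style)
    ax.set_ylabel(name)  # Tip 1: label each y-axis with the variable name

# Tip 2: limit the salinity axis so the outlier doesn't squash the rest of the data
axes[1].set_ylim(34.5, 35.5)
```

Looping over the axes and column names keeps each label matched to the variable plotted on that axis.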


You should notice that the average pressure is just shy of 20m, but this instrument was supposed to be at 30m.  So what's up?  

It turns out, for this deployment, the mooring was deployed 10-15m shallower than planned.  In this case, 30m was the *design depth,* but it's always a good idea to check the actual pressure measurements when possible.
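
One way to run this kind of check is to search the variable names for "pressure" and then take the mean. The following is a minimal sketch on a synthetic dataset. The variable name `example_pressure` and its values are hypothetical and are not the name used in the OOI file. On the real dataset you may need to search `ds.variables` instead of `ds.data_vars` if the pressure is stored as a coordinate.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for ds; 'example_pressure' is a made-up variable name
demo = xr.Dataset(
    {
        'dissolved_oxygen': ('time', np.array([300.0, 301.0, 302.0])),
        'example_pressure': ('time', np.array([19.0, 20.0, 21.0]), {'units': 'dbar'}),
    }
)

# Search the variable names for anything that mentions pressure
matches = [name for name in demo.data_vars if 'pressure' in name]
print(matches)

# Take the mean of the measured values to compare against the design depth
mean_pressure = float(demo[matches[0]].mean())
print(mean_pressure)
```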