# Activity 3 - Extra Credit
*Written by Sage Lichtenwalner, Rutgers University, May 31, 2019*

This example was developed for the **June 2019 OOI Ocean Data Labs Workshop**.

This Python notebook includes the essential code needed to access, plot, and export a dataset obtained from the OOI data portal. It is designed as a follow-up activity that explores a few different OOI datasets using the same Python code as the previous activities.

In this notebook, we will review the following **Data Discovery** steps:

3. Loading Data
4. Exporting Data for use in other Tools/Software
5. Quick Plots
6. Basic Statistics and Analysis

In [0]:
# Install netCDF4 first (xarray uses it to read the data files), then import the required libraries
!pip install netcdf4==1.5.0
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
import datetime

## Choose a Dataset
The following data files have already been generated using the OOI data portal. Choose one by uncommenting its line (the last uncommented `datafile` assignment is the one that takes effect), then work through the rest of the notebook. Once you have worked through one dataset, you can come back and try another one, or simply copy the code and add additional lines to the bottom of the notebook.

In [0]:
# Irminger Sea Dissolved Oxygen & CTD
datafile = 'https://opendap.oceanobservatories.org/thredds/dodsC/ooi/sage-marine-rutgers/20190307T155319-GI03FLMA-RIS01-03-DOSTAD000-recovered_host-dosta_abcdjm_sio_instrument_recovered/deployment0001_GI03FLMA-RIS01-03-DOSTAD000-recovered_host-dosta_abcdjm_sio_instrument_recovered_20140912T201501-20150818T103001.nc'

# Papa 30m CTD
# datafile ='https://opendap.oceanobservatories.org/thredds/dodsC/ooi/sage-marine-rutgers/20190311T035700-GP03FLMB-RIM01-02-CTDMOG060-recovered_inst-ctdmo_ghqr_instrument_recovered/deployment0001_GP03FLMB-RIM01-02-CTDMOG060-recovered_inst-ctdmo_ghqr_instrument_recovered_20130724T064501-20140617T234501.nc'

# Endurance Fluorometer
# datafile = 'https://opendap.oceanobservatories.org/thredds/dodsC/ooi/sage-marine-rutgers/20190311T023920-CE01ISSM-SBD17-06-FLORTD000-telemetered-flort_sample/deployment0010_CE01ISSM-SBD17-06-FLORTD000-telemetered-flort_sample_20180929T213119.598000-20190309T123011.918000.nc'

# Pioneer Meteorological Data
# datafile = 'https://opendap.oceanobservatories.org/thredds/dodsC/ooi/sage-marine-rutgers/20190311T023143-CP04OSSM-SBD11-06-METBKA000-telemetered-metbk_a_dcl_instrument/deployment0009_CP04OSSM-SBD11-06-METBKA000-telemetered-metbk_a_dcl_instrument_20181025T153105.277000-20190201T180357.291000.nc'

# Pioneer Profiling CTD (this is a big file)
# datafile = 'https://opendap.oceanobservatories.org/thredds/dodsC/ooi/sage-marine-rutgers/20190311T023451-CP04OSPM-WFP01-03-CTDPFK000-telemetered-ctdpf_ckl_wfp_instrument/deployment0010_CP04OSPM-WFP01-03-CTDPFK000-telemetered-ctdpf_ckl_wfp_instrument_20181106T180003-20190310T183237.990765.nc'


## Load and subset the data

In [0]:
# Load the data
ds = xr.open_dataset(datafile)
ds = ds.swap_dims({'obs': 'time'}) # Swap Dimensions

print('Dataset has %d points' % ds.time.size)

In [0]:
# Print a list of variables in the file
ds.data_vars

In [0]:
# Next, let's convert the xarray Dataset to a pandas DataFrame for ease of use
df = ds.to_dataframe()

In [0]:
# Alternatively, we can first select just the variables we need

# The following line is for the DO dataset.  It will have to be adapted for the other options.
# df = ds[['ctdmo_seawater_temperature','practical_salinity','dissolved_oxygen']].to_dataframe()

# We may also need to drop some additional unnecessary columns, like these for the DO dataset
# df = df.drop(columns=['obs','lon','lat'])

In [0]:
# We can also subset the data to a specific time range
# df = df.loc[datetime.date(2014,10,1):datetime.date(2014,11,1)]
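
As a minimal sketch of time subsetting, the following builds a small synthetic DataFrame and slices it with date strings instead of `datetime.date` objects. The values are made up for illustration, and the sketch assumes the index is a sorted `DatetimeIndex`, which is what the `swap_dims` and `to_dataframe` steps above should produce. Note that string slicing with `.loc` includes both endpoints.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for a real OOI dataset (values are made up)
times = pd.date_range('2014-09-15', periods=62, freq='D', name='time')
demo = pd.DataFrame({'dissolved_oxygen': np.linspace(250.0, 280.0, 62)}, index=times)

# Slice by label; both the start and end dates are included in the result
subset = demo.loc['2014-10-01':'2014-11-01']

print('Full dataset: %d points' % len(demo))
print('Subset: %d points' % len(subset))
```

The same slice syntax should work on the real `df`, provided the chosen dates fall within that dataset's deployment period.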

In [0]:
# Output the first few rows using .head() or last few with .tail()
df.head()

## Export the Data to CSV

In [0]:
# Export the data to a CSV file
df.to_csv('output.csv') 

In [0]:
# If we have a large file, we can compress it using:
# !gzip output.csv
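
As an alternative to the shell command, pandas can write a compressed CSV directly. This is a small sketch using a made-up three-row DataFrame and a temporary directory (both are assumptions for illustration, not part of the original notebook). It writes a gzip-compressed file and reads it back to confirm the round trip.

```python
import os
import tempfile
import pandas as pd

# Tiny made-up dataset, indexed by time like the notebook's DataFrame
demo = pd.DataFrame(
    {'temperature': [10.1, 10.3, 10.2]},
    index=pd.date_range('2014-10-01', periods=3, freq='D', name='time'),
)

# Write a gzip-compressed CSV into a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), 'output.csv.gz')
demo.to_csv(out_path, compression='gzip')

# Read it back; pandas infers the compression from the .gz extension
roundtrip = pd.read_csv(out_path, index_col='time', parse_dates=True)
print(roundtrip)
```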

In [0]:
# Alternatively, you can export the daily averaged data
df.resample('D').mean().to_csv('output_daily.csv')
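
To illustrate what the daily averaging does, here is a minimal sketch on synthetic hourly data (the values are made up so that the averages are easy to check by hand). `resample('D')` groups the rows by calendar day, and `.mean()` averages each group.

```python
import numpy as np
import pandas as pd

# 48 hourly samples covering two full days; the values are simply 0..47
times = pd.Timestamp('2014-10-01') + pd.to_timedelta(np.arange(48), unit='h')
hourly = pd.DataFrame({'temperature': np.arange(48, dtype=float)}, index=times)

# One row per calendar day, each holding the mean of that day's 24 samples
daily = hourly.resample('D').mean()
print(daily)
```

The first day averages the values 0 through 23 and the second day averages 24 through 47, so the result has two rows.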

In [0]:
# How big are the files?
!du -sh *

## Basic Statistics

In [0]:
df.describe()

In [0]:
# Add your own code here to pull out the means or standard deviations for specific variables
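
One possible starting point is sketched below, using a tiny made-up DataFrame whose column names mirror the DO dataset (the values are assumptions chosen for illustration). It pulls the mean and standard deviation of a single variable, then builds a compact summary table for selected columns with `.agg()`.

```python
import pandas as pd

# Made-up values; the column names follow the DO dataset used above
demo = pd.DataFrame({
    'dissolved_oxygen': [250.0, 260.0, 270.0],
    'practical_salinity': [34.8, 34.9, 35.0],
})

# Statistics for a single variable (pandas uses the sample standard deviation, ddof=1)
do_mean = demo['dissolved_oxygen'].mean()
do_std = demo['dissolved_oxygen'].std()
print('DO mean: %.1f, std: %.1f' % (do_mean, do_std))

# Several statistics for several variables at once
summary = demo[['dissolved_oxygen', 'practical_salinity']].agg(['mean', 'std', 'min', 'max'])
print(summary)
```

On the real data, replace `demo` with `df` and use the variable names listed by `ds.data_vars`.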

## Some Basic Plotting Fun
And now we can make some quick plots with the data.  Each of the following lines works with one of the corresponding data files above.  Of course, there are many other variables in each file that you can explore and plot as timeseries or scatterplots.

In [0]:
df.dissolved_oxygen.plot() # Irminger DO Sensor

# df.ctdmo_seawater_temperature.plot() # Papa CTD

# df.fluorometric_chlorophyll_a.plot() # Endurance Fluorometer

# df.sea_surface_temperature.plot() # Pioneer Met Sensor
# df.air_temperature.plot()

# df.ctdpf_ckl_seawater_temperature.plot() # Pioneer Profiling CTD

In [0]:
# Add your own code here to create some additional plots
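
As one idea for an additional plot, the sketch below draws a temperature-salinity scatterplot colored by dissolved oxygen. It uses randomly generated stand-in data (the values are not real observations), with column names borrowed from the DO dataset, so it would need to be adapted for the other files.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data; column names follow the DO dataset used above
rng = np.random.RandomState(0)
demo = pd.DataFrame({
    'ctdmo_seawater_temperature': rng.uniform(3.0, 8.0, 200),
    'practical_salinity': rng.uniform(34.6, 35.2, 200),
    'dissolved_oxygen': rng.uniform(250.0, 300.0, 200),
})

# Salinity on x, temperature on y, with dissolved oxygen as the color
fig, ax = plt.subplots(figsize=(6, 5))
sc = ax.scatter(demo['practical_salinity'], demo['ctdmo_seawater_temperature'],
                c=demo['dissolved_oxygen'], cmap='viridis', s=5)
ax.set_xlabel('Salinity')
ax.set_ylabel('Temperature')

cbar = fig.colorbar(sc, ax=ax)
cbar.ax.set_ylabel('DO')
```

To try this on the real data, swap `demo` for `df`.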

## Advanced Plots

**Irminger DO**

The following is a more advanced plotting example that works with the DO data file.  It will need to be tweaked to work with the other files.

In [0]:
fig, (ax1,ax2,ax3,ax4) = plt.subplots(4,1, sharex=True, figsize=(8,6))
df['ctdmo_seawater_temperature'].plot(ax=ax1,linestyle='None',marker='.',markersize=1)
df['practical_salinity'].plot(ax=ax2,linestyle='None',marker='.',markersize=1)
df['int_ctd_pressure'].plot(ax=ax3,linestyle='None',marker='.',markersize=1);
df['dissolved_oxygen'].plot(ax=ax4,linestyle='None',marker='.',markersize=1);

ax1.set_ylabel('Temp')
ax2.set_ylabel('Salinity')
ax3.set_ylabel('Pressure')
ax4.set_ylabel('DO')

# Let's change the salinity y-limits to account for outliers
ax2.set_ylim(34.6,35.2);

ax1.set_title('Data from %s' % ds.subsite);

**Pioneer Profiling CTD**

The following scatterplot example is designed to plot the Profiling CTD dataset. This will take a minute, as there is a lot of data.

In [0]:
# Scatterplots of Temperature and Salinity
fig, (ax1,ax2) = plt.subplots(2,1, sharex=True, sharey=True, figsize=(10,6))

sc1 = ax1.scatter(df.index, df.ctdpf_ckl_seawater_pressure, c=df.ctdpf_ckl_seawater_temperature, cmap='RdYlBu_r', s=2)
sc2 = ax2.scatter(df.index, df.ctdpf_ckl_seawater_pressure, c=df.practical_salinity, cmap='Blues_r', s=2)

# Because the X and Y axes are shared, we only have to set limits once
ax1.invert_yaxis() # Invert y axis
ax1.set_xlim(df.index[0],df.index[-1]) # Set the time limits to match the dataset

cbar = fig.colorbar(sc1, ax=ax1, orientation='vertical')
cbar.ax.set_ylabel('Temperature')
cbar = fig.colorbar(sc2, ax=ax2, orientation='vertical')
cbar.ax.set_ylabel('Salinity')

ax1.set_ylabel('Pressure (dbar)')
ax2.set_ylabel('Pressure (dbar)')

fig.suptitle('Data from %s' % ds.subsite)
fig.autofmt_xdate()
fig.subplots_adjust(top=0.95);
