## Ocean Biogeochemical Dynamics Lab, Spring 2021
Introduction to Python and Jupyter Notebooks by Nancy Williams

This code covers importing, cleaning, and plotting data from a single SOCCOM float in the Southern Ocean. 

SOCCOM website: https://soccom.princeton.edu/

# Import toolboxes
Always start by importing the tools you will need.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.path as mpath
import seaborn as sns # this will change the look of pandas plots, too
import cartopy.crs as ccrs
import cartopy.feature
import seawater
import xarray as xr
import os
# this forces matplotlib to display figures inline in the notebook when you make plots
%matplotlib inline 
%config InlineBackend.figure_format = 'retina'
plt.rcParams['figure.figsize'] = (15,9)
plt.rcParams['font.size'] = 18
#plt.rcParams['lines.linewidth'] = 3
# Image lets us display an image from a URL inside the notebook
from IPython.display import Image

# Press Shift + Enter to "run" this cell and move on to the next one 


In [None]:
# Define the directory where you want the figures to be saved
output_dir = 'generated/'

In [None]:
Image(url='https://www3.mbari.org/soccom/images/SOOCNMAP.jpg', width=800)

That's a map of all of the SOCCOM floats!

In [None]:
# This is how you make a comment!
# Always annotate your code so you know what you did and why
# I promise you won't remember the details when it comes time to write up the results

# Ideally, you are also using some kind of version control like Git (hosted on GitHub).
# Version control allows you to track changes in your code, revert back if you need to,
# or even branch a piece of code off into two independent versions.
# It is also great for collaborative projects

Start a new "cell" when you transition to a new step in your code. This allows you to run your code in chunks and better troubleshoot where issues might be. Press Option + Enter to run this cell and add an empty cell below it.

# Download the SOCCOM float data snapshot
This code imports a SOCCOM biogeochemical Argo float "snapshot" dataset from December 2020 for one float at a time. A snapshot means that the data have been archived with a DOI (digital object identifier). It is frozen in time and is citable, which is important for reproducibility. When using this dataset for science and publication, it will also be important to document the steps you take when cleaning, reformatting, renaming, changing units, and any calculations you do. 

The SOCCOM floats measure pH. Other carbonate system parameters are estimated by combining float-measured pH with an estimate of alkalinity based on empirical relationships and shipboard bottle samples. There are several options for this alkalinity product (LIAR, MLR, and CANYON). Here we will use LIAR (Locally Interpolated Alkalinity Regression of Carter et al. (2018) http://doi.wiley.com/10.1002/lom3.10232). 

The files are available in both low resolution and high resolution. The core physical sensors (temperature, salinity, and pressure) measure at a higher depth resolution than the BGC sensors. If you choose low-resolution you lose this high resolution physical data. If you choose high resolution you get the full resolution of physical data and the depths which have no BGC data are empty. For the purposes of this notebook, we will use the low-resolution file because it doesn't take up as much space.

There are also both raw and QC files available. The QC (quality control) methods are quite mature at this point and so it is best to use the QC'ed data. If you are working on a project that focuses on sensor QC or you wish to do your own QC, then you may need to download the raw dataset.

The entire .zip file with all floats can be downloaded here: 
https://library.ucsd.edu/dc/object/bb94601812 as the "LIAR Low resolution ODV format." I've already downloaded and unzipped that folder in the current working directory for this notebook and it's called `SOCCOM_LoResQC_LIAR_22Dec2020_odvtxt`. Go look at it. What's inside?

# Importing the dataset
We want to use pandas' built-in read_csv function to import a single float data file into a pandas data frame called `flt`. Float 9254 is a good example float, but you can pick any float from the snapshot. To pick another float you can go to the SOCCOM web page sensor status table http://soccom.ucsd.edu/floats/SOCCOM_sensor_stats.html and choose a float with lots of good data (i.e., more in the "#g" column than the "#b" columns for variables you're interested in analyzing). You can sort the columns on that webpage by clicking on the column header by which you wish to sort.

In [None]:
# Pick a float
floatnum='9254' 
# you can change this number to look at a different float and it will update throughout the file
# A value in '' is a string type, not a number to be used in calculations.
# A string can also contain letters and symbols

floatpath='SOCCOM_LoResQC_LIAR_22Dec2020_odvtxt/' # This is the folder where all the float data live
floatsuffix='SOOCNQC.TXT' # all of the Southern Ocean floats have the same suffix
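# The + operator concatenates (links together) the strings to build the file name
# on_bad_lines='warn' requires pandas >= 1.3; older versions used error_bad_lines=False
flt=pd.read_csv(floatpath+floatnum+floatsuffix, on_bad_lines='warn')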
# There are a bunch of other input options for this read_csv function, and you can see them by
# pressing "tab" inside the parenthesis following the function name.

In [None]:
# Python doesn't typically spit out the results unless you ask for them.
# This is how you look at just the "head" of the flt dataframe
flt.head()

In [None]:
# This is how you look at the whole flt dataframe
# It's a big dataframe so this will show you just the head and the tail
flt

## Comment lines in data files
Clearly something is wrong. We didn't get any meaningful data! Why? Because there are comment lines at the top of the data file. Let's try telling read_csv what a comment looks like. We see from the file that the comment lines start with '//'.

In [None]:
# on_bad_lines='warn' requires pandas >= 1.3; older versions used error_bad_lines=False
flt=pd.read_csv(floatpath+floatnum+floatsuffix, on_bad_lines='warn', comment='/')
# There are a bunch of other input options for this function, and you can see them by
# typing a comma and then pressing "tab" from inside the function parentheses
# Run this new read_csv function and look at the header of the new flt dataframe
# By running this code you overwrite your last flt dataframe
flt.head()

## Delimiters
We are getting warmer: we now see a more meaningful header and some data, but what are those "\t"s? Those are tab delimiters, or separators. "CSV" stands for Comma Separated Values and "TSV" stands for Tab Separated Values. The two kinds of files typically look identical, and the delimiters are often invisible, when viewed in Excel or a text editor (but you should never use Excel to edit text files! Excel can change the format when you save and mess up your code). 
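
To see why the delimiter matters, here is a minimal sketch using a tiny made-up tab-separated table. The `sample` string and its values are illustrative assumptions, not real float data. Without a delimiter argument, read_csv looks for commas and puts everything into a single column.

```python
import io
import pandas as pd

# A tiny made-up tab-separated table held in memory (not real float data)
sample = "Station\tTemperature[°C]\n1\t2.5\n2\t3.1\n"

# Default behavior: read_csv looks for commas, finds none, and returns one wide column
as_csv = pd.read_csv(io.StringIO(sample))

# Telling read_csv that the separator is a tab gives the two columns we expect
as_tsv = pd.read_csv(io.StringIO(sample), sep='\t')

print(as_csv.shape, as_tsv.shape)
```

In read_csv, `sep` and `delimiter` are two names for the same option, so either one can be used.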

So now we need to tell read_csv what the delimiter is:

In [None]:
flt=pd.read_csv(floatpath+floatnum+floatsuffix, on_bad_lines='warn', 
                comment='/', delimiter='\t')
# (on_bad_lines='warn' requires pandas >= 1.3; older versions used error_bad_lines=False)
# If your line of code is getting long and you're inside parentheses, simply press 
# enter in the code to continue your code onto a new line.
# Look at the head
flt.head()

In [None]:
# That looks better, now let's look at the info for the file to see more:
flt.info()

We are getting many rows of data, but only four columns. Why? 

Because the "comment" marker used in these float data files is two forward slashes, "//", and pandas read_csv accepts only a single character in the comment field. Because we passed '/' as the comment character, read_csv discards everything after the first '/' on every line, not just on the comment lines. In this case, we lose everything after "mon" in the header row (the column name is "mon/day/yr") and everything after the first '/' of the date in each data row. When creating a data file like this .txt file, you should never use your comment character (or your delimiter/separator character) anywhere else in the file. To work around this, we will first open the file and replace all instances of '//' with '#'. (I checked to make sure "#" isn't used in the actual data anywhere.)
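
The truncation is easy to reproduce on a small scale. The following is a minimal sketch with a made-up three-line file held in memory (the `sample` string, the shortened column names, and the values are assumptions for illustration only).

```python
import io
import pandas as pd

# Made-up miniature of the float file: one '//' comment line, a header, one data row
sample = "//this is a comment line\nStation\tmon/day/yr\tTemp\n1\t12/22/2020\t3.5\n"

# With comment='/', everything after the first '/' on EVERY line is thrown away
truncated = pd.read_csv(io.StringIO(sample), sep='\t', comment='/')

# Swap the two-character marker for a single character that the data never use
cleaned = sample.replace('//', '#')
full = pd.read_csv(io.StringIO(cleaned), sep='\t', comment='#')

print(list(truncated.columns))
print(list(full.columns))
```

The single slashes inside the date are untouched by the replacement because only the doubled '//' is swapped.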

In [None]:
# Open the input file for reading and the output file for writing.
# The "with" statement closes both files automatically when the block ends.
# Writing with the same UTF-8 encoding keeps symbols like ° and µ intact.
with open(floatpath+floatnum+floatsuffix,'rt',encoding='UTF-8') as fin, \
     open('fltrem.txt','wt',encoding='UTF-8') as fout:
    # this is a for loop
    # for each line in the input file
    for line in fin:
        # read the line, replace the string, and write the result to the output file
        fout.write(line.replace('//','#'))

## NaN values
NaN (Not a Number) values are used to fill in data files where there is either no data or bad data. Instead of NaN, SOCCOM uses a very large negative number (-1E10) as the fill value, a number so far outside any realistic range that you would never get it from a sensor. An absurdly large positive number would work too. We want to replace that fill value with NaN, which Python is better equipped to handle. If you don't do this, you can mess up your results because a ton of huge negative values end up in your calculations.
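
As a rough illustration of how much damage a fill value can do, the sketch below reads a made-up three-row table twice, once as-is and once with the fill value declared. The `sample` string and the nitrate values are invented for this example.

```python
import io
import pandas as pd

# Made-up table in which the middle nitrate value is the -1e10 fill value
sample = "Pressure[dbar]\tNitrate[µmol/kg]\n5\t24.0\n10\t-1e10\n20\t26.0\n"

# Read once without declaring the fill value, and once with na_values
raw = pd.read_csv(io.StringIO(sample), sep='\t')
clean = pd.read_csv(io.StringIO(sample), sep='\t', na_values=-1e10)

# The fill value drags the raw mean far below zero; NaN is skipped by mean()
print(raw['Nitrate[µmol/kg]'].mean())
print(clean['Nitrate[µmol/kg]'].mean())
```

pandas statistics such as mean, min, and max skip NaN by default, which is why converting the fill value is enough to protect later calculations.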

In [None]:
flt=pd.read_csv('fltrem.txt',delimiter='\t',comment='#',na_values=-1E10)
# Now I've also added a term to tell read_csv what the "not a number" value is in the file

In [None]:
#take a look at the info for the flt dataframe you have made
flt.info()

In [None]:
# look at the head of the data frame
flt.head()

## Dates
Notice that the date and time are stored as text strings, and we want them in a "datetime" format that Python can treat as a point in time. We can use the pandas function to_datetime to do this conversion. The new variable 'date' is appended to the end of the pandas dataframe.
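
Here is a minimal sketch of that conversion on a made-up two-row dataframe. The dates are invented, and the explicit `format` string is an optional addition that assumes the month/day/year and hour:minute layout implied by the column names.

```python
import pandas as pd

# Made-up date and time strings using the same column names as the float file
df = pd.DataFrame({'mon/day/yr': ['12/22/2020', '01/05/2021'],
                   'hh:mm': ['03:15', '14:40']})

# Join the two text columns with a space, then parse them into datetimes.
# Giving the format explicitly avoids any month/day ambiguity.
df['date'] = pd.to_datetime(df['mon/day/yr'] + ' ' + df['hh:mm'],
                            format='%m/%d/%Y %H:%M')

print(df.dtypes)
```

Once the column is a datetime type, it can be sorted, subtracted, and used directly as a plot axis.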

In [None]:
flt['date']=pd.to_datetime(flt['mon/day/yr']+' '+ flt['hh:mm'])
flt.info()

# Mapping your data
It's always a good idea to map your data and make sure it is where you think it is. Here we will use cartopy (basemap is deprecated).

Since we're talking about the Southern Ocean and there are stark fronts, it's good to plot your data in relation to these fronts. The climatological locations of the fronts are available from Orsi et al. (1995) https://www.sciencedirect.com/science/article/pii/096706379500021W. Text files containing the locations of the fronts are located in the "fronts" folder.

### Where is your float in relation to the fronts of the ACC?
The following code imports the longitudes and latitudes of the five fronts. There are some '%' values in the files, which create breaks in the fronts. If we did not keep these breaks, the fronts would plot across continents.
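
The sketch below shows how a '%' row turns into a NaN row when it is declared as a missing value. The coordinates are invented, and the exact layout of the '%' lines in the real front files is an assumption here.

```python
import io
import pandas as pd

# Made-up front segment: two points, a '%' break, then another point
sample = "10.0 -45.0\n11.0 -45.5\n% %\n20.0 -50.0\n"

# Whitespace-separated columns with no header row; '%' is read as NaN
front = pd.read_csv(io.StringIO(sample), header=None, sep=r'\s+',
                    na_values='%', names=['lon', 'lat'])

print(front)
```

matplotlib does not draw a line through NaN points, so each NaN row acts as a pen lift between segments of the front.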

In [None]:
# r'\s+' is a raw string, so Python does not try to interpret the backslash itself
stf=pd.read_csv('fronts/stf.txt',header=None,sep=r'\s+',na_values='%', names=['lon','lat'])
saf=pd.read_csv('fronts/saf.txt',header=None,sep=r'\s+',na_values='%', names=['lon','lat'])
pf=pd.read_csv('fronts/pf.txt',header=None,sep=r'\s+',na_values='%', names=['lon','lat'])
saccf=pd.read_csv('fronts/saccf.txt',header=None,sep=r'\s+',na_values='%', names=['lon','lat'])
sbdy=pd.read_csv('fronts/sbdy.txt',header=None,sep=r'\s+',na_values='%', names=['lon','lat'])
# Check one of the fronts and make sure it looks as you think it should
pf

### Cartopy
The following is an example of a South Polar Stereographic map using Cartopy https://scitools.org.uk/cartopy/docs/latest/#. Polar stereographic maps are always a bit more complicated because you have to translate your coordinates to polar coordinates.

In [None]:
plt.figure(figsize=(6, 6))
ax = plt.axes(projection=ccrs.SouthPolarStereo())
ax.set_extent([-180,180,-90,-30],ccrs.PlateCarree())
ax.add_feature(cartopy.feature.LAND)
ax.add_feature(cartopy.feature.OCEAN)
ax.gridlines()

# Compute a circle in axes coordinates, which we can use as a boundary
# for the map. We can pan/zoom as much as we like - the boundary will be
# permanently circular.
theta = np.linspace(0, 2*np.pi, 100)
center, radius = [0.5, 0.5], 0.5
verts = np.vstack([np.sin(theta), np.cos(theta)]).T
circle = mpath.Path(verts * radius + center)

ax.set_boundary(circle, transform=ax.transAxes)
plt.plot(stf['lon'],stf['lat'],color='Red',transform=ccrs.PlateCarree())
plt.plot(saf['lon'],saf['lat'],color='Orange',transform=ccrs.PlateCarree())
plt.plot(pf['lon'],pf['lat'],color='Yellow',transform=ccrs.PlateCarree())
plt.plot(saccf['lon'],saccf['lat'],color='Green',transform=ccrs.PlateCarree())
plt.plot(sbdy['lon'],sbdy['lat'],color='Blue',transform=ccrs.PlateCarree())

plt.scatter(flt['Lon [°E]'],flt['Lat [°N]'],color='Black',transform=ccrs.PlateCarree(), s=1)
plt.savefig(output_dir+'F'+floatnum+'map.png') 
plt.savefig(output_dir+'F'+floatnum+'map.jpg') # Changing the suffix will change the format
plt.show()


# Plotting profiles

Now, let's make a quick plot of temperature versus pressure. This method is quick and dirty, but it doesn't give us much control over the figure.

In [None]:
plt.plot(flt['Temperature[°C]'],flt['Pressure[dbar]'])

In [None]:
# Something funny? 
# We want to invert the axis and add some labels

# Now we will use matplotlib's object-oriented interface to have more control over the plot
fig = plt.figure()
# this allows you to create multiple axes
axes1= fig.add_axes([0, 0, 1, 1])
axes1.plot(flt['Temperature[°C]'],flt['Pressure[dbar]'])
axes1.set_title('Float '+floatnum+' Temperature')
axes1.invert_yaxis()
axes1.set_xlabel('Temperature [°C]')
axes1.set_ylabel('Pressure [dbar]')
# if you wanted to add a subplot you would add it like this
#axes2= fig.add_axes([.7, .7, .2, .2])

In [None]:
# Can also use subplots function
fig,axes = plt.subplots(nrows = 1, ncols = 2)
# if you have many subplots and some overlap, call plt.tight_layout() at the end 
# of all of your plot statements
# Or you can add bbox_inches='tight' to your savefig calls

axes[0].plot(flt['Temperature[°C]'],flt['Pressure[dbar]'])
#axes[0].set_title('Temperature')
axes[0].invert_yaxis()
axes[0].set_ylabel('Pressure [dbar]')
axes[0].set_xlabel('Temperature [°C]')

axes[1].plot(flt['Salinity[pss]'],flt['Pressure[dbar]'])
#axes[1].set_title('Salinity')
axes[1].invert_yaxis()
axes[1].set_ylabel('Pressure [dbar]')
axes[1].set_xlabel('Salinity [pss]')
axes[0].set_title('Float '+floatnum)
axes[1].set_title('Float '+floatnum)
# Save the figure. We have given it a name, a type, and a dpi or
# dots per inch, which is the resolution
fig.savefig(output_dir+'F'+floatnum+'TS.png', dpi = 200, bbox_inches='tight')

In [None]:
# Now let's plot multiple things on one axis
# Pick your variables:
var='DIC_LIAR[µmol/kg]'
var2='TALK_LIAR[µmol/kg]'
fig = plt.figure()
ax = fig.add_axes([0.1, 0.1, .8, .8])
ax.plot(flt[var],flt['Pressure[dbar]'],label = var, color = 'red')
ax.plot(flt[var2],flt['Pressure[dbar]'],label = var2, color = 'blue') 
# for color you can also put in an RGB hex code
ax.legend(loc=0) # 0 is for the "best" location
ax.set_title('Float '+floatnum) # the title updates automatically when you change floatnum
ax.invert_yaxis()
fig.savefig(output_dir+'F'+floatnum+var[0:3]+var2[0:3]+'.png', dpi = 200, bbox_inches='tight')
flt.info()

## Quality Flags
Clearly there are some bad data. Notice the QF or Quality Flag columns in the flt dataframe. These tell us which data are good, questionable, and bad. We only want to use good data. How can we tell which data are good?

Go back to the original text file and read the comments at the top of the file. Good data are flagged with zeros, and we should remove bad and questionable data which are flagged 4 and 3, respectively, and replace with NaN.
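
Before applying the flags to the whole dataframe, it can help to try the idea on a toy table. This is a minimal sketch with invented values and a hypothetical non-zero flag of 4; it assumes, as in the float file, that each QF column sits immediately after the variable it describes. pandas renames repeated column names on import, which is why the second flag column is called `QF.1` here.

```python
import numpy as np
import pandas as pd

# Made-up table: each variable is followed by its quality flag column
df = pd.DataFrame({'Nitrate[µmol/kg]': [24.0, 99.0, 26.0],
                   'QF': [0, 4, 0],
                   'Oxygen[µmol/kg]': [300.0, 310.0, 5.0],
                   'QF.1': [0, 0, 4]})

for i, name in enumerate(df.columns):
    if 'QF' in name and i > 0:
        var = df.columns[i - 1]  # the variable just before this flag column
        # keep values where the flag is 0 (good); everything else becomes NaN
        df[var] = df[var].where(df[name] == 0)

print(df)
```

`Series.where` keeps values where the condition is True and fills the rest with NaN, which is the same effect as the `np.where(..., np.nan)` pattern.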

In [None]:
# Loop through all QF columns and apply each one to the preceding column
from re import search
for column in range(len(flt.columns)):
    name=flt.columns[column]
    if search('QF',name): # if this is a QF column, apply it to the preceding column; otherwise go on to the next column
        var=flt.columns[column-1]
        # keep values whose flag is 0 (good) and replace everything else with NaN
        flt[var] = np.where(flt.iloc[:,column] == 0,flt[var], np.nan)

Now make the same plot again and see if the weird data are gone:

In [None]:
var='DIC_LIAR[µmol/kg]'
var2='TALK_LIAR[µmol/kg]'
fig = plt.figure()
ax = fig.add_axes([0.1, 0.1, .8, .8])
ax.plot(flt[var],flt['Pressure[dbar]'],label = var, color = 'red')
ax.plot(flt[var2],flt['Pressure[dbar]'],label = var2, color = 'blue') 
# for color you can also put in an RGB hex code
ax.legend(loc=0) # 0 is for the "best" location
ax.set_title('Float '+floatnum) # the title updates automatically when you change floatnum
ax.invert_yaxis()
fig.savefig(output_dir+'F'+floatnum+var[0:3]+var2[0:3]+'.png', dpi = 200)

Yes, they are!

### Note: Careful, if you are using the High-Resolution dataset and you use "plot" for BGC variables you will get a HUGE gap in the mid-water column because the BGC data are not high resolution. We need to use 'scatter' for the BGC data in the High-Resolution dataset.

In [None]:
var='pHinsitu[Total]'
fig = plt.figure()
axes1= fig.add_axes([0.1, 0.1, .8, .8])
axes1.plot(flt[var],flt['Pressure[dbar]'],label = var,color='purple')
axes1.legend()
axes1.set_title('Float '+floatnum)
axes1.invert_yaxis()
fig.savefig(output_dir+'F'+floatnum+var[0:3]+'.png', dpi = 200)

In [None]:
# is nitrate there?
var='Nitrate[µmol/kg]'
fig = plt.figure()
axes1= fig.add_axes([0.1, 0.1, .8, .8])
axes1.plot(flt[var],flt['Pressure[dbar]'],label = var,color='green')
axes1.legend()
axes1.set_title('Float '+floatnum)
axes1.invert_yaxis()
fig.savefig(output_dir+'F'+floatnum+var[0:3]+'QC.png', dpi = 200)

In [None]:
# is oxygen there?
var='Oxygen[µmol/kg]'
fig = plt.figure()
axes1= fig.add_axes([0.1, 0.1, .8, .8])
axes1.plot(flt[var],flt['Pressure[dbar]'],label = var,color='teal')
axes1.legend()
axes1.set_title('Float '+floatnum)
axes1.invert_yaxis()
fig.savefig(output_dir+'F'+floatnum+var[0:3]+'QC.png', dpi = 200)

In [None]:
# Next look at oxygen over time
var='Oxygen[µmol/kg]'
fig = plt.figure(num=None, figsize=(16, 3), dpi=80, facecolor='w', edgecolor='k')
ax = fig.add_axes([0.1, 0.1, .8, .8])
sc=ax.scatter(flt['date'],flt['Depth[m]'],c=flt[var],cmap = 'magma')
ax.invert_yaxis()
ax.set_title('Float '+floatnum)
cb=plt.colorbar(sc)
cb.set_label(var)
fig.savefig(output_dir+'F'+floatnum+var[0:3]+'section.png', dpi = 200,bbox_inches='tight')

## Mixed Layer Depth
It would be helpful to know where the mixed layer is when we're thinking about surface ocean seasonality. Typically, the mixed layer depth is calculated by comparing the density at each depth to the density at the surface. As you move down the water column, the density increases, and you can choose a threshold density difference above which you are no longer in the mixed layer. This threshold is typically 0.03 kg/m3 (https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2004JC002378). First let's look at density. Sigma theta is the potential density minus 1000 kg/m3, so the true density is closer to 1027 kg/m3.
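
The threshold method can be sketched on a single made-up profile before applying it to the float. The depths and densities below are invented, and this version takes the shallowest sample as the surface reference, which is one possible choice rather than the only one.

```python
import pandas as pd

# Made-up profile: nearly uniform density down to 40 m, then increasing
profile = pd.DataFrame({
    'Depth[m]': [5, 10, 20, 40, 60, 80, 100],
    'Sigma_theta[kg/m^3]': [26.80, 26.80, 26.81, 26.82, 26.90, 27.00, 27.10],
})

threshold = 0.03  # density difference in kg/m^3

# Use the shallowest sample as the surface reference density
surface = profile.loc[profile['Depth[m]'].idxmin(), 'Sigma_theta[kg/m^3]']

# Keep the samples that are denser than the surface by more than the threshold;
# the shallowest of those is taken as the mixed layer depth
below = profile.loc[profile['Sigma_theta[kg/m^3]'] - surface > threshold]
mld = below['Depth[m]'].min()

print(mld)
```

Because the result is always one of the sampled depths, the estimate can be no finer than the vertical spacing of the profile.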

In [None]:
flt['Sigma_theta[kg/m^3]']

In [None]:
# Start with station 1 as a test. Find the surface density (usually the minimum, but not always- fix later)
station=1
surfacedens=flt['Sigma_theta[kg/m^3]'].loc[(flt['Station']==station)].min()
surfacedens

In [None]:
# Find where in that station the density is more than 0.03 kg/m^3 greater than the surface density
# The shallowest of those depths is the mixed layer depth
MLD=flt['Depth[m]'].loc[(flt['Station']==station)&(flt['Sigma_theta[kg/m^3]']-surfacedens>0.03)].min()
MLD

In [None]:
# Calculate MLD for each station
MLD=[]
for station in flt['Station'].unique():
    surfacedens=flt['Sigma_theta[kg/m^3]'].loc[(flt['Station']==station)].min()
    MLD.append([station,flt['date'].loc[(flt['Station']==station)&(flt['Sigma_theta[kg/m^3]']-surfacedens>0.03)].min(),
                flt['Depth[m]'].loc[(flt['Station']==station)&(flt['Sigma_theta[kg/m^3]']-surfacedens>0.03)].min(),
              flt['Lon [°E]'].loc[(flt['Station']==station)].mean(),
              flt['Lat [°N]'].loc[(flt['Station']==station)].mean()])

# Take a look at MLD. First column is the Station, second column is the date, third column is the MLD,
# and the last two columns are the mean longitude and latitude of the station
MLD = pd.DataFrame(data=MLD, columns=['Station', 'date', 'MLD','Lon [°E]','Lat [°N]'])
print(MLD)

In [None]:
# Density in the mixed layer
# First, set the depth to which you wish to plot and keep it the same for subsequent plots
depth=MLD['MLD'].max()+50 #plot to depth of mixed layer plus some number of m

var='Sigma_theta[kg/m^3]'
flt[var] = np.where(flt.iloc[:,flt.columns.get_loc(var)+1] == 0,flt[var], np.nan)
fig = plt.figure(num=None, figsize=(16,3), dpi=80, facecolor='w', edgecolor='k')
ax = fig.add_axes([0.1, 0.1, .8, .8])
sc=ax.scatter(flt['date'],flt['Depth[m]'],c=flt[var],cmap = 'Reds')
ax.plot(MLD['date'],MLD['MLD'],c='magenta')
ax.invert_yaxis()
ax.set_title('Float '+floatnum)
ax.set_ylim([depth,0])
cb=plt.colorbar(sc)
cb.set_label(var)
# automatically adjusts the colorbar based on the range of values you're plotting
sc.set_clim(vmin = flt[var].loc[(flt['Depth[m]']<depth)].min(), 
            vmax = flt[var].loc[(flt['Depth[m]']<depth)].max()) 
fig.savefig(output_dir+'F'+floatnum+var[0:3]+'section.png', dpi = 200,bbox_inches='tight')

## Mixed layer climatology
The Holte and Talley climatology for mixed layer depth is informed by Argo profiles and it lives at http://mixedlayer.ucsd.edu/. It would be interesting to see how the mixed layer we calculated/observed from the float compares to the climatological mixed layer depth near the float location. This code section is adapted from Ryan Abernathey's Intro to Physical Oceanography notebook at https://nbviewer.jupyter.org/github/rabernat/intro_to_physical_oceanography/blob/master/lectures/03_air_sea_exchange.ipynb.


In [None]:
os.getcwd()

In [None]:
os.chdir("generated") # Change the directory to the `generated` folder
os.getcwd() # Check that you're now in the `generated` folder

In [None]:
# curl downloads the mixed layer depth file directly from the web and shows you the status while it works. 
# `!` at the beginning of the line tells Jupyter that this is a unix shell command (not Python code)
! curl -o Argo_mixedlayers_monthlyclim_12112019.nc http://mixedlayer.ucsd.edu/data/Argo_mixedlayers_monthlyclim_12112019.nc

In [None]:
# Using xarray to open the MLD dataset
MLDclim = xr.open_dataset('Argo_mixedlayers_monthlyclim_12112019.nc')
os.chdir("..") # Use ".." to move back up one directory now that we've imported the MLD climatology data
MLDclim

In [None]:
# July
fig, ax = plt.subplots()
im = MLDclim.mld_da_mean[:,:,6].plot.imshow(extent=[MLDclim.lon[0], MLDclim.lon[-1],
                                                    MLDclim.lat[-1], MLDclim.lat[0]], yincrease=True)
im.set_clim([0,500])
ax.set_title('July Climatological MLD from Holte et al. (2017)')
plt.savefig(output_dir+'JulyClimatologicalMLD.png', dpi = 200,bbox_inches='tight')
plt.close()
fig
# Clean up the axes later. these are the indices, not the lats and lons themselves

## Look at biogeochemical section plots

In [None]:
# Next look at pCO2 over time
var='pCO2_LIAR[µatm]'
flt[var] = np.where(flt.iloc[:,flt.columns.get_loc(var)+1] == 0,flt[var], np.nan)
fig = plt.figure(num=None, figsize=(16,3), dpi=80, facecolor='w', edgecolor='k')
ax = fig.add_axes([0.1, 0.1, .8, .8])
sc=ax.scatter(flt['date'],flt['Depth[m]'],c=flt[var],cmap = 'Reds')
ax.plot(MLD['date'],MLD['MLD'],c='magenta')
ax.invert_yaxis()
ax.set_title('Float '+floatnum)
ax.set_ylim([depth,0])
cb=plt.colorbar(sc)
cb.set_label(var)
# automatically adjusts the colorbar based on the range of values you're plotting
sc.set_clim(vmin = flt[var].loc[(flt['Depth[m]']<depth)].min(), 
            vmax = flt[var].loc[(flt['Depth[m]']<depth)].max()) 
fig.savefig(output_dir+'F'+floatnum+var[0:3]+'section.png', dpi = 200, bbox_inches='tight')

In [None]:
# Next look at Nitrate over time
var='Nitrate[µmol/kg]'
fig = plt.figure(num=None, figsize=(16,3), dpi=80, facecolor='w', edgecolor='k')
ax = fig.add_axes([0.1, 0.1, .8, .8])
sc=ax.scatter(flt['date'],flt['Depth[m]'],c=flt[var],cmap = 'Purples')
ax.plot(MLD['date'],MLD['MLD'],c='magenta')
ax.invert_yaxis()
ax.set_title('Float '+floatnum)
ax.set_ylim([depth,0])
cb=plt.colorbar(sc)
cb.set_label(var)
# automatically adjusts the colorbar based on the range of values you're plotting
sc.set_clim(vmin = flt[var].loc[(flt['Depth[m]']<depth)].min(), 
            vmax = flt[var].loc[(flt['Depth[m]']<depth)].max()) 
fig.savefig(output_dir+'F'+floatnum+var[0:3]+'section.png', dpi = 200, bbox_inches='tight')

In [None]:
# Next look at Chlorophyll over time 
var='Chl_a_corr[mg/m^3]'
fig = plt.figure(num=None, figsize=(16,3), dpi=80, facecolor='w', edgecolor='k')
ax = fig.add_axes([0.1, 0.1, .8, .8])
sc=ax.scatter(flt['date'],flt['Depth[m]'],c=np.log(flt[var]),cmap = 'Greens') #Log scale
ax.plot(MLD['date'],MLD['MLD'],c='magenta')
ax.invert_yaxis()
ax.set_title('Float '+floatnum)
ax.set_ylim([depth,0])
cb=plt.colorbar(sc)
cb.set_label('log('+var+')')
fig.savefig(output_dir+'F'+floatnum+var[0:3]+'section.png', dpi = 200, bbox_inches='tight')

## Make a time series plot of average top 30 m observations
Here we use the pandas groupby function to group the near-surface data by station.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
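
To make the grouping step concrete, here is a minimal sketch on a made-up table with two stations. All values are invented. It uses `agg` with one function per column, which is a variation on the plain `mean` and lets the station date be carried along in the same step.

```python
import pandas as pd

# Made-up data: two stations, three pressure levels each
df = pd.DataFrame({
    'Station': [1, 1, 1, 2, 2, 2],
    'Pressure[dbar]': [5, 20, 50, 5, 20, 50],
    'Temperature[°C]': [2.0, 1.0, 0.0, 4.0, 3.0, 0.5],
    'date': pd.to_datetime(['2020-01-01'] * 3 + ['2020-01-11'] * 3),
})

# Keep only the samples shallower than 30 dbar
surf = df.loc[df['Pressure[dbar]'] < 30]

# One row per station: mean temperature and the earliest date
by_stn = surf.groupby('Station').agg({'Temperature[°C]': 'mean', 'date': 'min'})

print(by_stn)
```

A station with no samples shallower than 30 dbar would not appear in the grouped result at all, so the number of rows can be smaller than the number of stations.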

In [None]:
# Need to work on this loop to use actual MLD instead of 30 m for averaging
#fltSurf=pd.DataFrame()
#for station in MLD['Station'].unique():
#    stationMLD=MLD['MLD'].loc[(MLD['Station']==station)]
#    mask=flt.loc[(flt['Station']==station),(flt['Pressure[dbar]']<stationMLD)]
#    fltSurf=fltSurf.append(flt[mask])

In [None]:
fltSurf=flt.loc[(flt['Pressure[dbar]']<30)]
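# numeric_only=True averages only the numeric columns; recent pandas versions raise an
# error if you try to average text columns
fltSurfByStn=fltSurf.groupby('Station').mean(numeric_only=True)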
fltSurfByStn

In [None]:
# The groupby mean keeps only numeric columns, so the datetime column was dropped. 
# We need to make a list of dates to be used with plotting
fltdates=[]
for station in flt['Station'].unique():
    fltdates.append([flt['date'].loc[(flt['Station']==station)].min()])
len(fltdates)

In [None]:
# Unfortunately, sometimes the fltdates is 1 row longer than the groupby file. 
# Not sure how/why this happened but for now, we will just assume that 
# there is an extra date somewhere and drop the final date.
# Need to fix this later. It might lead to a 10-day error in the dates
if len(fltdates)>len(fltSurfByStn):
    fltdates.pop()# pop "pops off" the last value

In [None]:
# Make a big plot with subplots
fig,axes = plt.subplots(nrows = 7, ncols = 1,figsize=(15,25))
import matplotlib.dates as mdates

years = mdates.YearLocator()   # every year
months = mdates.MonthLocator()  # every month
years_fmt = mdates.DateFormatter('%Y')

var='Temperature[°C]'
axes[0].plot(fltdates,fltSurfByStn[var])
axes[0].set_ylabel(var)
axes[0].set_xlim(fltdates[0],fltdates[-1])
# format the ticks
axes[0].xaxis.set_major_locator(years)
axes[0].xaxis.set_major_formatter(years_fmt)
axes[0].xaxis.set_minor_locator(months)

var='Salinity[pss]'
axes[1].plot(fltdates,fltSurfByStn[var])
axes[1].set_ylabel(var)
axes[1].set_xlim(fltdates[0],fltdates[-1])
axes[1].xaxis.set_major_locator(years)
axes[1].xaxis.set_major_formatter(years_fmt)
axes[1].xaxis.set_minor_locator(months)

var='Chl_a_corr[mg/m^3]'
axes[2].plot(fltdates,np.log(fltSurfByStn[var]))
axes[2].set_ylabel('log('+var+')')
axes[2].set_xlim(fltdates[0],fltdates[-1])
axes[2].xaxis.set_major_locator(years)
axes[2].xaxis.set_major_formatter(years_fmt)
axes[2].xaxis.set_minor_locator(months)

var='OxygenSat[%]'
axes[3].plot(fltdates,fltSurfByStn[var])
axes[3].set_ylabel(var)
axes[3].set_xlim(fltdates[0],fltdates[-1])
axes[3].xaxis.set_major_locator(years)
axes[3].xaxis.set_major_formatter(years_fmt)
axes[3].xaxis.set_minor_locator(months)

var='Nitrate[µmol/kg]'
axes[4].plot(fltdates,fltSurfByStn[var])
axes[4].set_ylabel(var)
axes[4].set_xlim(fltdates[0],fltdates[-1])
axes[4].xaxis.set_major_locator(years)
axes[4].xaxis.set_major_formatter(years_fmt)
axes[4].xaxis.set_minor_locator(months)

var='pCO2_LIAR[µatm]'
axes[5].plot(fltdates,fltSurfByStn[var])
axes[5].set_ylabel(var)
axes[5].set_xlim(fltdates[0],fltdates[-1])
axes[5].xaxis.set_major_locator(years)
axes[5].xaxis.set_major_formatter(years_fmt)
axes[5].xaxis.set_minor_locator(months)

var='MLD'
axes[6].plot(MLD['date'],MLD['MLD'])
axes[6].set_ylabel(var)
axes[6].set_xlim(fltdates[0],fltdates[-1])
axes[6].xaxis.set_major_locator(years)
axes[6].xaxis.set_major_formatter(years_fmt)
axes[6].xaxis.set_minor_locator(months)

axes[0].set_title('Float '+floatnum)
# Save the time series figure. We use a different file name than the profile figure 
# saved earlier ('TS.png') so that the earlier figure is not overwritten
fig.savefig(output_dir+'F'+floatnum+'timeseries.png', dpi = 200, bbox_inches='tight')

## Seawater Toolbox
The seawater toolbox contains many functions useful for oceanographic data analysis. To see a list of what's there type `help(seawater)`. Let's calculate the freezing point of seawater over the float's lifetime. You could also calculate the oxygen saturation concentration. 

In [None]:
seawater.fp(fltSurfByStn['Salinity[pss]'],fltSurfByStn['Pressure[dbar]'])

## Other miscellaneous plotting tools
In the following section I will demonstrate a few other neat plotting tools from the Seaborn toolbox: https://seaborn.pydata.org/#. 

For *this* specific dataset, using Seaborn doesn't always give us something oceanographically meaningful, but I want you to know that these tools exist.

The first example is using distplot: https://seaborn.pydata.org/generated/seaborn.distplot.html?highlight=distplot#seaborn.distplot

In [None]:
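var='Temperature[°C]' # change this to 'Salinity[pss]' to look at salinity instead

# kde=False (the boolean, not the string 'false') turns off the kernel density estimate curve
# Note: distplot is deprecated in seaborn >= 0.11; sns.histplot is its replacement
sns.distplot(fltSurfByStn[var],kde=False)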

The next couple of examples use a jointplot: https://seaborn.pydata.org/generated/seaborn.jointplot.html

In [None]:
sns.jointplot(x='Temperature[°C]',y='Salinity[pss]',data=fltSurfByStn,kind='hist')

In [None]:
sns.jointplot(x='Temperature[°C]',y='Salinity[pss]',data=fltSurfByStn,kind='reg')

The next example plots an entire dataframe. Our fltSurfByStn dataframe has too many columns for that, so I'm making a smaller dataframe called fltSurfByStnsmall with just the variables we want to plot.

In [None]:
fltSurfByStnsmall=fltSurfByStn[['Chl_a_corr[mg/m^3]','Nitrate[µmol/kg]','Oxygen[µmol/kg]','pCO2_LIAR[µatm]','Temperature[°C]','Depth[m]']].copy()

In [None]:
fltSurfByStnsmall

In [None]:
sns.pairplot(fltSurfByStnsmall)
# This will take some time. Notice the "*" that appears to the upper left while the cell runs
# If something is taking longer to run than you think it should, that's called "hanging" and
# It may be due to an error. You can quit that cell by going up to "Kernel" in the menu bar and 
# clicking "interrupt"

In [None]:
# You can also use the following code to make a pivot table with a subsample of the larger dataframe 
# to be used with seaborn grid plots
# a=flt.pivot_table(index='Pressure[dbar]',columns='Station',values='Temperature[°C]')

## Regression plot
https://seaborn.pydata.org/generated/seaborn.lmplot.html

What is the relationship between TALK and S? Do you think that Alkalinity can be estimated from just salinity?

In [None]:
sns.lmplot(x='Salinity[pss]',y='TALK_LIAR[µmol/kg]',data=fltSurfByStn)

How about dissolved inorganic carbon (DIC)?

In [None]:
sns.lmplot(x='Salinity[pss]',y='DIC_LIAR[µmol/kg]',data=fltSurfByStn) #Seaborn linear model plot

In [None]:
# What are these two different blobs? Plot a third variable as a color
fltSurfByStn.plot.scatter(x='Salinity[pss]',y='DIC_LIAR[µmol/kg]',c=fltSurfByStn['Temperature[°C]'],cmap='Purples')