# MS 263 Data Analysis Final Project

## Introduction

In November and December of 2014, Dr. Amanda Kahn and colleagues conducted a series of ROV dives over the Fraser Ridge Reef in the Strait of Georgia, west of Vancouver, BC (Fig. 1). This location is of particular interest because it is home to a glass sponge reef. Glass sponges are a class of sponges whose skeletons are built from four- and six-pointed silica spicules. They are extremely fragile, and the only known reefs exist off the coasts of British Columbia and Washington. Prior to their discovery in 1987, sponge reefs were thought to have gone extinct during the Jurassic period (Conway et al., 1991). Sponges are filter feeders, and in dense aggregations they slow currents in the water column. This causes sediment to settle out and build up around the reef, providing habitat and protection for marine organisms (Krautter et al., 2006). Because of the reefs' importance and rarity, Dr. Kahn was curious how water quality above the reefs compares with that of the surrounding water. Sponges draw water in through ostia located throughout their bodies and, after filtering out food, expel it vertically through the osculum. As a result, water above sponge reefs is likely to differ in its characteristics from unfiltered water. Because other organisms depend on sponge reefs, the quality of the water above a reef may have implications for species health and community composition.

To test whether water characteristics differ over glass sponge reefs, data from these ROV dives can be compared, since some transects passed over glass sponge reef while others did not. The ROV dives produced three data sets. Dive annotations recorded qualitative observations along with the time, latitude, longitude, and depth at which each observation was made. CTD data gave the date, time, and depth together with water temperature, salinity, oxygen concentration, and pressure. Navigation data from the ROV gave the date, time, depth, latitude, and longitude, along with ROV velocity and heading. The navigation data were recorded five times per second and the CTD data once per second, while annotations were logged whenever something of note happened during a dive. In total, nine transects were completed over seven dives.
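Because the three data streams were logged at different rates, they have to be aligned on time before a CTD measurement can be tied to a position. The following is a minimal sketch of one way to do that, assuming pandas' `merge_asof`. The column names (`time`, `lat`, `lon`, `oxygen`), timestamps, and values are made up for illustration and are not taken from the project files.

```python
import numpy as np
import pandas as pd

# Hypothetical 5 Hz navigation stream (placeholder positions, not real dive data)
nav = pd.DataFrame({
    "time": pd.date_range("2014-11-28 10:00:00", periods=50, freq="200ms"),
    "lat": np.linspace(49.150, 49.151, 50),
    "lon": np.linspace(-123.380, -123.379, 50),
})

# Hypothetical 1 Hz CTD stream (placeholder oxygen values)
ctd = pd.DataFrame({
    "time": pd.date_range("2014-11-28 10:00:00", periods=10, freq="1s"),
    "oxygen": np.linspace(3.0, 3.5, 10),
})

# For each CTD record, attach the navigation fix closest in time,
# but only if it lies within 200 ms; otherwise lat/lon are left as NaN.
merged = pd.merge_asof(
    ctd, nav, on="time", direction="nearest", tolerance=pd.Timedelta("200ms")
)
print(merged.head())
```

Matching onto the slower CTD stream keeps one row per oxygen measurement, which avoids repeating the same CTD value across five navigation records.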


In [None]:
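# NOTE: these bounds cover Monterey Bay, California, not the Strait of Georgia
# described in the Figure 1 caption; replace them with the study-area bounds.
extent = [-122.3, -121.6, 36.5, 37]  # [lon_min, lon_max, lat_min, lat_max]
plt.figure()
ax = make_map(ccrs.Mercator())
ax.set_extent(extent)
ax.coastlines('10m')  # 10 m resolution coastline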

Figure 1. Map of the Strait of Georgia, with the red star indicating the transects' location on and around the Fraser Ridge Glass Sponge Reef.

## Methods

Transects 1, 2, 3, and 6 were above a glass sponge reef, transects 4, 5, and 8 were above sediment surrounding the reef (non-reef), and transects 7 and 9 were cross-reef transects partially going over a glass sponge reef. 

## Results

### Import Packages

In [None]:
!pip install maptools

In [1]:
import pandas as pd
import numpy as np
from make_pca import make_pca
from make_plots import make_plots
from create_datasets import create_datasets
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf
from matplotlib import pyplot as plt
import xarray as xr
import cartopy.crs as ccrs
from maptools import make_map

ImportError: cannot import name 'make_map' from 'maptools' (/opt/miniconda3/lib/python3.8/site-packages/maptools/__init__.py)

### Read in data

In [None]:
df = pd.read_csv("alltransects.csv")

# Label each transect by its position relative to the reef (see Methods)
conditions = [
    df['transect'].isin([1, 2, 3, 6]),  # on-reef
    df['transect'].isin([4, 5, 8]),     # off-reef (sediment)
    df['transect'].isin([7, 9]),        # cross-reef
]
values = ['on', 'off', 'cross']
# A string default avoids mixing strings with np.select's integer default of 0
df['reef'] = np.select(conditions, values, default='unknown')
df.columns

### Make Figures for Each Transect (how to facet by transect and on/off reef?)

In [None]:
for i in np.unique(df['transect']):
    transect_sub, oxy_interp_linear, oxy_interp_cubic, min_oxy, max_oxy = create_datasets(df, i, 'oxygen')
    make_plots(transect_sub, oxy_interp_linear, oxy_interp_cubic, i, 'oxygen', min_oxy, max_oxy)

### PCA

In [None]:
from make_pca import make_pca
make_pca(df, ['temp', 'salinity', 'oxygen'])

### Linear Models (Ordinary Least Squares)

In [None]:
model1 = smf.ols('oxygen ~ 1 + depth + reef', data = df).fit()
model1.summary()

In [None]:
model2 = smf.ols('oxygen ~ 1 + reef + depth + temp + salinity', data = df).fit()
model2.summary()
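To make the role of the categorical `reef` term concrete, here is a small NumPy-only sketch of what a formula such as `oxygen ~ depth + reef` fits. It assumes treatment (dummy) coding with the alphabetically first level (`cross`) as the reference, which is the default in the formula interface. All numbers are synthetic and noise-free, chosen only so the recovered coefficients can be checked; they are not results from the transect data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Synthetic predictors (hypothetical values, not project data)
depth = rng.uniform(150.0, 180.0, n)
reef = rng.choice(["cross", "off", "on"], n)

# Hypothetical "true" effects used to generate noise-free oxygen values
true_coef = np.array([5.0, -0.01, 0.3, -0.2])  # intercept, depth, off, on
oxygen = (
    true_coef[0]
    + true_coef[1] * depth
    + true_coef[2] * (reef == "off")
    + true_coef[3] * (reef == "on")
)

# Design matrix: intercept, depth, and one indicator column per non-reference
# reef level. "cross" is the reference level absorbed into the intercept.
X = np.column_stack([np.ones(n), depth, reef == "off", reef == "on"]).astype(float)

# Ordinary least squares fit
coef, *_ = np.linalg.lstsq(X, oxygen, rcond=None)
print(dict(zip(["Intercept", "depth", "reef[off]", "reef[on]"], coef.round(4))))
```

Under this coding, the `on` coefficient is the difference in oxygen between on-reef and cross-reef water at the same depth, not the difference between on-reef and off-reef water. Comparing on-reef with off-reef directly requires either changing the reference level or taking the difference of the two coefficients.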

## Conclusions

Preliminary finding: oxygen concentration is lower in the water over the glass sponge reef than in the water over the surrounding sediment.

## Future Work and Limitations

The cubic spline and linear interpolation techniques have clear limitations that distort the interpolated fields. Cubic spline interpolation tends to overshoot where there are no nearby observations. This is visible in the cubic spline figures, where regions far from any sampling location show oxygen concentrations higher than any measured value. Linear interpolation fills the space between data points more plausibly (no overshoot), but it is still not the most accurate method: it assumes that oxygen changes at a constant rate between neighboring observations, which is unlikely to hold across the large gaps between sampling locations.
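The overshoot behavior can be reproduced in a one-dimensional analogue. The sketch below assumes SciPy's `CubicSpline` with natural boundary conditions and NumPy's `interp`. The step-like profile is synthetic and only stands in for a sharp change in oxygen between sampling locations; the project's own interpolation is done inside `create_datasets` and may differ.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Synthetic step-like profile (not project data): observed values stay in [0, 1]
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Evaluate both interpolants on a fine grid between the observations
x_fine = np.linspace(0.0, 5.0, 101)
linear = np.interp(x_fine, x, y)
cubic = CubicSpline(x, y, bc_type="natural")(x_fine)

# Linear interpolation stays within the observed range; the spline does not
print(f"observed range: {y.min():.3f} to {y.max():.3f}")
print(f"linear range:   {linear.min():.3f} to {linear.max():.3f}")
print(f"cubic range:    {cubic.min():.3f} to {cubic.max():.3f}")
```

The spline passes through every observation, but because it must keep its first and second derivatives continuous it swings past the data on either side of the jump. This is the same effect that produces oxygen values above the observed maximum in the cubic figures.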

Two alternative approaches could remedy the visualization problem. The first concerns data collection. As the figures show, data were not collected at uniform depths across transects, so interpolation must span large vertical and horizontal distances, making the visualizations messy and potentially inaccurate. In the future, I recommend that data be collected at a uniform depth across all transects. This would also remove depth as a potential predictor of oxygen content when comparing transects. In transect 8, the data were collected at relatively regular depth intervals (all less than 3.5 m above the seafloor), which reduced the amount of overshoot in the cubic spline interpolation.

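Another solution would be to use kriging. This technique is more difficult to implement in Python, but it would likely be more accurate than the cubic spline and linear interpolations. Kriging predicts the value at an unsampled location as a weighted average of nearby observations, with weights derived from a variogram model that describes how the similarity between measurements decays with distance. It also returns an estimate of the prediction variance, so poorly constrained regions far from the data can be identified rather than filled in with unwarranted confidence.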
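As an illustration of the idea, here is a minimal ordinary kriging sketch written in plain NumPy. It assumes an exponential variogram with no nugget, with the sill and range set by hand. In practice these parameters would be fitted to the empirical variogram of the oxygen data, and a dedicated package would normally be used. The function names, coordinates, and values are hypothetical.

```python
import numpy as np

def exponential_variogram(h, sill=1.0, corr_range=2.0):
    """Semivariance at separation h (exponential model, no nugget; assumed parameters)."""
    return sill * (1.0 - np.exp(-h / corr_range))

def ordinary_kriging(xy_obs, z_obs, xy_new, variogram=exponential_variogram):
    """Predict z at xy_new from observations; returns estimates, variances, weights."""
    n = len(z_obs)
    # Pairwise distances: observation-observation and observation-target
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    d_new = np.linalg.norm(xy_obs[:, None, :] - xy_new[None, :, :], axis=-1)

    # Kriging system: semivariances bordered by ones, so that the weights
    # are constrained to sum to 1 through a Lagrange multiplier
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d_obs)
    A[n, n] = 0.0

    b = np.ones((n + 1, xy_new.shape[0]))
    b[:n, :] = variogram(d_new)

    solution = np.linalg.solve(A, b)          # rows 0..n-1: weights, row n: multiplier
    weights = solution[:n, :]
    z_hat = weights.T @ z_obs                 # weighted average of the observations
    variance = np.sum(solution * b, axis=0)   # sum(w_i * gamma_i0) + multiplier
    return z_hat, variance, weights

# Hypothetical observations (distance along transect, height above bottom) and values
xy_obs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
z_obs = np.array([3.0, 3.2, 2.9, 3.6])
# Targets: one at an observed location, one at an unsampled location
xy_new = np.array([[0.0, 0.0], [1.0, 1.0]])

z_hat, variance, weights = ordinary_kriging(xy_obs, z_obs, xy_new)
print("estimates:", z_hat)
print("variances:", variance)
```

Without a nugget, the prediction at an observed location reproduces the observation with zero variance. The variance grows with distance from the data, which is the information missing from the cubic spline and linear figures.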

## References

Need to reference: matplotlib, scipy, numpy, pandas, physoce + papers

In [None]:
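# Assumes `kriging` is a fitted kriging object (e.g. PyKrige's OrdinaryKriging) and gridx/gridy are 1-D coordinate arrays defined earlier
z, ss = kriging.execute("grid", gridx, gridy)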

In [None]:
kt.write_asc_grid(gridx, gridy, z, filename="output.asc")
plt.imshow(z)
plt.show()

In [None]:
help(ok)

In [None]:
df.head()