# Oceanhackweek 2019 At A Glance

We want you to be able to:
- Describe the flow of the OHW19 tutorials
- Identify specific topics you would like to explore in more depth through projects

## Curriculum Overview

The OHW19 curriculum follows the general process of answering scientific questions based on observational and/or model ocean data.

All tutorials will be recorded. The recordings will be posted on the [OHW19 Schedule page](https://oceanhackweek.github.io/curriculum.html) as soon as possible after each session.

The tutorial repositories are hosted under the [Oceanhackweek GitHub organization](https://github.com/oceanhackweek).

![](./img/ohw19_curriculum_overview.png)

## Lightning Talk - Demo a Tool!

Interested in sharing a tool that you are using or developing?

[Sign up on this Google Sheet](https://docs.google.com/spreadsheets/d/19iJBO9S-05RwiOyD4CsIjiPU0yx5Cl9WnfHICSlkOps/edit?usp=sharing) to give a 5-10 minute presentation on Wednesday, 1:30-2:30pm.

## Git

**Make sure to have `git` installed on your local machine by 4pm today!**

The `git` tutorial will immediately follow the project discussion, so you'll practice `git` in the context of your project(s) and with your teammates.

We will go through:
- Basic `git` operations (`clone`, `diff`, `add`, `commit`, `push`) and interacting with your GitHub repository.
- More advanced topics, such as forking a repository on GitHub and configuring `remote`s, to make project collaboration easier.

Come ask any questions at the helpdesks!

## Data Access & Visualization

**Programmatic query and access of multiple ocean data systems**
- Systems: [IOOS](https://ioos.noaa.gov/), [OOI](https://oceanobservatories.org/), [PO.DAAC](https://podaac.jpl.nasa.gov/), etc.
- Approach: [ERDDAP](http://www.ifremer.fr/erddap/index.html), [THREDDS data server](https://www.unidata.ucar.edu/software/tds/current/reference/Services.html), system-specific APIs

The example below uses materials created by Filipe Fernandes ([@ocefpaf](https://github.com/ocefpaf)) that can be accessed [here](https://github.com/oceanhackweek/ohw19-tutorial-data-access-viz/blob/master/Extras-ERDDAP_Argo.ipynb).

```python
from erddapy import ERDDAP

server = "http://www.ifremer.fr/erddap"
e = ERDDAP(server=server)
```

### Query and access data

```python
e.dataset_id = "ArgoFloats"
e.protocol = "tabledap"  # tabular (point/profile) data service
e.variables = ["latitude", "longitude", "date_creation"]
# Restrict the query to a time window and a lat/lon box in the northeast Pacific
e.constraints = {
    "time>=": "2018-08-05T00:00:00Z",
    "time<=": "2019-06-12T00:00:00Z",
    "longitude>=": -133.75,
    "longitude<=": -123.29,
    "latitude>=": 41.78,
    "latitude<=": 52.24,
}
```
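Under the hood, ERDDAP's tabledap service is a RESTful interface: a request is a URL built from the dataset ID, a file-type extension, a comma-separated variable list, and `&`-separated constraints. erddapy assembles this URL for you (you can inspect it with `e.get_download_url()`). The helper below is a hypothetical, hand-rolled sketch of that assembly, assuming ERDDAP's documented `tabledap/<datasetID>.<fileType>?<variables>&<constraints>` pattern; erddapy's own encoding may differ in details such as how it formats times.

```python
from urllib.parse import quote


def build_tabledap_url(server, dataset_id, variables, constraints, response="csv"):
    """Hypothetical helper: assemble an ERDDAP tabledap request URL by hand."""
    base = f"{server.rstrip('/')}/tabledap/{dataset_id}.{response}"
    # The query starts with the comma-separated list of requested variables...
    query = ",".join(variables)
    # ...followed by one "&<variable><operator><value>" term per constraint.
    for key, value in constraints.items():
        query += f"&{key}{value}"
    # Percent-encode characters such as "<" and ">" while keeping the
    # separators ERDDAP expects to see literally.
    return base + "?" + quote(query, safe=",&=:")


url = build_tabledap_url(
    "http://www.ifremer.fr/erddap",
    "ArgoFloats",
    ["latitude", "longitude"],
    {"time>=": "2018-08-05T00:00:00Z", "latitude<=": 52.24},
)
print(url)
```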

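```python
# Download the query result as a pandas DataFrame, parsing the creation date as datetimes
df = e.to_pandas(parse_dates=["date_creation (UTC)"])
# Extract the year so it can be used to group and color the floats
df["year"] = df["date_creation (UTC)"].dt.year
```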
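To see what the `.dt.year` step produces without querying a server, here is a small offline sketch on a toy DataFrame. The timestamps are made up for illustration and are not real Argo records; only the column name mirrors the one used above.

```python
import pandas as pd

# Made-up timestamps standing in for the ERDDAP response (not real Argo data)
df = pd.DataFrame(
    {"date_creation (UTC)": ["2018-09-01T12:00:00Z", "2019-01-15T00:00:00Z", "2019-03-02T06:30:00Z"]}
)
# Parse the ISO 8601 strings into datetimes
df["date_creation (UTC)"] = pd.to_datetime(df["date_creation (UTC)"])
# Pull out the calendar year, then count records per year
df["year"] = df["date_creation (UTC)"].dt.year
counts = df["year"].value_counts().sort_index()
print(counts)
```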

### Plot and inspect data interactively

```python
import geoviews as gv

gv.extension("bokeh")
```

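```python
argo = gv.Dataset(df, kdims=["year"])
# Build a Points element with lon/lat as coordinates and year as a value dimension
points = argo.to(
    gv.Points,
    ["longitude (degrees_east)", "latitude (degrees_north)"],
    ["year"],
)
# Overlay the points on a web map tile layer, colored by year, with hover tooltips
tiles = gv.tile_sources.Wikipedia
tiles * points.opts(
    tools=["hover"], width=500, height=500,
    size=5, cmap="tab10", color="year",
)
```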

## Manipulate and Compute "Big" Data

In our context, "big" means "larger than memory."
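The key idea behind working with larger-than-memory data is never to hold the whole data set at once: read it in chunks, reduce each chunk, and keep only a small running summary. Dask automates this for arrays; the plain-Python sketch below (with a hypothetical `iter_chunks` standing in for reads from disk or cloud storage) shows the pattern for a mean.

```python
def iter_chunks(values, chunk_size):
    """Yield successive fixed-size chunks (a stand-in for reading blocks from storage)."""
    chunk = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


def streaming_mean(chunks):
    """Mean computed from chunks; only a running sum and count stay in memory."""
    total = 0.0
    count = 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    return total / count


# One million values, but never more than 10,000 of them in a list at a time
print(streaming_mean(iter_chunks(range(1_000_000), 10_000)))
```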

The example below uses materials created by Ryan Abernathey ([@rabernat](https://github.com/rabernat)) that can be accessed [here](https://github.com/pangeo-data/pangeo-ocean-examples/blob/master/gfdl-cm2_6.ipynb).

```python
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
import holoviews as hv
import datashader
from holoviews.operation.datashader import regrid, shade, datashade

hv.extension('bokeh', width=100)
```

### Load data from Cloud Data Storage

```python
import intake

# Open the Pangeo intake catalog and lazily load the GFDL CM2.6 3-D ocean output
cat = intake.open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean/GFDL_CM2.6.yaml")
ds = cat["GFDL_CM2_6_control_ocean_3D"].to_dask()
ds
```

Note the multi-dimensional nature of this data set.

Also note the size of the data set:

```python
print(f"The size of the data set is {ds.nbytes / 1e12:4.2f} TB!")
```
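The size reported above comes from metadata alone: for a dense array, the byte count is the number of elements (the product of the dimension lengths) times the bytes per element of the dtype, so xarray can report it without reading any data. The shape below is made up for illustration and is not the CM2.6 grid.

```python
import math


def nbytes(shape, itemsize):
    """Bytes needed for a dense array: number of elements times bytes per element."""
    return math.prod(shape) * itemsize


# Hypothetical (time, depth, lat, lon) variable stored as 4-byte float32
shape = (240, 50, 2700, 3600)
size_tb = nbytes(shape, 4) / 1e12
print(f"{size_tb:.3f} TB")
```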

### Perform distributed computing using xarray/dask

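```python
from dask.distributed import Client
from dask_kubernetes import KubeCluster

# Launch Dask workers as Kubernetes pods (requires a Kubernetes-backed
# environment such as a Pangeo JupyterHub) and let the pool scale from 1 to 20
cluster = KubeCluster()
cluster.adapt(minimum=1, maximum=20)
# Connect this notebook session to the cluster
client = Client(cluster)
```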
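Once a cluster is running, Dask hands chunks to workers, has each compute a partial result, and combines the partials. The standard-library sketch below imitates that map-then-combine step, with threads standing in for cluster workers (an illustration of the pattern, not how Dask schedules work). Note that each worker returns a `(sum, count)` pair rather than a mean: averaging chunk means would be wrong whenever chunks differ in size.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Per-chunk work a worker could do: return (sum, count), not the chunk's mean."""
    return sum(chunk), len(chunk)


def parallel_mean(chunks, max_workers=4):
    """Map partial_stats over chunks in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(parallel_mean([[1, 2, 3], [4], [5, 6]]))
```

With chunks `[1, 2, 3]`, `[4]`, and `[5, 6]` the correct mean is 3.5, while averaging the three chunk means (2, 4, and 5.5) would give about 3.83.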

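```python
# Average the first 10 time steps over time and longitude; with dask-backed
# data this only builds a lazy task graph, nothing is computed yet
temp_zonal_mean = ds.temp.isel(time=slice(0, 10)).mean(dim=('time', 'xt_ocean'))
temp_zonal_mean
```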
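A zonal mean averages over longitude (here `xt_ocean`), leaving a latitude-depth section; the cell above also averages over time. The NumPy sketch below mimics this on a tiny made-up `(time, lat, lon)` array, assuming land cells are stored as NaN, as is common in ocean model output. xarray's `.mean()` skips NaNs by default for floating-point data, which corresponds to `np.nanmean` rather than `np.mean`.

```python
import numpy as np

# Toy temperatures with dims (time, lat, lon); NaN marks a land cell
temp = np.array([
    [[10.0, 12.0, np.nan], [20.0, 22.0, 24.0]],
    [[14.0, 16.0, np.nan], [22.0, 24.0, 26.0]],
])

# Average over time (axis 0) and longitude (axis 2), skipping NaNs
zonal_mean = np.nanmean(temp, axis=(0, 2))
print(zonal_mean)  # one value per latitude

# Without NaN-skipping, a single land cell poisons its whole latitude row
naive = np.mean(temp, axis=(0, 2))
print(naive)
```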

```python
# Trigger the actual computation on the cluster and time it (IPython magic)
%time temp_zonal_mean.load()
```

```python
fig, ax = plt.subplots(figsize=(16, 8))
# yincrease=False puts the surface (shallowest depths) at the top of the plot
temp_zonal_mean.plot.contourf(ax=ax, yincrease=False, levels=np.arange(-2, 30))
ax.set_title('Naive Zonal Mean Temperature')
```

## Machine Learning

**Motivating questions**
- What are the problems you want to solve using machine learning?
- Which method(s) should you use?
- How do you know whether your methods have helped you achieve your goals?
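One common way to answer the last question is to hold out part of the labeled data, train only on the rest, and compare the model's score on the held-out part against a trivial baseline. The helpers below (`split_holdout`, `accuracy`, and `majority_baseline`, all hypothetical names, not a library API) sketch that workflow in plain Python; libraries such as scikit-learn provide more complete versions. The labels are made up for illustration.

```python
import random
from collections import Counter


def split_holdout(items, test_fraction=0.25, seed=0):
    """Shuffle reproducibly and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(train_labels):
    """Most frequent training label; a useful model should beat always predicting it."""
    return Counter(train_labels).most_common(1)[0][0]


# Made-up labels for illustration
labels = ["eddy", "no_eddy", "no_eddy", "eddy", "no_eddy", "no_eddy", "no_eddy", "eddy"]
train, test = split_holdout(labels)
baseline = majority_baseline(train)
print("baseline accuracy on held-out data:", accuracy(test, [baseline] * len(test)))
```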

**Categories of machine learning methods**
* Supervised Learning
* Unsupervised Learning
* Semi-supervised Learning
* Reinforcement Learning
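To make the first two categories concrete: supervised methods learn from examples that come with labels, while unsupervised methods look for structure in unlabeled data. The plain-Python sketch below contrasts a nearest-centroid classifier (supervised) with a 1-D k-means clustering (unsupervised) on the same made-up temperature values; both are deliberately minimal illustrations, not production implementations.

```python
from statistics import mean


def fit_nearest_centroid(values, labels):
    """Supervised: labels are given, so learn one centroid (mean) per class."""
    return {c: mean(v for v, lab in zip(values, labels) if lab == c) for c in set(labels)}


def predict(centroids, value):
    """Assign the class whose centroid is closest to the value."""
    return min(centroids, key=lambda c: abs(value - centroids[c]))


def kmeans_1d(values, k, n_iter=10):
    """Unsupervised: no labels, so discover k groups from the values alone."""
    # Spread the initial centers across the sorted values
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty)
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return sorted(centers)


# Made-up surface temperatures (deg C) for illustration
temps = [2.0, 3.0, 2.5, 15.0, 16.0, 14.5]
labels = ["cold", "cold", "cold", "warm", "warm", "warm"]

centroids = fit_nearest_centroid(temps, labels)
print(predict(centroids, 4.0))  # classify a new value using the labeled examples
print(kmeans_1d(temps, k=2))    # find two clusters without using the labels
```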

**A road map for solving problems**
![](./img/MLmap.png)

## Reproducible Research

- Reproducible vs replicable research?
- What should I do when I want to share my work?
    - Where to share data?
    - How to package software?
    - Which license to use?
- What should I do to make my code easy to read/use?
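On the last question, two habits that often help are a docstring explaining what a function does and what its parameters mean, and an explicit random seed so that anyone re-running the analysis gets identical numbers. The hypothetical `bootstrap_means` function below illustrates both; it is an example of the practice, not a tool from the tutorial.

```python
import random
from statistics import mean
from typing import List, Sequence


def bootstrap_means(values: Sequence[float], n_resamples: int = 1000, seed: int = 0) -> List[float]:
    """Return the means of ``n_resamples`` bootstrap resamples of ``values``.

    Each resample draws ``len(values)`` items with replacement. Passing an
    explicit ``seed`` makes the output identical from run to run.

    >>> len(bootstrap_means([1.0, 2.0, 3.0], n_resamples=5))
    5
    """
    # A private generator, so other code calling random.seed() cannot change the result
    rng = random.Random(seed)
    return [mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)]


print(bootstrap_means([1.0, 2.0, 3.0, 4.0], n_resamples=3, seed=1))
```

Because the docstring contains an example in doctest format, running `python -m doctest -v yourfile.py` (where `yourfile.py` is a placeholder for your module) checks that the documented example still holds.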

## Where to go from here?

### Ask questions and help your neighbor!

### Helpdesks

- Machine learning (Tue)
- Ocean Data Systems Consultation (Tue-Wed)
- Git, Python, and Pangeo (Tue-Thu)

### Slack channels

_Add yourself to the channels_ you are interested in!

**#projects** - Pitch OHW19 project ideas and get feedback on them. Please pin your project ideas so others can scroll through them quickly.

**#data_science** - Ask for help with any data science questions (git, Python, methods, tools, etc.).

**#ocean_data_systems** - Ask questions about existing ocean data sources and access.

### Check out the [OHW19 wiki](https://oceanhackweek.github.io/wiki)