## Part 1: Set filepaths

### Check if we're running in Google Colab
If you're running in Google Colab, you may need to run the cell below twice. The `condacolab.install()` call restarts the kernel, which Colab reports as a crash. This is expected; once the kernel is back, run the cell again.

In [None]:
## check if we're in Colab
try:
    import google.colab

    ## flag telling us the notebook is running in Colab
    IN_COLAB = True

except ImportError:
    IN_COLAB = False

if IN_COLAB:

    ## install conda into Colab (condacolab restarts the kernel here)
    !pip install -q condacolab
    import condacolab

    condacolab.install()

    ## install extra packages (cftime) into the Colab environment
    !mamba install -y -c conda-forge cftime

    ## connect to Google Drive
    from google.colab import drive

    drive.mount("/content/drive")

### <mark>To-do</mark>: update filepaths
__To run this notebook, you'll need to update the filepaths below__, which specify the location of the data (otherwise, you'll get a ```FileNotFoundError``` message when you try to open the data). These filepaths will differ for Mac vs. Windows users and depend on how you've accessed the data (e.g., mounting the WHOI file server or downloading the data).

In [None]:
if IN_COLAB:

    ## These are the paths to update if you're using Google Colab.
    ## 'hist_path' is the filepath for the historical data;
    ## 'pico_path' is the filepath for the pre-industrial control data.
    hist_path = "/content/drive/My Drive/climate-data"
    pico_path = "/content/drive/My Drive/climate-data/tas_Amon_CESM2_piControl"

else:

    ## These are the paths to update if you're not using Google Colab.
    hist_path = (
        "/Volumes/cmip6/data/cmip6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Amon/tas/gn/1"
    )

    pico_path = (
        "/Volumes/cmip6/data/cmip6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/Amon/tas/gn/1"
    )
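A wrong path only shows up as a `FileNotFoundError` once you try to open the data, so it can help to check the paths right after setting them. Below is a minimal sketch using the standard-library `pathlib` module; `check_data_path` is a hypothetical helper (not part of the original notebook), and it assumes `hist_path` and `pico_path` are directories.

```python
from pathlib import Path


def check_data_path(path, filename=None):
    """Hypothetical helper: return the full path if it exists, else raise a clear error.

    Args:
        path: directory containing the data (e.g. hist_path or pico_path)
        filename: optional name of a file inside that directory
    """
    full_path = Path(path) if filename is None else Path(path) / filename
    if not full_path.exists():
        raise FileNotFoundError(
            f"Could not find {full_path}; check the filepaths set above."
        )
    return full_path
```

For example, `check_data_path(hist_path, "tas_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc")` either returns the full path or fails immediately with a message pointing back at the filepath cell.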

### Import packages

In [None]:
import xarray as xr
import numpy as np
import os
import time
import tqdm
import glob
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

## set default plot style
sns.set(rc={"axes.facecolor": "white", "axes.grid": False})

## initialize random number generator
rng = np.random.default_rng()

## Part 2: Open CESM data and compute the WH index

### <mark>To-do:</mark> write a pre-processing function.
This function should trim the data in lon/lat space. We'll use it to reduce the amount of data we need to load into memory.

In [None]:
def trim(data):
    """
    Trim data in lon/lat space to a region around Woods Hole.
    Woods Hole has (lon, lat) coords of approximately (288.5, 41.5).

    Args:
        data: xr.Dataset or xr.DataArray with 'lon' and 'lat' coordinates

    Returns:
        data_trimmed: same type as 'data', restricted to the region
    """

    ## to-do: Trim the data in lon/lat space
    data_trimmed = ...

    return data_trimmed
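One way to trim is label-based slicing with `.sel`, which works on both an `xr.Dataset` and an `xr.DataArray`. The sketch below assumes `lon` runs from 0 to 360 and both coordinates are sorted in increasing order (slicing by label relies on that); the 5-degree half-width and the name `trim_sketch` are hypothetical choices, and the grid is synthetic.

```python
import numpy as np
import xarray as xr

## Woods Hole location on a 0-360 longitude grid (from the docstring above)
WH_LON, WH_LAT = 288.5, 41.5

## half-width of the lon/lat window, in degrees (hypothetical choice)
HALF_WIDTH = 5.0


def trim_sketch(data, half_width=HALF_WIDTH):
    """Select a lon/lat box centred on Woods Hole (assumes ascending coords)."""
    return data.sel(
        lon=slice(WH_LON - half_width, WH_LON + half_width),
        lat=slice(WH_LAT - half_width, WH_LAT + half_width),
    )


## synthetic global grid standing in for the model output (12 months)
lon = np.arange(0, 360, 1.25)
lat = np.linspace(-90, 90, 192)
time = np.arange(12)
tas = xr.DataArray(
    np.random.default_rng(0).normal(280, 5, size=(time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="tas",
)
ds = tas.to_dataset()

## trimming keeps every time step but only the grid cells inside the box
ds_trimmed = trim_sketch(ds)
```

If a model stored longitude as -180 to 180 instead, the Woods Hole longitude would be about -71.5, so it's worth checking `data.lon` before choosing the slice bounds.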

### <mark>To-do</mark>: open *historical* data.
We'll open the data from the historical simulation, trim it using the ```trim``` function from above, and load it into memory. For reference, I've included the filename below.

In [None]:
## filename for the historical simulation data
hist_filename = "tas_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc"

## To-do: open and trim data
T2m_hist = ...

### <mark>To-do:</mark> open *PI control* data
We'll do this using ```xr.open_mfdataset```. To speed up the data-loading process, we'll pass the pre-processing function ```trim``` as the ```preprocess``` argument of ```xr.open_mfdataset```, which applies it to each file before the files are combined.

In [None]:
## To-do: open and trim pre-industrial control data
T2m_pico = ...
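`xr.open_mfdataset` calls the `preprocess` function on each file's dataset and then combines the results (by default with `combine="by_coords"`), so only the trimmed region is carried forward. The sketch below mimics that in memory so it runs without the CMIP6 files: it trims three synthetic "files" and joins them with `xr.combine_by_coords`. The commented-out call shows what the real call might look like; the `*.nc` glob pattern is an assumption about how the PI-control files are named, and `trim_sketch` and `make_fake_file` are hypothetical.

```python
import numpy as np
import xarray as xr


def trim_sketch(data):
    """Hypothetical trim: a 10-degree lon/lat box around Woods Hole (288.5E, 41.5N)."""
    return data.sel(lon=slice(283.5, 293.5), lat=slice(36.5, 46.5))


def make_fake_file(start_month, nmonths=12):
    """Synthetic stand-in for one PI-control file (monthly 'tas' on a coarse grid)."""
    lon = np.arange(0, 360, 2.5)
    lat = np.arange(-90, 90.1, 2.5)
    time = np.arange(start_month, start_month + nmonths)
    tas = np.random.default_rng(start_month).normal(280, 5, (nmonths, lat.size, lon.size))
    return xr.Dataset(
        {"tas": (("time", "lat", "lon"), tas)},
        coords={"time": time, "lat": lat, "lon": lon},
    )


## Conceptually, open_mfdataset preprocesses each file, then combines them.
## The real call might look like (file pattern is an assumption):
##   T2m_pico = xr.open_mfdataset(os.path.join(pico_path, "*.nc"), preprocess=trim)
pieces = [trim_sketch(make_fake_file(m)) for m in (0, 12, 24)]
combined = xr.combine_by_coords(pieces)
```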

### <mark>To-do:</mark> write a function to compute the climate index

First, we'll write a function to compute the index. Then, we'll apply it to both datasets.

In [None]:
def WH_index(T2m):
    """Compute the 'Woods Hole climate index'. We'll define
    this index as the annual-average temperature in the gridcell
    closest to the (lon, lat) point (288.5, 41.5).

    Args:
        T2m: xr.DataArray with dimensions (lon, lat, time)

    Returns:
        T2m_WH: xr.DataArray with dimension (year)
    """

    ## To-do
    T2m_WH = ...

    return T2m_WH


## Next, apply it to the datasets
T2m_WH_hist = WH_index(T2m_hist)
T2m_WH_pico = WH_index(T2m_pico)
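A minimal sketch of the two steps the docstring describes: `.sel(..., method="nearest")` picks the single closest grid cell, and `groupby("time.year").mean()` averages the months within each year, producing a `year` dimension. It uses synthetic monthly data with a pandas time index; `WH_index_sketch` is a hypothetical name. It gives every month equal weight, ignoring that months differ in length, which is a simplification.

```python
import numpy as np
import pandas as pd
import xarray as xr


def WH_index_sketch(T2m, lon=288.5, lat=41.5):
    """Annual-mean temperature in the grid cell nearest (lon, lat).

    Simplification: the 12 monthly values get equal weight.
    """
    ## pick the single grid cell closest to Woods Hole
    T2m_point = T2m.sel(lon=lon, lat=lat, method="nearest")

    ## average all months within each calendar year -> new 'year' dimension
    return T2m_point.groupby("time.year").mean()


## synthetic monthly data: 3 years on a coarse 2.5-degree grid
time = pd.date_range("2000-01-01", periods=36, freq="MS")
lon = np.arange(0, 360, 2.5)
lat = np.arange(-90, 90.1, 2.5)
T2m = xr.DataArray(
    np.random.default_rng(0).normal(280, 5, (time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
)

T2m_WH = WH_index_sketch(T2m)
```

The `time.year` grouping should also work when the time axis is decoded to `cftime` dates, as is common for model output on a no-leap calendar.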

## Part 3: Draw random samples from PI-control

### <mark>To-do:</mark> write function to draw *one* sample
We're going to estimate the probability distribution for the PI-control run by drawing many random samples (with replacement). Let's start by __writing a function that draws a single random sample__ of length ```nyears``` and computes its mean.

In [None]:
def get_sample_mean(data, nyears):
    """
    Draw a random sample of length 'nyears' from the given dataset
    and average over it.

    Args:
        'data': xr.DataArray to draw samples from
        'nyears': integer specifying how many years in each sample

    Returns:
        'sample_mean': xr.DataArray containing mean of single sample
    """

    ## To-do: get a random sample
    sample_mean = ...

    return sample_mean
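One possible reading, offered as a sketch: a sample is a contiguous run of `nyears` years starting at a randomly chosen year, which keeps the year-to-year persistence of the control run. Drawing `nyears` individual years with `rng.choice(..., replace=True)` would be another reading of "with replacement"; the exercise doesn't pin down which is intended. The sketch works on a plain NumPy array; for an `xr.DataArray` with a `year` dimension, the equivalent slice would be `data.isel(year=slice(start, start + nyears))`. `get_sample_mean_sketch` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng()


def get_sample_mean_sketch(data, nyears, rng=rng):
    """Mean of one randomly placed block of `nyears` consecutive values.

    Each call picks its start independently, so different samples may
    overlap (blocks are drawn with replacement).
    """
    values = np.asarray(data)
    ## start index in [0, len - nyears], so the block always fits
    start = rng.integers(0, values.size - nyears + 1)
    return values[start : start + nyears].mean()
```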

### <mark>To-do:</mark> write function to draw *multiple* samples

In [None]:
def get_sample_means(data, nsamples, nyears=30):
    """
    Draw multiple random samples by repeatedly calling
    the 'get_sample_mean' function.

    Args:
        'data': xr.DataArray to draw samples from
        'nsamples': number of samples to draw
        'nyears': number of years in each sample

    Returns:
        'sample_means': xr.DataArray containing the mean of each sample

    """

    ## To-do
    sample_means = ...

    return sample_means
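The "repeatedly call the single-sample function" structure can be sketched as a list comprehension whose results are wrapped in an `xr.DataArray` with a `sample` dimension. This assumes the same contiguous-block sampling as in the single-sample sketch (an assumption, not the notebook's stated method); `_one_sample_mean` and `get_sample_means_sketch` are hypothetical names, and passing a seeded `rng` makes the draws reproducible.

```python
import numpy as np
import xarray as xr


def _one_sample_mean(values, nyears, rng):
    """Mean of one randomly placed block of `nyears` consecutive values."""
    start = rng.integers(0, values.size - nyears + 1)
    return values[start : start + nyears].mean()


def get_sample_means_sketch(data, nsamples, nyears=30, rng=None):
    """Draw `nsamples` block means and return them along a 'sample' dimension."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(data)
    means = [_one_sample_mean(values, nyears, rng) for _ in range(nsamples)]
    return xr.DataArray(np.array(means), dims="sample", name="sample_means")
```

With 3,000 samples the Python loop is fast enough; drawing all start indices at once with `rng.integers(..., size=nsamples)` would be a vectorized alternative.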

Let's apply this function to get 3,000 random samples.

In [None]:
## get random samples
sample_means = get_sample_means(data=T2m_WH_pico, nsamples=3000)

## Part 4: Make a histogram

### <mark>To-do:</mark> Compute histogram of sample means

In [None]:
## To-do: compute histogram counts and bin edges
histogram_pico, bin_edges = ...
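`np.histogram` returns both the counts and the bin edges, which is the pair that `ax.stairs(values=..., edges=...)` in the plotting cell expects (there is one more edge than there are counts). A sketch on synthetic sample means; the 30 bins are an arbitrary choice.

```python
import numpy as np

## synthetic stand-in for the 3,000 sample means (made-up values, in K)
sample_means = np.random.default_rng(0).normal(loc=283.0, scale=0.2, size=3000)

## counts per bin, plus the bin edges spanning min..max of the data
histogram_pico, bin_edges = np.histogram(sample_means, bins=30)
```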

### <mark>To-do:</mark> compute the "mean of the histogram"
Think of this as the mean value of the probability distribution. An easy way to compute it is to take the mean of all the sample means. We'll also compute the mean value of the index in the historical simulation over the last 30 years.

In [None]:
## To-do: get mean of the histogram ("mean of sample means")
pico_mean = ...

## To-do: get mean of last 30 years of historical simulation
hist_last30_mean = ...
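Both quantities are plain averages. Since the historical run covers 1850-2014, the last 30 annual values are 1985-2014 inclusive; with an `xr.DataArray` indexed by `year` that slice would be `T2m_WH_hist.isel(year=slice(-30, None))`. A sketch on synthetic NumPy data (the trend and noise levels are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

## synthetic annual index for 1850-2014 with a small made-up warming trend
years = np.arange(1850, 2015)
hist_index = 283.0 + 0.005 * (years - 1850) + rng.normal(0, 0.3, years.size)

## last 30 years = 1985-2014 inclusive
hist_last30_mean = hist_index[-30:].mean()

## "mean of the histogram": average of all the sample means
sample_means = rng.normal(283.0, 0.2, size=3000)
pico_mean = sample_means.mean()
```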

### Plot the result below

In [None]:
## blank canvas for plotting
fig, ax = plt.subplots(figsize=(4, 3))

## plot the histogram
ax.stairs(values=histogram_pico, edges=bin_edges, color="k", label="PI-control")

## plot mean value in PI-Control
ax.axvline(pico_mean, c="k", ls="--")

## plot mean value over the last 30 years (1985-2014) of historical
ax.axvline(hist_last30_mean, c="r", ls="--", label="1985-2014")

## label the plot
ax.set_ylabel("# samples")
ax.set_xlabel("Temperature (K)")
ax.set_title(r"30-year average $T_{2m}$ in Woods Hole")
ax.legend()

plt.show()