## Extreme Event Detection Project - Accessing Sea Surface Temperature Resources

Testing Data:
 * NOAA OISST
 * NOAA ERSST
 * MUR SST

Additional Installs:
 * [ocetrac](https://github.com/ocetrac/ocetrac)?

In [1]:
# Import libraries
import xarray as xr
import numpy as np
import datetime

## ERSSTv5

ERSST is the data product with the longest temporal coverage, but also the coarsest spatial resolution. Data from before 1980 should be used with caution, but ERSST can still be helpful for looking at general patterns over large areas.

ERSST is compact enough to be distributed as a single NetCDF file, which makes loading and working with it quite manageable.

In [11]:
# THREDDS/OPeNDAP access point:
# ERSSTv5 is one file, so it is easy to access all at once
url = 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/noaa.ersst.v5/sst.mnmean.nc'
ds = xr.open_dataset(url)
# Keep 1960-2018; uncomment .load() to pull the values into memory
ds = ds.sel(time=slice('1960', '2018'))#.load()
ds

## OISSTv2

OISSTv2 has a finer spatial resolution of 0.25 x 0.25 degrees, with daily measurements from 1981 to the present. This resolution is nice, but it makes OISSTv2 too large to fit into a single file or a single direct access point, so there is some extra work to load it all using OPeNDAP.
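To make "too large" concrete, here is a rough back-of-the-envelope size estimate for one variable on the 0.25 degree grid. It is only a sketch: the 32-bit float storage and the 1982-01-01 to 2020-12-31 date range are assumptions chosen for illustration, and real files hold several variables plus metadata.

```python
import datetime

# Grid dimensions implied by a global 0.25 x 0.25 degree grid
n_lat = int(180 / 0.25)   # 720 latitude cells
n_lon = int(360 / 0.25)   # 1440 longitude cells
bytes_per_value = 4       # assumption: values held as 32-bit floats

# Assumed date range for illustration: 1982-01-01 through 2020-12-31, inclusive
n_days = (datetime.date(2020, 12, 31) - datetime.date(1982, 1, 1)).days + 1

bytes_per_day = n_lat * n_lon * bytes_per_value
mib_per_day = bytes_per_day / 2**20
gib_total = bytes_per_day * n_days / 2**30

print(n_days)                    # 14245
print(f"{mib_per_day:.2f} MiB")  # 3.96 MiB per daily field
print(f"{gib_total:.2f} GiB")    # 55.02 GiB for the full range
```

A single daily field is small, but the full record of one variable is far larger than typical laptop memory, which is why the data are opened lazily further on.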


### How do I find an OPeNDAP direct access URL?

 1. Find the dataset of interest's NCEI landing page:
 
 https://catalog.data.gov/dataset/noaa-optimum-interpolation-1-4-degree-daily-sea-surface-temperature-oisst-analysis-version-2/resource/a15ddf8c-95f5-48de-b367-3c7780ca6a66
 and
 https://www.ncei.noaa.gov/metadata/geoportal/rest/metadata/item/gov.noaa.ncdc:C00844/html
 
 2. Click on either the NCEI direct download link, or the THREDDS data catalog
 
 https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/
 or
 https://www.ncei.noaa.gov/thredds/blended-global/oisst-catalog.html
 
 3. Navigate the folder-style directories to identify the URL structure for the NetCDF files you want
 
 https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/
 
 4. If you need multiple files, build a list of URLs that matches the file pattern and that you can loop through
 
 

### Loading a Single Date of OISST Data

In [16]:
# OISST OPeNDAP Direct Access - One Day

# Example of one OISST date (1981-09-01)
url = "https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/198109/oisst-avhrr-v02r01.19810901.nc"
ds = xr.open_dataset(url)
ds

### Loading OISST for a Range of Dates

Access points for collections like OISST follow a common pattern within the directories where they are stored/accessible. This common pattern makes it possible to generate a list of access points programmatically.

In [22]:
# Using a start and end date we can generate a daily file structure
d1 = datetime.date(1982, 1, 1)
d2 = datetime.date(2020, 12, 31)
days = [d1 + datetime.timedelta(days=x) for x in range((d2-d1).days + 1)]

# Print a couple
for day in days[0:4]:
    print(day.strftime('%Y%m%d'))


19820101
19820102
19820103
19820104


In [27]:

# These are the URL locations that contain subdirectories for each daily file

# THREDDS
catalog_url = "https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/"

# NCEI direct access - Does not work with open_mfdataset
# catalog_url = 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/'

# Build a list of links around a start and end date, following the file naming structure:

# Start and end dates (1982 start, matching the printed URLs and the dataset summary below)
d1 = datetime.date(1982, 1, 1)
d2 = datetime.date(2020, 12, 31)

# List of dates in that range
days = [d1 + datetime.timedelta(days=x) for x in range((d2-d1).days + 1)]

# List of access point URLs
url_l = []
for day in days:
    date = day.strftime('%Y%m%d')
    # Files sit in YYYYMM subdirectories and are named by their full date
    day_file = f"{catalog_url}{date[0:6]}/oisst-avhrr-v02r01.{date}.nc"
    url_l.append(day_file)

# Print what a couple access points look like
for url in url_l[0:3]:
    print(url)

https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/198201/oisst-avhrr-v02r01.19820101.nc
https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/198201/oisst-avhrr-v02r01.19820102.nc
https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/198201/oisst-avhrr-v02r01.19820103.nc


### Accessing Multiple Access Points with `xr.open_mfdataset`

Using the list of URLs, we can create a single `xr.Dataset` in which the daily files are concatenated into one sliceable object.

When the files are served over OPeNDAP, this can take quite a long time, so plan accordingly.
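One possible way to keep each request manageable is to split the URL list into yearly batches and open them one at a time. The sketch below only builds the batches, so it runs without network access. The helper names `daily_urls` and `urls_by_year` are hypothetical, and the commented-out `xr.open_mfdataset` call shows where each batch could be used.

```python
import datetime
from itertools import groupby

# Base URL copied from the THREDDS catalog used in this notebook
CATALOG_URL = "https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/"


def daily_urls(start, end, catalog_url=CATALOG_URL):
    """Return (date, url) pairs for every day from start to end, inclusive."""
    n_days = (end - start).days + 1
    days = [start + datetime.timedelta(days=i) for i in range(n_days)]
    return [
        (d, f"{catalog_url}{d:%Y%m}/oisst-avhrr-v02r01.{d:%Y%m%d}.nc")
        for d in days
    ]


def urls_by_year(start, end):
    """Group the daily URLs into a dict keyed by calendar year."""
    pairs = daily_urls(start, end)  # already in date order, as groupby requires
    return {
        year: [url for _, url in group]
        for year, group in groupby(pairs, key=lambda pair: pair[0].year)
    }


batches = urls_by_year(datetime.date(2019, 1, 1), datetime.date(2020, 12, 31))
for year, urls in batches.items():
    print(year, len(urls))
    # Each batch could then be opened on its own, for example:
    # ds_year = xr.open_mfdataset(urls)
```

Working year by year also makes it easier to retry a single failed batch instead of restarting the whole request.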

In [32]:
# Use xr.open_mfdataset to lazy load the metadata for all dates within that range
oisst = xr.open_mfdataset(url_l)
oisst

Dask array summary from the dataset output (the same summary is repeated for each data variable):

|       | Array                 | Chunk             |
|-------|-----------------------|-------------------|
| Bytes | 55.02 GiB             | 3.96 MiB          |
| Shape | (14245, 1, 720, 1440) | (1, 1, 720, 1440) |
| Count | 42735 Tasks           | 14245 Chunks      |
| Type  | float32               | numpy.ndarray     |

## MUR SST

MUR is an ultra-high-resolution satellite SST product that has been available since 2002. The data can be accessed via ERDDAP. ERDDAP offers additional server-side processing, such as subsetting before download, so it should be possible to speed things up considerably despite the higher resolution.
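As a rough illustration of server-side subsetting, the sketch below builds an ERDDAP griddap request URL for a small time, latitude, and longitude box. Everything specific here is an assumption that does not come from this notebook and should be checked against the server's dataset page: the CoastWatch server address, the dataset ID `jplMURSST41`, the variable name `analysed_sst`, and the dimension order. The helper `griddap_subset_url` is hypothetical.

```python
# Assumptions (not from this notebook): server, dataset ID, and variable name
ERDDAP_SERVER = "https://coastwatch.pfeg.noaa.gov/erddap"
DATASET_ID = "jplMURSST41"
VARIABLE = "analysed_sst"


def griddap_subset_url(time_range, lat_range, lon_range, stride=1, file_type="nc"):
    """Build a griddap URL requesting one variable over a time/lat/lon box.

    Each range is a (start, stop) pair. The dimension order
    time, latitude, longitude is assumed.
    """
    def constraint(bounds):
        start, stop = bounds
        # griddap constraint syntax: [(start):stride:(stop)]
        return f"[({start}):{stride}:({stop})]"

    query = VARIABLE + "".join(
        constraint(bounds) for bounds in (time_range, lat_range, lon_range)
    )
    return f"{ERDDAP_SERVER}/griddap/{DATASET_ID}.{file_type}?{query}"


url = griddap_subset_url(
    time_range=("2020-01-01T09:00:00Z", "2020-01-07T09:00:00Z"),
    lat_range=(40.0, 45.0),
    lon_range=(-71.0, -65.0),
)
print(url)
```

Because the server applies the constraints before sending anything, only the requested box is transferred, which is the main reason a high-resolution product can still be practical to work with.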