# Chapter 7 - Example: Satellite Data 
###  Days with sea surface temperature above a threshold

In this chapter we demonstrate how to use Sea Surface Temperature (SST) data stored in the cloud.

This example analyzes an SST time series from either an area of the ocean or a single point. For an area, it first averages the SST values into a single value per time step. It then analyzes the time series to find when SST is above a given threshold. This approach could be used to study marine heatwaves, or with an SST threshold relevant to a marine species of interest.
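The threshold analysis described above can be sketched on a small synthetic cube. The sketch assumes SST is already in degrees Celsius and has `time`, `lat`, and `lon` dimensions, like the MUR subset used below. The values and the 14.5 deg C `threshold` are invented for illustration. The "longest run" count is only a simple stand-in for a marine-heatwave metric, not a full heatwave definition.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for an SST subset (deg C): 10 days on a 3x3 grid. Values are invented.
time = pd.date_range('2018-01-06', periods=10, freq='D')
daily = np.array([13.0, 13.5, 14.2, 14.8, 15.1, 14.9, 14.0, 13.2, 15.5, 15.6])
sst_cube = xr.DataArray(
    np.broadcast_to(daily[:, None, None], (10, 3, 3)).copy(),
    coords={'time': time, 'lat': [35.0, 36.0, 37.0], 'lon': [-125.0, -124.0, -123.0]},
    dims=('time', 'lat', 'lon'),
    name='analysed_sst',
)

threshold = 14.5  # hypothetical threshold in deg C

# 1) Average the area into a single value per day (for a single point this changes nothing).
sst_series = sst_cube.mean(dim=('lat', 'lon'), skipna=True)

# 2) Flag the days above the threshold and count them.
above = (sst_series > threshold).values
n_days_above = int(above.sum())
dates_above = pd.to_datetime(sst_series['time'].values[above]).strftime('%Y-%m-%d').tolist()

# 3) Find the longest run of consecutive days above the threshold.
longest = current = 0
for flag in above:
    current = current + 1 if flag else 0
    longest = max(longest, current)

print(n_days_above, dates_above, longest)
```

The selection cell later in this chapter averages over `time` and produces a map. For the threshold analysis, you would instead keep the time axis and average over `lat` and `lon`, as in step 1.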

<span style="font-size:larger;">__You must have the Zarr package installed as well__</span>

In [None]:
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt 
import datetime as dt
import cartopy.feature as cfeature
import cartopy.crs as ccrs
import warnings 
warnings.simplefilter('ignore') 

from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter
from calendar import month_abbr

In [None]:
# input parameters

# select either a range of lat/lon or a point. 
# If a point, set both entries to the same value
latr = [35, 40] # make sure lat1 < lat2; no check is done below, to keep the code simple
lonr = [-125, -120] # make sure lon1 < lon2, in the range -180:180. The data are daily at ~1 km resolution, so keep the area small!

# time range. Data available from 2002-06-01 to 2020-01-20. [start with a short period]
dater = ['2018-01-06','2018-01-14'] # dates in the format 'YYYY-MM-DD', as strings
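The cell above does no checking. A small validation step can catch swapped bounds or malformed dates before any data are requested. `check_inputs` is a hypothetical helper. Its availability window defaults to the 2002-06-01 to 2020-01-20 range quoted in the comments above.

```python
from datetime import date

def check_inputs(latr, lonr, dater, first_day='2002-06-01', last_day='2020-01-20'):
    """Hypothetical helper: raise ValueError if the input parameters are inconsistent."""
    if not -90 <= latr[0] <= latr[1] <= 90:
        raise ValueError(f'need -90 <= lat1 <= lat2 <= 90, got {latr}')
    if not -180 <= lonr[0] <= lonr[1] <= 180:
        raise ValueError(f'need -180 <= lon1 <= lon2 <= 180, got {lonr}')
    # date.fromisoformat raises ValueError if a string is not a valid ISO date
    start, end = date.fromisoformat(dater[0]), date.fromisoformat(dater[1])
    if not date.fromisoformat(first_day) <= start <= end <= date.fromisoformat(last_day):
        raise ValueError(f'dates must be ordered and within {first_day} to {last_day}, got {dater}')
    return True

print(check_inputs([35, 40], [-125, -120], ['2018-01-06', '2018-01-14']))  # True
```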

***
## We are going to use the Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST) data set
### This dataset is stored in the Amazon Web Services (AWS) cloud. For more information, links to the data details, and examples, see: https://registry.opendata.aws/mur/

This dataset is stored in `zarr` format, a format optimized for large datasets and cloud storage. The data are not stored one 'image' (time step) at a time, nor as one gigantic netCDF file. Instead, they are split into 'chunks' that can be read independently, which makes the format well suited to extracting time series.
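How fast a chunked store is for a particular read depends on how many chunks the selection touches, because each chunk is fetched whole. The sketch below counts the chunks touched by a rectangular index selection. The two chunk shapes are hypothetical examples, not the actual layout of the MUR store. Once the dataset is opened, you can check the real layout in `ds_sst.analysed_sst.encoding['chunks']`.

```python
def chunks_touched(index_ranges, chunk_shape):
    """Count the chunks touched by a rectangular selection.

    index_ranges: one (start, stop) integer index range per dimension, stop exclusive.
    chunk_shape: chunk length along each dimension.
    """
    total = 1
    for (start, stop), size in zip(index_ranges, chunk_shape):
        first_chunk = start // size
        last_chunk = (stop - 1) // size
        total *= last_chunk - first_chunk + 1
    return total

# A 100-day time series at a single grid point (indices are illustrative), as (time, lat, lon).
selection = [(0, 100), (12000, 12001), (5500, 5501)]

spatial_chunks = (5, 1799, 3600)   # hypothetical: few time steps, large spatial tiles
time_chunks = (365, 100, 100)      # hypothetical: many time steps, small spatial tiles

print(chunks_touched(selection, spatial_chunks))  # 20
print(chunks_touched(selection, time_chunks))     # 1
```

With the first layout, a point time series needs many large chunks. With the second, it needs a single small one. The same store can therefore be fast for maps and slower for long time series, or the reverse.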

First, we open the dataset and explore it, but we are not downloading anything yet.

In [None]:
# first build the file location from its parts:
# the S3 bucket [mur-sst], the AWS region [us-west-2], and the folder, if applicable [zarr-v1]
file_location = 'https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1'

ds_sst = xr.open_zarr(file_location,consolidated=True) # open a zarr file using xarray
# similar to open_dataset; the data are loaded lazily, so only the metadata is read at this point

ds_sst # we can treat it as a dataset!
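The URL above follows Amazon S3's virtual-hosted-style pattern, `https://<bucket>.s3.<region>.amazonaws.com/<key prefix>`. `s3_https_url` is a hypothetical helper that assembles the URL from the three parts named in the comment. This can make it easier to point at a different bucket or folder.

```python
def s3_https_url(bucket, region, prefix=''):
    """Hypothetical helper: build a virtual-hosted-style HTTPS URL for a public S3 bucket."""
    url = f'https://{bucket}.s3.{region}.amazonaws.com'
    return f'{url}/{prefix.strip("/")}' if prefix else url

print(s3_https_url('mur-sst', 'us-west-2', 'zarr-v1'))
# https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1
```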

## Now that we know what the file contains, we select our data (region and time), operate on it (here, average over time), and download only the selected data 
Given the high resolution of the data, this step takes a while, so be patient. If you are only testing, you might want to choose a small region and a short time period first. 

In [None]:
# remove all values flagged as lakes (mask value 5; look at the metadata for the mask field above)
sst_filtered = ds_sst.where(ds_sst.mask != 5, np.nan)

# filter the data using the latitude, longitude, and time ranges specified above,
# then average over time, skipping 'not a number' (NaN) values and keeping the attributes
sst = sst_filtered['analysed_sst'].sel(time = slice(dater[0],dater[1]),
                        lat  = slice(latr[0], latr[1]), 
                        lon  = slice(lonr[0], lonr[1])
                        ).mean(dim='time', skipna=True, keep_attrs=True).load()

sst = sst - 273.15 # convert units from Kelvin to Celsius
sst.attrs['units'] = 'deg C' # update the units in the metadata
sst = sst.to_dataset(name='analysed_sst') # wrap in a Dataset so the cells below can use sst.analysed_sst, as with the saved file
sst.to_netcdf('data/sst_example.nc') # save the data in case we want to analyze it again without acquiring it from the cloud
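You can check the selection logic without touching the cloud by running the same steps on a tiny synthetic dataset. `ds_demo` is invented for illustration. It mimics the `analysed_sst` (Kelvin) and `mask` variables of the MUR store, with one pixel flagged as lake (value 5). Note that `slice(lat1, lat2)` works as written only because latitude increases along the axis here, as the code above assumes. With a descending coordinate, the slice bounds would need to be swapped.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Invented stand-in for ds_sst: 3 days on a 3x3 grid.
time = pd.date_range('2018-01-06', periods=3, freq='D')
lat = [34.5, 35.5, 36.5]
lon = [-125.5, -124.5, -123.5]
kelvin = 288.15 + np.arange(3.0)[:, None, None] + np.zeros((3, 3, 3))  # 15, 16, 17 deg C by day
mask = np.ones((3, 3, 3), dtype=int)   # 1 = open sea
mask[:, 1, 1] = 5                      # pretend this pixel is an open lake
ds_demo = xr.Dataset(
    {'analysed_sst': (('time', 'lat', 'lon'), kelvin),
     'mask': (('time', 'lat', 'lon'), mask)},
    coords={'time': time, 'lat': lat, 'lon': lon},
)

# Same steps as the cell above: drop lakes, subset in space and time, average over time, convert to deg C.
demo = (ds_demo.where(ds_demo['mask'] != 5)['analysed_sst']
        .sel(time=slice('2018-01-06', '2018-01-08'),
             lat=slice(35, 40), lon=slice(-125, -120))
        .mean(dim='time', skipna=True)) - 273.15

print(demo.shape)                              # (2, 2): lat 35.5, 36.5 and lon -124.5, -123.5
print(float(demo.sel(lat=35.5, lon=-124.5)))   # nan: the lake pixel was masked out
```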

***
### *Execute the next cell only if you are reading the data from a file, either because you have no access to the cloud or because you do not want to keep reading from it. Skip it otherwise. (No problem if you execute it by mistake.)*

In [None]:
# open the temperature data, load it into memory, and close the file
sst = xr.load_dataset('data/sst_example.nc') 

#look at the temperature data
sst.analysed_sst.values

In [None]:
# define latitude and longitude boundaries (min, max) from the data
latr = [float(sst.analysed_sst['lat'].min()), float(sst.analysed_sst['lat'].max())] 
lonr = [float(sst.analysed_sst['lon'].min()), float(sst.analysed_sst['lon'].max())] 

# Select the region to plot, optionally with a margin (in degrees) around the data
margin = 0
region = np.array([[latr[0]-margin, latr[1]+margin], [lonr[0]-margin, lonr[1]+margin]]) 

# add state outlines
states_provinces = cfeature.NaturalEarthFeature(
        category='cultural',
        name='admin_1_states_provinces_lines',
        scale='50m',
        facecolor='none')

# Create and set the figure context
fig = plt.figure(figsize=(16,10), dpi = 72) 
ax = plt.axes(projection=ccrs.PlateCarree()) 
ax.coastlines(resolution='10m',linewidth=1,color='black') 
ax.add_feature(cfeature.LAND, color='grey', alpha=0.3)
ax.add_feature(states_provinces, linewidth = 0.5)
ax.add_feature(cfeature.BORDERS, color = 'black')
ax.set_extent([region[1,0], region[1,1], region[0,0], region[0,1]], crs=ccrs.PlateCarree()) # [lon_min, lon_max, lat_min, lat_max]
# ticks at whole degrees inside the plotted region
ax.set_xticks(np.arange(np.ceil(region[1,0]), np.floor(region[1,1])+1, 1), crs=ccrs.PlateCarree()) 
ax.set_yticks(np.arange(np.ceil(region[0,0]), np.floor(region[0,1])+1, 1), crs=ccrs.PlateCarree()) 
ax.xaxis.set_major_formatter(LongitudeFormatter(zero_direction_label=True))
ax.yaxis.set_major_formatter(LatitudeFormatter())
ax.gridlines(linestyle = '--', linewidth = 0.5)

# Plot the gridded data, colored by temperature
sst.analysed_sst.plot(transform=ccrs.PlateCarree(),cbar_kwargs={'label': 'Temperature [deg C]'}, cmap = "RdBu_r")
plt.title('Averaged Temperature Values ('+dater[0]+' to '+dater[1]+')', fontdict = {'fontsize' : 16})
plt.show()

***

## Switching to the Salinity Data from the Soil Moisture Active Passive (SMAP) observatory through JPL

For this exercise we'll use the level 3 collocated salinity data. You can access the data either through direct S3 access on Amazon Web Services (AWS) or by selecting the granule data in the Earth Explorer app. The data are averaged over 8-day segments. In this exercise, instead of gathering the data through direct S3 access, you have been provided with a single file of the 8-day averaged data. The file below contains data from 2018-01-06 to 2018-01-14.

[JPL SMAP Sea Surface Salinity](https://podaac.jpl.nasa.gov/dataset/SMAP_JPL_L3_SSS_CAP_8DAY-RUNNINGMEAN_V5)
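The file name below contains `8day_running`. This suggests each file is an 8-day running mean rather than a fixed, non-overlapping block, but that reading is an assumption. Check the product documentation for how the window is placed. The sketch below contrasts the two kinds of averaging with xarray on an invented daily series. `rolling` gives a moving window, while `resample` gives non-overlapping 8-day bins.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Invented daily series: values 0, 1, ..., 15 over 16 days.
daily = xr.DataArray(
    np.arange(16.0),
    coords={'time': pd.date_range('2018-01-01', periods=16, freq='D')},
    dims='time',
)

# Running mean: one value per day, each averaging that day and the 7 days before it
# (the first 7 values are NaN because the window is not yet full).
running = daily.rolling(time=8).mean()

# Non-overlapping 8-day blocks: one value per block.
blocks = daily.resample(time='8D').mean()

print(running.values[7:10])   # 3.5, 4.5, 5.5
print(blocks.values)          # 3.5, 11.5
```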

In [None]:
# open the salinity dataset, load it into memory, and close the file
sss = xr.load_dataset('data/RSS_smap_SSS_L3_8day_running_2018_006_FNL_v04.0.nc4') 

#look at the variable containing the salinity data
sss.sss_smap
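The file name appears to encode the year and day of year (`2018_006`). If that is what those two fields mean, a hypothetical helper can turn them into a calendar date. Day 006 of 2018 is 2018-01-06, which matches the start date quoted above.

```python
import re
from datetime import date, timedelta

def date_from_filename(filename):
    """Hypothetical helper: read a '_YYYY_DDD_' (year, day-of-year) field from a file name."""
    match = re.search(r'_(\d{4})_(\d{3})_', filename)
    if match is None:
        raise ValueError(f'no _YYYY_DDD_ field in {filename!r}')
    year, day_of_year = int(match.group(1)), int(match.group(2))
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)

print(date_from_filename('RSS_smap_SSS_L3_8day_running_2018_006_FNL_v04.0.nc4'))  # 2018-01-06
```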

Let's look at the map of the whole world.

In [None]:
sss.sss_smap.plot()

Now let's zoom in on the West Coast of the US.

In [None]:
# create boundaries for the latitude and longitude (salinity longitudes are in degrees east, 0-360)
mask_lon = (sss.lon >= 210) & (sss.lon <= 250)
mask_lat = (sss.lat >= 20) & (sss.lat <= 70)

#filter the data using .where() and the two variables we just created
sss_zoomed = sss.where(mask_lon & mask_lat, drop=True)

# plot the zoomed-in map
sss_zoomed.sss_smap.plot()

### Compare SST and SSS Data

Let's make two figures that show the spatial differences between the temperature and salinity data. Notice 1) the difference in spatial resolution (the size of the pixels for each data point), and 2) how close to shore each variable is measured. 

In [None]:
###Temperature Map
#create the figure
plt.figure(figsize=(16,8), dpi = 72)
p = sst.analysed_sst.plot(
    subplot_kws=dict(projection=ccrs.PlateCarree()),
    transform=ccrs.PlateCarree())
plt.title("Temperature")
p.axes.coastlines()

###Salinity Map
# set up variables to filter the latitude and longitude of the salinity data (in degrees east, 235-240 corresponds to -125 to -120)
sss_mask_lon = (sss.lon >= 235) & (sss.lon <= 240)
sss_mask_lat = (sss.lat >= 35) & (sss.lat <= 40)

# create the figure
plt.figure(figsize=(16,8), dpi = 72)
sss_compare = sss.where(sss_mask_lon & sss_mask_lat, drop=True)
q = sss_compare.sss_smap.plot(
    subplot_kws=dict(projection=ccrs.PlateCarree()),
    cmap = 'plasma',
    transform=ccrs.PlateCarree())
plt.title("Salinity")
q.axes.coastlines()
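The salinity bounds 235-240 in the cell above are the temperature bounds -125 to -120 expressed in degrees east. Rather than hard-coding them, you could derive them from the input longitudes with a modulo. You can also estimate each dataset's grid spacing from its coordinates. The sketch uses invented coordinate arrays with 0.01° and 0.25° spacing as stand-ins, roughly the nominal MUR and SMAP grids. With the real data you would pass `sst.lon` and `sss.lon` instead. The sketch assumes the region does not cross the 0° meridian, where the 0-360 bounds would wrap around.

```python
import numpy as np

def to_degrees_east(lon):
    """Convert longitudes in -180..180 to 0..360 (degrees east)."""
    return np.mod(lon, 360)

lonr_demo = [-125, -120]      # same form as the input parameters at the top of the chapter
lon_east = to_degrees_east(np.array(lonr_demo))
print(lon_east)               # [235 240]

def grid_spacing(coord):
    """Median spacing between neighbouring coordinate values."""
    return float(np.median(np.diff(np.asarray(coord, dtype=float))))

# Invented stand-ins for sst.lon (0.01 deg) and sss.lon (0.25 deg).
sst_lon = np.round(np.arange(-125.0, -120.0, 0.01), 2)
sss_lon = np.arange(235.0, 240.25, 0.25)
ratio = grid_spacing(sss_lon) / grid_spacing(sst_lon)
print(round(ratio))           # 25: about 25 x 25 temperature pixels per salinity pixel
```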


## Interpolating the Salinity Data

Interpolation is very valuable here, given the large difference in resolution between the salinity and temperature data. To compare these data accurately, we need to interpolate the salinity values onto the grid of the temperature data. First, the longitudes must match. The sea surface temperature data use negative degrees for longitudes west of the prime meridian, so the values range from -180 to 180. The salinity data measure longitude in degrees east only, so the values range from 0 to 360. Because of this difference, we first need to transform the longitude values so that they match. 

Once the longitudes have been converted, the salinity data can be interpolated with the `.interp_like()` method. This method interpolates one object onto the coordinates of another, here the temperature grid. It uses linear interpolation by default and requires SciPy. In this case it is a 2D interpolation along both latitude and longitude; in general, `.interp_like()` interpolates along the dimensions the two objects share. Another common use of this method is to interpolate data in time instead of in space. 

Looking at the map of the interpolated data, we can see that the resolution of the salinity data now matches the temperature data. 

In [None]:
# transform the longitudes from 0-360 to -180-180, then sort so that longitude increases
sss_transform = sss_compare.assign_coords(lon=np.mod(sss_compare.lon + 180, 360) - 180)
sss_transform = sss_transform.sortby('lon')

#interpolate the data using the .interp_like() function
sss_inter = sss_transform.interp_like(sst.analysed_sst)

# look at the size of each dimension before and after the interpolation
print("Shape before interpolation: ", dict(sss_compare.sizes))
print("Shape after interpolation: ", dict(sss_inter.sizes))

#view the new interpolated data
plt.figure(figsize=(16,8), dpi = 72)
q = sss_inter.sss_smap.plot(
    subplot_kws=dict(projection=ccrs.PlateCarree()),
    cmap = 'plasma',
    transform=ccrs.PlateCarree())
plt.title("Interpolated Salinity")
q.axes.coastlines()
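You can check the longitude conversion and interpolation on invented data where the right answer is known. In this sketch, the coarse 'salinity' field (0.25°, longitudes in degrees east) is a linear function of latitude and longitude. Bilinear interpolation onto a finer grid should therefore reproduce that function exactly. The field, the grids, and the names `sss_coarse` and `sst_fine` are assumptions for illustration. Like `.interp_like()` itself, the sketch needs SciPy installed.

```python
import numpy as np
import xarray as xr

def field(lat, lon_west_negative):
    """Invented linear 'salinity' field (psu) used to check the interpolation."""
    return 33.0 + 0.1 * (lat - 35.0) + 0.2 * (lon_west_negative + 125.0)

# Coarse grid: 0.25 deg, longitudes in degrees east (0..360), like the salinity data.
lat_c = np.arange(35.0, 37.25, 0.25)
lon_c = np.arange(235.0, 237.25, 0.25)
lat2d, lon2d = np.meshgrid(lat_c, lon_c - 360.0, indexing='ij')
sss_coarse = xr.DataArray(field(lat2d, lon2d), coords={'lat': lat_c, 'lon': lon_c},
                          dims=('lat', 'lon'), name='sss_smap')

# Fine target grid with longitudes in -180..180, like the temperature data.
lat_f = np.linspace(35.5, 36.5, 11)
lon_f = np.linspace(-124.5, -123.5, 11)
sst_fine = xr.DataArray(np.zeros((11, 11)), coords={'lat': lat_f, 'lon': lon_f},
                        dims=('lat', 'lon'))

# Step 1: convert 0..360 longitudes to -180..180, then sort so longitude increases.
sss_coarse = sss_coarse.assign_coords(lon=np.mod(sss_coarse['lon'] + 180, 360) - 180).sortby('lon')

# Step 2: interpolate onto the fine grid (linear by default).
sss_fine = sss_coarse.interp_like(sst_fine)

expected = field(*np.meshgrid(lat_f, lon_f, indexing='ij'))
print(sss_fine.shape, bool(np.allclose(sss_fine.values, expected)))  # (11, 11) True
```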

## Resources

### Resources specifically for this chapter:

- [MUR SST Data](https://registry.opendata.aws/mur/). SST data in the cloud, with references to the official data website, examples, and other resources.

- [Pangeo OSM2020 Tutorial](https://github.com/pangeo-gallery/osm2020tutorial). A very good tutorial on ocean applications and cloud computing, with plenty of examples. Many of the commands here come from this tutorial.

### If you want to learn more:

- [Methods for accessing an AWS bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html). A bucket is a container for objects stored in Amazon S3; S3 stands for Amazon's Simple Storage Service.

- [hvplot site](https://hvplot.holoviz.org/index.html). An interactive plotting tool for xarray and pandas data, and an alternative to the matplotlib and cartopy plots used here.

- [zarr](https://zarr.readthedocs.io/en/stable/). Learn more about this big data storage format.