*Version 4*<br>
*Last updated: 2021-06-08<p>*
christina.herrick@unh.edu<br>
steeleb@caryinstitute.org<br>
__Please do not distribute__

## Chris to-do:
Add flags to CSV export:

* too-cold pixels
* skew < -1 or skew > 1
* set up export of all available site-specific data? (low priority)

## B changes

* user-defined Google Drive path for the remaining exports of CSV files and ancillary data


# Initial Setup

This notebook uses the [Google Earth Engine](https://developers.google.com/earth-engine) [Python API](https://developers.google.com/earth-engine/guides/python_install) to apply the Single-Channel (SC) algorithm, a model-based algorithm using atmospheric correction coefficients, to Landsat thermal imagery to derive water body skin surface temperatures. 

Algorithms for Landsat 4, 5, and 7 can be found in *Jiménez-Muñoz, J. C., J. Cristóbal, J. A. Sobrino, G. Soria, N. Ninyerola, and X. Pons. 2009. Revision of the Single-Channel Algorithm for Land Surface Temperature Retrieval From Landsat Thermal-Infrared Data. IEEE Transactions on Geoscience and Remote Sensing 47:339–349.* [https://doi.org/10.1109/TGRS.2008.2007125](https://doi.org/10.1109/TGRS.2008.2007125)

Algorithms for Landsat 8 can be found in *Jiménez-Muñoz, J. C., J. A. Sobrino, D. Skoković, C. Mattar, and J. Cristóbal. 2014. Land Surface Temperature Retrieval Methods From Landsat-8 Thermal Infrared Sensor Data. IEEE Geoscience and Remote Sensing Letters 11:1840–1843.* [https://doi.org/10.1109/LGRS.2014.2312032](https://doi.org/10.1109/LGRS.2014.2312032)
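For orientation, the single-channel equation in these papers, which the `atmosCorr` function later in this notebook follows, can be summarized as below. Here $L_{sen}$ is at-sensor radiance, $T_{sen}$ is at-sensor brightness temperature in Kelvin, $\varepsilon$ is surface emissivity, $w$ is column water vapor (g/cm²), and $b_\gamma$ is a band-specific constant (1290 K, 1256 K, 1277 K, and 1324 K for Landsat 4, 5, 7, and 8 in this notebook). The nine $c_{kj}$ values are the `coeff4`, `coeff5`, `coeff7`, and `coeff8` lists defined in the atmospheric functions block.

$$
T_s = \gamma\left[\frac{1}{\varepsilon}\left(\psi_1 L_{sen} + \psi_2\right) + \psi_3\right] + \delta
$$

$$
\gamma \approx \frac{T_{sen}^2}{b_\gamma L_{sen}}, \qquad \delta \approx T_{sen} - \frac{T_{sen}^2}{b_\gamma}
$$

$$
\psi_k = c_{k1} w^2 + c_{k2} w + c_{k3}, \qquad k = 1, 2, 3
$$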


To run this script, you need either the bounding box coordinates of a lake (Option A) or a shapefile or CSV of the lake uploaded to your Earth Engine Assets (Option B); both are entered in the *User-set variables* section. If you need help finding bounding box coordinates, try using [OpenStreetMap](https://www.openstreetmap.org/export#map=12/43.3826/-72.0157). The bounding box should be as small as possible while still including the entire lake surface.

## Modules

This section of code blocks imports the Python modules the notebook needs. You will be prompted to click one or more URLs, which take you to a page where you sign in with your Google account. This account must already be authorized to use Google Earth Engine. If you do not already have access, [fill out this application](https://signup.earthengine.google.com/#!/).

When prompted, copy the unique code(s) provided on the web page and paste them into the prompt to finish authorization.

In [None]:
#@markdown __Run this block__ to authorize Colab to authenticate your Google account and give it access to upload files from your local computer ([to be used later](https://colab.research.google.com/drive/1AJFnCB7B7Uev2-0hqIayOb5fkSB0Xn8v#scrollTo=L_Tf54wwIPRy) in the notebook)
from google.colab import auth, files
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials

In [None]:
#@markdown __Run this block__ to connect Colab to your Earth Engine account
import ee
ee.Authenticate()
ee.Initialize()

In [None]:
#@markdown __Run this block__ to connect Colab to your Google Drive
from google.colab import drive
drive.mount('/content/drive', force_remount=False)

In [None]:
#@markdown __Run this block__ to import packages and define functions used for 
#@markdown interactive mapping and data exploration (double-click to reveal code)
import pandas as pd
import numpy as np
import os
import re
import matplotlib.pyplot as plt
from time import strftime, sleep
from datetime import datetime, timedelta
import pytz
import folium
from folium import plugins
from google.colab import data_table

# Add custom basemaps to folium
basemaps = {
    'Google Maps': folium.TileLayer(
        tiles = 'https://mt1.google.com/vt/lyrs=m&x={x}&y={y}&z={z}',
        attr = 'Google',
        name = 'Google Maps',
        overlay = True,
        control = True
    ),
    'Google Satellite': folium.TileLayer(
        tiles = 'https://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}',
        attr = 'Google',
        name = 'Google Satellite',
        overlay = True,
        control = True
    ),
    'Google Terrain': folium.TileLayer(
        tiles = 'https://mt1.google.com/vt/lyrs=p&x={x}&y={y}&z={z}',
        attr = 'Google',
        name = 'Google Terrain',
        overlay = True,
        control = True
    ),
    'Google Satellite Hybrid': folium.TileLayer(
        tiles = 'https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}',
        attr = 'Google',
        name = 'Google Satellite Hybrid',
        overlay = True,
        control = True
    ),
    'Esri Satellite': folium.TileLayer(
        tiles = 'https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}',
        attr = 'Esri',
        name = 'Esri Satellite',
        overlay = True,
        control = True
    )
}

# Define a method for displaying Earth Engine image tiles on a folium map.
def add_ee_layer(self, ee_object, vis_params, name):
    
    try:    
        # display ee.Image()
        if isinstance(ee_object, ee.image.Image):    
            map_id_dict = ee.Image(ee_object).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
            ).add_to(self)
        # display ee.ImageCollection()
        elif isinstance(ee_object, ee.imagecollection.ImageCollection):    
            ee_object_new = ee_object.mosaic()
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
            ).add_to(self)
        # display ee.Geometry()
        elif isinstance(ee_object, ee.geometry.Geometry):    
            folium.GeoJson(
            data = ee_object.getInfo(),
            name = name,
            overlay = True,
            control = True
        ).add_to(self)
        # display ee.FeatureCollection()
        elif isinstance(ee_object, ee.featurecollection.FeatureCollection):  
            ee_object_new = ee.Image().paint(ee_object, 0, 2)
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
        ).add_to(self)
    
    except Exception as err:
        # report the reason instead of silently swallowing every error
        print("Could not display {}: {}".format(name, err))
    
# Add EE drawing method to folium.
folium.Map.add_ee_layer = add_ee_layer

print("Packages imported")

## Imported variables
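These are the Earth Engine assets used throughout the notebook: the WRS-2 path/row grid, UTM zone boundaries, the JRC Global Surface Water dataset, the ASTER Global Emissivity Dataset, NCEP/NCAR surface water vapor, and the Landsat 4, 5, 7, and 8 Collection 1 Tier 1 and Tier 2 image collections.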

In [None]:
#@markdown __Run this block__ to import pre-existing features, images, and collections from Earth Engine (double-click to reveal code)
wrs2 = ee.FeatureCollection("users/christinaherrickunh/WRS2_descending_2018")
utmbounds = ee.FeatureCollection("users/christinaherrickunh/UTM_Zone_Boundaries")
sw = ee.Image("JRC/GSW1_2/GlobalSurfaceWater")
ag100 = ee.Image("NASA/ASTER_GED/AG100_003")
vapor = ee.ImageCollection("NCEP_RE/surface_wv")
l4t1 = ee.ImageCollection("LANDSAT/LT04/C01/T1")
l4t2 = ee.ImageCollection("LANDSAT/LT04/C01/T2")
l5t1 = ee.ImageCollection("LANDSAT/LT05/C01/T1")
l5t2 = ee.ImageCollection("LANDSAT/LT05/C01/T2")
l7t1 = ee.ImageCollection("LANDSAT/LE07/C01/T1")
l7t2 = ee.ImageCollection("LANDSAT/LE07/C01/T2")
l8t1 = ee.ImageCollection("LANDSAT/LC08/C01/T1")
l8t2 = ee.ImageCollection("LANDSAT/LC08/C01/T2")

print("Variables imported")

Atmospheric functions are derived from modeling the [TIGR2311](https://doi.org/10.1175/1520-0450(2002)041%3C0144:ARNNAF%3E2.0.CO;2) atmospheric sounding database, except for Landsat 8, whose functions come from [GAPRI4838](http://dx.doi.org/10.1080/01431161.2015.1054965). These coefficients are required for the Single-Channel algorithm to run. 

In [None]:
#@markdown __Run this block__ to import atmospheric functions
coeff4 = [0.06674,-0.03447,1.04483,-0.50095,-1.15652,0.09812,-0.04732,1.50453,-0.34405] #  TIGR2311
coeff5 = [0.08158,-0.05707,1.05991,-0.58853,-1.08536,-0.00448,-0.06201,1.59086,-0.33513] #  TIGR2311
coeff7 = [0.06982,-0.03366,1.04896,-0.51041,-1.20026,0.10490,-0.05457,1.52631,-0.32136] #  TIGR2311
coeff8 = [0.04019,0.02916,1.01523,-0.38333,-1.50294,0.20324,0.00918,1.36072,-0.27514] #  GAPRI4838
print("Coefficients imported")
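A client-side sketch of how the nine coefficients become the three atmospheric functions: the prep functions below label them `c11` through `c33`, and `atmosCorr` evaluates each psi as a quadratic in column water vapor. This plain NumPy version, with the hypothetical helper name `atmospheric_functions`, is only for inspecting values and is not used by the notebook.

```python
import numpy as np

# Landsat 5 TM coefficients, copied from coeff5 above (TIGR2311)
COEFF5 = [0.08158, -0.05707, 1.05991, -0.58853, -1.08536,
          -0.00448, -0.06201, 1.59086, -0.33513]


def atmospheric_functions(coeffs, w):
    """Return (psi_1, psi_2, psi_3) for column water vapor w (g/cm^2).

    The nine coefficients are ordered c11, c12, c13, c21, ..., c33, and each
    psi_k = c_k1 * w**2 + c_k2 * w + c_k3, matching the expressions in atmosCorr.
    """
    c = np.asarray(coeffs, dtype=float).reshape(3, 3)
    powers = np.array([w * w, w, 1.0])
    psi = c @ powers  # one quadratic per row of coefficients
    return float(psi[0]), float(psi[1]), float(psi[2])


# Example: w = 1 g/cm^2
psi_1, psi_2, psi_3 = atmospheric_functions(COEFF5, 1.0)
print(round(psi_1, 5), round(psi_2, 5), round(psi_3, 5))
```

At `w = 0` each psi reduces to its constant term (`c13`, `c23`, `c33`), which is a quick way to check the coefficient ordering.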


Landsat scenes contain a QA band meant for bitwise masking of pixels. Information can be found at [USGS](https://www.usgs.gov/core-science-systems/nli/landsat/landsat-collection-1-level-1-quality-assessment-band?qt-science_support_page_related_con=0#qt-science_support_page_related_con). 

Below are lists of the QA pixel values to keep, represented as `qa_values` (Landsat 4, 5, 7) and as `qa_val_8` (Landsat 8). Each value is a decimal integer whose bits encode quality flags; any pixel whose QA value is not in the list is masked. <br>
Cloud, snow, and ice pixels are removed.

Excel spreadsheets of pixel quality values and their definitions can be found:
- for Landsat 4-7, [here](https://unh.box.com/v/landsat4-7-qavalues);
- for Landsat 8, [here](https://unh.box.com/v/landsat8-qavalues).
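To see why values such as 672 are kept while 1696 is not, each QA integer can be split into its bit fields. The sketch below assumes the Collection 1 BQA bit layout described on the USGS page linked above (bit 0 fill, bits 2-3 radiometric saturation, bit 4 cloud, bits 5-6 cloud confidence, bits 7-8 cloud shadow confidence, bits 9-10 snow/ice confidence, bits 11-12 cirrus confidence, Landsat 8 only); the helper names are hypothetical. Two-bit confidence fields read 0 = not determined, 1 = low, 2 = medium, 3 = high.

```python
def extract_bits(value, start, end):
    """Return the integer stored in bits start..end (inclusive) of value."""
    width = end - start + 1
    return (value >> start) & ((1 << width) - 1)


def decode_bqa(value):
    """Decode a Collection 1 BQA integer (assumed bit layout, see lead-in)."""
    return {
        "fill": extract_bits(value, 0, 0),
        "saturation": extract_bits(value, 2, 3),
        "cloud": extract_bits(value, 4, 4),
        "cloud_conf": extract_bits(value, 5, 6),
        "shadow_conf": extract_bits(value, 7, 8),
        "snow_ice_conf": extract_bits(value, 9, 10),
        "cirrus_conf": extract_bits(value, 11, 12),
    }


print(decode_bqa(672))   # a kept Landsat 4-7 value
print(decode_bqa(1696))  # a snow/ice value excluded from qa_values
```

Under this layout, 672 carries low confidence for cloud, cloud shadow, and snow/ice, while 1696 differs only in having high snow/ice confidence.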

In [None]:
#@markdown __Run this block__ to set masked pixels from the Landsat QA bands
qa_values = ee.List([672,676,680,684,704,708,712,716,928,932,936,940,960,964,968,972]) #  KEEP these pixel values
                        # 1696,1700,1704,1708,1728,1732,1736,1740]) <-- these are snow/ice

qa_val_8 = ee.List([2,2720,2722,2724,2728,2732,2752,2756,2760,2764,2976,2980,2984,2988,3008,3012,3016,3020]) #  KEEP these pixel values
                        # ,3744,3748,3752,3756,3776,3780,3784,3788]) <-- these are snow/ice
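
The prep functions later apply these lists with `remap(qa_values, qa_values).mask()`: remapping only the listed values leaves every other QA value masked, so the resulting mask is 1 exactly where the QA value is in the keep list. A rough NumPy analogue using `np.isin` on a small made-up QA patch is sketched below for illustration; it is not part of the Earth Engine workflow.

```python
import numpy as np

# Keep list for Landsat 4, 5, 7 (copied from qa_values above)
KEEP_L457 = [672, 676, 680, 684, 704, 708, 712, 716,
             928, 932, 936, 940, 960, 964, 968, 972]

# Made-up 2 x 3 patch of QA values: clear (672, 928, 676), snow/ice (1696),
# high-confidence cloud (752), and designated fill (1)
qa_patch = np.array([[672, 1696, 752],
                     [928, 676, 1]])

keep = np.isin(qa_patch, KEEP_L457)  # True where the pixel survives masking
print(keep)
```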

## User-set variables

In this section, use either *Option A*, where you define a bounding box, or *Option B*, where you import a shapefile or CSV of your lake. 

The final section, *Set remaining variables*, defines the output folder and file prefix, the water persistence threshold, the maximum geometric RMSE for including Landsat Tier 2 scenes, the dates of interest, and the Landsat path/row.

### Option A: use a bounding box
If you need help finding your coordinates, try using [OpenStreetMap](https://www.openstreetmap.org/export#map=12/43.3826/-72.0157). The bounding box should be as small as possible while still including the entire lake surface.<p>
Enter coordinates to see a map of the bounding box. You can also click on the map to get pop-ups of the latitude (y) and longitude (x), then update the coordinates and re-run the block.

In [None]:
#@markdown __Run this block__ after inputting your coordinates below.
#@markdown Do not run this block if you are using Option B

#@markdown `lat1` = North
lat1 = '43.4342'  #@param {type:"string"}
#@markdown `lat2` = South
lat2 = '43.3162'  #@param {type:"string"}
#@markdown `lon1` = West
lon1 = '-72.0937'  #@param {type:"string"}
#@markdown `lon2` = East
lon2 = '-72.0133'  #@param {type:"string"}

use_user_input_file = "no"
try:
  lat1 = float(lat1)
  lon1 = float(lon1)
  lat2 = float(lat2)
  lon2 = float(lon2)
except ValueError as err:
  raise ValueError("Coordinates are missing or are not numbers") from err
box = ee.Geometry.Polygon(
    [[[lon1, lat1], # ul (NW)
      [lon2, lat1], # ur (NE)
      [lon2, lat2], # lr (SE)
      [lon1, lat2]]], None, False) # ll (SW)
center_lat = (lat1+lat2)/2
center_lon = (lon1+lon2)/2
locs_map = folium.Map(location=[center_lat,center_lon],zoom_start=10,width=600,height=600)
locs_map.add_child(folium.LatLngPopup())
basemaps['Google Maps'].add_to(locs_map)
locs_map.add_ee_layer(box,{},'aoi')
display(locs_map)
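
Because `box` is built directly from the four form fields, swapped north/south or east/west values produce a malformed box. The hypothetical helper below is a client-side sketch of a sanity check that mirrors the corner order passed to `ee.Geometry.Polygon`; it assumes the box does not cross the antimeridian.

```python
def bbox_ring(lat1, lat2, lon1, lon2):
    """Validate a north/south/west/east box and return (ring, center).

    The ring uses the same corner order as the Option A block:
    NW, NE, SE, SW as [lon, lat] pairs. Assumes no antimeridian crossing.
    """
    lat1, lat2, lon1, lon2 = (float(v) for v in (lat1, lat2, lon1, lon2))
    if not -90 <= lat2 < lat1 <= 90:
        raise ValueError("lat1 must be the northern edge and lat2 the southern edge")
    if not -180 <= lon1 < lon2 <= 180:
        raise ValueError("lon1 must be the western edge and lon2 the eastern edge")
    ring = [[lon1, lat1], [lon2, lat1], [lon2, lat2], [lon1, lat2]]
    center = ((lat1 + lat2) / 2, (lon1 + lon2) / 2)
    return ring, center


# Default form values from the block above (string inputs, as Colab provides them)
ring, (clat, clon) = bbox_ring('43.4342', '43.3162', '-72.0937', '-72.0133')
print(ring)
print(clat, clon)
```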

### Option B: upload a table file
Using your Google Earth Engine account, upload a shapefile or csv of your study lake to your Assets and link to it here.<p>Click [here](https://drive.google.com/file/d/1lfFoZzQD_wAA7Bnil7mI4zwjKDG5kegh/view?usp=sharing) and [here](https://drive.google.com/file/d/1KNaZV63J_LUp6X1OcoLX1wonF0ASiIzi/view?usp=sharing) for screenshots of the process. 

In [None]:
#@markdown File path should begin with 'users/'
user_input_file = "users/steeleb/sunapee" #@param {type: "string"}
use_user_input_file = "yes" #@param ["yes","no"] {allow-input: true}
#@markdown Remember to __run this block__ after filling in the above fields.
#@markdown Do not run this block if you are using Option A.

In [None]:
#@markdown For Opt B, __run this block__ to calculate the centroid latitude and longitude
# user_input_file only exists if the Option B block above has been run
if use_user_input_file == "yes" and len(globals().get("user_input_file", "")) > 1:
  shp = ee.FeatureCollection(user_input_file).geometry()
  centroid = shp.centroid(10).getInfo()["coordinates"]
  center_lat = centroid[1]
  center_lon = centroid[0]
  print(center_lat)
  print(center_lon)
else:
  print("No file input; will use Opt A bounding box coordinates")

In [None]:
#@markdown For Opt B, __run this block__ to confirm the waterbody AOI
if use_user_input_file == "yes" and 'shp' in globals():
  shp_map = folium.Map(location=[center_lat,center_lon],zoom_start=10,width=600,height=600)
  shp_map.add_child(folium.LatLngPopup())
  basemaps['Google Maps'].add_to(shp_map)
  shp_map.add_ee_layer(shp,{},'aoi')
  display(shp_map)
else:
  print("Display does not apply here; Opt A bounding box is being used for AOI")

### Set remaining variables
This section defines acceptable water persistence and error for the Landsat pixels, the time period of interest, and confirms the path and row for Landsat acquisition. You can also change the default Google Drive folder for any outputs.

In [None]:
#@markdown Exports are saved to the folder below in your Google Drive (it is 
#@markdown created if it does not exist). Change the path here if preferred.
output_dir = "/content/drive/MyDrive/herrick_etal_temp" #@param {type:"string"}
#@markdown Enter a file naming prefix for any exports (no spaces or special characters)
file_prefix = "sunapee" #@param {type:"string"}

if not os.path.exists(output_dir):
  os.makedirs(output_dir)

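def check_splcharacter(test):
  """Return test if it is a safe file prefix, otherwise a placeholder prefix."""
  string_check = re.compile(r'[@!#$%^&*() <>?/\|}{~:]')
  if string_check.search(test) is not None:
    print("Your file prefix contains special characters; using 'your_lake_name' instead. Please fix and re-run.")
    return "your_lake_name"
  print("File prefix:", test)
  return test

# reassign so the placeholder actually replaces an invalid prefix
file_prefix = check_splcharacter(file_prefix)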

In [None]:
#@markdown Boundaries of water bodies are automatically detected using the JRC 
#@markdown Global Surface Water Mapping Layer dataset of percent water occurrence. 
#@markdown By default, a pixel has to be classified as water at least 55% of the 
#@markdown time. This is defined as `pctTime`

#@markdown <p>More information on this dataset can be found at https://doi.org/10.1038/nature20584

pctTime = 55  #@param {type: "number"}
#@markdown __Run this block__ after inputting the above parameter

In [None]:
#@markdown Landsat scenes from both Tier 1 and Tier 2 of Collection 1 are 
#@markdown included for possible use. From Tier 2, only scenes that are processed 
#@markdown to L1TP with a combined RMS error of no greater than 30 meters are 
#@markdown considered.
rmse = 30  #@param {type: "number"}
#@markdown __Run this block__ after inputting the above parameter

In [None]:
#@markdown Landsat records used here begin in 1982 with Landsat 4 TM. 
#@markdown Enter the year range that you want to search, as well as months of 
#@markdown the year. To search all months, `m1 = 1` and `m2 = 12` <p>
#@markdown <p>First year (y1) and last year (y2). Years are inclusive.<p>
#@markdown <p>First month (m1) and last month (m2). Months are inclusive.<p>
#@markdown <p>Note that this method has only been developed for lake ice-off temperature estimation. 

y1 = 1985 #@param {type: "number"}
y2 = 2020 #@param {type: "number"}

m1 = 5 #@param {type: "number"}
m2 =  11#@param {type: "number"}
print('Script will search years %d through %d and months %d through %d of each year' % (y1,y2,m1,m2))
#@markdown __Run this block__ after inputting the above parameters
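
The year and month filters are applied independently, so the search covers months `m1` through `m2` of every year from `y1` through `y2`, not one continuous span. The hypothetical helper below sketches that inclusive logic client-side for checking which acquisition dates qualify; it assumes `m1 <= m2`, as in the default May-November window.

```python
from datetime import date


def in_search_window(d, y1, y2, m1, m2):
    """True if date d is in years y1..y2 AND months m1..m2 (both inclusive).

    Assumes m1 <= m2 (no wrap-around from December into January).
    """
    return y1 <= d.year <= y2 and m1 <= d.month <= m2


for d in [date(1985, 5, 1), date(1999, 8, 15), date(2020, 11, 30),
          date(2020, 12, 1), date(1984, 7, 1)]:
    print(d, in_search_window(d, 1985, 2020, 5, 11))
```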

In [None]:
#@markdown __Run this block__ to calculate the Landsat path(s) and row(s) that 
#@markdown overlap the extent of Option A/B. Each overlapping tile prints as
#@markdown `pPPP rRRR`, where `PPP` is the three-digit path and 
#@markdown `RRR` is the three-digit row.

if use_user_input_file=="yes":
  try:
    box = shp
  except NameError:
    raise NameError("Cannot find user-input file. Are you using Option B or Opt A bounding box? You may need to correct and rerun.")

pathrow = wrs2.filterBounds(box)
# fetch the path/row features once instead of on every loop iteration
pr_features = pathrow.getInfo()["features"]
num_of_pr = len(pr_features)

# WRS-2 path numbers increase westward and row numbers increase southward,
# so each bound starts at the opposite extreme
path_west = 1
path_east = 233
row_north = 122
row_south = 1

print('landsat path/rows:')
for feature in pr_features:
  pr = feature["properties"]["PR"]
  p = pr[:3]
  r = pr[3:]
  path_west = max(path_west, int(p))
  path_east = min(path_east, int(p))
  row_north = min(row_north, int(r))
  row_south = max(row_south, int(r))
  print(f"p{p} r{r}")

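utm = utmbounds.filterBounds(box)
printUTM = utm.getInfo()["features"][0]["properties"]["ZONE"]
print("UTM Zone:",printUTM, "(does not print more than one zone if more are present)")
# WGS 84 / UTM north zones are EPSG:32601-32660, so single-digit zones are
# zero-padded (this assumes a Northern Hemisphere lake)
crs_out = f"EPSG:326{int(float(printUTM)):02d}"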

zone_map = folium.Map(location=[center_lat,center_lon],zoom_start=7,width=600,height=600)
zone_map.add_child(folium.LatLngPopup())
basemaps['Google Maps'].add_to(zone_map)
zone_map.add_ee_layer(utm,{},'utm zone(s)')
zone_map.add_ee_layer(pathrow,{},'path row')
zone_map.add_ee_layer(box,{},'aoi box')
display(zone_map)
print(f'\nbounding paths: {path_west} to {path_east}')
print(f'bounding rows: {row_north} to {row_south}')
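
The loop above reduces the overlapping WRS-2 tiles to bounding paths and rows. Because path numbers increase toward the west and row numbers increase toward the south, the western path is the largest path number and the northern row is the smallest row number. A standalone sketch of the same reduction over `PR` strings (hypothetical helper name, made-up example codes):

```python
def bounding_path_rows(pr_codes):
    """Return (path_west, path_east, row_north, row_south) from 'PPPRRR' codes."""
    paths = [int(code[:3]) for code in pr_codes]
    rows = [int(code[3:]) for code in pr_codes]
    # Paths increase westward, rows increase southward
    return max(paths), min(paths), min(rows), max(rows)


print(bounding_path_rows(["013030", "014030", "013031"]))
```

These four values correspond to `p1`, `p2`, `r1`, and `r2` in the next block.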

In [None]:
#@markdown Use the printed statement above to guide your entry for the path and row variables below. If there is only one path and/or one row, enter the same number for `p1,p2` and `r1,r2` <p> 
#@markdown Starting and ending path (*Path 233 starts at the Prime Meridian and path numbers decrease moving west to east*)
p1 = 13 #@param {type: "number"}
p2 = 13 #@param {type: "number"}
#@markdown Starting and ending row (*Row 1 starts in the Arctic and row numbers increase moving north to south*)
r1 = 30 #@param {type: "number"}
r2 = 30 #@param {type: "number"}
# The Landsat filters expect p1 >= p2 (west to east) and r1 <= r2 (north to south);
# reorder the entries so either input order works
p1, p2 = max(p1, p2), min(p1, p2)
r1, r2 = min(r1, r2), max(r1, r2)
print(f"path {p1} to {p2}, row {r1} to {r2}")
#@markdown __Run this block__ after inputting the above parameters

# Find water pixels & delineate boundary
This step uses the bounding box (Option A) or the table file (Option B), together with the water persistence threshold (`pctTime`) applied to the [JRC Water dataset](https://global-surface-water.appspot.com/), to define the waterbody area. 

Note that in some cases the _Opt A_ bounding box method will include upstream and downstream water areas, particularly where rivers are large enough to be detected by Landsat. To avoid this, use the _Opt B_ table file method. <p>
You can explore the JRC Water dataset [here](https://global-surface-water.appspot.com/).<p>

In [None]:
#@markdown This code block reduces the area defined above to the largest water body present, and then counts the 30 m pixels within the water body.<p>
#@markdown Landsat scenes included in the analysis must have a minimum percentage of clear lake pixels available (these pixels do not need to be contiguous)
pctLakeCoverage = 25 #@param
#@markdown __Run this block__ after inputting the above parameter

wateroccurrence = sw.select(0)
water = wateroccurrence.gte(pctTime)
water = water.updateMask(water.neq(0))

def addArea(feature):
  # returns area +/- 1sqm
  return feature.set({'area':feature.geometry().area(1)})  

regions = water.addBands(wateroccurrence).reduceToVectors(
    reducer=ee.Reducer.min(),
    geometry=box,
    scale=30,
    maxPixels=5e9,
    geometryInNativeProjection=True,
    labelProperty='surfaceWater').map(addArea).sort('area',False)

lake_outline = ee.Feature(regions.first())
aoi = lake_outline
geo = lake_outline.geometry()
coords = geo.getInfo()['coordinates'][0]

watercount = water.reduceRegion(reducer=ee.Reducer.count().unweighted(), 
                                geometry=lake_outline.geometry(),
                                scale=30, bestEffort=True)
totalpixels = watercount.getInfo()["occurrence"]
pixel_min = totalpixels*(pctLakeCoverage/100.0)
print("\n\nTotal # of lake pixels: ", totalpixels)
print(f"{pctLakeCoverage}% of available lake pixels is", int(pixel_min))

pixel_map = folium.Map(location=[center_lat,center_lon],zoom_start=10, width=600, height=600)
basemaps['Google Satellite'].add_to(pixel_map)
pixel_map.add_ee_layer(geo ,{},'pixels over lake')
display(pixel_map)
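
Conceptually, this block thresholds JRC occurrence at `pctTime`, splits the result into connected water regions, keeps the largest, and takes `pctLakeCoverage` percent of its pixel count as `pixel_min`. The NumPy sketch below reproduces that logic on a tiny made-up occurrence grid. It is a simplification: it ranks regions by pixel count rather than by area, and it assumes 8-connected regions; `largest_water_body` is a hypothetical name.

```python
from collections import deque

import numpy as np


def largest_water_body(occurrence, pct_time=55):
    """Boolean mask of the largest 8-connected region with occurrence >= pct_time."""
    water = np.asarray(occurrence) >= pct_time
    labels = np.zeros(water.shape, dtype=int)
    sizes = {}
    n_rows, n_cols = water.shape
    current = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if not water[r, c] or labels[r, c]:
                continue
            current += 1  # start a new region and flood-fill it
            labels[r, c] = current
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and water[ny, nx] and not labels[ny, nx]):
                            labels[ny, nx] = current
                            queue.append((ny, nx))
            sizes[current] = size
    if not sizes:
        return np.zeros(water.shape, dtype=bool)
    biggest = max(sizes, key=sizes.get)
    return labels == biggest


# Made-up occurrence grid (percent of time classified as water)
occurrence = np.array([[90, 90, 0, 0],
                       [90, 60, 0, 80],
                       [0, 0, 0, 80],
                       [40, 0, 0, 0]])
lake = largest_water_body(occurrence, pct_time=55)
total_pixels = int(lake.sum())
pixel_min = total_pixels * (25 / 100.0)  # pctLakeCoverage = 25
print(total_pixels, pixel_min)
```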

# Calculate Skin Surface Temperatures
Using all of the information amassed so far, the code blocks in this section perform all necessary steps to get skin temperature.

In [None]:
#@title Landsat functions
#@markdown __Run this block__ to define the functions that pre-process and 
#@markdown atmospherically correct Landsat images to water skin temperature.
def prep4bands(img):
  systime = img.get('system:time_start')
  elev = img.get('SUN_ELEVATION')
  sza = ee.Number(90).subtract(elev)
  uid = img.get('system:index')
  
  radiance = ee.Algorithms.Landsat.calibratedRadiance(img).select(['B6'],['radi'])
  toa = ee.Algorithms.Landsat.TOA(img)
  toa = toa.select(['B[1-6]'],['blue','green','red','nir','swir','temp'])
  
  bt = toa.select(['temp'],['bt']).subtract(273.15)
  
  coeff = ee.Image(coeff4).select([0,1,2,3,4,5,6,7,8],['c11','c12','c13','c21','c22','c23','c31','c32','c33'])
  Bg = ee.Image.constant(1290).select([0],['Bg'])

  #  use the QA band to mask pixels  
  p_qa = img.select(["BQA"])
  p_mask = p_qa.remap(qa_values,qa_values).mask().int8()
  
  both = radiance.addBands(coeff).addBands(Bg).addBands(bt).updateMask(p_mask).copyProperties(img).set('sza',sza).set('uid',uid).set('system:time_start',systime)
  
  return ee.Image(both)

def prep5bands(img):
  systime = img.get('system:time_start')
  elev = img.get('SUN_ELEVATION')
  sza = ee.Number(90).subtract(elev)
  uid = img.get('system:index')
  
  radiance = ee.Algorithms.Landsat.calibratedRadiance(img).select(['B6'],['radi'])
  toa = ee.Algorithms.Landsat.TOA(img)
  toa = toa.select(['B[1-6]'],['blue','green','red','nir','swir','temp'])
  
  bt = toa.select(['temp'],['bt']).subtract(273.15)
  
  coeff = ee.Image(coeff5).select([0,1,2,3,4,5,6,7,8],['c11','c12','c13','c21','c22','c23','c31','c32','c33'])
  Bg = ee.Image.constant(1256).select([0],['Bg'])

  #  use the QA band to mask pixels  
  p_qa = img.select(["BQA"])
  p_mask = p_qa.remap(qa_values,qa_values).mask().int8()
  
  both = radiance.addBands(coeff).addBands(Bg).addBands(bt).updateMask(p_mask).copyProperties(img).set('sza',sza).set('uid',uid).set('system:time_start',systime)
  
  return ee.Image(both)

def prep7bands(img):
  systime = img.get('system:time_start')
  elev = img.get('SUN_ELEVATION')
  sza = ee.Number(90).subtract(elev)
  uid = img.get('system:index')
  
  radiance = ee.Algorithms.Landsat.calibratedRadiance(img).select(['B6_VCID_1'],['radi'])
  toa = ee.Algorithms.Landsat.TOA(img)
  toa = toa.select(['B[1-5]','B6_VCID_1'],['blue','green','red','nir','swir','temp'])
  
  bt = toa.select(['temp'],['bt']).subtract(273.15)
  
  coeff = ee.Image(coeff7).select([0,1,2,3,4,5,6,7,8],['c11','c12','c13','c21','c22','c23','c31','c32','c33'])
  Bg = ee.Image.constant(1277).select([0],['Bg'])
  
  # use the QA band to mask pixels  
  p_qa = img.select(["BQA"])
  p_mask = p_qa.remap(qa_values,qa_values).mask().int8()
  
  both = radiance.addBands(coeff).addBands(Bg).addBands(bt).updateMask(p_mask).copyProperties(img).set('sza',sza).set('uid',uid).set('system:time_start',systime)
  
  return ee.Image(both)

def prep8bands(img):
  systime = img.get('system:time_start')
  elev = img.get('SUN_ELEVATION')
  sza = ee.Number(90).subtract(elev)
  uid = img.get('system:index')
  
  radiance = ee.Algorithms.Landsat.calibratedRadiance(img).select(['B10'],['radi'])
  toa = ee.Algorithms.Landsat.TOA(img)
  toa = toa.select(['B[2-6]','B10'],['blue','green','red','nir','swir','temp'])
  
  bt = toa.select(['temp'],['bt']).subtract(273.15)
  
  coeff = ee.Image(coeff8).select([0,1,2,3,4,5,6,7,8],['c11','c12','c13','c21','c22','c23','c31','c32','c33'])
  Bg = ee.Image.constant(1324).select([0],['Bg'])
  
  # use the QA band to mask pixels  
  p_qa = img.select(["BQA"])
  p_mask = p_qa.remap(qa_val_8,qa_val_8).mask().int16()
  both = radiance.addBands(coeff).addBands(Bg).addBands(bt).updateMask(p_mask).copyProperties(img).set('sza',sza).set('uid',uid).set('system:time_start',systime)
  
  return ee.Image(both)

def atmosCorr(img):
  img = ee.Image(img)
  systime = img.get('system:time_start')
  crs_out = img.select(0).projection()
  
  radi = img.select('radi')
  # The prep functions store bt in Celsius, but the SC gamma and delta terms
  # need brightness temperature in Kelvin, so convert it back here
  bt = img.select('bt').add(273.15)
  Bg = img.select('Bg')
  
  gamma_top = bt.multiply(bt)
  gamma_bot = radi.multiply(Bg)
  gamma = gamma_top.divide(gamma_bot).select([0],['gamma'])
  
  delta_right = gamma_top.divide(Bg)
  delta = bt.subtract(delta_right).select([0],['delta'])
  
  # Get the joined NCEP/NCAR vapor image and convert kg/m^2 to g/cm^2
  v = ee.Image(img.get('vapor')).multiply(0.1).select([0],['vapor'])
  # The literature reports that the SC algorithm does not work well with
  # water vapor columns over 2.5 g/cm^2, so those pixels are removed
  v = v.updateMask(v.lte(2.5))
  
  img = img.addBands(v)
  
  psi_1 = img.expression(
    '(c1*v*v)+(c2*v)+c3',{
      'c1': img.select('c11'),
      'c2': img.select('c12'),
      'c3': img.select('c13'),
      'v': img.select('vapor')
    }).select([0],['psi_1'])
    
  psi_2 = img.expression(
    '(c1*v*v)+(c2*v)+c3',{
      'c1': img.select('c21'),
      'c2': img.select('c22'),
      'c3': img.select('c23'),
      'v': img.select('vapor')
    }).select([0],['psi_2'])
  
  psi_3 = img.expression(
    '(c1*v*v)+(c2*v)+c3',{
      'c1': img.select('c31'),
      'c2': img.select('c32'),
      'c3': img.select('c33'),
      'v': img.select('vapor')
    }).select([0],['psi_3'])
 
  # e is the inverse emissivity (1/emissivity) from the emissivity block,
  # which must be run before this function is mapped over the collection
  surface_temp = psi_1.multiply(radi).add(psi_2).multiply(e).add(psi_3).multiply(gamma).add(delta)
  # Convert the result from Kelvin back to Celsius
  surface_temp = surface_temp.subtract(273.15)

  surface_temp = ee.Image(surface_temp).setDefaultProjection(crs_out,None,30)
  surface_temp = ee.Image(surface_temp).select([0],['surface_temp']).copyProperties(img)
  surface_temp = surface_temp.set('system:time_start',systime)

  return surface_temp

print("Functions imported", strftime("%x %X"))
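
As a cross-check on `atmosCorr`, the per-pixel arithmetic of the single-channel equation can be written in plain NumPy. This sketch follows Jiménez-Muñoz et al. (2009) with brightness temperature in Kelvin; `single_channel_temperature` is a hypothetical name, and the inputs in the example are illustrative only. With psi = (1, 0, 0) and emissivity 1 (a transparent atmosphere over a blackbody) the equation returns the brightness temperature unchanged, which makes a convenient sanity check.

```python
import numpy as np


def single_channel_temperature(radiance, bt_kelvin, psi, emissivity, b_gamma):
    """Single-channel surface temperature in Kelvin.

    radiance   : at-sensor radiance, W m^-2 sr^-1 um^-1
    bt_kelvin  : at-sensor brightness temperature, K
    psi        : (psi_1, psi_2, psi_3) atmospheric functions
    emissivity : surface emissivity (0-1)
    b_gamma    : band constant in K (e.g. 1256 for Landsat 5 TM in this notebook)
    """
    radiance = np.asarray(radiance, dtype=float)
    bt_kelvin = np.asarray(bt_kelvin, dtype=float)
    psi_1, psi_2, psi_3 = psi
    gamma = bt_kelvin ** 2 / (b_gamma * radiance)
    delta = bt_kelvin - bt_kelvin ** 2 / b_gamma
    return gamma * ((psi_1 * radiance + psi_2) / emissivity + psi_3) + delta


# Sanity check: transparent atmosphere + blackbody returns the input temperature
ts = single_channel_temperature(9.0, 290.0, (1.0, 0.0, 0.0), 1.0, 1256)
print(ts - 273.15)  # in Celsius
```

Dividing by `emissivity` here is the same operation as multiplying by the notebook's `e`, which stores 1/emissivity.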

In [None]:
#@title Calculate emissivity
#@markdown __Run this block__ to calculate average emissivity values by using ASTER Global Emissivity Dataset (100m)<br>
#@markdown More information on the emissivity dataset can be found <a href="https://lpdaac.usgs.gov/products/ag100bv003/" target="_blank">here</a>
aster13 = ag100.select(['emissivity_band13']).multiply(0.001).reduceRegion(reducer=ee.Reducer.mean(), geometry=geo, scale=100).get('emissivity_band13')
aster14 = ag100.select(['emissivity_band14']).multiply(0.001).reduceRegion(reducer= ee.Reducer.mean(), geometry= geo, scale=100).get('emissivity_band14')

emissivity = ee.Number.expression('(e13 + e14) / 2', {
  'e13': aster13,
  'e14': aster14
})
# atmosCorr multiplies by e, the inverse of the mean emissivity (1/emissivity)
e = ee.Number(1).divide(ee.Number(emissivity))
print(f'Average emissivity: {emissivity.getInfo()}')

In [None]:
#@title Filter and Stack Landsat

#@markdown __Run this block__ to filter by data type, geometric RMSE, date, site 
#@markdown location, and the WRS-2 paths and rows entered above, then create the 
#@markdown radiance band, the TOA brightness temperature band in Celsius, and the 
#@markdown band coefficients and constants. These are stacked together so that the 
#@markdown metadata and system time carry over. For Landsat 8, only nadir scenes 
#@markdown whose TIRS model (`TIRS_SSM_MODEL`) is not the preliminary version are kept.

l4 = (l4t1.merge(l4t2).filterMetadata('DATA_TYPE','equals','L1TP') \
      .filterMetadata('GEOMETRIC_RMSE_MODEL','not_greater_than',rmse) \
      .filterBounds(geo) \
      .filter(ee.Filter.calendarRange(y1,y2,'year')) \
      .filter(ee.Filter.calendarRange(m1,m2,'month')) \
      .filterMetadata('WRS_ROW','not_less_than',r1).filterMetadata('WRS_ROW','not_greater_than',r2) \
      .filterMetadata('WRS_PATH','not_greater_than',p1).filterMetadata('WRS_PATH','not_less_than',p2) \
      .map(prep4bands))

l5 = (l5t1.merge(l5t2).filterMetadata('DATA_TYPE','equals','L1TP') \
      .filterMetadata('GEOMETRIC_RMSE_MODEL','not_greater_than',rmse) \
      .filterBounds(geo) \
      .filter(ee.Filter.calendarRange(y1,y2,'year')) \
      .filter(ee.Filter.calendarRange(m1,m2,'month')) \
      .filterMetadata('WRS_ROW','not_less_than',r1).filterMetadata('WRS_ROW','not_greater_than',r2) \
      .filterMetadata('WRS_PATH','not_greater_than',p1).filterMetadata('WRS_PATH','not_less_than',p2) \
      .map(prep5bands))

l7 = (l7t1.merge(l7t2).filterMetadata('DATA_TYPE','equals','L1TP') \
      .filterMetadata('GEOMETRIC_RMSE_MODEL','not_greater_than',rmse) \
      .filterBounds(geo) \
      .filter(ee.Filter.calendarRange(y1,y2,'year')) \
      .filter(ee.Filter.calendarRange(m1,m2,'month')) \
      .filterMetadata('WRS_ROW','not_less_than',r1).filterMetadata('WRS_ROW','not_greater_than',r2) \
      .filterMetadata('WRS_PATH','not_greater_than',p1).filterMetadata('WRS_PATH','not_less_than',p2) \
      .map(prep7bands))

l8 = (l8t1.merge(l8t2).filterMetadata('NADIR_OFFNADIR','equals','NADIR') \
      .filterMetadata('TIRS_SSM_MODEL','not_equals','PRELIMINARY') \
      .filterMetadata('DATA_TYPE','equals','L1TP') \
      .filterMetadata('GEOMETRIC_RMSE_MODEL','not_greater_than',rmse) \
      .filterBounds(geo) \
      .filter(ee.Filter.calendarRange(y1,y2,'year')) \
      .filter(ee.Filter.calendarRange(m1,m2,'month')) \
      .filterMetadata('WRS_ROW','not_less_than',r1).filterMetadata('WRS_ROW','not_greater_than',r2) \
      .filterMetadata('WRS_PATH','not_greater_than',p1).filterMetadata('WRS_PATH','not_less_than',p2) \
      .map(prep8bands))

# filtering by solar zenith angle is useful in high-latitude areas
landsat = ee.ImageCollection((l4).merge(l5).merge(l7).merge(l8)).filterMetadata('sza','less_than',77).sort('system:time_start')
print("Landsat compiled at", strftime("%x %X"))

In [None]:
#@title Join Landsat scene to water column vapor
#@markdown This connects the <a href="https://doi.org/10.1175/1520-0477(1996)077%3C0437:TNYRP%3E2.0.CO;2" target="_blank">NCEP/NCAR</a> water column vapor image to the Landsat image so that there is a value available for atmospheric water vapor during atmospheric correction.<p>
#@markdown __Run this block__ to use image timestamps to find the vapor image closest in time to 
#@markdown Landsat flyover time. This also uses geometry to find a vapor image 
#@markdown that intersects.
primary = landsat  # primary collection to use
secondary = vapor  # collection used to join to primary
timeDiff = 1000*60*60*6  # maximum time difference of acquisitions in milliseconds (6 hours)
maxDiffFilter = ee.Filter.maxDifference(difference=timeDiff, leftField="system:time_start", rightField="system:time_start")
intersectFilter = ee.Filter.intersects(leftField=".geo",rightField=".geo")  # images must overlap
joinFilter = ee.Filter.And(maxDiffFilter,intersectFilter)
saveFirstJoin = ee.Join.saveBest(matchKey="vapor", measureKey="difference")
joinedVapor = saveFirstJoin.apply(primary,secondary,joinFilter)
print("NCEP/NCAR water vapor joined to Landsat:", strftime("%x %X"))
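
`ee.Join.saveBest` keeps, for each Landsat scene, the single vapor image with the smallest time difference among those passing the 6-hour and intersection filters. The hypothetical helper below sketches only the time part of that rule client-side with `datetime` objects (the geometry test is ignored); the example assumes a 6-hourly vapor series at 00, 06, 12, and 18 UTC and an illustrative overpass time.

```python
from datetime import datetime, timedelta


def closest_within(target, candidates, max_diff=timedelta(hours=6)):
    """Return the candidate time closest to target within max_diff, else None."""
    best, best_diff = None, None
    for t in candidates:
        diff = abs(t - target)
        if diff <= max_diff and (best_diff is None or diff < best_diff):
            best, best_diff = t, diff
    return best


vapor_times = [datetime(2020, 7, 1, h) for h in (0, 6, 12, 18)]
scene_time = datetime(2020, 7, 1, 15, 30)  # illustrative overpass time (UTC)
print(closest_within(scene_time, vapor_times))
```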

In [None]:
#@title Apply atmospheric correction and get skin temperature
#@markdown __Run this block__ to calculate water body skin temp, populate image
#@markdown metadata, and filter Landsat to include minimum lake coverage
#@markdown (defined in <a href="https://colab.research.google.com/drive/1AJFnCB7B7Uev2-0hqIayOb5fkSB0Xn8v#scrollTo=V1ljgJffQF1Y&line=3&uniqifier=1">this code block</a>)

# Calculate skin temperature
temps = joinedVapor.map(atmosCorr)
print("Landsat skin temperatures calculated:", strftime("%x %X"))

'''
Add metadata to each scene that indicates how many visible pixels were included 
in analysis, and filter images so that only scenes with the minimum number of 
pixels remain.
'''
def countPixels(img):
  img = ee.Image(img)
  getCount = img.reduceRegion(**{
      "reducer": ee.Reducer.count(),
      "geometry": geo, 
      "scale": 30})
  count = ee.Dictionary(getCount).get('surface_temp')
  return img.set('pixel_count',count)

countedPixels = temps.map(countPixels).filterMetadata('pixel_count','not_less_than', pixel_min)
print("Landsat filtered:", strftime("%x %X"))
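
`countPixels` stores the number of unmasked `surface_temp` pixels in each scene, and `filterMetadata(..., 'not_less_than', pixel_min)` keeps only scenes whose count is at least `pixel_min`. The NumPy sketch below shows the same logic, assuming each scene is a 2-D array with masked pixels stored as NaN. `scenes`, `count_valid`, `filter_scenes` and the threshold value are hypothetical.

```python
import numpy as np


def count_valid(arr):
    """Count unmasked (non-NaN) pixels, like ee.Reducer.count() on a masked band."""
    return int(np.count_nonzero(~np.isnan(arr)))


def filter_scenes(scenes, pixel_min):
    """Keep scenes with count >= pixel_min ('not_less_than'), recording the count."""
    kept = {}
    for scene_id, arr in scenes.items():
        n = count_valid(arr)
        if n >= pixel_min:
            kept[scene_id] = n
    return kept


if __name__ == "__main__":
    nan = np.nan
    scenes = {
        "scene_a": np.array([[12.1, 12.4], [nan, 12.0]]),  # 3 visible pixels
        "scene_b": np.array([[nan, nan], [nan, 11.8]]),    # 1 visible pixel
    }
    print(filter_scenes(scenes, pixel_min=2))  # {'scene_a': 3}
```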

# Data Exploration
The code blocks in the following section allow you to explore your results using dataframes and pixel-value distributions. You can visually inspect each scene for errant data or export the data to explore further.

In [None]:
#@title Convert data from server-side EE objects to client-side Python objects
#@markdown __Run this block__ to define functions for data exploration
def generate_histogram(img):
  img = ee.Image(img)
  fhisto = img.reduceRegion(**{
      "reducer": ee.Reducer.fixedHistogram(0,25,50).unweighted(),
      "geometry": geo,
      "scale": 30,
      "maxPixels": 5e9
  })
  return img.set('histogram',fhisto.get('surface_temp'))
print("Histograms will use 50 fixed bins, 0.5 deg C wide, spanning 0-25 deg C")

# These functions convert image pixels to numpy arrays
def createArrays(img):
  psi_prop = ee.Image(img).sampleRectangle(region=geo, defaultValue=9999)
  getProp = psi_prop.get("psi_1")
  return img.set("img_array", getProp)

def getArrays(client_img_col):
  list_of_arrays = []
  for img in client_img_col:
    prop = img["properties"]["img_array"]
    a = np.array(prop)
    list_of_arrays.append(a)
  return list_of_arrays
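
`ee.Reducer.fixedHistogram(0, 25, 50)` splits 0 to 25 deg C into 50 equal buckets, so each bin is 0.5 deg C wide, and returns a list of `[bucket_min, count]` pairs. The NumPy sketch below reproduces that layout for a plain array of temperatures. It assumes values outside `[min, max)` are dropped, which is worth confirming against the Earth Engine reference; `fixed_histogram` is a hypothetical helper.

```python
import numpy as np


def fixed_histogram(values, vmin=0.0, vmax=25.0, steps=50):
    """Return [[bucket_min, count], ...] for `steps` equal-width bins on [vmin, vmax).

    Values outside [vmin, vmax) are dropped (assumed to match fixedHistogram).
    """
    values = np.asarray(values, dtype=float)
    values = values[(values >= vmin) & (values < vmax)]
    counts, edges = np.histogram(values, bins=steps, range=(vmin, vmax))
    return [[float(lo), int(c)] for lo, c in zip(edges[:-1], counts)]


if __name__ == "__main__":
    hist = fixed_histogram([0.2, 0.4, 0.6, 24.9, 25.0, -1.0])
    print(hist[:2])                 # [[0.0, 2], [0.5, 1]]
    print(hist[-1])                 # [24.5, 1]
    print(hist[1][0] - hist[0][0])  # bin width: 0.5
```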

In [None]:
#@markdown This code block executes the functions from the previous block and 
#@markdown converts the image collection from a server-side EE object to a client-side 
#@markdown Python object, so the collection can be iterated over with an ordinary 
#@markdown Python `for` loop.<p><mark>__This step could take some time to run__</mark>, 
#@markdown depending on the number of images in the ImageCollection.
print("Start:", strftime("%x %X"))
imgs_histo = countedPixels.map(generate_histogram)
generate_arrays = imgs_histo.map(createArrays)

imgCol = generate_arrays.getInfo()["features"]
print("Finish:", strftime("%x %X"))
print("Number of Landsat scenes: ", len(imgCol))

In [None]:
#@markdown Now that the image collection is client-side, __run this block__ to 
#@markdown get arrays of each image.
# returns a list of arrays
imgs_to_arrays = getArrays(imgCol)
print("Converted images to arrays")
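
`createArrays` calls `sampleRectangle` with `defaultValue=9999`, so masked pixels appear in the arrays returned by `getArrays` as 9999 rather than as missing values. Any statistic computed on those arrays should mask that fill value first. A minimal sketch with NumPy masked arrays; the small 2-D array stands in for one element of `imgs_to_arrays`, and `lake_summary` is a hypothetical helper.

```python
import numpy as np

NODATA = 9999  # defaultValue passed to sampleRectangle in createArrays


def lake_summary(arr, nodata=NODATA):
    """Mean and count of valid pixels, ignoring the sampleRectangle fill value."""
    masked = np.ma.masked_equal(np.asarray(arr, dtype=float), nodata)
    return float(masked.mean()), int(masked.count())


if __name__ == "__main__":
    arr = [[12.0, 14.0, 9999], [9999, 13.0, 9999]]
    print(lake_summary(arr))    # (13.0, 3)
    print(float(np.mean(arr)))  # naive mean is dominated by the fill value
```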

In [None]:
#@title View and export collated pixel distributions
#@markdown __Run this block__ to iterate through the image collection and collect 
#@markdown all bins and frequencies into a single Pandas `DataFrame`. 
#@markdown The index holds the lower edge of each *bin* (in deg C); each column is 
#@markdown one scene, headed by the last segment of its `uid` (the image date), and 
#@markdown holds the pixel count in each bin.

length_plots = len(imgCol)
list_of_dfs = []
list_of_uids = []
# fig = make_subplots(rows=length_plots, cols=1)
for i in imgCol:
  uid = (i['properties']['uid']).split("_")[-1]
  list_of_uids.append(uid)
  histogram = i["properties"].get("histogram")

  a = pd.DataFrame(histogram, columns=["bin",uid]).set_index("bin")
  # fig.add_histogram()
  list_of_dfs.append(a)

appended_dfs = pd.concat(list_of_dfs,ignore_index=False, axis=1)
transposed_dfs = appended_dfs.T
data_table.DataTable(appended_dfs, include_index=True, max_columns=50, max_rows=50, num_rows_per_page=50)
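
The loop above turns each scene's `[bin, count]` pairs into a one-column `DataFrame` indexed by bin, and `pd.concat(..., axis=1)` then lines the scenes up side by side on the shared bin index. The sketch below uses made-up histograms for two hypothetical scene IDs to show the resulting shape.

```python
import pandas as pd

# Hypothetical histograms in the same [bin_lower_edge, count] layout as the
# 'histogram' property; both use the same fixed bins, so the indexes align.
histograms = {
    "20200715": [[20.0, 3], [20.5, 7], [21.0, 1]],
    "20200731": [[20.0, 0], [20.5, 4], [21.0, 9]],
}

frames = [
    pd.DataFrame(pairs, columns=["bin", uid]).set_index("bin")
    for uid, pairs in histograms.items()
]
table = pd.concat(frames, axis=1)  # rows: bins, columns: scenes

print(table)
print(table.loc[20.5, "20200731"])  # 4
```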

In [None]:
#@markdown __Run this block__ to export the `DataFrame` as a csv to your Google Drive

if 'output_dir' not in globals():
  output_dir = "drive/My Drive/Colab Notebooks/"
out_histo_compiled = os.path.join(output_dir, file_prefix + "_histograms.csv")
appended_dfs.to_csv(out_histo_compiled)
print("Export completed:", out_histo_compiled)

In [None]:
#@title View pixel distribution using `matplotlib`
#@markdown __Run this block__ to generate distribution charts of lake temperature 
#@markdown for each Landsat scene.
print(len(list_of_uids)," histograms will be generated")
print("Start:", strftime("%x %X"))
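# squeeze=False keeps `axis` a 2-D array even when there is only one scene,
# so axis[row, 0] works for any number of subplots
figure, axis = plt.subplots(len(list_of_uids), 1, squeeze=False,
                            figsize=(8,4*len(list_of_uids)), frameon=False, dpi=50)
x = appended_dfs.index
for row, uid in enumerate(list_of_uids):
  y = appended_dfs[uid]
  axis[row, 0].plot(x,y)
  axis[row, 0].set_title(uid)
  axis[row, 0].set_xlabel("Skin temperature (deg C)")
  axis[row, 0].set_ylabel("Pixel count")
plt.show()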
print("Finish:", strftime("%x %X"))

## Export to CSV

In [None]:
#@markdown Execute this cell to export a lake-specific CSV (`<file_prefix>_temp_stats.csv`) of temperature statistics and related image metadata to the `output_dir` folder in Google Drive
def add_temp_flag(img):
  """Return 1 if any lake pixel is below 0 deg C, else 0 (draft helper, not yet called)."""
  stats = ee.Image(img).reduceRegion(**{
      "reducer": ee.Reducer.min(),
      "geometry": geo,
      "scale": 30,
      "maxPixels": 5e9
  })
  # A single-output reducer names its result after the band
  getMin = ee.Number(ee.Dictionary(stats).get("surface_temp"))
  return getMin.lt(0.0)  # ee.Number: 1 = true, 0 = false

def exportWholeLakeStats(img):
  img = ee.Image(img).clip(geo)

  # get statistics for skin_temperature
  reducers = ee.Reducer.mean().combine(**{
      "reducer2": ee.Reducer.stdDev(), "sharedInputs":True}).combine(**{
      "reducer2": ee.Reducer.minMax(), "sharedInputs":True}).combine(**{
      "reducer2": ee.Reducer.median(**{"maxBuckets":500, "minBucketWidth": 0.125}), "sharedInputs":True}).combine(**{
      "reducer2": ee.Reducer.skew(), "sharedInputs":True}).combine(**{
      "reducer2": ee.Reducer.percentile(**{"percentiles": [25,75], "maxBuckets": 500, "minBucketWidth": 0.125}), "sharedInputs":True}).combine(**{
      "reducer2": ee.Reducer.count(), "sharedInputs":True})

  # retrieve image metadata for output file
  landsattime = img.get('system:time_start')
  cloudcover = img.get('CLOUD_COVER')
  vaporImg = ee.Image(img.get('vapor'))
  vaportime = vaporImg.get('system:time_start')
  esd = img.get("EARTH_SUN_DISTANCE")
  elev = img.get('SUN_ELEVATION')
  azi = img.get('SUN_AZIMUTH')
  sza = img.get('sza')
  count = img.get('pixel_count')
  pctAvail = ee.Number(count).divide(totalpixels)
  flags = ee.List([])

  # retrieve max water column vapor for output file
  vaporMath = ee.Algorithms.If(ee.Algorithms.IsEqual(vaporImg,None),-9999,vaporImg)
  vaporMathg = ee.Image(vaporMath).multiply(0.1)

  getWaterCol = vaporMathg.reduceRegion(
      reducer=ee.Reducer.max(),
      geometry=geo,
      bestEffort=True,
      scale=30
      )
  waterCol = ee.Dictionary(getWaterCol).get('pr_wtr')
  

  # stats = img.reduceRegion(
  #     reducer=ee.Reducer.mean().combine(
  #         reducer2=ee.Reducer.stdDev(),
  #         sharedInputs=True).combine(
  #             reducer2=ee.Reducer.minMax(),
  #             sharedInputs=True
  #         ),
  #   geometry=geo,
  #   scale=30,
  #   maxPixels=5e9
  #   )
  stats = img.reduceRegion(**{
      "reducer": reducers,
      "geometry": geo,
      "scale": 30,
      "maxPixels": 5e9
  })

  # Flag scenes with any pixel below 0 deg C (draft: not yet written to the CSV)
  getMin = ee.Number(ee.Dictionary(stats).get("surface_temp_min"))
  temp_flag = getMin.lt(0.0)  # ee.Number: 1 = true, 0 = false
  add_flag_temp = ee.Algorithms.If(temp_flag, flags.add("temp_flag"), flags)
  
  # lake_mean = stats.get('surface_temp_mean')
  # lake_stdev = stats.get('surface_temp_stdev')
  # zscore = img.subtract(lake_mean).divide(lake_stdev).select([0],['zscore'])

  # avg_zscore = zscore.reduceRegion(
  #     reducer=ee.Reducer.mean().combine(
  #         reducer2=ee.Reducer.minMax(),
  #         sharedInputs=True
  #     ),
  #     geometry=geo,
  #     scale=30,
  #     maxPixels=5e9
  # )

  more_stats = ee.Dictionary({'pixel_count':count,
                              'vapor_time':vaportime,
                              'landsat_time':landsattime,
                              'cloud_cover':cloudcover,
                              'water_column':waterCol,
                              'emiss':emissivity,
                              'elev':elev,
                              'azimuth':azi,
                              'esd':esd,
                              'sza':sza,
                              # 'flags':add_flag_temp,
                              'pct_lake':pctAvail,
                              'l_exceltime':ee.Number(landsattime).divide(1000.0).divide(86400).add(25569),
                              'v_exceltime': ee.Number(vaportime).divide(1000.0).divide(86400).add(25569)
                              # 'zscores': avg_zscore
                              })
  # stats = ee.Dictionary.combine(stats,avg_zscore)
  stats2 = ee.Dictionary.combine(stats, more_stats)
  
  return ee.Feature(None,stats2)

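# Map the statistics function over every scene that passed the pixel filter
temp_stats = countedPixels.map(exportWholeLakeStats)

# Export.table.toDrive expects a folder name relative to the Drive root, so strip
# the Colab mount prefix ("MyDrive/" or the older "My Drive/") from output_dir.
# Drive exports may treat a nested path as a single literal folder name.
export_folder = output_dir.rstrip("/")
for drive_root in ("MyDrive/", "My Drive/"):
  if drive_root in export_folder:
    export_folder = export_folder.split(drive_root, 1)[1]
    break

export_task = ee.batch.Export.table.toDrive(**{
    'collection': temp_stats,
    'description': (file_prefix + "_temp_stats"),
    "fileFormat": "CSV",
    'folder': export_folder
})

print("Exporting lake-specific '{}_temp_stats.csv'".format(file_prefix))
export_task.start()
print('Polling task (id: {}) every 10 seconds'.format(export_task.id))

while export_task.active():
  print(strftime("%x %X"), export_task.status())
  sleep(10)

print("Finished:", strftime("%x %X"))
final_state = export_task.status()['state']
if final_state == 'COMPLETED':
  print('Export should now be visible in Drive.')
else:
  print('Export ended with state {}:'.format(final_state), export_task.status().get('error_message', ''))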
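
The combined reducer in `exportWholeLakeStats` returns one dictionary whose keys are the band name plus a suffix for each statistic (for example `surface_temp_mean`, and `surface_temp_min`, which the temperature flag reads). The NumPy sketch below computes comparable values for one scene's valid pixels, which can help sanity-check the exported CSV columns. Assumptions: the key names follow Earth Engine's usual output naming, the standard deviation and skewness are population (ddof=0) versions, and percentiles are exact, whereas the EE `median` and `percentile` reducers are histogram-based approximations (`maxBuckets`, `minBucketWidth`), so small differences are expected. `whole_lake_stats` is a hypothetical helper.

```python
import numpy as np


def whole_lake_stats(temps, band="surface_temp"):
    """Summary statistics for one scene's valid pixel temperatures (deg C).

    Keys follow the '<band>_<statistic>' pattern of the combined EE reducer.
    Uses population std dev and population (biased) skewness; percentiles are
    exact here, whereas the EE median/percentile reducers are approximate.
    """
    t = np.asarray(temps, dtype=float)
    t = t[~np.isnan(t)]                      # drop masked pixels
    mean = t.mean()
    sd = t.std()                             # population standard deviation (ddof=0)
    skew = float(np.mean(((t - mean) / sd) ** 3)) if sd > 0 else 0.0
    p25, p75 = np.percentile(t, [25, 75])
    return {
        f"{band}_mean": float(mean),
        f"{band}_stdDev": float(sd),
        f"{band}_min": float(t.min()),
        f"{band}_max": float(t.max()),
        f"{band}_median": float(np.median(t)),
        f"{band}_skew": skew,
        f"{band}_p25": float(p25),
        f"{band}_p75": float(p75),
        f"{band}_count": int(t.size),
    }


if __name__ == "__main__":
    stats = whole_lake_stats([-0.5, 1.0, 2.0, 3.0, 4.5, np.nan])
    print(stats)
    # Same rule as temp_flag in exportWholeLakeStats: 1 if any pixel < 0 deg C
    print(int(stats["surface_temp_min"] < 0.0))  # 1
```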

In [None]:
#@markdown *(Draft, being edited.)* Execute this cell to export `temp_stats.csv`, containing temperature statistics and related image metadata, to the `Colab Notebooks` folder in Google Drive

temp_stats = countedPixels.map(exportWholeLakeStats)

export_task = ee.batch.Export.table.toDrive(**{
    'collection': temp_stats,
    'description': "temp_stats",
    "fileFormat": "CSV",
    'folder': "Colab Notebooks"
})

print("Exporting 'temp_stats.csv'")
export_task.start()
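print('Polling task (id: {}) every 10 seconds'.format(export_task.id))

while export_task.active():
  print(strftime("%x %X"), export_task.status())
  sleep(10)

print("Finished:", strftime("%x %X"))
final_state = export_task.status()['state']
if final_state == 'COMPLETED':
  print('Export should now be visible in Drive.')
else:
  print('Export ended with state {}:'.format(final_state), export_task.status().get('error_message', ''))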
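
The `l_exceltime` and `v_exceltime` columns convert Earth Engine timestamps (milliseconds since 1970-01-01 UTC) to spreadsheet serial dates: divide by 1000 for seconds, by 86400 for days, then add 25569, the serial number of 1970-01-01 in the Excel 1900 date system. A standard-library sketch of the conversion and its inverse; `ms_to_excel` and `excel_to_datetime` are hypothetical helpers. Because the input is UTC, the resulting serial date is also in UTC.

```python
from datetime import datetime, timedelta, timezone

EXCEL_EPOCH_OFFSET = 25569  # Excel (1900 system) serial number of 1970-01-01
MS_PER_DAY = 1000.0 * 86400


def ms_to_excel(ms):
    """Epoch milliseconds (UTC) -> Excel serial date, as in l_exceltime/v_exceltime."""
    return ms / MS_PER_DAY + EXCEL_EPOCH_OFFSET


def excel_to_datetime(serial):
    """Excel serial date -> timezone-aware UTC datetime (valid after 1900-03-01)."""
    base = datetime(1899, 12, 30, tzinfo=timezone.utc)
    return base + timedelta(days=serial)


if __name__ == "__main__":
    ms = datetime(2020, 7, 15, 15, 30, tzinfo=timezone.utc).timestamp() * 1000
    serial = ms_to_excel(ms)
    print(serial)                     # about 44027.6458
    print(excel_to_datetime(serial))  # about 2020-07-15 15:30:00+00:00
```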

# Get Skin Surface Temperature for specific locations
This section extracts skin temperature data for specific locations.

# Pair Landsat with *insitu* data
This next section is optional, but allows you to compare any field data you may have to the Landsat data produced here. It reads the lake-specific CSV exported above (`<output_dir>/<file_prefix>_temp_stats.csv`), so make sure that export has finished and the file is visible in Drive before running it.<p>
In order for this to run successfully, your data must be in CSV format and have the following headers/columns:


*  `datetime`: date and time of each temperature reading, as a string matching the `strftime`/`strptime` format entered in `datetime_format` below
*  `lat_dd`: latitude in decimal degrees
*  `lon_dd`: longitude in decimal degrees
*  `temp_degC`: an integer or float number representing temperature in degrees Celsius
*  `depth_m`: depth of each reading in meters (used for the depth summary columns)
*  `location`: for in-lake statistics, a column with lake zone names (string format, no special characters); can be all the same for a single output or differentiated by lake zones/sensors

The file may have additional columns, but the above are mandatory fields. 
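
Before running the pairing steps, it can help to confirm that the uploaded file has every mandatory column. A minimal pandas sketch; `REQUIRED` and `missing_columns` are hypothetical helpers not used elsewhere in this notebook, and the inline CSV text is made-up sample data. Once `is_df` is loaded, `missing_columns(is_df)` should return an empty list.

```python
import io

import pandas as pd

# Columns this notebook reads from the insitu CSV (depth_m is read by the
# pairing step for the depth_avg/d_stdev columns)
REQUIRED = ["datetime", "lat_dd", "lon_dd", "temp_degC", "depth_m", "location"]


def missing_columns(df, required=REQUIRED):
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]


if __name__ == "__main__":
    sample = io.StringIO(
        "datetime,lat_dd,lon_dd,temp_degC,location\n"
        "2020-07-15 10:30:00,43.38,-72.01,21.4,buoy\n"
    )
    df = pd.read_csv(sample)
    print(missing_columns(df))  # ['depth_m']
```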


## Upload CSV of *insitu* data
Choose Option 1 or 2. For more ways to import data, see [Colab's guide to external data](https://colab.research.google.com/notebooks/io.ipynb)  

In [None]:
#@title Option 1: Link to raw CSV from Github
pasted_path = "https://raw.githubusercontent.com/cherrickunh/ids-ne-lakes/master/data/insitu_temp_data_v2021-05-17.csv?token=AHVGWKVD3LZI3UZOCDOI3ADAX66JW" #@param {type:"string"}
is_df = pd.read_csv(pasted_path)
print("dataframe created")

In [None]:
#@title Option 2: Upload data from your local file system to Colab
#@markdown (the uploaded file is only available for the current Colab runtime session)

In [None]:
#@markdown Run this cell to upload a file
uploaded = files.upload()
for fn in uploaded.keys():
  print("Paste this in the box below:\n/content/{name}".format(name=fn))

In [None]:
pasted_path = "/content/all_temp_data_v2021Apr21.csv" #@param {type:"string"}
is_df = pd.read_csv(pasted_path)
print("dataframe created")

## View data on a map

In [None]:
# Coming Soon

## Pair *insitu* data with Landsat data

In [None]:
#@markdown Enter the window of time (in minutes) around the Landsat flyover within which *insitu* data should be included. For example, `timewindow = 30` includes any data collected up to 30 minutes before or after the Landsat flyover (a 60-minute window).
timewindow = 30 #@param {type:"number"}
#@markdown What time zone is your data recorded in?
insitu_timezone = "US/Eastern" #@param {type:"string"}
#@markdown Does your data observe Daylight Saving Time? (Choose "no" if timestamps stay on standard time all year.)
dst_obs = "no" #@param ["yes","no"]
#@markdown How is your datetime formatted? (https://strftime.org/)
datetime_format = "%Y-%m-%d %H:%M:%S" #@param {type:"string"}

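outfile = (file_prefix + '_insitu_landsat_paired.csv')
# Make output_dir absolute before changing directory, so paths built from it
# later (e.g. when reading the temp_stats CSV) still point to the same folder
output_dir = os.path.abspath(output_dir)
cwd = output_dir
os.chdir(cwd)

print(f"Time window: +/- {timewindow} minutes")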
print(f"Time Zone: {insitu_timezone}")
if dst_obs=="no":
  print("insitu data does not observe DST")
else:
  print("insitu data does observe DST")
print("Output file:", os.path.join(cwd,outfile))

In [None]:
#@markdown (If you're unsure of how to format your timezone, run this cell for a list of options)
for each in pytz.all_timezones:
  print(each)

In [None]:
#@markdown The following code block manages differences in time between 
#@markdown UTC (Landsat data) and insitu data. It also applies the DST 
#@markdown setting chosen above.
min_ms = timewindow * 60 * 1000.0  # half-width of the matching window, in ms
insitu_time = pytz.timezone(insitu_timezone)
utc = pytz.timezone("UTC")
# Unix epoch (1970-01-01 00:00 UTC); Landsat system:time_start counts ms from here
epoch = utc.localize(datetime(1970, 1, 1))

def convert_datetime(dt, dtformat=datetime_format):
    """Convert an insitu timestamp string to milliseconds since the Unix epoch (UTC)."""
    dto = datetime.strptime(dt, dtformat)
    # localize() attaches the zone's rules, including DST where it applies
    aware = insitu_time.localize(dto)

    if dst_obs == "no":
        # Timestamps stay on standard time all year, so convert with the zone's
        # standard offset (total UTC offset minus any DST adjustment)
        std_offset = aware.utcoffset() - aware.dst()
        utc_dt = utc.localize(dto - std_offset)
    else:
        utc_dt = aware.astimezone(utc)

    return (utc_dt - epoch).total_seconds() * 1000.0

print("Date/time function defined", strftime("%x %X"))
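
The conversion rests on one rule: when loggers stay on standard time all year, UTC is the local reading minus the zone's standard offset, whether or not DST is in effect; when they follow local clock time, the summer offset applies instead. The standard-library sketch below hard-codes US Eastern offsets (UTC-5 standard, UTC-4 daylight) to show the arithmetic. It is an illustration with assumed fixed offsets, not a replacement for the `pytz`-based function, which looks offsets up in the time zone database; `to_epoch_ms` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for US/Eastern, for illustration only
EST = timezone(timedelta(hours=-5))  # standard time
EDT = timezone(timedelta(hours=-4))  # daylight saving time
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(local_str, tz, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a naive local timestamp, attach a fixed UTC offset, return epoch ms."""
    aware = datetime.strptime(local_str, fmt).replace(tzinfo=tz)
    return (aware - EPOCH).total_seconds() * 1000.0


if __name__ == "__main__":
    reading = "2020-07-15 10:30:00"     # a summer reading
    std_ms = to_epoch_ms(reading, EST)  # logger kept on standard time (dst_obs = "no")
    dst_ms = to_epoch_ms(reading, EDT)  # logger follows clock time (dst_obs = "yes")
    print(datetime.fromtimestamp(std_ms / 1000, timezone.utc))  # 2020-07-15 15:30:00+00:00
    print(datetime.fromtimestamp(dst_ms / 1000, timezone.utc))  # 2020-07-15 14:30:00+00:00
    print((std_ms - dst_ms) / 3.6e6)    # 1.0 (hours apart)
```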

In [None]:
#@markdown The following code block compares the datasets and generates statistics 
#@markdown for in-lake data that match Landsat flyovers.<br> 
#@markdown It also saves the insitu data points used for each Landsat scene to a 
#@markdown subfolder called 'ancillary' (created if needed).



print("Start:", strftime("%x %X"))
gee_csv = pd.read_csv((output_dir + '/' + file_prefix + '_temp_stats.csv'))
gee_csv = gee_csv.drop([".geo"], axis=1)
insitu_csv = is_df

insitu_csv["datetime_ms"] = insitu_csv["datetime"].apply(convert_datetime)

for index, each in gee_csv.iterrows():
    scene = gee_csv.loc[index, "system:index"].split("1_")[-1]
    landsattime = gee_csv.loc[index, "landsat_time"]

    same_time = insitu_csv[
        (abs(landsattime - insitu_csv.datetime_ms) <= min_ms)]

    gee_csv.loc[index, "scene"] = scene
    gee_csv.loc[index, "temp_avg"] = same_time["temp_degC"].mean()
    gee_csv.loc[index, "t_stdev"] = same_time["temp_degC"].std()
    gee_csv.loc[index, "depth_avg"] = same_time["depth_m"].mean()
    gee_csv.loc[index, "d_stdev"] = same_time["depth_m"].std()
    gee_csv.loc[index, "temp_med"] = same_time["temp_degC"].median()
    gee_csv.loc[index, "insitu_count"] = same_time.shape[0]

    site_stats = same_time.groupby(['location'])['temp_degC'].agg(['median', 'mean', 'std', 'count'])

    sites = site_stats.axes[0]
    stats = site_stats.axes[1]

    for site in sites:
        for stat in stats:
            newcol = "{0}_{1}".format(str(site), str(stat))
            gee_csv.loc[index, newcol] = site_stats[stat][site].item()
    
    if same_time.shape[0] > 0:
        if not os.path.exists(os.path.join(cwd, 'ancillary')):
            os.makedirs(os.path.join(cwd, 'ancillary'))
        same_time.to_csv(os.path.join(cwd, 'ancillary', scene + ".csv"))

out_csv = gee_csv[gee_csv["insitu_count"] > 0]
out_csv.to_csv(os.path.join(cwd, outfile))
print("Finished:", strftime("%x %X"))
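
The pairing loop keeps every insitu reading whose timestamp lies within `min_ms` of the Landsat acquisition time (`abs(landsattime - datetime_ms) <= min_ms`), then summarizes those readings per scene and per `location`. A pandas sketch of the same selection on made-up data; `pair_readings` is a hypothetical helper, and the timestamps are plain epoch milliseconds.

```python
import pandas as pd


def pair_readings(landsat_ms, readings, window_min=30):
    """Summarize readings within +/- window_min minutes of landsat_ms, by location."""
    min_ms = window_min * 60 * 1000.0
    same_time = readings[(landsat_ms - readings["datetime_ms"]).abs() <= min_ms]
    return same_time.groupby("location")["temp_degC"].agg(["median", "mean", "std", "count"])


if __name__ == "__main__":
    t0 = 1_594_827_000_000  # hypothetical Landsat time_start (ms)
    minute = 60 * 1000
    readings = pd.DataFrame({
        "datetime_ms": [t0 - 45 * minute, t0 - 10 * minute, t0 + 5 * minute, t0 + 30 * minute],
        "temp_degC": [20.0, 21.0, 22.0, 23.0],
        "location": ["north", "north", "south", "south"],
    })
    print(pair_readings(t0, readings))
    # north keeps 1 reading (-10 min); south keeps 2 (+5 and +30 min, edge included)
```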