# Notebook to drive Vegetation Edge extraction from Satellite Images
This notebook is written in Python. All of the code has already been written; the notebook guides you through adapting the analysis to your own area of interest and then running it.

**To run a code block, click in a cell, hold down Shift, and press Enter.** An asterisk in square brackets `In [*]:` will appear while the code is being executed, and this will change to a number `In [1]:` when the code is finished. *The order in which you execute the code blocks matters: they must be run in sequence.*

Inside blocks of Python code there are comments, indicated by lines that start with `#`. These lines are not executed; they describe what the code is doing to help you follow along and troubleshoot.

Before we get started, we need to tell Python to import the tools we want to use (these are called modules):

In [None]:
# Imports modules
import os, sys, glob, pickle, warnings, matplotlib, ee
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import geopandas as gpd
from Toolshed import Download, Toolbox, VegetationLine, Plotting, Transects

# Initialise plotting environment and earth engine
warnings.filterwarnings("ignore")
matplotlib.use('Qt5Agg')
plt.ion()
sns.set()
ee.Initialize()
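
If the import cell fails, it can help to see which modules are missing before going any further. The sketch below is not part of the toolkit; `missing_modules` is a hypothetical helper that uses only the Python standard library to report which of the named top-level modules cannot be found.

```python
import importlib.util

def missing_modules(names):
    """Return the names in `names` that Python cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Third-party modules imported by this notebook
required = ["numpy", "matplotlib", "seaborn", "geopandas", "ee"]
print("Missing modules:", missing_modules(required) or "none")
```

Any module listed as missing needs to be installed in the environment that runs this notebook.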

## Name your project
Start by naming your project. We'll then set up a folder structure to store the new data, files and outputs created by the analysis.

In [None]:
# DEFINE YOUR PROJECT NAME HERE e.g. based on your chosen site
sitename = 'SITENAME'

# directory where the data will be stored
filepath = os.path.join(os.getcwd(), 'Data')
os.makedirs(filepath, exist_ok=True)  # does nothing if the folder already exists

# directory where outputs will be stored
direc = os.path.join(filepath, sitename)
os.makedirs(direc, exist_ok=True)
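
The same folder structure can also be built with `pathlib`. This is a minimal sketch, not the notebook's own code: `make_project_dirs` is a hypothetical helper, and it is demonstrated in a temporary directory so that nothing is written to your project.

```python
import tempfile
from pathlib import Path

def make_project_dirs(root, sitename):
    """Create <root>/Data/<sitename> if needed and return both folder paths."""
    data_dir = Path(root) / "Data"
    site_dir = data_dir / sitename
    # parents=True also creates Data/; exist_ok=True makes re-running safe
    site_dir.mkdir(parents=True, exist_ok=True)
    return data_dir, site_dir

with tempfile.TemporaryDirectory() as tmp:
    data_dir, site_dir = make_project_dirs(tmp, "SITENAME")
    print(site_dir.is_dir())
```

Because the call is safe to repeat, the cell can be re-run without checking first whether the folders exist.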

## Define Area of Interest
The Area of Interest (AOI) can be defined using a pre-existing shapefile (e.g. a box drawn in a GIS). If you use a shapefile, it should contain only one singlepart shape and record (support for iterating over multiple shapes may be added in the future). Alternatively, you can provide the coordinates of the four corners of a bounding box. The shapefile or coordinates should be given in latitudes and longitudes (i.e. WGS 1984, EPSG code 4326).

In [None]:
# shapefile name here if you have one, this should only contain 1 single part shape/record
AOIfilename = "../Musselburgh/AOI.shp"
BLfilename = "../Musselburgh/Baseline.shp"

# Check if it exists and read the shape if so
if Path(AOIfilename).exists():
    AOI = gpd.read_file(AOIfilename)
    if not AOI.crs == "epsg:4326":
        sys.exit("Wrong Coordinate System")
    
    # get minimum bounding box
    lonmin, latmin, lonmax, latmax = AOI.bounds.iloc[0]    

else:
    print("No file found")
    # Define AOI using coordinates of a rectangle
    # The points represent the corners of a bounding box that go around your site
    lonmin, lonmax = -2.84869, -2.79878
    latmin, latmax = 56.32641, 56.39814

# setup an AOI object
UTM_epsg = int(Toolbox.get_UTMepsg_from_wgs((latmin+latmax)/2, (lonmin+lonmax)/2))
polygon, point = Toolbox.AOI(lonmin, lonmax, latmin, latmax, sitename, UTM_epsg)

# it's recommended to convert the polygon to the smallest rectangle (sides parallel to coordinate axes)
### why not just call this at the end of the AOI function then?
polygon = Toolbox.smallest_rectangle(polygon)
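
To see where a UTM EPSG code comes from, the sketch below applies the standard UTM zone formula to the centre of the fallback bounding box. It is an illustration only: `utm_epsg_from_wgs84` is a hypothetical function, it is not claimed to match how `Toolbox.get_UTMepsg_from_wgs` works internally, and it ignores the special zone boundaries around Norway and Svalbard.

```python
import math

def utm_epsg_from_wgs84(lat, lon):
    """Return the EPSG code of the standard UTM zone containing (lat, lon)."""
    # UTM zones are 6 degrees wide and numbered 1-60 eastwards from 180 W
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1
    # EPSG 326xx covers the northern hemisphere, 327xx the southern
    base = 32600 if lat >= 0 else 32700
    return base + zone

lonmin, lonmax = -2.84869, -2.79878
latmin, latmax = 56.32641, 56.39814
print(utm_epsg_from_wgs84((latmin + latmax) / 2, (lonmin + lonmax) / 2))  # 32630
```

For the example coordinates this gives UTM Zone 30N.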

## Image settings
In order to analyse a timeseries of satellite images, we need to define a range of time over which to perform the analysis. We also need to define which satellites we wish to work with, where `L5` is Landsat 5, `L8` is Landsat 8, and `S2` is Sentinel 2.

In [None]:
# Image Settings
# date range
StartDate = '2020-01-01'
EndDate = '2023-01-01'
dates = [StartDate, EndDate]
if len(dates)>2:
    daterange='no'
else:
    daterange='yes'
years = list(Toolbox.daterange(datetime.strptime(dates[0],'%Y-%m-%d'), datetime.strptime(dates[-1],'%Y-%m-%d')))

# satellite missions
# Input a list containing any or all of 'L5', 'L8', 'S2'
sat_list = ['L5','L8','S2']

projection_epsg = 27700 # OSGB 1936 # THIS WILL ALSO BE OBSOLETE?
# image_epsg = 32630 # UTM Zone 30N THIS IS NOW OBSOLETE AS DEFINED ABOVE

# put all the inputs into a dictionary
inputs = {'polygon': polygon, 'dates': dates, 'daterange':daterange, 'sat_list': sat_list, 'sitename': sitename, 'filepath':filepath}
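
A typo in a date or satellite name is easier to catch here than later in the run. The sketch below is not part of the toolkit: `check_image_settings` and `KNOWN_SATS` are hypothetical names, and the check assumes dates written as `YYYY-MM-DD` and the three satellite codes listed above.

```python
from datetime import datetime

KNOWN_SATS = {"L5", "L8", "S2"}

def check_image_settings(dates, sat_list):
    """Raise ValueError if the dates or satellite names look wrong."""
    # strptime itself raises ValueError for a badly formatted date
    parsed = [datetime.strptime(d, "%Y-%m-%d") for d in dates]
    if len(parsed) < 2:
        raise ValueError("Provide at least a start date and an end date")
    if parsed != sorted(parsed):
        raise ValueError("Dates must be in chronological order")
    unknown = set(sat_list) - KNOWN_SATS
    if unknown:
        raise ValueError(f"Unknown satellite names: {sorted(unknown)}")
    return parsed

parsed = check_image_settings(["2020-01-01", "2023-01-01"], ["L5", "L8", "S2"])
print(parsed[0].year, parsed[-1].year)
```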


### Check image availability
Before we start processing any imagery, we'll check what images are available in the date range specified for the desired platforms. Note that the tool does not download any imagery since it is all processed in Google Earth Engine.

In [None]:
# before processing any imagery, check how many images are available for your inputs
Download.check_images_available(inputs)

In [None]:
# Image Metadata Retrieval
Sat = Toolbox.image_retrieval(inputs)
metadata = Toolbox.metadata_collection(inputs, Sat)

In [None]:
# Vegetation Edge Settings

BasePath = 'Data/' + sitename + '/Veglines'

if os.path.isdir(BasePath) is False:
    os.mkdir(BasePath)

settings = {
    
    # General parameters:
    'cloud_thresh': 0.5,        # threshold on maximum cloud cover
    'output_epsg': UTM_epsg,  # epsg code of spatial reference system desired for the output   
    'wetdry':True,              # extract wet-dry boundary as well as veg
    
    # Quality control:
    'check_detection': False,    # if True, shows each shoreline detection to the user for validation
    'adjust_detection': False,  # if True, allows user to adjust the position of each shoreline by changing the threshold
    'save_figure': False,        # if True, saves a figure showing the mapped shoreline for each image
    
    # [ONLY FOR ADVANCED USERS] shoreline detection parameters:
    'min_beach_area': 200,     # minimum area (in metres^2) for an object to be labelled as a beach
    'buffer_size': 250,         # radius (in metres) for buffer around sandy pixels considered in the shoreline detection
    'min_length_sl': 500,       # minimum length (in metres) of shoreline perimeter to be valid
    'cloud_mask_issue': False,  # switch this parameter to True if sand pixels are masked (in black) on many images  
    
    # add the inputs defined previously
    'inputs': inputs,
    'projection_epsg': projection_epsg,
    'year_list': years
}


### Reference Shoreline
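The reference shoreline is loaded from the baseline shapefile defined earlier (`BLfilename`) and added to the settings, together with its EPSG code and the distance (in metres) by which the line is buffered.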

In [None]:
# Vegetation Edge Reference Line Load-In
referenceLine, ref_epsg = Toolbox.ProcessRefline(BLfilename,settings)
DF = gpd.read_file(BLfilename)

# update settings with reference line info
settings['reference_shoreline'] = referenceLine
settings['ref_epsg'] = ref_epsg
settings['max_dist_ref'] = 250 # Distance to buffer reference line by (this is in metres)
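
Buffering a line by a distance means taking every location that lies within that distance of the line. The sketch below illustrates the idea with plain Python in planar (metre) coordinates. It is not the toolkit's implementation; `point_segment_distance` and `within_buffer` are hypothetical helpers, and how the toolkit uses the buffer is not shown here.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to its ends
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(p, line, max_dist):
    """True if p lies within max_dist of any segment of the polyline."""
    return any(point_segment_distance(p, a, b) <= max_dist
               for a, b in zip(line, line[1:]))

reference = [(0, 0), (1000, 0)]                    # made-up reference line
print(within_buffer((500, 200), reference, 250))   # inside a 250 m buffer
print(within_buffer((500, 300), reference, 250))   # outside it
```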

In [None]:
# Vegetation Line Extraction

"""
OPTION 1: Run extraction tool and return output dates, lines, filenames and 
image properties.
"""

clf_model = 'L5L8S2_SAVI_MLPClassifier_Veg.pkl' 
output, output_latlon, output_proj = VegetationLine.extract_veglines(metadata, settings, polygon, dates, clf_model)

### can't run this currently due to lack of tides.

In [None]:
# Vegetation Line Extraction Load-In

"""
OPTION 2: Load in pre-existing output dates, lines, filenames and image properties.
"""

SiteFilepath = os.path.join(inputs['filepath'], sitename)
with open(os.path.join(SiteFilepath, sitename + '_output.pkl'), 'rb') as f:
    output = pickle.load(f)
with open(os.path.join(SiteFilepath, sitename + '_output_latlon.pkl'), 'rb') as f:
    output_latlon = pickle.load(f)
with open(os.path.join(SiteFilepath, sitename + '_output_proj.pkl'), 'rb') as f:
    output_proj = pickle.load(f)
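
The three load blocks above follow one pattern, which could be wrapped in a pair of small helpers. This is a sketch only: `save_pickle` and `load_pickle` are hypothetical names, and the file is written to a temporary folder rather than to your project. Only unpickle files that you created yourself or otherwise trust, because loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

def save_pickle(obj, path):
    """Write obj to path in pickle format."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_pickle(path):
    """Read back an object previously written with save_pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "SITENAME_output.pkl")
    save_pickle({"dates": ["2020-01-01"]}, path)
    print(load_pickle(path))
```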
    

In [None]:
# Remove duplicate date lines (images taken on the same date by the same satellite)

output = Toolbox.remove_duplicates(output) 
output_latlon = Toolbox.remove_duplicates(output_latlon)
output_proj = Toolbox.remove_duplicates(output_proj)
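
As an illustration of what removing duplicates involves, the sketch below keeps the first entry for each date and satellite pair. It assumes, purely for the example, that the output is a dictionary of equal-length lists with `dates` and `satname` keys; `drop_duplicate_dates` is a hypothetical function and is not claimed to match `Toolbox.remove_duplicates`.

```python
def drop_duplicate_dates(output):
    """Keep the first entry for each (date, satellite) pair in a dict of parallel lists."""
    seen = set()
    keep = []  # indices of the entries to retain
    for i, key in enumerate(zip(output["dates"], output["satname"])):
        if key not in seen:
            seen.add(key)
            keep.append(i)
    # Apply the same selection to every list so they stay aligned
    return {name: [values[i] for i in keep] for name, values in output.items()}

example = {
    "dates": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-02-01"],
    "satname": ["S2", "S2", "L8", "S2"],
    "filename": ["a.tif", "b.tif", "c.tif", "d.tif"],
}
print(drop_duplicate_dates(example)["filename"])
```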

In [None]:
# Save the veglines as shapefiles locally

Toolbox.SaveConvShapefiles(output, BasePath, sitename, settings['projection_epsg'])
if settings['wetdry']:
    Toolbox.SaveConvShapefiles_Water(output, BasePath, sitename, settings['projection_epsg'])

## Transect-based Analyses

In [None]:
# Create shore-normal transects
SmoothingWindowSize = 21 
NoSmooths = 100
TransectSpacing = 10
DistanceInland = 100
DistanceOffshore = 350

BasePath = 'Data/' + sitename + '/Veglines'  # must match the folder the shapefiles were saved to (names are case sensitive on some systems)
VeglineShp = glob.glob(BasePath+'/*veglines.shp')
VeglineGDF = gpd.read_file(VeglineShp[0])
WaterlineShp = glob.glob(BasePath+'/*waterlines.shp')
WaterlineGDF = gpd.read_file(WaterlineShp[0])

# Produce Transects for the reference line
TransectSpec =  os.path.join(BasePath, sitename+'_Transects.shp')

if os.path.isfile(TransectSpec) is False:
    # the reference line shapefile is the baseline defined in the AOI cell (BLfilename)
    TransectGDF = Transects.ProduceTransects(SmoothingWindowSize, NoSmooths, TransectSpacing, DistanceInland, DistanceOffshore, settings['output_epsg'], sitename, BasePath, BLfilename)
else:
    print('Transects already exist and were loaded')
    TransectGDF = gpd.read_file(TransectSpec)
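
A shore-normal transect is a line drawn at right angles to the local direction of the reference line. The sketch below builds one through the midpoint of a single line segment, using the inland and offshore distances set above. It is a geometric illustration only: `shore_normal_transect` is a hypothetical function, it does no smoothing, and it assumes that the left-hand side of the line (in its digitising direction) is inland.

```python
import math

def shore_normal_transect(p0, p1, dist_inland, dist_offshore):
    """Return (inland_end, offshore_end) of a transect perpendicular to segment p0-p1."""
    (x0, y0), (x1, y1) = p0, p1
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        raise ValueError("Segment has zero length")
    # Unit normal pointing to the left of the direction of travel
    nx, ny = -(y1 - y0) / length, (x1 - x0) / length
    # The transect passes through the midpoint of the segment
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    inland = (mx + nx * dist_inland, my + ny * dist_inland)
    offshore = (mx - nx * dist_offshore, my - ny * dist_offshore)
    return inland, offshore

inland, offshore = shore_normal_transect((0, 0), (10, 0), 100, 350)
print(inland, offshore)
```

With `DistanceInland = 100` and `DistanceOffshore = 350`, each transect built this way would be 450 m long.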

In [None]:
# Create (or load) intersections with sat and validation lines per transect

if os.path.isfile(os.path.join(filepath, sitename, sitename + '_transect_intersects.pkl')):
    print('TransectDict exists and was loaded')
    with open(os.path.join(filepath , sitename, sitename + '_transect_intersects.pkl'), 'rb') as f:
        TransectDict, TransectInterGDF = pickle.load(f)
else:
    # Get intersections
    TransectDict = Transects.GetIntersections(BasePath, TransectGDF, VeglineGDF)
    # Save newly intersected transects as shapefile
    TransectInterGDF = Transects.SaveIntersections(TransectDict, VeglineGDF, BasePath, sitename, settings['projection_epsg'])
    # Repopulate dict with intersection distances along transects normalised to transect midpoints
    TransectDict = Transects.CalculateChanges(TransectDict,TransectInterGDF)
    if settings['wetdry']:
        beachslope = 0.02 # tanBeta StAnd W
        # beachslope = 0.04 # tanBeta StAnE
        TransectDict = Transects.GetBeachWidth(BasePath, TransectGDF, TransectDict, WaterlineGDF, settings, output, beachslope)  
        TransectInterGDF = Transects.SaveWaterIntersections(TransectDict, WaterlineGDF, TransectInterGDF, BasePath, sitename, settings['projection_epsg'])
    
    with open(os.path.join(filepath , sitename, sitename + '_transect_intersects.pkl'), 'wb') as f:
        pickle.dump([TransectDict,TransectInterGDF], f)
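
The geometry behind this step can be sketched in plain Python: find where a transect crosses a segment of a vegetation line, then express that point as a distance along the transect measured from its midpoint. This is an illustration under assumptions, not the toolkit's code. `segment_intersection` and `distance_from_midpoint` are hypothetical helpers, and the sign convention (positive towards the end of the transect) is chosen only for the example.

```python
import math

def segment_intersection(a, b, c, d):
    """Return the point where segments a-b and c-d cross, or None if they do not."""
    (ax, ay), (bx, by), (cx, cy), (dx, dy) = a, b, c, d
    rx, ry = bx - ax, by - ay
    sx, sy = dx - cx, dy - cy
    denom = rx * sy - ry * sx
    if denom == 0:
        return None  # parallel or collinear segments
    # Fractions along a-b (t) and along c-d (u) at which the lines meet
    t = ((cx - ax) * sy - (cy - ay) * sx) / denom
    u = ((cx - ax) * ry - (cy - ay) * rx) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (ax + t * rx, ay + t * ry)
    return None

def distance_from_midpoint(start, end, point):
    """Signed distance of point along the transect start-end, measured from its midpoint."""
    length = math.hypot(end[0] - start[0], end[1] - start[1])
    ux, uy = (end[0] - start[0]) / length, (end[1] - start[1]) / length
    mx, my = (start[0] + end[0]) / 2, (start[1] + end[1]) / 2
    return (point[0] - mx) * ux + (point[1] - my) * uy

transect = ((0, -10), (0, 10))
hit = segment_intersection(transect[0], transect[1], (-5, 2), (5, 2))
print(hit, distance_from_midpoint(transect[0], transect[1], hit))
```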

## Validation

In [None]:
# Validation of veglines against pre-existing ground surveys shapefile

# Name of date column in validation shapefile (case sensitive!) 
DatesCol = 'Date'

ValidationShp = './Validation/StAndrews_Veg_Edge_combined_2007_2022_singlepart.shp'
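validpath = os.path.join(os.getcwd(), 'Data', sitename, 'validation')
os.makedirs(validpath, exist_ok=True)  # make sure the folder exists before the dictionary is saved to it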

if os.path.isfile(os.path.join(validpath, sitename + '_valid_dict.pkl')):
    print('ValidDict exists and was loaded')
    with open(os.path.join(validpath, sitename + '_valid_dict.pkl'), 'rb') as f:
        ValidDict = pickle.load(f)
else:
    ValidDict = Transects.ValidateSatIntersects(sitename, ValidationShp, DatesCol, TransectGDF, TransectDict)
    with open(os.path.join(validpath, sitename + '_valid_dict.pkl'), 'wb') as f:
        pickle.dump(ValidDict, f)


In [None]:
# Quantify errors between validation and satellite derived lines

# add tuples of the first and last transect IDs over which to quantify positional errors
TransectIDList = [(0,10),(50,100)] 

for TransectIDs in TransectIDList:
    Toolbox.QuantifyErrors(sitename, VeglineShp[0],'dates',ValidDict,TransectIDs)

## Plotting

In [None]:
# Create GIF of satellite images and related shorelines

Plotting.SatGIF(metadata,settings,output)


In [None]:
# Validation Plots

# add tuples of the first and last transect IDs over which to plot positional errors
TransectIDList = [(0,1741)] 

for TransectIDs in TransectIDList:
    PlotTitle = 'Accuracy of Transects ' + str(TransectIDs[0]) + ' to ' + str(TransectIDs[1])
    Plotting.SatViolin(sitename,VeglineShp[0],'dates',ValidDict,TransectIDs, PlotTitle)
    

In [None]:
# Weighted Peaks threshold values violin plot
sites = [sitename]
Plotting.ThresholdViolin(filepath, sites)


In [None]:
# Violin plot of validation vs satellite distances per satellite platform name
TransectIDs = (0,len(ValidDict['dates'])) # full site
Plotting.PlatformViolin(sitename, VeglineShp, 'satname', ValidDict, TransectIDs, 'Full Site Accuracy')


In [None]:
# Validation vs satellite cross-shore distance through time
TransectIDs = [0, 10, 50]
for TransectID in TransectIDs:
    Plotting.ValidTimeseries(sitename, ValidDict, TransectID)


In [None]:
# Satellite cross-shore distance through time
TransectIDs = [289,1575]
for TransectID in TransectIDs:
    DateRange = [0,len(TransectDict['dates'][TransectID])] # integers to decide where in time you want to plot
    Plotting.VegTimeseries(sitename, TransectDict, TransectID, DateRange)