<font size="5"><center> <b>Sandpyper: sandy beaches SfM-UAV analysis tools</b></center></font>
<font size="4"><center> <b> Example 4 - Hotspot analysis </b></center> <br>

    
<center><img src="images/banner.png" width="80%"  /></center>

<font face="Calibri">
<br>
<font size="5"> <b>Local Indicator of Spatial Association: hotspot analysis and transient states</b></font>

<br>
<font size="4"> <b> Nicolas Pucino; PhD Student @ Deakin University, Australia </b> <br>

<font size="3">This notebook illustrates how to use Local Moran's I, a popular local indicator of spatial association (LISA), to capture the most significant areas of change on the beachface at every timestep. This lets us focus on the most important areas of change while disregarding spatial outliers, and then classify the magnitude of change into transient states, which will be used to model behaviour at the location and transect scales. <br>

<b>This notebook covers the following concepts:</b>

- Location-level Local Moran's I.
- Magnitude classification.
- Transient states definition.
</font>


</font>

In [2]:
from pysal.viz.mapclassify import UserDefined
import pysal.viz.mapclassify as mc

import pandas as pd
import numpy as np
from tqdm import tqdm

from sandpyper.hotspot import LISA_site_level

pd.options.mode.chained_assignment = None  # default='warn'

## Load multitemporal dataset (dh)

In [3]:
dh_file_path=r"C:\my_packages\sandpyper\tests\test_outputs\dh_data.csv"

## Compute location-level hotspots

In [4]:
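# CRS of each location (EPSG:32754 = WGS 84 / UTM zone 54S; EPSG:32755 = WGS 84 / UTM zone 55S)
# Note: recent pyproj versions deprecate the {'init': ...} form and emit a FutureWarning
crs_dict_string={"mar":{'init': 'epsg:32754'},
                 "leo":{'init': 'epsg:32755'}}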

The function __LISA_site_level__ performs a Local Moran's I analysis with False Discovery Rate (FDR) correction on all the elevation change points of each survey in the dh_df table.
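For reference, a minimal sketch of the Benjamini-Hochberg procedure, the standard way to derive an FDR-controlled p-value threshold from many simultaneous tests. This is a generic NumPy illustration, not sandpyper's or esda's code; the helper name `bh_fdr_threshold` is hypothetical and the default `alpha=0.05` is an assumption.

```python
import numpy as np


def bh_fdr_threshold(p_values, alpha=0.05):
    """Benjamini-Hochberg p-value threshold (hypothetical helper).

    Tests with p <= threshold are declared significant while controlling the
    false discovery rate at `alpha`. Returns 0.0 if no test passes.
    """
    p = np.sort(np.asarray(p_values, dtype=float))  # ascending p-values
    n = p.size
    # BH critical values: (rank / n) * alpha for rank = 1..n
    critical = np.arange(1, n + 1) / n * alpha
    passing = np.nonzero(p <= critical)[0]
    if passing.size == 0:
        return 0.0
    # Critical value of the largest rank whose p-value is below it
    return critical[passing[-1]]


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bh_fdr_threshold(p))  # only the two smallest p-values pass
```

Points whose pseudo p-value falls at or below such a threshold would count as significant; the single `lisa_fdr` value reported per survey in the output table presumably plays this role.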

This function can use KNN-based, inverse distance weighted or binary distance-based spatial weight matrices. In this example, we model spatial relationships with a row-standardised binary distance-band weight matrix with a neighbourhood radius of 35 m, which includes the two adjacent transects and some points from obliques without reaching too far from the focal point.
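A minimal sketch of a row-standardised binary distance-band weight matrix, assuming projected coordinates in metres (as with the UTM CRSs above) and using plain NumPy rather than a spatial-weights library. The helper `distance_band_weights` is hypothetical; for real datasets, `libpysal.weights.DistanceBand` is the usual tool and scales better than this dense n × n version.

```python
import numpy as np


def distance_band_weights(xy, threshold=35.0):
    """Row-standardised binary distance-band spatial weights (hypothetical helper).

    w_ij = 1 when points i and j (i != j) are within `threshold` metres,
    0 otherwise; each row is then divided by its sum so it adds up to 1.
    Points with no neighbours keep an all-zero row.
    """
    xy = np.asarray(xy, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]    # pairwise coordinate differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # (n, n) Euclidean distances
    w = (dist <= threshold).astype(float)
    np.fill_diagonal(w, 0.0)                  # a point is not its own neighbour
    row_sums = w.sum(axis=1, keepdims=True)
    # Row-standardise, leaving isolated points with zero rows
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)


# Points 20 m and 30 m apart are neighbours; the one at 200 m is isolated
pts = [[0, 0], [20, 0], [50, 0], [200, 0]]
print(distance_band_weights(pts, threshold=35.0))
```

Row standardisation makes each point's spatial lag the plain average of its neighbours' values, so points with many neighbours do not dominate.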

We obtain a dataset containing the FDR threshold, the local Moran's I values, the pseudo p-values and z-values, and the quadrant in which each observation falls in Moran's scatterplot: High-High (HH, hotspot cluster), High-Low (HL, spatial outlier), Low-Low (LL, coldspot cluster) and Low-High (LH, spatial outlier).

We are interested in HH and LL clusters, which we generally call hotspots, and discard LH and HL points as spatial outliers.
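To make the quadrant definitions concrete, here is a sketch of Local Moran's I in its classic Anselin (1995) form, $I_i = \frac{z_i}{m_2}\sum_j w_{ij} z_j$ with $z_i = y_i - \bar{y}$ and $m_2 = \sum_i z_i^2 / n$, together with the scatterplot quadrant of each point. The helper `local_morans_i` is hypothetical; the 1 = HH, 2 = LH, 3 = LL, 4 = HL coding follows the PySAL/esda convention, and library implementations may apply a slightly different normalisation to $I_i$.

```python
import numpy as np


def local_morans_i(y, w):
    """Local Moran's I (Anselin 1995 form) and Moran scatterplot quadrants.

    y : (n,) observed values, e.g. elevation change dh
    w : (n, n) row-standardised spatial weights
    Returns (I, q), where q is coded 1 = HH, 2 = LH, 3 = LL, 4 = HL.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    z = y - y.mean()               # deviations from the global mean
    m2 = (z ** 2).sum() / z.size   # second moment of the deviations
    lag = w @ z                    # spatial lag: weighted mean of neighbours' z
    I = z / m2 * lag
    high = z > 0                   # point above the mean
    high_lag = lag > 0             # neighbours above the mean
    q = np.where(high & high_lag, 1,
                 np.where(~high & high_lag, 2,
                          np.where(~high & ~high_lag, 3, 4)))
    return I, q


# Four points along a line, each linked to its immediate neighbours
w = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
I, q = local_morans_i([6.0, 4.0, 1.0, 1.0], w)
print(q)  # [1 1 3 3]: a high cluster next to a low cluster
```

Points with q = 1 or 3 and a significant pseudo p-value are the hotspots and coldspots kept later on; q = 2 or 4 marks spatial outliers.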

In [5]:
distance_value=35  # neighbourhood radius (m): includes the two adjacent transects and some obliques, but not the transects beyond them
k_value=0  # number of neighbours, only relevant when mode="knn" (not used here)
mode="distance"  # select from "knn", "idw" or "distance"

In [6]:
%%time

lisa_df=LISA_site_level(dh_path=dh_file_path,
                        mode=mode,
                        distance_value=distance_value,
                        unique_field="geometry",
                        crs_dict_string=crs_dict_string)

lisa_df.tail()

Working on mar

Working on leo

Wall time: 50.5 s


Unnamed: 0,geometry,location,tr_id,distance,dt,date_pre,date_post,z_pre,z_post,dh,lisa_fdr,lisa_q,lisa_I,lisa_n_val_obs,lisa_opt_dist,lisa_dist_mode,lisa_p_sim,lisa_z_sim,lisa_z,decay
19862,POINT (300071.060 5773184.013),leo,18,48.0,dt_0,20180606,20180713,-0.435838,-0.624926,-0.189088,0.008598,2,-0.031762,1448,35,distance_band,0.239,-0.598137,-0.419606,0
19863,POINT (300072.023 5773184.284),leo,18,49.0,dt_0,20180606,20180713,-0.43108,-0.642019,-0.210939,0.008598,2,-0.035211,1448,35,distance_band,0.246,-0.57414,-0.461142,0
19864,POINT (300075.506 5773164.488),leo,17,47.0,dt_0,20180606,20180713,-0.316839,-0.478775,-0.161936,0.008598,3,0.010098,1448,35,distance_band,0.478,0.137724,-0.367991,0
19865,POINT (300076.469 5773164.759),leo,17,48.0,dt_0,20180606,20180713,-0.459795,-0.47709,-0.017296,0.008598,3,0.001456,1448,35,distance_band,0.49,0.086071,-0.093039,0
19866,POINT (300077.432 5773165.029),leo,17,49.0,dt_0,20180606,20180713,-0.557283,-0.493576,0.063707,0.008598,4,-0.0008,1448,35,distance_band,0.491,-0.118921,0.060942,0


In [7]:
#lisa_df.to_csv(r"C:\my_packages\doc_data\profiles\lisa_location.csv")

## Classify dh magnitudes and create classes of elevation changes (transient states)

Now that we have statistically significant hotspots of change, we can use those points to capture the most relevant areas of beachface change, which we then use to model behaviour.

However, there is an important caveat that applies to our example.<br>
Our beachfaces are narrow, and we only use reliable points that pass multiple levels of filtering (LoD, beachface area, sand-only), which significantly reduces the number of usable points in each timestep.
Because in the next step we will compute the Beachface Cluster Dynamics (BCD) indices at both the location and transect scales, we need to create two different dataframes with (slightly different) transient-state classes:

* hotspot-filtered: we discard spatial outliers to capture location-scale behaviour. Used for location-scale BCDs.
* full: we disregard the hotspot classification and retain all points beyond the LoD. Used for transect-scale BCDs.

This is necessary to ensure that each transect has enough points to model its behaviour. Moreover, the hotspot classification was run at the location scale, so the HH and LL hotspots are only relevant for location-scale BCD analysis.

In [8]:
lisa_df=pd.read_csv(r"C:\my_packages\doc_data\hotspots\lisa_location.csv")

### Location-scale

Use only hotspots beyond LoD

In [9]:
# keep significant HH (q=1) and LL (q=3) clusters, discarding spatial outliers (LH, HL)
sig_hhll=lisa_df.query("lisa_p_sim <= 0.001 & lisa_q in [1,3]")
# discard points within the global limit of detection (LoD, ±5 cm)
sig_hhll=sig_hhll[~sig_hhll['dh'].between(-0.05, 0.05)]
sig_hhll=sig_hhll.reset_index(drop=True)

In [10]:
sig_hhll.head()

Unnamed: 0.1,Unnamed: 0,geometry,location,tr_id,distance,dt,date_pre,date_post,z_pre,z_post,...,lisa_fdr,lisa_q,lisa_I,lisa_n_val_obs,lisa_opt_dist,lisa_dist_mode,lisa_p_sim,lisa_z_sim,lisa_z,decay
0,79,POINT (299913.6160173207 5773633.211653235),leo,66,25.9,dt_5,2019-03-28,2019-07-31,0.254045,0.063604,...,0.048329,3,0.06488,1526,35,distance_band,0.001,2.855193,-0.324401,0
1,82,POINT (299913.3229148819 5773633.14769262),leo,66,26.2,dt_5,2019-03-28,2019-07-31,0.28215,0.101436,...,0.048329,3,0.052581,1526,35,distance_band,0.001,2.782864,-0.262422,0
2,87,POINT (299912.8344108172 5773633.041091593),leo,66,26.7,dt_5,2019-03-28,2019-07-31,0.334896,0.155535,...,0.048329,3,0.050868,1526,35,distance_band,0.001,2.748782,-0.253808,0
3,94,POINT (299912.1505051267 5773632.891850157),leo,66,27.4,dt_5,2019-03-28,2019-07-31,0.422973,0.256882,...,0.048329,3,0.034007,1526,35,distance_band,0.001,2.681361,-0.169253,0
4,110,POINT (299910.1964888682 5773632.465446051),leo,66,29.4,dt_5,2019-03-28,2019-07-31,0.664177,0.523974,...,0.048329,3,0.000868,1526,35,distance_band,0.001,2.835987,-0.004299,0


Separate the erosion and deposition clusters.

In [11]:
sig_hhll_ero=sig_hhll[sig_hhll.dh < 0]
sig_hhll_depo=sig_hhll[sig_hhll.dh > 0]

In [12]:
sig_hhll_ero.head()

Unnamed: 0.1,Unnamed: 0,geometry,location,tr_id,distance,dt,date_pre,date_post,z_pre,z_post,...,lisa_fdr,lisa_q,lisa_I,lisa_n_val_obs,lisa_opt_dist,lisa_dist_mode,lisa_p_sim,lisa_z_sim,lisa_z,decay
0,79,POINT (299913.6160173207 5773633.211653235),leo,66,25.9,dt_5,2019-03-28,2019-07-31,0.254045,0.063604,...,0.048329,3,0.06488,1526,35,distance_band,0.001,2.855193,-0.324401,0
1,82,POINT (299913.3229148819 5773633.14769262),leo,66,26.2,dt_5,2019-03-28,2019-07-31,0.28215,0.101436,...,0.048329,3,0.052581,1526,35,distance_band,0.001,2.782864,-0.262422,0
2,87,POINT (299912.8344108172 5773633.041091593),leo,66,26.7,dt_5,2019-03-28,2019-07-31,0.334896,0.155535,...,0.048329,3,0.050868,1526,35,distance_band,0.001,2.748782,-0.253808,0
3,94,POINT (299912.1505051267 5773632.891850157),leo,66,27.4,dt_5,2019-03-28,2019-07-31,0.422973,0.256882,...,0.048329,3,0.034007,1526,35,distance_band,0.001,2.681361,-0.169253,0
4,110,POINT (299910.1964888682 5773632.465446051),leo,66,29.4,dt_5,2019-03-28,2019-07-31,0.664177,0.523974,...,0.048329,3,0.000868,1526,35,distance_band,0.001,2.835987,-0.004299,0


In [13]:
sig_hhll_depo.head()

Unnamed: 0.1,Unnamed: 0,geometry,location,tr_id,distance,dt,date_pre,date_post,z_pre,z_post,...,lisa_fdr,lisa_q,lisa_I,lisa_n_val_obs,lisa_opt_dist,lisa_dist_mode,lisa_p_sim,lisa_z_sim,lisa_z,decay
694,1154,POINT (300010.6476322889 5773309.419492467),leo,49,21.7,dt_5,2019-03-28,2019-07-31,0.323719,0.392796,...,0.048329,1,1.274355,1526,35,distance_band,0.001,15.293572,1.329184,0
695,1155,POINT (300010.5545458604 5773309.382955953),leo,49,21.8,dt_5,2019-03-28,2019-07-31,0.333772,0.419623,...,0.048329,1,1.376144,1526,35,distance_band,0.001,15.741413,1.436061,0
696,1156,POINT (300010.4614594319 5773309.34641944),leo,49,21.9,dt_5,2019-03-28,2019-07-31,0.346146,0.443746,...,0.048329,1,1.447386,1526,35,distance_band,0.001,15.68725,1.510926,0
697,1157,POINT (300010.3683730034 5773309.309882926),leo,49,22.0,dt_5,2019-03-28,2019-07-31,0.354768,0.45819,...,0.048329,1,1.482669,1526,35,distance_band,0.001,15.896007,1.548023,0
698,1158,POINT (300010.2752865748 5773309.273346413),leo,49,22.1,dt_5,2019-03-28,2019-07-31,0.366984,0.471483,...,0.048329,1,1.489197,1526,35,distance_band,0.001,15.833367,1.554888,0


Use optimised Jenks-Caspall natural breaks to classify the absolute change
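A simplified sketch of the idea behind Jenks-Caspall natural breaks, assuming a quantile start and iterative reassignment of each value to the class with the nearest median; the mapclassify implementation may differ in details such as initialisation and tie handling. The helper `jenks_caspall_sketch` is hypothetical. It also computes the absolute deviation around class medians (ADCM), the fit statistic reported as `adcm` in the next cell, where lower values indicate tighter classes.

```python
import numpy as np


def jenks_caspall_sketch(y, k=5, max_iter=100):
    """Simplified Jenks-Caspall-style classification of 1-D data (hypothetical helper).

    Starts from k quantile classes, then repeatedly moves each value to the
    class with the nearest median until no value changes class.
    Returns the class upper bounds and the ADCM of the final classes.
    """
    y = np.asarray(y, dtype=float)
    # Initial classes: cut the data at the interior k-quantiles
    cuts = np.quantile(y, np.linspace(0, 1, k + 1)[1:-1])
    labels = np.searchsorted(cuts, y, side="left")
    for _ in range(max_iter):
        classes = np.unique(labels)  # ignore any empty class
        medians = np.array([np.median(y[labels == c]) for c in classes])
        # Reassign each value to the class with the closest median
        new_labels = np.abs(y[:, None] - medians[None, :]).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    classes = np.unique(labels)
    upper = np.array([y[labels == c].max() for c in classes])
    # ADCM: sum of absolute deviations from each class median
    adcm = sum(np.abs(y[labels == c] - np.median(y[labels == c])).sum()
               for c in classes)
    return upper, adcm


abs_dh = [0.1, 0.12, 0.11, 0.5, 0.52, 0.51, 2.0, 2.1, 2.05]  # hypothetical |dh| (m)
upper, adcm = jenks_caspall_sketch(abs_dh, k=3)
print(upper, adcm)
```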

In [14]:
absolute=np.abs(sig_hhll.dh)
bins_abs_JC=mc.JenksCaspall(absolute)
print(f"Fit of the classifier: {bins_abs_JC.adcm}")

bins_abs_JC

Fit of the classifier: 1106.8712480317358


              JenksCaspall              
 
Lower          Upper               Count
        x[i] <= 0.170               6568
0.170 < x[i] <= 0.324               5265
0.324 < x[i] <= 0.537               3744
0.537 < x[i] <= 1.213               2073
1.213 < x[i] <= 4.973                737

Based on the above, classify the dh values into magnitude of change classes, which will become transient states.
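The cell below passes upper class bounds to `UserDefined`. A minimal NumPy sketch of that binning, assuming (as we understand `UserDefined` to behave) that an extra top class is added when the data exceed the last bound, so class i holds values with `bins[i-1] < v <= bins[i]`. The helper name and sample values are hypothetical; the bounds and state labels are the ones used in this notebook.

```python
import numpy as np


def classify_with_upper_bounds(values, bins, labels):
    """Assign each value a class index and a transient-state tag (hypothetical helper).

    Class i holds values with bins[i-1] < v <= bins[i] (class 0: v <= bins[0]).
    If the data exceed the last bound, an extra top class is appended.
    """
    values = np.asarray(values, dtype=float)
    bounds = np.asarray(bins, dtype=float)
    if values.max() > bounds[-1]:
        bounds = np.append(bounds, values.max())  # catch-all top class
    classes = np.searchsorted(bounds, values, side="left")
    tags = [labels[int(c)] for c in classes]
    return classes, tags


states_ero = {0: "ee", 1: "he", 2: "me", 3: "se", 4: "ue"}
states_depo = {0: "ud", 1: "sd", 2: "md", 3: "hd", 4: "ed"}

# Hypothetical dh values (m)
ero_classes, ero_tags = classify_with_upper_bounds(
    [-1.5, -0.8, -0.4, -0.19, -0.06], [-1.21, -0.54, -0.32, -0.17], states_ero)
depo_classes, depo_tags = classify_with_upper_bounds(
    [0.06, 0.2, 1.5], [0.17, 0.32, 0.54, 1.21], states_depo)
print(ero_tags)   # ['ee', 'he', 'me', 'se', 'ue']
print(depo_tags)  # ['ud', 'sd', 'ed']
```

Because erosion values are negative, the most negative class (index 0) is the most extreme, which is why the erosion labels run in the opposite order to the deposition labels.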

In [15]:
# Hotspot-filtered depositional and erosional classes:
# upper class bounds (m) rounded from the Jenks-Caspall breaks above
bins_depo = [0.17, 0.32, 0.54, 1.21]
bins_ero = [-1.21, -0.54, -0.32, -0.17]

bins_ero_JC = UserDefined(sig_hhll_ero.dh, bins_ero)
bins_depo_JC = UserDefined(sig_hhll_depo.dh, bins_depo)

class_erosion=bins_ero_JC.yb.tolist()
class_deposition=bins_depo_JC.yb.tolist()

In [18]:
# assign every bin to the right label

states_ero={0:"ee",1:"he",2:"me",3:"se",4:"ue"}   
states_depo={0:"ud",1:"sd",2:"md",3:"hd",4:"ed"}

tags_erosion=[states_ero[i] for i in class_erosion]
tags_deposition=[states_depo[i] for i in class_deposition]

sig_hhll_ero["jc_bin"]=class_erosion
sig_hhll_depo["jc_bin"]=class_deposition

sig_hhll_ero["markov_tag"]=tags_erosion
sig_hhll_depo["markov_tag"]=tags_deposition

labelled_hotspot_df=pd.concat([sig_hhll_ero,sig_hhll_depo],ignore_index=False)
labelled_hotspot_df.head()

Unnamed: 0.1,Unnamed: 0,geometry,location,tr_id,distance,dt,date_pre,date_post,z_pre,z_post,...,lisa_I,lisa_n_val_obs,lisa_opt_dist,lisa_dist_mode,lisa_p_sim,lisa_z_sim,lisa_z,decay,jc_bin,markov_tag
0,79,POINT (299913.6160173207 5773633.211653235),leo,66,25.9,dt_5,2019-03-28,2019-07-31,0.254045,0.063604,...,0.06488,1526,35,distance_band,0.001,2.855193,-0.324401,0,3,se
1,82,POINT (299913.3229148819 5773633.14769262),leo,66,26.2,dt_5,2019-03-28,2019-07-31,0.28215,0.101436,...,0.052581,1526,35,distance_band,0.001,2.782864,-0.262422,0,3,se
2,87,POINT (299912.8344108172 5773633.041091593),leo,66,26.7,dt_5,2019-03-28,2019-07-31,0.334896,0.155535,...,0.050868,1526,35,distance_band,0.001,2.748782,-0.253808,0,3,se
3,94,POINT (299912.1505051267 5773632.891850157),leo,66,27.4,dt_5,2019-03-28,2019-07-31,0.422973,0.256882,...,0.034007,1526,35,distance_band,0.001,2.681361,-0.169253,0,4,ue
4,110,POINT (299910.1964888682 5773632.465446051),leo,66,29.4,dt_5,2019-03-28,2019-07-31,0.664177,0.523974,...,0.000868,1526,35,distance_band,0.001,2.835987,-0.004299,0,4,ue


In [21]:
labelled_hotspot_df.columns

Index(['Unnamed: 0', 'geometry', 'location', 'tr_id', 'distance', 'dt',
       'date_pre', 'date_post', 'z_pre', 'z_post', 'dh', 'lisa_fdr', 'lisa_q',
       'lisa_I', 'lisa_n_val_obs', 'lisa_opt_dist', 'lisa_dist_mode',
       'lisa_p_sim', 'lisa_z_sim', 'lisa_z', 'decay', 'jc_bin', 'markov_tag'],
      dtype='object')

The __labelled_hotspot_df__ dataframe holds all the information from the LISA analysis for each point, including:
* Moran's scatterplot quadrant (lisa_q), coded 1 = HH, 2 = LH, 3 = LL, 4 = HL
* Local Moran's I (lisa_I)
* False discovery rate threshold (lisa_fdr)
* Simulated pseudo p-value and z-value (lisa_p_sim, lisa_z_sim)

Moreover, the transient states, based on the dh classification, are stored in the __markov_tag__ column.
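`lisa_p_sim` is a permutation-based pseudo p-value. A minimal sketch of how such a value is derived from R simulated statistics, assuming the folded one-tailed count that, as we understand it, PySAL's esda uses (an assumption about sandpyper's internals). With 999 permutations (the esda default) the smallest attainable value is 1/(999 + 1) = 0.001, which matches the `lisa_p_sim <= 0.001` filter used for the location scale. The helper `pseudo_p_value` and the stand-in simulated values are hypothetical.

```python
import numpy as np


def pseudo_p_value(observed, simulated):
    """Permutation pseudo p-value for one observed statistic (hypothetical helper).

    Counts the simulated values at least as large as the observed one, folds
    the count onto the tail where the observation lies, and applies the
    (M + 1) / (R + 1) correction, so the minimum is 1 / (R + 1).
    """
    simulated = np.asarray(simulated, dtype=float)
    n_perm = simulated.size
    larger = int(np.sum(simulated >= observed))
    extreme = min(larger, n_perm - larger)  # use the tail the observation falls in
    return (extreme + 1) / (n_perm + 1)


# Stand-in for 999 simulated local I values under spatial randomness
sims = np.linspace(-3.0, 3.0, 999)
print(pseudo_p_value(5.0, sims))  # 0.001: no simulated value is as extreme
```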

### Transect-scale

We follow the same procedure as for the location scale, but without filtering for significant clusters of change (hotspots and coldspots).

Use all points beyond the LoD.

In [25]:
dh_df=pd.read_csv(dh_file_path)

In [26]:
dh_df=dh_df[~dh_df['dh'].between(-0.05, 0.05)]  # discard points within global limit of detection (5 cm)
dh_df=dh_df.reset_index(drop=True)

Separate the erosion and deposition points.

In [27]:
full_dh_ero=dh_df[dh_df.dh < 0]
full_dh_depo=dh_df[dh_df.dh > 0]

Use optimised Jenks-Caspall natural breaks to classify the absolute change

In [28]:
absolute_fulldh=np.abs(dh_df.dh)
bins_abs_JC_fulldh=mc.JenksCaspall(absolute_fulldh)
print(f"Fit of the classifier: {bins_abs_JC_fulldh.adcm}")

bins_abs_JC_fulldh

Fit of the classifier: 1669.4758029968957


               JenksCaspall              
 
Lower          Upper                Count
        x[i] <= 0.154               11583
0.154 < x[i] <= 0.295                8583
0.295 < x[i] <= 0.492                6197
0.492 < x[i] <= 1.154                3662
1.154 < x[i] <= 4.973                1067

Based on the above, classify the dh values into magnitude of change classes, which will become transient states.

In [29]:
# upper class bounds (m) rounded from the Jenks-Caspall breaks above (last break: 1.154)
bins_depo_fulldh = [0.15, 0.29, 0.49, 1.15]
bins_ero_fulldh = [-1.15, -0.49, -0.29, -0.15]

bins_ero_JC_fulldh = UserDefined(full_dh_ero.dh, bins_ero_fulldh)
bins_depo_JC_fulldh = UserDefined(full_dh_depo.dh, bins_depo_fulldh)

class_erosion_fulldh=bins_ero_JC_fulldh.yb.tolist()
class_deposition_fulldh=bins_depo_JC_fulldh.yb.tolist()

In [30]:
# assign every bin to the right label (see table 1)

states_ero={0:"ee",1:"he",2:"me",3:"se",4:"ue"}   
states_depo={0:"ud",1:"sd",2:"md",3:"hd",4:"ed"}

tags_erosion_fulldh=[states_ero[i] for i in class_erosion_fulldh]
tags_deposition_fulldh=[states_depo[i] for i in class_deposition_fulldh]

full_dh_ero["jc_bin"]=class_erosion_fulldh
full_dh_depo["jc_bin"]=class_deposition_fulldh

full_dh_ero["markov_tag"]=tags_erosion_fulldh
full_dh_depo["markov_tag"]=tags_deposition_fulldh

labelled_fulldh_df=pd.concat([full_dh_ero,full_dh_depo],ignore_index=False)
labelled_fulldh_df.head()

Unnamed: 0,geometry,location,tr_id,distance,dt,date_pre,date_post,z_pre,z_post,dh,jc_bin,markov_tag
0,POINT (299901.7782793006 5773692.070767866),leo,69,26.8,dt_5,2019-03-28,2019-07-31,0.223866,0.112838,-0.111029,4,ue
1,POINT (299901.4828779417 5773692.018441608),leo,69,27.1,dt_5,2019-03-28,2019-07-31,0.25689,0.152005,-0.104885,4,ue
2,POINT (299901.2859437025 5773691.983557437),leo,69,27.3,dt_5,2019-03-28,2019-07-31,0.283333,0.17671,-0.106624,4,ue
3,POINT (299901.1874765829 5773691.966115351),leo,69,27.4,dt_5,2019-03-28,2019-07-31,0.305573,0.188821,-0.116752,4,ue
4,POINT (299901.0890094633 5773691.948673265),leo,69,27.5,dt_5,2019-03-28,2019-07-31,0.329158,0.197999,-0.13116,4,ue


## Save both datasets

Now that we have the Markov-tagged datasets, we can model behaviour at both the location and transect scales!

In [31]:
# these dataframes are ready for BCD indices computation

#labelled_hotspot_df.to_csv(r"C:\my_packages\doc_data\markov_tagged\markov_tagged_df.csv")
#labelled_fulldh_df.to_csv(r"C:\my_packages\doc_data\markov_tagged\markov_tagged_fulldh_df.csv")

____