<font size="5"><center> <b>Sandpyper: sandy beaches SfM-UAV analysis tools</b></center></font>
<font size="4"><center> <b> Example 4 - Hotspot analysis </b></center> <br>

    
<center><img src="images/banner.png" width="80%"  /></center>

<font face="Calibri">
<br>
<font size="5"> <b>Local Indicator of Spatial Association: hotspot analysis and transient states</b></font>

<br>
<font size="4"> <b> Nicolas Pucino; PhD Student @ Deakin University, Australia </b> <br>

<font size="3">This notebook illustrates how to use Local Moran's I, a popular local indicator of spatial association (LISA), to capture the most significant areas of change within the beachface at every timestep. This lets us focus on the most important areas of change, disregard spatial outliers, and then classify the magnitude of change into transient states, which will be used to model behaviour at the location and transect scales. <br>

<b>This notebook covers the following concepts:</b>

- Location-level Local Moran's I.
- Magnitude classification.
- Transient states definition.
</font>


</font>

In [1]:
import pandas as pd
import numpy as np
from tqdm import tqdm

from sandpyper.hotspot import LISA_site_level, Discretiser  

pd.options.mode.chained_assignment = None  # default='warn'



## Load multitemporal dataset (dh)

In [2]:
dh_file_path=r"C:\my_packages\sandpyper\tests\test_outputs\dh_data.csv"
df_multitemp=pd.read_csv(dh_file_path)

## Compute location-level hotspots

In [3]:
crs_dict_string={"mar":{'init': 'epsg:32754'},
         "leo":{'init': 'epsg:32755'}}

The function __LISA_site_level__ performs a Local Moran's I analysis with False Discovery Rate (FDR) correction on all the elevation change points of each survey in the dh_df table.

This function can use KNN-based, inverse-distance-weighted or binary distance-based spatial weight matrices. In this example, we model spatial relationships with a distance-based, row-standardised binary weight matrix with a neighborhood radius of 35 m, which includes the two adjacent transects and some points from obliques without reaching too far from the focal point.

We obtain a dataset containing the FDR threshold, the Local Moran's I, p and z values, and the quadrant in which each observation falls in a Moran's scatter plot: High-High (HH, hotspot cluster), High-Low (HL, spatial outlier), Low-Low (LL, coldspot cluster) or Low-High (LH, spatial outlier).
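
To make the weighting scheme and the quadrant codes concrete, here is a minimal NumPy sketch of a row-standardised binary distance-band matrix and the Local Moran's I that follows from it. It is an illustration only, not the code behind LISA_site_level: the helper names, the toy coordinates and the dh values are made up, and no permutation test or FDR correction is performed. The quadrant codes 1 to 4 follow the PySAL convention (1 = HH, 2 = LH, 3 = LL, 4 = HL), which is assumed to be what lisa_q stores.

```python
import numpy as np


def distance_band_weights(coords, threshold):
    """Binary weights for neighbours within `threshold`, row-standardised (hypothetical helper)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # 1 for every other point within the radius, 0 otherwise (a point is not its own neighbour)
    w = ((dist <= threshold) & (dist > 0)).astype(float)
    row_sums = w.sum(axis=1, keepdims=True)
    # divide each row by its sum; rows without neighbours stay all-zero
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)


def local_moran(values, w):
    """Local Moran's I (up to an implementation-specific constant) and scatter plot quadrants."""
    z = (values - values.mean()) / values.std()
    lag = w @ z  # average standardised value of the neighbours
    local_i = z * lag
    quadrant = np.select(
        [(z > 0) & (lag > 0), (z < 0) & (lag > 0), (z < 0) & (lag < 0), (z > 0) & (lag < 0)],
        [1, 2, 3, 4],  # HH, LH, LL, HL
        default=0,
    )
    return local_i, quadrant


# toy data: two groups of three points, 80 m apart, with opposite elevation change
coords = np.array([[0, 0], [10, 0], [20, 0], [100, 0], [110, 0], [120, 0]], dtype=float)
dh = np.array([1.0, 1.2, 0.9, -1.0, -1.1, -0.8])

w = distance_band_weights(coords, threshold=35)
local_i, quadrant = local_moran(dh, w)
print(quadrant)
```

With a 35 m radius each point only sees the other two points of its own group, so the first group falls in the HH quadrant and the second in the LL quadrant.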

We are interested in HH and LL clusters, which we generally call hotspots, and discard LH and HL points as spatial outliers.
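
As a rough sketch of what "keeping hotspots and discarding outliers" means on the output table, the snippet below filters a toy dataframe that mimics the lisa_q, lisa_p_sim and lisa_fdr columns. Two things are assumptions rather than documented sandpyper behaviour: that quadrants 1 and 3 encode HH and LL (the PySAL convention), and that a point counts as significant when its pseudo p-value is at or below the FDR threshold. The values are invented for illustration.

```python
import pandas as pd

# toy table with the same column names as the LISA output (values are made up)
toy = pd.DataFrame({
    "dh":         [-1.10, 0.45, 0.07, -0.19, 0.60],
    "lisa_q":     [3, 1, 4, 2, 1],
    "lisa_p_sim": [0.001, 0.002, 0.001, 0.226, 0.300],
    "lisa_fdr":   [0.0086] * 5,
})

is_cluster = toy["lisa_q"].isin([1, 3])                 # HH or LL (assumed coding)
is_significant = toy["lisa_p_sim"] <= toy["lisa_fdr"]   # assumed significance rule
hotspots = toy[is_cluster & is_significant]
print(hotspots)
```

Only the first two rows survive: the third is a significant HL outlier, the fourth is a non-significant LH point, and the last is an HH point that does not pass the threshold.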

In [4]:
distance_value=35 # enough to include two adjacent transects and some obliques without reaching the second transect
k_value=0
mode="distance" #select from "knn", "idw" or "distance"

In [5]:
%%time

lisa_df=LISA_site_level(dh_df=df_multitemp,
                        mode=mode,
                        distance_value=distance_value,
                        geometry_column="geometry",
                        crs_dict_string=crs_dict_string)

lisa_df.tail()

Working on mar

Working on leo

Wall time: 41.7 s


Unnamed: 0,geometry,location,tr_id,distance,dt,date_pre,date_post,z_pre,z_post,dh,lisa_fdr,lisa_q,lisa_I,lisa_n_val_obs,lisa_opt_dist,lisa_dist_mode,lisa_p_sim,lisa_z_sim,lisa_z,decay
19862,POINT (300071.060 5773184.013),leo,18,48.0,dt_0,20180606,20180713,-0.435838,-0.624926,-0.189088,0.008633,2,-0.031762,1448,35,distance_band,0.226,-0.665112,-0.419606,0
19863,POINT (300072.023 5773184.284),leo,18,49.0,dt_0,20180606,20180713,-0.43108,-0.642019,-0.210939,0.008633,2,-0.035211,1448,35,distance_band,0.241,-0.624701,-0.461142,0
19864,POINT (300075.506 5773164.488),leo,17,47.0,dt_0,20180606,20180713,-0.316839,-0.478775,-0.161936,0.008633,3,0.010098,1448,35,distance_band,0.498,0.097243,-0.367991,0
19865,POINT (300076.469 5773164.759),leo,17,48.0,dt_0,20180606,20180713,-0.459795,-0.47709,-0.017296,0.008633,3,0.001456,1448,35,distance_band,0.479,0.026881,-0.093039,0
19866,POINT (300077.432 5773165.029),leo,17,49.0,dt_0,20180606,20180713,-0.557283,-0.493576,0.063707,0.008633,4,-0.0008,1448,35,distance_band,0.484,-0.070634,0.060942,0


In [6]:
lisa_df_knn=LISA_site_level(dh_df=df_multitemp,
                        mode='knn',k_value=50,
                        distance_value=35,
                        geometry_column="geometry",
                        crs_dict_string=crs_dict_string)
lisa_df_knn

Working on mar

Working on leo

Unnamed: 0,geometry,location,tr_id,distance,dt,date_pre,date_post,z_pre,z_post,dh,lisa_fdr,lisa_q,lisa_I,lisa_n_val_obs,lisa_opt_dist,lisa_dist_mode,lisa_p_sim,lisa_z_sim,lisa_z,decay
0,POINT (731646.904 5705523.469),mar,21,0.0,dt_7,20190313,20190516,1.111801,0.007440,-1.104360,0.043465,3,1.857057,1293,50,k,0.001,8.414143,-1.637332,0
1,POINT (731646.078 5705524.033),mar,21,1.0,dt_7,20190313,20190516,1.124138,0.008439,-1.115699,0.043465,3,1.884014,1293,50,k,0.001,8.162996,-1.661817,0
2,POINT (731645.253 5705524.598),mar,21,2.0,dt_7,20190313,20190516,1.117822,0.010800,-1.107022,0.043465,3,1.863386,1293,50,k,0.001,7.607334,-1.643079,0
3,POINT (731644.427 5705525.162),mar,21,3.0,dt_7,20190313,20190516,1.148563,0.011350,-1.137213,0.043465,3,1.914173,1293,50,k,0.001,8.079277,-1.708270,0
4,POINT (731643.602 5705525.727),mar,21,4.0,dt_7,20190313,20190516,1.112438,0.028030,-1.084408,0.043465,3,1.790040,1293,50,k,0.001,8.031736,-1.594248,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19862,POINT (300071.060 5773184.013),leo,18,48.0,dt_0,20180606,20180713,-0.435838,-0.624926,-0.189088,0.003211,2,-0.045990,1448,50,k,0.186,-0.842588,-0.419606,0
19863,POINT (300072.023 5773184.284),leo,18,49.0,dt_0,20180606,20180713,-0.431080,-0.642019,-0.210939,0.003211,2,-0.056389,1448,50,k,0.160,-0.909254,-0.461142,0
19864,POINT (300075.506 5773164.488),leo,17,47.0,dt_0,20180606,20180713,-0.316839,-0.478775,-0.161936,0.003211,2,-0.015425,1448,50,k,0.327,-0.321114,-0.367991,0
19865,POINT (300076.469 5773164.759),leo,17,48.0,dt_0,20180606,20180713,-0.459795,-0.477090,-0.017296,0.003211,2,-0.004121,1448,50,k,0.307,-0.416449,-0.093039,0


In [7]:
lisa_df_idw=LISA_site_level(dh_df=df_multitemp,
                        mode='idw',k_value=50,
                        distance_value=35,
                        geometry_column="geometry",
                        crs_dict_string=crs_dict_string)

Working on mar

Working on leo

In [8]:
#lisa_df.to_csv(r"C:\my_packages\doc_data\profiles\lisa_location.csv")

## Classify dh magnitudes and create classes of elevation changes (transient states)

Now that we have statistically significant hotspots of change, we can use those points to capture the most interesting areas of beachface change which we use to model behaviour.

However, there is a very important point to consider, which is relevant for our example.<br>
Our beachfaces are narrow, and we keep only reliable, valid points after multiple levels of filtering (LoD, beachface area, sand-only), which significantly reduces the number of usable points in each timestep.
Because in our next step we will compute the Beachface Cluster Dynamics indices both at the location and transect scales, we need to create two different dataframes and (slightly different) transient-states classes:

* hotspot-filtered: we discard spatial outliers to capture location scale behaviour. Used for location scale BCDs.
* full: we disregard hotspot classification and retain all points beyond LoD. Used for transect scale BCDs.

This is necessary to ensure that we have enough points in each transect to model their behaviour. Moreover, the hotspot classification was run at the location scale, so the HH and LL hotspots are only relevant for location-scale BCD analysis.

In [9]:
#lisa_df=pd.read_csv(r"C:\my_packages\doc_data\hotspots\lisa_location.csv")

### Location-scale

Use only hotspots beyond LoD

In [10]:
labels=["Undefined", "Small", "Medium", "High", "Extreme"]
appendix=["_deposition", "_erosion"]

In [11]:
%%time

D = Discretiser(bins=5, method="JenksCaspall", labels=labels)

labelled_hotspot_df = D.fit(lisa_df, absolute=True, print_summary=True)
labelled_hotspot_df

Data will be partitioned into 5 discrete classes.
Labels provided.
              JenksCaspall              
 
Lower          Upper               Count
        x[i] <= 0.111               8050
0.111 < x[i] <= 0.274               5404
0.274 < x[i] <= 0.533               3634
0.533 < x[i] <= 1.213               2332
1.213 < x[i] <= 8.464                447

Fit of JenksCaspall with 5 bins: 1280.1102305326785
Wall time: 213 ms


Unnamed: 0,geometry,location,tr_id,distance,dt,date_pre,date_post,z_pre,z_post,dh,...,lisa_q,lisa_I,lisa_n_val_obs,lisa_opt_dist,lisa_dist_mode,lisa_p_sim,lisa_z_sim,lisa_z,decay,markov_tag
0,POINT (731646.904 5705523.469),mar,21,0.0,dt_7,20190313,20190516,1.111801,0.007440,-1.104360,...,3,1.421087,1293,35,distance_band,0.001,6.800312,-1.637332,0,High_erosion
1,POINT (731646.078 5705524.033),mar,21,1.0,dt_7,20190313,20190516,1.124138,0.008439,-1.115699,...,3,1.398120,1293,35,distance_band,0.001,6.899025,-1.661817,0,High_erosion
2,POINT (731645.253 5705524.598),mar,21,2.0,dt_7,20190313,20190516,1.117822,0.010800,-1.107022,...,3,1.342067,1293,35,distance_band,0.001,6.769758,-1.643079,0,High_erosion
3,POINT (731644.427 5705525.162),mar,21,3.0,dt_7,20190313,20190516,1.148563,0.011350,-1.137213,...,3,1.347891,1293,35,distance_band,0.001,6.433387,-1.708270,0,High_erosion
4,POINT (731643.602 5705525.727),mar,21,4.0,dt_7,20190313,20190516,1.112438,0.028030,-1.084408,...,3,1.216918,1293,35,distance_band,0.001,6.350944,-1.594248,0,High_erosion
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19851,POINT (300060.470 5773181.039),leo,18,37.0,dt_0,20180606,20180713,-0.362233,-0.287499,0.074735,...,1,0.031865,1448,35,distance_band,0.001,3.766666,0.081905,0,Undefined_deposition
19852,POINT (300061.433 5773181.309),leo,18,38.0,dt_0,20180606,20180713,-0.363263,-0.336683,0.026580,...,2,-0.003862,1448,35,distance_band,0.001,-3.667222,-0.009633,0,Undefined_deposition
19853,POINT (300062.396 5773181.579),leo,18,39.0,dt_0,20180606,20180713,-0.390503,-0.381073,0.009430,...,2,-0.016823,1448,35,distance_band,0.001,-3.887448,-0.042234,0,Undefined_deposition
19854,POINT (300063.358 5773181.850),leo,18,40.0,dt_0,20180606,20180713,-0.373504,-0.370397,0.003106,...,2,-0.021103,1448,35,distance_band,0.003,-3.665658,-0.054256,0,Undefined_deposition


The __labelled_hotspot_df__ dataframe holds all information about the LISA analysis for each point, including:
* Moran's scatterplot quadrant (HH, HL, LL, LH) (lisa_q)
* Local Moran's I (lisa_I)
* False discovery rate threshold (lisa_fdr)
* Simulated pseudo p-value and z-value (lisa_p_sim, lisa_z_sim)

Moreover, the transient states, based on the dh classification, are stored in the __markov_tag__ column.
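
The tags seen in the table above combine a magnitude class with a direction suffix. The sketch below reproduces that idea with pandas, reusing the class upper bounds printed in the JenksCaspall summary. It is not the Discretiser implementation: tag_dh is a hypothetical helper, and treating negative dh as erosion and everything else (including exactly zero) as deposition is an assumption inferred from the example rows.

```python
import numpy as np
import pandas as pd

labels = ["Undefined", "Small", "Medium", "High", "Extreme"]
# upper bounds of the five classes, copied from the summary printed by D.fit
upper_bounds = [0.111, 0.274, 0.533, 1.213, 8.464]


def tag_dh(dh, upper_bounds, labels):
    """Label each dh value as '<magnitude>_<direction>' (hypothetical helper)."""
    dh = pd.Series(dh, dtype=float)
    # classify the absolute change; intervals are closed on the right, as in the summary
    edges = [-np.inf] + list(upper_bounds)
    magnitude = pd.cut(dh.abs(), bins=edges, labels=labels, right=True)
    # assumed sign convention: negative dh is erosion, otherwise deposition
    direction = np.where(dh < 0, "_erosion", "_deposition")
    tags = [f"{m}{d}" for m, d in zip(magnitude, direction)]
    return pd.Series(tags, index=dh.index)


tags = tag_dh([-1.104360, 0.074735, 0.3, -2.0], upper_bounds, labels)
print(tags.tolist())
```

The first two inputs are dh values taken from the table above and come out with the same tags shown there. Values whose absolute change exceeds the last upper bound would not fall in any class in this sketch.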

## Save both datasets

Now that we have the "markov tagged" dataset, we can model behaviour at both location and transect scales!

In [31]:
# these dataframes are ready for BCD indices computation

#labelled_hotspot_df.to_csv(r"C:\my_packages\doc_data\markov_tagged\markov_tagged_df.csv")
#labelled_fulldh_df.to_csv(r"C:\my_packages\doc_data\markov_tagged\markov_tagged_fulldh_df.csv")

____