# Lab 6: Spatial Statistics

In this lab, you will examine the spatial autocorrelation of the foreign resident ratio across neighborhoods (읍면동, eup-myeon-dong) in Seoul, South Korea. Using the PySAL libraries `libpysal` and `esda`, you will define two spatial weights matrices based on different distance decay functions: a power-law function and a Gaussian function. In both matrices, neighborhoods whose centroids lie within 3,000 meters of each other are treated as neighbors, and their weight decreases with distance. Based on these matrices, you will compute both global and local Moran's I statistics. Because `libpysal` does not provide the Gaussian decay function used here, you will implement it as a custom distance decay function.

## Structure
#### 1. Data Preparation
#### 2. Calculate Moran's I and LISA with Power-Law Weights Matrix
#### 3. Calculate Moran's I and LISA with Gaussian Weights Matrix
#### 4. Visualize the Results

## Notes:
**Before you submit your lab, make sure everything runs as expected WITHOUT ANY ERROR.** <br>
**Make sure you fill in every place that says `Your code here`, and enter your name in `FULL_NAME` below:**

In [None]:
FULL_NAME = ""

In [None]:
# Import necessary packages
import esda
import libpysal
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import math

## 0. Various Distance Decay Functions

Distance decay functions describe how the influence of one location on another weakens as the distance between them increases. Different functional forms, such as the Gaussian and power-law functions, suit different circumstances. In this lab, you will implement two of them: Gaussian and power-law. <br> 

<div style="text-align: center">
  <img src="./data/distance_decay.png" width="500">
</div>

The Gaussian function is defined as:

<br><br>
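$$\large G(d_{ij}, d_0) = \begin{cases} \dfrac{e^{-\frac{1}{2}\left(\frac{d_{ij}}{d_0}\right)^2}-e^{-\frac{1}{2}}}{1-e^{-\frac{1}{2}}} & \text{if } d_{ij} \le d_0 \\ 0 & \text{otherwise} \end{cases}$$

where $d_{ij}$ is the distance between locations $i$ and $j$ and $d_0$ is the distance threshold. Subtracting $e^{-\frac{1}{2}}$ makes $G = 0$ at $d_{ij} = d_0$, and dividing by $1-e^{-\frac{1}{2}}$ makes $G = 1$ at $d_{ij} = 0$.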

Whereas the Power-law function is defined as:

<br><br>
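$$\large P(d_{ij}) = d_{ij}^{\alpha}$$

where $\alpha < 0$ controls how quickly the weight decays with distance (this lab uses $\alpha = -0.1$).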
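To get a feel for how differently the two functions behave, the sketch below evaluates both at a few distances using the lab's settings ($\alpha = -0.1$, $d_0 = 3000$ m). The function names `power_law` and `gaussian_decay` are hypothetical names chosen for this illustration; the Gaussian follows the piecewise definition above.

```python
import math


def power_law(dij, alpha):
    """Power-law decay P(d_ij) = d_ij ** alpha (requires d_ij > 0; alpha < 0 for decay)."""
    return dij ** alpha


def gaussian_decay(dij, d0):
    """Rescaled Gaussian decay: 1 at d_ij = 0, 0 at d_ij = d0, and 0 beyond d0."""
    if dij > d0:
        return 0.0
    return (math.exp(-0.5 * (dij / d0) ** 2) - math.exp(-0.5)) / (1 - math.exp(-0.5))


# Compare both curves at a few distances (in meters) with the lab's settings:
# alpha = -0.1 for the power law and a 3,000 m threshold for the Gaussian.
for d in [500, 1000, 2000, 3000]:
    print(f"{d:>5} m  power={power_law(d, -0.1):.3f}  gaussian={gaussian_decay(d, 3000):.3f}")
```

With $\alpha = -0.1$ the power-law weights drop only from about 0.54 to 0.45 between 500 m and 3,000 m, whereas the Gaussian weights fall from about 0.96 to 0, so the two matrices emphasize nearby neighbors quite differently.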

## 1. Data Preparation

This lab uses `EMD_Foreign_Ratio.geojson` in the data folder. The dataset contains the ratio of foreign residents in each neighborhood (읍면동) of Seoul, South Korea. 

**1.1.** (0 points) Load `EMD_Foreign_Ratio.geojson` from the data folder with GeoPandas and name the result `emd_gdf`.<br>
**1.2.** (0 points) Reproject `emd_gdf` to UTM-K (EPSG:5179) so that distances are measured in meters.<br>
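A minimal sketch of the reprojection step, using a single hypothetical point in central Seoul instead of the lab data: `GeoDataFrame.to_crs()` converts longitude/latitude coordinates (EPSG:4326) to UTM-K, after which the coordinates are in meters.

```python
import geopandas as gpd
from shapely.geometry import Point

# A hypothetical point in central Seoul, in longitude/latitude (EPSG:4326)
seoul = gpd.GeoDataFrame(geometry=[Point(126.978, 37.5665)], crs="EPSG:4326")

# Reproject to UTM-K (EPSG:5179); coordinates are now in meters
seoul_utmk = seoul.to_crs(epsg=5179)
print(seoul_utmk.crs.to_epsg())
print(seoul_utmk.geometry.iloc[0].x, seoul_utmk.geometry.iloc[0].y)
```

The lab data would presumably be read with `gpd.read_file()` and reprojected the same way; the exact path to the `data` folder depends on where the notebook is run.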

In [None]:
# Your code here
emd_gdf = 


In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""
assert type(emd_gdf) == gpd.GeoDataFrame
assert emd_gdf.shape == (424, 10)
assert emd_gdf.crs == 'EPSG:5179'

print('Success!')

**1.3.** (2 points) Extract the centroid coordinates of each polygon in `emd_gdf` and store them in a list of `(x, y)` tuples named `points`. An example structure is shown below.

```python
    points = [(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30), ...]
```
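The sketch below shows one way to build such a list on a toy GeoDataFrame of two hypothetical 10 m squares: `.centroid` returns a GeoSeries of points, and each point's `.x` and `.y` are unpacked into a tuple. The toy data are invented for illustration only.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two hypothetical adjacent square "neighborhoods" in a projected CRS (meters)
toy_gdf = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (10, 0), (10, 10), (0, 10)]),
        Polygon([(10, 0), (20, 0), (20, 10), (10, 10)]),
    ],
    crs="EPSG:5179",
)

# .centroid returns a GeoSeries of Points; unpack each Point into an (x, y) tuple
toy_points = [(pt.x, pt.y) for pt in toy_gdf.geometry.centroid]
print(toy_points)  # [(5.0, 5.0), (15.0, 5.0)]
```

Computing centroids in a projected CRS such as EPSG:5179 avoids the inaccuracy (and the GeoPandas warning) that comes with centroids computed on longitude/latitude coordinates.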


In [None]:
# Your code here
points = 
points

In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""
assert type(points) == list
assert points[80] == (960407.5662211042, 1955018.1470921177)
assert len(points) == 424

print('Success!')

## 2. Calculate Moran's I and LISA with Power-law Weights Matrix

**2.1.** (2 points) Calculate the weights matrix, named `w_power` using <a href=https://pysal.org/libpysal/generated/libpysal.weights.DistanceBand.html>`libpysal.weights.DistanceBand()`</a> function with the following parameters:
- data: a list of the centroid coordinates of `emd_gdf`. You can use the `points` variable you created in the previous step, which has the following structure:
  ```python
  points = [(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30), ...]
  ```
- threshold: 3 km (enter it as 3000, because EPSG:5179 coordinates are in meters)
- binary: False
- alpha: -0.1
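With `binary=False`, each neighbor within the threshold is weighted by $d_{ij}^{\alpha}$ instead of 1. The brute-force sketch below computes one row of such weights for hypothetical centroids to illustrate the idea; it is not libpysal's implementation (which uses a KD-tree), and treating a distance exactly equal to the threshold as "within" is an assumption here. `distance_band_row` is a hypothetical helper name.

```python
import math


def distance_band_row(i, points, threshold, alpha):
    """Return {j: d_ij ** alpha} for every j != i whose distance to i is within threshold."""
    row = {}
    for j, q in enumerate(points):
        if j == i:
            continue  # an observation is not its own neighbor
        dij = math.dist(points[i], q)
        if dij <= threshold:
            row[j] = dij ** alpha
    return row


# Hypothetical centroids in meters
toy_points = [(0, 0), (1000, 0), (0, 2500), (5000, 5000)]
row0 = distance_band_row(0, toy_points, threshold=3000, alpha=-0.1)
print(sorted(row0))  # [1, 2]: the point at (5000, 5000) is beyond 3,000 m
```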

In [None]:
# Your code here

w_power = libpysal.weights.DistanceBand()

In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""
assert type(w_power) == libpysal.weights.distance.DistanceBand
assert type(w_power.weights) == dict
assert type(w_power.neighbors) == dict
assert len(w_power.neighbors) == 424
assert len(w_power.weights[80]) == 29
assert round(sum(w_power.weights[80]), 2) == 13.57

print('Success!')

**2.2.** (2 points) Calculate Moran's I using the <a href=https://pysal.org/esda/generated/esda.Moran.html>`esda.Moran()`</a>  function with the parameters below, and store the result in a variable named `mi_power`.
- y: the `외국인비율` (foreign resident ratio) column of `emd_gdf`
- w: `w_power`
- transformation: `O` (keep the original, untransformed weights)

**2.3.** (1 point) Retrieve the attribute that contains the Moran's I value from the `mi_power` object, and store it in a new variable named `mi_power_value`.
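For intuition, global Moran's I with untransformed weights follows the standard formula $I = \frac{n}{S_0} \frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$, where $z_i = y_i - \bar{y}$ and $S_0$ is the sum of all weights. The pure-Python sketch below evaluates it on four hypothetical areas arranged in a line, using the same `neighbors`/`weights` dictionary format introduced in Section 3; `esda.Moran` performs this computation (plus inference) for you.

```python
def morans_i(y, neighbors, weights):
    """Global Moran's I: I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i**2."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]  # deviations from the mean
    s0 = sum(sum(w_row) for w_row in weights.values())  # sum of all weights
    cross = sum(
        w * z[i] * z[j]
        for i, nbrs in neighbors.items()
        for j, w in zip(nbrs, weights[i])
    )
    return (n / s0) * cross / sum(v * v for v in z)


# Four hypothetical areas on a line (0-1-2-3); adjacent areas get weight 1
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
weights = {0: [1.0], 1: [1.0, 1.0], 2: [1.0, 1.0], 3: [1.0]}
print(morans_i([1, 2, 3, 4], neighbors, weights))  # about 0.33: similar values cluster
print(morans_i([1, 4, 1, 4], neighbors, weights))  # about -1: values alternate
```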


In [None]:
# Your code here
mi_power = esda.Moran(
                      transformation='O'
                      )




In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""
assert type(mi_power) == esda.moran.Moran
assert round(sum(w_power.weights[80]), 2) == 13.57
assert round(mi_power_value, 4) == 0.3269

print('Success!')

**2.4.** (2 points) Calculate the Local Moran's I using the <a href=https://pysal.org/esda/generated/esda.Moran_Local.html>`esda.Moran_Local()`</a> function with the parameters below, and store the result in a variable named `lisa_power`.
- y: the `외국인비율` (foreign resident ratio) column of `emd_gdf`
- w: `w_power`
- transformation: `O` (keep the original, untransformed weights)
- seed: 17
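The local statistic decomposes Moran's I by observation. The sketch below uses the textbook form $I_i = z_i \sum_j w_{ij} z_j / m_2$ with $m_2 = \sum_i z_i^2 / n$, and assigns quadrant codes with the same coding as `lisa_dict` (1 = HH, 2 = LH, 3 = LL, 4 = HL). It is an illustration on hypothetical data: `esda.Moran_Local` may use a slightly different scaling constant, and it additionally computes permutation-based pseudo p-values (`p_sim`), which the `seed` parameter makes reproducible and which this sketch omits.

```python
def local_morans_i(y, neighbors, weights):
    """Textbook local Moran's I_i = z_i * sum_j w_ij z_j / m2, plus quadrant codes 1-4."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    m2 = sum(v * v for v in z) / n  # second moment of the deviations
    local_i, quads = [], []
    for i in range(n):
        lag = sum(w * z[j] for j, w in zip(neighbors[i], weights[i]))  # spatial lag of z
        local_i.append(z[i] * lag / m2)
        high, high_lag = z[i] > 0, lag > 0
        # 1 = HH, 2 = LH, 3 = LL, 4 = HL (same coding as lisa_dict)
        if high and high_lag:
            quads.append(1)
        elif not high and high_lag:
            quads.append(2)
        elif not high and not high_lag:
            quads.append(3)
        else:
            quads.append(4)
    return local_i, quads


# Four hypothetical areas on a line (0-1-2-3); adjacent areas get weight 1
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
weights = {0: [1.0], 1: [1.0, 1.0], 2: [1.0, 1.0], 3: [1.0]}
local_i, quads = local_morans_i([1, 2, 3, 4], neighbors, weights)
print(quads)  # [3, 3, 1, 1]: low values next to low, high values next to high
```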

In [None]:
# Your code here
lisa_power = esda.moran.Moran_Local(
                                    seed=17
                                  )



In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""
assert type(lisa_power) == esda.moran.Moran_Local
assert lisa_power.q[80] == 1
assert lisa_power.p_sim[80] == 0.199

print('Success!')

**2.5.** (2 points) Create a new column in `emd_gdf` named `lisa_power`. For each observation, assign its quadrant label (`HH`, `LH`, `LL`, or `HL`, based on the `q` attribute of `lisa_power`) if the result is statistically significant according to the `p_sim` attribute of `lisa_power`. If the result is not significant, assign `NS`.

**Note:** You can use the dictionary below, which maps the quadrant codes stored in `q` to their labels. 
```python
    lisa_dict = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}
```
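A minimal sketch of the labeling logic on hypothetical quadrant codes and pseudo p-values. The 0.05 significance level is an assumption, since the lab does not state one, and `label_lisa` is a hypothetical helper name.

```python
def label_lisa(q, p_sim, lisa_dict, alpha=0.05):
    """Map each quadrant code to its label when p_sim < alpha (assumed cutoff), else 'NS'."""
    return [lisa_dict[code] if p < alpha else "NS" for code, p in zip(q, p_sim)]


lisa_dict = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}
# Hypothetical quadrant codes and pseudo p-values for four areas
print(label_lisa([1, 3, 2, 4], [0.01, 0.20, 0.04, 0.05], lisa_dict))
# ['HH', 'NS', 'LH', 'NS']
```

In the notebook, a list built this way from `lisa_power.q` and `lisa_power.p_sim` can be assigned directly to a new column, because it has one entry per row of `emd_gdf` in the same order.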

In [None]:
# Your code here
lisa_dict = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}



In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""
assert emd_gdf.shape == (424, 11)
assert 'lisa_power' in emd_gdf.columns
assert emd_gdf.at[80, 'lisa_power'] == 'NS'
assert emd_gdf.at[4, 'lisa_power'] == 'LH'

print('Success!')

## 3. Calculate Moran's I and LISA with Gaussian Weights Matrix

`libpysal` does not provide this particular Gaussian distance decay function as a built-in option (its `Kernel` weights use a different Gaussian kernel), so you need to implement it manually. The Gaussian function is defined as:

$$\large G(d_{ij}, d_0) = \begin{cases} \dfrac{e^{-\frac{1}{2}\left(\frac{d_{ij}}{d_0}\right)^2}-e^{-\frac{1}{2}}}{1-e^{-\frac{1}{2}}} & \text{if } d_{ij} \le d_0 \\ 0 & \text{otherwise} \end{cases}$$

In Python, the Gaussian function can be implemented as follows:

```python
import math


def gaussian(dij, d0):  # Gaussian distance decay function
    # dij: distance between i and j
    # d0: distance threshold
    # val: value of the Gaussian function
    if d0 >= dij:
        val = (math.exp(-1 / 2 * ((dij / d0) ** 2)) - math.exp(-1 / 2)) / (1 - math.exp(-1 / 2))
        return val
    else:
        return 0
```

Fortunately, `libpysal` lets users build a weights object from their own neighbor and weight values, so any distance decay function can be used. The <a href=https://pysal.org/libpysal/generated/libpysal.weights.W.html>libpysal.weights.W()</a> class is constructed mainly from two dictionaries: `neighbors` and `weights`.

`neighbors` is a dictionary that specifies the neighboring observations for each spatial unit.

`weights` is a dictionary that assigns a weight value to each neighbor.

The expected format for the `neighbors` dictionary is:

```python
    neighbors = {
                 'a': ['b'],
                 'b': ['a', 'c'],
                 'c': ['b']
                } 
```
The corresponding `weights` dictionary should be in the following format:

```python
    weights = {
              'a': [0.5],
              'b': [0.5, 1.5],
              'c': [1.5]
             }
```
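The two dictionaries are aligned position by position: the k-th entry of `weights[i]` is the weight of the k-th neighbor in `neighbors[i]`. The sketch below makes that alignment explicit by expanding the example dictionaries into a dense matrix; `to_dense` is a hypothetical helper for illustration (libpysal's `W.full()` method offers a comparable dense view of a weights object).

```python
def to_dense(neighbors, weights, ids):
    """Expand neighbors/weights dictionaries into a dense row-by-column matrix."""
    position = {key: k for k, key in enumerate(ids)}
    matrix = [[0.0] * len(ids) for _ in ids]
    for i, nbrs in neighbors.items():
        # The k-th weight in weights[i] belongs to the k-th neighbor in neighbors[i]
        for j, w in zip(nbrs, weights[i]):
            matrix[position[i]][position[j]] = w
    return matrix


neighbors = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
weights = {'a': [0.5], 'b': [0.5, 1.5], 'c': [1.5]}
for row in to_dense(neighbors, weights, ['a', 'b', 'c']):
    print(row)
# [0.0, 0.5, 0.0]
# [0.5, 0.0, 1.5]
# [0.0, 1.5, 0.0]
```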

In [None]:
# Run this cell to implement the gaussian function

def gaussian(dij, d0):  # Gaussian distance decay function
    # dij: distance between i and j
    # d0: distance threshold
    # val: value of the Gaussian function
    
    if d0 >= dij:
        val = (math.exp(-1 / 2 * ((dij / d0) ** 2)) - math.exp(-1 / 2)) / (1 - math.exp(-1 / 2))
        return val
    else:
        return 0

**3.1.** (5 points) Populate the `neighbors` and `weights` dictionaries using the `emd_gdf` GeoDataFrame. The logic is sketched below.
- Use two nested for-loops to iterate over the indices of `emd_gdf`, one for `i` and one for `j`.
- For each pair with `i != j` (an observation is not its own neighbor), calculate the distance between the two centroids:
  ```python
  dij = emd_gdf.at[i, 'geometry'].centroid.distance(emd_gdf.at[j, 'geometry'].centroid)
  ```
- If the distance is less than or equal to the threshold (3 km, i.e., 3000 m), append `j` to `neighbors[i]`, calculate the weight with the `gaussian` function, and append it to `weights[i]`.

**NOTE**: As before, the expected format for the `neighbors` dictionary is:

```python
    neighbors = {
                 0: [1],  # keys and list entries are index labels of `emd_gdf`
                 1: [0, 2],
                 2: [1]
                } 
```
The corresponding `weights` dictionary should be in the following format:

```python
    weights = {
              0: [0.5],  # values are Gaussian function outputs
              1: [0.5, 1.5],
              2: [1.5]
             }
```
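The nested-loop logic can be sketched on hypothetical centroids. To keep the block self-contained, it repeats the lab's `gaussian` function and uses `math.dist` on coordinate tuples instead of the GeoDataFrame geometries; `build_neighbors_weights` is a hypothetical helper name. With 424 areas the double loop checks about 180,000 pairs, which is fast enough here.

```python
import math


def gaussian(dij, d0):
    """Gaussian distance decay function from this lab (0 beyond d0)."""
    if d0 >= dij:
        return (math.exp(-1 / 2 * ((dij / d0) ** 2)) - math.exp(-1 / 2)) / (1 - math.exp(-1 / 2))
    return 0


def build_neighbors_weights(points, threshold, decay):
    """Brute-force neighbors/weights dictionaries for any decay(dij, threshold) function."""
    neighbors, weights = {}, {}
    for i in range(len(points)):
        neighbors[i], weights[i] = [], []
        for j in range(len(points)):
            if i == j:
                continue  # skip the observation itself
            dij = math.dist(points[i], points[j])
            if dij <= threshold:
                neighbors[i].append(j)
                weights[i].append(decay(dij, threshold))
    return neighbors, weights


# Hypothetical centroids in meters
toy_points = [(0, 0), (1500, 0), (0, 2000), (9000, 9000)]
neighbors, weights = build_neighbors_weights(toy_points, 3000, gaussian)
print(neighbors)  # {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}

# Distance is symmetric, so w_ij should equal w_ji
for i, nbrs in neighbors.items():
    for j, w in zip(nbrs, weights[i]):
        assert math.isclose(w, weights[j][neighbors[j].index(i)])
```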

In [None]:
neighbors = {}
weights = {}

for i in range(emd_gdf.shape[0]):
    neighbors[i] = []
    weights[i] = []

# neighbors

In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""
assert type(neighbors) == dict
assert len(neighbors) == 424
assert type(weights) == dict
assert len(weights) == 424
assert type(neighbors[0]) == list
assert len(neighbors[0]) == 26
assert len(weights[0]) == 26
assert type(weights[0]) == list
assert neighbors[0] == [1, 2, 4, 5, 6, 7, 9, 15, 16, 17, 18, 19, 20, 23, 26, 32, 44, 192, 193, 194, 195, 202, 203, 204, 220, 221]
assert round(sum(weights[0]),2) == 10.78

print('Success!')

**3.2.** (2 points) Investigate <a href=https://pysal.org/libpysal/generated/libpysal.weights.W.html>`libpysal.weights.W()`</a> function and create a customized weights matrix named `w_gaussian` using the `neighbors` and `weights` dictionaries you created in the previous step. <br>

In [None]:
# Your code here
w_gaussian = 

In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""

assert type(w_gaussian.weights) == dict
assert type(w_gaussian.neighbors) == dict
assert len(w_gaussian.weights[80]) == 29
assert round(sum(w_gaussian.weights[80]), 2) == 11.98

print('Success!')

**3.3.** (1 point) Calculate Moran's I using the <a href=https://pysal.org/esda/generated/esda.Moran.html>`esda.Moran()`</a>  function with the parameters below, and store the result in a variable named `mi_gaussian`.
- y: the `외국인비율` (foreign resident ratio) column of `emd_gdf`
- w: `w_gaussian`
- transformation: `O` (keep the original, untransformed weights)

**3.4.** (1 point) Retrieve the attribute that contains the Moran's I value from the `mi_gaussian` object, and store it in a new variable named `mi_gaussian_value`.


In [None]:
# Your code here
mi_gaussian = esda.Moran()



In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""
assert type(mi_gaussian) == esda.moran.Moran
assert round(sum(w_gaussian.weights[80]), 2) == 11.98
assert round(mi_gaussian_value,4) == 0.3994

print('Success!')

**3.5.** (1 point) Calculate the Local Moran's I using the <a href=https://pysal.org/esda/generated/esda.Moran_Local.html>`esda.Moran_Local()`</a> function with the parameters below, and store the result in a variable named `lisa_gaussian`.
- y: the `외국인비율` (foreign resident ratio) column of `emd_gdf`
- w: `w_gaussian`
- transformation: `O` (keep the original, untransformed weights)
- seed: 17

In [None]:
# Your code here
lisa_gaussian = 

In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""
assert type(lisa_gaussian) == esda.moran.Moran_Local
assert lisa_gaussian.q[80] == 1
assert lisa_gaussian.p_sim[80] == 0.048

print('Success!')

**3.6.** (1 point) Create a new column in `emd_gdf` named `lisa_gaussian`. For each observation, assign its quadrant label (`HH`, `LH`, `LL`, or `HL`, based on the `q` attribute of `lisa_gaussian`) if the result is statistically significant according to the `p_sim` attribute of `lisa_gaussian`. If the result is not significant, assign `NS`.

**Note:** You can use the dictionary below, which maps the quadrant codes stored in `q` to their labels. 
```python
    lisa_dict = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}
```

In [None]:
lisa_dict = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}

# Your code here

## 4. Visualize the Results

**4.1.** (0 points) Create a figure with 2 subplots and set the figure size to (15, 10).<br>
**4.2.** (1 point) In the first subplot, plot the `lisa_power` column of `emd_gdf`, coloring each category with the colors defined in the `lisa_color` dictionary below. <br>
**4.3.** (1 point) In the second subplot, plot the `lisa_gaussian` column of `emd_gdf` in the same way. <br>
**4.4.** (1 point) Add a title to each subplot: `LISA Map (Power): Moran's I Value` and `LISA Map (Gaussian): Moran's I Value`, where `Moran's I Value` is replaced by the corresponding global Moran's I value.<br>

```python
lisa_color = {'HH': 'red', 'LL': 'blue', 'HL': 'orange', 'LH': 'skyblue', 'NS': 'lightgrey'}
```

The expected output should look like the following figure.
<div style="text-align: center">
  <img src="./data/lisa_map.png" width="1000">
</div>
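Because each polygon needs its own color, one approach is to translate the label column into a list of colors in row order, and to format the Moran's I value into the title string. The sketch below does this for hypothetical labels; the value 0.3269 is the rounded Moran's I checked by the test in 2.3.

```python
lisa_color = {'HH': 'red', 'LL': 'blue', 'HL': 'orange', 'LH': 'skyblue', 'NS': 'lightgrey'}

# Hypothetical LISA labels for five areas, and the Moran's I value checked in 2.3
labels = ['HH', 'NS', 'LL', 'HL', 'LH']
mi_value = 0.3269

colors = [lisa_color[label] for label in labels]  # one color per polygon, in row order
title = f"LISA Map (Power): {mi_value:.4f}"
print(colors)  # ['red', 'lightgrey', 'blue', 'orange', 'skyblue']
print(title)   # LISA Map (Power): 0.3269
```

In the lab, such a list should be accepted by `emd_gdf.plot(color=colors, ax=axes[0])` after `fig, axes = plt.subplots(1, 2, figsize=(15, 10))`, with the title set through `axes[0].set_title(title)`.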

In [None]:
# Your code here
lisa_color = {'HH': 'red', 'LL': 'blue', 'HL': 'orange', 'LH': 'skyblue', 'NS': 'lightgrey'}


# Done