# Lab 4: Geospatial Data Visualization

In this lab, you will analyze how the regional extinction risk index (지역소멸지수), referred to below as the extinction ratio, changed in Seoul, South Korea, at five-year intervals from 2000 to 2020. Before visualizing the geospatial data, you will preprocess the population data into a format suitable for mapping. You will then calculate the extinction ratio using the following formula:

$$
\text{Extinction Risk Index} = \frac{\text{Population of women aged 20-39}}{\text{Population aged 65 and over}}
$$


The data for this lab were obtained from the following sources.
* Population data (mid-year population, 연앙인구): https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1B040M5&conn_path=I3 (시군구/성/연령(5세)별 주민등록연앙인구: resident-registration mid-year population by district, sex, and five-year age group)
* SGG (시군구) geometry: https://data.seoul.go.kr/dataVisual/seoul/seoulLivingPopulation.do (reprocessed from census output area (집계구) data)

## Structure
### 1. Data Preprocessing 
**1.1.** Create an empty DataFrame (2 points) <br>
**1.2.** Import the population data (1 point) <br>
**1.3.** Calculate extinction ratio for a SGG (종로구) over the years (4 points) <br>
**1.4.** Repeat the above steps for all SGGs (시군구) (4 points) <br>
**1.5.** Join the `ratio_df` DataFrame with the SGG (시군구) geometry data (2 points) <br>

### 2. A Single Choropleth Map (4 points)
### 3. Create a customized color map (4 points)
### 4. Creating a Figure with Multiple Axes (4 points)

## Notes:
**Before you submit your lab, make sure everything runs as expected WITHOUT ANY ERROR.** <br>
**Make sure you fill in any place that says `YOUR CODE HERE` or `YOUR ANSWER HERE`:**

In [None]:
FULL_NAME = ""

In [None]:
# Import necessary packages
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

## 1. Data Preprocessing 

The structure of the ***raw population data*** is shown below. The population data is broken down by age, gender, and year. <br>

<div style="text-align: center">
  <img src="./data/population_raw_data.jpg" width="500">
</div>

To visualize the data effectively, you need to preprocess it into ***the format shown below***. In short, you need to calculate the extinction ratio (지역소멸지수) for each SGG (시군구) and year. <br>

<div style="text-align: center">
  <img src="./data/population_completed_data.jpg" width="500">
</div>


### 1.1. Create an empty DataFrame (2 points)

<div style="text-align: center">
  <img src="./data/empty_ratio_df.jpg" width="300">
</div>

With the two lists provided below, create an empty DataFrame (named `ratio_df`) with the columns as shown in the above image. In other words, `ratio_df` DataFrame has the `행정구역`, `Y2000`, `Y2005`, `Y2010`, `Y2015`, and `Y2020` columns. The `행정구역` column should contain the SGG (시군구) names, while the other columns should be empty. <br>

```python
    sgg_list = ['종로구', '중구', '용산구', '성동구', '광진구', '동대문구', '중랑구', '성북구', '강북구', 
                '도봉구', '노원구', '은평구', '서대문구', '마포구', '양천구', '강서구', '구로구', '금천구', 
                '영등포구', '동작구', '관악구', '서초구', '강남구', '송파구', '강동구']

    year_list = ['Y2000', 'Y2005', 'Y2010', 'Y2015', 'Y2020']
```
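
One possible pattern, sketched here on shortened, hypothetical inputs (`demo_regions`, `demo_years`, and `demo_df` are illustrative names, not part of the lab data): put the names in the first column, then reindex the columns so that the year columns are created and filled with `NaN`.

```python
import pandas as pd

# Hypothetical, shortened inputs used only for illustration
demo_regions = ['A구', 'B구', 'C구']
demo_years = ['Y2000', 'Y2005']

# Start with the name column, then add the year columns as empty (NaN) columns
demo_df = pd.DataFrame({'행정구역': demo_regions})
demo_df = demo_df.reindex(columns=['행정구역'] + demo_years)

print(demo_df.shape)  # (3, 3)
```

Other approaches, such as passing `columns=` directly to the `pd.DataFrame` constructor and filling the name column afterwards, should work equally well.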



In [None]:
# Information needed
sgg_list = ['종로구', '중구', '용산구', '성동구', '광진구', '동대문구', '중랑구', '성북구', '강북구', 
            '도봉구', '노원구', '은평구', '서대문구', '마포구', '양천구', '강서구', '구로구', '금천구', 
            '영등포구', '동작구', '관악구', '서초구', '강남구', '송파구', '강동구']

year_list = ['Y2000', 'Y2005', 'Y2010', 'Y2015', 'Y2020']

# Your code here
ratio_df = 

ratio_df

In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""

assert type(ratio_df) == pd.DataFrame
assert ratio_df.shape == (25, 6) 

print('Success!')

### 1.2. Import the population data (1 point)

The population data named `seoul_pop.xlsx` is provided in the `data` folder. Import the data using Pandas.

In [None]:
# Your code here
pop_df = 
pop_df

In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""

assert type(pop_df) == pd.DataFrame
assert pop_df.shape == (675, 8)

print('Success!')

### 1.3. Calculate extinction ratio for a SGG (종로구) over the years (4 points)

Now we need to select young women (20-39 years old) and elderly people (65 years and older) for each SGG from the population data. <br>

#### 1.3.1 Select young women (20-39 years old) for 종로구
Before calculating the ratio for every SGG, let's select the young women population of 종로구 for every year ('Y2000', 'Y2005', 'Y2010', 'Y2015', 'Y2020'). <br> 
The expected result is a DataFrame that looks like the one below. Name the DataFrame `jn_women_pop`. <br>

<div style="text-align: center">
  <img src="./data/jn_women_pop.jpg" width="300">
</div>


**Hint**: You can use the `.loc[]` indexer to select the relevant rows and columns. For the row condition inside `.loc[]`, you can use `==` comparison and/or the `.isin()` method. <br>

```python
    # Template only: replace the quoted placeholders with real column names and values
    return_df = df.loc[(df['Column1'] == 'Value1') &
                       (df['Column2'].isin(list_of_values)) &
                       (df['Column3'] == 'Value3'),
                       ['Year1', 'Year2', 'Year3']]
```
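
To make the hint concrete, here is a minimal sketch on a made-up table. The column names (`region`, `sex`, `age`) and all values are assumptions for illustration only; they are not the columns of `seoul_pop.xlsx`, so inspect `pop_df` to find the real ones.

```python
import pandas as pd

# Hypothetical table; column names and numbers are invented for this sketch
toy = pd.DataFrame({
    'region': ['A', 'A', 'A', 'A', 'B', 'B'],
    'sex':    ['F', 'F', 'M', 'F', 'F', 'M'],
    'age':    ['20-24', '25-29', '20-24', '65-69', '20-24', '25-29'],
    'Y2000':  [100, 120, 110, 80, 90, 95],
    'Y2005':  [95, 115, 105, 90, 85, 92],
})

young_groups = ['20-24', '25-29']

# Row condition: region A AND female AND age group in the list
# Column selector: only the year columns
selected = toy.loc[(toy['region'] == 'A') &
                   (toy['sex'] == 'F') &
                   (toy['age'].isin(young_groups)),
                   ['Y2000', 'Y2005']]
print(selected)
```

Each condition is wrapped in parentheses because `&` binds more tightly than `==` in Python.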

In [None]:
# Your code here
jn_women_pop = pop_df.loc[]

jn_women_pop

In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""

import numpy as np

assert type(jn_women_pop) == pd.DataFrame
assert jn_women_pop.shape == (4, 5)
assert np.all(jn_women_pop.values[0] == [7818.5, 6844., 5678.5, 5648.5, 5626.5])

print('Success!')

#### 1.3.2 Select the elderly population (65 years and older) for 종로구

Now select the elderly population (65 years and older) of 종로구 for every year ('Y2000', 'Y2005', 'Y2010', 'Y2015', 'Y2020'). <br>
The expected result is a DataFrame that looks like the one below. Name the DataFrame `jn_old_pop`. <br>


<div style="text-align: center">
  <img src="./data/jn_old_pop.jpg" width="300">
</div>


In [None]:
# Your code here
jn_old_pop = pop_df.loc[]

jn_old_pop

In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""

import numpy as np

assert type(jn_old_pop) == pd.DataFrame
assert jn_old_pop.shape == (4, 5)
assert np.all(jn_old_pop.values[0] == [5599.5, 6962.5, 7810.5, 7174.5, 7636.5])

print('Success!')

#### 1.3.3. Calculate the extinction ratio of the population (지역소멸지수) for 종로구

Sum the `jn_women_pop` and `jn_old_pop` DataFrames separately, so that each yields one total per year. <br>
Then, divide the `jn_women_pop` totals by the `jn_old_pop` totals to get the extinction ratio (지역소멸지수) for 종로구. <br> 
The expected result is a Series that looks like the one below. Name the Series `jn_ratio`. <br>

<div style="text-align: center">
  <img src="./data/jn_ratio.jpg" width="200">
</div>
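
The sum-then-divide step can be sketched as follows, using invented numbers (the frames `women` and `elderly` are hypothetical stand-ins, not the lab data):

```python
import pandas as pd

# Hypothetical counts: rows are age groups, columns are years
women = pd.DataFrame({'Y2000': [100.0, 120.0], 'Y2005': [90.0, 110.0]})
elderly = pd.DataFrame({'Y2000': [60.0, 50.0], 'Y2005': [80.0, 70.0]})

# DataFrame.sum() adds up each column and returns one total per year as a Series
women_total = women.sum()      # Y2000: 220.0, Y2005: 200.0
elderly_total = elderly.sum()  # Y2000: 110.0, Y2005: 150.0

# Dividing two Series aligns them on their index (here, the year labels)
ratio = women_total / elderly_total
print(ratio)
```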


In [None]:
# Your code here
jn_ratio = 
jn_ratio

In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""

assert type(jn_ratio) == pd.Series
assert round(jn_ratio['Y2000'], 2) == 2.35
assert round(jn_ratio['Y2010'], 2) == 1.27
assert round(jn_ratio['Y2020'], 2) == 0.85

print('Success!')

#### 1.3.4. Enter the extinction ratio of the population (지역소멸지수) for 종로구 back to the `ratio_df` DataFrame

Search on the web for how to enter the values of a Series into a DataFrame. Enter the extinction ratio information for 종로구 (`jn_ratio`) into `ratio_df`. The `ratio_df` should look like below. <br>

<div style="text-align: center">
  <img src="./data/jn_ratio_df.jpg" width="400">
</div>
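
As a starting point for your search, here is one way this can be done, shown on a tiny hypothetical frame (`frame`, `values`, and `years` are illustrative names): select the target row with a boolean mask and the target columns with a list, then assign the values.

```python
import pandas as pd

years = ['Y2000', 'Y2005']

# Hypothetical frame with empty year columns, and a Series of values to store
frame = pd.DataFrame({
    '행정구역': ['A구', 'B구'],
    'Y2000': [float('nan')] * 2,
    'Y2005': [float('nan')] * 2,
})
values = pd.Series({'Y2000': 2.0, 'Y2005': 1.5})

# The mask picks the row and the list picks the columns;
# .values passes a plain array so that no index alignment is attempted
frame.loc[frame['행정구역'] == 'A구', years] = values[years].values
print(frame)
```

Indexing the Series with `years` first ensures the values are in the same order as the target columns.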

In [None]:
# Your code here



In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""

assert type(ratio_df) == pd.DataFrame
assert ratio_df.shape == (25, 6)
assert round(ratio_df.loc[0, 'Y2000'], 2) == 2.35
assert round(ratio_df.loc[0, 'Y2010'], 2) == 1.27
assert round(ratio_df.loc[0, 'Y2020'], 2) == 0.85

print('Success!')

### 1.4. Repeat the above steps for all SGGs (시군구) (4 points)

You are free to choose any approach, but I recommend writing a function (`def`) that reuses the steps above. <br>
The goal of this step is to fill in the extinction ratio (지역소멸지수) for all SGGs (시군구) in the `ratio_df` DataFrame. <br>
The final `ratio_df` DataFrame should look like the one below. <br>

<div style="text-align: center">
  <img src="./data/completed_ratio_df.jpg" width="400">
</div>
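
The function-plus-loop idea can be sketched as below. Everything here is hypothetical: the table `counts`, its `region` and `group` columns, and the helper `region_ratio` are invented to show the shape of the approach, and the real data needs its own selection conditions (sex and age groups).

```python
import pandas as pd

years = ['Y2000', 'Y2005']

# Hypothetical counts; 'group' marks which side of the ratio a row belongs to
counts = pd.DataFrame({
    'region': ['A', 'A', 'A', 'B', 'B', 'B'],
    'group':  ['young_women', 'young_women', 'elderly'] * 2,
    'Y2000':  [100.0, 120.0, 110.0, 50.0, 40.0, 60.0],
    'Y2005':  [90.0, 110.0, 125.0, 45.0, 35.0, 80.0],
})

def region_ratio(df, region):
    """Return the young-women total divided by the elderly total, per year."""
    rows = df.loc[df['region'] == region]
    numerator = rows.loc[rows['group'] == 'young_women', years].sum()
    denominator = rows.loc[rows['group'] == 'elderly', years].sum()
    return numerator / denominator

# Empty result frame, then one function call per region to fill its row
result = pd.DataFrame({'region': ['A', 'B']}).reindex(columns=['region'] + years)
for name in result['region']:
    result.loc[result['region'] == name, years] = region_ratio(counts, name).values

print(result)
```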

In [None]:
# Your code here



In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""

assert type(ratio_df) == pd.DataFrame
assert ratio_df.shape == (25, 6)
assert round(ratio_df.loc[ratio_df['행정구역'] == '관악구', 'Y2000'].values[0], 2) == 4.31
assert round(ratio_df.loc[ratio_df['행정구역'] == '관악구', 'Y2010'].values[0], 2) == 2.12
assert round(ratio_df.loc[ratio_df['행정구역'] == '관악구', 'Y2020'].values[0], 2) == 1.27
assert round(ratio_df.loc[ratio_df['행정구역'] == '강서구', 'Y2000'].values[0], 2) == 3.74
assert round(ratio_df.loc[ratio_df['행정구역'] == '강서구', 'Y2010'].values[0], 2) == 2.01
assert round(ratio_df.loc[ratio_df['행정구역'] == '강서구', 'Y2020'].values[0], 2) == 1.13

print('Success!')

### 1.5. Join the `ratio_df` DataFrame with the SGG (시군구) geometry data (2 points)

Import the SGG (시군구) geometry data named `sgg_seoul.geojson` from the `data` folder and name it `sgg_gdf`. <br>
Then, join the `ratio_df` DataFrame with the `sgg_gdf` GeoDataFrame and name the result `sgg_ratio_gdf`. <br>

***IMPORTANT***: If you could not complete the tasks above, you can import the `seoul_extinction_ratio.xlsx` file from the `data` folder and restart from here. <br>
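
A minimal sketch of an attribute join, using two made-up square "districts" in place of the real geometry. The key column name `SGG_NM` is an assumption; check the actual columns of `sgg_gdf` before writing your own join.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Hypothetical stand-ins: two square polygons and a small ratio table
shapes = gpd.GeoDataFrame(
    {'SGG_NM': ['A구', 'B구']},  # 'SGG_NM' is an assumed key column name
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)
ratios = pd.DataFrame({'행정구역': ['A구', 'B구'], 'Y2020': [0.9, 1.2]})

# Calling merge on the GeoDataFrame (left side) keeps the result a GeoDataFrame
joined = shapes.merge(ratios, left_on='SGG_NM', right_on='행정구역')
print(type(joined).__name__, joined.shape)
```

The direction matters: merging with the plain DataFrame on the left generally returns a plain DataFrame, which would fail the type check in the test cell.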

In [None]:
# Your code here



In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""

assert type(sgg_ratio_gdf) == gpd.GeoDataFrame
assert sgg_ratio_gdf.shape == (25, 9)

print('Success!')

## 2. A Single Choropleth Map (4 points)

Here, you will create a single choropleth map from the merged `sgg_ratio_gdf` GeoDataFrame from the previous task. The result should look like the map below. <br>

<div style="text-align: center">
  <img src="./data/single_map.jpg" width="500">
</div>

### 2.1. Initiate a plot
Initiate a plot with `plt.subplots()`. Set the figure size to 10 by 10 and create a single axes, since you are making only one map. <br>

### 2.2. Create Choropleth Map
Create a choropleth map based on the ratio in 2020. Use the following settings.
* Colormap (`cmap`): 'Reds'. ***A STRONGER RED should indicate a LOWER EXTINCTION RATIO (a higher-risk area).***
* Classification scheme (`scheme`): 'NaturalBreaks'
* Number of classes (`k`): 7


### 2.3. Create annotation
Fill in the missing information (i.e., `NAME OF YOUR DATAFRAME` and `NAME OF A COLUMN`) in the code below to annotate each SGG. <br>


```python
    for idx, row in [`NAME OF YOUR DATAFRAME`].iterrows(): # Iterate over every row in a GeoDataFrame
        ax.text(s=row[`NAME OF A COLUMN`], # String to be displayed
                x=row['geometry'].centroid.coords[:][0][0], # X coordinate of label
                y=row['geometry'].centroid.coords[:][0][1], # Y coordinate of label
                fontsize=10, 
                color='black',
                ha='center', # Horizontal align
                va='center', # Vertical align
               )
```

### 2.4. Change the font if Korean characters do not render correctly
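
One way to do this is to point Matplotlib at a font that contains Korean glyphs. The sketch below assumes that one of three commonly used fonts may be installed (`Malgun Gothic` on Windows, `AppleGothic` on macOS, `NanumGothic` where it has been installed manually). If none is found, it leaves the font setting unchanged.

```python
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Candidate fonts with Korean glyphs; which one exists depends on your system
candidates = ['Malgun Gothic', 'AppleGothic', 'NanumGothic']

# Names of all fonts Matplotlib has discovered on this machine
installed = {font.name for font in font_manager.fontManager.ttflist}

# Use the first candidate that is installed, if any
korean_font = next((name for name in candidates if name in installed), None)
if korean_font is not None:
    plt.rcParams['font.family'] = korean_font

# Keep minus signs rendering correctly with fonts that lack the Unicode minus glyph
plt.rcParams['axes.unicode_minus'] = False
```

Run this before creating the figure so that the setting applies to all text drawn afterwards.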


In [None]:
# Your code here


## 3. Create a customized color map (4 points)

Because extinction ratio values have established interpretations, a customized color map communicates them better than a generic one. <br>

### 3.1. Define a customized color map
Create a customized color map using the `matplotlib.colors.ListedColormap` class. <br>
For the color associated with each range of the extinction ratio, please refer to the table below. <br>

| Color | Extinction Ratio | Meaning |
|:---:|:---:|:---:|
| Green | r >= 1.5 | Safe |
| Lightgreen | 1.5 > r >= 1.0 | Normal |
| Yellow | 1.0 > r >= 0.5 | Need Attention |
| Orange | 0.5 > r >= 0.2 | Risky |
| Red | 0.2 > r | Very Risky |

The resulting color map (`cm`) should look like the one below. <br>

<div style="text-align: center">
  <img src="./data/custom_colors.jpg" width="500">
</div>
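
A small sketch of how the table translates into code, under the assumption that the colors are listed from the lowest class (Very Risky) to the highest (Safe). The `np.digitize` call is only there to illustrate which class a given ratio falls into; it is not necessarily how the map itself should classify values, and the sample ratios are arbitrary.

```python
import numpy as np
from matplotlib.colors import ListedColormap

# Colors ordered from the lowest class (Very Risky) to the highest (Safe)
cm = ListedColormap(['red', 'orange', 'yellow', 'lightgreen', 'green'])

# Class boundaries taken from the table
bounds = [0.2, 0.5, 1.0, 1.5]

# For each value, np.digitize counts how many boundaries it has reached (0 to 4),
# which matches the position of its color in the list above
sample = np.array([0.1, 0.2, 0.85, 1.27, 2.35])
classes = np.digitize(sample, bounds)
print(classes)  # [0 1 2 3 4]
```

Evaluating `cm` on its own in a notebook cell displays the color swatches, which is a quick way to compare against the expected image.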


<br><br>

### 3.2. Apply the customized color map to the Choropleth map
Update the code for task 2 and make a choropleth map with the customized color map. <br>

<div style="text-align: center">
  <img src="./data/custom_color_map.jpg" width="500">
</div>

In [None]:
# Your code here


## 4. Creating a Figure with Multiple Axes (4 points)

### 4.1. Initiate a Figure
Create `fig` and `axes` with `plt.subplots(nrows=[number needed], ncols=[number needed], figsize=(15, 10))`. Because five columns (`Y2000`, `Y2005`, `Y2010`, `Y2015`, `Y2020`) will be displayed, use **1** row and **5** columns. <br>

### 4.2. Repeat the same style with different years
Populate each axes with the extinction ratio for the corresponding year. Use the same color map (`cm`) and classification scheme (`user_defined`) as in the previous task. <br>


### 4.3. Decorate the axes
- Hide the axes ticks and labels
- Set the title of each axes to the year of the extinction ratio
- Annotate each SGG with its extinction ratio for that year.

The final product should look like the figure below. <br>

<div style="text-align: center">
  <img src="./data/multiple_axes.jpg" width="1200">
</div>
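
The figure skeleton for this task can be sketched without any map data. The loop below only sets titles and hides the axes; the plotting and annotation calls for each year would go where the comment indicates.

```python
import matplotlib.pyplot as plt

year_list = ['Y2000', 'Y2005', 'Y2010', 'Y2015', 'Y2020']

# One row, five columns: axes is a 1-D array with one Axes per year
fig, axes = plt.subplots(nrows=1, ncols=len(year_list), figsize=(15, 10))

for ax, year in zip(axes, year_list):
    # The map and annotations for `year` would be drawn on `ax` here
    ax.set_title(year.lstrip('Y'))  # e.g. '2000'
    ax.set_axis_off()               # hides ticks, tick labels, and the frame
```

Pairing each axes with its year through `zip` avoids manual indexing. Note that `set_axis_off()` also removes the surrounding frame; clearing only the ticks is an alternative if you want to keep it.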

In [None]:
# Your code here


# Done