
# Analysis report for GEO1001 HW01 
---

## Authors  
1. First author: [<img src="https://avatars3.githubusercontent.com/u/59593272?s=400&u=ba1618be6d5e354f0bd7685ff405bdec6d18c101&v=4" align = "left" width = "25" height = "25" />](https://github.com/peterliu502)  
   * Name: Zhenyu Liu  
   * Student Number: 5386586  
  
2. Second author: [<img src="https://avatars2.githubusercontent.com/u/47234206?s=400&u=3f54e18f68e48f985db9f0ef1a8eb3a3ca189b1e&v=4" align = "left" style = "float:left" width = "25" height = "25" />](https://github.com/Ziyan-Wu)
   * Name: Ziyan Wu  
   * Student Number: 5360684

## Source data 
The source data contains `Sentinel-2` images for the area around Delft on May 30th 2020, with 10m, 20m and 60m resolutions.

## Python packages  
This assignment uses the following Python packages:  
1. `numpy`
2. `scikit-learn`  
3. `matplotlib`  
4. `rasterio` 

In [None]:
import numpy as np
import rasterio as rio
from rasterio.windows import Window
import rasterio.plot as rp
from matplotlib import pyplot as plt
from matplotlib import colors as mc
from sklearn import cluster

## Open files and preprocess  

In [None]:
def open_raster(resolution, band):
    with rio.open(
            './GRANULE/L2A_T31UET_A025788_20200530T105134/IMG_DATA/R' + resolution
            + 'm/T31UET_20200530T105031_B' + band + '_' + resolution + 'm.jp2',
            driver="JP2OpenJPEG") as ds:
        if resolution == '10':
            # get the row and column number for the top-left pixel
            row, col = ds.index(601200, 5773695)
            # cut the raster to a 700 x 500 pixel window and rescale the values to [0, 1]
            return rp.adjust_band(ds.read(1, window=Window(col, row, 700, 500)))
        else:
            # rescale the values of the full raster to [0, 1]
            return rp.adjust_band(ds.read(1))
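
To make the rescaling step concrete, the sketch below reproduces with plain NumPy what `rp.adjust_band` is assumed to do in its default linear mode: map the band minimum to 0 and the band maximum to 1. The helper name `minmax_scale` and the sample values are hypothetical, and the NaN-aware minimum and maximum are an assumption about the library's behaviour, not something stated in this report.

```python
import numpy as np


def minmax_scale(band):
    """Linearly rescale a 2D band to the range [0, 1] (hypothetical helper)."""
    band = np.asarray(band, dtype=np.float64)
    # ignore NaN pixels when looking for the extremes (assumed behaviour)
    lo = np.nanmin(band)
    hi = np.nanmax(band)
    return (band - lo) / (hi - lo)


# made-up values standing in for one Sentinel-2 band
sample = np.array([[200, 400], [600, 1000]])
scaled = minmax_scale(sample)
print(scaled)
```

Because each band is rescaled with its own minimum and maximum, the 10m window and the full 60m raster end up on the same 0 to 1 scale even though their raw value ranges may differ.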

### Open 60m data  
Open and read `band02(B)`, `band03(G)`, `band04(R)` and `band8A(Narrow NIR)` of the raster data in 60m resolution.

In [None]:
ds_60_B2_subset = open_raster('60', '02')
ds_60_B3_subset = open_raster('60', '03')
ds_60_B4_subset = open_raster('60', '04')
ds_60_B8_subset = open_raster('60', '8A')

### Open 10m data  
Open and read `band02(B)`, `band03(G)`, `band04(R)` and `band8(NIR)` of the raster data in 10m resolution.

In [None]:
ds_10_B2_subset = open_raster('10', '02')
ds_10_B3_subset = open_raster('10', '03')
ds_10_B4_subset = open_raster('10', '04')
ds_10_B8_subset = open_raster('10', '08')

### Merge bands
This assignment uses `true color` (bands 4, 3, 2) and `nir false color` (bands 8/8A, 4, 3) images.

In [None]:
# merge the bands 4, 3 and 2 (true color)
ds_60_432 = np.dstack((ds_60_B4_subset, ds_60_B3_subset, ds_60_B2_subset))
# merge the bands 8A, 4 and 3 (nir false color)
ds_60_843 = np.dstack((ds_60_B8_subset, ds_60_B4_subset, ds_60_B3_subset))
# merge the bands 4, 3 and 2 (true color)
ds_10_432 = np.dstack((ds_10_B4_subset, ds_10_B3_subset, ds_10_B2_subset))
# merge the bands 8, 4 and 3 (nir false color)
ds_10_843 = np.dstack((ds_10_B8_subset, ds_10_B4_subset, ds_10_B3_subset))
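
As a small illustration of what the merge produces, the sketch below stacks three made-up single-band arrays with `np.dstack`. The array names and values are hypothetical; the point is only that bands of identical shape `(rows, cols)` become one array of shape `(rows, cols, bands)`.

```python
import numpy as np

# three made-up single-band rasters of identical shape (rows, cols)
red = np.full((2, 3), 0.9)
green = np.full((2, 3), 0.5)
blue = np.full((2, 3), 0.1)

# stack along a new third axis: the result has shape (rows, cols, bands)
rgb = np.dstack((red, green, blue))
print(rgb.shape)
# each pixel now holds its band values in the order they were stacked
print(rgb[0, 0])
```

The stacking order decides which band ends up in which display channel, which is why the true color and nir false color images differ only in the order and choice of the input bands.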

## KMeans
KMeans is a commonly used clustering (unsupervised classification) method. Its basic idea is to assign points to K clusters so that each point is nearer to its own `cluster center` (`cluster mean`) than to any other cluster center.  
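
The nearest-center idea can be sketched as a single iteration in plain NumPy. This is a minimal illustration, not the scikit-learn implementation used in this report: the helper `kmeans_step` and the sample points are hypothetical, and the sketch assumes every cluster keeps at least one point.

```python
import numpy as np


def kmeans_step(points, centers):
    """Run one assignment + update iteration (hypothetical helper)."""
    # squared Euclidean distance from every point to every center: shape (n, k)
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # assignment step: each point joins the cluster with the nearest center
    labels = d2.argmin(axis=1)
    # update step: each center moves to the mean of its assigned points
    # (assumes no cluster is empty)
    new_centers = np.array([points[labels == k].mean(axis=0)
                            for k in range(len(centers))])
    return labels, new_centers


# made-up 2D points forming two obvious groups
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
centers = np.array([[1.0, 1.0], [9.0, 9.0]])
labels, centers = kmeans_step(points, centers)
print(labels)
print(centers)
```

Repeating this step until the labels stop changing gives the basic algorithm; in the assignment itself each "point" is a pixel and each coordinate is one band value.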

A key step in KMeans clustering is choosing the `K-value` (`n_clusters`) in advance. In this assignment, we use the `SSE` (`Sum of Squared Errors`) to find the `optimal K-value`. SSE generally decreases as K increases: while K is below the optimal value, SSE drops sharply, and once K exceeds the optimal value, SSE decreases only slowly. The goal is therefore to find the turning point (the elbow) of the SSE curve, which marks the optimal K-value. In the following sections, we verify the optimal K-value against the output classification images.
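
In this report the turning point is read off the plotted curve by eye. As a rough numerical cross-check, one possible heuristic (an assumption here, not the method used in the assignment) is to pick the K where the second difference of the SSE values is largest, that is, where a steep drop is followed by a flat stretch. The helper `elbow_k` and the SSE values below are made up for illustration.

```python
import numpy as np


def elbow_k(k_values, sse_values):
    """Return the K where the SSE curve bends most sharply (hypothetical helper)."""
    sse_values = np.asarray(sse_values, dtype=float)
    # second difference: large where a steep drop is followed by a flat stretch
    curvature = sse_values[:-2] - 2 * sse_values[1:-1] + sse_values[2:]
    # curvature[i] belongs to k_values[i + 1]; the endpoints have no second difference
    return k_values[int(np.argmax(curvature)) + 1]


ks = list(range(3, 11))
# made-up SSE values, not results from this report
fake_sse = [1000, 600, 300, 270, 250, 235, 225, 220]
best_k = elbow_k(ks, fake_sse)
print(best_k)
```

The heuristic is sensitive to noise in the curve, so it is best treated as a sanity check alongside the plot and the classification images.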

### KMeans Preprocess

#### Create KMeans classifier

In [None]:
# create a KMeans classifier
def kmeans_classifier(ds):
    # store the SSE of each result
    sse = []
    # flatten the image to one row per pixel and one column per band
    ds_1d = ds[:, :, :3].reshape((ds.shape[0] * ds.shape[1], ds.shape[2]))
    ds_img_cl_list = []
    for elm in range(3, 11):
        # create a KMeans classifier object with elm clusters
        ds_cl = cluster.KMeans(n_clusters=elm)
        # fit the classifier to the pixel data
        ds_cl.fit(ds_1d)
        # get the cluster label of each pixel
        ds_img_cl = ds_cl.labels_
        # reshape labels to a 2d array matching the image grid
        ds_img_cl = ds_img_cl.reshape(ds[:, :, 0].shape)
        # record the SSE (inertia) and the label image for this K
        sse.append(ds_cl.inertia_)
        ds_img_cl_list.append(ds_img_cl)
    return ds_img_cl_list, sse
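
The reshaping around the clustering step can be checked on a tiny made-up array. The sketch below uses only NumPy; the array sizes and the stand-in labels are hypothetical and no clustering is run. It shows that flattening an image to one row per pixel and reshaping the per-pixel labels back preserves pixel positions.

```python
import numpy as np

rows, cols, bands = 2, 3, 3
# made-up image: every pixel holds its three band values
image = np.arange(rows * cols * bands, dtype=float).reshape(rows, cols, bands)

# flatten to one row per pixel, one column per band
pixels = image.reshape(rows * cols, bands)

# stand-in for cluster labels: one integer per pixel, in the same pixel order
fake_labels = np.arange(rows * cols)

# reshape back to the 2D image grid so labels line up with pixel positions
label_image = fake_labels.reshape(rows, cols)
print(pixels.shape, label_image.shape)
```

Row `r * cols + c` of the flattened array corresponds to pixel `(r, c)` of the image, so the label image can be plotted directly over the original raster.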

#### Create classification image and SSE curve generation functions

In [None]:
# SSE curve generation function
def sse(arr):
    plt.figure(figsize=(5, 5))
    X = range(3, 11)
    plt.xlabel('k')
    plt.ylabel('SSE')
    plt.plot(X, arr, 'o-')
    plt.show()


# classification image generation function
def plot_classification(ds_list):
    ds_fig = plt.figure(figsize=(30, 10))
    for elm in range(8):
        # plot the classification image
        ds_fig.add_subplot(2, 4, elm + 1)
        # set the custom color map to represent the different classes in image
        cmap = mc.LinearSegmentedColormap.from_list(
            "", ["seagreen", "tan", "white", "green", "mediumseagreen", "yellow",
                 "magenta", "red", "blue"])
        plt.imshow(ds_list[elm], cmap=cmap)
        plt.title("n_clusters="+str(elm + 3))
    plt.show()

### Plot 60m data


#### Find optimal K-value of true color classification images
According to the SSE curve, the optimal K-value is 6.

In [None]:
# plot an SSE curve
sse(kmeans_classifier(ds_60_432)[1])

#### Plot true color classification images


In [None]:
# plot true color classification image in 60m resolution
plot_classification(kmeans_classifier(ds_60_432)[0])

#### Find optimal K-value of nir false color classification images
According to the SSE curve, the optimal K-value is 6.

In [None]:
# plot an SSE curve
sse(kmeans_classifier(ds_60_843)[1])

#### Plot nir false color classification images

In [None]:
# plot nir false color classification image in 60m resolution
plot_classification(kmeans_classifier(ds_60_843)[0])

### Plot 10m data

#### Find optimal K-value of true color classification images
According to the SSE curve, the optimal K-value is 6.

In [None]:
# plot an SSE curve
sse(kmeans_classifier(ds_10_432)[1])

#### Plot true color classification images


In [None]:
# plot true color classification image in 10m resolution
plot_classification(kmeans_classifier(ds_10_432)[0])

#### Find optimal K-value of nir false color classification images
According to the SSE curve, the optimal K-value is 6.

In [None]:
# plot an SSE curve
sse(kmeans_classifier(ds_10_843)[1])

#### Plot nir false color classification images

In [None]:
# plot nir false color classification image in 10m resolution
plot_classification(kmeans_classifier(ds_10_843)[0])