### Title: Land Cover Classification Using Landsat Imagery in Google Earth Engine

#### Introduction:
This notebook walks through land cover classification with Landsat imagery on the Google Earth Engine platform.

Part 1 of this notebook covers the data collection required for the analysis: defining helper functions that retrieve Landsat data for a specified geographic location, date range, and cloud cover threshold; defining a function that creates an interactive map for visualizing Landsat imagery; and collecting the training data required to build a classification model.

Part 2 of this notebook uses the data collected in Part 1 to build a classification model that identifies land cover classes, and creates a visualization that shows changes in land cover from one class to another.

#### Data & Logistics:

1. Data Provided: The location data was provided as a .kml file representing the Big Coast Project area. This large polygon was processed in QGIS and split into smaller polygons covering the same area.
2. Data Used: Of the split polygons, only one was used for this project. To create a training sample, we also need land classification data that can be used to train a machine learning model and assess its accuracy. The training data was created from the "Land Cover of North America at 30 meters, 2020" dataset created by USGS, which covers 2019-01-01 to 2021-12-31 and was accessed via the Earth Engine API (https://developers.google.com/earth-engine/datasets/catalog/USGS_NLCD_RELEASES_2020_REL_NALCMS).
3. To estimate the biomass cover for the desired area, data from the Global Ecosystem Dynamics Investigation (GEDI) was used (https://daac.ornl.gov/GEDI/guides/GEDI_L4B_Gridded_Biomass.html). This dataset provides estimates of mean aboveground biomass density (AGBD) based on observations from 2019-04-18 to 2021-08-04. Although the dataset is global in scope, it does not cover most of our desired project locations. However, I was able to extract some representative samples, which were used for estimating biomass. (This part is covered in detail in notebook 2.)

In [94]:
# Import the required libraries

import ee
import geemap
import pandas as pd
import geopandas as gpd

In [95]:
# # Authenticate and initialize the earthengine library
# # Trigger the authentication flow.
# ee.Authenticate()

# # Initialize the library.
# ee.Initialize()

### PART 1:

In [96]:
# Define the helper functions

def fetch_landsat_data(location: ee.Geometry, 
                       start_date: str, 
                       end_date: str, 
                       cloud_cover: float) -> ee.Image:
    """
    This function fetches the first Landsat 8 TOA image that intersects a location,
    falls within a date range, and has less than the given scene cloud cover.
    Only bands B1 to B7 are kept.
    :param location: desired geometric location (Point or Polygon)
    :param start_date: start date (inclusive), a date string or ee.Date
    :param end_date: end date (exclusive), a date string or ee.Date
    :param cloud_cover: maximum scene cloud cover in percent
    :return: earth engine image
    """
    # Filter the Landsat 8 collection and take the first match.
    # Note: first() returns the first matching scene in collection order,
    # not necessarily the least cloudy one.
    landsat_image = (
        ee.ImageCollection('LANDSAT/LC08/C02/T1_TOA')
        .filterDate(ee.Date(start_date), ee.Date(end_date))
        .filterBounds(location)
        .filter(ee.Filter.lt('CLOUD_COVER', cloud_cover))
        .first()
        .select('B[1-7]')
    )

    print(f"The date of fetched image is {ee.Date(landsat_image.get('system:time_start')).format('YYYY-MM-dd').getInfo()}")

    print(f"The cloud cover is {landsat_image.get('CLOUD_COVER').getInfo()}")

    return landsat_image


def create_map(location: ee.Geometry, image: ee.Image):
    """
    This function takes a location and a Landsat image and returns an
    interactive geemap map with the image overlaid as a true-color
    (B4, B3, B2) composite.
    :param location: location on which to center the basemap
    :param image: landsat imagery to overlay on basemap
    :return: geemap Map object
    """
    Map = geemap.Map()
    Map.centerObject(location, 6)
    Map.addLayer(image, {
        'bands': ['B4', 'B3', 'B2'],
        'min': 0,
        'max': 0.4,
        'gamma': 1.4
    }, 'Landsat Composite')
    return Map
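
The helper above takes the first matching scene. If no scene passes the filters, the later `getInfo()` calls are likely to fail with an unclear error. The sketch below shows one possible guard. The function name `pick_least_cloudy` is hypothetical and not part of the original notebook. It assumes it receives an `ee.ImageCollection` that has already been filtered by date, bounds, and cloud cover, and it only relies on the `size()`, `sort()`, and `first()` methods of that collection.

```python
def pick_least_cloudy(filtered_collection):
    """Return the least cloudy image of an already filtered collection.

    `filtered_collection` is assumed to behave like an ee.ImageCollection
    that has already been filtered by date, bounds, and cloud cover.
    """
    # size() is evaluated server-side; getInfo() brings the count to Python.
    count = filtered_collection.size().getInfo()
    if count == 0:
        raise ValueError(
            "No Landsat scene matches the given filters; "
            "widen the date range or raise the cloud cover threshold."
        )
    # An ascending sort puts the scene with the lowest CLOUD_COVER first.
    return filtered_collection.sort('CLOUD_COVER').first()
```

Sorting by `CLOUD_COVER` is a design choice rather than a requirement: it can select a different scene (and therefore a different acquisition date) than the plain `first()` call used in `fetch_landsat_data`.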

#### Data Preparation:

1. Define the area of interest, i.e. one polygon from the entire area.
2. Define the year of study (the model training year) and fetch the Landsat imagery for it. The training year is 2020 because the NLCD data used for the training labels is for 2020.
3. Fetch the NLCD data to create training samples.
4. Fetch the image data for prediction, which must be two years apart from the training year (we use 2022 as the prediction year).

In [97]:
# 1. Define the area of interest i.e. one polygon from the entire area

# Load the required polygon dataset
split_polygons = gpd.read_file('boundary_clipped.geojson')

# Extract the desired polygon from the file and use it as the area of interest.
polygon_47 = split_polygons[split_polygons['id'] == 47]
multipolygon_string = str(polygon_47['geometry'].iloc[0])

# Extract the coordinates from the WKT string and format them into a list of lists.
# Note: strip()/rstrip() remove any of the given characters, not a literal prefix or suffix.
# This works here only because the geometry is a single-part MultiPolygon without holes.
coordinates = multipolygon_string.strip("MULTIPOLYGON (((").rstrip("))").split(", ")
coordinate_list = [list(map(float, point.split(" "))) for point in coordinates]

# Add the first point at the end to make sure the polygon ring is closed
coordinate_list.append(coordinate_list[0])

# print(coordinate_list)

# Convert the coordinate list to an earth engine MultiPolygon object

aoi_47 = ee.Geometry.MultiPolygon(coordinate_list)

# aoi_47.getInfo()
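
The string parsing above depends on the exact WKT layout. As an alternative, the sketch below reads the coordinates from a GeoJSON-like mapping instead of a string. The function name `geojson_to_multipolygon_coords` and the example coordinates are hypothetical and only illustrate the idea; the sketch assumes the geometry is a Polygon or MultiPolygon in longitude/latitude order.

```python
def geojson_to_multipolygon_coords(geometry):
    """Return nested lists [polygon][ring][point] from a GeoJSON-like mapping."""
    geom_type = geometry["type"]
    if geom_type == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geom_type == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError(f"Unsupported geometry type: {geom_type}")
    # Convert tuples to lists and drop an optional z value.
    return [
        [[[float(x), float(y)] for x, y, *_ in ring] for ring in polygon]
        for polygon in polygons
    ]


# Hypothetical single-part geometry, only for illustration
example_geometry = {
    "type": "MultiPolygon",
    "coordinates": [
        (((-126.8, 50.1), (-126.6, 50.1), (-126.6, 50.3), (-126.8, 50.1)),)
    ],
}
example_coords = geojson_to_multipolygon_coords(example_geometry)
```

With shapely geometries, such a mapping is available as `polygon_47['geometry'].iloc[0].__geo_interface__`, and the resulting nested list could be passed to `ee.Geometry.MultiPolygon`. This route also keeps multi-part geometries and holes intact, which the string-based approach does not.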

In [98]:
# 2. Fetch the training year data i.e., landsat imagery from 2020 between June and August

landsat_2020 = fetch_landsat_data(location = aoi_47, 
                                   start_date = ee.Date('2020-06-01'), 
                                   end_date = ee.Date('2020-08-31'), 
                                   cloud_cover = 10)


create_map(location = aoi_47, image = landsat_2020)

The date of fetched image is 2020-07-27
The cloud cover is 7.79


Map(center=[50.205892979516676, -126.7178366678171], controls=(WidgetControl(options=['position', 'transparent…

In [99]:
# 3. Fetch the nlcd data to create training samples.

# Load the land classification data from the USGS Land Cover of North America at 30 meters, 2020
# and clip it to area 47. This provides the classification labels.
nlcd_2020 = ee.Image('USGS/NLCD_RELEASES/2020_REL/NALCMS').select('landcover').clip(aoi_47)
Map2 = geemap.Map()
Map2.centerObject(aoi_47, 6)
Map2.addLayer(nlcd_2020, {}, 'NLCD')
Map2

Map(center=[50.205892979516676, -126.7178366678171], controls=(WidgetControl(options=['position', 'transparent…

In [100]:
# 4. Fetch the image data for prediction, which must be 2 years apart from the training year (we take 2022 as the prediction year)

landsat_2022 = fetch_landsat_data(location = aoi_47, 
                                   start_date = ee.Date('2022-05-01'), 
                                   end_date = ee.Date('2022-08-31'), 
                                   cloud_cover = 10)


create_map(location = aoi_47, image = landsat_2022)

The date of fetched image is 2022-08-18
The cloud cover is 9.73


Map(center=[50.205892979516676, -126.7178366678171], controls=(WidgetControl(options=['position', 'transparent…

In [101]:
# 5. Additional step: fetch another image from 2022 to see what imagery from the
# fall time frame looks like in the desired location.

landsat_2022_fall = fetch_landsat_data(location = aoi_47, 
                                   start_date = ee.Date('2022-08-31'), 
                                   end_date = ee.Date('2022-12-31'), 
                                   cloud_cover = 10)


create_map(location = aoi_47, image = landsat_2022_fall)

The date of fetched image is 2022-09-19
The cloud cover is 4.39


Map(center=[50.205892979516676, -126.7178366678171], controls=(WidgetControl(options=['position', 'transparent…

#### Model Training:

In this step, the following tasks take place:

1. Create a training data set for the machine learning (classification) task. This step samples a specific number of points within the defined ROI from the NLCD data, including both pixel values and geometries, which can be used for training and validating machine learning models for land cover classification or other geospatial analyses.
2. Create three classification models and compare their results.
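
The two tasks above can be sketched as follows. This is a minimal illustration, not the notebook's actual implementation: the function names, the default of 5000 points, and the seed are hypothetical. `sample_training_points` assumes `image` and `labels` behave like `ee.Image` objects and uses only the `addBands` and `sample` methods; `overall_accuracy` is one simple way to compare classifiers from their confusion matrices.

```python
def sample_training_points(image, labels, region, num_points=5000, seed=42, scale=30):
    """Sample pixels (band values plus label) inside the region.

    `image` is assumed to be an ee.Image with the predictor bands and
    `labels` an ee.Image with the land cover band. The defaults for
    `num_points` and `seed` are illustrative assumptions.
    """
    stacked = image.addBands(labels)
    return stacked.sample(
        region=region,
        scale=scale,           # 30 m matches the Landsat and land cover resolution
        numPixels=num_points,  # approximate number of pixels to sample
        seed=seed,             # fixed seed keeps the sample reproducible
        geometries=True,       # keep the point geometry of each sample
    )


def overall_accuracy(confusion):
    """Share of correctly classified samples in a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


# Hypothetical 2-class confusion matrix: rows are actual, columns are predicted
example_accuracy = overall_accuracy([[8, 2], [1, 9]])  # 17 of 20 correct
```

Computing the same metric on the same validation points for each of the three models gives a like-for-like comparison. Overall accuracy can hide poor performance on rare classes, so per-class measures may also be worth checking.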