# Landcover Notebook

This notebook is being used to analyze the landcover in the area surrounding Lake Malawi. The work relies on data from the Sentinel-2 earth observation mission.

## Dependencies

Here we'll import dependencies that can be used in other cells in this notebook.

In [75]:
import folium
import json
import numpy as np
import pyproj
import rasterio
from rasterio.mask import mask
from shapely.geometry import box, shape
from shapely.ops import transform

## Helpers

Here we'll define some helper methods that can be used in other cells in this notebook.

In [76]:
years = [2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]

# File helpers

def input_path(filename):
  return f"../data/input/{filename}"

def output_path(filename):
  return f"../data/output/{filename}"

# Coordinate helpers

def get_crs(path):
  with rasterio.open(path) as file:
    return file.crs

def peek_coordinates(coordinates):
  # Peek at the data
  first_three_coordinates = coordinates[:3]
  last_three_coordinates = coordinates[-3:]
  print(f"{first_three_coordinates}...{last_three_coordinates}")

# Raster helpers
def create_mask(file, type, coords):
  geometry = {
    "type": type,
    "coordinates": coords
  }
  return mask(file, [geometry], crop=True, nodata=file.nodata)
  
def get_stats_from_mask(masked_data, file):
  valid_data = masked_data[masked_data != file.nodata]
  unique_values, counts = np.unique(valid_data, return_counts=True)
  stats = {
    "classes": {},
    "num_classes": len(unique_values),
    "num_pixels": len(valid_data),
  }
  for value, count in zip(unique_values, counts):
    percentage = (count / len(valid_data)) * 100
    stats["classes"][f"Class {int(value)}"] = {
      "pixels": count,
      "percentage": percentage 
    }
  return stats


## Lake Malawi Coordinates

We need to get the coordinates of the perimiter of the Lake Malawi. For this, we'll use a geojson file obtained from PyGeoAPI.

In [77]:
with open(input_path("lake_malawi.json"), 'r') as f:
    malawi_data = json.load(f)

malawi_coordinates = malawi_data['geometry']['coordinates'][0][0]
peek_coordinates(malawi_coordinates)

[35.2602047055542, -14.277474460510291]...[35.2602047055542, -14.277474460510291]


## Map Lake Malawi

Lets take a peek at Lake Malawi.

In [78]:

map = folium.Map(location=[-11.6701, 34.6857], zoom_start=7)
folium.GeoJson(malawi_data).add_to(map)
map

## Coordinate Reference System

In order to buffer the coordinates for Lake Malawi by 25km around the permiter and subsequently mask the geotiff files, we need to find out what coordinate reference system (CRS) we are working with. The CRS is embedded in the geotiff files.

In [79]:
crs = None
for year in years:
  crs_from_file = get_crs(input_path(f"malawi_{year}.tif"))
  if crs and crs != crs_from_file:
    raise "File has unexpected CRS"
  crs = crs_from_file
print(crs)

EPSG:32736


## Buffered Lake Malawi Coordinates

Using the Lake Malawi coordinates and the known CRS, we will define a 25km buffer, defining new coordinates for the region of interest, which will be the lake and the 25km surrounding it.

In [80]:
malawi_geometry = shape(malawi_data['geometry'])

# Define coordinate transformation
# WGS84 (lat/lon) to a projected CRS so we can work accurately with area
transformer_to_utm = pyproj.Transformer.from_crs("EPSG:4326", crs, always_xy=True)
transformer_to_wgs84 = pyproj.Transformer.from_crs(crs, "EPSG:4326", always_xy=True)

# Transform to UTM (meters)
malawi_utm = transform(transformer_to_utm.transform, malawi_geometry)

# Buffer by 25km (25000 meters)
malawi_buffered = malawi_utm.buffer(25000)
malawi_buffered_coordinates = [list(malawi_buffered.exterior.coords)]

# Get the coordinates of the buffered perimter
malawi_buffered_wgs84 = transform(transformer_to_wgs84.transform, malawi_buffered)
malawi_buffered_wgs84_coordinates = [list(malawi_buffered_wgs84.exterior.coords)]
peek_coordinates(malawi_buffered_wgs84_coordinates)



[[(35.47707019386036, -14.19831045178232), (35.48353826652505, -14.217793727933875), (35.488171765266706, -14.237768211671032), (35.490932165245425, -14.258069838171183), (35.49179629546826, -14.278531816488693), (35.49075654464328, -14.298985995982939), (35.48782094029584, -14.319264245254097), (35.463971413975486, -14.44308510613051), (35.45876973377491, -14.464348288505134), (35.451481982267545, -14.485021816654664), (35.44217586254426, -14.504911914042387), (35.43093805487944, -14.523832086055018), (35.41787342399601, -14.541604873149732), (35.40310405270016, -14.55806352140359), (35.386768110284095, -14.57305355451169), (35.369018565791535, -14.586434232128523), (35.3500217578672, -14.598079880450072), (35.32995583444355, -14.607881082088564), (35.30900907693561, -14.615745713587659), (35.28737812489372, -14.621599820352737), (35.26526611818628, -14.625388320315738), (35.24288077472824, -14.627075529301862), (35.22043242252222, -14.626645502797926), (35.19813200532235, -14.6241021

## Save Buffered Lake Malawi Coordinates

We will save a new geojson file with the buffered coordinates.

In [82]:
expanded_geojson = {
  "type": "Feature",
  "properties": {
    "id": 10,
    "scalerank": 0,
    "name": "Lake Malawi",
    "name_alt": "Lake Nyasa",
    "admin": "admin-0",
    "featureclass": "Lake"
  },
  "geometry": {
    "type": malawi_buffered_wgs84.geom_type,
    "coordinates": malawi_buffered_wgs84_coordinates
  },
  "id": 10
}

with open(output_path("lake_malawi_expanded_25km.json"), "w") as dst:
  json.dump(expanded_geojson, dst, indent=2)

## Map Lake Malawi and Surrounding 25KM

Map the expanded boundary of Lake Malawi.

In [83]:
folium.GeoJson(expanded_geojson).add_to(map)
map

## Analyze Land Cover within Buffered Lake Malawi Coordinates

The geotiff files contain the land cover data for a large region surrounding Lake Malawi. We'll mask the files to get the cover for the immediate area surrounding the lake.

In [85]:
for year in years:
  with rasterio.open(input_path(f"malawi_{year}.tif")) as file:
    masked_data, masked_transform = create_mask(
      file,
      malawi_buffered.geom_type,
      malawi_buffered_coordinates
    )
    bounding_box = box(*file.bounds)
    if bounding_box.intersects(malawi_buffered):
      print(f"\n{year}")
      
      stats = get_stats_from_mask(masked_data, file)
      print(f"Number of pixels: {stats["num_pixels"]}")
      print(f"Number of land cover classes: {stats["num_classes"]}")

      for cls in stats["classes"]:
        print(f"{cls}: {stats["classes"][cls]["pixels"]:,} pixels ({stats["classes"][cls]["percentage"]:.2f}%)")
    else:
      print("No overlap detected. Mask is not correct")

    output_profile = file.profile.copy()
    output_profile.update({
      'height': masked_data.shape[1],
      'width': masked_data.shape[2],
      'transform': masked_transform
    })
    with rasterio.open(output_path(f"malawi_{year}_masked.tif"), 'w', **output_profile) as dst:
      dst.write(masked_data)


2017
Number of pixels: 594349727
Number of land cover classes: 8
Class 1: 296,062,405 pixels (49.81%)
Class 2: 94,756,092 pixels (15.94%)
Class 4: 279,097 pixels (0.05%)
Class 5: 23,663,623 pixels (3.98%)
Class 7: 8,954,681 pixels (1.51%)
Class 8: 163,789 pixels (0.03%)
Class 10: 5,742 pixels (0.00%)
Class 11: 170,464,298 pixels (28.68%)

2018
Number of pixels: 594349727
Number of land cover classes: 8
Class 1: 296,140,745 pixels (49.83%)
Class 2: 110,617,108 pixels (18.61%)
Class 4: 544,220 pixels (0.09%)
Class 5: 22,283,444 pixels (3.75%)
Class 7: 10,205,910 pixels (1.72%)
Class 8: 96,417 pixels (0.02%)
Class 10: 84,383 pixels (0.01%)
Class 11: 154,377,500 pixels (25.97%)

2019
Number of pixels: 594349727
Number of land cover classes: 8
Class 1: 296,188,895 pixels (49.83%)
Class 2: 110,611,997 pixels (18.61%)
Class 4: 458,157 pixels (0.08%)
Class 5: 23,421,564 pixels (3.94%)
Class 7: 10,550,809 pixels (1.78%)
Class 8: 83,680 pixels (0.01%)
Class 10: 25,875 pixels (0.00%)
Class 11: 1