# Creating green and blue space indicators from Sentinel-2 data

This notebook estimates the following statistics for each household, identified by its Unique Property Reference Number (UPRN), across Cheshire and Merseyside:
* Normalised Difference Vegetation Index (NDVI)
* Enhanced Vegetation Index (EVI)
* Normalised Difference Water Index (NDWI)
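
As a point of reference, the three indices are simple band combinations. The sketch below works through them for a single pixel in plain Python, using the same band pairings as the Earth Engine code later in this notebook (near-infrared and red for NDVI, near-infrared and shortwave infrared for NDWI). The reflectance values are hypothetical and chosen only for illustration, and the helper names `normalised_difference` and `evi` are not part of the notebook.

```python
def normalised_difference(a, b):
    """Return (a - b) / (a + b), the form shared by NDVI and NDWI."""
    return (a - b) / (a + b)


def evi(nir, red, blue):
    """Enhanced Vegetation Index for reflectances on a 0-1 scale."""
    return 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)


# Hypothetical reflectances (0-1 scale) for one vegetated pixel
nir, red, blue, swir = 0.40, 0.05, 0.03, 0.20

ndvi = normalised_difference(nir, red)   # NIR vs red
ndwi = normalised_difference(nir, swir)  # NIR vs shortwave infrared
evi_value = evi(nir, red, blue)

print(round(ndvi, 3), round(evi_value, 3), round(ndwi, 3))
```

NDVI and NDWI are bounded between -1 and 1 by construction, whereas EVI only stays in a comparable range when the inputs are reflectances between 0 and 1.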


## Set up the environment

We will use Google Earth Engine (GEE) to access Sentinel-2 images. The advantage of GEE is that the images are stored and processed in the cloud, so we do not need to download or store them ourselves. Let's start by installing the required packages: the Earth Engine API (pre-installed in Colab), pyarrow and pyproj.

In [21]:
# Install required packages
# ! pip install earthengine-api # Pre-installed in Colab; uncomment when running locally
! pip install pyarrow pyproj



Next, we load the necessary packages and link the notebook to our Groundswell GEE project.

In [10]:
# Load required libraries
import ee
import pandas as pd
import numpy as np
from shapely.geometry import Point
from pyproj import Proj, transform

# Set up Google Earth Engine (GEE) module
ee.Authenticate(auth_mode = "colab") # Links GEE to your Google account; "colab" is for running in Colab (change to "localhost" when working on a local machine)
ee.Initialize(project = "ee-groundswelluk") # Link to the registered project within GEE

Finally, we load our UPRN dataset and check that it has been read in correctly.

In [49]:
# Load UPRNs
# If running locally, then you can load in as normal
# If using Colab, the quickest route is to upload the file manually by clicking the left hand folder button
# The file is pre-processed in 'get_uprns_cm.R'
points = pd.read_parquet("/content/uprns_cm.parquet") # Load file in
print(points.head()) # Check has loaded in ok


       UPRN   latitude  longitude
0  38000001  53.419158  -2.912226
1  38000002  53.419105  -2.912180
2  38000003  53.419060  -2.912179
3  38000004  53.419006  -2.912163
4  38000005  53.418952  -2.912162


## Process images
We next establish which images we want to use (Sentinel-2), define the region (Cheshire and Merseyside) and time period (May to September 2023) of interest, and identify the information that we want to extract from them (NDVI, EVI and NDWI).

In [40]:
# Create function that gets images
def fetch_indices(geometry, start_date, end_date):
    # Load Sentinel-2 surface reflectance images for the defined time period and spatial extent (the harmonised collection is now the main version and supports consistent time series from 28 March 2017 onwards)
    s2 = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED') \
        .filterDate(start_date, end_date) \
        .filterBounds(geometry) \
        .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)) # Only select images where cloud cover is <20% (20% is a decent figure given the findings of https://doi.org/10.1080/2150704X.2012.744486)

    # Function calculates NDVI, EVI, and NDWI from images
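    def add_indices(image):
        ndvi = image.normalizedDifference(['B8', 'B4']).rename('NDVI') # NDVI: (NIR - Red) / (NIR + Red)
        # The EVI coefficients assume reflectance on a 0-1 scale, but the bands are stored as reflectance x 10,000,
        # so rescale first; without this, EVI values fall outside the usual -1 to 1 range
        evi = image.expression( # EVI
            '2.5 * ((NIR - RED) / (NIR + 6 * RED - 7.5 * BLUE + 1))', {
                'NIR': image.select('B8').divide(10000),
                'RED': image.select('B4').divide(10000),
                'BLUE': image.select('B2').divide(10000)
            }).rename('EVI')
        # NDWI here is the NIR/SWIR form (B8, B11), which tracks vegetation and surface moisture;
        # the open-water form of NDWI uses green (B3) and NIR (B8) instead
        ndwi = image.normalizedDifference(['B8', 'B11']).rename('NDWI') # NDWI
        return image.addBands([ndvi, evi, ndwi])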

    s2 = s2.map(add_indices)

    # Composite the images to get a single image
    composite = s2.median().select(['NDVI', 'EVI', 'NDWI'])

    return composite

# Define the region of interest (Cheshire and Merseyside; N 53.685701, S 52.946957, W -3.249978, E -1.974355)
aoi = ee.Geometry.Polygon([
    [[-3.249978, 52.946957], [-3.249978, 53.685701], [-1.974355, 53.685701], [-1.974355, 52.946957], [-3.249978, 52.946957]] # requires first and last to be the same to close the polygon
])

# Define the time range (filterDate treats the end date as exclusive, so images from 30 September are not included)
start_date = '2023-05-01'
end_date = '2023-09-30'

# Fetch the indices image
indices_image = fetch_indices(aoi, start_date, end_date)
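
The compositing step takes, for every pixel, the median of each index across all images that passed the cloud filter. The sketch below illustrates the idea in plain Python with hypothetical NDVI values for three pixels observed in five images; it is an illustration of why a median is robust to an occasional contaminated observation, not a reproduction of how Earth Engine computes it internally. The function name `median_composite` and the use of `None` for a pixel with no valid observation are assumptions made for this example.

```python
from statistics import median

# Hypothetical NDVI values: one row per image, one column per pixel
ndvi_stack = [
    [0.62, 0.10, 0.45],
    [0.60, 0.12, 0.47],
    [0.05, 0.11, 0.44],  # pixel 0 affected by residual cloud in this image
    [0.64, 0.09, None],  # pixel 2 has no valid observation in this image
    [0.61, 0.13, 0.46],
]


def median_composite(stack):
    """Return the per-pixel median across images, ignoring missing values."""
    composite = []
    for pixel_values in zip(*stack):  # iterate over pixels (columns)
        valid = [v for v in pixel_values if v is not None]
        composite.append(median(valid) if valid else None)
    return composite


composite = median_composite(ndvi_stack)
print(composite)
```

The cloud-affected value for the first pixel barely moves the median, whereas it would pull a mean down noticeably.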


We then create a function that extracts the mean value of each index within a buffer around each point.

In [None]:
# Function to extract average indices values within a buffer
def extract_average_indices(indices_image, points, buffer_radius):
    # Convert points to Earth Engine Geometry
    ee_points = [ee.Geometry.Point(lon, lat) for lon, lat in zip(points['longitude'], points['latitude'])]

    # Function to get average indices values for a point
    def get_average_indices(point):
        buffer = point.buffer(buffer_radius)  # Create a buffer around the point
        indices_mean = indices_image.reduceRegion(
            reducer=ee.Reducer.mean(),
            geometry=buffer,
            scale=10
        )
        return indices_mean.getInfo()

    # Extract average indices values for all points
    average_indices_values = [get_average_indices(point) for point in ee_points]

    return average_indices_values

## Sample 100 rows for testing
#points = points.sample(n=100, random_state=42)

# Define the buffer radius in meters
buffer_radius = 300

# Extract average indices values for each point within the buffer
average_indices_values = extract_average_indices(indices_image, points, buffer_radius)

# Add indices values to the DataFrame `points`
points['ndvi'] = [v.get('NDVI') for v in average_indices_values]
points['evi'] = [v.get('EVI') for v in average_indices_values]
points['ndwi'] = [v.get('NDWI') for v in average_indices_values]

# Print the DataFrame to check if it has worked
print(points.head())
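
The loop above calls `getInfo()` once per point, so every UPRN costs a separate round trip to the Earth Engine servers. One possible alternative, sketched here under assumptions rather than taken from the notebook, is to send the points in batches and reduce each batch with `reduceRegions`. The helper names `chunked` and `reduce_batch` and the batch size are hypothetical, and a suitable batch size would need tuning. `reduce_batch` is defined but not called, because it requires an authenticated Earth Engine session.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def reduce_batch(indices_image, batch, buffer_radius=300, scale=10):
    """Mean index values for one batch of (uprn, lon, lat) tuples (sketch)."""
    import ee  # imported here so the batching helper works without Earth Engine

    features = [
        ee.Feature(
            ee.Geometry.Point([float(lon), float(lat)]).buffer(buffer_radius),
            {'UPRN': int(uprn)},  # plain Python types serialise cleanly
        )
        for uprn, lon, lat in batch
    ]
    reduced = indices_image.reduceRegions(
        collection=ee.FeatureCollection(features),
        reducer=ee.Reducer.mean(),
        scale=scale,
    )
    # One request per batch instead of one per point
    return [f['properties'] for f in reduced.getInfo()['features']]


# First three UPRNs from the table printed earlier, as (uprn, lon, lat)
rows = [
    (38000001, -2.912226, 53.419158),
    (38000002, -2.912180, 53.419105),
    (38000003, -2.912179, 53.419060),
]
batches = list(chunked(rows, 2))  # hypothetical batch size of 2
print([len(b) for b in batches])
```

With a DataFrame, the tuples could be built with `zip(points['UPRN'], points['longitude'], points['latitude'])` and the per-batch results concatenated before joining back on `UPRN`.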


Next, we briefly check whether the extraction produced any missing values.

In [46]:
# Check for existence of missing data
missing_data = points.isnull().any() # Flag each column that contains at least one missing value
print("Columns with missing data:") # Print text description prior to output on next line
print(missing_data[missing_data].index.tolist()) # Print columns with missing data

Columns with missing data:
[]
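
To make clear what this check would catch, the sketch below uses hypothetical result dictionaries shaped like those returned by `getInfo()`. It assumes that a buffer containing no valid pixels comes back with `None` for a band, or with the key absent; both cases end up as missing values once `.get()` is used to build the columns. The function name `count_missing` and the numbers are illustrative only.

```python
# Hypothetical per-point results, shaped like the dictionaries from getInfo()
results = [
    {'NDVI': 0.48, 'EVI': 0.31, 'NDWI': 0.11},
    {'NDVI': None, 'EVI': None, 'NDWI': None},  # no valid pixels in the buffer
    {'NDVI': 0.55, 'EVI': 0.36},                # NDWI key absent altogether
]


def count_missing(rows, keys):
    """Count, for each key, the rows where the value is None or absent."""
    return {key: sum(1 for row in rows if row.get(key) is None) for key in keys}


missing_counts = count_missing(results, ['NDVI', 'EVI', 'NDWI'])
columns_with_missing = [key for key, n in missing_counts.items() if n > 0]
print(missing_counts)
print(columns_with_missing)
```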


We can also compute summary statistics as a quick check that the measures look plausible. The count of 100 in the output shows that it comes from a test run on a sample of 100 UPRNs.

In [47]:
# Summary statistics
summary_stats = points.describe() # Calculate descriptive statistics for each variable in the data
print(summary_stats) # Print values

               UPRN    latitude   longitude        ndvi         evi  \
count  1.000000e+02  100.000000  100.000000  100.000000  100.000000   
mean   3.223321e+10   53.360270   -2.792277    0.482710    1.442191   
std    5.489829e+10    0.111855    0.221799    0.139009    0.367752   
min    3.805549e+07   53.068412   -3.150536    0.133166    0.514840   
25%    3.902690e+07   53.291319   -2.953024    0.404337    1.223047   
50%    4.210227e+07   53.389669   -2.866512    0.495029    1.460489   
75%    3.257423e+10   53.417822   -2.667753    0.552867    1.655328   
max    2.000029e+11   53.641830   -2.132856    0.853834    2.387883   

             ndwi  
count  100.000000  
mean     0.111102  
std      0.100771  
min     -0.074162  
25%      0.046727  
50%      0.104962  
75%      0.156529  
max      0.403671  
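
The EVI column in the summary above has a mean of about 1.44 and a maximum above 2, which is outside the range usually reported for EVI. A likely explanation is that the output was produced from band values stored as reflectance multiplied by 10,000, which makes the `+ 1` term in the EVI denominator negligible. The sketch below illustrates the effect with hypothetical band values for a single pixel; the function name `evi` and the numbers are for illustration only.

```python
def evi(nir, red, blue):
    """Enhanced Vegetation Index; coefficients assume reflectance on a 0-1 scale."""
    return 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)


# Hypothetical band values as stored (reflectance x 10,000)
nir_dn, red_dn, blue_dn = 4000, 500, 300

evi_unscaled = evi(nir_dn, red_dn, blue_dn)  # stored values used directly
evi_scaled = evi(nir_dn / 10000, red_dn / 10000, blue_dn / 10000)  # rescaled to 0-1

print(round(evi_unscaled, 3), round(evi_scaled, 3))
```

NDVI and NDWI are unaffected by the scaling, because a constant factor cancels in a normalised difference.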


Finally, we save the dataset.

In [31]:
points.to_csv("/content/satellite_indicators_uprn_cm_2023.csv", index = False) # Save - make sure to download manually