# Zonal Statistics

`rasterstats` is a Python module for summarizing geospatial raster datasets based on vector geometries.

Geospatial data typically comes in one of two data models:
- *rasters* which are similar to images with a regular grid of pixels whose values represent some spatial phenomenon (e.g. elevation)
- *vectors* which are entities with discrete geometries (e.g. state boundaries).

This software, `rasterstats`, exists solely to extract information from geospatial raster data based on vector geometries.

This involves zonal statistics: a method of summarizing and aggregating the raster values intersecting a vector geometry. For example, zonal statistics provides answers such as the mean precipitation or maximum elevation of an administrative unit.
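To make the idea concrete, here is a minimal pure-Python sketch of what a zonal statistic computes. It is not how `rasterstats` is implemented. The toy raster, the boolean zone mask, and the helper `toy_zonal_stats` are all hypothetical names introduced for illustration, and the mask stands in for the rasterized vector geometry.

```python
# Toy 4x4 "raster": each number is a pixel value (e.g. elevation in metres)
raster = [
    [10, 12, 14, 16],
    [11, 13, 15, 17],
    [20, 22, 24, 26],
    [21, 23, 25, 27],
]

# Toy "zone": True where the pixel falls inside the vector geometry
zone_mask = [
    [True,  True,  False, False],
    [True,  True,  False, False],
    [False, False, False, False],
    [False, False, False, False],
]


def toy_zonal_stats(raster, zone_mask):
    """Summarize the raster values whose mask entry is True (hypothetical helper)."""
    values = [
        value
        for value_row, mask_row in zip(raster, zone_mask)
        for value, inside in zip(value_row, mask_row)
        if inside
    ]
    if not values:
        # No pixel intersects the zone: nothing to summarize
        return {"min": None, "max": None, "mean": None, "count": 0}
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        "count": len(values),
    }


stats = toy_zonal_stats(raster, zone_mask)
print(stats)
```

The four keys mirror the default statistics listed in the next section; a real workflow would pass file paths or geometries to `zonal_stats` instead of a hand-written mask.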

## Statistics

By default, the `zonal_stats` function returns the following statistics:

- min
- max
- mean
- count

The following statistics are also available on request:

- sum
- std
- median
- majority
- minority
- unique
- range
- nodata
- percentile (requested as `percentile_<q>`, e.g. `percentile_90`)

https://pythonhosted.org/rasterstats/manual.html
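As a rough illustration of what some of the optional statistics mean, the sketch below computes them for a plain list of pixel values using only the standard library. The helper `toy_optional_stats` is hypothetical, and two choices are assumptions made here rather than statements about `rasterstats` internals: the standard deviation is the population one, and `unique` is reported as the number of distinct values.

```python
from collections import Counter
import statistics


def toy_optional_stats(values):
    """Compute a few optional zonal statistics for a list of pixel values."""
    counts = Counter(values)
    # most_common() orders values from most to least frequent
    ordered = counts.most_common()
    return {
        "sum": sum(values),
        "std": statistics.pstdev(values),   # population std (assumption)
        "median": statistics.median(values),
        "majority": ordered[0][0],          # most frequent value
        "minority": ordered[-1][0],         # least frequent value
        "unique": len(counts),              # number of distinct values (assumption)
        "range": max(values) - min(values),
    }


# Hypothetical pixel values falling inside one polygon
pixel_values = [1, 1, 1, 2, 2, 3]
optional = toy_optional_stats(pixel_values)
print(optional)
```

Statistics such as `majority`, `minority` and `unique` are mostly meaningful for categorical rasters (e.g. land cover classes), whereas `sum`, `std` and `range` suit continuous values.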


In [1]:
import glob
import os
from pathlib import Path
import numpy as np
import pandas as pd
import geopandas as gpd
import rasterstats
from rasterstats import zonal_stats

print(f'Rasterstats : {rasterstats.__version__}')

Rasterstats : 0.14.0


## Set paths for input and output directories

Create directories if they are missing using `Path` and `mkdir`

In [None]:
grp_letter   = 'X'
student_name = 'ndeffense'

# When you are connected to the computer room
'''
vector_path = 'X:/data/VECTOR/'
raster_path = 'X:/data/RASTER/'
output_path = f'X:/GROUP_{grp_letter}/TP/{student_name}/DATA/'
'''

# When you are connected to your personal computer
vector_path = '/Users/Nicolas/OneDrive - UCL/LBRAT2104/VECTOR/'
raster_path = '/Volumes/nbdid-sst-lbrat2104/data/RASTER/'
output_path = '/Users/Nicolas/OneDrive - UCL/LBRAT2104/Output/'


print(f'Vector input path is set to : {vector_path}')
print(f'Raster input path is set to : {raster_path}')
print(f'Output path is set to       : {output_path}')

# Directory to store all NDVI images
ndvi_path = f'{output_path}NDVI_ROI/'

# Create directories if missing
Path(ndvi_path).mkdir(parents=True, exist_ok=True)
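The directory creation above can also be written with `pathlib` joins instead of string concatenation. The following is a minimal sketch that runs inside a temporary directory so it does not touch real data; the folder names `Output` and `NDVI_ROI` are reused from the cell above, while the variable names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build nested paths with the "/" operator rather than f-strings
    output_dir = Path(tmp_dir) / "Output"
    ndvi_dir = output_dir / "NDVI_ROI"

    # parents=True creates "Output" as well; exist_ok=True avoids an error on re-runs
    ndvi_dir.mkdir(parents=True, exist_ok=True)
    ndvi_dir.mkdir(parents=True, exist_ok=True)  # second call does nothing

    created = ndvi_dir.is_dir()
    relative = ndvi_dir.relative_to(tmp_dir).as_posix()

print(created, relative)
```

Joining with `/` avoids having to remember trailing slashes in the path strings, which the f-string approach relies on.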

In [None]:
# Choose the statistic you want to calculate : min, max, mean, count, ...
stat = 'mean'

# File where in-situ polygons are located
in_situ_file = f'{output_path}IN_SITU/WALLONIA_2018_IN_SITU_ROI.shp'

# New file with NDVI statistics for each polygon
in_situ_ndvi_file = f'{in_situ_file[:-4]}_with_NDVI_{stat}.shp'


if not os.path.isfile(in_situ_ndvi_file):

    # Get list of all NDVI files corresponding to tile 31UFS and sort it
    NDVI_list = sorted(glob.glob(f'{ndvi_path}*31UFS*_NDVI_*.tif'))

    # Initiate an empty list to store all "zonal stat DataFrames" that will be created during the loop
    zs_dfs = []

    for NDVI_file in NDVI_list:
        
        # Get date of the NDVI file
        date = os.path.basename(NDVI_file)[9:9+6]

        # Compute the zonal stat and store output in a DataFrame
        zs_df = pd.DataFrame(zonal_stats(vectors=in_situ_file,
                                         raster=NDVI_file,
                                         stats=stat))

        # Replace missing values (None/NaN) by the nodata value (-10000) and convert to integer
        nodata_val = -10000
        zs_df = zs_df.fillna(nodata_val).astype(int)

        # Rename column with the date
        zs_df = zs_df.rename(columns={stat: date})

        # Append the zonal stat DataFrame to the list
        zs_dfs.append(zs_df)


    # Once the loop is done, concatenate all the DataFrames into one big DataFrame
    zs_final = pd.concat(zs_dfs, axis=1)

    # Read in-situ shapefile as a GeoDataFrame
    in_situ_gdf = gpd.read_file(in_situ_file)

    # Join the NDVI statistics with the polygon attributes
    in_situ_with_ndvi_gdf = pd.concat([in_situ_gdf, zs_final], axis=1, join="inner")

    # Write into a new shapefile
    in_situ_with_ndvi_gdf.to_file(in_situ_ndvi_file)
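The reshaping done in the loop above (one column per date, one row per polygon, missing values replaced by a nodata value) can be sketched without any geospatial library. Everything below is hypothetical illustration data: the date labels, the per-polygon values, and the helper `build_rows`. Only the `-10000` sentinel and the list-of-dicts shape of the `zonal_stats` output come from the notebook.

```python
NODATA = -10000  # sentinel taken from the comment in the cell above

# Hypothetical zonal_stats-style outputs: one list per date, one dict per polygon
results_by_date = {
    "180415": [{"mean": 4321.7}, {"mean": None}, {"mean": 5120.2}],
    "180420": [{"mean": 4500.0}, {"mean": 3890.9}, {"mean": None}],
}


def build_rows(results_by_date, stat="mean", nodata=NODATA):
    """Return one dict per polygon, with one integer entry per date."""
    dates = sorted(results_by_date)
    n_polygons = len(results_by_date[dates[0]])
    rows = []
    for i in range(n_polygons):
        row = {}
        for date in dates:
            value = results_by_date[date][i][stat]
            # None means no valid pixel intersected the polygon on that date
            row[date] = nodata if value is None else int(value)
        rows.append(row)
    return rows


rows = build_rows(results_by_date)
for row in rows:
    print(row)
```

Note that `int()` truncates toward zero, like `astype(int)` in the notebook, so this only makes sense when the NDVI rasters store scaled integer-like values rather than floats between -1 and 1.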