# Zonal Statistics

`rasterstats` is a Python module for summarizing geospatial raster datasets based on vector geometries.

Geospatial data typically comes in one of two data models:
- *rasters*, which are similar to images: a regular grid of pixels whose values represent some spatial phenomenon (e.g. elevation)
- *vectors*, which are entities with discrete geometries (e.g. state boundaries).

This software, `rasterstats`, exists solely to extract information from geospatial raster data based on vector geometries.

This involves zonal statistics: a method of summarizing and aggregating the raster values intersecting a vector geometry. For example, zonal statistics provides answers such as the mean precipitation or maximum elevation of an administrative unit.
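To make the idea concrete, here is a minimal sketch of a zonal summary using only NumPy. The raster values and the zone mask are made-up toy data, and the boolean mask stands in for the rasterized footprint of a polygon; `rasterstats` handles that rasterization step itself, so this only illustrates the principle.

```python
import numpy as np

# Toy 4x4 "raster" of elevation values (made-up numbers)
raster = np.array([
    [10, 12, 14, 16],
    [11, 13, 15, 17],
    [20, 22, 24, 26],
    [21, 23, 25, 27],
], dtype=float)

# Boolean mask marking the pixels covered by one hypothetical zone
# (here simply the top-left 2x2 block)
zone_mask = np.zeros(raster.shape, dtype=bool)
zone_mask[:2, :2] = True

# Keep only the pixels that fall inside the zone
zone_values = raster[zone_mask]

# Summarize them with the four default statistics
zone_summary = {
    "min": float(zone_values.min()),
    "max": float(zone_values.max()),
    "mean": float(zone_values.mean()),
    "count": int(zone_values.size),
}
print(zone_summary)
```

Each polygon in a vector layer gets its own mask, and therefore its own summary.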

### Statistics

By default, the `zonal_stats` function returns the following statistics:

- min
- max
- mean
- count

The following statistics are also available on request:

- sum
- std
- median
- majority
- minority
- unique
- range
- nodata
- percentile (requested with a suffix between 0 and 100, e.g. `percentile_25`; see the manual linked below for details)

https://pythonhosted.org/rasterstats/manual.html
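As a rough illustration of what several of the optional statistics mean, the sketch below computes them with NumPy for the pixel values of a single zone. The values are made up, and these formulas are an approximation of the definitions rather than the library's actual implementation; edge cases such as ties or nodata handling may differ in `rasterstats`.

```python
import numpy as np

# Made-up pixel values for one zone
zone_values = np.array([3, 1, 3, 2, 3, 2, 5, 1], dtype=float)

# Distinct values and how often each occurs
values, counts = np.unique(zone_values, return_counts=True)

optional_stats = {
    "sum": float(zone_values.sum()),
    "median": float(np.median(zone_values)),
    "range": float(zone_values.max() - zone_values.min()),
    "unique": int(values.size),                     # number of distinct values
    "majority": float(values[counts.argmax()]),     # most frequent value
    "minority": float(values[counts.argmin()]),     # least frequent value
    "percentile_50": float(np.percentile(zone_values, 50)),
}
print(optional_stats)
```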


In [1]:
import glob, os
import numpy as np
import pandas as pd
import geopandas as gpd
import rasterstats
from rasterstats import zonal_stats
from pathlib import Path

print(f'Rasterstats : {rasterstats.__version__}')

Rasterstats : 0.14.0


## Choose the spectral index and the zonal statistic you want to calculate

- min
- max
- mean
- count

In [2]:
index = 'NDVI'

stat = 'mean'

nodata_val = -10000

## Set paths for input and output directories

Create the directories if they are missing, using `Path` and `mkdir`.
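The behaviour of `mkdir(parents=True, exist_ok=True)` can be checked in isolation with the short sketch below. It works inside a temporary directory so that nothing is written to the real work folders; the folder names are only placeholders.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_root:
    # Nested path whose parent folders do not exist yet (placeholder names)
    demo_path = Path(tmp_root) / "GROUP_X" / "WORK" / "ZONAL_STATS"

    # parents=True creates the missing intermediate folders
    demo_path.mkdir(parents=True, exist_ok=True)

    # exist_ok=True makes a second call harmless instead of raising FileExistsError
    demo_path.mkdir(parents=True, exist_ok=True)

    created = demo_path.is_dir()

print(created)
```

This is why the cell can be re-run safely: an existing output directory is left untouched.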

In [3]:
#computer_path = 'X:/'
computer_path = '/Volumes/nbdid-sst-lbrat2104/'
grp_letter    = 'X'

# Directory for all work files
work_path = f'{computer_path}GROUP_{grp_letter}/WORK/'


# ----- #
# INPUT #
# ----- #

# Directory where images are located
index_path = f'{work_path}{index}/'

# File where polygons are located
in_situ_file = f'{work_path}IN_SITU/WALLONIA_2018_IN_SITU_ROI.shp'

# ------ #
# OUTPUT #
# ------ #

zonal_path = f'{work_path}ZONAL_STATS/'

Path(zonal_path).mkdir(parents=True, exist_ok=True)

print(f'Zonal stats path is set to : {zonal_path}')

Zonal stats path is set to : /Volumes/nbdid-sst-lbrat2104/GROUP_X/WORK/ZONAL_STATS/


## Compute zonal statistics

In [4]:
# New shapefile that will hold the index statistic for each polygon
in_situ_ndvi_file = f'{zonal_path}{os.path.basename(in_situ_file)[:-4]}_with_{stat}_{index}.shp'

if not os.path.isfile(in_situ_ndvi_file):

    # Get the list of all spectral index files and sort it
    index_list = sorted(glob.glob(f'{index_path}*_{index}*.tif'))

    # Initiate an empty list to store all "zonal stat DataFrames" that will be created during the loop
    zs_dfs = []

    for index_file in index_list:
        
        # Get the date of the index file (characters 9 to 14 of the file name)
        date = os.path.basename(index_file)[9:9+6]
        print(date)

        # Compute the zonal stat and store output in a DataFrame
        zs_df = pd.DataFrame(zonal_stats(vectors=in_situ_file,
                                         raster=index_file,
                                         stats=stat))

        # Replace missing values (NaN or None) with the nodata value
        zs_df = zs_df.fillna(nodata_val)

        # Convert to integer (if needed)
        #zs_df = zs_df.astype(int)

        # Rename column with the date
        zs_df = zs_df.rename(columns={stat: date})

        # Append the zonal stat DataFrame to the list so it is kept after the loop
        zs_dfs.append(zs_df)

    
    # Once the loop is done, concatenate all the DataFrames into one wide DataFrame (one column per date)
    zs_final = pd.concat(zs_dfs, axis=1)
    print(zs_final)

    
    # Read in-situ shapefile as a GeoDataFrame
    in_situ_gdf = gpd.read_file(in_situ_file)

    # Join the zonal statistics with the polygon attributes (rows are matched on the index)
    in_situ_with_ndvi_gdf = pd.concat([in_situ_gdf, zs_final], axis=1, join="inner")

    # Write into a new shapefile
    in_situ_with_ndvi_gdf.to_file(in_situ_ndvi_file)
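The reshaping done inside the loop can be tried without any raster or shapefile. The sketch below uses made-up values in the list-of-dicts shape that `zonal_stats` returns, with one dict per polygon; the labels `date_1` and `date_2` are placeholders, not real acquisition dates.

```python
import pandas as pd

nodata_val = -10000
stat = "mean"

# Hypothetical per-date results: one dict per polygon, None where no pixel was found
results_by_date = {
    "date_1": [{"mean": 0.42}, {"mean": None}, {"mean": 0.65}],
    "date_2": [{"mean": 0.55}, {"mean": 0.48}, {"mean": 0.71}],
}

frames = []
for date, records in results_by_date.items():
    df = pd.DataFrame(records)            # one row per polygon, one column per statistic
    df = df.fillna(nodata_val)            # missing values become the nodata value
    df = df.rename(columns={stat: date})  # name the column after the date
    frames.append(df)

# Side-by-side concatenation: rows stay aligned on the polygon order
wide = pd.concat(frames, axis=1)
print(wide)
```

Because the rows keep the polygon order of the input layer, the wide table can then be joined to the polygons on the index, as in the cell above.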
