# Tutorial: Nash-Sutcliffe Efficiency (NSE) Calculation
In this tutorial, we will explore the Nash-Sutcliffe Efficiency (NSE) calculation function `nse`. NSE is a widely used metric in hydrology and other fields to evaluate the performance of a model by comparing its predictions to observed data.

## Introduction

The Nash-Sutcliffe Efficiency (NSE) is defined as:

$NSE = 1 - \frac{\sum_{i=1}^{n}(f_i - o_i)^2}{\sum_{i=1}^{n}(o_i - \bar{o})^2}$

Where:
- $f_i$ is the forecast or predicted value
- $o_i$ is the observed value
- $\bar{o}$ is the mean of the observed values
- $n$ is the number of data points
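The definition translates directly into array code. The sketch below is a minimal NumPy version for one-dimensional inputs. `nse_numpy` is a hypothetical helper written for this tutorial, not part of `scores`, and it skips the NaN handling, weighting and dimension options a library function would provide.

```python
import numpy as np


def nse_numpy(fcst, obs):
    """Nash-Sutcliffe Efficiency for 1-D forecast and observation arrays.

    Hypothetical helper for illustration only: no NaN handling or weights.
    """
    fcst = np.asarray(fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Numerator: sum of squared forecast errors
    sse = np.sum((fcst - obs) ** 2)
    # Denominator: sum of squared deviations of the observations from their mean
    sst = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - sse / sst


print(nse_numpy([3, 4, 5, 6, 7], [2, 3, 4, 5, 6]))  # 0.5
```

NSE is undefined when all observations are equal, because the denominator is then zero.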

NSE ranges from $-\infty$ to 1. A perfect model scores 1. A model that does exactly as well as always predicting the observed mean $\bar{o}$ scores 0, and a model that does worse than that scores below 0.

In hydrological modeling, NSE is a standard measure of how closely simulated flows reproduce observed flows, and values closer to 1 indicate better predictive skill. The formula has the same form as the coefficient of determination ($R^2 = 1 - SS_{res}/SS_{tot}$). However, NSE takes its residuals directly from the forecasts rather than from a fitted regression line, so unlike $R^2$ it is not bounded below by 0.
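The three regimes can be checked with small hand-made forecasts. This sketch applies the formula above in plain NumPy (the helper name `nse_1d` is hypothetical). The reversed forecast contains the same values as the observations but in the wrong order, which is enough to push NSE well below 0.

```python
import numpy as np

obs = np.array([2.0, 3.0, 4.0, 5.0, 6.0])


def nse_1d(fcst, obs):
    # 1 - (sum of squared errors) / (sum of squared deviations of obs from its mean)
    return 1.0 - np.sum((fcst - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


perfect = obs.copy()                       # identical to the observations
mean_only = np.full_like(obs, obs.mean())  # always predicts the observed mean
reversed_fcst = obs[::-1]                  # right values, wrong order

print(nse_1d(perfect, obs))        # 1.0
print(nse_1d(mean_only, obs))      # 0.0
print(nse_1d(reversed_fcst, obs))  # -3.0
```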

## Using the `nse` Function

Let's start by importing the `nse` function from `scores.continuous` and exploring its usage with different types of input data.

```python
from scores.continuous import nse
import numpy as np
import xarray as xr
import pandas as pd

np.random.seed(0)  # set the seed to make the notebook reproducible
```

**Example 1**: Xarray DataArray

```python
fcst_xr = xr.DataArray([3, 4, 5, 6, 7])
obs_xr = xr.DataArray([2, 3, 4, 5, 6])
nse_xr = nse(fcst_xr, obs_xr)
print("NSE for Xarray DataArray:", nse_xr)
```

```
NSE for Xarray DataArray: <xarray.DataArray ()>
array(0.5)
```

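Working Example 1 by hand confirms the value. Every forecast is 1 above its observation, and the observed mean is $\bar{o} = 4$:

$$
\sum_{i=1}^{5}(f_i - o_i)^2 = 5 \times 1^2 = 5, \qquad
\sum_{i=1}^{5}(o_i - \bar{o})^2 = 4 + 1 + 0 + 1 + 4 = 10, \qquad
NSE = 1 - \frac{5}{10} = 0.5
$$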

**Example 2**: Large Xarray DataArrays

```python
fcst_large = xr.DataArray(
    data=np.random.random_sample((1000, 1000)) * 360,
    dims=["space", "time"],
    coords=[np.arange(0, 1000), np.arange(0, 1000)]
)
obs_large = xr.DataArray(
    data=np.random.random_sample((1000, 1000)) * 360,
    dims=["space", "time"],
    coords=[np.arange(0, 1000), np.arange(0, 1000)]
)
nse_large = nse(fcst_large, obs_large)
print("NSE for large Xarray DataArrays:", nse_large)
```

```
NSE for large Xarray DataArrays: <xarray.DataArray ()>
array(-1.00120337)
```

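The value near $-1$ is what theory predicts for this setup. When forecasts and observations are independent draws from the same distribution with variance $\sigma^2$, the expected squared error is $\sigma^2 + \sigma^2 = 2\sigma^2$, while the observed variance is $\sigma^2$, so NSE tends to $1 - 2 = -1$. The NumPy sketch below checks this on a fresh random sample; the seed is arbitrary and `scores` is not used.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for this illustration

# Independent uniform samples on [0, 360), mirroring Example 2
fcst = rng.random((1000, 1000)) * 360
obs = rng.random((1000, 1000)) * 360

# NSE pooled over all elements, as in the formula above
sse = np.sum((fcst - obs) ** 2)
sst = np.sum((obs - obs.mean()) ** 2)
nse_value = 1.0 - sse / sst
print(round(nse_value, 2))  # approximately -1
```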

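**Example 3**: Angular data

Passing `angular=True` tells `nse` that the values are directions (for example, wind direction), so differences are measured around the circle. All differences here are 1, far from the wrap-around point, so the result matches Example 1.

```python
fcst_xr = xr.DataArray([3, 4, 5, 6, 7])
obs_xr = xr.DataArray([2, 3, 4, 5, 6])
nse_angular = nse(fcst_xr, obs_xr, angular=True)
print("NSE for angular data:", nse_angular)
```

```
NSE for angular data: <xarray.DataArray ()>
array(0.5)
```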

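For directional data, the raw difference between 350° and 10° is 340°, although the two directions are only 20° apart. A common way to measure angular error is to take the shorter of the two arcs between the angles. The sketch below shows that calculation in NumPy, assuming the angles are in degrees. `angular_difference` is a hypothetical helper. This tutorial does not show exactly how `nse` computes differences when `angular=True`, so treat this as an illustration of the idea rather than the library's code.

```python
import numpy as np


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles in degrees (0 to 180).

    Hypothetical helper for illustration; not part of scores.
    """
    diff = np.abs(np.asarray(a_deg, dtype=float) - np.asarray(b_deg, dtype=float)) % 360
    # Going the other way around the circle may be shorter
    return np.minimum(diff, 360 - diff)


print(angular_difference([350, 90, 3], [10, 270, 2]))
```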

**Example 4**: Weighted NSE

```python
fcst_xr = xr.DataArray([3, 4, 5, 6, 7])
obs_xr = xr.DataArray([2, 3, 4, 5, 6])
weights = xr.DataArray([1, 2, 3, 2, 1])  # one weight per data point
nse_weights = nse(fcst_xr, obs_xr, weights=weights)
print("NSE with weights:", nse_weights)
```

```
NSE with weights: <xarray.DataArray ()>
array(0.5)
```

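In this example the weighted result equals the unweighted one. Every forecast error is exactly 1, so any weighted average of the squared errors is also 1. The NumPy check below shows this for the squared-error term only. This tutorial does not show how `nse` applies the weights to the observed-variance term, so a test case with unequal errors would be needed to see the weights take effect.

```python
import numpy as np

fcst = np.array([3, 4, 5, 6, 7], dtype=float)
obs = np.array([2, 3, 4, 5, 6], dtype=float)
weights = np.array([1, 2, 3, 2, 1], dtype=float)

squared_errors = (fcst - obs) ** 2  # all equal to 1.0
weighted_mse = np.average(squared_errors, weights=weights)
unweighted_mse = squared_errors.mean()

print(weighted_mse, unweighted_mse)  # 1.0 1.0
```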

**Example 5**: 2D Array: time and station

```python
def create_synthetic_2d_data():
    # Define dimensions
    time = pd.date_range('2024-01-01', periods=5, freq='D')
    stations = ['Station1', 'Station2', 'Station3']

    # Use specified forecast and observed values
    forecast_data = np.array([[3, 4, 5, 6, 7],
                              [3, 4, 5, 6, 7],
                              [3, 4, 5, 6, 7]]).T  # Transpose to align with dimensions (time, station)
    observed_data = np.array([[2, 3, 4, 5, 6],
                              [2, 3, 4, 5, 6],
                              [2, 3, 4, 5, 6]]).T  # Transpose to align with dimensions (time, station)

    # Create forecast DataArray
    forecast_da = xr.DataArray(
        forecast_data,
        coords={
            'time': time,
            'station': stations
        },
        dims=['time', 'station'],
        name='forecast'
    )

    # Create observed DataArray
    observed_da = xr.DataArray(
        observed_data,
        coords={
            'time': time,
            'station': stations
        },
        dims=['time', 'station'],
        name='observed'
    )

    return forecast_da, observed_da
```

```python
# Create synthetic forecast and observed DataArrays
fcst_xr, obs_xr = create_synthetic_2d_data()

# Calculate a single NSE pooled over all times and stations
nse_value = nse(fcst_xr, obs_xr)
print("NSE over all stations and times:", nse_value)
```

```
NSE over all stations and times: <xarray.DataArray ()>
array(0.5)
```

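Calling `nse` with no extra arguments pools every time step and station into a single score. To get one score per station, reduce over the time axis only. The sketch below does this by hand in NumPy on the same values as the 2D example above, laid out as `(time, station)`. It illustrates the reduction and does not describe any `nse` keyword arguments.

```python
import numpy as np

# Same values as the 2D example, laid out as (time, station)
fcst = np.tile(np.array([3, 4, 5, 6, 7], dtype=float)[:, np.newaxis], (1, 3))
obs = np.tile(np.array([2, 3, 4, 5, 6], dtype=float)[:, np.newaxis], (1, 3))

# Reduce over time (axis 0) only, keeping one value per station
sse = np.sum((fcst - obs) ** 2, axis=0)
sst = np.sum((obs - obs.mean(axis=0)) ** 2, axis=0)
nse_per_station = 1.0 - sse / sst
print(nse_per_station)  # [0.5 0.5 0.5]
```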

**Example 6**: 3D array (ensemble, time, station)

```python
def create_synthetic_3d_data():
    # Define dimensions
    time = pd.date_range('2024-01-01', periods=5, freq='D')
    stations = ['Station1', 'Station2', 'Station3']
    ensemble = ['Ensemble1', 'Ensemble2', 'Ensemble3']

    # Use specified forecast and observed values
    forecast_data = np.array([[3, 4, 5, 6, 7],
                              [3, 4, 5, 6, 7],
                              [3, 4, 5, 6, 7]]).T  # Transpose to align with dimensions (time, station)
    observed_data = np.array([[2, 3, 4, 5, 6],
                              [2, 3, 4, 5, 6],
                              [2, 3, 4, 5, 6]]).T  # Transpose to align with dimensions (time, station)

    # Repeat data for each ensemble member (observations are copied too,
    # so both arrays have the dimensions (ensemble, time, station))
    forecast_data = np.repeat(forecast_data[np.newaxis, ...], len(ensemble), axis=0)
    observed_data = np.repeat(observed_data[np.newaxis, ...], len(ensemble), axis=0)

    # Create forecast DataArray
    forecast_da = xr.DataArray(
        forecast_data,
        coords={
            'ensemble': ensemble,
            'time': time,
            'station': stations
        },
        dims=['ensemble', 'time', 'station'],
        name='forecast'
    )

    # Create observed DataArray
    observed_da = xr.DataArray(
        observed_data,
        coords={
            'ensemble': ensemble,
            'time': time,
            'station': stations
        },
        dims=['ensemble', 'time', 'station'],
        name='observed'
    )

    return forecast_da, observed_da
```

```python
fcst_xr, obs_xr = create_synthetic_3d_data()

# Calculate a single NSE pooled over all members, times and stations
nse_value = nse(fcst_xr, obs_xr)
print("NSE over all members, stations and times:", nse_value)
```

```
NSE over all members, stations and times: <xarray.DataArray ()>
array(0.5)
```

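The 2D and 3D examples above both return 0.5 because they only stack identical copies of the Example 1 series ($k = 3$ stations in the 2D case, $k = 9$ member-station pairs in the 3D case). Repeating a series $k$ times leaves the pooled mean $\bar{o}$ unchanged and multiplies both sums in the NSE formula by $k$, so the ratio cancels:

$$
1 - \frac{\sum_{j=1}^{k}\sum_{i=1}^{n}(f_i - o_i)^2}{\sum_{j=1}^{k}\sum_{i=1}^{n}(o_i - \bar{o})^2}
= 1 - \frac{k \cdot 5}{k \cdot 10} = 0.5
$$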

**Example 7**: 3D array (time, station, lead time)

```python
import numpy as np
import pandas as pd
import xarray as xr

def create_synthetic_deterministic_data():
    # Define dimensions
    time = pd.date_range('2024-01-01', '2024-01-31', freq='D')
    stations = ['Station1', 'Station2', 'Station3', 'Station4', 'Station5']
    lead_times_forecast = np.arange(1, 8)  # Lead times from 1 to 7 for forecast
    lead_times_observed = np.array([1])    # Single lead-time entry so observations share the forecast's dimension names

    # Generate synthetic hydrograph data
    np.random.seed(0)  # For reproducibility

    # Base sine wave to simulate periodic streamflow variations
    days = len(time)
    base_flow = 150 + 50 * np.sin(2 * np.pi * np.arange(days) / days)  # Mean of 150, amplitude of 50

    # Reshape base_flow to (31, 1, 1) so it broadcasts over stations and lead times
    base_flow = base_flow[:, np.newaxis, np.newaxis]

    # Add random fluctuations around the base flow, clipped to the range 0-300
    forecast_data = np.clip(base_flow + 20 * np.random.randn(days, len(stations), len(lead_times_forecast)), 0, 300)
    observed_data = np.clip(base_flow + 20 * np.random.randn(days, len(stations), len(lead_times_observed)), 0, 300)

    # Create forecast DataArray
    forecast_da = xr.DataArray(
        forecast_data,
        coords={
            'time': time,
            'station': stations,
            'lead_time': lead_times_forecast
        },
        dims=['time', 'station', 'lead_time'],
        name='forecast'
    )

    # Create observed DataArray
    observed_da = xr.DataArray(
        observed_data,
        coords={
            'time': time,
            'station': stations,
            'lead_time': lead_times_observed
        },
        dims=['time', 'station', 'lead_time'],
        name='observed'
    )

    return forecast_da, observed_da
```

```python
# Create the forecast and observed DataArrays and check their shapes
forecast_da, observed_da = create_synthetic_deterministic_data()
print(forecast_da.shape)
print(observed_da.shape)
```

```
(31, 5, 7)
(31, 5, 1)
```



**NSE for each lead time**

The same observations are compared with the forecast at each lead time in turn, and each call pools all times and stations into one score.

```python
for ilead in forecast_da['lead_time'].values:
    # Select one lead time; this drops the lead_time dimension from the forecast
    cur_fcst_da = forecast_da.sel(lead_time=ilead)
    nse_value = nse(cur_fcst_da, observed_da).values
    print(f"NSE for leadtime {ilead}:", nse_value)
```

```
NSE for leadtime 1: 0.5723544192588721
NSE for leadtime 2: 0.5626211975440663
NSE for leadtime 3: 0.5190530423334399
NSE for leadtime 4: 0.4552724688067896
NSE for leadtime 5: 0.6035837087071741
NSE for leadtime 6: 0.5388020768178767
NSE for leadtime 7: 0.5045349373163093
```

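The loop above computes NSE once per lead time. Because the observations have a lead-time axis of length 1, NumPy broadcasting can compare them with all seven lead times at once. The sketch below assumes plain NumPy arrays shaped like those in this example, `(time, station, lead_time)`, and pools over time and station as the loop does. Random stand-in data replace the synthetic hydrographs, so its numbers will not match the output above.

```python
import numpy as np

rng = np.random.default_rng(0)  # random stand-in data, not the tutorial's hydrographs
fcst = rng.normal(150, 20, size=(31, 5, 7))  # (time, station, lead_time)
obs = rng.normal(150, 20, size=(31, 5, 1))   # single lead-time slot, broadcast below

# Squared errors broadcast obs across all 7 lead times, then sum over time and station
sse = np.sum((fcst - obs) ** 2, axis=(0, 1))  # shape (7,)
# The observed-variance term is the same for every lead time
sst = np.sum((obs - obs.mean()) ** 2)
nse_by_lead = 1.0 - sse / sst

for lead, value in zip(range(1, 8), nse_by_lead):
    print(f"NSE for lead time {lead}: {value:.3f}")
```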

**Example 8**: 4D array (time, station, lead time, ensemble)

```python
def create_synthetic_ensemble_data():
    # Define dimensions
    time = pd.date_range('2024-01-01', '2024-01-31', freq='D')
    stations = ['Station1', 'Station2', 'Station3', 'Station4', 'Station5']
    lead_times_forecast = np.arange(1, 8)  # Lead times from 1 to 7 for forecast
    lead_times_observed = np.array([1])    # Single lead-time entry so observations share the forecast's dimension names
    ensemble = [f'Ensemble{i}' for i in range(1, 21)]  # Ensemble members from Ensemble1 to Ensemble20

    # Generate synthetic hydrograph data
    np.random.seed(0)  # For reproducibility

    # Create base flow as a constant value of 30 cumec
    base_flow = 30

    # Add random integer offsets (0 to 300) to the base flow for forecast and observed data
    forecast_data = base_flow + np.random.randint(0, 301, size=(len(time), len(stations), len(lead_times_forecast), len(ensemble)))
    observed_data = base_flow + np.random.randint(0, 301, size=(len(time), len(stations), len(lead_times_observed)))

    # Clip values to the range 0-300
    forecast_data = np.clip(forecast_data, 0, 300)
    observed_data = np.clip(observed_data, 0, 300)

    # Create forecast DataArray
    forecast_da = xr.DataArray(
        forecast_data,
        coords={
            'time': time,
            'station': stations,
            'lead_time': lead_times_forecast,
            'ensemble': ensemble
        },
        dims=['time', 'station', 'lead_time', 'ensemble'],
        name='forecast'
    )

    # Create observed DataArray
    observed_da = xr.DataArray(
        observed_data,
        coords={
            'time': time,
            'station': stations,
            'lead_time': lead_times_observed
        },
        dims=['time', 'station', 'lead_time'],
        name='observed'
    )

    return forecast_da, observed_da
```


```python
# Create forecast and observed DataArrays and check their shapes
forecast_ensemble_da, observed_da = create_synthetic_ensemble_data()
print(forecast_ensemble_da.shape)
print(observed_da.shape)
```

```
(31, 5, 7, 20)
(31, 5, 1)
```



```python
# Collect one NSE value per ensemble member and lead time
results_list = []

for imember in forecast_ensemble_da['ensemble'].values:
    for ilead in forecast_ensemble_da['lead_time'].values:
        # Select the forecast for the current ensemble member and lead time
        cur_fcst_da = forecast_ensemble_da.sel(ensemble=imember, lead_time=ilead)

        # Calculate NSE for the selected forecast against the observations
        nse_value = nse(cur_fcst_da, observed_da).values

        # Store the result as one row
        results_list.append({'ensemble': imember, 'lead_time': ilead, 'NSE': nse_value})

# Build a DataFrame with one row per (member, lead time) pair
nse_results = pd.DataFrame(results_list)
```

```python
# Print the first rows of the DataFrame
print(nse_results.head(3))
```

```
    ensemble  lead_time                  NSE
0  Ensemble1          1  -1.1721371704833188
1  Ensemble1          2  -1.0448237401444582
2  Ensemble1          3   -1.060897482889457
```

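The double loop calls `nse` once for every member and lead-time pair. A single broadcast computation can build the same grid of scores. The NumPy sketch below assumes arrays shaped like this example's `(time, station, lead_time, ensemble)` forecast and `(time, station, 1)` observations, and uses random stand-in data, so its values will not match the table above. It produces a `(lead_time, ensemble)` grid and then flattens it into records with the same columns as `nse_results`, which could be passed to `pd.DataFrame` as in the loop version.

```python
import numpy as np

rng = np.random.default_rng(0)  # random stand-in data, not the tutorial's values
n_time, n_station, n_lead, n_member = 31, 5, 7, 20
fcst = np.clip(30 + rng.integers(0, 301, size=(n_time, n_station, n_lead, n_member)), 0, 300)
obs = np.clip(30 + rng.integers(0, 301, size=(n_time, n_station, 1)), 0, 300)

# Add a trailing member axis to obs so it broadcasts to (time, station, lead, member)
obs_b = obs[..., np.newaxis].astype(float)
sse = np.sum((fcst - obs_b) ** 2, axis=(0, 1))  # shape (lead, member)
sst = np.sum((obs_b - obs_b.mean()) ** 2)       # scalar, shared by every cell
nse_grid = 1.0 - sse / sst

# Long-format records in the same order as the loop (member outer, lead inner)
records = [
    {"ensemble": f"Ensemble{m + 1}", "lead_time": lead + 1, "NSE": nse_grid[lead, m]}
    for m in range(n_member)
    for lead in range(n_lead)
]
print(nse_grid.shape, len(records))  # (7, 20) 140
```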