<a href="https://colab.research.google.com/github/lawrencejesse/Sentinel2_Extractor/blob/main/NDVI_Mean_and_St_Dev_Bell_Curve_V1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# README

This notebook provides a workflow for analyzing an uploaded NDVI raster. It includes steps to:

1.  Upload and load the raster data.
2.  Handle NoData values.
3.  Calculate and visualize the distribution of valid NDVI values using a histogram.
4.  Calculate and display the mean and standard deviation, and add lines to the histogram representing these statistics.
5.  Calculate the Median Absolute Deviation (MAD) and robust standard deviation.
6.  Perform a test to determine the appropriateness of using standard z-scores.
7.  Visualize the distribution of NDVI values with lines representing the median and robust standard deviations.

## Z-score vs. Robust Z-score Discussion

Standard z-scores measure how many standard deviations a data point is from the mean. They are appropriate for data that is normally distributed or close to normal.

Robust z-scores, on the other hand, measure how many robust standard deviations a data point is from the median. They are calculated using the Median Absolute Deviation (MAD) to estimate the spread of the data.

If the rule-of-thumb test performed later in this notebook shows that the distribution of your NDVI data is skewed or contains outliers, robust z-scores are generally recommended. The median and MAD are less sensitive to extreme values than the mean and standard deviation, so they describe the data's center and spread more reliably in such cases. The final visualization marks the median and +/- 1 and 2 robust standard deviations to show the data's typical range.
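
The difference between the two scores can be sketched on a handful of made-up values. The array below is hypothetical (six plausible NDVI values plus one low outlier) and is not taken from any raster; the 1.4826 factor is the same MAD scaling used later in this notebook.

```python
import numpy as np

# Hypothetical NDVI samples with one low outlier (illustrative values only)
values = np.array([0.61, 0.63, 0.65, 0.66, 0.68, 0.70, 0.05])

# Standard z-score: distance from the mean in standard deviations
mean = values.mean()
std = values.std()
z = (values - mean) / std

# Robust z-score: distance from the median in robust standard deviations,
# where the robust standard deviation is 1.4826 * MAD
median = np.median(values)
mad = np.median(np.abs(values - median))
robust_std = 1.4826 * mad
robust_z = (values - median) / robust_std

print("standard z:", np.round(z, 2))
print("robust z:  ", np.round(robust_z, 2))
```

In this sketch the outlier inflates the standard deviation and pulls the mean toward itself, so its standard z-score stays moderate, while its robust z-score is far larger in magnitude.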

# Task
Create a histogram of an uploaded NDVI raster, calculate and display the mean and standard deviation, and add lines to the histogram representing the mean and +/- 1 and 2 standard deviations.

## Upload and load raster

### Subtask:
Provide instructions for the user to upload their NDVI raster and load it into a suitable data structure (e.g., using `rasterio`).


**Reasoning**:
I need to provide instructions to the user to upload their file and then provide code to open and read the raster data using rasterio.



In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print(f'User uploaded file "{fn}" with length {len(uploaded[fn])} bytes')

In [None]:
%pip install rasterio

# Upload your NDVI raster with the upload cell above before running this cell.
# The filename is taken from the 'uploaded' dictionary, so no manual edit is needed.

import rasterio
import numpy as np

try:
    # Assuming a single file was uploaded, use the first key of 'uploaded' as the filename
    if uploaded:
      uploaded_filename = list(uploaded.keys())[0]
      with rasterio.open(uploaded_filename) as src:
          # Read the first band of the raster data
          ndvi_data = src.read(1)

      print("NDVI raster data loaded successfully into a NumPy array.")
      # You can optionally display some information about the loaded data
      print(f"Data shape: {ndvi_data.shape}")
      print(f"Data type: {ndvi_data.dtype}")
      print(f"Min value: {np.min(ndvi_data)}")
      print(f"Max value: {np.max(ndvi_data)}")

      # Handle NoData values (-10000 and -9999)
      nodata_values = [-10000, -9999]
      ndvi_data_valid = ndvi_data[~np.isin(ndvi_data, nodata_values)]


      print(f"Number of valid data points: {ndvi_data_valid.size}")

    else:
      print("No file was uploaded. Please upload your NDVI raster file.")

except rasterio.errors.RasterioIOError:
    print("Error: Could not open or read the raster file.")
    print("Please ensure you have uploaded the file and that it is a valid raster file.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
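
The NoData step above can be sketched as a small reusable helper. The function name `extract_valid` is hypothetical, and dropping NaN and infinite values is an added assumption (the notebook itself only filters the -10000 and -9999 sentinels); it may help with floating-point rasters that mark missing pixels as NaN.

```python
import numpy as np

def extract_valid(band, nodata_values=(-10000, -9999)):
    """Return a 1-D array of values that are finite and not NoData sentinels."""
    band = np.asarray(band)
    # Keep everything that is not one of the sentinel values
    mask = ~np.isin(band, nodata_values)
    # Finite check only applies to floating-point data; integers are always finite
    if np.issubdtype(band.dtype, np.floating):
        mask &= np.isfinite(band)
    # Boolean indexing flattens the result to one dimension
    return band[mask]

# Small made-up band containing both sentinels and a NaN
demo = np.array([[0.2, -9999.0, np.nan], [0.7, -10000.0, 0.5]])
valid = extract_valid(demo)
print(valid)
```

Because boolean indexing already returns a 1-D array, the result can be passed straight to `np.mean`, `np.median`, or `plt.hist`.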

In [None]:
# Calculate mean and standard deviation, excluding NoData values
mean_ndvi = np.mean(ndvi_data_valid)
std_ndvi = np.std(ndvi_data_valid)

print(f"Mean NDVI (excluding NoData): {mean_ndvi}")
print(f"Standard Deviation of NDVI (excluding NoData): {std_ndvi}")

**Reasoning**:
If the `rasterio` library is not installed, opening the raster fails. The cell below installs `rasterio` and then opens and reads the raster from an explicit filename; replace `'your_uploaded_file.tif'` with the name of your uploaded file.



In [None]:
%pip install rasterio

import rasterio
import numpy as np

try:
    # Replace 'your_uploaded_file.tif' with the actual name of your uploaded file
    with rasterio.open('your_uploaded_file.tif') as src:
        # Read the first band of the raster data
        ndvi_data = src.read(1)

    print("NDVI raster data loaded successfully into a NumPy array.")
    # You can optionally display some information about the loaded data
    print(f"Data shape: {ndvi_data.shape}")
    print(f"Data type: {ndvi_data.dtype}")
    print(f"Min value: {np.min(ndvi_data)}")
    print(f"Max value: {np.max(ndvi_data)}")

except rasterio.errors.RasterioIOError:
    print("Error: Could not open or read the raster file.")
    print("Please ensure you have uploaded the file and replaced 'your_uploaded_file.tif' with the correct filename.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.colors as mcolors

# Assuming ndvi_data_valid is already created from the previous step
# If ndvi_data_valid is not available, the previous steps need to be executed first.

if 'ndvi_data_valid' in locals() and ndvi_data_valid is not None:
    plt.figure(figsize=(10, 6))

    # Create a colormap from red to yellow to green
    cmap = mcolors.LinearSegmentedColormap.from_list("ryg", ["red", "yellow", "green"])

    # Plot the histogram
    n, bins, patches = plt.hist(ndvi_data_valid.flatten(), bins=50, edgecolor='black')

    # Apply the color gradient to the bars
    bin_centers = 0.5 * (bins[:-1] + bins[1:])
    # Scale the bin centers to the range [0, 1] for the colormap
    normalized_bin_centers = (bin_centers - np.min(bin_centers)) / (np.max(bin_centers) - np.min(bin_centers))
    # Apply a power transformation to adjust contrast: an exponent > 1 shifts values toward
    # the low (red) end of the colormap, an exponent < 1 toward the high (green) end.
    # An exponent of 1.5 is used here; experiment with other values as needed.
    contrasted_colors = cmap(normalized_bin_centers**1.5)


    for c, p in zip(contrasted_colors, patches):
        plt.setp(p, 'facecolor', c)


    plt.title('Distribution of NDVI Values (excluding NoData) with R-Y-G Gradient')
    plt.xlabel('NDVI Value')
    plt.ylabel('Frequency')
    plt.grid(axis='y', alpha=0.75)
    plt.show()
else:
    print("Valid NDVI data is not available. Please ensure the previous steps to load and handle NoData were successful.")

**Reasoning**:
Generate a histogram of the raster data using matplotlib, add a title and axis labels, and display the plot.



## Add statistical lines

### Subtask:
Add vertical lines to the histogram representing the mean and +/- 1 and 2 standard deviations.


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import rasterio

try:
    # The filename is taken from the 'uploaded' dictionary created by the upload cell.
    if uploaded:
      uploaded_filename = list(uploaded.keys())[0]
      with rasterio.open(uploaded_filename) as src:
          # Read the first band of the raster data
          ndvi_data = src.read(1)

      print("NDVI raster data loaded successfully into a NumPy array.")

      # Handle NoData values (-10000 and -9999)
      nodata_values = [-10000, -9999]
      ndvi_data_valid = ndvi_data[~np.isin(ndvi_data, nodata_values)]


      print(f"Number of valid data points: {ndvi_data_valid.size}")

      # Calculate mean and standard deviation
      mean_ndvi = np.mean(ndvi_data_valid)
      std_ndvi = np.std(ndvi_data_valid)

      print(f"Mean NDVI (excluding NoData): {mean_ndvi}")
      print(f"Standard Deviation of NDVI (excluding NoData): {std_ndvi}")

      plt.figure(figsize=(12, 7)) # Increased figure size
      plt.hist(ndvi_data_valid.flatten(), bins=100, color='#607c8e', edgecolor='#333333', alpha=0.7) # Changed color, added edge color and transparency
      plt.title('Distribution of NDVI Values with Mean and Standard Deviations (excluding NoData)', fontsize=16) # Increased title font size
      plt.xlabel('NDVI Value', fontsize=12) # Increased xlabel font size
      plt.ylabel('Frequency', fontsize=12) # Increased ylabel font size
      plt.grid(axis='y', alpha=0.5) # Adjusted grid transparency

      # Add vertical lines for mean and standard deviations with improved styling
      plt.axvline(mean_ndvi, color='crimson', linestyle='solid', linewidth=2, label=f'Mean ({mean_ndvi:.2f})')
      plt.axvline(mean_ndvi + std_ndvi, color='forestgreen', linestyle='dashed', linewidth=2, label=f'+1 Std Dev ({mean_ndvi + std_ndvi:.2f})')
      plt.axvline(mean_ndvi - std_ndvi, color='darkorange', linestyle='dashed', linewidth=2, label=f'-1 Std Dev ({mean_ndvi - std_ndvi:.2f})')
      plt.axvline(mean_ndvi + 2 * std_ndvi, color='rebeccapurple', linestyle='dotted', linewidth=2, label=f'+2 Std Dev ({mean_ndvi + 2 * std_ndvi:.2f})')
      plt.axvline(mean_ndvi - 2 * std_ndvi, color='saddlebrown', linestyle='dotted', linewidth=2, label=f'-2 Std Dev ({mean_ndvi - 2 * std_ndvi:.2f})')

      plt.legend(fontsize=10) # Added legend with adjusted font size
      plt.tight_layout() # Adjust layout to prevent labels overlapping
      plt.show()

    else:
      print("No file was uploaded. Please upload your NDVI raster file.")

except rasterio.errors.RasterioIOError:
    print("Error: Could not open or read the raster file.")
    print("Please ensure you have uploaded the file and that it is a valid raster file.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
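
One way to read the +/- 1 and 2 standard deviation lines is to check what share of the values falls between them. The sketch below uses synthetic, normally distributed data rather than a real raster; the helper name `fraction_within` and the chosen mean and spread are assumptions for illustration only.

```python
import numpy as np

def fraction_within(values, center, spread, k):
    """Share of values lying within k * spread of center."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(np.abs(values - center) <= k * spread))

# Synthetic stand-in for ndvi_data_valid (seeded so the run is repeatable)
rng = np.random.default_rng(0)
synthetic = rng.normal(loc=0.6, scale=0.1, size=100_000)

mean, std = synthetic.mean(), synthetic.std()
frac1 = fraction_within(synthetic, mean, std, 1)
frac2 = fraction_within(synthetic, mean, std, 2)
print(f"within 1 std: {frac1:.3f}")
print(f"within 2 std: {frac2:.3f}")
```

For data close to normal these shares should land near 68% and 95%; shares that differ noticeably on your own raster are a hint that the distribution is skewed or heavy-tailed.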

## Visualize histogram

### Subtask:
Visualize the histogram with the added lines.


In [None]:
# Calculate the Median Absolute Deviation (MAD)
median_ndvi = np.median(ndvi_data_valid)
mad_ndvi = np.median(np.abs(ndvi_data_valid - median_ndvi))

# Calculate the robust standard deviation
# (the factor 1.4826 scales the MAD so that it matches σ for normally distributed data)
robust_std_ndvi = 1.4826 * mad_ndvi

print(f"Median NDVI (excluding NoData): {median_ndvi}")
print(f"Median Absolute Deviation (MAD) of NDVI (excluding NoData): {mad_ndvi}")
print(f"Robust Standard Deviation of NDVI (excluding NoData): {robust_std_ndvi}")

In [None]:
# Perform the test for using standard z-scores
# Requires mean_ndvi and std_ndvi from the statistics cell, and
# median_ndvi and robust_std_ndvi from the MAD cell above.

# Check the first condition: |μ − m| ≤ 0.2 σ
mean_median_diff = abs(mean_ndvi - median_ndvi)
std_dev_threshold = 0.2 * std_ndvi

condition1 = mean_median_diff <= std_dev_threshold

# Check the second condition: 0.8 ≤ (MADs / σ) ≤ 1.2
# MADs is the scaled MAD, i.e. robust_std_ndvi = 1.4826 * MAD
mad_s_over_std = robust_std_ndvi / std_ndvi

condition2 = 0.8 <= mad_s_over_std <= 1.2

# Determine if standard z-scores are appropriate
if condition1 and condition2:
    print("Based on the rule of thumb, the distribution is close enough; standard z-scores are appropriate.")
else:
    print("Based on the rule of thumb, the distribution is not close enough; use robust z-scores for this year.")

# Optionally, print the values for reference
print(f"\n|μ − m|: {mean_median_diff:.4f}")
print(f"0.2 σ: {std_dev_threshold:.4f}")
print(f"MADs / σ: {mad_s_over_std:.4f}")
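
The MAD calculation and the rule-of-thumb test can also be combined into one function, which makes it easier to try the check on different inputs. This is a minimal sketch: the function name `standard_z_ok`, its parameter names, and the synthetic test data are all hypothetical, and the guard for zero standard deviation is an added assumption not present in the notebook cells.

```python
import numpy as np

def standard_z_ok(values, center_tol=0.2, spread_low=0.8, spread_high=1.2):
    """Rule of thumb: True if standard z-scores look appropriate for values."""
    values = np.asarray(values, dtype=float)
    mean, std = values.mean(), values.std()
    median = np.median(values)
    # Scaled MAD (robust standard deviation)
    robust_std = 1.4826 * np.median(np.abs(values - median))
    if std == 0:
        return False  # constant data: the ratio below is undefined
    # Condition 1: |mean - median| <= 0.2 * std
    centers_agree = abs(mean - median) <= center_tol * std
    # Condition 2: 0.8 <= robust_std / std <= 1.2
    spreads_agree = spread_low <= robust_std / std <= spread_high
    return bool(centers_agree and spreads_agree)

# Synthetic examples (seeded): one symmetric, one strongly right-skewed
rng = np.random.default_rng(0)
symmetric = rng.normal(0.6, 0.1, 100_000)
skewed = rng.exponential(1.0, 100_000)

print("symmetric:", standard_z_ok(symmetric))
print("skewed:   ", standard_z_ok(skewed))
```

The symmetric sample should pass both conditions, while the skewed sample should fail because its mean sits well above its median.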

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Assuming ndvi_data_valid, median_ndvi, and robust_std_ndvi are available from previous steps

if 'ndvi_data_valid' in locals() and ndvi_data_valid is not None and \
   'median_ndvi' in locals() and median_ndvi is not None and \
   'robust_std_ndvi' in locals() and robust_std_ndvi is not None:

    plt.figure(figsize=(12, 7)) # Increased figure size
    plt.hist(ndvi_data_valid.flatten(), bins=100, color='#607c8e', edgecolor='#333333', alpha=0.7) # Changed color, added edge color and transparency
    plt.title('Distribution of NDVI Values with Median and Robust Standard Deviations (excluding NoData)', fontsize=16) # Increased title font size
    plt.xlabel('NDVI Value', fontsize=12) # Increased xlabel font size
    plt.ylabel('Frequency', fontsize=12) # Increased ylabel font size
    plt.grid(axis='y', alpha=0.5) # Adjusted grid transparency

    # Add vertical lines for median and robust standard deviations with improved styling
    plt.axvline(median_ndvi, color='purple', linestyle='solid', linewidth=2, label=f'Median ({median_ndvi:.2f})')
    plt.axvline(median_ndvi + robust_std_ndvi, color='darkgreen', linestyle='dashed', linewidth=2, label=f'+1 Robust Std Dev ({median_ndvi + robust_std_ndvi:.2f})')
    plt.axvline(median_ndvi - robust_std_ndvi, color='darkorange', linestyle='dashed', linewidth=2, label=f'-1 Robust Std Dev ({median_ndvi - robust_std_ndvi:.2f})')
    plt.axvline(median_ndvi + 2 * robust_std_ndvi, color='indigo', linestyle='dotted', linewidth=2, label=f'+2 Robust Std Dev ({median_ndvi + 2 * robust_std_ndvi:.2f})')
    plt.axvline(median_ndvi - 2 * robust_std_ndvi, color='sienna', linestyle='dotted', linewidth=2, label=f'-2 Robust Std Dev ({median_ndvi - 2 * robust_std_ndvi:.2f})')


    plt.legend(fontsize=10) # Added legend with adjusted font size
    plt.tight_layout() # Adjust layout to prevent labels overlapping
    plt.show()

else:
    print("Required data (ndvi_data_valid, median_ndvi, robust_std_ndvi) is not available. Please ensure previous steps were successful.")