
# Drought Identification and Trend Analysis using Google Earth Engine (GEE) - Admin 2 Level

Author - Samuel Gartenstein

Date - Oct 2024

Version - 1.0


[**Google Earth Engine**](https://earthengine.google.com/) is a public data archive of petabytes of historical satellite imagery and geospatial datasets. The advantage lies in its remarkable computation speed as processing is outsourced to Google servers. The platform provides a variety of constantly updated datasets; no download of raw imagery is required. While it is free of charge, one still needs to activate access to Google Earth Engine with a valid Google account.

This Jupyter Notebook follows this [paper](https://www.mdpi.com/2071-1050/13/3/1042) to recreate the analysis for drought identification and trend analysis in Kenya using Google Earth Engine (GEE) for Python. The analysis proceeds in the following steps:

## Analysis Steps

### Step 1: Data Collection

Obtain long-term satellite-derived precipitation data using the CHIRPS dataset available in GEE. This data will be used to analyze drought conditions in Kenya.

### Step 2: Study Area Definition

Define the study area using Kenya's geographical boundaries.

### Step 3: Data Preprocessing

- **Extract Monthly Precipitation**: Extract monthly CHIRPS precipitation data for the study area to calculate Standardized Precipitation Index (SPI) at different time scales (e.g., SPI1, SPI3, SPI6, SPI12).
- **Clip to Region**: Clip the precipitation data to Kenya's boundaries to ensure that the analysis focuses solely on Kenya. This step has already been implemented using GEE.

### Step 4: Calculate Standardized Precipitation Index (SPI)

- **SPI Calculation**: Calculate SPI at different time scales (1-, 3-, 6-, and 12-month) for drought evaluation. Convert the CHIRPS precipitation data into SPI values by fitting a gamma distribution to each pixel's monthly time series.

### Step 5: Drought Characterization

- **Identify Drought Events**: Use run theory to identify drought events based on SPI values.
- **Drought Duration**: Identify the number of months with SPI values below thresholds like -1.0 for moderate drought.
- **Drought Severity**: Calculate the severity by summing all SPI values during the drought.
- **Drought Intensity**: Calculate drought intensity as severity divided by duration.
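
The run-theory metrics listed above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming that a drought event is any unbroken run of months with SPI below -1.0. The helper names (`drought_events`, `_summarise`) and the sample SPI values are hypothetical and are not taken from the notebook.

```python
def drought_events(spi, threshold=-1.0):
    """Return one dict per drought event (a run of consecutive values below threshold).

    Hypothetical helper, for illustration only.
    """
    events = []
    run = []       # SPI values of the run currently being built
    start = None   # position at which the current run started
    for i, value in enumerate(spi):
        if value < threshold:
            if not run:
                start = i
            run.append(value)
        elif run:
            events.append(_summarise(start, run))
            run = []
    if run:  # close a run that reaches the end of the series
        events.append(_summarise(start, run))
    return events


def _summarise(start, run):
    duration = len(run)               # number of months in the run
    severity = sum(run)               # sum of SPI values during the event (negative)
    intensity = severity / duration   # severity divided by duration
    return {"start": start, "duration": duration,
            "severity": severity, "intensity": intensity}


# Made-up SPI values, not results from the notebook
sample_spi = [0.3, -1.2, -1.5, -0.4, 0.8, -1.1, -2.0, -1.3, 0.1]
events = drought_events(sample_spi)
for event in events:
    print(event)
```

With this sign convention, severity and intensity are negative; some studies report their absolute values instead.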

### Step 6: Trend Analysis

- **Mann-Kendall Test**: Implement the Mann-Kendall trend test to determine trends in SPI values at annual, seasonal, and monthly time scales.
- **Sen’s Slope Estimator**: Apply Sen’s slope estimator to understand the magnitude of the detected trends.
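
To make the two trend statistics concrete, here is a minimal standard-library sketch. It assumes a series with no tied values and no serial correlation, so it omits the tie correction that full implementations apply. The notebook itself imports `pymannkendall` for this step; the functions below (`mann_kendall`, `sens_slope`) and the sample values are hypothetical and only illustrate the arithmetic.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall(values):
    """Two-sided Mann-Kendall test, assuming no ties and no autocorrelation."""
    n = len(values)
    # S counts increasing pairs minus decreasing pairs over all i < j
    s = sum((later > earlier) - (later < earlier)
            for earlier, later in combinations(values, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance of S without tie correction
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the normal approximation
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value


def sens_slope(values):
    """Median of all pairwise slopes, in value units per time step."""
    slopes = [(values[j] - values[i]) / (j - i)
              for i, j in combinations(range(len(values)), 2)]
    return median(slopes)


# Made-up annual SPI values, not results from the notebook
annual_spi = [0.9, 0.4, 0.6, -0.1, 0.2, -0.5, -0.3, -0.9]
s, z, p_value = mann_kendall(annual_spi)
slope = sens_slope(annual_spi)
print(f"S={s}, z={z:.3f}, p={p_value:.4f}, Sen's slope={slope:.3f} per step")
```

A negative `z` with a small p-value points to a downward trend, and Sen's slope gives its magnitude per time step.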

### Step 7: Clustering Analysis of Drought Metrics

This step aims to perform a comprehensive clustering analysis on drought metrics across administrative units in Kenya, including data preprocessing, dimensionality reduction, and clustering.

#### Workflow Steps

- **Data Preprocessing**:
  - Encode categorical features, and normalize numerical metrics using `MinMaxScaler`.
- **Feature Engineering**:
  - Construct a feature vector (`feature_vector`) for clustering, including encoded categorical features and normalized numerical metrics.
- **Dimensionality Reduction Using PCA**:
  - Determine the optimal number of components for dimensionality reduction by analyzing cumulative explained variance and applying PCA accordingly.
- **Clustering Analysis**:
  - **K-Means Clustering**:
    - Determine the optimal number of clusters using the **Elbow Method** and apply K-Means.
    - Visualize the clusters geospatially.
  - **Hierarchical Clustering**:
    - Create a linkage matrix using **Ward's method** and determine the optimal clusters using a dendrogram.
    - Assign cluster labels and visualize geospatially.
  - **DBSCAN Clustering**:
    - Perform a grid search to determine the best parameters (`eps`, `min_samples`) and use the **Silhouette Score** for optimization.
    - Assign cluster labels and visualize geospatially.
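
The normalization and component-selection steps of this workflow can be illustrated with NumPy alone. This is a sketch under stated assumptions: the feature matrix is synthetic, the 90% cumulative-variance cutoff is an assumed value (the notebook does not state its threshold here), and the explained-variance ratios are computed by SVD of the centred data, which is what PCA reports.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for drought metrics: 72 units x 5 features (made-up data)
base = rng.normal(size=(72, 2))
features = np.column_stack([
    base[:, 0],
    base[:, 0] * 2.0 + rng.normal(scale=0.1, size=72),
    base[:, 1],
    base[:, 1] - base[:, 0] + rng.normal(scale=0.1, size=72),
    rng.normal(size=72),
])

# Min-max normalization to [0, 1], column by column
col_min = features.min(axis=0)
col_max = features.max(axis=0)
scaled = (features - col_min) / (col_max - col_min)

# PCA via SVD of the centred data: squared singular values give the variance shares
centred = scaled - scaled.mean(axis=0)
singular_values = np.linalg.svd(centred, compute_uv=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained_ratio)

# Smallest number of components whose cumulative share reaches the target
target = 0.90  # assumed cutoff
n_components = int(np.searchsorted(cumulative, target) + 1)

print("Cumulative explained variance:", np.round(cumulative, 3))
print("Components kept:", n_components)
```

The reduced data would then be passed to the clustering algorithms listed above.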


### Step 8: EM-DAT Analysis

Compare historical drought years from the [Emergency Events Database (EM-DAT)](https://www.emdat.be/) to SPI-derived results.

### DISCLAIMER

This is a set of scripts shared for educational purposes only. Anyone who uses this code or its
functionality or structure assumes full liability and credits the author.

#### Map Disclaimer

The designations employed and the presentation of the material on this map do not imply the expression
of any opinion whatsoever on the part of the author concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its
frontiers or boundaries.


In [None]:
pip install pymannkendall

Collecting pymannkendall
  Downloading pymannkendall-1.4.3-py3-none-any.whl.metadata (14 kB)
Downloading pymannkendall-1.4.3-py3-none-any.whl (12 kB)
Installing collected packages: pymannkendall
Successfully installed pymannkendall-1.4.3


In [None]:
import ee
import geemap
import numpy as np
import pandas as pd
from scipy.stats import gamma, norm, kstest, probplot
import time
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import matplotlib.patches as mpatches
from ipywidgets import interact, Dropdown, fixed, widgets, Checkbox, RadioButtons
import pymannkendall as mk
import json
import geopandas as gpd
import seaborn as sns
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler, LabelEncoder
from sklearn.cluster import KMeans, DBSCAN
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster
from sklearn.metrics import silhouette_score

In [None]:
# Initialize Google Earth Engine
def initialize_ee():
    ee.Authenticate()
    ee.Initialize(project='ee-sg4283') #Change it to your project ID

initialize_ee()


### Define Analysis Boundaries

We use the following crop calendar to select the regions and time periods to focus on.

We define boundaries for the Western and Rift Valley, and Eastern and Northern Kenya. We use the FAO GAUL: Global Administrative Unit Layers for the analysis.

The Global Administrative Unit Layers (GAUL) compiles and disseminates the best available information on administrative units for all the countries in the world, contributing to the standardization of the spatial dataset representing administrative units. GAUL always maintains global layers with a unified coding system at country, first (e.g. departments), and second administrative levels (e.g. districts). Where data is available, it provides layers on a country-by-country basis down to third, fourth, and lower levels.



In [None]:
# Load the FAO GAUL dataset for Zambia at administrative level 2
admin_level = 'level2'
country_name = 'Zambia'
roi = ee.FeatureCollection(f"FAO/GAUL/2015/{admin_level}")
roi = roi.filter(ee.Filter.eq('ADM0_NAME', country_name))

# Print the available districts (ADM2_NAME) for verification
counties_list = roi.aggregate_array('ADM2_NAME').getInfo()
print(counties_list)

['Chibombo', 'Kabwe', 'Kapiri-Mposhi', 'Mkushi', 'Mumbwa', 'Serenje', 'Chililabombwe', 'Chingola', 'Kalulushi', 'Kitwe', 'Luanshya', 'Lufwanyama', 'Masaiti', 'Mpongwe', 'Mufulira', 'Ndola', 'Chadiza', 'Chama', 'Chipata', 'Katete', 'Lundazi', 'Mambwe', 'Nyimba', 'Petauke', 'Chienge', 'Kawambwa', 'Mansa', 'Milenge', 'Mwense', 'Nchelenge', 'Samfya', 'Chongwe', 'Kafue', 'Luangwa', 'Lusaka', 'Kabompo', 'Kasempa', 'Mufumbwe', 'Mwinilunga', 'Solwezi', 'Chilubi', 'Chinsali', 'Isoka', 'Kaputa', 'Kasama', 'Luwingu', 'Mbala', 'Mpika', 'Mporokoso', 'Mpulungu', 'Mungwi', 'Nakonde', 'Choma', 'Gwembe', 'Itezhi-tezhi', 'Kalomo', 'Kazungula', 'Livingstone', 'Mazabuka', 'Monze', 'Namwala', 'Siavonga', 'Sinazongwe', 'Kaoma', 'Mongu', 'Sesheke', 'Chavuma', 'Zambezi', 'Kalabo', 'Lukulu', 'Senanga', "Shang'ombo"]


In [None]:
# Creating a map centered on Zambia
m = geemap.Map()
m.setCenter(27.8493, -13.1339, 6)  # Longitude, Latitude, and zoom level

# Add Zambia's Admin 2 boundaries as a layer
m.addLayer(roi, {"color": "green"}, "Zambia Admin 2 Boundaries")

# Display the map
m


Map(center=[-13.1339, 27.8493], controls=(WidgetControl(options=['position', 'transparent_bg'], widget=SearchD…

# Load CHIRPS Daily Dataset


In [None]:
def fetch_precipitation_data_admin2(admin2_name):
    """
    Fetch monthly precipitation data for a given Admin Level 2 region in Zambia for 2000 to 2023,
    and return a restructured DataFrame with the following columns:
    - year, month, date, region, admin2_name, precipitation.
    """
    chirps = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY')
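    startyear, endyear = 2000, 2023  # Analysis period: 2000 to 2023 inclusive
    # filterDate treats the end date as exclusive, so end on 1 January of the following year
    # to keep 31 December of the final year in the sums
    startdate, enddate = ee.Date.fromYMD(startyear, 1, 1), ee.Date.fromYMD(endyear + 1, 1, 1)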

    # Define the region from Zambia (Admin Level 2)
    region = ee.FeatureCollection('FAO/GAUL/2015/level2') \
              .filter(ee.Filter.eq('ADM0_NAME', 'Zambia')) \
              .filter(ee.Filter.eq('ADM2_NAME', admin2_name)).first()

    def MonthlySum(year):
        """
        Sum precipitation data for each month of a given year.
        """
        def monthSum(month):
            # Filter the CHIRPS dataset for the specific month and year
            monthly_sum = chirps.filterDate(startdate, enddate) \
                                .filter(ee.Filter.calendarRange(year, year, 'year')) \
                                .filter(ee.Filter.calendarRange(month, month, 'month')) \
                                .sum() \
                                .reduceRegion(ee.Reducer.mean(), geometry=region.geometry(), scale=5000, maxPixels=1e8)

            # Return the precipitation data and additional info
            return ee.Feature(None, {
                'year': year,
                'month': month,
                'date': ee.Date.fromYMD(year, month, 1).format(),
                'region': 'Zambia',  # Set as Zambia
                'admin2_name': admin2_name,  # Admin Level 2 region name
                'precipitation': monthly_sum.get('precipitation')
            })
        return ee.List.sequence(1, 12).map(monthSum)

    # List of years from 2000 to 2023
    years = ee.List.sequence(startyear, endyear)

    # Map over the years and fetch monthly precipitation
    monthlyPrecip = years.map(MonthlySum).flatten()

    # Convert the result to a FeatureCollection
    monthlyPrecipCollection = ee.FeatureCollection(monthlyPrecip)

    # Retrieve the result on the client side as a dictionary
    properties_list = monthlyPrecipCollection.getInfo()

    # Convert the result to a pandas DataFrame with a fixed column order
    desired_order = ['year', 'month', 'date', 'region', 'admin2_name', 'precipitation']
    data = [feature['properties'] for feature in properties_list['features']]

    # reindex (rather than df[desired_order]) also copes with an empty result or a
    # missing 'precipitation' property, which would otherwise raise a KeyError
    return pd.DataFrame(data).reindex(columns=desired_order)


In [None]:
# List of all Admin Level 2 regions in Zambia
admin2_names = ['Chibombo', 'Kabwe', 'Kapiri-Mposhi', 'Mkushi', 'Mumbwa', 'Serenje',
                'Chililabombwe', 'Chingola', 'Kalulushi', 'Kitwe', 'Luanshya',
                'Lufwanyama', 'Masaiti', 'Mpongwe', 'Mufulira', 'Ndola', 'Chadiza',
                'Chama', 'Chipata', 'Katete', 'Lundazi', 'Mambwe', 'Nyimba',
                'Petauke', 'Chienge', 'Kawambwa', 'Mansa', 'Milenge', 'Mwense',
                'Nchelenge', 'Samfya', 'Chongwe', 'Kafue', 'Luangwa', 'Lusaka',
                'Kabompo', 'Kasempa', 'Mufumbwe', 'Mwinilunga', 'Solwezi',
                'Chilubi', 'Chinsali', 'Isoka', 'Kaputa', 'Kasama', 'Luwingu',
                'Mbala', 'Mpika', 'Mporokoso', 'Mpulungu', 'Mungwi', 'Nakonde',
                'Choma', 'Gwembe', 'Itezhi-tezhi', 'Kalomo', 'Kazungula',
                'Livingstone', 'Mazabuka', 'Monze', 'Namwala', 'Siavonga',
                'Sinazongwe', 'Kaoma', 'Mongu', 'Sesheke', 'Chavuma', 'Zambezi',
                'Kalabo', 'Lukulu', 'Senanga', "Shang'ombo"]

def fetch_and_combine_precipitation_data():
    """
    Fetch and combine monthly precipitation data for all Admin Level 2 regions in Zambia
    from 2000 to 2023, and return a single combined DataFrame.

    Returns:
    pd.DataFrame: Combined DataFrame containing precipitation data for all regions.
    """
    # List to store DataFrames for each region
    all_precip_data = []

    # Loop through each Admin Level 2 region and fetch the data
    for admin2_name in admin2_names:
        print(f"Fetching precipitation data for {admin2_name}...")
        df = fetch_precipitation_data_admin2(admin2_name)
        all_precip_data.append(df)

    # Combine all the individual DataFrames into one large DataFrame
    combined_df = pd.concat(all_precip_data, ignore_index=True)

    return combined_df

# Fetch and combine the precipitation data
combined_precip_data = fetch_and_combine_precipitation_data()

# Display the first few rows of the combined DataFrame
print("Combined Precipitation Data for All Admin Level 2 Regions:")
print(combined_precip_data.head())


Fetching precipitation data for Chibombo...
Fetching precipitation data for Kabwe...
Fetching precipitation data for Kapiri-Mposhi...
Fetching precipitation data for Mkushi...
Fetching precipitation data for Mumbwa...
Fetching precipitation data for Serenje...
Fetching precipitation data for Chililabombwe...
Fetching precipitation data for Chingola...
Fetching precipitation data for Kalulushi...
Fetching precipitation data for Kitwe...
Fetching precipitation data for Luanshya...
Fetching precipitation data for Lufwanyama...
Fetching precipitation data for Masaiti...
Fetching precipitation data for Mpongwe...
Fetching precipitation data for Mufulira...
Fetching precipitation data for Ndola...
Fetching precipitation data for Chadiza...
Fetching precipitation data for Chama...
Fetching precipitation data for Chipata...
Fetching precipitation data for Katete...
Fetching precipitation data for Lundazi...
Fetching precipitation data for Mambwe...
Fetching precipitation data for Nyimba...
...

In [None]:
combined_precip_data

       year  month                 date  region admin2_name  precipitation
0      2000      1  2000-01-01T00:00:00  Zambia    Chibombo     216.899197
1      2000      2  2000-02-01T00:00:00  Zambia    Chibombo     242.517915
2      2000      3  2000-03-01T00:00:00  Zambia    Chibombo     195.530641
3      2000      4  2000-04-01T00:00:00  Zambia    Chibombo       6.394654
4      2000      5  2000-05-01T00:00:00  Zambia    Chibombo       1.417097
...     ...    ...                  ...     ...         ...            ...
20731  2023      8  2023-08-01T00:00:00  Zambia  Shang'ombo       0.000000
20732  2023      9  2023-09-01T00:00:00  Zambia  Shang'ombo       0.044391
20733  2023     10  2023-10-01T00:00:00  Zambia  Shang'ombo       7.807645
20734  2023     11  2023-11-01T00:00:00  Zambia  Shang'ombo      34.044939


### Filtering for Rain Season

In this analysis, we are only interested in the rainy season. In Zambia, it runs from November to March, so I filter `combined_precip_data` to those months.

In [None]:
# Filter combined_precip_data to include only months from November (11) to March (3)
zambia_rain_df = combined_precip_data[combined_precip_data['month'].isin([11, 12, 1, 2, 3])]

# Print the filtered combined DataFrame
print(zambia_rain_df.head(12))
print(zambia_rain_df.tail(12))


    year  month                 date  region admin2_name  precipitation
0   2000      1  2000-01-01T00:00:00  Zambia    Chibombo     216.899197
1   2000      2  2000-02-01T00:00:00  Zambia    Chibombo     242.517915
2   2000      3  2000-03-01T00:00:00  Zambia    Chibombo     195.530641
10  2000     11  2000-11-01T00:00:00  Zambia    Chibombo     117.328103
11  2000     12  2000-12-01T00:00:00  Zambia    Chibombo     201.903206
12  2001      1  2001-01-01T00:00:00  Zambia    Chibombo     222.092396
13  2001      2  2001-02-01T00:00:00  Zambia    Chibombo     331.400414
14  2001      3  2001-03-01T00:00:00  Zambia    Chibombo     160.703736
22  2001     11  2001-11-01T00:00:00  Zambia    Chibombo     134.069714
23  2001     12  2001-12-01T00:00:00  Zambia    Chibombo     199.873594
24  2002      1  2002-01-01T00:00:00  Zambia    Chibombo     131.593461
25  2002      2  2002-02-01T00:00:00  Zambia    Chibombo      64.783264
...

# Calculate Standardized Precipitation Index (SPI)

## SPI Definitions

* SPI1 (1-month scale)

Used to capture short-term monthly precipitation fluctuations and early warning of meteorological drought.

* SPI3 (3-month scale)

Useful for seasonal drought analysis, which helps in understanding the drought conditions over a quarterly period, closely related to agricultural impacts.

* SPI6 (6-month scale)

Provides insights into medium-term drought conditions, capturing both the end of one season and the start of another, which can influence soil moisture and crop yield over a longer period.

* SPI12 (12-month scale)

Used to assess long-term drought conditions, representing annual fluctuations and providing insights into the overall hydrological drought scenario, which can affect groundwater recharge and surface water availability.


## Selecting SPI Time Scales
Based on the above definitions, we select the following time scales for the long and short rains:

- Long Rains: March to June (4-month period) - We will use 1-month and 3-month SPI for long rains to understand short-term and medium-term droughts.
- Short Rains: October to December (3-month period) - We will use 1-month and 3-month SPI to capture variability within the shorter rainy season.

## Determining Gamma Distribution

What is a Gamma Distribution?

A gamma distribution is a type of statistical model used to describe data that are always positive and usually skewed, meaning most values are small but there can be a few large values. It's often used for things like rainfall, where:

- Many days have little or no rain, and
- A few days have a lot of rain.

This type of pattern creates a graph that has a high peak near the small values and a long tail extending to larger values. That’s what the gamma distribution looks like.

Why Do We Use Gamma Distribution for Rainfall?

- Always Positive: Rainfall cannot be negative; it either doesn’t rain or it rains some amount. The gamma distribution is great for representing only positive values.
- Right-Skewed Data: Most of the time, we have small amounts of rainfall, and occasionally, we get heavy rainfall. The gamma distribution is good for representing this kind of uneven distribution.
- Flexibility: The shape of the gamma distribution can change to fit different types of rainfall patterns, making it a very flexible tool for modeling different weather conditions.

The gamma distribution is the basis of the Standardized Precipitation Index (SPI), which is a way to figure out if a place is having normal, dry, or wet weather compared to its history. The calculation has three steps:

1. Modeling Historical Rainfall:

We use the gamma distribution to fit the historical rainfall data. This helps us understand what the usual rainfall is like for each month.

2. Comparing to Current Rainfall:

Once we have the gamma model of typical rainfall, we compare current rainfall to see how much it differs from the usual.
This comparison tells us if it is drier or wetter than normal, and by how much.

3. Getting SPI:

The final step is to convert this difference into a number called SPI, which tells us:
- If the SPI is negative, it means it’s drier than usual (possible drought).
- If the SPI is positive, it means it’s wetter than usual (possible flooding).
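
The three steps above can be sketched with SciPy on synthetic data. This is a minimal illustration, not the notebook's implementation: the helper name `spi_from_totals` is hypothetical, the rainfall values are randomly generated, and the handling of zero totals (mixing the share of dry months with the gamma CDF) is a convention assumed here. The clipping of probabilities is likewise an assumption, added so that the normal quantile stays finite.

```python
import numpy as np
from scipy.stats import gamma, norm


def spi_from_totals(totals):
    """SPI for an array of precipitation totals (hypothetical helper)."""
    totals = np.asarray(totals, dtype=float)
    wet = totals[totals > 0]
    prob_zero = 1.0 - wet.size / totals.size  # share of totals that are exactly zero

    # Step 1: model historical rainfall with a gamma distribution (location fixed at 0)
    shape, loc, scale = gamma.fit(wet, floc=0)

    # Step 2: cumulative probability of each total under the fitted model
    cdf = prob_zero + (1.0 - prob_zero) * gamma.cdf(totals, shape, loc=loc, scale=scale)
    cdf = np.clip(cdf, 1e-6, 1 - 1e-6)  # keep strictly inside (0, 1)

    # Step 3: convert the probability to a standard normal quantile, which is the SPI
    return norm.ppf(cdf)


# Synthetic monthly totals in mm (made-up data), with a few dry months
rng = np.random.default_rng(42)
synthetic = rng.gamma(shape=2.0, scale=60.0, size=120)
synthetic[:5] = 0.0

spi = spi_from_totals(synthetic)
print("Driest value SPI:", round(float(spi.min()), 2))
print("Wettest value SPI:", round(float(spi.max()), 2))
```

Because the mapping is monotonic, larger totals always receive larger SPI values, and all dry months share the same negative SPI.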

In [None]:
# Checking if all precipitation values are positive in each dataset
print("Zambia Rainy Season: All values positive? ", (zambia_rain_df['precipitation'] > 0).all())

Zambia Rainy Season: All values positive?  True


### Histogram

In [None]:
# Get the list of unique Admin Level 2 regions (admin2_name) from the DataFrame
admin2_names = zambia_rain_df['admin2_name'].unique()

# Function to update the plot based on selected admin2_name
def update_plot(admin2_name):
    # Filter the data for the selected admin2_name
    region_data = zambia_rain_df[zambia_rain_df['admin2_name'] == admin2_name]

    # Set up the plot
    plt.figure(figsize=(8, 5))

    # Plot histogram for the selected region
    plt.hist(region_data['precipitation'], bins=20, color='skyblue', edgecolor='black')
    plt.title(f'{admin2_name} Rain Season')
    plt.xlabel('Precipitation')
    plt.ylabel('Frequency')

    # Show the plot
    plt.show()

# Create an interactive dropdown menu for selecting the admin2_name
dropdown = widgets.Dropdown(
    options=admin2_names,
    description='Select Region:',
    value=admin2_names[0]  # Default to the first region
)

# Display the dropdown and connect it to the update_plot function
interactive_plot = widgets.interactive(update_plot, admin2_name=dropdown)
display(interactive_plot)

interactive(children=(Dropdown(description='Select Region:', options=('Chibombo', 'Kabwe', 'Kapiri-Mposhi', 'M…

### Kolmogorov-Smirnov Test

Understanding the Kolmogorov-Smirnov Test:
The Kolmogorov-Smirnov test compares the empirical distribution of your data to a theoretical distribution (in this case, the gamma distribution).
The null hypothesis (H₀) of the KS test is that the data follows the specified distribution (in this case, gamma).
The alternative hypothesis (H₁) is that the data does not follow the specified distribution.

In the context of the Kolmogorov-Smirnov (KS) test, a p-value greater than 0.05 is typically desired when checking the fit of a distribution.

Interpreting the p-value:

- p-value > 0.05:
If the p-value is greater than 0.05, there is insufficient evidence to reject the null hypothesis.
In other words, the data are consistent with the gamma distribution. We therefore fail to reject the null hypothesis and treat the gamma distribution as a plausible fit; this does not prove that the data are gamma-distributed.

- p-value < 0.05:
If the p-value is less than 0.05, there is significant evidence to reject the null hypothesis.
This suggests that the gamma distribution may not be a good fit for the data.

Because the gamma parameters are estimated from the same data that are then tested, the reported p-values tend to be optimistic, so borderline passes should be read with caution.

In [None]:
def perform_ks_test(data, region_name):
    # Fit a gamma distribution to the data
    shape, loc, scale = gamma.fit(data, floc=0)  # floc=0 fixes the location at zero (two-parameter gamma)

    # Perform Kolmogorov-Smirnov test
    test_stat, p_value = kstest(data, gamma(shape, loc, scale).cdf)

    # Print results for each region
    if p_value > 0.05:
        print(f"The data for {region_name} fits the gamma distribution well (p-value: {p_value:.4f}).")
    else:
        print(f"The gamma distribution may not be the best fit for {region_name} (p-value: {p_value:.4f}).")

# Perform KS test for each admin2_name
for region_name, group in zambia_rain_df.groupby('admin2_name'):
    print(f"Testing for {region_name}:")
    perform_ks_test(group['precipitation'].dropna(), region_name)
    print()  # Add a line break between region results


Testing for Chadiza:
The data for Chadiza fits the gamma distribution well (p-value: 0.1303).

Testing for Chama:
The gamma distribution may not be the best fit for Chama (p-value: 0.0021).

Testing for Chavuma:
The data for Chavuma fits the gamma distribution well (p-value: 0.7088).

Testing for Chibombo:
The data for Chibombo fits the gamma distribution well (p-value: 0.3589).

Testing for Chienge:
The data for Chienge fits the gamma distribution well (p-value: 0.9080).

Testing for Chililabombwe:
The data for Chililabombwe fits the gamma distribution well (p-value: 0.1908).

Testing for Chilubi:
The data for Chilubi fits the gamma distribution well (p-value: 0.1109).

Testing for Chingola:
The data for Chingola fits the gamma distribution well (p-value: 0.0867).

Testing for Chinsali:
The gamma distribution may not be the best fit for Chinsali (p-value: 0.0027).

Testing for Chipata:
The data for Chipata fits the gamma distribution well (p-value: 0.0550).

Testing for Choma:
...

In [None]:
# Function to perform Kolmogorov-Smirnov test for each admin2_name and determine if it passes or fails
def perform_ks_test_for_all(region, pass_or_fail="pass"):
    """
    Perform Kolmogorov-Smirnov test for each admin2_name within the selected region.

    Parameters:
    region (str): Name of the region.
    pass_or_fail (str): Specify whether to print areas that pass or fail the test ('pass' or 'fail').

    Returns:
    None: Prints the names of admin2 areas that pass or fail the KS test.
    """
    # Filter the dataset based on the provided region
    filtered_df = zambia_rain_df[zambia_rain_df['region'] == region]

    # Initialize lists to hold the names of areas that pass or fail the test
    passing_admin2_names = []
    failing_admin2_names = []

    # Loop through each admin2_name in the filtered DataFrame
    for admin2_name in filtered_df['admin2_name'].unique():
        # Get the precipitation data for the current admin2_name
        data = filtered_df[filtered_df['admin2_name'] == admin2_name]['precipitation'].dropna()

        # Check if there is enough data to perform the test
        if len(data) < 2:
            continue

        # Fit a gamma distribution to the data
        shape, loc, scale = gamma.fit(data, floc=0)  # floc=0 fixes the location at zero (two-parameter gamma)

        # Perform Kolmogorov-Smirnov test
        _, p_value = kstest(data, gamma(shape, loc, scale).cdf)

        # Determine if the admin2_name passes or fails based on the p-value
        if p_value > 0.05:
            passing_admin2_names.append(admin2_name)
        else:
            failing_admin2_names.append(admin2_name)

    # Print the results based on the pass_or_fail argument
    if pass_or_fail == "pass":
        if passing_admin2_names:
            print(f"Admin2 areas in {region} that pass the KS test:")
            for name in passing_admin2_names:
                print(name)
        else:
            print(f"No admin2 areas in {region} passed the KS test.")
    elif pass_or_fail == "fail":
        if failing_admin2_names:
            print(f"Admin2 areas in {region} that fail the KS test:")
            for name in failing_admin2_names:
                print(name)
        else:
            print(f"No admin2 areas in {region} failed the KS test.")
    else:
        print("Invalid argument for 'pass_or_fail'. Please use 'pass' or 'fail'.")

# Interactive widgets
region_dropdown = Dropdown(
    options=zambia_rain_df['region'].unique(),
    description='Select Region:',
    style={'description_width': 'initial'}
)

pass_or_fail_dropdown = Dropdown(
    options=['pass', 'fail'],
    description='Pass or Fail:',
    style={'description_width': 'initial'}
)

# Interactive function to display the results of the KS test for all admin2 areas
interact(
    perform_ks_test_for_all,
    region=region_dropdown,
    pass_or_fail=pass_or_fail_dropdown
)

interactive(children=(Dropdown(description='Select Region:', options=('Zambia',), style=DescriptionStyle(descr…

### Notes

Several administrative zones failed the Kolmogorov-Smirnov test.

### QQ Plots

A QQ plot, or Quantile-Quantile plot, is a graphical tool used to compare the distribution of a dataset with a theoretical distribution, such as the normal or gamma distribution. It provides a visual assessment of how well the dataset fits a chosen probability distribution by comparing the quantiles of the dataset with the quantiles of the specified theoretical distribution.

If the data follows the theoretical distribution closely, the points on the plot will align approximately along a straight line. Deviations from this line indicate that the data does not conform well to the expected distribution.

In [None]:
# Define an interactive function for generating QQ plots
def interactive_qq_plot(region, admin2_name):
    """
    Generates a QQ plot for the selected dataset using a gamma distribution.

    Parameters:
    region (str): The selected region.
    admin2_name (str): The selected Admin 2 area within the region.
    """
    # Filter the DataFrame based on the selected region and admin2_name
    filtered_df = zambia_rain_df[
        (zambia_rain_df['region'] == region) &
        (zambia_rain_df['admin2_name'] == admin2_name)
    ]

    if filtered_df.empty:
        print("No data available for the selected region and admin2_name.")
        return

    data = filtered_df['precipitation'].dropna()

    # Fit the gamma distribution to the data
    shape, loc, scale = gamma.fit(data, floc=0)

    # Generate QQ plot
    plt.figure(figsize=(8, 6))
    probplot(data, dist="gamma", sparams=(shape, loc, scale), plot=plt)
    plt.title(f"QQ Plot to Check Gamma Fit: {admin2_name} ({region})")
    plt.xlabel("Theoretical Quantiles")
    plt.ylabel("Ordered Values")
    plt.grid(True)
    plt.show()

# Define the interactive dropdown widgets
region_dropdown = Dropdown(
    options=zambia_rain_df['region'].unique(),
    description='Select Region:',
    style={'description_width': 'initial'}
)

admin2_name_dropdown = Dropdown(
    options=[],  # Initially empty, will be updated based on the region selected
    description='Select Admin2:',
    style={'description_width': 'initial'}
)

# Function to update admin2_name options based on the selected region
def update_admin2_dropdown(change):
    selected_region = change['new']
    if selected_region:
        admin2_options = zambia_rain_df[zambia_rain_df['region'] == selected_region]['admin2_name'].unique()
        admin2_name_dropdown.options = admin2_options
    else:
        admin2_name_dropdown.options = []

# Attach an observer to the region dropdown to update the admin2_name dropdown
region_dropdown.observe(update_admin2_dropdown, names='value')

# Set an initial value to trigger the observer and populate admin2_name dropdown
update_admin2_dropdown({'new': region_dropdown.value})

# Use `interact` to create an interactive QQ plot based on the selected region and admin2_name
interact(interactive_qq_plot, region=region_dropdown, admin2_name=admin2_name_dropdown)



interactive(children=(Dropdown(description='Select Region:', options=('Zambia',), style=DescriptionStyle(descr…

### Analysis:

## Calculate SPI

In [None]:
def calculate_spi(precip_series, scale=1):
    """
    Calculate Standardized Precipitation Index (SPI) for a given precipitation series.

    Parameters:
    precip_series (pd.Series): Precipitation data (assumed monthly in this case).
    scale (int): Time scale over which to calculate SPI (e.g., 1-month, 3-month).

    Returns:
    pd.Series: SPI values for the given precipitation series.
    """
    # Calculate rolling sum over the specified time scale
    precip_rolling = precip_series.rolling(window=scale).sum()

    # Drop NaN values resulting from the rolling operation
    precip_rolling = precip_rolling.dropna()

    # Fit a gamma distribution to the rolling sum data
    shape, loc, scale_param = gamma.fit(precip_rolling, floc=0)

    # Calculate cumulative distribution function (CDF) for gamma distribution
    gamma_cdf = gamma.cdf(precip_rolling, shape, loc=loc, scale=scale_param)

    # Convert gamma CDF to standard normal distribution to calculate SPI
    spi_values = norm.ppf(gamma_cdf)

    # Return the SPI values as a pandas Series with the same index
    return pd.Series(spi_values, index=precip_rolling.index)

# Initialize an empty DataFrame to store the SPI results across regions
spi_results_df = pd.DataFrame()

# Group by 'admin2_name' and process each region separately
admin2_names = zambia_rain_df['admin2_name'].unique()

for admin2 in admin2_names:
    # Clean the admin2_name to avoid spaces and special characters
    admin2_cleaned = admin2.replace(" ", "_").replace("-", "_")

    # Filter data for the current admin2_name and remove duplicate dates by aggregating the precipitation values
    region_data = zambia_rain_df[zambia_rain_df['admin2_name'] == admin2][['date', 'precipitation']]
    region_data.set_index('date', inplace=True)
    region_data = region_data.groupby(region_data.index).mean()  # Aggregate by mean to remove duplicate indices

    # Calculate 1-month SPI
    spi_1_month = calculate_spi(region_data['precipitation'], scale=1)
    spi_1_month.name = f'{admin2_cleaned}_1_month'

    # Calculate 3-month SPI
    spi_3_month = calculate_spi(region_data['precipitation'], scale=3)
    spi_3_month.name = f'{admin2_cleaned}_3_month'

    # Combine the 1-month and 3-month SPI results into a single DataFrame
    region_spi_df = pd.concat([spi_1_month, spi_3_month], axis=1)

    # Concatenate the SPI results for the region horizontally (across columns)
    spi_results_df = pd.concat([spi_results_df, region_spi_df], axis=1)

# Display the SPI results
print("SPI Results for all regions (admin2_name):")
print(spi_results_df.head())


SPI Results for all regions (admin2_name):
                     Chibombo_1_month  Chibombo_3_month  Kabwe_1_month  \
date                                                                     
2000-01-01T00:00:00          0.707354               NaN       0.712670   
2000-02-01T00:00:00          0.939230               NaN       0.870516   
2000-03-01T00:00:00          0.499873          1.107143       0.508235   
2000-11-01T00:00:00         -0.423085          0.505439      -0.185352   
2000-12-01T00:00:00          0.563240          0.239539       0.574574   

                     Kabwe_3_month  Kapiri_Mposhi_1_month  \
date                                                        
2000-01-01T00:00:00            NaN               0.648742   
2000-02-01T00:00:00            NaN               0.748236   
2000-03-01T00:00:00       1.077942               0.599916   
2000-11-01T00:00:00       0.582342              -0.465606   
2000-12-01T00:00:00       0.382142               0.956502   

          

In [None]:
spi_results_df.head(10)

date,Chibombo_1_month,Chibombo_3_month,Kabwe_1_month,Kabwe_3_month,Kapiri_Mposhi_1_month,Kapiri_Mposhi_3_month,Mkushi_1_month,Mkushi_3_month,Mumbwa_1_month,Mumbwa_3_month,...,Zambezi_1_month,Zambezi_3_month,Kalabo_1_month,Kalabo_3_month,Lukulu_1_month,Lukulu_3_month,Senanga_1_month,Senanga_3_month,Shang'ombo_1_month,Shang'ombo_3_month
2000-01-01T00:00:00,0.707354,,0.71267,,0.648742,,0.797834,,0.663515,,...,-0.273487,,0.326802,,0.279895,,0.407395,,0.755311,
2000-02-01T00:00:00,0.93923,,0.870516,,0.748236,,0.979661,,0.842098,,...,-0.196266,,0.255304,,0.130457,,0.324206,,0.512878,
2000-03-01T00:00:00,0.499873,1.107143,0.508235,1.077942,0.599916,1.0157,0.812508,1.408205,0.680558,1.070248,...,0.699022,0.022854,1.12269,0.774047,0.866132,0.598993,1.004829,0.764305,0.924211,0.966809
2000-11-01T00:00:00,-0.423085,0.505439,-0.185352,0.582342,-0.465606,0.427519,-0.56785,0.691621,-0.85624,0.35972,...,-1.752704,-0.694527,-1.860482,-0.037682,-1.603831,-0.269462,-1.532048,0.029298,-1.7198,0.056462
2000-12-01T00:00:00,0.56324,0.239539,0.574574,0.382142,0.956502,0.569185,0.469456,0.326851,0.950942,0.432463,...,0.484227,-0.258185,0.681026,0.205023,0.517602,-0.031562,0.577456,0.173978,0.76591,0.207086
2001-01-01T00:00:00,0.755741,0.414939,0.706458,0.512425,1.098556,0.895536,0.851652,0.356751,0.874133,0.555377,...,0.672842,-0.276453,-0.286857,-0.635774,0.714005,-0.134353,-0.46936,-0.687532,-1.035348,-0.816733
2001-02-01T00:00:00,1.639217,1.656409,1.510206,1.541609,1.460106,1.928723,1.665778,1.733703,1.648308,1.825246,...,0.752863,0.992213,1.087309,0.695484,0.910058,1.075904,1.26459,0.675594,1.401326,0.682975
2001-03-01T00:00:00,0.126956,1.437211,0.451796,1.476004,0.694377,1.78493,0.839671,1.939491,0.21746,1.462395,...,-0.114893,0.659552,0.327355,0.510354,-0.142601,0.735719,0.191384,0.480017,0.259647,0.420041
2001-11-01T00:00:00,-0.196644,0.939809,-0.428806,0.909803,-0.465387,0.986569,-0.690858,1.207422,-0.082642,0.983274,...,-0.223435,0.132519,-0.174129,0.55841,-0.109822,0.275778,-0.178931,0.596036,-0.087102,0.746987
2001-12-01T00:00:00,0.543203,0.102667,0.488761,0.167426,0.359151,0.239544,0.378282,0.232126,0.293624,0.08019,...,-0.062657,-0.390306,-0.677608,-0.376967,-0.392929,-0.509886,-0.225913,-0.264354,-0.166762,-0.1472


In [None]:
# Interactive plotting function
def plot_spi(region_time_scale):
    # Extract the SPI series based on the selected region and time scale
    spi_series = spi_results_df[region_time_scale].dropna()

    # Plot the SPI values
    plt.figure(figsize=(10, 6))
    plt.plot(spi_series.index, spi_series, color='blue', linestyle='-', marker='o', label='SPI')
    plt.axhline(y=0, color='black', linestyle='--')  # Reference line at SPI=0
    # Labels follow the document's convention that SPI < -1.0 marks moderate drought
    plt.axhline(y=-1, color='red', linestyle='--', label='Moderate Drought (SPI=-1)')
    plt.axhline(y=-1.5, color='orange', linestyle='--', label='Severe Drought (SPI=-1.5)')
    plt.axhline(y=-2, color='darkred', linestyle='--', label='Extreme Drought (SPI=-2)')

    plt.xlabel('Date')
    plt.ylabel('SPI Value')
    plt.title(f'Standardized Precipitation Index (SPI) for {region_time_scale}')
    plt.legend()
    plt.grid(True)
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

# Get the list of region_time_scale options from the DataFrame columns
region_time_scale_options = spi_results_df.columns.tolist()

# Create a dropdown widget for selecting the region and time scale
interact(
    plot_spi,
    region_time_scale=Dropdown(
        options=region_time_scale_options,
        description='Select Region & Scale:',
        style={'description_width': 'initial'},
    )
)


## Identify Drought Events

To characterize droughts, we use the SPI values calculated above to identify distinct drought events. An event begins when SPI falls below a threshold (here -1.0) and ends when SPI rises back to or above it. For each event we then derive its duration, severity, and intensity.

Definitions:

- Drought Event: A period during which SPI is below a defined threshold (e.g., SPI < -1.0 for moderate drought).
- Drought Duration: The number of months during which the SPI value stays below the threshold.
- Drought Severity: The sum of all SPI values during the drought period.
- Drought Intensity: Severity divided by duration, indicating how intense the drought is on average.
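
As a worked example of these definitions, the sketch below groups consecutive below-threshold SPI values into events and counts observations to get duration. It is a minimal illustration with made-up SPI values and a hypothetical helper name (`summarize_events`), not the notebook's function, which derives duration from the start and end dates instead.

```python
def summarize_events(spi_values, threshold=-1.0):
    """Group consecutive below-threshold SPI values into drought events.

    Duration is the number of observations in the run, severity is their
    sum, and intensity is severity divided by duration.
    """
    runs = []
    current = []
    for value in spi_values:
        if value < threshold:
            current.append(value)      # still (or newly) in drought
        elif current:
            runs.append(current)       # SPI recovered: close the event
            current = []
    if current:
        runs.append(current)           # event still open at the end of the series

    summaries = []
    for run in runs:
        duration = len(run)
        severity = sum(run)
        summaries.append({
            'duration': duration,
            'severity': severity,
            'intensity': severity / duration,
        })
    return summaries


# Illustrative values only (not taken from the Zambia data)
example_spi = [0.4, -1.2, -1.6, -0.3, 0.8, -1.1, 0.2]
example_events = summarize_events(example_spi)
for event in example_events:
    print(event)
```

With these values the first event lasts two months (severity about -2.8, intensity about -1.4) and the second lasts one month, so severity and intensity coincide for it.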

In [None]:
def characterize_drought_events(spi_results_df, spi_threshold=-1.0):
    """
    Characterize drought events based on SPI values for each region (admin2_name) and time scale.

    Parameters:
    spi_results_df (pd.DataFrame): Input DataFrame with SPI values for different regions and time scales.
    spi_threshold (float): Threshold for defining a drought event (default is -1.0 for moderate drought).

    Returns:
    pd.DataFrame: Output DataFrame containing drought event details for each region and time scale.
    """
    drought_events = []

    # Loop through the columns of the SPI DataFrame
    for column in spi_results_df.columns:
        admin2_name, time_scale = column.rsplit('_', 1)  # Split column name to get admin2_name and time scale
        spi_series = spi_results_df[column].dropna()  # Drop NaN values

        # Sort the SPI series by the index (date)
        spi_series = spi_series.sort_index()

        in_drought = False
        start_date = None
        severity = 0
        drought_event_count = 0

        # Loop through each date and SPI value to identify drought events
        for date, spi_value in spi_series.items():
            if spi_value < spi_threshold:
                if not in_drought:
                    # Start of a new drought event
                    in_drought = True
                    start_date = date
                    severity = spi_value
                    drought_event_count += 1
                else:
                    # Accumulate severity if already in a drought
                    severity += spi_value
            else:
                if in_drought:
                    # End of the drought event
                    in_drought = False
                    end_date = date
                    duration = (pd.to_datetime(end_date) - pd.to_datetime(start_date)).days // 30  # Duration in months
                    intensity = severity / duration if duration > 0 else 0

                    # Append the drought event details
                    drought_events.append({
                        'admin2_name': admin2_name,
                        'time_scale': time_scale,
                        'Drought Event': drought_event_count,
                        'Drought Duration (months)': duration,
                        'Drought Severity': severity,
                        'Drought Intensity': intensity,
                        'Drought Start': start_date,
                        'Drought End': end_date
                    })

        # Handle ongoing drought at the end of the series
        if in_drought:
            end_date = spi_series.index[-1]
            duration = (pd.to_datetime(end_date) - pd.to_datetime(start_date)).days // 30
            intensity = severity / duration if duration > 0 else 0

            drought_events.append({
                'admin2_name': admin2_name,
                'time_scale': time_scale,
                'Drought Event': drought_event_count,
                'Drought Duration (months)': duration,
                'Drought Severity': severity,
                'Drought Intensity': intensity,
                'Drought Start': start_date,
                'Drought End': end_date
            })

    # Convert the list of drought events to a DataFrame
    drought_events_df = pd.DataFrame(drought_events)

    return drought_events_df

# Apply the function to the spi_results_df
drought_events_df = characterize_drought_events(spi_results_df)
drought_events_df


,admin2_name,time_scale,Drought Event,Drought Duration (months),Drought Severity,Drought Intensity,Drought Start,Drought End
0,Chibombo_1,month,1,9,-3.268724,-0.363192,2002-02-01T00:00:00,2002-11-01T00:00:00
1,Chibombo_1,month,2,1,-1.160000,-1.160000,2004-11-01T00:00:00,2004-12-01T00:00:00
2,Chibombo_1,month,3,9,-3.019505,-0.335501,2005-02-01T00:00:00,2005-11-01T00:00:00
3,Chibombo_1,month,4,8,-1.432502,-0.179063,2007-03-01T00:00:00,2007-11-01T00:00:00
4,Chibombo_1,month,5,0,-1.045599,0.000000,2009-02-01T00:00:00,2009-03-01T00:00:00
...,...,...,...,...,...,...,...,...
1901,Shang'ombo_3,month,4,1,-1.188577,-1.188577,2013-11-01T00:00:00,2013-12-01T00:00:00
1902,Shang'ombo_3,month,5,10,-5.190560,-0.519056,2015-03-01T00:00:00,2016-01-01T00:00:00
1903,Shang'ombo_3,month,6,12,-9.957015,-0.829751,2019-01-01T00:00:00,2020-01-01T00:00:00
1904,Shang'ombo_3,month,7,2,-3.406525,-1.703263,2021-11-01T00:00:00,2022-01-01T00:00:00


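Note - Two artifacts of `characterize_drought_events` are visible in the table above. First, `column.rsplit('_', 1)` splits only on the last underscore, so `Chibombo_1_month` yields `admin2_name = 'Chibombo_1'` and `time_scale = 'month'`; the time scale ends up inside `admin2_name`. Second, the reported duration is not a count of below-threshold months. It is the calendar gap between the first below-threshold date and the first later date at or above the threshold, divided by 30 days. The SPI series shown earlier has no rows for April through October, so an event that starts in February and ends at the next above-threshold observation in November is reported as 9 months, while the event from 2009-02-01 to 2009-03-01 (28 days) is reported as 0 months with an intensity of 0.

Apart from these artifacts, long events are expected behavior. The function keeps an event open until SPI rises to or above the threshold, so any unbroken run of below-threshold values, even one that spans more than a season, is counted as a single drought event. The SPI time scale (1-month or 3-month) sets the accumulation window for precipitation; it does not cap how long a drought can last.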
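
The arithmetic behind the durations and names in the table can be reproduced in isolation. The sketch below mirrors the date-difference calculation used in the cell above and shows one possible way to split column names, assuming every column follows the `<admin2>_<n>_month` pattern. Both helper names are hypothetical.

```python
from datetime import datetime


def months_by_date_difference(start, end):
    """Mirror the notebook's arithmetic: whole 30-day blocks between two dates."""
    return (end - start).days // 30


def split_column(column):
    """Split '<admin2>_<n>_month' into (admin2, '<n>_month').

    Assumes the column ends with '_<n>_month'; underscores inside the
    admin2 name are preserved because only the last two are split on.
    """
    admin2, n, unit = column.rsplit('_', 2)
    return admin2, f'{n}_{unit}'


# February 2009 has 28 days, so a one-month event rounds down to 0
short_event = months_by_date_difference(datetime(2009, 2, 1), datetime(2009, 3, 1))

# February to November 2002 is 273 days, reported as 9 months even if
# only the first one or two observations were below the threshold
long_event = months_by_date_difference(datetime(2002, 2, 1), datetime(2002, 11, 1))

print(short_event, long_event)
print('Kapiri_Mposhi_3_month'.rsplit('_', 1))   # what the cell above does
print(split_column('Kapiri_Mposhi_3_month'))    # admin2 name and time scale separated
```

Counting below-threshold observations, rather than differencing dates, would keep duration consistent with the definition given earlier, at the cost of changing the reported values.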

### Distribution Analysis

In [None]:
# Interactive plotting function
def plot_interactive_histogram(admin2_name, variable, df=drought_events_df):
    """
    Plots an interactive histogram or density plot for a selected drought characteristic.

    Parameters:
    - admin2_name (str): Selected admin2_name (administrative unit).
    - variable (str): The drought characteristic to plot (e.g., 'Drought Duration (months)', 'Drought Severity', 'Drought Intensity').
    - df (pd.DataFrame): The DataFrame containing drought characteristics data.
    """
    # Filter data based on user selections
    filtered_df = df[
        #(df['region'] == region) &   # Commented out region filter
        (df['admin2_name'] == admin2_name)
        #& (df['season'] == season)   # Commented out season filter
    ]

    # Check if filtered data is not empty
    if filtered_df.empty:
        print("No data available for the selected filters.")
        return

    # Plot the histogram
    plt.figure(figsize=(10, 6))
    sns.histplot(filtered_df[variable], bins=20, kde=True, color='skyblue', edgecolor='black')
    plt.title(f"Distribution of {variable} for {admin2_name}")  # Updated title without region and season
    plt.xlabel(variable)
    plt.ylabel('Frequency')
    plt.grid(True)
    plt.tight_layout()
    plt.show()

# Dropdown options
# region_options = drought_events_df['region'].unique().tolist()  # Commented out region options
admin2_name_options = drought_events_df['admin2_name'].unique().tolist()
# season_options = drought_events_df['season'].unique().tolist()  # Commented out season options
variable_options = ['Drought Duration (months)', 'Drought Severity', 'Drought Intensity']

# Create dropdown widgets
# region_dropdown = Dropdown(options=region_options, description='Region:', style={'description_width': 'initial'})  # Commented out region dropdown
admin2_name_dropdown = Dropdown(options=admin2_name_options, description='Admin2 Name:', style={'description_width': 'initial'})
# season_dropdown = Dropdown(options=season_options, description='Season:', style={'description_width': 'initial'})  # Commented out season dropdown
variable_dropdown = Dropdown(options=variable_options, description='Variable:', style={'description_width': 'initial'})

# Function to update the Admin2 dropdown based on the selected region
# def update_admin2_options(*args):  # Commented out region dependency function
#     selected_region = region_dropdown.value
#     filtered_admin2_options = drought_events_df[drought_events_df['region'] == selected_region]['admin2_name'].unique().tolist()
#     admin2_name_dropdown.options = filtered_admin2_options

# Attach the update function to the 'region' dropdown
# region_dropdown.observe(update_admin2_options, names='value')

# Create interactive dropdown widgets for selecting admin2_name and variable
interact(
    plot_interactive_histogram,
    # region=region_dropdown,  # Commented out region dropdown
    admin2_name=admin2_name_dropdown,
    # season=season_dropdown,  # Commented out season dropdown
    variable=variable_dropdown,
    df=fixed(drought_events_df)  # Pass the DataFrame as a fixed value
)



### Drought Event Frequency

## Trend Analysis

The Mann-Kendall test is a non-parametric test for monotonic trends in time series data. It is widely used for climate data such as SPI to determine whether drought conditions show a statistically significant increasing or decreasing trend over time. Because lower SPI values mean drier conditions, a decreasing SPI trend indicates drying. We also use Sen's slope estimator to quantify the magnitude of the trends identified.
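
To make the two statistics concrete, here is a minimal pure-Python sketch of the Mann-Kendall S statistic, its normal approximation, and Sen's slope. It assumes an evenly spaced series with no tied values and no serial correlation, and it is meant only as an illustration; the notebook itself relies on `mk.original_test`. The function name and the sample values are made up for this example.

```python
import math
from itertools import combinations
from statistics import NormalDist, median


def mann_kendall(values):
    """Return (S, Z, two-sided p-value, Sen's slope) for an evenly spaced series.

    Simplified: the variance formula ignores ties and assumes the
    observations are serially independent.
    """
    n = len(values)
    pairs = list(combinations(range(n), 2))  # all index pairs (i, j) with i < j

    # S adds +1 for every later value that is larger, -1 for every smaller one
    s = sum((values[j] > values[i]) - (values[j] < values[i]) for i, j in pairs)

    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)   # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Sen's slope: median of all pairwise slopes, in units per time step
    sen_slope = median((values[j] - values[i]) / (j - i) for i, j in pairs)
    return s, z, p_value, sen_slope


# Illustrative, mostly decreasing series (not taken from the Zambia data)
sample = [0.9, 0.5, 0.6, 0.1, -0.2, -0.4, -0.9, -1.3]
s_stat, z_stat, p_val, slope = mann_kendall(sample)
print(s_stat, round(z_stat, 3), round(p_val, 4), round(slope, 3))
```

A negative S with a small p-value and a negative Sen's slope corresponds to what the results label a "decreasing" trend.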

In [None]:
def perform_trend_analysis(spi_series, time_scale):
    """
    Perform Mann-Kendall test and Sen's Slope Estimation for the given SPI series.

    Parameters:
    spi_series (pd.Series): Series of SPI values.
    time_scale (str): Time scale (e.g., '1_month', '3_month').

    Returns:
    dict: Dictionary containing trend, p-value, and Sen's slope.
    """
    # Drop NaN values
    spi_series = spi_series.dropna()

    # Perform Mann-Kendall Trend Test
    mk_result = mk.original_test(spi_series)

    # Extract trend, p-value, and Sen's slope
    trend = mk_result.trend
    p_value = mk_result.p
    sen_slope = mk_result.slope

    return {
        'time_scale': time_scale,
        'trend': trend,
        'p_value': p_value,
        'sen_slope': sen_slope
    }


In [None]:
# Applying the Mann-Kendall Test and Sen's Slope Estimation
trend_analysis_results = {}

# Loop through the columns of spi_results_df and apply trend analysis
for column in spi_results_df.columns:
    # Extract the time scale (e.g., '1_month', '3_month') from the last two underscore-separated parts
    time_scale = '_'.join(column.rsplit('_', 2)[1:])
    spi_series = spi_results_df[column]

    # Perform trend analysis for each column (region and time scale)
    trend_analysis_results[column] = perform_trend_analysis(spi_series, time_scale)

# Convert results to DataFrame for better visualization
trend_analysis_df = pd.DataFrame(trend_analysis_results).T

# Print out the trends
print("Trend Analysis Results (Increasing or Decreasing):")

increasing_trends = []
decreasing_trends = []
no_trend = []

# Loop through the trend results and categorize them
for region, row in trend_analysis_df.iterrows():
    trend = row['trend']

    if trend == 'increasing':
        increasing_trends.append(region)
    elif trend == 'decreasing':
        decreasing_trends.append(region)
    else:
        no_trend.append(region)

# Print the results
if increasing_trends:
    print("Regions with increasing trends:")
    for region in increasing_trends:
        print(region)

if decreasing_trends:
    print("\nRegions with decreasing trends:")
    for region in decreasing_trends:
        print(region)

if no_trend:
    print("\nRegions with no significant trend:")
    for region in no_trend:
        print(region)

Trend Analysis Results (Increasing or Decreasing):
Regions with increasing trends:
Kaputa_3_month

Regions with decreasing trends:
Chililabombwe_3_month
Chingola_3_month
Kalulushi_3_month
Kitwe_3_month
Luanshya_3_month
Lufwanyama_3_month
Masaiti_3_month
Mufulira_3_month
Ndola_3_month
Kawambwa_3_month
Mansa_3_month
Milenge_3_month
Mwense_3_month
Samfya_3_month
Kabompo_3_month
Solwezi_3_month
Luwingu_3_month
Itezhi_tezhi_3_month
Kazungula_3_month
Kaoma_3_month
Mongu_3_month
Lukulu_3_month

Regions with no significant trend:
Chibombo_1_month
Chibombo_3_month
Kabwe_1_month
Kabwe_3_month
Kapiri_Mposhi_1_month
Kapiri_Mposhi_3_month
Mkushi_1_month
Mkushi_3_month
Mumbwa_1_month
Mumbwa_3_month
Serenje_1_month
Serenje_3_month
Chililabombwe_1_month
Chingola_1_month
Kalulushi_1_month
Kitwe_1_month
Luanshya_1_month
Lufwanyama_1_month
Masaiti_1_month
Mpongwe_1_month
Mpongwe_3_month
Mufulira_1_month
Ndola_1_month
Chadiza_1_month
Chadiza_3_month
Chama_1_month
Chama_3_month
Chipata_1_month
Chipata_3_mo

### Notes: