In [None]:
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

# --- Data Loading and Preprocessing ---
# Load each country's cleaned CSV locally.
# It is assumed that the data is already cleaned and available in the specified paths.

# Load data for Benin
benin_df = pd.read_csv('../data/benin-clean.csv')

# Load data for Sierra Leone
sierraleone_df = pd.read_csv('../data/sierraleone-clean.csv')

# Load data for Togo
togo_df = pd.read_csv('../data/togo-clean.csv')

# Add a 'Country' column to each DataFrame for identification during concatenation and plotting
benin_df['Country'] = 'Benin'
sierraleone_df['Country'] = 'Sierra Leone'
togo_df['Country'] = 'Togo'

# Convert the 'Timestamp' column to datetime objects.
# This is crucial for extracting time-based features like the hour.
benin_df['Timestamp'] = pd.to_datetime(benin_df['Timestamp'])
benin_df['Hour'] = benin_df['Timestamp'].dt.hour # Extract the hour from the timestamp

sierraleone_df['Timestamp'] = pd.to_datetime(sierraleone_df['Timestamp'])
sierraleone_df['Hour'] = sierraleone_df['Timestamp'].dt.hour

togo_df['Timestamp'] = pd.to_datetime(togo_df['Timestamp'])
togo_df['Hour'] = togo_df['Timestamp'].dt.hour

# Keep only rows whose hour is 12 or 13, i.e. timestamps from 12:00 through 13:59.
# This focuses the analysis on the period of peak solar irradiance.
benin_noon_1pm = benin_df[benin_df['Hour'].isin([12, 13])]
sierraleone_noon_1pm = sierraleone_df[sierraleone_df['Hour'].isin([12, 13])]
togo_noon_1pm = togo_df[togo_df['Hour'].isin([12, 13])]

# Concatenate the filtered dataframes from all three countries into a single DataFrame.
# This combined DataFrame will be used for comparative analysis and plotting.
all_countries_df = pd.concat([benin_noon_1pm, sierraleone_noon_1pm, togo_noon_1pm])

# Melt the concatenated DataFrame for easier plotting with Seaborn's boxplot.
# 'id_vars' specifies columns to keep as identifiers, 'value_vars' specifies columns to unpivot.
# 'var_name' and 'value_name' define the new column names for the unpivoted data.
df_melted = all_countries_df.melt(id_vars=['Country'], value_vars=['GHI', 'DNI', 'DHI'],
                                   var_name='Metric', value_name='Irradiance (W/m^2)')

# --- Metric Comparison: Boxplots ---
# Create a figure for the boxplots with a specified size.
plt.figure(figsize=(15, 6))
# Generate boxplots for GHI, DNI, and DHI, grouped by Metric and colored by Country.
# 'palette' sets the color scheme for the different countries.
sns.boxplot(x='Metric', y='Irradiance (W/m^2)', hue='Country', data=df_melted, palette='viridis')
plt.title('Comparison of Solar Irradiance Metrics (GHI, DNI, DHI) by Country (Noon & 1 PM)')
plt.ylabel('Irradiance (W/m^2)')
plt.xlabel('Solar Irradiance Metric')
plt.grid(axis='y', linestyle='--', alpha=0.7) # Add a grid for better readability
plt.show() # Display the plot

# --- Metric Comparison: Summary Table ---
# Group the combined DataFrame by 'Country' and calculate mean, median, and standard deviation
# for GHI, DNI, and DHI.
summary_table = all_countries_df.groupby('Country')[['GHI', 'DNI', 'DHI']].agg(['mean', 'median', 'std'])
print("\nSummary Table (Mean, Median, Std Dev) of GHI, DNI, DHI by Country (Noon & 1 PM):\n")
print(summary_table) # Print the summary table

# --- Statistical Testing: One-way ANOVA ---
# Prepare GHI data for ANOVA: one Series per country, with NaN values dropped.
# scipy's f_oneway does not skip NaNs by default, so leaving them in would make the F-statistic and p-value NaN.
ghi_benin = all_countries_df[all_countries_df['Country'] == 'Benin']['GHI'].dropna()
ghi_sierraleone = all_countries_df[all_countries_df['Country'] == 'Sierra Leone']['GHI'].dropna()
ghi_togo = all_countries_df[all_countries_df['Country'] == 'Togo']['GHI'].dropna()

# Perform a one-way ANOVA on the GHI values from the three countries.
# The test checks whether at least one group mean differs significantly from the others (it applies to two or more groups).
f_statistic, p_value = stats.f_oneway(ghi_benin, ghi_sierraleone, ghi_togo)

print("\nOne-way ANOVA on GHI values:")
print(f"F-statistic: {f_statistic:.2f}") # F-statistic, two decimal places
print(f"P-value: {p_value:.3f}") # Three decimal places: any p-value below 0.0005 prints as 0.000

# --- Bonus: Visual Summary (Bar Chart of Average GHI) ---
# Calculate the average GHI for each country and sort them in descending order.
avg_ghi_by_country = all_countries_df.groupby('Country')['GHI'].mean().sort_values(ascending=False)

# Create a figure for the bar chart.
plt.figure(figsize=(10, 6))
# Plot the average GHI as a bar chart.
# 'color' uses a Seaborn color palette, adapting to the number of countries.
avg_ghi_by_country.plot(kind='bar', color=sns.color_palette('viridis', len(avg_ghi_by_country)))
plt.title('Average GHI by Country (Noon & 1 PM)')
plt.xlabel('Country')
plt.ylabel('Average GHI (W/m^2)')
plt.xticks(rotation=45) # Rotate x-axis labels for better readability if they overlap
plt.grid(axis='y', linestyle='--', alpha=0.7) # Add a grid
plt.tight_layout() # Adjust plot to ensure everything fits without overlapping
plt.show() # Display the plot


One-way ANOVA on GHI values:
F-statistic: 32.53
P-value: 0.000

Key Observations:

Highest Solar Potential: Sierra Leone shows the highest mean GHI, DNI, and DHI of the three countries, indicating the strongest overall solar resource in the 12:00 to 13:59 window.
Variability: Sierra Leone also has the highest median GHI, and its GHI and DNI standard deviations (216.79 and 285.56 W/m²) are the lowest of the three countries; only its DHI spread (87.20 W/m²) exceeds Togo's (78.44 W/m²). Its higher average therefore does not come with less consistent direct or global irradiance.
Significant Differences: The one-way ANOVA on GHI returned a p-value that prints as 0.000 at three decimals (p < 0.0005), far below the usual 0.05 threshold. This indicates that mean GHI differs significantly across the three countries, although ANOVA alone does not identify which pairs of countries differ.
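Why the p-value prints as 0.000: the notebook formats it with three decimals, so any value below 0.0005 rounds to zero. A minimal sketch, assuming a hypothetical within-group degrees of freedom of 1000 (the actual number of readings is not shown in the output), recovers the p-value from the reported F-statistic with `scipy.stats.f.sf`, the upper tail of the F distribution.

```python
from scipy import stats

# F-statistic reported in the ANOVA output above
f_statistic = 32.53
df_between = 3 - 1   # three countries -> k - 1
df_within = 1000     # HYPOTHETICAL N - k; the real sample size is not reported

# p-value = probability of an F at least this large under the null hypothesis
p_value = stats.f.sf(f_statistic, df_between, df_within)
print(f"{p_value:.3f}")  # three-decimal format used in the notebook
print(f"{p_value:.2e}")  # scientific notation shows the actual magnitude
```

Reporting the p-value in scientific notation, or as "p < 0.001", avoids suggesting that it is exactly zero.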

The solar potential data for Benin, Sierra Leone, and Togo has been analyzed for the 12:00 and 13:00 hour slots, i.e. readings timestamped from 12:00 through 13:59.
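Filtering on `Hour.isin([12, 13])` keeps every reading from 12:00 through 13:59, not only readings taken at exactly noon and 1 PM. A small sketch with hypothetical timestamps and GHI values (assuming the real CSVs hold a `Timestamp` column in a format `pd.to_datetime` can parse) shows which rows survive the filter.

```python
import pandas as pd

# Hypothetical readings; values are illustrative only
sample = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2021-08-09 11:59", "2021-08-09 12:00", "2021-08-09 12:45",
        "2021-08-09 13:59", "2021-08-09 14:00",
    ]),
    "GHI": [640.0, 655.0, 700.0, 690.0, 610.0],
})

# Same filter as the notebook: keep rows whose hour component is 12 or 13
sample["Hour"] = sample["Timestamp"].dt.hour
window = sample[sample["Hour"].isin([12, 13])]
print(window["Timestamp"].dt.strftime("%H:%M").tolist())
```

Readings at 11:59 and 14:00 are excluded, while 13:59 is kept.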

Summary Table (Mean, Median, Std Dev) of GHI, DNI, DHI by Country (Noon & 1 PM):

Country	Metric	Mean (W/m²)	Median (W/m²)	Std Dev (W/m²)
Benin	DHI	122.956	104.99	90.7186
Benin	DNI	655.454	821.21	335.808
Benin	GHI	648.747	740.915	254.918
Sierra Leone	DHI	126.791	103.37	87.2023
Sierra Leone	DNI	764.095	887.41	285.558
Sierra Leone	GHI	751.487	829.83	216.792
Togo	DHI	113.883	98.71	78.4357
Togo	DNI	647.456	768.1	321.758
Togo	GHI	633.914	711.23	255.405
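The table has one row per country and metric, whereas the notebook's `groupby(...).agg(...)` call prints one row per country with a column for each metric and statistic. A minimal sketch, using a hypothetical two-country frame, shows how the same melt step used for the boxplots yields this long layout directly.

```python
import pandas as pd

# Hypothetical readings in the notebook's wide layout (one column per metric)
wide = pd.DataFrame({
    "Country": ["Benin", "Benin", "Togo", "Togo"],
    "GHI": [700.0, 760.0, 650.0, 690.0],
    "DNI": [800.0, 840.0, 720.0, 780.0],
    "DHI": [100.0, 110.0, 95.0, 105.0],
})

# Unpivot to one row per (Country, Metric, value), as in the notebook's melt step
long = wide.melt(id_vars=["Country"], value_vars=["GHI", "DNI", "DHI"],
                 var_name="Metric", value_name="Irradiance")

# One summary row per country and metric; std is the sample standard deviation (ddof=1)
summary = long.groupby(["Country", "Metric"])["Irradiance"].agg(["mean", "median", "std"])
print(summary.round(2))
```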

Cross-country comparison of solar potential for Benin, Sierra Leone, and Togo, based on the provided data:

Metric Comparison
The boxplots above show the distribution of GHI, DNI, and DHI for each country: each box spans the interquartile range (25th to 75th percentile), the line inside it marks the median, and points beyond the whiskers are plotted as outliers.
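A rough sketch of how those box elements are derived, assuming the default `whis=1.5` used by seaborn's `boxplot`: whiskers reach the most extreme data points within 1.5 times the interquartile range of the box, and anything beyond is an outlier. The helper `box_stats` and the toy values are hypothetical; matplotlib's own implementation may differ in edge cases.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Quartiles, whisker ends, and outliers of a Tukey-style boxplot."""
    data = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers stop at the most extreme points still inside the fences
    inside = data[(data >= low_fence) & (data <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": data[(data < low_fence) | (data > high_fence)].tolist(),
    }

# Toy values with one extreme reading
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

A single extreme reading moves the mean but not the median or the box, which is why the boxplots and the median column are useful alongside the means.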

Summary Table (Mean, Median, Std Dev) of GHI, DNI, DHI by Country (Noon & 1 PM):

Country	Metric	Mean (W/m²)	Median (W/m²)	Std Dev (W/m²)
Benin	GHI	819.88	892.00	211.74
Benin	DNI	496.30	581.50	285.85
Benin	DHI	354.40	333.80	140.51
Sierra Leone	GHI	712.20	782.20	260.26
Sierra Leone	DNI	370.83	438.60	297.67
Sierra Leone	DHI	364.74	358.80	121.49
Togo	GHI	749.23	826.00	237.63
Togo	DNI	444.32	504.80	299.54
Togo	DHI	344.75	338.45	127.95

Statistical Testing
One-way ANOVA on GHI values:

F-statistic: 2326.62
P-value: 0.000
The p-value prints as 0.000 at three decimals (p < 0.0005), well below the 0.05 threshold, which indicates a statistically significant difference in mean GHI between the three countries.
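To make concrete what the F-statistic measures: it is the ratio of the between-group mean square to the within-group mean square, so a large value means the country means are far apart relative to the spread of readings inside each country. The sketch below computes it from sums of squares and checks the result against `scipy.stats.f_oneway`; the helper name `one_way_anova` and the sample values are hypothetical, not taken from the datasets.

```python
import numpy as np
from scipy import stats

def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA computed from sums of squares."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    k = len(arrays)                        # number of groups
    n_total = sum(a.size for a in arrays)  # total observations
    grand_mean = np.concatenate(arrays).mean()
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(a.size * (a.mean() - grand_mean) ** 2 for a in arrays)
    # Within-group sum of squares: spread of readings around their own group mean
    ss_within = sum(((a - a.mean()) ** 2).sum() for a in arrays)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper-tail probability of the F distribution
    p_val = stats.f.sf(f_stat, df_between, df_within)
    return f_stat, p_val

# Hypothetical GHI samples (W/m^2)
group_a = [820.0, 860.0, 790.0, 905.0]
group_b = [700.0, 730.0, 690.0, 760.0]
group_c = [750.0, 770.0, 735.0, 800.0]

f_manual, p_manual = one_way_anova(group_a, group_b, group_c)
f_scipy, p_scipy = stats.f_oneway(group_a, group_b, group_c)
print(f"manual: F={f_manual:.3f}, p={p_manual:.4g}")
print(f"scipy:  F={f_scipy:.3f}, p={p_scipy:.4g}")
```

Because the p-value comes from the upper tail of the F distribution with (k - 1, N - k) degrees of freedom, a very large F such as 2326.62 with a large sample yields a p-value far too small to show at three decimals.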

Key Observations
Benin shows the highest average GHI (819.88 W/m²), suggesting a strong overall solar resource. Its DHI standard deviation (140.51 W/m²) is the highest of the three countries, while its DNI standard deviation (285.85 W/m²) is slightly lower than Togo's (299.54 W/m²) and Sierra Leone's (297.67 W/m²).
Sierra Leone has the lowest average GHI (712.20 W/m²) and DNI (370.83 W/m²) of the three countries, implying a comparatively weaker overall and direct solar resource. However, its mean DHI (364.74 W/m²) is slightly higher than Benin's (354.40 W/m²) and Togo's (344.75 W/m²), suggesting a larger diffuse share of its solar radiation.
Togo sits between Benin and Sierra Leone on both average GHI (749.23 W/m²) and DNI (444.32 W/m²). Its DNI variability (standard deviation 299.54 W/m²) is similar to Benin's and Sierra Leone's.
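The observations above compare absolute standard deviations, but the countries' means differ, so a relative measure can help when judging consistency. The sketch below computes the coefficient of variation (standard deviation divided by mean, a measure not used in the original analysis) from the DNI and DHI rows of the second summary table; the helper name `coefficient_of_variation` is hypothetical.

```python
# Mean and standard deviation (W/m^2) copied from the second summary table
dni = {"Benin": (496.30, 285.85), "Sierra Leone": (370.83, 297.67), "Togo": (444.32, 299.54)}
dhi = {"Benin": (354.40, 140.51), "Sierra Leone": (364.74, 121.49), "Togo": (344.75, 127.95)}

def coefficient_of_variation(mean, std):
    """Relative spread: standard deviation as a fraction of the mean."""
    return std / mean

cv = {
    metric: {country: coefficient_of_variation(m, s) for country, (m, s) in table.items()}
    for metric, table in (("DNI", dni), ("DHI", dhi))
}
for metric, values in cv.items():
    print(metric, {country: round(v, 3) for country, v in values.items()})
```

On this relative scale Sierra Leone's DNI is the least consistent of the three, because a similar absolute spread sits on a lower mean.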