# *ADAM Chapter 4 - Making Statistical Inferences from Samples* #

### *Pr. 4.10 Parametric test to evaluate relative performance of two PV systems from sample data*

Authors: Diego Valdivieso, Tim Diller, and Gregor Henze

Created: May 27th, 2024

A consumer advocate group wishes to evaluate the performance
of two different types of photovoltaic (PV) panels
which are very close in terms of rated performance and
cost. They convince a builder of new homes to install 2 panels
of each brand on two homes in the same locality with care
taken that their tilt and orientation toward the sun are identical.
The test protocol involves monitoring these two PV
panels for 15 weeks and evaluating the performance of the
two brands based on their weekly total electrical output. The
weekly total electrical output in kWh is listed in Table 4.20.
The monitoring equipment used is identical in both locations
and has an absolute error of 3 kWh/week at 95% uncertainty
level. Evaluate using parametric tests whether the two brands
are different at a significance level of α = 0.05 with measurement
errors being explicitly considered.

Table 4.20

| Week | Brand A | Brand B | Week | Brand A | Brand B | Week | Brand A | Brand B |
|------|---------|---------|------|---------|---------|------|---------|---------|
| 1    | 197     | 189     | 6    | 203     | 187     | 11   | 174     | 170     |
| 2    | 202     | 199     | 7    | 165     | 160     | 12   | 225     | 218     |
| 3    | 148     | 142     | 8    | 121     | 115     | 13   | 242     | 232     |
| 4    | 246     | 248     | 9    | 146     | 138     | 14   | 206     | 213     |
| 5    | 173     | 176     | 10   | 189     | 173     | 15   | 197     | 193     |

# Introduction #

A consumer advocate group aims to compare the performance of two brands of photovoltaic (PV) panels. The panels are installed on two homes in the same locality with identical tilt and orientation, and are monitored for 15 weeks to record their weekly total electrical output. Because both brands are measured during the same weeks, the observations are paired. This analysis uses a parametric paired t-test to determine whether the two brands differ at a significance level of α = 0.05, taking the measurement error into account.

# Importing Libraries #

In [2]:
import pandas as pd
import numpy as np
from scipy.stats import shapiro, ttest_rel
import plotly.express as px

# Units #

In [3]:
kwh = 1
week = 1

# Solution #

## Data Preparation ##

First, we structure the data in a pandas DataFrame and calculate the weekly differences between the outputs of Brand A and Brand B. The measurement error is introduced later, in the paired t-test.

In [4]:
# Define the data
data = {
    'Week': list(range(1, 16)),
    'Brand A': [197, 202, 148, 246, 173, 203, 165, 121, 146, 189, 174, 225, 242, 206, 197],
    'Brand B': [189, 199, 142, 248, 176, 187, 160, 115, 138, 173, 170, 218, 232, 213, 193]
}

df = pd.DataFrame(data)
df['Difference'] = df['Brand A'] - df['Brand B']

# Display the data
display(df)

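|   | Week | Brand A | Brand B | Difference |
|---|------|---------|---------|------------|
| 0 | 1    | 197     | 189     | 8          |
| 1 | 2    | 202     | 199     | 3          |
| 2 | 3    | 148     | 142     | 6          |
| 3 | 4    | 246     | 248     | -2         |
| 4 | 5    | 173     | 176     | -3         |
| 5 | 6    | 203     | 187     | 16         |
| 6 | 7    | 165     | 160     | 5          |
| 7 | 8    | 121     | 115     | 6          |
| 8 | 9    | 146     | 138     | 8          |
| 9 | 10   | 189     | 173     | 16         |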


## Parametric Test ##

To evaluate the difference between the two brands, we perform a paired t-test. The measurement error is incorporated by inflating the standard error of the mean difference.

### Shapiro-Wilk Test and Histogram ###

First, we check the normality of the weekly differences.

In [5]:
# Normality test
stat_r, p_value_r = shapiro(df['Difference'])
print(f"Shapiro-Wilk test for Difference: Statistics={np.round(stat_r,3)}, p-value={np.round(p_value_r,3)}")

alpha = 0.05
if p_value_r < alpha:
    print("The difference of the two PV systems is not normally distributed.")
else:
    print("The difference of the two PV systems is normally distributed.")

Shapiro-Wilk test for Difference: Statistics=0.951, p-value=0.547
The difference of the two PV systems is normally distributed.


In [6]:
# Violin plot of the weekly differences (Brand A - Brand B)
fig2 = px.violin(df, y='Difference', title='Violin Plot of Differences in Weekly Electrical Output')

# Set the figure size
fig2.update_layout(
    width=800,  # Width of the figure in pixels
    height=600, # Height of the figure in pixels
)

fig2.show()

Comment: The Shapiro-Wilk p-value (0.547) is greater than 0.05, so we do not reject the null hypothesis of normality. The test gives no evidence against normally distributed differences, which is consistent with the shape of the violin plot and supports the use of a paired t-test.
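As a complementary check, the normal probability (Q-Q) plot correlation can be computed with `scipy.stats.probplot`. This is a minimal sketch rather than part of the original solution: the array `diff` simply repeats the weekly differences (Brand A minus Brand B) from the data above, and a correlation `r` close to 1 suggests the ordered differences lie close to a straight line against normal quantiles.

```python
import numpy as np
from scipy.stats import probplot

# Weekly differences (Brand A - Brand B) in kWh, taken from Table 4.20
diff = np.array([8, 3, 6, -2, -3, 16, 5, 6, 8, 16, 4, 7, 10, -7, 4])

# probplot returns the theoretical quantiles, the ordered data,
# and a least-squares fit (slope, intercept, correlation r)
(theoretical_q, ordered_diff), (slope, intercept, r) = probplot(diff, dist="norm")

print(f"Q-Q plot correlation coefficient r = {r:.3f}")
```

The pairs `(theoretical_q, ordered_diff)` can be passed to any plotting library to draw the Q-Q plot itself.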

### Checking the difference of the PV systems ###

Null Hypothesis (H0): The mean weekly electrical output of Brand A and Brand B PV panels is the same, i.e., the mean of the weekly differences is zero.

Alternative Hypothesis (H1): The mean weekly electrical output of Brand A and Brand B PV panels differs, i.e., the mean of the weekly differences is not zero (two-sided test).
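For reference, the statistics computed in the next cell can be written as follows. The notation is introduced here for illustration and is not part of the original notebook: $d_i$ is the difference (Brand A minus Brand B) in week $i$, $n = 15$ is the number of weeks, and $\varepsilon = 3$ kWh/week is the stated measurement error, which the notebook's adjustment treats as an additional variance term on each difference.

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i,
\qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}

t = \frac{\bar{d}}{s_d/\sqrt{n}},
\qquad
t_{\mathrm{adj}} = \frac{\bar{d}}{\sqrt{\dfrac{s_d^2}{n} + \dfrac{\varepsilon^2}{n}}}
```

Both statistics are compared against a Student t distribution with $n - 1 = 14$ degrees of freedom.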

In [7]:
# Measurement error
error = 3  # kWh/week

# Paired t-test
t_stat, p_value = ttest_rel(df['Brand A'], df['Brand B'])

# Inflate the standard error of the mean difference to account for measurement error.
# Note: only the t-statistic is adjusted here; p_value still comes from the unadjusted test.
n = len(df)
standard_error = np.std(df['Difference'], ddof=1) / np.sqrt(n)
adjusted_standard_error = np.sqrt(standard_error**2 + (error**2 / n))
adjusted_t_stat = np.mean(df['Difference']) / adjusted_standard_error

print(f"T-statistic: {t_stat}, Adjusted T-statistic: {adjusted_t_stat}, P-value: {p_value}")

# Interpret the result
if p_value < 0.05:
    print("Reject the null hypothesis: There is a significant difference between the two brands.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the two brands.")

T-statistic: 3.331894782648395, Adjusted T-statistic: 3.0061919570520823, P-value: 0.004938708116581382
Reject the null hypothesis: There is a significant difference between the two brands.


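Comment: Since the p-value of the paired t-test (0.0049) is less than the significance level (α = 0.05), we reject the null hypothesis. This p-value does not include the measurement error, but the adjusted t-statistic (3.006) still exceeds the two-sided critical value of 2.145 for 14 degrees of freedom, so the conclusion is unchanged: there is a statistically significant difference in the weekly electrical output between Brand A and Brand B PV panels.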
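The cell above reports only the unadjusted p-value. A minimal sketch of how a p-value could also be attached to the adjusted t-statistic is given below. It assumes the same adjustment as the notebook (the 3 kWh/week error added in quadrature to the standard error) and a Student t distribution with n - 1 degrees of freedom. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Weekly output in kWh from Table 4.20
brand_a = np.array([197, 202, 148, 246, 173, 203, 165, 121, 146, 189, 174, 225, 242, 206, 197])
brand_b = np.array([189, 199, 142, 248, 176, 187, 160, 115, 138, 173, 170, 218, 232, 213, 193])

alpha = 0.05
error = 3.0  # kWh/week, measurement error as used in the notebook's adjustment

diff = brand_a - brand_b
n = diff.size
dof = n - 1

# Unadjusted and adjusted standard errors of the mean difference
se = diff.std(ddof=1) / np.sqrt(n)
se_adj = np.sqrt(se**2 + error**2 / n)

# Two-sided p-values from the t distribution
t_unadj = diff.mean() / se
t_adj = diff.mean() / se_adj
p_unadj = 2 * stats.t.sf(abs(t_unadj), dof)
p_adj = 2 * stats.t.sf(abs(t_adj), dof)

# Two-sided critical value at the chosen significance level
t_crit = stats.t.ppf(1 - alpha / 2, dof)

print(f"t = {t_unadj:.3f} (p = {p_unadj:.4f}), adjusted t = {t_adj:.3f} (p = {p_adj:.4f})")
print(f"Critical value: {t_crit:.3f}, reject H0 with adjustment: {p_adj < alpha}")
```

Because the adjustment only enlarges the standard error, the adjusted p-value is necessarily larger than the unadjusted one, which makes it the more conservative of the two tests.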

The weekly outputs of the two brands are compared in the following plot:

In [8]:
# Line plot for weekly output
fig1 = px.line(df, x='Week', y=['Brand A', 'Brand B'], title='Weekly Electrical Output of Brand A and Brand B', labels={'value': 'Electrical Output (kWh)', 'variable': 'Brand'})
fig1.update_layout(
    width=800,  # Width of the figure in pixels
    height=600, # Height of the figure in pixels
    )
fig1.show()

Comment: Brand A produces more than Brand B in 12 of the 15 weeks. Brand B is higher in weeks 4, 5, and 14. The differences in weeks 4 and 5 (2 and 3 kWh) fall within the 3 kWh/week measurement error, whereas the week 14 difference (7 kWh) does not.
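The week-by-week reading of the plot can be checked directly from the data. The following sketch lists the weeks in which Brand B is higher and the weeks in which the absolute difference does not exceed the 3 kWh/week measurement error. The variable names are illustrative and not from the original notebook.

```python
import numpy as np

weeks = np.arange(1, 16)
brand_a = np.array([197, 202, 148, 246, 173, 203, 165, 121, 146, 189, 174, 225, 242, 206, 197])
brand_b = np.array([189, 199, 142, 248, 176, 187, 160, 115, 138, 173, 170, 218, 232, 213, 193])

error = 3  # kWh/week
diff = brand_a - brand_b

# Weeks where Brand B produced more than Brand A
weeks_b_higher = weeks[diff < 0].tolist()

# Weeks where the two brands differ by no more than the measurement error
weeks_within_error = weeks[np.abs(diff) <= error].tolist()

print("Brand B higher in weeks:", weeks_b_higher)          # [4, 5, 14]
print("Within measurement error:", weeks_within_error)     # [2, 4, 5]
```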

# Conclusion #

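Based on the results of the paired t-test, which yielded a p-value of 0.0049, we conclude that the difference in the weekly electrical output between Brand A and Brand B PV panels is statistically significant at α = 0.05. Brand A produced on average 5.4 kWh/week more than Brand B, and the result remains significant when the 3 kWh/week measurement error is included (adjusted t-statistic of 3.006). This suggests that the performance of the two brands is not the same, and the observed differences in their outputs are unlikely to be due to random variation alone.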
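To give a sense of the size of the effect as well as its significance, a 95% confidence interval for the mean weekly difference can be computed. This is a sketch that goes slightly beyond the original solution. It uses the unadjusted standard error and assumes the differences are approximately normal, as the Shapiro-Wilk test suggested.

```python
import numpy as np
from scipy import stats

# Weekly differences (Brand A - Brand B) in kWh, taken from Table 4.20
diff = np.array([8, 3, 6, -2, -3, 16, 5, 6, 8, 16, 4, 7, 10, -7, 4])

n = diff.size
dof = n - 1
mean_diff = diff.mean()
se = diff.std(ddof=1) / np.sqrt(n)

# 95% confidence interval for the mean difference (Student t, n - 1 degrees of freedom)
ci_low, ci_high = stats.t.interval(0.95, dof, loc=mean_diff, scale=se)

print(f"Mean difference: {mean_diff:.2f} kWh/week")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f}) kWh/week")
```

An interval that excludes zero is equivalent to rejecting the null hypothesis at α = 0.05 with the two-sided paired t-test.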