# *ADAM Chapter 4 - Making Statistical Inferences from Samples* #

### *Pr. 4.11 Comparing two instruments using parametric, nonparametric and bootstrap methods*

Authors: Diego Valdivieso, Tim Diller, and Gregor Henze

Created: May 24th, 2024

A pyranometer meant to measure global solar radiation is
being cross compared with a primary reference instrument.
Fourteen (14) simultaneous observations in (kW/m2) were
taken with both instruments deployed side by side as shown
in Table 4.21. Determine, at a significance level α = 0.05,
whether the secondary field pyranometer differs from the
primary instrument based on:

(a) Parametric tests

(b) Non-parametric tests

(c) The bootstrap method with a sample size of 1000


Table 4.21

| Observation | Reference | Secondary | Observation | Reference | Secondary | Observation | Reference | Secondary |
|-------------|-----------|-----------|-------------|-----------|-----------|-------------|-----------|-----------|
| 1           | 0.96      | 0.93      | 6           | 0.89      | 0.91      | 11          | 0.84      | 0.81      |
| 2           | 0.82      | 0.78      | 7           | 0.64      | 0.62      | 12          | 0.59      | 0.55      |
| 3           | 0.75      | 0.76      | 8           | 0.81      | 0.77      | 13          | 0.94      | 0.87      |
| 4           | 0.61      | 0.64      | 9           | 0.68      | 0.63      | 14          | 0.91      | 0.86      |
| 5           | 0.77      | 0.74      | 10          | 0.65      | 0.62      |             |           |           |


# Introduction #

A field pyranometer that measures global solar radiation is cross-compared with a primary reference instrument. Fourteen simultaneous observations (in kW/m²) were taken with both instruments deployed side by side. We determine, at a significance level of α = 0.05, whether the secondary field pyranometer differs significantly from the primary instrument.

# Importing Libraries #

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import shapiro, probplot, ttest_rel, ttest_1samp, wilcoxon
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import matplotlib.pyplot as plt

# Units #

In [2]:
kW = 1
m2 = 1

# Solution #

## Data Preparation ##

In [3]:
# Define the data
data = {
    'Observation': list(range(1, 15)),
    'Reference': [0.96, 0.82, 0.75, 0.61, 0.77, 0.89, 0.64, 0.81, 0.68, 0.65, 0.84, 0.59, 0.94, 0.91],
    'Secondary': [0.93, 0.78, 0.76, 0.64, 0.74, 0.91, 0.62, 0.77, 0.63, 0.62, 0.81, 0.55, 0.87, 0.86]
}

df = pd.DataFrame(data)

# Calculate differences
df['Difference'] = df['Reference'] - df['Secondary']

# Display the data
display(df)

Unnamed: 0,Observation,Reference,Secondary,Difference
0,1,0.96,0.93,0.03
1,2,0.82,0.78,0.04
2,3,0.75,0.76,-0.01
3,4,0.61,0.64,-0.03
4,5,0.77,0.74,0.03
5,6,0.89,0.91,-0.02
6,7,0.64,0.62,0.02
7,8,0.81,0.77,0.04
8,9,0.68,0.63,0.05
9,10,0.65,0.62,0.03


## (a) Parametric Test ##

The paired t-test is used here to compare the primary reference instrument with the secondary field pyranometer because the data are paired: each reading from one instrument has a natural partner from the other, taken simultaneously under identical conditions. The test works on the within-pair differences, which removes the variability between pairs (the solar radiation level changes from one observation to the next) and so increases statistical power. The paired t-test assumes that the differences are approximately normally distributed, so we first check the normality of the differences.

### Shapiro-Wilk Test and Histogram ###

First, we check the normality of the paired differences.

In [4]:
# Normality test
stat_r, p_value_r = shapiro(df['Difference'])
print(f"Shapiro-Wilk test for Difference: Statistics={np.round(stat_r,3)}, p-value={np.round(p_value_r,3)}")

alpha = 0.05
if p_value_r < alpha:
    print("Reject normality: the differences between the two pyranometers are not normally distributed.")
else:
    print("Fail to reject normality: the differences between the two pyranometers are consistent with a normal distribution.")

Shapiro-Wilk test for Difference: Statistics=0.894, p-value=0.092
Fail to reject normality: the differences between the two pyranometers are consistent with a normal distribution.


In [6]:

# Violin plot of the paired differences (Reference - Secondary)
fig2 = px.violin(df, y='Difference', title='Violin Plot of Differences in Measurements')

# Set the figure dimensions
fig2.update_layout(
    width=800,  # Width of the figure in pixels
    height=600, # Height of the figure in pixels
)

fig2.show()

Comment: The Shapiro-Wilk p-value (0.092) is greater than 0.05, so we do not reject the null hypothesis of normality. The differences are consistent with a normal distribution, although with only 14 observations the test has limited power to detect departures from normality. The violin plot does not closely resemble a normal density, and more data would be needed to judge the shape of the distribution graphically.
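
As a numerical companion to the plot, the sketch below computes the quantities behind a normal Q-Q plot: the sorted differences, the matching theoretical normal quantiles, and the correlation between the two. It uses only the standard library. The plotting positions (i - 0.5)/n are an assumed convention (others exist), and the names `pearson_r` and `qq_correlation` are illustrative, not part of the original notebook. A correlation close to 1 is consistent with normality, but it is a descriptive check rather than a formal test.

```python
from math import sqrt
from statistics import NormalDist, mean

reference = [0.96, 0.82, 0.75, 0.61, 0.77, 0.89, 0.64, 0.81, 0.68, 0.65, 0.84, 0.59, 0.94, 0.91]
secondary = [0.93, 0.78, 0.76, 0.64, 0.74, 0.91, 0.62, 0.77, 0.63, 0.62, 0.81, 0.55, 0.87, 0.86]

# Sorted paired differences (Reference - Secondary)
differences = sorted(r - s for r, s in zip(reference, secondary))
n = len(differences)

# Standard-normal quantiles at plotting positions (i - 0.5) / n (assumed convention)
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


qq_correlation = pearson_r(theoretical, differences)
print(f"Q-Q correlation of the differences: {qq_correlation:.3f}")
```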

### Test ###

Null Hypothesis (H0): The mean difference between the Reference and Secondary measurements is zero.

Alternative Hypothesis (H1): The mean difference between the Reference and Secondary measurements is not zero.

In [None]:
# Perform paired t-test
t_stat, p_value = ttest_rel(df['Reference'], df['Secondary'])

print(f"T-statistic: {t_stat}, P-value: {p_value}")

alpha = 0.05
# Interpretation
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the two instruments.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the two instruments.")

T-statistic: 3.5094533727887787, P-value: 0.00384374217668293
Reject the null hypothesis: There is a significant difference between the two instruments.


Comment: The p-value for the paired t-test is 0.00384, well below the significance level of 0.05, so we reject the null hypothesis. The difference between the primary reference instrument and the secondary field pyranometer is statistically significant, and the positive t-statistic indicates that the secondary pyranometer reads lower than the reference on average.
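
To make the mechanics of the paired t-test explicit, the following sketch recomputes the t-statistic directly from the differences using only the standard library, and adds a 95% confidence interval for the mean difference. The critical value 2.160 (two-sided, α = 0.05, 13 degrees of freedom) is taken from a standard t table rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

reference = [0.96, 0.82, 0.75, 0.61, 0.77, 0.89, 0.64, 0.81, 0.68, 0.65, 0.84, 0.59, 0.94, 0.91]
secondary = [0.93, 0.78, 0.76, 0.64, 0.74, 0.91, 0.62, 0.77, 0.63, 0.62, 0.81, 0.55, 0.87, 0.86]
differences = [r - s for r, s in zip(reference, secondary)]

n = len(differences)
d_bar = mean(differences)    # mean paired difference
s_d = stdev(differences)     # sample standard deviation (n - 1 in the denominator)
std_err = s_d / sqrt(n)      # standard error of the mean difference
t_manual = d_bar / std_err   # paired t-statistic with n - 1 = 13 degrees of freedom

# Two-sided critical value for alpha = 0.05 and 13 degrees of freedom,
# read from a standard t table (an assumption of this sketch, not computed here)
T_CRIT = 2.160
ci_low = d_bar - T_CRIT * std_err
ci_high = d_bar + T_CRIT * std_err

print(f"Mean difference: {d_bar:.4f} kW/m², t = {t_manual:.4f}")
print(f"95% CI for the mean difference: [{ci_low:.4f}, {ci_high:.4f}]")
print("Reject H0" if abs(t_manual) > T_CRIT else "Fail to reject H0")
```

The statistic should agree with the value reported by `ttest_rel`, since both divide the mean difference by its standard error.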

## (b) Non-parametric tests ##

The [Wilcoxon signed-rank test](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test) is used here as a non-parametric alternative to the paired t-test because it does not assume that the differences between paired observations are normally distributed; it assumes only that they are symmetrically distributed about their median. This is useful for small samples or when normality is in doubt, conditions under which a parametric test can be unreliable. The test ranks the absolute differences and uses the signs of those ranks to assess whether the median difference between pairs is zero, which makes it an appropriate choice for comparing the two pyranometers even if the differences are not normally distributed.

Null Hypothesis (H0): The median difference between the Reference and Secondary measurements is zero.

Alternative Hypothesis (H1): The median difference between the Reference and Secondary measurements is not zero.


In [None]:
# Perform Wilcoxon signed-rank test
w_stat, p_value = wilcoxon(df['Reference'], df['Secondary'])

print(f"Wilcoxon statistic: {w_stat}, P-value: {p_value}")

# Interpretation
if p_value < 0.05:
    print("Reject the null hypothesis: There is a significant difference between the two instruments.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the two instruments.")

Wilcoxon statistic: 10.5, P-value: 0.0067138671875
Reject the null hypothesis: There is a significant difference between the two instruments.


Comment: The p-value for the Wilcoxon signed-rank test is 0.00671, which is less than the significance level of 0.05. This indicates that the difference between the measurements of the primary reference instrument and the secondary field pyranometer is statistically significant. Therefore, we reject the null hypothesis, concluding that the secondary pyranometer's readings are significantly different from those of the primary instrument. The Wilcoxon test, being non-parametric, confirms this result without assuming a normal distribution of differences, providing robust evidence that the two instruments do not measure solar radiation identically.
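
The ranking step behind the signed-rank statistic can be sketched by hand as follows, using only the standard library. The sketch converts the differences to integer hundredths of kW/m² so that nominally equal differences tie exactly, an assumption that matches the two-decimal resolution of the readings. Because of this, its statistic is not guaranteed to match the value scipy prints for the unrounded floating-point differences, where tiny rounding discrepancies can order nominally tied values differently. The helper `average_rank` is illustrative.

```python
reference = [0.96, 0.82, 0.75, 0.61, 0.77, 0.89, 0.64, 0.81, 0.68, 0.65, 0.84, 0.59, 0.94, 0.91]
secondary = [0.93, 0.78, 0.76, 0.64, 0.74, 0.91, 0.62, 0.77, 0.63, 0.62, 0.81, 0.55, 0.87, 0.86]

# Differences in integer hundredths of kW/m² so that equal differences tie exactly
diffs = [round(100 * (r - s)) for r, s in zip(reference, secondary)]
diffs = [d for d in diffs if d != 0]  # zero differences carry no sign and are dropped

# Sorted absolute differences, used to look up ranks
ordered = sorted(abs(d) for d in diffs)


def average_rank(value):
    """Average 1-based rank of `value` among the sorted absolute differences."""
    first = ordered.index(value) + 1          # rank of the first occurrence
    last = first + ordered.count(value) - 1   # rank of the last occurrence
    return (first + last) / 2


w_plus = sum(average_rank(abs(d)) for d in diffs if d > 0)   # rank sum of positive differences
w_minus = sum(average_rank(abs(d)) for d in diffs if d < 0)  # rank sum of negative differences
w_stat = min(w_plus, w_minus)

print(f"W+ = {w_plus}, W- = {w_minus}, test statistic = {w_stat}")
```

A small value of the smaller rank sum, relative to the total n(n + 1)/2, means that one sign dominates the ranked differences.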

## (c) The bootstrap method with a sample size of 1000 ##

The bootstrap method suits this problem because it provides a non-parametric approach to statistical inference that is useful when the sample is small or the distribution of the data is uncertain. By repeatedly resampling the observed data with replacement, it builds an empirical distribution of the statistic of interest, here the mean difference between the primary reference instrument and the secondary pyranometer. In this solution, 1000 bootstrap resamples are drawn, each containing 14 observation pairs sampled with replacement. The 2.5th and 97.5th percentiles of the resampled mean differences form a 95% confidence interval; if that interval excludes zero, we reject the null hypothesis of no difference at α = 0.05 without relying on the normality assumption.

In [None]:
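# Percentile bootstrap of the mean difference; alpha is given in percent (5 -> 95% CI)
def bootstrap(data, n_samples=1000, alpha=5, seed=42):
    # Seed NumPy's global generator, which DataFrame.sample draws from when random_state is not given
    np.random.seed(seed)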
    boot_means = []
    for _ in range(n_samples):
        sample = data.sample(frac=1, replace=True)
        boot_mean = sample['Reference'].mean() - sample['Secondary'].mean()
        boot_means.append(boot_mean)
    return np.percentile(boot_means, [alpha/2, 100-alpha/2]), boot_means

ci, boot_means = bootstrap(df)
print(f"Bootstrap 95% CI: {ci}")

# Interpretation
if ci[0] > 0 or ci[1] < 0:
    print("Reject the null hypothesis: There is a significant difference between the two instruments.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the two instruments.")

Bootstrap 95% CI: [0.012125   0.03928571]
Reject the null hypothesis: There is a significant difference between the two instruments.
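
For comparison, a minimal standard-library sketch of the same percentile bootstrap is shown below. It resamples the paired differences directly, which is equivalent to resampling rows because each row contributes exactly one difference. The function name `bootstrap_ci`, its parameters, and the simple order-statistic percentiles (no interpolation, unlike `np.percentile`) are assumptions of this sketch, and because it uses a different random generator its interval will not reproduce the numbers above exactly.

```python
import random
from statistics import mean

reference = [0.96, 0.82, 0.75, 0.61, 0.77, 0.89, 0.64, 0.81, 0.68, 0.65, 0.84, 0.59, 0.94, 0.91]
secondary = [0.93, 0.78, 0.76, 0.64, 0.74, 0.91, 0.62, 0.77, 0.63, 0.62, 0.81, 0.55, 0.87, 0.86]
differences = [r - s for r, s in zip(reference, secondary)]


def bootstrap_ci(values, n_resamples=1000, conf_level=0.95, seed=42):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # local generator; the global random state is untouched
    n = len(values)
    # Mean of each resample (same size as the data, drawn with replacement), sorted
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    tail = (1 - conf_level) / 2
    lower = boot_means[round(tail * n_resamples)]
    upper = boot_means[round((1 - tail) * n_resamples) - 1]
    return lower, upper


ci_low, ci_high = bootstrap_ci(differences)
print(f"Bootstrap 95% CI for the mean difference: [{ci_low:.4f}, {ci_high:.4f}]")
print("Reject H0" if ci_low > 0 or ci_high < 0 else "Fail to reject H0")
```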


## Visualization ##

Visualization complements the tests by giving an intuitive view of the data and of the differences between the primary reference instrument and the secondary field pyranometer. A line plot makes patterns and anomalies visible that are hard to see in the numbers alone. Here, plotting both instruments against the observation number shows how closely the secondary pyranometer tracks the reference from one observation to the next, and where the two deviate.

In [None]:
# Line plot for Reference and Secondary
fig1 = px.line(df, x='Observation', y=['Reference', 'Secondary'], title='Simultaneous Observations of Reference and Secondary Instruments', labels={'value': 'Measurement (kW/m²)', 'variable': 'Instrument'})
fig1.update_layout(
    width=800,
    height=600,
)
fig1.show()

Comment: The line plot shows the Secondary readings lying below the Reference readings for most observations (11 of 14), which supports the findings from the parametric, non-parametric, and bootstrap methods.
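
The visual impression can be checked with a simple tally of the sign of each paired difference. This is an illustrative sketch with hypothetical variable names; the differences are converted to integer hundredths of kW/m² to avoid floating-point noise when comparing with zero.

```python
reference = [0.96, 0.82, 0.75, 0.61, 0.77, 0.89, 0.64, 0.81, 0.68, 0.65, 0.84, 0.59, 0.94, 0.91]
secondary = [0.93, 0.78, 0.76, 0.64, 0.74, 0.91, 0.62, 0.77, 0.63, 0.62, 0.81, 0.55, 0.87, 0.86]

# Paired differences (Reference - Secondary) in integer hundredths of kW/m²
diffs = [round(100 * (r - s)) for r, s in zip(reference, secondary)]

n_reference_higher = sum(d > 0 for d in diffs)
n_secondary_higher = sum(d < 0 for d in diffs)
n_equal = sum(d == 0 for d in diffs)

print(f"Reference higher: {n_reference_higher}, Secondary higher: {n_secondary_higher}, equal: {n_equal}")
```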

# Conclusion #

Based on the parametric paired t-test, the non-parametric Wilcoxon signed-rank test, and the bootstrap method, we conclude that there is a statistically significant difference between the primary reference instrument and the secondary field pyranometer at α = 0.05. The paired t-test yielded a p-value of 0.00384, the Wilcoxon test a p-value of 0.00671, and the bootstrap a 95% confidence interval for the mean difference of approximately [0.012, 0.039] kW/m², which does not contain zero. All three methods agree that the secondary pyranometer's measurements differ from those of the primary instrument. The line plot supports this conclusion: the secondary readings fall below the reference readings for most observations.