# *Advanced Data Analysis and Modelling*

## Problem 6.1

## Full-factorial design for evaluating three different missile systems

*Authors: Tim Diller, Gregor Henze*

A full-factorial experiment is conducted to determine which of three different missile systems is preferable. The propellant burning rate was measured in 24 static firings, using four different propellant types with each missile system. Duplicate observations (r = 2 replicates) of the burning rate (in minutes) were taken at each of the 3 × 4 = 12 treatment combinations. The data, after coding, are given in Table 6.32. The following null hypotheses are to be tested:

(i) There is no difference in the mean propellant burning rates when different missile systems are used.

(ii) There is no difference in the mean propellant burning rates of the four propellant types.

(iii) There is no interaction between the different missile systems and the different propellant types.
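
The three hypotheses can be written against the usual two-factor model with interaction. The notation below is an assumption introduced here for illustration (it is not taken from the problem statement), and it treats both factors as fixed effects, which is also what the OLS fits further down assume:

$$
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad i = 1,\dots,3,\quad j = 1,\dots,4,\quad k = 1,2
$$

$$
\begin{aligned}
H_0^{(i)} &: \alpha_1 = \alpha_2 = \alpha_3 = 0 \\
H_0^{(ii)} &: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0 \\
H_0^{(iii)} &: (\alpha\beta)_{ij} = 0 \quad \text{for all } i, j
\end{aligned}
$$

Here $\alpha_i$ is the effect of missile system $i$, $\beta_j$ the effect of propellant type $j$, $(\alpha\beta)_{ij}$ their interaction, and $\varepsilon_{ijk}$ the random error of replicate $k$.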

In [2]:
# first we import the relevant libraries
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import plotly.express as px
import plotly.graph_objects as go

The first step is to enter the data and turn it into a dataframe that is convenient to work with.

In [3]:
# we turn the data from table 6.32 into a dataframe to make it comfortable to work with.
data = {
    'Missile_System': ['A1', 'A1', 'A1', 'A1','A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2','A2', 'A2', 'A2', 'A2', 'A3', 'A3', 'A3', 'A3', 'A3', 'A3', 'A3', 'A3'],
    'Propellant_Type': ['b1', 'b1', 'b2', 'b2', 'b3', 'b3', 'b4', 'b4','b1', 'b1', 'b2', 'b2', 'b3', 'b3', 'b4', 'b4','b1', 'b1', 'b2', 'b2', 'b3', 'b3', 'b4', 'b4'],
    'Burning_Rate': [34.0, 32.7, 30.1, 32.8, 29.8, 26.7, 29.0, 28.9, 32.0, 33.2, 30.2, 29.8, 28.7, 28.1, 27.6, 27.8, 28.4, 29.3, 27.3, 28.9, 29.7, 27.3, 28.8, 29.1]
}

df = pd.DataFrame(data)


Next, we make a scatterplot of the data to enable a visual inspection.

In [4]:
# we map each Missile_System to its corresponding marker symbol
marker_symbols = {'A1': 'triangle-up', 'A2': 'diamond', 'A3': 'circle'}
# we map the colors to the different types of propellants
color_values = {'b1': 'red', 'b2': 'blue', 'b3': 'purple', 'b4': 'green'}


# Create a figure
fig = go.Figure()

# Scatter the data, connecting the two replicate runs of each treatment combination.
for missile_system, symbol in marker_symbols.items():
    for propellant, color in color_values.items():
        # select the two replicate firings for this missile system / propellant pair
        scatter_df = df[(df['Missile_System'] == missile_system) & (df['Propellant_Type'] == propellant)]
        fig.add_trace(go.Scatter(
            x=scatter_df.index,
            y=scatter_df['Burning_Rate'],
            mode='markers+lines',
            marker=dict(symbol=symbol, color=color, size=10),
            line=dict(color=color, dash='dash'),
            name=f'Missile system: {missile_system}, Propellant: {propellant}',
        ))

# Update layout to adjust the width of the plot
fig.update_layout(width=800, xaxis_title='Dataframe Index', yaxis_title='Burning Time [Minutes]', title='Burning Time as Function of Missile System and Propellant')

fig.show()


A number of observations can be made from the plot. The replicated experiments show a noticeable spread in burning rate, especially for missile system A1. Some propellants also seem to have higher average burning rates than others.
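
These visual impressions can be checked numerically. The sketch below is a minimal, self-contained example that rebuilds the data from Table 6.32 and summarises each treatment combination by its mean and by the range between its two replicates. The names `df_check`, `summary`, `cell_mean` and `replicate_range` are illustrative and not part of the original notebook.

```python
import pandas as pd

systems = ['A1', 'A2', 'A3']
propellants = ['b1', 'b2', 'b3', 'b4']
# Burning rates in the same order as the dataframe above:
# for each missile system, two replicates of b1, b2, b3, b4.
rates = [34.0, 32.7, 30.1, 32.8, 29.8, 26.7, 29.0, 28.9,
         32.0, 33.2, 30.2, 29.8, 28.7, 28.1, 27.6, 27.8,
         28.4, 29.3, 27.3, 28.9, 29.7, 27.3, 28.8, 29.1]

# One (system, propellant) label per firing, two firings per combination.
labels = [(s, p) for s in systems for p in propellants for _ in range(2)]
df_check = pd.DataFrame(labels, columns=['Missile_System', 'Propellant_Type'])
df_check['Burning_Rate'] = rates

# Mean and replicate-to-replicate range for each of the 12 cells.
grouped = df_check.groupby(['Missile_System', 'Propellant_Type'])['Burning_Rate']
summary = grouped.agg(cell_mean='mean',
                      replicate_range=lambda s: s.max() - s.min())

# Marginal means per factor level.
system_means = df_check.groupby('Missile_System')['Burning_Rate'].mean()
propellant_means = df_check.groupby('Propellant_Type')['Burning_Rate'].mean()

print(summary.round(2))
print(system_means.round(2))
print(propellant_means.round(2))
```

For this data set the largest replicate range (3.1) occurs in system A1 with propellant b3, and propellant b1 has the highest marginal mean (31.6), which is consistent with the plot.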

However, a formal statistical analysis is required to test each hypothesis at the 0.05 significance level.

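To conduct the statistical analysis, we use the `anova_lm` function of the statsmodels API. Hypotheses (i) and (ii) are first tested with separate one-way models, and hypothesis (iii) with a two-way model that contains both main effects and their interaction. Because the two-way model also reports both main effects, its table covers all three hypotheses.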

In [None]:
# Fit the ANOVA model for each hypothesis
# (i) There is no difference in the mean propellant burning rates when different missile systems are used.
model_i = ols('Burning_Rate ~ Missile_System', data=df).fit()
anova_table_i = sm.stats.anova_lm(model_i, typ=2)

# (ii) There is no difference in the mean propellant burning rates of the four propellant types.
model_ii = ols('Burning_Rate ~ Propellant_Type', data=df).fit()
anova_table_ii = sm.stats.anova_lm(model_ii, typ=2)

# (iii) There is no interaction between the different missile systems and the different propellant types.
model_iii = ols('Burning_Rate ~ Missile_System * Propellant_Type', data=df).fit()
anova_table_iii = sm.stats.anova_lm(model_iii, typ=2)

# Print ANOVA tables
print("ANOVA Table for Hypothesis (i):")
print(anova_table_i)
print("\nANOVA Table for Hypothesis (ii):")
print(anova_table_ii)
print("\nANOVA Table for Hypothesis (iii):")
print(anova_table_iii)

ANOVA Table for Hypothesis (i):
                   sum_sq    df         F    PR(>F)
Missile_System  14.523333   2.0  1.976476  0.163502
Residual        77.155000  21.0       NaN       NaN

ANOVA Table for Hypothesis (ii):
                    sum_sq    df         F    PR(>F)
Propellant_Type  40.081667   3.0  5.178844  0.008235
Residual         51.596667  20.0       NaN       NaN

ANOVA Table for Hypothesis (iii):
                                   sum_sq    df          F    PR(>F)
Missile_System                  14.523333   2.0   5.844400  0.016898
Propellant_Type                 40.081667   3.0  10.752962  0.001020
Missile_System:Propellant_Type  22.163333   6.0   2.972949  0.051168
Residual                        14.910000  12.0        NaN       NaN
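
As a cross-check of the table for hypothesis (iii), the sums of squares of a balanced two-factor design can be computed directly from the grand, marginal and cell means. The sketch below is a hand calculation under that balanced-design assumption, not a replacement for `anova_lm`. The variable names are illustrative, and the p-values come from the survival function of the F distribution in `scipy.stats`.

```python
import numpy as np
from scipy import stats

# Burning rates from Table 6.32, ordered as in the dataframe above and
# reshaped to (missile system, propellant type, replicate) = (3, 4, 2).
y = np.array([34.0, 32.7, 30.1, 32.8, 29.8, 26.7, 29.0, 28.9,
              32.0, 33.2, 30.2, 29.8, 28.7, 28.1, 27.6, 27.8,
              28.4, 29.3, 27.3, 28.9, 29.7, 27.3, 28.8, 29.1]).reshape(3, 4, 2)

a, b, r = y.shape
grand_mean = y.mean()
system_means = y.mean(axis=(1, 2))       # one mean per missile system
propellant_means = y.mean(axis=(0, 2))   # one mean per propellant type
cell_means = y.mean(axis=2)              # one mean per treatment combination

# Sums of squares for a balanced design.
ss_system = b * r * np.sum((system_means - grand_mean) ** 2)
ss_propellant = a * r * np.sum((propellant_means - grand_mean) ** 2)
ss_cells = r * np.sum((cell_means - grand_mean) ** 2)
ss_interaction = ss_cells - ss_system - ss_propellant
ss_error = np.sum((y - cell_means[:, :, None]) ** 2)

df_error = a * b * (r - 1)
ms_error = ss_error / df_error

effects = {
    'Missile_System': (ss_system, a - 1),
    'Propellant_Type': (ss_propellant, b - 1),
    'Missile_System:Propellant_Type': (ss_interaction, (a - 1) * (b - 1)),
}

results = {}
for name, (ss, dof) in effects.items():
    f_stat = (ss / dof) / ms_error                 # mean square ratio
    p_value = stats.f.sf(f_stat, dof, df_error)    # upper-tail probability
    results[name] = (ss, dof, f_stat, p_value)
    print(f'{name:32s} SS={ss:9.4f} df={dof:2d} F={f_stat:7.4f} p={p_value:.6f}')
```

The results reproduce the two-way table above. The calculation also shows why the one-way table for hypothesis (i) gives a much larger p-value: there, the propellant and interaction sums of squares are absorbed into the residual (77.155 on 21 degrees of freedom instead of 14.910 on 12), which inflates the error mean square in the denominator of F.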


## Conclusion

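Based on the two-way ANOVA table, which accounts for both factors and their interaction, we reject hypothesis (i) (p ≈ 0.017) and hypothesis (ii) (p ≈ 0.001) at the 0.05 level: both the missile system and the propellant type affect the mean burning rate. The one-way model for hypothesis (i) yields p ≈ 0.164 only because it lumps the propellant and interaction variation into the residual, which inflates the error term. For hypothesis (iii), the interaction term very narrowly misses the threshold for statistical significance (p ≈ 0.051), so we cannot reject it.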

However, failing to reject hypothesis (iii) does not show that there is no relevant interaction. It only means that more data are required for a conclusive assessment of the interaction. See the Notebook on ANOVA testing in Chapter 6 for a further explanation of how effect strength, sample variance, and sample size together determine the p-value of an ANOVA statistic.
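
One part of the sample-size effect can be illustrated with the critical value of the F distribution. The sketch below compares the observed interaction F statistic with the 5% critical value for the actual design (r = 2) and for two hypothetical designs with r = 3 and r = 4 replicates, which were not run. It only shows how the threshold moves with the error degrees of freedom. The F statistic itself would also change with new data, so this is an illustration under stated assumptions, not a prediction.

```python
from scipy import stats

alpha = 0.05
f_observed = 2.972949          # interaction F from the two-way table above
a, b = 3, 4                    # number of missile systems and propellant types
df_interaction = (a - 1) * (b - 1)

critical = {}
for r in (2, 3, 4):            # r = 3 and r = 4 are hypothetical designs
    df_error = a * b * (r - 1)
    # Upper 5% point of the F distribution for these degrees of freedom.
    critical[r] = stats.f.ppf(1 - alpha, df_interaction, df_error)
    print(f'r={r}: df_error={df_error:2d}, critical F={critical[r]:.3f}, '
          f'observed F={f_observed:.3f}')
```

With r = 2 the observed F lies just below the critical value, which matches the p-value of about 0.051. The critical value decreases as the error degrees of freedom grow, so the same F ratio would clear the threshold in a larger experiment.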