# Case Study 6. Cecil SAN Configuration

Researchers in the Department of Engineering Science use optimisation
techniques to reconfigure existing (or design new) Storage Area Networks
(SANs). In order to verify the performance of these designs they build
computer simulations and test the performance of each SAN under peak
traffic conditions.

One Masters project for 2005 involved the development of a simulation
for the existing Cecil SAN and a proposed new design using ARENA (a
simulation package introduced in ENGSCI 355). In order to assess the
performance of the new design, the through-times for the various flow
paths were compared (with each simulation receiving the same network
traffic). The through-times for the first 20 jobs sent from "Server
26" to "Device 7" in the peak time period are stored in the following
variables:

\begin{align*}
\textbf{ThroughTime} &\quad\quad \textrm{through-time for the particular job from Server 26 to Device 7} \\
\textbf{Conf} &\quad\quad \textrm{the particular configuration ('c' - changed, 'p' - present)}
\end{align*}

In [None]:
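# Install s20x only if it is not already available
if (!requireNamespace("s20x", quietly = TRUE)) install.packages("s20x")
library(s20x)
library(repr)
# Set the default plot size for the notebook
options(repr.plot.width=8, repr.plot.height=6)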

In [None]:
Cecil.df = read.table("data/Cecil2Sample.txt", header = TRUE)
attach(Cecil.df)
#head(Cecil.df)
#tail(Cecil.df)

In [None]:
layout20x(1, 2)
boxplot(ThroughTime ~ Conf, data = Cecil.df, main = "Boxplots of Through-Times")
twosampPlot(ThroughTime ~ Conf, data = Cecil.df)

In [None]:
summaryStats(ThroughTime ~ Conf, data = Cecil.df)

In [None]:
normcheck(ThroughTime[Conf == "c"], shapiro.wilk = TRUE)
normcheck(ThroughTime[Conf == "p"], shapiro.wilk = TRUE)
#normcheck(lm(ThroughTime ~ Conf, data = Cecil.df))

In [None]:
eovcheck(ThroughTime ~ Conf, data = Cecil.df, levene = TRUE)

In [None]:
# Welch two-sample t-test (does not assume equal variances)
t.test(ThroughTime ~ Conf, data = Cecil.df, var.equal = FALSE)
#t.test(ThroughTime ~ Conf, data = Cecil.df, var.equal = TRUE)

## Methods and Assumption Checks

We have a numerical measurement made under two configurations, giving
two independent samples, so we should do a two-sample $t$-test.

We assume the individual jobs are independent of one another. The
equal-variance assumption is clearly not met: on the residual plot, one
group has a much larger spread than the other. The Normality assumption
is probably not satisfied either, as the points on the Q-Q plots do not
lie along the straight line. However, the $t$-test concerns the group
means, so we can appeal to the Central Limit Theorem for their
approximate Normality. Because of the unequal variances, we use the
Welch version of the two-sample $t$-test.
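
To make the Welch adjustment concrete, the sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with `t.test(..., var.equal = FALSE)`. The data frame `sim.df` is simulated purely for illustration (the means and standard deviations are assumptions, not the Cecil measurements), so the numbers it produces say nothing about the real network.

```r
# Hypothetical data only: simulated through-times, NOT the Cecil measurements.
set.seed(1)
sim.df <- data.frame(
  ThroughTime = c(rnorm(20, mean = 2e-4, sd = 2e-5),   # "c": small spread (assumed)
                  rnorm(20, mean = 3e-4, sd = 1e-4)),  # "p": large spread (assumed)
  Conf = rep(c("c", "p"), each = 20)
)

x <- sim.df$ThroughTime[sim.df$Conf == "c"]
y <- sim.df$ThroughTime[sim.df$Conf == "p"]

# Each group's variance enters the standard error separately (no pooling).
vx <- var(x) / length(x)
vy <- var(y) / length(y)
t.stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom.
df.welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# The built-in Welch test should reproduce both quantities.
welch <- t.test(ThroughTime ~ Conf, data = sim.df, var.equal = FALSE)
c(by.hand = t.stat, t.test = unname(welch$statistic))
c(by.hand = df.welch, t.test = unname(welch$parameter))
```

When the two sample variances differ, the Welch degrees of freedom fall below the pooled value of $n_1 + n_2 - 2$, which widens the confidence interval relative to the equal-variance test.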

The model fitted is
${\tt ThroughTime}_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, where
$\alpha_i$ is the effect of being in each configuration, either
changed or present, and the $\varepsilon_{ij}$ are independent with
$\varepsilon_{ij} \sim N(0, \sigma_i^2)$, so each configuration has its
own variance (as the Welch test allows).
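
As a rough illustration of how the mean part of this model maps onto group means, the sketch below fits `lm(ThroughTime ~ Conf)` to simulated data. It assumes R's default treatment contrasts, which is one possible parameterisation: the effect of the first level (`c`) is set to zero, so the intercept estimates $\mu$ as the mean of the changed configuration and the `Confp` coefficient estimates the effect of the present configuration. The data frame `sim.df` is hypothetical and is not the Cecil data.

```r
# Hypothetical data only: simulated through-times, NOT the Cecil measurements.
set.seed(2)
sim.df <- data.frame(
  ThroughTime = c(rnorm(20, mean = 2e-4, sd = 2e-5),   # "c" (assumed values)
                  rnorm(20, mean = 3e-4, sd = 1e-4)),  # "p" (assumed values)
  Conf = rep(c("c", "p"), each = 20)
)

fit <- lm(ThroughTime ~ Conf, data = sim.df)
group.means <- tapply(sim.df$ThroughTime, sim.df$Conf, mean)

# Under treatment contrasts the intercept is the mean of the baseline level "c" ...
mu.hat <- coef(fit)[["(Intercept)"]]
# ... and the Confp coefficient is the difference mean(p) - mean(c).
alpha.p.hat <- coef(fit)[["Confp"]]

c(mu.hat = mu.hat, mean.c = group.means[["c"]])
c(alpha.p.hat = alpha.p.hat, diff = group.means[["p"]] - group.means[["c"]])
```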

## Executive Summary

In order to assess the performance of the new Cecil SAN design, the
through-times for the various flow paths were compared using simulation
models (with each simulation receiving the same network traffic). In
particular, the through-times for the first 20 jobs sent from "Server
26" to "Device 7" in the peak time period were analysed.

We observe that the through-times on the present configuration are
longer, on average, than the through-times on the proposed
reconfiguration.

We estimate, with 95% confidence, that the mean through-time on the
present network is between $4.52 \times 10^{-5}$ and
$1.50 \times 10^{-4}$ seconds longer than on the proposed
reconfiguration.
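
One detail worth checking when quoting such an interval is its direction. With a formula interface, `t.test` reports the interval for the first factor level minus the second (here `c` minus `p`), so stating it as "present minus changed" requires negating and reversing the endpoints. The sketch below shows this on simulated data; `sim.df` and its parameter values are assumptions for illustration only and do not reproduce the interval quoted above.

```r
# Hypothetical data only: simulated through-times, NOT the Cecil measurements.
set.seed(3)
sim.df <- data.frame(
  ThroughTime = c(rnorm(20, mean = 2e-4, sd = 2e-5),   # "c" (assumed values)
                  rnorm(20, mean = 3e-4, sd = 1e-4)),  # "p" (assumed values)
  Conf = rep(c("c", "p"), each = 20)
)

welch <- t.test(ThroughTime ~ Conf, data = sim.df, var.equal = FALSE)

# Interval as reported: mean(c) - mean(p), at the default 95% level.
ci.c.minus.p <- as.numeric(welch$conf.int)

# Flip the sign and the order to express it as mean(p) - mean(c).
ci.p.minus.c <- rev(-ci.c.minus.p)
signif(ci.p.minus.c, 3)

# Cross-check: reordering the factor levels gives the same interval directly.
alt <- t.test(ThroughTime ~ factor(Conf, levels = c("p", "c")),
              data = sim.df, var.equal = FALSE)
as.numeric(alt$conf.int)
```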