\begin{figure}[H]
\includegraphics[width=12cm]{Pictures/statistical_test.jpeg}
\end{figure}

\section{t-test}
t

\subsection{Welch's t-test or unequal variances t-test}
t

\section{ANOVA}
t

\section{Repeated measures ANOVA}
t

\section{Factorial ANOVA}
t

\section{Mixed-Design ANOVA}
t

\section{ANCOVA}
In ANCOVA we look at the effect of a categorical independent variable on an interval-scale dependent (i.e. response) variable, after the effects of interval-scale covariates have been controlled for.

Independent variables: mixed categorical and continuous. Dependent variable: continuous.

An analysis of covariance (ANCOVA) is a special case of the linear model in which we are interested in the effects of one continuous numerical explanatory variable, one categorical explanatory variable, and their interaction. The response (dependent) variable is still a numerical variable.

An ANCOVA is also used in a slightly different setting, where we want to control for the effect of some continuous variable while assessing the differences between the levels of a categorical variable of interest. Imagine trying to determine the effect of some treatment on the growth of an organism when the organisms have different sizes at the start. An ANCOVA compares the treatment means while controlling for the different initial sizes.
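
The growth scenario can be sketched with simulated data. This is only an illustration under stated assumptions: the data frame \texttt{growth\_data}, its columns, and all numerical values are hypothetical and not taken from any real experiment.

\begin{minted}[breaklines]{R}
# Hypothetical simulated data: all names and numbers are illustrative assumptions
set.seed(1)
n <- 50
growth_data <- data.frame(
  treatment = factor(rep(c("control", "treated"), each = n)),
  # treated organisms are assumed to start out larger on average
  initial_size = c(rnorm(n, mean = 10, sd = 2), rnorm(n, mean = 12, sd = 2))
)
# Assumed true model: slope of 0.5 on initial size plus a treatment effect of 2
growth_data$growth <- 1 + 0.5 * growth_data$initial_size +
  2 * (growth_data$treatment == "treated") + rnorm(2 * n, sd = 1)

# One-way ANOVA ignores initial size, so the treatment estimate absorbs the size difference
naive_fit <- lm(growth ~ treatment, data = growth_data)
# ANCOVA adjusts for initial size before comparing the treatments
ancova_fit <- lm(growth ~ initial_size + treatment, data = growth_data)

coef(naive_fit)["treatmenttreated"]
coef(ancova_fit)["treatmenttreated"]
\end{minted}

Because the treated group starts out larger in this simulation, the unadjusted estimate mixes the treatment effect with the effect of initial size, while the ANCOVA estimate should be closer to the simulated treatment effect.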

ANCOVA is a combination of simple linear regression and one-way ANOVA.

ANCOVA offers an insight into how linear models work in general. The flexibility of linear models lets us combine any number of categorical and numerical explanatory variables to model our phenomenon of interest or to make predictions.

The ANCOVA tries to decompose each observation as follows:
\[
y_{ij} = \mu + \alpha_{i} + \beta x_{ij} + \epsilon_{ij} 
\]
where $y_{ij}$ is the $j$th observation in the $i$th categorical group, $x_{ij}$ is the $j$th observation of the continuous IV in the $i$th group, $\mu$ is the overall intercept, $\alpha_i$ is the effect of the $i$th level of the categorical IV, $\beta$ is the slope of the relationship between the DV and the continuous IV, and $\epsilon_{ij}$ is the unobserved error term for the $j$th observation in the $i$th group.
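
As a hedged sketch, the decomposition can be extended to let the slope differ between groups, which corresponds to including the interaction between the categorical and the continuous IV. The symbol $\gamma_i$ is introduced here for illustration only and is not part of the notation above:
\[
y_{ij} = \mu + \alpha_{i} + (\beta + \gamma_{i}) x_{ij} + \epsilon_{ij}
\]
Here $\gamma_i$ is the deviation of the slope in the $i$th group from the common slope $\beta$. With R's default treatment coding the first level is the reference, so $\alpha_1 = \gamma_1 = 0$ and the remaining $\alpha_i$ and $\gamma_i$ are differences from that reference group. Setting every $\gamma_i = 0$ recovers the model without interaction.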

Example: Tetrahymena data. Variables: \texttt{glucose} (presence of glucose in the growth medium), \texttt{concentration} (cell concentration) and \texttt{diameter} (cell diameter).

\begin{minted}[breaklines]{R}
library(ggplot2)

# read in the data
tetrahymena <- read.csv2("http://staff.pubhealth.ku.dk/~linearpredictors/datafiles/Tetrahymena.csv",
  sep = ";", dec = ".", header = TRUE,
  colClasses = c("factor", "numeric", "numeric"), na.strings = ".")

# plot the relationship between cell size and concentration for both glucose levels,
# with regression lines as a visual guide
eda <- ggplot(tetrahymena, aes(x = concentration, y = diameter, colour = glucose)) +
  geom_point() + geom_smooth(method = "lm", se = FALSE) +
  theme_light() + scale_color_brewer(palette = "Set1") + facet_grid(~glucose)
eda

# eda is the previously generated plot; redraw it with a log10 x-axis
eda + scale_x_continuous(trans = "log10")

# ANCOVA with an interaction between log10(concentration) and glucose
tet_model1 <- lm(diameter ~ log10(concentration) + glucose + log10(concentration):glucose, data = tetrahymena)
# equivalent to diameter ~ log10(concentration) * glucose
# let's just look at a summary of the model
summary(tet_model1)
\end{minted}

Interpreting the result:
\begin{enumerate}
    \item \texttt{(Intercept)}: the expected diameter without glucose treatment when log10(concentration) is 0, i.e. at a concentration of 1. In other words, this is the intercept of a simple linear regression of diameter on log10(concentration) for the subset of the data without glucose treatment.
    \item \texttt{log10(concentration)}: the slope of a simple linear regression of diameter on log10(concentration) for the subset of the data without glucose treatment.
    \item \texttt{glucose1}: under the glucose treatment and at log10(concentration) equal to 0 (a concentration of 1, not 0), the diameter is 0.755 $\mu$m lower than without glucose treatment. In other words, this is the difference in intercepts between two simple linear regressions of diameter on log10(concentration), one for the samples without glucose and one for the glucose-treated samples.
    \item \texttt{log10(concentration):glucose1}: the slope of diameter on log10(concentration) is higher by 0.1482 in the glucose condition than without glucose treatment. In other words, the slope between log10(concentration) and diameter in the glucose condition is $-3.0092 + 0.1482 = -2.8610$.
\end{enumerate}
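
The way the four coefficients combine into two regression lines can be sketched as follows. The data frame \texttt{sim} is hypothetical simulated data standing in for the Tetrahymena data, with made-up parameter values, so its estimates will not match the numbers quoted above.

\begin{minted}[breaklines]{R}
# Hypothetical simulated data standing in for the Tetrahymena data (values are assumptions)
set.seed(42)
sim <- data.frame(
  glucose = factor(rep(c("0", "1"), each = 30)),
  concentration = rep(10^seq(4, 5.5, length.out = 30), times = 2)
)
sim$diameter <- 37 - 3 * log10(sim$concentration) +
  1.5 * (sim$glucose == "1") + rnorm(60, sd = 0.3)

fit <- lm(diameter ~ log10(concentration) * glucose, data = sim)
b <- coef(fit)

# Reference group (glucose = 0): intercept and slope are read off directly
line_no_glucose <- c(intercept = unname(b["(Intercept)"]),
                     slope = unname(b["log10(concentration)"]))
# Glucose group: add the differences stored in the glucose1 terms
line_glucose <- c(
  intercept = unname(b["(Intercept)"] + b["glucose1"]),
  slope = unname(b["log10(concentration)"] + b["log10(concentration):glucose1"])
)

# Check against a separate regression fitted to the glucose subset only
sub_fit <- lm(diameter ~ log10(concentration), data = subset(sim, glucose == "1"))
all.equal(unname(coef(sub_fit)), unname(line_glucose))
\end{minted}

With the full interaction in the model, the fitted line for each group coincides with a simple linear regression fitted to that group alone, which is what the final comparison checks.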

Simplifying the model: let's drop the interaction term, since we have no reason to believe that the relationship between cell size and concentration differs between the presence and absence of glucose.

\begin{minted}[breaklines]{R}
tet_model2 <- lm(diameter ~ log10(concentration) + glucose, data = tetrahymena)
# let's just look at a summary of the coefficients to be brief
summary(tet_model2)$coefficients

# since the two models are nested, we can compare them to see which is more appropriate;
# for lm fits, anova() performs an F-test
anova(tet_model1, tet_model2)

# note: geom_smooth() fits a separate line per glucose group, not the parallel lines of tet_model2
ggplot(tetrahymena, aes(x = concentration, y = diameter, colour = glucose)) +
  geom_point() + geom_smooth(method = "lm", se = TRUE) +
  theme_light() + scale_color_brewer(palette = "Set1") +
  scale_x_continuous(trans = "log10") + xlab("concentration (log10 scale)")
\end{minted}
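
The nested-model comparison can be reproduced by hand as a sketch. The F-statistic compares the reduction in residual sum of squares per dropped parameter with the residual variance of the larger model. The data frame \texttt{d} and the model names below are hypothetical and use simulated values.

\begin{minted}[breaklines]{R}
# Hypothetical simulated data (all values are illustrative assumptions)
set.seed(7)
d <- data.frame(x = runif(40), g = factor(rep(c("a", "b"), times = 20)))
d$y <- 1 + 2 * d$x + 0.5 * (d$g == "b") + rnorm(40, sd = 0.5)

full <- lm(y ~ x * g, data = d)     # with interaction
reduced <- lm(y ~ x + g, data = d)  # without interaction

# Residual sums of squares of both models
rss_full <- sum(resid(full)^2)
rss_reduced <- sum(resid(reduced)^2)
# Number of parameters dropped when going from the full to the reduced model
df_diff <- df.residual(reduced) - df.residual(full)

# F-statistic and p-value computed by hand
f_stat <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df.residual(full))
p_value <- pf(f_stat, df_diff, df.residual(full), lower.tail = FALSE)

# Same comparison via anova(); the second row holds the test
comparison <- anova(reduced, full)
comparison
\end{minted}

A large p-value suggests that the interaction term does not improve the fit enough to justify keeping it.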

\section{Extensions of the linear model}
Dependence of residuals: to avoid pseudoreplication and to account for correlation in time and space, we use methods called mixed models.

Heterogeneity of variances: we estimate the coefficients with a more general method called generalized least squares (GLS), which allows the error variance to differ between observations.

Non-normality of data/residuals: if a transformation doesn't work (and it won't if your response variable is binary, categorical or ordinal), you can use what are called generalized linear models.

For violations of multiple assumptions there are even broader methods, such as generalized linear mixed models or generalized additive mixed models.
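
As an illustration of the generalized linear model case, here is a minimal sketch of a logistic regression for a binary response using base R's \texttt{glm()}. The variables \texttt{dose} and \texttt{survived} and all parameter values are hypothetical and simulated.

\begin{minted}[breaklines]{R}
# Hypothetical simulated data: binary response depending on a continuous predictor
set.seed(3)
dose <- runif(200, min = 0, max = 10)
survived <- rbinom(200, size = 1, prob = plogis(-2 + 0.5 * dose))

# Logistic regression: binomial family with the default logit link
logit_fit <- glm(survived ~ dose, family = binomial)
coef(logit_fit)

# Predicted probability of the outcome at dose = 4
p_hat <- predict(logit_fit, newdata = data.frame(dose = 4), type = "response")
p_hat
\end{minted}

The coefficients are on the log-odds scale; \texttt{type = "response"} converts predictions back to probabilities.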

\section{MANOVA}
Multivariate ANOVA

\section{Repeated measures MANOVA}

\section{Factorial MANOVA}

\section{Mixed-Design MANOVA}

\section{MANCOVA}
Multivariate ANCOVA

