# 01NAEX - Exercise 02
Data and exercises come from D.C. Montgomery: Design and Analysis of Experiments, Chapter 3.


Get requirements:


In [None]:
list_of_packages <- c("tidyverse", "nortest","lattice","pwr","MASS","agricolae")
missing_packages <- list_of_packages[!(list_of_packages %in% installed.packages()[,"Package"])]
missing_packages


In [None]:
# If you need to check your settings

#getwd()
#print(.libPaths())
#print(sessionInfo())
#print(version)

Installing the agricolae package on Google Colab takes a long time because it pulls in several extra packages, which can occasionally cause problems. Be patient.

In [None]:
if(length(missing_packages)) install.packages(missing_packages)
lapply(list_of_packages, library, character.only = TRUE)

## Assignment:

* Do exercises 3.7, 3.8, 3.9, and 3.10.
* Use R to create and analyze the given designs.



### Exercise 3.07

The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. A completely randomized experiment was conducted and the following data were collected:

| Mixing Technique | Tensile Strength (lb/in²) | | | |
|--------|--------|--------|--------|--------|
| 1      |  3129  |  3000  |  2865  |  2890  |
| 2      |  3200  |  3300  |  2975  |  3150  |
| 3      |  2800  |  2900  |  2985  |  3050  |
| 4      |  2600  |  2700  |  2600  |  2765  |

* Construct a graphical display to compare the mean tensile strengths for the four mixing techniques. What are your conclusions?
* Test the hypothesis that mixing techniques affect the strength of the cement. Use $\alpha = 0.05$.
* Use the Fisher LSD method with $\alpha = 0.05$ to make comparisons between pairs of means.
* Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?
* Plot the residuals versus the predicted tensile strength. Comment on the plot.
* Prepare a scatter plot of the results to aid the interpretation of the results of this experiment.

In [None]:
Ex03_7 <- read.table("https://raw.githubusercontent.com/francji1/01NAEX/main/data/Ex03_7.csv",header=TRUE,sep=";")
head(Ex03_7)
str(Ex03_7)

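library(tidyverse)  # attaches ggplot2 and dplyr
# car provides leveneTest(); install it only if it is missing
if (!requireNamespace("car", quietly = TRUE)) install.packages("car")
library(car)

# Keep the response and the treatment factor under shorter names
data <- Ex03_7 %>% transmute(stren = Tensile_Strength,
                          tech = as.factor(Technique))
# Boxplot per technique, with the group mean marked as a diamond
# (`fun` replaces the deprecated `fun.y` argument of stat_summary)
ggplot(data=data, aes(x=tech, y=stren)) + 
  geom_boxplot() + 
  stat_summary(fun=mean, geom="point", shape=18, size=4)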



The boxplots suggest a clear difference between the mixing techniques. To quantify it, we now fit a one-way ANOVA.

In [None]:
anova <- aov(stren ~ tech, data=data)
summary(anova)

In [None]:
summary(lm(stren ~ tech, data=data))

We reject the null hypothesis of equal mean strengths at the 0.05 level. To find out which pairs of techniques differ significantly, we perform a post-hoc analysis.
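
The F-test behind this conclusion can be reproduced by hand. The sketch below is an illustration only: it types the data in from the table above (assuming the table matches `Ex03_7.csv`) and uses the hypothetical name `cement` so that it runs on its own.

```r
# Cement data typed in from the table in Exercise 3.07 (assumed to match Ex03_7.csv)
cement <- data.frame(
  tech  = factor(rep(1:4, each = 4)),
  stren = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)

a <- nlevels(cement$tech)  # number of treatments
N <- nrow(cement)          # total number of observations

# Mean of each observation's own group, repeated for every row
fitted_means <- ave(cement$stren, cement$tech)
grand_mean   <- mean(cement$stren)

ss_treat <- sum((fitted_means - grand_mean)^2)    # between-technique sum of squares
ss_error <- sum((cement$stren - fitted_means)^2)  # within-technique sum of squares

ms_treat <- ss_treat / (a - 1)
ms_error <- ss_error / (N - a)
f_stat   <- ms_treat / ms_error

# Compare with the 5% critical value and compute the p-value
f_crit  <- qf(0.95, df1 = a - 1, df2 = N - a)
p_value <- pf(f_stat, df1 = a - 1, df2 = N - a, lower.tail = FALSE)

c(F = f_stat, F_crit = f_crit, p = p_value)
```

Rejecting when the F statistic exceeds the critical value is equivalent to rejecting when the p-value is below 0.05; both should agree with the output of `summary(anova)`.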

In [None]:
? LSD.test

In [None]:
# basic Fisher's LSD test (no p-value correction)
out1<-LSD.test(anova,"tech",p.adj="none",console=TRUE)
plot(out1,variation="SD") # variation standard deviation

In [None]:
# Fisher's LSD test with Bonferroni correction
out1<-LSD.test(anova,"tech",p.adj="bonferroni",console=TRUE)
plot(out1,variation="SD") # variation standard deviation

With the Bonferroni correction, Fisher's method indicates significant differences between the means of groups 1 & 4, 2 & 4 and 3 & 4; the uncorrected test flags additional pairs.
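
The effect of the correction can be seen by computing the least significant difference directly. This is a sketch of the textbook formula $t_{\alpha/2,\,N-a}\sqrt{2\,MS_E/n}$, not of the internals of `LSD.test`; the data are typed in from the table above (assumed to match the CSV) and the names `cement` and `comparison` are hypothetical.

```r
# Cement data typed in from the table in Exercise 3.07 (assumed to match Ex03_7.csv)
cement <- data.frame(
  tech  = factor(rep(1:4, each = 4)),
  stren = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)

fit      <- aov(stren ~ tech, data = cement)
df_error <- df.residual(fit)
mse      <- sum(residuals(fit)^2) / df_error  # pooled error variance
n        <- 4                                 # replicates per technique (balanced design)
alpha    <- 0.05

means  <- sapply(split(cement$stren, cement$tech), mean)
pairs  <- t(combn(names(means), 2))           # all pairs of techniques
n_comp <- nrow(pairs)                         # number of comparisons (6)

se_diff  <- sqrt(2 * mse / n)
lsd      <- qt(1 - alpha / 2, df_error) * se_diff             # no correction
lsd_bonf <- qt(1 - alpha / (2 * n_comp), df_error) * se_diff  # alpha split over 6 tests

comparison <- data.frame(
  pair     = paste(pairs[, 1], pairs[, 2], sep = " vs "),
  abs_diff = abs(unname(means[pairs[, 1]] - means[pairs[, 2]]))
)
comparison$lsd_signif  <- comparison$abs_diff > lsd
comparison$bonf_signif <- comparison$abs_diff > lsd_bonf
comparison
```

A pair is declared different when the absolute difference of its sample means exceeds the critical difference, so the larger Bonferroni threshold can only remove pairs from the uncorrected list.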

In [None]:
# after_stat(density) replaces the deprecated ..density.. notation
ggplot(data=data,mapping=aes(x=resid(anova), y=after_stat(density))) +
  geom_histogram(color="gray30",fill="gray70",binwidth=88) 

In [None]:
plot(anova, which=1)

In [None]:
plot(anova, which=2)

In [None]:
plot(anova, which=5)

In [None]:
bartlett.test(stren ~ tech, data=data)
leveneTest(anova)


Neither test provides enough evidence to reject the hypothesis of equal variances across the groups.
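
The two tests above address only the equal-variance assumption. The normality assumption, which the exercise also asks about, can be checked with a formal test in addition to the Q-Q plot. The sketch below uses the Shapiro-Wilk test from base R as one possible choice; the data are typed in from the table above (assumed to match the CSV) under the hypothetical name `cement`.

```r
# Cement data typed in from the table in Exercise 3.07 (assumed to match Ex03_7.csv)
cement <- data.frame(
  tech  = factor(rep(1:4, each = 4)),
  stren = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)

fit <- aov(stren ~ tech, data = cement)
res <- residuals(fit)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normally distributed
sw <- shapiro.test(res)
sw

# Normal probability plot of the residuals with a reference line
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)
```

With only 16 residuals the test has little power, so a large p-value should be read as "no evidence against normality" rather than as proof of it.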

In [None]:
ggplot(data, aes(x=tech, y=stren, color=tech)) +
  geom_point(size=5, shape=16)

The scatter plot suggests that the only clear differences are between groups 1 & 4, 2 & 4 and 3 & 4.


### Exercises 3.08 and 3.09 

Reconsider the experiment in Problem 3.07. 

* Rework part (3) of Problem 3.07 using Tukey’s test with $\alpha = 0.05$. Do you get the same conclusions from Tukey’s test that you did from the graphical procedure and/or the Fisher LSD method?
* Explain the difference between the Tukey and Fisher procedures.
* Find a 95 percent confidence interval on the mean tensile strength of the Portland cement produced by each of the four mixing techniques. Also find a 95 percent confidence interval on the difference in means for techniques 1 and 3. Does this aid you in interpreting the results of the experiment?

In [None]:
thsd <- TukeyHSD(anova, ordered=FALSE, conf.level=0.95)
thsd
plot(thsd)

Tukey's HSD method finds significant differences in means between groups 1 & 4, 2 & 4 and 3 & 4. This agrees with the graphical comparison and with Fisher's LSD method under the Bonferroni correction, but differs from the result of the uncorrected Fisher LSD method.

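Fisher's LSD method is essentially a set of pairwise t-tests that use the pooled error variance from the ANOVA (so every test has the error degrees of freedom $N - a$). Because each comparison is run at level $\alpha$, the probability of at least one type I error grows with the number of comparisons. Tukey's method instead keeps the family-wise error rate at the chosen level by using the studentized range distribution.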
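
The difference between the two procedures can be made concrete by comparing their critical differences. The sketch below assumes a balanced design and the table data from Exercise 3.07 (typed in under the hypothetical name `cement`); the family-wise error figure treats the six comparisons as independent, which they are not, so it is only a rough indication.

```r
# Cement data typed in from the table in Exercise 3.07 (assumed to match Ex03_7.csv)
cement <- data.frame(
  tech  = factor(rep(1:4, each = 4)),
  stren = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)

fit      <- aov(stren ~ tech, data = cement)
a        <- nlevels(cement$tech)  # number of techniques
n        <- 4                     # replicates per technique
df_error <- df.residual(fit)
mse      <- sum(residuals(fit)^2) / df_error
alpha    <- 0.05

# Fisher: t quantile, each comparison tested at level alpha
lsd <- qt(1 - alpha / 2, df_error) * sqrt(2 * mse / n)

# Tukey: studentized range quantile, all comparisons tested jointly at level alpha
hsd <- qtukey(1 - alpha, nmeans = a, df = df_error) * sqrt(mse / n)

# Rough family-wise error rate of uncorrected testing (independence assumed)
n_comp              <- choose(a, 2)
fwer_if_independent <- 1 - (1 - alpha)^n_comp

c(LSD = lsd, HSD = hsd, FWER_if_independent = fwer_if_independent)
```

Tukey's threshold is the larger of the two, which is why pairs whose mean difference falls between the two values are significant under the uncorrected LSD method but not under Tukey's test.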

In [None]:
confint(lm(stren~tech-1, data=data), level=0.95)  # get rid of intercept to get CI for all means

In [None]:

confint(lm(stren~tech, data=data), level=0.95) # technique 1 is the reference level, so the tech_i rows are 95% CIs for tech_i - tech_1 (row tech3: technique 3 minus technique 1)
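
The intervals returned by `confint` can be cross-checked with the usual formulas $\bar{y}_{i.} \pm t_{\alpha/2,\,N-a}\sqrt{MS_E/n}$ for a treatment mean and $\bar{y}_{1.} - \bar{y}_{3.} \pm t_{\alpha/2,\,N-a}\sqrt{2\,MS_E/n}$ for the difference. This is a sketch on the table data from Exercise 3.07 (typed in under the hypothetical name `cement`, assumed to match the CSV).

```r
# Cement data typed in from the table in Exercise 3.07 (assumed to match Ex03_7.csv)
cement <- data.frame(
  tech  = factor(rep(1:4, each = 4)),
  stren = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)

fit      <- lm(stren ~ tech, data = cement)
df_error <- df.residual(fit)
mse      <- sum(residuals(fit)^2) / df_error  # pooled error variance
n        <- 4                                 # replicates per technique
t_crit   <- qt(0.975, df_error)

means <- sapply(split(cement$stren, cement$tech), mean)

# 95% CI for each treatment mean
half_width <- t_crit * sqrt(mse / n)
ci_means   <- cbind(lower = means - half_width, upper = means + half_width)
ci_means

# 95% CI for the difference mu_1 - mu_3
diff_13    <- unname(means["1"] - means["3"])
ci_diff_13 <- diff_13 + c(-1, 1) * t_crit * sqrt(2 * mse / n)
ci_diff_13
```

The interval for the difference is written here as technique 1 minus technique 3, so its endpoints have the opposite sign to the `tech3` row of the `confint` output above. An interval that contains zero indicates that the two techniques cannot be distinguished at the 5% level.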

### Exercise 3.10

A product developer is investigating the tensile strength
of a new synthetic fiber that will be used to make cloth for
men’s shirts. Strength is usually affected by the percentage of
cotton used in the blend of materials for the fiber. The engineer
conducts a completely randomized experiment with five levels
of cotton content and replicates the experiment five times.


* Is there evidence to support the claim that cotton content
affects the mean tensile strength? Use $\alpha = 0.05$.
* Use the Fisher LSD method to make comparisons
between the pairs of means. What conclusions can you
draw?
* Analyze the residuals from this experiment and comment
on model adequacy.


In [None]:
Ex03_10 <- read.table("https://raw.githubusercontent.com/francji1/01NAEX/main/data/Ex03_10.csv",header=TRUE,sep=";")
head(Ex03_10)
str(Ex03_10)

In [None]:
data2 <- Ex03_10 %>% transmute(percentage = as.factor(Cotton_Weight),
                          strength = Observations)

In [None]:
ggplot(data=data2, aes(x=percentage, y=strength)) + 
  geom_boxplot() + 
  geom_point(size=3, shape=16, color='red') +
  # group means as purple diamonds (`fun` replaces the deprecated `fun.y`)
  stat_summary(fun=mean, geom="point", shape=18, size=4, color='purple')

In [None]:
summary(lm(strength ~ percentage, data=data2))

The p-value of the F-test is practically zero (far below 0.05), so we reject the null hypothesis of equal means: at least two cotton percentages differ in mean tensile strength.
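
A p-value printed as 0 is only rounded; the exact value can be recovered from the F statistic stored in the model summary. The helper `overall_f_pvalue` below is a hypothetical name introduced for illustration. So that the sketch runs without the download, it is demonstrated on the cement data from the table in Exercise 3.07 as a stand-in; for this exercise it would be called on `lm(strength ~ percentage, data=data2)`.

```r
# Hypothetical helper: p-value of the overall F-test of a fitted linear model
overall_f_pvalue <- function(fit) {
  f <- summary(fit)$fstatistic  # named vector: value, numdf, dendf
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

# Stand-in data: cement strengths typed in from the table in Exercise 3.07
cement <- data.frame(
  tech  = factor(rep(1:4, each = 4)),
  stren = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)

fit <- lm(stren ~ tech, data = cement)
p   <- overall_f_pvalue(fit)

# Print with enough digits to see that the value is small but not zero
format.pval(p, digits = 3)
```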

In [None]:
out2<-LSD.test(aov(strength ~ percentage, data=data2),"percentage",p.adj="none",console=TRUE)
plot(out2,variation="SD") # variation standard deviation

In [None]:
bartlett.test(strength ~ percentage, data=data2)
leveneTest(strength ~ percentage, data=data2)

In [None]:
plot(lm(strength ~ percentage, data=data2))

In [None]:
# after_stat(density) replaces the deprecated ..density.. notation
ggplot(data=data2,mapping=aes(x=resid(lm(strength ~ percentage, data=data2)),
 y=after_stat(density))) +
  geom_histogram(color="gray30",fill="gray70",binwidth=2.5) 

The graphical checks and the statistical tests show no obvious problem with equality of variances or normality of the residuals, and the residual plots show no trend. Independence of the measurements is hard to test, if it can be tested at all.

Overall, the analysis suggests a roughly linear increase in tensile strength with cotton percentage over the range 15-30 %, but somewhere between 30 % and 35 % there is a threshold beyond which a further increase in cotton percentage reduces tensile strength.