# **Exercise Session 7**
# Developed by Biljana Jonoska Stojkova, PhD
# Revised by Johnson Chen

## **Lecture 7 - Hypothesis Testing: Basic Concepts and Basic Tests (Non-Parametric Tests, One-Way ANOVA) and Their Assumptions**

Today's exercise will focus on formulating predefined research questions, and we will practice determining which of the statistical concepts discussed in today's lecture are applicable to the Plant Growth dataset.

We will keep the same teams as on previous days. Each team will continue working on the Plant Growth dataset. Each team member will answer the questions and upload their own Jupyter Notebook to Canvas.

### **Today's Learning Goals:**

- Explore the Plant Growth dataset and define research questions, then translate them into statistical hypotheses.
- Determine what kind of exploratory data analysis plot would be relevant for exploring the statistical hypothesis.
- Identify which statistical concepts discussed today can be useful in testing the statistical hypothesis you formulated on the Plant Growth dataset.
- Assess whether One-Way ANOVA is appropriate to handle the complexities in the Plant Growth dataset for both primary and secondary hypotheses.
- Learn how to interpret the results from One-Way ANOVA based on your defined statistical hypothesis.
- Ensure you upload your Jupyter Notebook at the end of the day.

### **Tasks for All Teams**

You will continue working within your team.

- Define a statistical hypothesis.
- Determine an appropriate graph to explore your statistical hypothesis.
- Determine whether One-Way ANOVA is appropriate to answer your research question.
- Justify your decision based on the research question of interest and the data structure.
- Discuss which assumptions may be violated and how useful and trustworthy the results would be.

**Assumptions of One-Way ANOVA:**

AA1. Comparison groups are two or more uncorrelated samples.

AA2. The outcome variable of interest is normally distributed within each comparison group.

AA3. The outcome variable of interest has equal variance in each of the comparison groups (ANOVA is more robust to unequal variances when the group sample sizes are equal).

AA4. Independent observations: each comparison group represents a simple random sample from its respective population, and the groups are independent of each other.

In [None]:
#Run this code
library(tidyverse)
data("PlantGrowth")
head(PlantGrowth)

#### **Teams 1 - 11:**

Your task is to examine the dataset and formulate statistical hypotheses. Start by running the code cells below.

**Q1.** How many comparison groups are there in the Plant Growth dataset? State the group names clearly in your answer below.


In [None]:
#Run this code to answer Q1
unique(PlantGrowth$group)
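
As an optional extension, the short base-R sketch below (assuming only the built-in `PlantGrowth` data) tabulates the sample size, mean and standard deviation of dried weight in each group; these numbers are also handy later for Q3 and for assumption AA3. The object names are illustrative and not part of the original exercise.

```r
# Sketch: per-group sample size, mean and SD of dried weight (base R only)
data("PlantGrowth")

n_per_group    <- table(PlantGrowth$group)                            # plants per group
mean_per_group <- tapply(PlantGrowth$weight, PlantGrowth$group, mean) # group means
sd_per_group   <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)   # group SDs

print(round(cbind(n = n_per_group, mean = mean_per_group, sd = sd_per_group), 3))
```

Equal group sizes are worth noting here, since they matter for how sensitive ANOVA is to unequal variances.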

**Answer 1.**

**Q2.** Are the comparison groups uncorrelated or paired?

In [None]:
#Run this code to answer Q2
help("PlantGrowth")

You can download the book cited in the help page <a href = "../books/978-1-4899-3174-0.pdf"> here </a>



**Answer 2.**


**Q3.** Formulate a statistical hypothesis.

Hint: The experiment was run to test how each of the two treatments (trt1 and trt2) affects the dried weight of plants. Define a statistical hypothesis for an omnibus test that compares the plant dried weight across all three groups at once. State the ANOVA-type statistical hypothesis clearly.

In [None]:
#Run this code for Q3 (to help formulate the statistical hypothesis)
g1 <- PlantGrowth %>%
  ggplot(aes(x = group, y = weight)) +
  geom_boxplot() +
  labs(title = "Boxplots of dried weight of plants for each group")
g1


**Answer 3.**  


**Q4.** Determine whether the One-Way ANOVA assumptions are met for the Plant Growth dataset (run the ANOVA analysis code).

In [None]:
# Run the ANOVA analysis for Q4
m1 <- aov(weight ~ group, data = PlantGrowth)
m1
# summary() adds the F statistic and p-value needed for Q5
summary(m1)
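
The ANOVA table reports an F statistic. To show where it comes from, the base-R sketch below recomputes it from the between-group and within-group sums of squares and compares it with `summary(aov(...))`; the variable names are illustrative.

```r
# Sketch: reproduce the one-way ANOVA F statistic by hand (base R only)
data("PlantGrowth")
y <- PlantGrowth$weight
g <- PlantGrowth$group

k <- nlevels(g)             # number of groups
N <- length(y)              # total number of plants
grand_mean  <- mean(y)
fitted_mean <- ave(y, g)    # each plant's own group mean

ss_between <- sum((fitted_mean - grand_mean)^2)  # between-group sum of squares
ss_within  <- sum((y - fitted_mean)^2)           # within-group (residual) sum of squares

F_stat  <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_value <- pf(F_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
print(c(F = F_stat, p = p_value))

# Compare with the table produced by summary(aov(...))
aov_table <- summary(aov(weight ~ group, data = PlantGrowth))[[1]]
print(aov_table)
```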

**Answer 4. AA1. Comparison groups are two or more uncorrelated samples.**  
This question can be answered based on the experimental design. 

**Answer 4. AA2. The outcome variable of interest is normally distributed within each comparison group.**

In [None]:
#Run this code to examine assumption AA2 (normality within each group)

g2 <- PlantGrowth %>%
  ggplot(aes(x = weight)) +
  geom_histogram(bins = 5) +  # few bins, since each group has only 10 plants
  facet_wrap(~group) +
  labs(title = "Distribution of plant weight for each group")
g2
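
Histograms of 10 plants per group are hard to judge by eye. A possible complement, sketched below with base R's `shapiro.test()`, `qqnorm()` and `qqline()`, checks normality within each group and on the pooled ANOVA residuals. With such small groups these tests have low power, so a non-significant result is weak evidence of normality rather than proof of it.

```r
# Sketch: numerical and graphical checks of AA2 (base R only)
data("PlantGrowth")

# Shapiro-Wilk p-value for the weights within each group
sw_p_by_group <- sapply(split(PlantGrowth$weight, PlantGrowth$group),
                        function(x) shapiro.test(x)$p.value)
print(round(sw_p_by_group, 3))

# Pooled check on the ANOVA residuals (each weight minus its group mean)
fit <- aov(weight ~ group, data = PlantGrowth)
res <- residuals(fit)
qqnorm(res, main = "Normal Q-Q plot of one-way ANOVA residuals")
qqline(res)
sw_resid <- shapiro.test(res)
print(sw_resid)
```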

**Answer 4. AA3. The outcome variable of interest has equal variance in each of the comparison groups (ANOVA is more robust to unequal variances when the group sample sizes are equal).**

This question can be answered from the boxplots and by examining the sample sizes in each group. 

In [None]:
#run this code to examine sample sizes in each group
PlantGrowth %>% count(group)
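
Besides the boxplots, one hedged way to probe AA3 is sketched below: compare the largest and smallest group standard deviations (a common informal rule of thumb is to worry when the ratio is well above 2), run Bartlett's test with `bartlett.test()`, and, as a sensitivity check, fit Welch's one-way ANOVA with `oneway.test(..., var.equal = FALSE)`, which does not assume equal variances. Note that Bartlett's test is itself sensitive to non-normality.

```r
# Sketch: checks related to AA3 (equal variances), base R only
data("PlantGrowth")

group_sd <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)
sd_ratio <- max(group_sd) / min(group_sd)  # informal rule of thumb: worry if well above 2
print(round(group_sd, 3))
print(round(sd_ratio, 2))

# Bartlett's test of equal variances (sensitive to non-normality)
bartlett_res <- bartlett.test(weight ~ group, data = PlantGrowth)
print(bartlett_res)

# Welch's one-way ANOVA, which does not assume equal variances
welch_res <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = FALSE)
print(welch_res)
```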

**Answer 4. AA4. Independent observations: each comparison group represents a simple random sample from its respective population, and the groups are independent of each other.** 

This question can be answered based on the experimental design. 
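
If the normality assumption looks doubtful, the Kruskal-Wallis rank-sum test is the usual non-parametric counterpart of the one-way ANOVA. The sketch below runs it with base R's `kruskal.test()`. It tests whether the three groups come from the same distribution rather than comparing means directly, and it still relies on independent observations (AA1 and AA4).

```r
# Sketch: rank-based alternative to the one-way ANOVA F test (base R only)
data("PlantGrowth")
kw_res <- kruskal.test(weight ~ group, data = PlantGrowth)
print(kw_res)
```

Comparing its conclusion with the ANOVA result is one way to judge how much the normality assumption matters here.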

**Q5.** Interpret the results from ANOVA for the primary statistical hypothesis.

**Hint**: Remember that the omnibus test in ANOVA tests the hypothesis H0: all the group means are the same, versus HA: at least one group mean differs from the others. The statistical problem was formulated according to this hypothesis. What kind of statement is now possible from the One-Way ANOVA?
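
Written in symbols (the notation $\mu_{\text{ctrl}}$, $\mu_{\text{trt1}}$, $\mu_{\text{trt2}}$ for the population mean dried weights is introduced here for illustration), the omnibus hypotheses and the F statistic used to test them are:

$$
H_0:\ \mu_{\text{ctrl}} = \mu_{\text{trt1}} = \mu_{\text{trt2}}
\qquad \text{vs.} \qquad
H_A:\ \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j
$$

$$
F = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
$$

where $k = 3$ groups and $N = 30$ plants, so under $H_0$ and assumptions AA1 to AA4 the statistic follows an $F$ distribution with $2$ and $27$ degrees of freedom.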


**Answer 5.**


**Q6.** Formulate secondary and tertiary statistical hypotheses.

Hint: Is it possible to conduct pairwise comparisons? For example, the secondary hypotheses could compare trt1 with ctrl and trt2 with ctrl, and the tertiary hypothesis could compare trt1 with trt2.
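
Using similar notation ($\mu$ denotes a population mean dried weight; the symbols are illustrative), the pairwise hypotheses suggested in the hint could be written as two-sided tests, matching the default of `t.test()`:

$$
\begin{aligned}
&\text{Secondary:} && H_0: \mu_{\text{trt1}} = \mu_{\text{ctrl}} \ \text{vs.}\ H_A: \mu_{\text{trt1}} \neq \mu_{\text{ctrl}} \\
& && H_0: \mu_{\text{trt2}} = \mu_{\text{ctrl}} \ \text{vs.}\ H_A: \mu_{\text{trt2}} \neq \mu_{\text{ctrl}} \\
&\text{Tertiary:} && H_0: \mu_{\text{trt1}} = \mu_{\text{trt2}} \ \text{vs.}\ H_A: \mu_{\text{trt1}} \neq \mu_{\text{trt2}}
\end{aligned}
$$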



In [None]:
# Run the first of the two secondary pairwise hypotheses: trt1 versus ctrl
trt1_ctrl <- t.test(weight ~ group, data = PlantGrowth %>% filter(group != "trt2"), var.equal = TRUE)
print(trt1_ctrl)

In [None]:
# Run the second of the two secondary pairwise hypotheses: trt2 versus ctrl
trt2_ctrl <- t.test(weight ~ group, data = PlantGrowth %>% filter(group != "trt1"), var.equal = TRUE)
print(trt2_ctrl)

In [None]:
# Run the tertiary pairwise hypothesis: trt1 versus trt2
trt1_trt2 <- t.test(weight ~ group, data = PlantGrowth %>% filter(group != "ctrl"), var.equal = TRUE)
print(trt1_trt2)
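
The three `t.test()` calls above each estimate the variance from only the two groups involved. A possible alternative, sketched below with base R's `pairwise.t.test()`, runs all three comparisons at once using the pooled within-group standard deviation from all three groups (as the ANOVA does), once without adjustment and once with Holm's adjustment for multiple comparisons.

```r
# Sketch: all three pairwise comparisons at once (base R only)
data("PlantGrowth")

# pool.sd = TRUE uses the pooled within-group SD from all three groups (df = 27),
# unlike the separate two-group t.test() calls above
pw_unadjusted <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                                 pool.sd = TRUE, p.adjust.method = "none")
pw_holm <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                           pool.sd = TRUE, p.adjust.method = "holm")
print(pw_unadjusted)
print(pw_holm)
```

Holm-adjusted p-values are never smaller than the unadjusted ones, which is the price paid for controlling the family-wise Type I error rate.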

**Answer 6.** 

**Bonus:**

**Q7.** Interpret the results from the post-hoc pairwise comparisons for the secondary and tertiary statistical hypotheses.

**Q8.** Discuss the implications of multiple comparisons in the above analysis. How does the setup of the hypotheses affect the strength of the evidence (i.e., is the Type I error rate inflated)?
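
A quick calculation illustrates the issue, assuming each of the three pairwise tests is run at $\alpha = 0.05$. Because the comparisons share groups, they are not independent, so $1 - (1 - \alpha)^m$ is only an approximation here; the Bonferroni bound $m\alpha$ holds regardless of dependence.

```r
# Sketch: family-wise Type I error for m = 3 tests, each at alpha = 0.05
alpha <- 0.05
m <- 3  # two secondary + one tertiary comparison

fwer_if_independent <- 1 - (1 - alpha)^m  # exact only if the tests were independent
bonferroni_bound    <- min(1, m * alpha)  # upper bound that holds regardless of dependence
alpha_bonferroni    <- alpha / m          # per-test level that keeps FWER <= alpha

print(c(fwer_if_independent = fwer_if_independent,
        bonferroni_bound    = bonferroni_bound,
        alpha_bonferroni    = alpha_bonferroni))
```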

**Q9.** Can you estimate the differences in the mean plant weight between the groups with ANOVA?
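
The F test by itself only says whether the means differ, but the fitted one-way model does estimate the group means. One hedged way to obtain the pairwise differences with simultaneous 95% confidence intervals is Tukey's HSD via base R's `TukeyHSD()`, sketched below; the `lm()` coefficients give the same trt1 and trt2 differences relative to ctrl.

```r
# Sketch: estimated mean differences with simultaneous 95% CIs (base R only)
data("PlantGrowth")
fit <- aov(weight ~ group, data = PlantGrowth)

tukey_res <- TukeyHSD(fit, "group", conf.level = 0.95)
print(tukey_res)

# The same point estimates relative to ctrl appear as lm() coefficients
print(coef(lm(weight ~ group, data = PlantGrowth)))
```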

Type your answers in the cells below:

**Answer 7.**


**Answer 8.**

**Answer 9.**

**Upload your work from Lecture 7 Exercise session**

**Note.** A Jupyter Notebook is acceptable for the class participation mark. 
          Please make sure you save your Jupyter Notebook with your answers.

- Each student will upload their Jupyter Notebook to Canvas Course 1 (https://canvas.ubc.ca/courses/144703), using this file name format:

 `[Lecture_7_Exercise_Session 7]_[TeamNumber]_[student name].ipynb`
e.g., `Lecture_7_Exercise_Session 7_Team21_Biljana_Jonoska_Stojkova.ipynb`

- In the notebook title, please state who was responsible for writing each paragraph.

Navigate to the Assignments section on Canvas Course 1, and upload the Jupyter Notebook under:
`Class Participation\Lecture 7 - Hypothesis testing, basic concepts, basic tests (non-parametric tests, One-Way ANOVA) and their assumptions ` 
