-
Notifications
You must be signed in to change notification settings - Fork 0
/
Analyzing_ToothData(Appendix).Rmd
143 lines (97 loc) · 5.78 KB
/
Analyzing_ToothData(Appendix).Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
---
title: "Analyzing ToothGrowth Data"
author: "Harish Kumar Rongala"
date: "October 23, 2016"
output:
html_document: default
pdf_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## 1. Overview
This document aims at drawing conclusions on the data using statistical tests. The experimental data used in this document is **ToothGrowth** data set from *datasets* package in R. The following steps will complete this process.
1. Perform Exploratory Data Analysis on the data
2. Perform T-tests
3. Infer from the tests and Draw conclusions
4. Finally, list the assumptions
## 2. Dataset description
The variable of interest is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).
## 3. Exploratory data analysis
### 3.1. Structure of the data
Using R's **str** function, we can look at the structure of ToothGrowth data
```{r echo=FALSE}
str(ToothGrowth)
```
So, we are dealing with rather smaller data set. We have 60 observations with 3 variables to deal with. Among them, two are numeric/continuous and one is categorical/discrete. Let's dive in and look at more quantitative details, using R's **summary** function.
```{r echo=FALSE}
summary(ToothGrowth)
```
We have 30 observations for each type of supplement. Interesting values like mean, median, quantiles of the variables are listed. However, this may not make sense when individually observed. So, we move further and combine these variables to unearth any interesting findings.
### 3.2. Finding patterns
If we understand the experiment, **len** (Tooth length) is something we are interested in. We are trying to discover the affects of the supplements, **Orange Juice (OJ)** and **Ascorbic acid (VC)** given in different doses, on tooth growth.
Let's quickly plot the affect of supplements in different doses on tooth length. we group the tooth growth by supplement to closely observe the affect of each supplement with increase in the dose.
```{r echo=FALSE, warning=FALSE}
library(ggplot2)
library(gridExtra)
tdata<-ToothGrowth
tdata$dose<-as.factor(tdata$dose)
#plot2<-ggplot(tdata,aes(supp,len))+geom_boxplot(aes(fill=supp))+facet_grid(.~dose)+labs(x="Supplement",y="Tooth length",title="Tooth growth grouped by dose")
plot1<-ggplot(tdata,aes(dose,len))+geom_boxplot(aes(fill=dose))+facet_grid(.~supp)+labs(x="Dose",y="Tooth length",title="Tooth growth grouped by supplement")
print(plot1)
```
**Observation**: From the above plot, we notice that both Orange Juice (OJ) and ascorbic acid (VC) increase the tooth growth as the dose increases. Although for dose 0.5 and 1, Orange Juice (OJ) seems to contribute more to tooth length than ascorbic acid (VC). However we cannot conclude this as the results are inconsistent.
## 4. Statistical testing
Since we have a total of 60 observations, which is relatively very less to make a proper estimate. We can go with Gosset's (Student's) t-test which can help deal with smaller sample size.
**T-test for length and supplement**
```{r echo=TRUE}
#T-test for length and supplement
t.test(data=ToothGrowth, len~supp, var.equal=FALSE, paired=FALSE)$conf
```
**Result:** Our 95% Confidence interval **-0.17 to 7.57** contains **0**, which means supplementary has no significant affect on the tooth growth.
**T-test for length and dose**
As the grouping factor must have exactly two levels, we cannot test for all 3 dose values (0.5, 1, 2) at once. We will consider two at a time
```{r echo=T}
#T-test for length and dose (0.5,1)
t.test(data=subset(ToothGrowth,dose!=2), len~dose, var.equal=FALSE, paired=FALSE)$conf
```
**Result:** Our 95% Confidence interval is **-11.98 to -6.27**, which means dose **1** has more affect than **0.5** on the tooth growth.
```{r echo=T}
#T-test for length and dose (1,2)
t.test(data=subset(ToothGrowth,dose!=0.5), len~dose, var.equal=FALSE, paired=FALSE)$conf
```
**Result:** Our 95% Confidence interval is **-8.99 to -3.73**, which means dose **2** has more affect than **1** on the tooth growth.
```{r echo=T}
#T-test for length and dose (0.5,2)
t.test(data=subset(ToothGrowth,dose!=1), len~dose, var.equal=FALSE, paired=FALSE)$conf
```
**Result:** Our 95% Confidence interval is **-18.15 to -12.83**, which means dose **2** has more affect than **0.5** on the tooth growth.
## 5. Conclusion
From our Exploratory data analysis and T-tests, we can draw following conclusions about the Tooth Growth data
* There is no significant difference between Orange Juice (OJ) and ascorbic acid (VC) in the case of tooth growth.
* Increase in the dosage of supplement increases the tooth growth.
## 6. Assumptions
The above conclusions are made on the following assumptions
* Data is independent, Guinea pigs are randomly selected
* Data is not paired, No guinea pig took both supplements
## 7. Appendix
Code for plot in the Exploratory data analysis section can be found here.
```{r eval=FALSE}
library(ggplot2)
tdata<-ToothGrowth
# Convert variable dose into factor to better group in the plots
tdata$dose<-as.factor(tdata$dose)
#Tooth growth grouped by supplement
plot1<-ggplot(tdata,aes(dose,len))+geom_boxplot(aes(fill=dose))+facet_grid(.~supp)+labs(x="Dose",y="Tooth length",title="Tooth growth grouped by supplement")
print(plot1)
```
To closely compare the tooth growth by supplement, we could consider the following plot grouped by dose
```{r }
library(ggplot2)
tdata<-ToothGrowth
# Convert variable dose into factor to better group in the plots
tdata$dose<-as.factor(tdata$dose)
#Tooth growth grouped by dose
plot2<-ggplot(tdata,aes(supp,len))+geom_boxplot(aes(fill=supp))+facet_grid(.~dose)+labs(x="Supplement",y="Tooth length",title="Tooth growth grouped by dose")
print(plot2)
```