# Hypothesis Testing

In [1]:
set.seed(37)

## Student's t-test

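The `Student's t-test` compares the means of two samples to test whether they differ. Here is a `two-sided` Student's t-test. By default, `t.test` does not assume equal variances, which is why the output is labeled `Welch Two Sample t-test`.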

In [2]:
x <- rnorm(1000, mean=0, sd=1)
y <- rnorm(1000, mean=1, sd=1)

r <- t.test(x, y, alternative='two.sided')
print(r)


	Welch Two Sample t-test

data:  x and y
t = -23.159, df = 1998, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.1425178 -0.9641235
sample estimates:
  mean of x   mean of y 
-0.01839959  1.03492108 


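The value returned by `t.test` is a list of class `htest`, so individual results can be read from it directly instead of from the printed summary. The following is a minimal sketch under that assumption; the variable names `welch` and `student` are illustrative and not part of the original notebook. It also shows `var.equal=TRUE`, which requests the pooled-variance Student's test in place of the Welch default.

```r
set.seed(37)
x <- rnorm(1000, mean=0, sd=1)
y <- rnorm(1000, mean=1, sd=1)

# default: Welch's t-test (variances are not assumed equal)
welch <- t.test(x, y, alternative='two.sided')

# var.equal=TRUE pools the two variances (classic Student's t-test)
student <- t.test(x, y, alternative='two.sided', var.equal=TRUE)

print(welch$statistic)    # t statistic
print(welch$parameter)    # degrees of freedom (Welch approximation)
print(welch$p.value)      # p-value
print(welch$conf.int)     # confidence interval for the difference in means
print(student$parameter)  # n_x + n_y - 2 degrees of freedom when pooling
```

With equal sample sizes, as here, the two variants give the same t statistic and differ only in their degrees of freedom.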

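Here is a directional (one-sided) Student's t-test of whether the mean of `x` is greater than the mean of `y`. Because `x` is drawn with the smaller mean, the data give no support to this alternative and the p-value is 1.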

In [3]:
x <- rnorm(1000, mean=0, sd=1)
y <- rnorm(1000, mean=1, sd=1)

r <- t.test(x, y, alternative='greater')
print(r)


	Welch Two Sample t-test

data:  x and y
t = -22.576, df = 1991.2, p-value = 1
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 -1.118479       Inf
sample estimates:
 mean of x  mean of y 
0.01325957 1.05574987 



Here is a directional (one-sided) Student's t-test of whether the mean of `x` is less than the mean of `y`.

In [4]:
x <- rnorm(1000, mean=0, sd=1)
y <- rnorm(1000, mean=1, sd=1)

r <- t.test(x, y, alternative='less')
print(r)


	Welch Two Sample t-test

data:  x and y
t = -22.097, df = 1996.7, p-value < 2.2e-16
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
       -Inf -0.9224035
sample estimates:
 mean of x  mean of y 
0.01069279 1.00731729 


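The three values of `alternative` are closely related when they are applied to the same data. As a rough sketch (the names `p_two`, `p_greater`, and `p_less` are illustrative), the two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them.

```r
set.seed(37)
x <- rnorm(1000, mean=0, sd=1)
y <- rnorm(1000, mean=1, sd=1)

# run all three alternatives on the same pair of samples
p_two     <- t.test(x, y, alternative='two.sided')$p.value
p_greater <- t.test(x, y, alternative='greater')$p.value
p_less    <- t.test(x, y, alternative='less')$p.value

# the two one-sided p-values are complementary
print(p_greater + p_less)

# the two-sided p-value is twice the smaller one-sided p-value
print(all.equal(p_two, 2 * min(p_greater, p_less)))
```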

We may also perform a `one-sample` Student's t-test, which compares the mean of a single sample against a hypothesized value `mu`.

In [5]:
x <- rnorm(1000, mean=0, sd=1)

r <- t.test(x, mu=5)
print(r)


	One Sample t-test

data:  x
t = -159.87, df = 999, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 5
95 percent confidence interval:
 -0.13452024 -0.01000024
sample estimates:
  mean of x 
-0.07226024 


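The one-sample statistic is the distance between the sample mean and `mu`, measured in standard errors. The sketch below recomputes it by hand as a check; `t_manual` is an illustrative name.

```r
set.seed(37)
x <- rnorm(1000, mean=0, sd=1)
mu <- 5

# t = (sample mean - hypothesized mean) / standard error of the mean
n <- length(x)
t_manual <- (mean(x) - mu) / (sd(x) / sqrt(n))

r <- t.test(x, mu=mu)
print(c(manual=t_manual, t.test=unname(r$statistic)))
print(all.equal(t_manual, unname(r$statistic)))
```

Note that the reported confidence interval is for the true mean of `x`, so it is centered on the sample mean and does not move when `mu` changes.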

If your data is in long format, you may use a formula to perform a Student's t-test.

In [6]:
data <- data.frame(
    score = c(90, 89, 70, 99, 100, 77, 80, 67, 70),
    gender = c(rep('girl', 5), rep('boy', 4))
)

r <- t.test(score ~ gender, data=data)
print(r)


	Welch Two Sample t-test

data:  score by gender
t = -2.6069, df = 6.0971, p-value = 0.0397
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -31.15404  -1.04596
sample estimates:
 mean in group boy mean in group girl 
              73.5               89.6 


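The formula interface is a convenience over the two-vector call. As a sketch, splitting the scores by hand and passing the groups in factor-level order (`boy` before `girl`, alphabetically) should reproduce the same result; `boys`, `girls`, `r_formula`, and `r_vectors` are illustrative names.

```r
data <- data.frame(
    score = c(90, 89, 70, 99, 100, 77, 80, 67, 70),
    gender = c(rep('girl', 5), rep('boy', 4))
)

# formula interface on long-format data
r_formula <- t.test(score ~ gender, data=data)

# the same test on two explicitly extracted vectors, boy first to match level order
boys  <- data$score[data$gender == 'boy']
girls <- data$score[data$gender == 'girl']
r_vectors <- t.test(boys, girls)

print(all.equal(unname(r_formula$statistic), unname(r_vectors$statistic)))
print(all.equal(r_formula$p.value, r_vectors$p.value))
```

The group order determines the sign of the statistic: the difference is taken as the first level minus the second.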

## Wilcoxon U-Test

The `Wilcoxon U-Test` (also known as the Wilcoxon rank-sum test or the Mann-Whitney U test) is a non-parametric test used to compare two samples. The function `wilcox.test` takes its arguments in the same way as the `t.test` function.

In [7]:
x <- rnorm(1000, mean=0, sd=1)
y <- rnorm(1000, mean=0.5, sd=1)

r <- wilcox.test(x, y)
print(r)


	Wilcoxon rank sum test with continuity correction

data:  x and y
W = 339274, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0


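The statistic `W` has a direct interpretation: with no ties, it is the number of pairs in which a value from `x` exceeds a value from `y`. The sketch below checks this on smaller samples (sizes 50 and 60 are chosen here only to keep the pairwise comparison cheap); `w_pairs` and `w_ranks` are illustrative names.

```r
set.seed(37)
x <- rnorm(50, mean=0, sd=1)
y <- rnorm(60, mean=0.5, sd=1)

r <- wilcox.test(x, y)

# W counts the (x_i, y_j) pairs in which x_i exceeds y_j
w_pairs <- sum(outer(x, y, '>'))

# equivalently: the rank sum of x in the pooled sample minus its smallest possible value
ranks <- rank(c(x, y))
w_ranks <- sum(ranks[seq_along(x)]) - length(x) * (length(x) + 1) / 2

print(c(W=unname(r$statistic), pairs=w_pairs, ranks=w_ranks))
```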

## Correlation

We may also compute the correlation between two variables and test its significance.

In [8]:
x <- seq(1, 1000)
y <- x * 2 + rnorm(1000, mean=5, sd=5)

c <- cor(x, y)
print(c)

[1] 0.9999633


We compute the covariance with the `cov` function.

In [9]:
x <- seq(1, 1000)
y <- x * 2 + rnorm(1000, mean=5, sd=5)

c <- cov(x, y)
print(c)

[1] 166818.4

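Correlation and covariance are linked: the Pearson correlation is the covariance divided by the product of the two standard deviations. A short sketch of that identity, with `r_manual` as an illustrative name:

```r
set.seed(37)
x <- seq(1, 1000)
y <- x * 2 + rnorm(1000, mean=5, sd=5)

# Pearson correlation is the covariance rescaled by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

print(r_manual)
print(all.equal(r_manual, cor(x, y)))
```

This rescaling is why the correlation stays within [-1, 1] while the covariance depends on the units of `x` and `y`.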

We test the significance of the correlation with `cor.test`.

In [10]:
x <- seq(1, 1000)
y <- x * 2 + rnorm(1000, mean=5, sd=5)

r <- cor.test(x, y)
print(r)


	Pearson's product-moment correlation

data:  x and y
t = 3806.6, df = 998, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9999610 0.9999696
sample estimates:
      cor 
0.9999656 


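For the Pearson method, the reported `t` follows from the correlation estimate and the sample size alone, with `n - 2` degrees of freedom. The sketch below recomputes it; `rho` and `t_manual` are illustrative names.

```r
set.seed(37)
x <- seq(1, 1000)
y <- x * 2 + rnorm(1000, mean=5, sd=5)

r <- cor.test(x, y)

# t = rho * sqrt(n - 2) / sqrt(1 - rho^2), with n - 2 degrees of freedom
n <- length(x)
rho <- unname(r$estimate)
t_manual <- rho * sqrt(n - 2) / sqrt(1 - rho^2)

print(c(manual=t_manual, cor.test=unname(r$statistic)))
```

Because `rho` is so close to 1 here, the denominator is tiny and the statistic is very large.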

## Chi-squared test

A `Chi-squared` test is used to test for association in contingency tables.

In [11]:
df <- data.frame(
    rural = c(10, 15, 12),
    urban = c(20, 30, 25),
    row.names=c('DC', 'MD', 'VA')
)

r <- chisq.test(df)
print(r)


	Pearson's Chi-squared test

data:  df
X-squared = 0.0090902, df = 2, p-value = 0.9955


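The test compares each observed count with the count expected under independence, which is the row total times the column total divided by the grand total. The sketch below rebuilds the expected table and the statistic by hand; `expected` and `x2_manual` are illustrative names.

```r
df <- data.frame(
    rural = c(10, 15, 12),
    urban = c(20, 30, 25),
    row.names=c('DC', 'MD', 'VA')
)

r <- chisq.test(df)

# expected count = row total * column total / grand total
expected <- outer(rowSums(df), colSums(df)) / sum(df)

# statistic = sum over cells of (observed - expected)^2 / expected
x2_manual <- sum((as.matrix(df) - expected)^2 / expected)

print(r$expected)
print(c(manual=x2_manual, chisq.test=unname(r$statistic)))
```

The observed counts are almost exactly proportional across rows, which is why the statistic is near 0 and the p-value near 1.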

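A `goodness of fit` test using the `Chi-squared test` is performed as follows. Here the `rural` counts are the observed values, and `rescale.p=TRUE` rescales the `urban` counts so that they sum to 1 and can serve as the expected proportions.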

In [12]:
df <- data.frame(
    rural = c(10, 15, 12),
    urban = c(20, 30, 25),
    row.names=c('DC', 'MD', 'VA')
)

r <- chisq.test(df$rural, p=df$urban, rescale.p=TRUE)
print(r)


	Chi-squared test for given probabilities

data:  df$rural
X-squared = 0.013514, df = 2, p-value = 0.9933

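The goodness-of-fit statistic can be reproduced in a few lines, which makes the role of `rescale.p` explicit. This is a sketch using the same counts as above; `observed`, `weights`, `x2_manual`, and `p_manual` are illustrative names.

```r
observed <- c(10, 15, 12)  # rural counts
weights  <- c(20, 30, 25)  # urban counts, used as the reference distribution

# rescale the weights to probabilities, then to expected counts
p <- weights / sum(weights)
expected <- sum(observed) * p

# statistic and its p-value with (number of categories - 1) degrees of freedom
x2_manual <- sum((observed - expected)^2 / expected)
p_manual <- pchisq(x2_manual, df=length(observed) - 1, lower.tail=FALSE)

r <- chisq.test(observed, p=weights, rescale.p=TRUE)
print(c(manual=x2_manual, chisq.test=unname(r$statistic)))
print(c(manual=p_manual, chisq.test=r$p.value))
```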
