---
title: "Basic statistics"
author: "Maximiliano Garnier-Villarreal"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    highlight: pygments
    toc: true
vignette: >
  %\VignetteIndexEntry{Basic statistics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
library(GMisc)
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  error = FALSE,
  fig.align = "center",
  out.width = "80%"
)
```
This document highlights some functions of the GMisc package related to classical statistics, focusing on summarized data rather than vectors.
# Confidence intervals
There are a few functions that can compute confidence intervals from vectors or tables (a full sample), but sometimes all you have are the point estimates and the corresponding sample information. Although the formulas are straightforward to implement, they have been programmed here for convenience.
The cases shown here are:
* For the mean:
    - One-sample Z-test
    - Two-sample Z-test
    - One-sample t-test
    - Two-sample t-test
* For the variance (standard deviation):
    - One-sample ($\chi^2$-test)
    - Two-sample (ratio F)
For the variance cases the functions also report the confidence interval for the standard deviation.
## One-sample Z-test
This has the form:
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
The function for computing it is `ci_z`, with arguments `x` for the sample mean, `sig` for the population standard deviation, `n` for the sample size, and `conf.level` for the desired confidence level:
```{r}
ci_z(x = 80, sig = 15, n = 20, conf.level = 0.95)
```
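As a cross-check, the same interval can be reproduced by hand from the formula above using base R only. This is a minimal sketch, not necessarily how `ci_z` is implemented; the variable names are chosen here for illustration:

```{r}
# Inputs copied from the ci_z() call above
x_bar <- 80
sigma <- 15
n <- 20
conf_level <- 0.95

alpha <- 1 - conf_level
z_crit <- qnorm(1 - alpha / 2)       # upper alpha/2 critical value of N(0, 1)
margin <- z_crit * sigma / sqrt(n)   # half-width of the interval

ci_z_manual <- c(lower = x_bar - margin, upper = x_bar + margin)
ci_z_manual
```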
## Two-sample Z-test
This has the form:
$$\left(\bar{x}_1-\bar{x}_2\right) \pm z_{\alpha/2} {\sqrt{\frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}}}$$
The function for computing it is `ci_z2`, with arguments `x1` for the mean of sample 1, `sig1` for the standard deviation of population 1, `n1` for the size of sample 1, `x2` for the mean of sample 2, `sig2` for the standard deviation of population 2, `n2` for the size of sample 2, and `conf.level` for the desired confidence level:
```{r}
ci_z2(x1 = 42, sig1 = 8, n1 = 75, x2 = 36, sig2 = 6, n2 = 50, conf.level = 0.95)
```
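The two-sample formula differs from the one-sample case only in the standard error, which adds the two variance terms. A hand computation in base R, offered as an illustrative sketch rather than the package's actual code (all variable names are assumptions):

```{r}
# Inputs copied from the ci_z2() call above
x1 <- 42; sig1 <- 8; n1 <- 75
x2 <- 36; sig2 <- 6; n2 <- 50
conf_level <- 0.95

alpha <- 1 - conf_level
se_diff <- sqrt(sig1^2 / n1 + sig2^2 / n2)    # standard error of the difference
margin_z2 <- qnorm(1 - alpha / 2) * se_diff   # half-width of the interval

ci_z2_manual <- c(lower = (x1 - x2) - margin_z2,
                  upper = (x1 - x2) + margin_z2)
ci_z2_manual
```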
## One-sample t-test
This has the form:
$$\bar{x} \pm t_{\alpha/2,v} \frac{s}{\sqrt{n}}$$
Here $v = n - 1$ is the degrees of freedom. The function for computing it is `ci_t`, with arguments `x` for the sample mean, `s` for the sample standard deviation, `n` for the sample size, and `conf.level` for the desired confidence level:
```{r}
ci_t(x = 80, s = 15, n = 20, conf.level = 0.95)
```
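With the same summary values as the Z example, the t interval comes out wider because the critical value is taken from a t distribution with $n - 1$ degrees of freedom. A base R sketch of the formula (illustrative names, not necessarily the internals of `ci_t`):

```{r}
# Inputs copied from the ci_t() call above
x_bar <- 80
s <- 15
n <- 20
conf_level <- 0.95

alpha <- 1 - conf_level
t_crit <- qt(1 - alpha / 2, df = n - 1)   # upper alpha/2 critical value, v = n - 1
margin_t <- t_crit * s / sqrt(n)          # half-width of the interval

ci_t_manual <- c(lower = x_bar - margin_t, upper = x_bar + margin_t)
ci_t_manual
```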
## Two-sample t-test
This has the general form:
$$\left(\bar{x}_1-\bar{x}_2\right) \pm t_{\alpha/2,v} {\sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}}}$$
The function for computing it is `ci_t2`, with arguments `x1` for the mean of sample 1, `s1` for the standard deviation of sample 1, `n1` for the size of sample 1, `x2` for the mean of sample 2, `s2` for the standard deviation of sample 2, `n2` for the size of sample 2, and `conf.level` for the desired confidence level:
```{r}
ci_t2(x1 = 42, s1 = 8, n1 = 75, x2 = 36, s2 = 6, n2 = 50, conf.level = 0.95)
```
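The degrees of freedom $v$ in the general form depend on the variant of the procedure. The sketch below assumes the Welch-Satterthwaite approximation (unequal variances). That choice is an assumption made for illustration: `ci_t2` may instead pool the variances or use a different rule for $v$, in which case its output can differ slightly from this hand computation.

```{r}
# Inputs copied from the ci_t2() call above
x1 <- 42; s1 <- 8; n1 <- 75
x2 <- 36; s2 <- 6; n2 <- 50
conf_level <- 0.95

alpha <- 1 - conf_level
a <- s1^2 / n1   # variance term for sample 1
b <- s2^2 / n2   # variance term for sample 2

# ASSUMPTION: Welch-Satterthwaite degrees of freedom
df_welch <- (a + b)^2 / (a^2 / (n1 - 1) + b^2 / (n2 - 1))

margin_t2 <- qt(1 - alpha / 2, df = df_welch) * sqrt(a + b)
ci_t2_manual <- c(lower = (x1 - x2) - margin_t2,
                  upper = (x1 - x2) + margin_t2)

df_welch
ci_t2_manual
```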
## One-sample $\chi^2$-test
This has the form:
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,v}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,v}}$$
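Here $v = n - 1$ is the degrees of freedom, and $\chi^2_{\alpha/2,v}$ denotes the value with area $\alpha/2$ to its right. The function for computing it is `ci_chisq`, with arguments `s` for the sample standard deviation, `n` for the sample size, and `conf.level` for the desired confidence level: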
```{r}
ci_chisq(s = 0.535, n = 10, conf.level = 0.95)
```
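Because the critical values in the formula above are upper-tail values, $\chi^2_{\alpha/2,v}$ corresponds to `qchisq(1 - alpha / 2, v)` in R, where `qchisq` works with lower-tail probabilities by default. A base R sketch under that reading (illustrative names, not necessarily the internals of `ci_chisq`):

```{r}
# Inputs copied from the ci_chisq() call above
s <- 0.535
n <- 10
conf_level <- 0.95

alpha <- 1 - conf_level
v <- n - 1   # degrees of freedom

# The larger critical value goes in the denominator of the lower bound
ci_var_manual <- c(
  lower = v * s^2 / qchisq(1 - alpha / 2, df = v),
  upper = v * s^2 / qchisq(alpha / 2, df = v)
)

ci_var_manual         # interval for the variance
sqrt(ci_var_manual)   # interval for the standard deviation
```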
## Two-sample F-test (Ratio of variances)
This has the form:
$$\frac{s^2_1}{s^2_2} \frac{1}{F_{\alpha/2(v_1,v_2)}} < \frac{\sigma^2_1}{\sigma^2_2} < \frac{s^2_1}{s^2_2} F_{\alpha/2(v_2,v_1)}$$
Here $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ are the degrees of freedom. The function for computing it is `ci_F`, with arguments `s1` for the standard deviation of sample 1, `n1` for the size of sample 1, `s2` for the standard deviation of sample 2, `n2` for the size of sample 2, and `conf.level` for the desired confidence level:
```{r}
ci_F(s1 = 3.1, n1 = 15, s2 = 0.8, n2 = 12, conf.level = 0.95)
```
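Note that the two bounds use F critical values with the degrees of freedom in opposite order. A base R sketch of the formula above, treating the critical values as upper-tail values (illustrative names, not necessarily the internals of `ci_F`):

```{r}
# Inputs copied from the ci_F() call above
s1 <- 3.1; n1 <- 15
s2 <- 0.8; n2 <- 12
conf_level <- 0.95

alpha <- 1 - conf_level
v1 <- n1 - 1
v2 <- n2 - 1
var_ratio <- s1^2 / s2^2   # point estimate of sigma1^2 / sigma2^2

ci_ratio_manual <- c(
  lower = var_ratio / qf(1 - alpha / 2, df1 = v1, df2 = v2),
  upper = var_ratio * qf(1 - alpha / 2, df1 = v2, df2 = v1)
)

ci_ratio_manual         # interval for the ratio of variances
sqrt(ci_ratio_manual)   # interval for the ratio of standard deviations
```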
# P-values
There are analogous functions for calculating p-values. They take the same arguments, plus the hypothesized population mean or population standard deviation, as appropriate for each case.
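To illustrate the idea without relying on any particular function name, here is a base R sketch of a two-sided p-value for the one-sample Z case, computed from summary values. The null value `mu0 = 75` is hypothetical and chosen only for this example; the other inputs match the `ci_z` example earlier in this document.

```{r}
x_bar <- 80
sigma <- 15
n <- 20
mu0 <- 75   # HYPOTHETICAL null value, for illustration only

z_stat <- (x_bar - mu0) / (sigma / sqrt(n))   # standardized test statistic
p_two_sided <- 2 * pnorm(-abs(z_stat))        # area in both tails of N(0, 1)

c(z = z_stat, p.value = p_two_sided)
```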