---
title: "Descriptive statistics by group"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Descriptive statistics by group}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(qacBase)
```
## Overview

Getting summary statistics for a quantitative variable is a very common
task in data analysis. Unfortunately, **R** makes this surprisingly difficult.
The `qstats` function addresses this by making it simple to compute any number of descriptive statistics for a numeric variable and to break them down by the levels of one or more categorical (grouping) variables.

The general format is

**`qstats(data, variable, grouping variables, statistics, other options)`**

Note that variable names do not have to be quoted.

## Using default statistics

By default, the *sample size*, *mean*, and *standard deviation* are provided.
Let's take a look at highway fuel efficiencies for 11,914 automobiles
in the `cardata` data frame.

```{r include=TRUE}
# simple summary statistics
qstats(cardata, highway_mpg)
# summary statistics by vehicle_size
qstats(cardata, highway_mpg, vehicle_size)
# summary statistics by vehicle_size and drive type
qstats(cardata, highway_mpg, vehicle_size, driven_wheels)
```
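
To make the default output more concrete, here is a rough base R sketch of the same three statistics computed by group. It uses the built-in `mtcars` data in place of `cardata`, and `aggregate()` rather than anything from **qacBase**; it illustrates the idea and is not how `qstats` is implemented.

```{r include=TRUE}
# base R sketch: n, mean, and sd of mpg for each value of cyl
# (mtcars stands in for cardata; this is not qstats' own code)
by_cyl <- aggregate(
  mpg ~ cyl,
  data = mtcars,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

# aggregate() stores the three statistics in a matrix column; flatten it
by_cyl <- cbind(by_cyl["cyl"], as.data.frame(by_cyl$mpg))
by_cyl
```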
## Specifying other statistics

Use the `stats` argument to request other statistics. You can pass a single statistic name or a character vector of several names.

```{r include=TRUE}
# single statistic
qstats(cardata, highway_mpg, vehicle_size, stats = "median")
# multiple statistics
qstats(cardata, highway_mpg, vehicle_size,
       stats = c("median", "min", "max"))
```
User-defined functions can also be used as statistics. The only requirement
is that the function returns a single number.

```{r include=TRUE}
# custom statistics: 25th and 75th percentiles
p25 <- function(x) quantile(x, probs = .25)
p75 <- function(x) quantile(x, probs = .75)
# calling the built-in and custom statistics
qstats(cardata, highway_mpg, vehicle_size,
       stats = c("min", "p25", "p75", "max"))
```
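
As a further sketch of the single-number requirement, here is a hypothetical coefficient-of-variation helper, `cv`, which is not part of **qacBase**. The `na.rm = TRUE` inside it is an assumption, made so that the helper copes with missing values on its own. The check below runs in base R only; following the pattern above, such a function would presumably be requested by name, for example `stats = c("mean", "cv")`.

```{r include=TRUE}
# hypothetical custom statistic: coefficient of variation (sd / mean)
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# check on a small made-up vector that it returns exactly one number
x <- c(10, 20, 30, NA)
result <- cv(x)
length(result)
result
```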
## Other options

Other options include

* **na.rm** When TRUE, missing values (NAs) are removed. Default is TRUE.
* **digits** The number of decimal places to print. Default is 2.

```{r include=TRUE}
qstats(cardata, highway_mpg, vehicle_size,
       stats = c("n", "mean", "median", "sd"),
       na.rm = FALSE, digits = 2)
```
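
To see what these two options control, the sketch below applies the same ideas in base R to a small made-up vector. It does not call `qstats`, whose internal handling of `na.rm` and `digits` may differ.

```{r include=TRUE}
# made-up values with one missing observation
mpg_values <- c(21.456, 30.128, NA, 18.9)

# with missing values kept, the mean is NA
mean_with_na <- mean(mpg_values, na.rm = FALSE)
mean_with_na

# with missing values removed, the mean uses the three observed values
mean_without_na <- mean(mpg_values, na.rm = TRUE)

# round to two decimal places, analogous to digits = 2
rounded_mean <- round(mean_without_na, 2)
rounded_mean
```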