---
title: "Exploring a data frame"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Exploring a data frame}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(qacBase)
library(knitr)
```
The **qacBase** package provides functions for descriptive statistics, data management, and data visualization. Within this package, the **contents** function produces a series of informational tables that give users a comprehensive view of a dataset and of each of its quantitative and categorical variables. Graphical functions, such as **barcharts**, **histograms**, and **densities**, provide succinct visualizations of the variables in a data frame.
## Example Dataset: Motor Trend Magazine Review of Cars
How can you quickly become familiar with the data in a data frame? We'll use the **cars74** data frame as an example. This data frame contains information on the characteristics of 32 cars in 1974.
```{r message=FALSE, warning=FALSE}
data(cars74)
```
### Data description with `contents()`
```{r message=FALSE, warning=FALSE}
contents(cars74)
```
As shown above, the `contents` function produces three tables: one that gives an overall summary of the data, and two that break down the quantitative and categorical variables, respectively.
The **overall table** describes each variable in the data frame, listing its column position, name, type, number of unique values, number of missing values, and percentage of missing values.
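To make the columns of the overall table concrete, the following is a minimal base R sketch of the same kind of per-variable summary. It is not the code used by `contents()`: the helper `overall_summary()` and its column names are hypothetical, and it is applied to the built-in `mtcars` data frame so that the chunk does not depend on qacBase. Whether qacBase counts `NA` as a unique value is not stated here, so this sketch simply excludes it.

```{r message=FALSE, warning=FALSE}
# Hypothetical helper (not part of qacBase): one row per variable.
overall_summary <- function(df) {
  # Number of missing values in each column
  n_missing <- vapply(df, function(x) sum(is.na(x)), integer(1))
  data.frame(
    pos         = seq_along(df),                                  # column position
    variable    = names(df),                                      # variable name
    type        = vapply(df, function(x) class(x)[1], character(1)),
    n_unique    = vapply(df, function(x) length(unique(x[!is.na(x)])),
                         integer(1)),                             # NA not counted
    n_missing   = n_missing,
    pct_missing = round(100 * n_missing / nrow(df), 2),
    row.names   = NULL
  )
}

overall <- overall_summary(mtcars)
overall
```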
The **quantitative table** provides information for each of the quantitative variables in the data frame, listing the variable name, the number of non-missing values, and a series of summary statistics: the mean, standard deviation, skewness, minimum, 25th percentile, median, 75th percentile, and maximum.
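As a rough illustration of these statistics, the sketch below computes most of them in base R for every numeric column. The helper `quant_summary()` is hypothetical and is not how `contents()` is implemented. Skewness is left out because base R has no built-in function for it, and the percentiles use the default method of `quantile()`, which may or may not match the method used by qacBase.

```{r message=FALSE, warning=FALSE}
# Hypothetical helper (not part of qacBase): summary statistics for numeric columns.
quant_summary <- function(df) {
  # Keep only the numeric columns
  num <- df[vapply(df, is.numeric, logical(1))]
  stats <- lapply(num, function(x) {
    x <- x[!is.na(x)]                                   # drop missing values
    q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
    c(n = length(x), mean = mean(x), sd = sd(x),
      min = min(x), p25 = q[1], median = q[2], p75 = q[3], max = max(x))
  })
  # Stack the per-variable vectors into one table
  out <- as.data.frame(do.call(rbind, stats))
  data.frame(variable = names(num), out, row.names = NULL)
}

quant <- quant_summary(mtcars)
quant
```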
The **categorical table** provides information for each of the categorical variables in the data frame, listing the variable name, the levels of each variable, the number of observations at each level, and the percentage of observations at each level. By default, up to 10 levels of a categorical variable are displayed; you can increase this limit with the option `maxcat = #`, where `#` is the number of levels to display.
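The level counts, percentages, and display cap can be sketched in base R as follows. The helper `cat_summary()` is hypothetical. Its `maxcat` argument only mimics the idea of limiting the number of displayed levels and is not the qacBase implementation. The built-in `mtcars` data frame is used, with `cyl` and `carb` converted to factors for illustration.

```{r message=FALSE, warning=FALSE}
# Hypothetical helper (not part of qacBase): level counts for one categorical variable.
cat_summary <- function(x, maxcat = 10) {
  counts <- table(x)                      # NA values are excluded by default
  out <- data.frame(
    level = names(counts),
    n     = as.vector(counts),
    pct   = round(100 * as.vector(counts) / sum(counts), 2)
  )
  head(out, maxcat)                       # show at most `maxcat` levels
}

cyl_levels <- cat_summary(factor(mtcars$cyl))
cyl_levels

# Restrict the display to the first three levels
carb_levels <- cat_summary(factor(mtcars$carb), maxcat = 3)
carb_levels
```

With qacBase itself, the limit is raised by passing `maxcat` to `contents()`, for example `contents(cars74, maxcat = 20)`.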
If the dataset contains only quantitative or only categorical variables, then only the overall table and the relevant variable table are printed.
### Visualizing the data frame
The `df_plot` function provides a single graph displaying the variables, their types, and the percent of values present (or missing).
```{r message=FALSE, warning=FALSE}
df_plot(cars74)
```
### Visualizing the categorical variables
The `barcharts` function plots the distribution of each of the categorical variables in the data frame.
```{r message=FALSE, warning=FALSE}
barcharts(cars74)
```
### Visualizing the quantitative variables
The `histograms` and `densities` functions plot the distribution of each of the quantitative variables in the data frame.
```{r message=FALSE, warning=FALSE}
histograms(cars74)
densities(cars74)
```