Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
198 lines (126 sloc) 3.99 KB
---
title: "Introduction to descriptr"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Introduction to descriptr}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, echo=FALSE, message=FALSE}
library(descriptr)
library(dplyr)
```
## Introduction
Descriptive statistics are used to summarize data. It enables us to present the
data in a more meaningful way and to discern any patterns existing in the data.
They can be useful for two purposes:
- provide basic information about variables in a data set
- highlight potential relationships between variables
This document introduces you to a basic set of functions that describe data.
There is a second vignette which provides details about functions which help
visualize statistical distributions.
## Data
We have modified the `mtcars` data to create a new data set `mtcarz`. The only
difference between the two data sets is related to the variable types.
```{r egdata}
str(mtcarz)
```
## Data Screening
The `ds_screener()` function will screen a data set and return the following:
- Column/Variable Names
- Data Type
- Levels (in case of categorical data)
- Number of missing observations
- % of missing observations
```{r screener}
ds_screener(mtcarz)
```
## Summary Statistics
The `ds_summary_stats` function returns a comprehensive set of statistics for
**continuous** data.
```{r summary}
ds_summary_stats(mtcarz, mpg)
```
## Cross Tabulation
The `ds_cross_table()` function creates two way tables of categorical variables.
```{r cross}
ds_cross_table(mtcarz, cyl, gear)
```
`ds_twoway_table()` will return a tibble.
```{r cross_tibble}
ds_twoway_table(mtcarz, cyl, gear)
```
A plot method has been defined which will generate:
### Grouped Bar Plots
```{r cross_group, fig.width=7, fig.height=7, fig.align='centre'}
k <- ds_cross_table(mtcarz, cyl, gear)
plot(k)
```
### Stacked Bar Plots
```{r cross_stack, fig.width=7, fig.height=7, fig.align='centre'}
k <- ds_cross_table(mtcarz, cyl, gear)
plot(k, stacked = TRUE)
```
### Proportional Bar Plots
```{r cross_prop, fig.width=7, fig.height=7, fig.align='centre'}
k <- ds_cross_table(mtcarz, cyl, gear)
plot(k, proportional = TRUE)
```
## Frequency Table (Categorical Data)
The `ds_freq_table()` function creates frequency tables for categorical variables.
```{r ftable}
ds_freq_table(mtcarz, cyl)
```
### Bar Plot
A barplot method has been defined.
```{r ftable_bar, fig.width=7, fig.height=7, fig.align='centre'}
k <- ds_freq_table(mtcarz, cyl)
plot(k)
```
## Frequency Table (Continuous Data)
The `ds_freq_cont` function creates frequency tables for continuous variables.
The default number of intervals is 5.
```{r fcont}
ds_freq_cont(mtcarz, mpg, 4)
```
### Histogram
A `hist` method has been defined.
```{r fcont_hist, fig.width=7, fig.height=7, fig.align='centre'}
k <- ds_freq_cont(mtcarz, mpg, 4)
plot(k)
```
## Group Summary
The `ds_group_summary()` function returns descriptive statistics of a continuous
variable for the different levels of a categorical variable.
```{r gsummary}
k <- ds_group_summary(mtcarz, cyl, mpg)
k
```
`ds_group_summary()` returns a tibble which can be used for further analysis.
```{r gsummary_tibble}
k$tidy_stats
```
### Box Plot
A `boxplot()` method has been defined.
```{r gsum_boxplot, fig.width=7, fig.height=7, fig.align='centre'}
k <- ds_group_summary(mtcarz, cyl, mpg)
plot(k)
```
## Multiple Variable Statistics
The `ds_multi_stats()` function generates summary/descriptive statistics for
variables in a data frame/tibble.
```{r multistats}
ds_multi_stats(mtcarz, mpg, disp, hp)
```
## Multiple One Way Tables
The `ds_oway_tables()` function creates multiple one way tables by creating a
frequency table for each categorical variable in a data frame.
```{r oway}
ds_oway_tables(mtcarz)
```
## Multiple Two Way Tables
The `ds_tway_tables()` function creates multiple two way tables by creating a
cross table for each unique pair of categorical variables in a data frame.
```{r tway}
ds_tway_tables(mtcarz)
```