Permalink
Branch: master
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
181 lines (125 sloc) 3.76 KB
---
title: "Continuous Data"
author: "Aravind Hebbali"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Continuous Data}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r, echo=FALSE, message=FALSE}
library(descriptr)
library(dplyr)
```
## Introduction
This document introduces you to a basic set of functions that describe data
continuous data. The other two vignettes introduce you to functions that
describe categorical data and visualization options.
## Data
We have modified the `mtcars` data to create a new data set `mtcarz`. The only
difference between the two data sets is related to the variable types.
```{r egdata}
str(mtcarz)
```
## Data Screening
The `ds_screener()` function will screen a data set and return the following:
- Column/Variable Names
- Data Type
- Levels (in case of categorical data)
- Number of missing observations
- % of missing observations
```{r screener}
ds_screener(mtcarz)
```
## Summary Statistics
The `ds_summary_stats` function returns a comprehensive set of statistics
including measures of location, variation, symmetry and extreme observations.
```{r summary}
ds_summary_stats(mtcarz, mpg)
```
You can pass multiple variables as shown below:
```{r summary2}
ds_summary_stats(mtcarz, mpg, disp)
```
If you do not specify any variables, it will detect all the continuous
variables in the data set and return summary statistics for each of them.
## Frequency Distribution
The `ds_freq_table` function creates frequency tables for continuous variables.
The default number of intervals is 5.
```{r fcont}
ds_freq_table(mtcarz, mpg, 4)
```
### Histogram
A `plot()` method has been defined which will generate a histogram.
```{r fcont_hist, fig.width=7, fig.height=7, fig.align='centre'}
k <- ds_freq_table(mtcarz, mpg, 4)
plot(k)
```
## Auto Summary
If you want to view summary statistics and frequency tables of all or subset of
variables in a data set, use `ds_auto_summary()`.
```{r auto-summary}
ds_auto_summary_stats(mtcarz, disp, mpg)
```
## Group Summary
The `ds_group_summary()` function returns descriptive statistics of a continuous
variable for the different levels of a categorical variable.
```{r gsummary}
k <- ds_group_summary(mtcarz, cyl, mpg)
k
```
`ds_group_summary()` returns a tibble which can be used for further analysis.
```{r gsummary_tibble}
k$tidy_stats
```
### Box Plot
A `plot()` method has been defined for comparing distributions.
```{r gsum_boxplot, fig.width=7, fig.height=7, fig.align='centre'}
k <- ds_group_summary(mtcarz, cyl, mpg)
plot(k)
```
### Multiple Variables
If you want grouped summary statistics for multiple variables in a data set, use
`ds_auto_group_summary()`.
```{r auto-group-summary}
ds_auto_group_summary(mtcarz, cyl, gear, mpg)
```
## Multiple Variable Statistics
The `ds_tidy_stats()` function returns summary/descriptive statistics for
variables in a data frame/tibble.
```{r multistats}
ds_tidy_stats(mtcarz, mpg, disp, hp)
```
## Measures
If you want to view the measure of location, variation, symmetry, percentiles
and extreme observations as tibbles, use the below functions. All of them,
except for `ds_extreme_obs()` will work with single or multiple variables. If
you do not specify the variables, they will return the results for all the
continuous variables in the data set.
#### Measures of Location
```{r mloc}
ds_measures_location(mtcarz)
```
#### Measures of Variation
```{r mvar}
ds_measures_variation(mtcarz)
```
#### Measures of Symmetry
```{r msym}
ds_measures_symmetry(mtcarz)
```
#### Percentiles
```{r mperc}
ds_percentiles(mtcarz)
```
#### Extreme Observations
```{r mextreme}
ds_extreme_obs(mtcarz, mpg)
```