---
title: "Continuous Data"
author: "Aravind Hebbali"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Continuous Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r, echo=FALSE, message=FALSE}
# load the package that provides the ds_* functions and the mtcarz data
library(descriptr)
```
## Introduction
This document introduces you to a basic set of functions that describe
continuous data. The other two vignettes introduce you to functions that
describe categorical data and to visualization options.
## Data
We have modified the `mtcars` data to create a new data set `mtcarz`. The only
difference between the two data sets is related to the variable types.
```{r egdata}
```
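To make the difference in variable types concrete, here is a minimal base R sketch. It assumes, for illustration only, that the count-like columns of `mtcars` (`cyl`, `vs`, `am`, `gear`, `carb`) are the ones stored as factors; the name `mtcars_fct` is hypothetical, and inspecting `mtcarz` itself shows the actual types.

```r
# Hypothetical reconstruction: convert count-like columns of mtcars to factors.
# The column choice is an assumption; inspect mtcarz itself for the real types.
factor_cols <- c("cyl", "vs", "am", "gear", "carb")

mtcars_fct <- mtcars
mtcars_fct[factor_cols] <- lapply(mtcars_fct[factor_cols], factor)

# Compare the storage class of each column before and after the conversion
type_comparison <- data.frame(
  before = vapply(mtcars, function(col) class(col)[1], character(1)),
  after  = vapply(mtcars_fct, function(col) class(col)[1], character(1))
)
type_comparison
```

The values are unchanged; only the storage class differs, which is what lets a function tell categorical columns apart from continuous ones.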
## Data Screening
The `ds_screener()` function will screen a data set and return the following:
- Column/Variable Names
- Data Type
- Levels (in case of categorical data)
- Number of missing observations
- % of missing observations
```{r screener}
ds_screener(mtcarz)
```
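The items in the list above can be approximated in base R. The sketch below is not the `ds_screener()` implementation; `screen_data()` and `demo_data` are hypothetical names, and the two missing values are injected artificially so that the missing-value columns are not all zero.

```r
# Base R sketch of a screening table; not the ds_screener() implementation.
screen_data <- function(df) {
  n_missing <- vapply(df, function(col) sum(is.na(col)), integer(1))
  data.frame(
    variable    = names(df),
    type        = vapply(df, function(col) class(col)[1], character(1)),
    levels      = vapply(df, function(col) {
      # only factors have levels; other columns get NA
      if (is.factor(col)) paste(levels(col), collapse = " ") else NA_character_
    }, character(1)),
    missing     = n_missing,
    pct_missing = round(100 * n_missing / nrow(df), 2),
    row.names   = NULL
  )
}

# Demo data: mtcars with one factor column and two artificial missing values
demo_data <- transform(mtcars, cyl = factor(cyl))
demo_data$mpg[c(1, 5)] <- NA

screening <- screen_data(demo_data)
screening
```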
## Summary Statistics
The `ds_summary_stats()` function returns a comprehensive set of statistics,
including measures of location, variation, symmetry and extreme observations.
```{r summary}
ds_summary_stats(mtcarz, mpg)
```
You can pass multiple variables as shown below:
```{r summary2}
ds_summary_stats(mtcarz, mpg, disp)
```
If you do not specify any variables, it will detect all the continuous
variables in the data set and return summary statistics for each of them.
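How such detection might work can be sketched in base R. This is an illustration under the assumption that "continuous" means numeric and not a factor; the package may apply a different rule, and `demo_data` is a hypothetical stand-in for `mtcarz`.

```r
# Demo data: mtcars with two columns stored as factors (an illustrative assumption)
demo_data <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))

# Treat numeric, non-factor columns as continuous
is_continuous   <- vapply(demo_data, is.numeric, logical(1))
continuous_vars <- names(demo_data)[is_continuous]

# One column of statistics per detected variable
summary_table <- sapply(demo_data[continuous_vars], function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), min = min(x), max = max(x))
})
round(summary_table, 2)
```

The factor columns `cyl` and `gear` are skipped, so the table has one column for each of the remaining numeric variables.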
## Frequency Distribution
The `ds_freq_table()` function creates frequency tables for continuous variables.
The default number of intervals is 5; the example below requests 4.
```{r fcont}
ds_freq_table(mtcarz, mpg, 4)
```
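For readers who want to see the idea behind a binned frequency table, here is a base R sketch using equal-width intervals. Equal-width binning is an assumption made for the illustration; the exact break points chosen by `ds_freq_table()` may differ.

```r
x    <- mtcars$mpg
bins <- 4

# Equal-width break points spanning the observed range
breaks <- seq(min(x), max(x), length.out = bins + 1)

# include.lowest = TRUE keeps the minimum value in the first interval
intervals <- cut(x, breaks = breaks, include.lowest = TRUE)

# Frequency, cumulative frequency and percentage per interval
freq <- as.data.frame(table(intervals), responseName = "frequency")
freq$cum_frequency <- cumsum(freq$frequency)
freq$percent       <- round(100 * freq$frequency / length(x), 2)
freq
```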
### Histogram
A `plot()` method is defined for the result and generates a histogram.
```{r fcont_hist, fig.width=7, fig.height=7, fig.align='center'}
k <- ds_freq_table(mtcarz, mpg, 4)
plot(k)
```
## Auto Summary
If you want to view summary statistics and frequency tables for all or a subset
of the variables in a data set, use `ds_auto_summary_stats()`.
```{r auto-summary}
ds_auto_summary_stats(mtcarz, disp, mpg)
```
## Group Summary
The `ds_group_summary()` function returns descriptive statistics of a continuous
variable for the different levels of a categorical variable.
```{r gsummary}
k <- ds_group_summary(mtcarz, cyl, mpg)
```
`ds_group_summary()` returns a tibble which can be used for further analysis.
```{r gsummary_tibble}
```
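A grouped summary of this kind can be approximated in base R with `aggregate()`. The sketch below uses the built-in `mtcars` and a small, arbitrary set of statistics; it is not the output format of `ds_group_summary()`.

```r
# Base R sketch: descriptive statistics of mpg for each level of cyl
group_stats <- aggregate(
  mpg ~ cyl,
  data = mtcars,
  FUN  = function(x) c(n = length(x), mean = mean(x), median = median(x), sd = sd(x))
)

# aggregate() stores the statistics as a matrix column; flatten it into
# ordinary columns named mpg.n, mpg.mean, mpg.median and mpg.sd
group_stats <- do.call(data.frame, group_stats)
group_stats
```

Because the result is an ordinary data frame with one row per group, it can be filtered, joined or plotted like any other table.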
### Box Plot
A `plot()` method is defined for comparing the distributions across groups.
```{r gsum_boxplot, fig.width=7, fig.height=7, fig.align='center'}
k <- ds_group_summary(mtcarz, cyl, mpg)
plot(k)
```
### Multiple Variables
If you want grouped summary statistics for multiple variables in a data set, use
`ds_auto_group_summary()`.
```{r auto-group-summary}
ds_auto_group_summary(mtcarz, cyl, gear, mpg)
```
### Combination of Categories
To look at the descriptive statistics of a continuous variable for different
combinations of levels of two or more categorical variables, use
`ds_group_summary_interact()`.
```{r interact-summary}
ds_group_summary_interact(mtcarz, mpg, cyl, gear)
```
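The notion of "combinations of levels" can be illustrated in base R with `interaction()`, which builds one grouping factor from several variables. This is a sketch on the built-in `mtcars`, not the method used by `ds_group_summary_interact()`; the `_` separator and the column names are arbitrary choices.

```r
# One grouping factor per observed cyl/gear combination, e.g. "4_3";
# drop = TRUE discards combinations that never occur in the data
combo <- interaction(mtcars$cyl, mtcars$gear, sep = "_", drop = TRUE)

# table() and tapply() both follow the order of levels(combo)
combo_stats <- data.frame(
  combination = levels(combo),
  n           = as.vector(table(combo)),
  mean        = as.vector(tapply(mtcars$mpg, combo, mean)),
  sd          = as.vector(tapply(mtcars$mpg, combo, sd))
)
combo_stats
```

Combinations with a single observation have a standard deviation of `NA`, since `sd()` needs at least two values.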
## Multiple Variable Statistics
The `ds_tidy_stats()` function returns summary/descriptive statistics for
variables in a data frame/tibble.
```{r multistats}
ds_tidy_stats(mtcarz, mpg, disp, hp)
```
## Measures
If you want to view the measures of location, variation and symmetry, the
percentiles, and the extreme observations as tibbles, use the functions below.
All of them, except for `ds_extreme_obs()`, work with single or multiple
variables. If you do not specify the variables, they return the results for all
the continuous variables in the data set.
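As a rough guide to what the first three groups of measures contain, here is a base R sketch for a single variable. The selection of statistics, the 5% trimming fraction, and the moment-based definitions of skewness and kurtosis are assumptions for illustration; the package may report different statistics or use other estimators.

```r
x <- mtcars$mpg

# Location: where the centre of the distribution lies
location <- c(
  mean         = mean(x),
  trimmed_mean = mean(x, trim = 0.05),  # drops 5% of values from each tail
  median       = median(x)
)

# Variation: how spread out the values are
variation <- c(
  range    = diff(range(x)),
  iqr      = IQR(x),
  variance = var(x),
  sd       = sd(x),
  coef_var = 100 * sd(x) / mean(x)
)

# Symmetry: moment-based skewness and kurtosis (one of several definitions)
m2 <- mean((x - mean(x))^2)
symmetry <- c(
  skewness = mean((x - mean(x))^3) / m2^1.5,
  kurtosis = mean((x - mean(x))^4) / m2^2
)

round(c(location, variation, symmetry), 3)
```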
#### Measures of Location
```{r mloc}
```
#### Measures of Variation
```{r mvar}
```
#### Measures of Symmetry
```{r msym}
```
#### Percentiles
```{r mperc}
```
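Percentiles can be computed in base R with `quantile()`. The probabilities below are an arbitrary selection, and `quantile()` defaults to its type 7 interpolation rule, so the values may differ slightly from those of a function that uses another rule.

```r
x <- mtcars$mpg

# Selected percentiles; type = 7 is the quantile() default, stated explicitly
probs       <- c(0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99)
percentiles <- quantile(x, probs = probs, type = 7)
percentiles
```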
#### Extreme Observations
```{r mextreme}
ds_extreme_obs(mtcarz, mpg)
```
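The same kind of information can be obtained in base R by ordering the values. The sketch below reports the five lowest and five highest values with their row positions; the choice of five and the output layout are assumptions, not the format returned by `ds_extreme_obs()`.

```r
x <- mtcars$mpg
n_extreme <- 5

# Row positions sorted from the smallest to the largest value
ord <- order(x)

low_idx  <- head(ord, n_extreme)        # smallest values first
high_idx <- rev(tail(ord, n_extreme))   # largest values first

lowest  <- data.frame(index = low_idx,  value = x[low_idx])
highest <- data.frame(index = high_idx, value = x[high_idx])

list(lowest = lowest, highest = highest)
```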