---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE, message=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
# histogramR
## Overview
histogramR is a tool based on dplyr and ggplot2 that creates classical frequency distribution tables, histograms and frequency polygons. It also compares methods for computing the number of classes (Sturges, Freedman-Diaconis and Scott). This package is part of a final project for the Computational Statistics course of the Master of Applied Statistics at Universidad del Norte, Colombia.
## Installation
histogramR is hosted in this GitHub repository, so the devtools package is needed to install it. On a fresh install of R, the following code will also install many dependency packages.
```{r install, eval=FALSE, include=TRUE}
install.packages("devtools")
devtools::install_github("rodianf/histogramR")
library(histogramR)
```
## Usage
### tab_freq
This function creates a classical frequency distribution table, returned as a tibble, with five columns.
* **variable name**: Class intervals computed by the selected method; the default is "Sturges".
* **f**: Counts or frequency of the variable in a class interval.
* **rf**: Relative frequency or density.
* **cf**: Cumulative frequency.
* **crf**: Cumulative relative frequency.
Because the returned object is a tibble, dplyr functions can be applied to it. To include the table in an R Markdown document, use `knitr::kable` for better results.
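
To make the meaning of the five columns concrete, they can be reproduced with base R. This is only a sketch, not the code used inside `tab_freq`: the data vector `x` is made up, the equal-width breaks are an assumption, and the object names (`breaks`, `classes`, `freq_table`) are hypothetical.

```r
# Hypothetical data, not taken from the package
x <- c(1.2, 2.5, 2.7, 3.1, 4.8, 5.0, 5.3, 7.9, 8.4, 9.6)

# Number of classes by Sturges' rule (base R, grDevices)
k <- nclass.Sturges(x)

# Equal-width breaks covering the range of x (an assumption of this sketch)
breaks <- seq(min(x), max(x), length.out = k + 1)

# Assign each observation to a class interval
classes <- cut(x, breaks = breaks, include.lowest = TRUE)

f   <- as.vector(table(classes))  # frequency
rf  <- f / sum(f)                 # relative frequency
cf  <- cumsum(f)                  # cumulative frequency
crf <- cumsum(rf)                 # cumulative relative frequency

freq_table <- data.frame(class = levels(classes), f, rf, cf, crf)
freq_table
```

The last value of `cf` equals the number of observations and the last value of `crf` equals 1, which is a quick check on any frequency table.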
#### Note
Classes with zero frequency are dropped from the table. This is caused by the `group_by` function from the dplyr package; a correction for this behavior will be implemented soon. See https://github.com/tidyverse/dplyr/pull/3492.
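```{r tab_freq}
library(histogramR)
library(MASS)   # provides the Melanoma data set
library(dplyr)  # provides %>% and rename()
data("Melanoma")
attach(Melanoma)  # makes the thickness column available by name
# Default method (Sturges)
tab_freq(thickness)
# Freedman-Diaconis method
tab_freq(thickness, nclass = "FD")
# The result is a tibble, so dplyr verbs can be applied
tab_freq(thickness) %>%
  rename("Frequency" = f,
         "Relative frequency" = rf)
# Scott method, formatted for R Markdown
tab_freq(thickness, nclass = "scott") %>%
  rename("Frequency" = f,
         "Relative frequency" = rf) %>%
  knitr::kable()
```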
### plot_freq
This function creates a histogram with a frequency polygon, or a cumulative frequency polygon. The returned object is a ggplot2 plot, so additional layers can be added to it.
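```{r plot_freq}
library(ggplot2)  # provides theme_classic()
# Default method (Sturges)
plot_freq(thickness)
# Freedman-Diaconis method with density = TRUE
plot_freq(thickness, nclass = "FD", density = TRUE)
# Scott method with cfp = TRUE, plus a ggplot2 theme layer
plot_freq(thickness, nclass = "scott", density = TRUE, cfp = TRUE) +
  theme_classic()
```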
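For readers who want to see what a frequency polygon and a cumulative frequency polygon are built from, the following sketch draws both with base graphics. It is not how `plot_freq` works internally (that function returns a ggplot2 object); base `hist()` is used here only to keep the example free of extra dependencies, and the built-in `faithful` data set and the name `cum_counts` are choices made for illustration.

```r
# Built-in data, used only for illustration
x <- faithful$eruptions

# Bin the data without drawing; "Sturges" is the default rule in hist()
h <- hist(x, breaks = "Sturges", plot = FALSE)

# Histogram with a frequency polygon through the class midpoints
plot(h, main = "Histogram and frequency polygon", xlab = "x")
lines(h$mids, h$counts, type = "b")

# Cumulative frequency polygon: cumulative counts at the class upper limits,
# starting from zero at the lower limit of the first class
cum_counts <- cumsum(h$counts)
plot(h$breaks, c(0, cum_counts), type = "b",
     main = "Cumulative frequency polygon",
     xlab = "x", ylab = "Cumulative frequency")
```

Note that `hist()` treats the rule as a suggestion and rounds the breaks to "pretty" values, so the number of bars can differ from the number the rule returns.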
### nc_comp
This function compares the methods for calculating the number of classes for a numerical random variable. It uses the `plot_freq` function to generate the plots. Generics such as `print`, `summary` and `ggplot` can be used on the result.
```{r nc_comp}
nc_comp(thickness)
comparison <- nc_comp(thickness)
print(comparison)
summary(comparison)
ggplot(comparison)
```
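As a rough point of reference, base R ships its own implementations of the three rules in the grDevices package (`nclass.Sturges`, `nclass.FD` and `nclass.scott`). The sketch below compares them on the built-in `faithful` data set. Whether `nc_comp` calls these functions or computes the rules itself is not stated here, so treat this as an independent illustration; the name `n_classes` is hypothetical.

```r
# Built-in data, used only for illustration
x <- faithful$eruptions

# Number of classes suggested by each rule (base R, grDevices)
n_classes <- c(
  Sturges = nclass.Sturges(x),
  FD      = nclass.FD(x),
  scott   = nclass.scott(x)
)
n_classes

# Sturges' rule depends only on the sample size: ceiling(log2(n) + 1)
sturges_by_hand <- ceiling(log2(length(x)) + 1)
sturges_by_hand == n_classes[["Sturges"]]
```

Sturges' rule ignores the spread of the data, while the Freedman-Diaconis and Scott rules derive a class width from the interquartile range and the standard deviation respectively, so the three can disagree noticeably on skewed or multimodal data.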