this is the first commit of the new package, visdat, which supersedes `footprintr`.
0 parents
commit a350e50
Showing 13 changed files with 245 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
^.*\.Rproj$ | ||
^\.Rproj\.user$ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
.Rproj.user | ||
.Rhistory | ||
.RData |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
Package: visdat | ||
Title: preliminary visualisation of data | ||
Version: 0.0.0.9000 | ||
Authors@R: person("Nicholas", "Tierney", email = "nicholas.tierney@gmail.com", role = c("aut", "cre")) | ||
Description: visdat makes it easy to visualise your whole dataset so that you can quickly identify problems visually. | ||
Depends: | ||
R (>= 3.2.2) | ||
License: MIT | ||
LazyData: true | ||
RoxygenNote: 5.0.1 | ||
Imports: ggplot2, | ||
tidyr, | ||
dplyr |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# Generated by roxygen2: do not edit by hand | ||
|
||
export(fingerprint) | ||
export(vis_dat) | ||
export(vis_miss) | ||
import(dplyr) | ||
import(ggplot2) | ||
importFrom(tidyr,gather) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
#' fingerprint | ||
#' | ||
#' \code{fingerprint} is a utility function for vis_dat | ||
#' | ||
#' @description \code{fingerprint} takes a vector \code{x} (typically a column of a dataframe) and replaces each value with the class of the vector, unless the value is missing (coded as NA), in which case it is left as NA. The name is taken from csv-fingerprint, on which this package is based. | ||
#' | ||
#' @param x a vector | ||
#' | ||
#' @export | ||
fingerprint <- function(x){ | ||
|
||
# is the data missing? | ||
ifelse(is.na(x), | ||
# yes? Leave it as NA | ||
yes = NA, | ||
# no? Replace the value with the class of the vector it belongs to. | ||
no = class(x)) | ||
|
||
} # end function |
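A note on the `ifelse()` call above: `class(x)` can return more than one string (an ordered factor, for example, has class `c("ordered", "factor")`), and `ifelse()` recycles `no` along the vector, so the labels can alternate between class names. The sketch below reproduces that behaviour and shows one possible guard, which collapses the class vector into a single label. `fingerprint_collapsed` is a hypothetical name used for illustration and is not part of this commit.

```r
# fingerprint() as defined in this commit
fingerprint <- function(x) {
  ifelse(is.na(x), yes = NA, no = class(x))
}

# Hypothetical variant: collapse a multi-element class vector into one label
fingerprint_collapsed <- function(x) {
  ifelse(is.na(x), yes = NA, no = paste(class(x), collapse = "/"))
}

single <- c(1.5, NA, 3)
multi  <- factor(c("lo", "hi", NA, "lo"), levels = c("lo", "hi"), ordered = TRUE)

fingerprint(single)           # "numeric" NA "numeric"
fingerprint(multi)            # "ordered" "factor" NA "factor"
fingerprint_collapsed(multi)  # "ordered/factor" "ordered/factor" NA "ordered/factor"
```

Whether a collapsed label or only the first class is the better choice depends on how the legend should read; either keeps one label per column.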
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
#' vis_dat | ||
#' | ||
#' \code{vis_dat} visualises a data.frame to tell you what it contains. | ||
#' | ||
#' @description \code{vis_dat} gives you an at-a-glance ggplot of what is inside a dataframe, colouring cells according to what class they are and whether the values are missing. As it returns a ggplot object, it is very easy to customize and change labels, etc. | ||
#' | ||
#' @param x a data.frame object | ||
#' | ||
#' @importFrom tidyr gather | ||
#' @import dplyr | ||
#' @import ggplot2 | ||
#' | ||
#' @export | ||
vis_dat <- function(x){ | ||
|
||
# apply the fingerprint to every column in the dataframe | ||
lapply(x, fingerprint) %>% | ||
# coerce it to a dataframe...there's probably a better way | ||
as_data_frame %>% | ||
# create a new column that is numbered from 1 to the number of rows | ||
# this assists in the gathering of rows together | ||
mutate(rows = 1:n()) %>% | ||
# gather the variables together for plotting | ||
# this gives a column of row numbers (rows), a column of variable names (variables), and a column of cell contents (value) | ||
gather(key = variables, | ||
value = value, | ||
-rows) %>% | ||
# then we plot it | ||
ggplot(data = ., | ||
aes(x = variables, | ||
y = rows)) + | ||
geom_raster(aes(fill = value)) + | ||
theme_minimal() + | ||
theme(axis.text.x = element_text(angle = 45, vjust = 0.5)) + | ||
labs(x = "Variables in Dataset", | ||
y = "Rows / observations") | ||
|
||
} |
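To make the reshaping step concrete, here is a rough base-R sketch of the long-format table that `vis_dat` passes to `ggplot`. It avoids dplyr and tidyr so that it runs without extra packages; it illustrates the shape of the data rather than reproducing the code in this commit, and `classes` and `long` are illustrative names.

```r
# fingerprint() as defined in this commit
fingerprint <- function(x) {
  ifelse(is.na(x), yes = NA, no = class(x))
}

# Replace every cell with its column's class (NA where the value is missing)
classes <- as.data.frame(lapply(airquality, fingerprint),
                         stringsAsFactors = FALSE)

# One row per cell: row number, variable name, and class label
long <- data.frame(
  rows      = rep(seq_len(nrow(classes)), times = ncol(classes)),
  variables = rep(names(classes), each = nrow(classes)),
  value     = unlist(classes, use.names = FALSE),
  stringsAsFactors = FALSE
)

head(long, 3)
table(long$value, useNA = "ifany")
```

Each row of `long` corresponds to one tile in the raster, with `value` driving the fill colour.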
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
#' vis_miss | ||
#' | ||
#' \code{vis_miss} visualises a data.frame to display missingness. | ||
#' | ||
#' @description \code{vis_miss} gives you an at-a-glance ggplot of the missingness inside a dataframe, colouring cells according to missingness. As it returns a ggplot object, it is very easy to customize and change labels, etc. | ||
#' | ||
#' @param x a data.frame object | ||
#' | ||
#' @importFrom tidyr gather | ||
#' @import dplyr | ||
#' @import ggplot2 | ||
#' | ||
#' @export | ||
vis_miss <- function(x){ | ||
|
||
x %>% | ||
is.na %>% | ||
as.data.frame %>% | ||
mutate(rows = 1:n()) %>% | ||
# gather the variables together for plotting | ||
# this gives a column of row numbers (rows), a column of variable names (variables), and a column of missingness flags (value) | ||
gather(key = variables, | ||
value = value, | ||
-rows) %>% | ||
# then we plot it | ||
ggplot(data = ., | ||
aes(x = variables, | ||
y = rows)) + | ||
geom_raster(aes(fill = value)) + | ||
# change the colour, so that missing is grey, present is black | ||
scale_fill_grey(name = "", | ||
labels = c("Present", | ||
"Missing")) + | ||
theme_minimal() + | ||
theme(axis.text.x = element_text(angle = 45, | ||
vjust = 0.5)) + | ||
labs(x = "Variables in Dataset", | ||
y = "Rows / observations") | ||
|
||
} |
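As a quick check on what the plot should show, the logical matrix that `vis_miss` starts from can be inspected directly. This is a small base-R sketch using the built-in `airquality` data; `miss` is an illustrative name.

```r
# TRUE where a value is missing, FALSE otherwise
miss <- is.na(airquality)

# Missing values per column; columns with a non-zero count will contain "Missing" cells
colSums(miss)

# Overall share of missing cells
mean(miss)
```

For `airquality`, only `Ozone` and `Solar.R` have missing values, so those are the only columns where the plot should show "Missing" cells.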
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
# visdat | ||
|
||
This package is the second iteration of my attempt at cloning the super cool and way sexier "csv-fingerprint" from FlowingData - see [here](https://github.com/setosa/csv-fingerprint) and [here](https://flowingdata.com/2014/08/14/csv-fingerprint-spot-errors-in-your-data-at-a-glance/). Initially I had named the package "footprintr", to keep in the spirit of the name "csv-fingerprint". However, after a little more thought and usage, I felt that "footprintr" didn't actually describe what the package does, and so "visdat" was born. | ||
|
||
# What does it do? | ||
|
||
visdat is a small R package that visualises a dataframe, displaying missing data and variable classes with different colours. Future work will allow each cell to be coloured according to its type (e.g., strings, factors, integers, decimals, dates, missing data). It would also be really cool to get this function to "intelligently" read in data types. | ||
|
||
Part of the name suggests that it could be integrated with testdat and testthat, the idea being that you first visualise your data, then run tests to fix it. | ||
|
||
|
||
# How to install | ||
|
||
``` | ||
# install.packages("devtools") | ||
library(devtools) | ||
install_github("tierneyn/footprintr") | ||
``` | ||
|
||
# Example | ||
|
||
Let's explore the missing data | ||
|
||
``` | ||
library(visdat) | ||
vis_miss(airquality) | ||
``` | ||
|
||
Let's see what's inside airquality | ||
|
||
``` | ||
vis_dat(airquality) | ||
``` | ||
|
||
# Known Issues | ||
|
||
**Individual cells do not have an individual class** | ||
Because R coerces the elements of a vector to a common type, you cannot keep something like c("a", 1L, 10.555) together as a vector; R converts it to `[1] "a" "1" "10.555"`. This means that you don't get the ideal feature of picking up on nuances such as individual cells that have different classes in the dataframe. Perhaps there is a way to read in a csv as a list so that these features are preserved? | ||
|
||
**Missing Data not listed in legend** | ||
|
||
When running the `vis_dat` example above, the grey bars indicate missing values, but the legend does not currently label them as missing. | ||
|
||
|
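The coercion described under "Individual cells do not have an individual class" can be reproduced in a few lines. The sketch also shows that a list keeps each element's class, which is the behaviour the question about reading a csv as a list is hoping for; it does not show how to read a csv that way.

```r
# An atomic vector holds a single type, so mixed input is coerced to character
mixed_vector <- c("a", 1L, 10.555)
mixed_vector         # "a" "1" "10.555"
class(mixed_vector)  # "character"

# A list keeps each element's own class
mixed_list <- list("a", 1L, 10.555)
sapply(mixed_list, class)  # "character" "integer" "numeric"
```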
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
Version: 1.0 | ||
|
||
RestoreWorkspace: No | ||
SaveWorkspace: No | ||
AlwaysSaveHistory: Default | ||
|
||
EnableCodeIndexing: Yes | ||
UseSpacesForTab: Yes | ||
NumSpacesForTab: 2 | ||
Encoding: UTF-8 | ||
|
||
RnwWeave: knitr | ||
LaTeX: pdfLaTeX | ||
|
||
AutoAppendNewline: Yes | ||
StripTrailingWhitespace: Yes | ||
|
||
BuildType: Package | ||
PackageUseDevtools: Yes | ||
PackageInstallArgs: --no-multiarch --with-keep.source | ||
PackageRoxygenize: rd,collate,namespace |