skimr

Date: 2017-05-27

The goal of skimr is to provide a frictionless approach to summary statistics that can be used iteratively and interactively as part of a pipeline, and that conforms to the principle of least surprise.

skimr provides summary statistics that you can skim quickly to understand your data and see what may be missing. It handles different data types (numerics, factors, etc.) and returns a skimr object that can be piped or displayed nicely for the human reader.

Installation

# install.packages("devtools")
devtools::install_github("hadley/colformat")
devtools::install_github("ropenscilabs/skimr")

Skim statistics in the console

adds missing, complete, n, and sd statistics

reports numeric/int/double variables separately from factor/chr variables

handles dates and logicals

uses Hadley's colformat package, specifically colformat::spark_bar()
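To make the first item in the list above concrete, here is a minimal base R sketch of the four statistics for a single numeric vector. The helper name `basic_stats` is hypothetical, and the definitions (missing = count of NA values, complete = count of non-NA values, n = total length) are assumptions about what the labels mean, not skimr's own code.

```r
# Hypothetical helper, not part of skimr: the four statistics named above,
# computed for one numeric vector under the assumed definitions.
basic_stats <- function(x) {
  c(
    missing  = sum(is.na(x)),        # number of NA values
    complete = sum(!is.na(x)),       # number of non-NA values
    n        = length(x),            # total number of values
    sd       = sd(x, na.rm = TRUE)   # standard deviation, ignoring NAs
  )
}

# airquality ships with base R and has missing values in Ozone.
basic_stats(airquality$Ozone)
```

Under these definitions, missing and complete always add up to n, which is a quick sanity check when reading the console output.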

[Figure: Nicely separates numeric and factor variables]

[Figure: Many numeric variables]

[Figure: Another example]
skim_df object (long format)

By default skim prints beautifully in the console, but it also produces a long, tidy-format skim_df object that can be computed on.

a <- skim(chickwts)
dim(a)
# [1] 22  5
View(a)
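For readers unfamiliar with the long, tidy layout, the following base R sketch builds a comparable table with one row per (variable, statistic) pair. It is only an illustration: `long_summary` is a hypothetical function, it covers numeric columns only, the chosen statistics are arbitrary, and it omits the level column, so it is not how skimr builds a skim_df.

```r
# Hypothetical stand-in for a long-format summary; not skimr's implementation.
long_summary <- function(df) {
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  stats <- list(
    missing = function(x) sum(is.na(x)),
    n       = function(x) length(x),
    mean    = function(x) mean(x, na.rm = TRUE),
    sd      = function(x) sd(x, na.rm = TRUE)
  )
  # One small data frame per column, one row per statistic.
  rows <- lapply(numeric_cols, function(col) {
    data.frame(
      var   = col,
      type  = class(df[[col]])[1],
      stat  = names(stats),
      value = unname(vapply(stats, function(f) f(df[[col]]), numeric(1))),
      stringsAsFactors = FALSE
    )
  })
  # Stack the per-column pieces into a single long table.
  do.call(rbind, rows)
}

s <- long_summary(mtcars)
dim(s)
head(s)
```

Because every statistic is a row rather than a column, the result can be subset, grouped, or joined like any other data frame.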

Compute on the full skim_df object

> library(dplyr)
> skim(mtcars) %>% filter(stat == "hist")
# A tibble: 11 × 5
     var    type  stat      level value
   <chr>   <chr> <chr>      <chr> <dbl>
 1   mpg numeric  hist ▂▅▇▇▇▃▁▁▂▂     0
 2   cyl numeric  hist ▆▁▁▁▃▁▁▁▁▇     0
 3  disp numeric  hist ▇▇▅▁▁▇▃▂▁▃     0
 4    hp numeric  hist ▆▆▇▂▇▂▃▁▁▁     0
 5  drat numeric  hist ▃▇▂▂▃▆▅▁▁▁     0
 6    wt numeric  hist ▂▂▂▂▇▆▁▁▁▂     0
 7  qsec numeric  hist ▂▃▇▇▇▅▅▁▁▁     0
 8    vs numeric  hist ▇▁▁▁▁▁▁▁▁▆     0
 9    am numeric  hist ▇▁▁▁▁▁▁▁▁▆     0
10  gear numeric  hist ▇▁▁▁▆▁▁▁▁▂     0
11  carb numeric  hist ▆▇▂▁▇▁▁▁▁▁     0
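Filtering is one way to compute on a long table; pivoting it into a variable-by-statistic matrix is another. The sketch below uses a small hand-built data frame as a stand-in for a skim_df, with only the var, stat, and value columns, so it runs without skimr installed. The object names `long`, `means`, and `wide` are illustrative, and the stand-in is an assumption about the shape of the data rather than real skim output.

```r
# Hand-built stand-in for a long-format summary (assumed shape, not real skim output).
long <- data.frame(
  var   = rep(c("mpg", "wt"), each = 2),
  stat  = rep(c("mean", "sd"), times = 2),
  value = c(mean(mtcars$mpg), sd(mtcars$mpg), mean(mtcars$wt), sd(mtcars$wt)),
  stringsAsFactors = FALSE
)

# Base R equivalent of filter(stat == "mean"): keep only the mean rows.
means <- long[long$stat == "mean", c("var", "value")]

# Pivot to a matrix with one row per variable and one column per statistic.
# Each (var, stat) cell holds exactly one value, so identity() just passes it through.
wide <- tapply(long$value, list(long$var, long$stat), identity)
wide
```

The wide form is convenient for side-by-side comparison, while the long form stays easier to filter and pipe.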

Works with strings!

Specify your own statistics