Skip to content

hughjonesd/santoku

master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
R
 
 
 
 
man
 
 
 
 
src
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

santoku santoku logo

CRAN status Lifecycle: experimental CRAN Downloads Per Month R-CMD-check AppVeyor build status Codecov test coverage

santoku is a versatile cutting tool for R. It provides chop(), a replacement for base::cut().

Advantages

Here are some advantages of santoku:

  • By default, chop() always covers the whole range of the data, so you won’t get unexpected NA values.

  • chop() can handle single values as well as intervals. For example, chop(x, breaks = c(1, 2, 2, 3)) will create a separate factor level for values exactly equal to 2.

  • chop() can handle many kinds of data, including numbers, dates and times.

  • chop_* functions create intervals in many ways, using quantiles of the data, standard deviations, fixed-width intervals, equal-sized groups, or pretty intervals for use in graphs.

  • lbl_* functions make it easy to label intervals: use interval notation like [1, 2), dash notation like 1-2, or arbitrary styles using glue::glue().

  • tab_* functions quickly chop data, then tabulate it.

These advantages make santoku especially useful for exploratory analysis, where you may not know the range of your data in advance.

Examples

library(santoku)

chop returns a factor:

chop(1:5, c(2, 4))
#> [1] [1, 2) [2, 4) [2, 4) [4, 5] [4, 5]
#> Levels: [1, 2) [2, 4) [4, 5]

Include a number twice to match it exactly:

chop(1:5, c(2, 2, 4))
#> [1] [1, 2) {2}    (2, 4) [4, 5] [4, 5]
#> Levels: [1, 2) {2} (2, 4) [4, 5]

Customize output with lbl_* functions:

chop(1:5, c(2, 4), labels = lbl_dash())
#> [1] 1—2 2—4 2—4 4—5 4—5
#> Levels: 1—2 2—4 4—5

Chop into fixed-width intervals:

chop_width(runif(10), 0.1)
#>  [1] [0.1691, 0.2691) [0.6691, 0.7691) [0.8691, 0.9691) [0.6691, 0.7691)
#>  [5] [0.4691, 0.5691) [0.1691, 0.2691) [0.5691, 0.6691) [0.1691, 0.2691)
#>  [9] [0.7691, 0.8691) [0.8691, 0.9691)
#> 6 Levels: [0.1691, 0.2691) [0.4691, 0.5691) ... [0.8691, 0.9691)

Or into fixed-size groups:

chop_n(1:10, 5)
#>  [1] [1, 6)  [1, 6)  [1, 6)  [1, 6)  [1, 6)  [6, 10] [6, 10] [6, 10] [6, 10]
#> [10] [6, 10]
#> Levels: [1, 6) [6, 10]

Chop dates by calendar month, then tabulate:

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

tab_width(as.Date("2021-12-31") + 1:90, months(1), 
            labels = lbl_discrete(fmt = "%d %b")
          )
#> 01 Jan—31 Jan 01 Feb—28 Feb 01 Mar—31 Mar 
#>            31            28            31

For more information, see the vignette.

About

A versatile cutting tool for R

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Packages

No packages published