An R package for working with data that uses shorthand and symbols

License: MIT (see LICENSE.md). The repository also contains a separate LICENSE file.
mattkerlogue/shrthnd

Data is often published with shorthand and symbols, and these tags are regularly found in the same container (e.g. a spreadsheet or table cell) as the numeric value. {shrthnd} aims to process character vectors of numerical data that also contain non-numeric shorthand and symbols, so that both pieces of information can be easily retained and worked with.

Installation

{shrthnd} is not yet on CRAN, but binary versions can be installed from R-universe:

install.packages(
  "shrthnd",
  repos = c("https://mattkerlogue.r-universe.dev", "https://cran.r-project.org")
)

You can install the development version of {shrthnd} from GitHub:

# install.packages("remotes")
remotes::install_github("mattkerlogue/shrthnd")

Usage

Use shrthnd_num() to convert a character vector to a shrthnd_num vector. In effect a shrthnd_num vector is a pair of vectors: a numeric vector holding the values and a character vector storing the non-numeric components of the input. By default a shrthnd_num vector will try to behave as a numeric vector, and it can be explicitly coerced to a numeric vector with as.numeric(). You can use shrthnd_tags(), amongst other functions, to interact with the non-numeric (“tag”) component of the input vector. {shrthnd} also provides for the annotation of data frames, specifically of the tibble::tibble() flavour.

Full usage details are available on the {shrthnd} documentation website.

library(shrthnd)

x <- c("12", "34.567", "[c]", "NA", "56.78 [e]", "78.9", "90.123[e]", 
       "321.09*", "987.564 \u2021", ".", "..")

sh_x <- shrthnd_num(x)

sh_x
#> <shrthnd_num[11]>
#>  [1]  12.00      34.57         NA [c]     NA      56.78 [e]  78.90    
#>  [7]  90.12 [e] 321.09 *   987.56 ‡       NA .       NA ..

shrthnd_list(sh_x)
#> <shrthnd_list[6]>
#> [c] (1 location): 3 
#> [e] (2 locations): 5, 7 
#> * (1 location): 8 
#> ‡ (1 location): 9 
#> . (1 location): 10 
#> .. (1 location): 11

tbl <- tibble::tibble(
  x = x,
  sh_x = sh_x,
  as_num = as.numeric(sh_x), 
  as_char = as.character(sh_x),
  tag = shrthnd_tags(sh_x), 
  as_shrthnd = as_shrthnd(sh_x), 
  as_shrthnd2 = as_shrthnd(sh_x, digits = 3)
)

tbl
#> # A tibble: 11 × 7
#>    x               sh_x as_num as_char tag   as_shrthnd as_shrthnd2
#>    <chr>       <sh_dbl>  <dbl> <chr>   <chr> <chr>      <chr>      
#>  1 12         12.00       12   12      <NA>  12.00      12.000     
#>  2 34.567     34.57       34.6 34.567  <NA>  34.57      34.567     
#>  3 [c]           NA [c]   NA   <NA>    [c]   NA [c]     NA [c]     
#>  4 NA            NA       NA   <NA>    <NA>  NA         NA         
#>  5 56.78 [e]  56.78 [e]   56.8 56.78   [e]   56.78 [e]  56.780 [e] 
#>  6 78.9       78.90       78.9 78.9    <NA>  78.90      78.900     
#>  7 90.123[e]  90.12 [e]   90.1 90.123  [e]   90.12 [e]  90.123 [e] 
#>  8 321.09*   321.09 *    321.  321.09  *     321.09 *   321.090 *  
#>  9 987.564 ‡ 987.56 ‡    988.  987.564 ‡     987.56 ‡   987.564 ‡  
#> 10 .             NA .     NA   <NA>    .     NA .       NA .       
#> 11 ..            NA ..    NA   <NA>    ..    NA ..      NA ..

sh_tbl <- shrthnd_tbl(
  tbl,
  title = "Example table",
  notes = c("Note 1", "Note 2"),
  source_note = "Shrthnd documentation, 2023"
)

sh_tbl
#> # Title:    Example table
#> # A tibble: 11 × 7
#>    x               sh_x as_num as_char tag   as_shrthnd as_shrthnd2
#>    <chr>       <sh_dbl>  <dbl> <chr>   <chr> <chr>      <chr>      
#>  1 12         12.00       12   12      <NA>  12.00      12.000     
#>  2 34.567     34.57       34.6 34.567  <NA>  34.57      34.567     
#>  3 [c]           NA [c]   NA   <NA>    [c]   NA [c]     NA [c]     
#>  4 NA            NA       NA   <NA>    <NA>  NA         NA         
#>  5 56.78 [e]  56.78 [e]   56.8 56.78   [e]   56.78 [e]  56.780 [e] 
#>  6 78.9       78.90       78.9 78.9    <NA>  78.90      78.900     
#>  7 90.123[e]  90.12 [e]   90.1 90.123  [e]   90.12 [e]  90.123 [e] 
#>  8 321.09*   321.09 *    321.  321.09  *     321.09 *   321.090 *  
#>  9 987.564 ‡ 987.56 ‡    988.  987.564 ‡     987.56 ‡   987.564 ‡  
#> 10 .             NA .     NA   <NA>    .     NA .       NA .       
#> 11 ..            NA ..    NA   <NA>    ..    NA ..      NA ..      
#> # ☰ Source: Shrthnd documentation, 2023
#> # ☰ There are 2 notes, use `annotations(x)` to view

annotations(sh_tbl)
#> ── Notes for `sh_tbl` ──────────────────────────────────────────────────────────
#> Title: Example table
#> Source: Shrthnd documentation, 2023
#> Notes:
#> • Note 1
#> • Note 2
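
Because the numeric values and the tags stay aligned position by position, one can be used to filter the other. The following is a minimal sketch that uses only as.numeric() and shrthnd_tags() as shown above. The variable names (raw, sh, values, tags, estimated) are illustrative, and the comments describe what the table above suggests should happen rather than captured output.

```r
library(shrthnd)

raw <- c("12", "34.567", "[c]", "NA", "56.78 [e]", "78.9", "90.123[e]",
         "321.09*", "987.564 \u2021", ".", "..")
sh <- shrthnd_num(raw)

# Split into the two linked components
values <- as.numeric(sh)    # numeric part, NA where the input had no number
tags   <- shrthnd_tags(sh)  # tag part, NA where the input had no tag

# Keep only the values marked as estimated; going by the table above
# these should be the entries at positions 5 and 7
estimated <- values[!is.na(tags) & tags == "[e]"]

# Summarise all numeric values, ignoring entries that are missing
mean(values, na.rm = TRUE)
```

Working from the two plain vectors keeps the sketch independent of any arithmetic or summary methods that shrthnd_num may or may not define itself.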

Philosophy

Datasets, especially statistical data published by governments, international institutions and academia, often come with symbols and markers that provide further details about the values: that a value is estimated, the reason why a value is missing, or that a value has a given statistical significance level.

The most common approach to processing data that contains both numeric and non-numeric components is to scrub the non-numeric content, so that the input can be coerced into a numeric vector. However, this non-numeric content (“tags”) often conveys information that is worth retaining. If you later want to access it, you may need to re-import your dataset or change your processing. This creates opportunities for error and, critically, risks de-linking the numeric and non-numeric components. The shrthnd_num() data type builds on vctrs::new_rcrd() to separate, but keep linked, these numeric and non-numeric components of a vector.
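
To illustrate the difference, here is a small sketch contrasting the scrubbing approach with a record-style vector built on vctrs::new_rcrd(). The tagged_num class, its constructor new_tagged_num() and its format method are hypothetical names made up for this example. It is not the {shrthnd} implementation, only a toy showing how a record keeps two fields linked when the vector is subset.

```r
library(vctrs)

raw <- c("56.78 [e]", "[c]", "78.9")

# Scrubbing: strip bracketed tags, then coerce. The tags are lost.
scrubbed <- as.numeric(gsub("\\s*\\[[a-z]\\]", "", raw))

# Hypothetical toy record class: one numeric field, one character field
new_tagged_num <- function(num = double(), tag = character()) {
  new_rcrd(list(num = num, tag = tag), class = "tagged_num")
}

# Record vectors need a format method so that they can be printed
format.tagged_num <- function(x, ...) {
  num <- format(field(x, "num"))
  tag <- field(x, "tag")
  ifelse(is.na(tag), num, paste(num, tag))
}

tagged <- new_tagged_num(
  num = c(56.78, NA, 78.9),
  tag = c("[e]", "[c]", NA)
)

# Subsetting the record subsets both fields together
first_two <- tagged[1:2]
field(first_two, "num")
field(first_two, "tag")
```

After scrubbing, the second element is simply NA with no record of why, whereas the record still carries "[c]" alongside it.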

Logo

The {shrthnd} package logo is a combination of the word “shorthand” written in Pitman shorthand alongside an asterisk. The image was drawn by hand with plot points then adjusted for plotting in {ggplot2}. The “shorthand” shape is based on the representation in Arthur Reynold’s Pitman’s English and Shorthand Dictionary, retrieved from the Internet Archive on 2023-05-11.
