Create fake data in R


charlatan

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

charlatan makes fake data. It is inspired by, and borrows some code from, Python's faker.

Make fake data for:

  • person names
  • jobs
  • phone numbers
  • colors: names, hex, rgb
  • credit cards
  • DOIs
  • numbers in range and from distributions
  • gene sequences
  • geographic coordinates
  • emails
  • URIs, URLs, and their parts
  • IP addresses
  • more coming ...
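To give a feel for what one of these types involves, here is a minimal base-R sketch of fake gene sequences, drawn uniformly from the four nucleotides. fake_sequence() is a hypothetical helper for illustration only. It is not part of charlatan and says nothing about how charlatan itself builds sequences.

```r
# Hypothetical helper (not part of charlatan): each fake DNA sequence is a
# uniform random draw, with replacement, from the four nucleotides.
fake_sequence <- function(n = 1, seq_length = 10) {
  nucleotides <- c("A", "C", "G", "T")
  vapply(
    seq_len(n),
    function(i) paste(sample(nucleotides, seq_length, replace = TRUE), collapse = ""),
    character(1)
  )
}

set.seed(42)  # make the draw reproducible
fake_sequence(n = 3, seq_length = 12)
```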

Possible use cases for charlatan:

  • Students in a classroom setting learning any task that needs a dataset
  • People doing simulations/modeling that need some fake data
  • Generate a fake dataset of users for a database before actual users exist
  • Complete missing spots in a dataset
  • Generate fake data to replace sensitive real data before public release
  • Create a random set of colors for visualization
  • Generate random coordinates for a map
  • Get a set of randomly generated DOIs (Digital Object Identifiers) to assign to fake scholarly artifacts
  • Generate fake taxonomic names for a biological dataset
  • Get a set of fake sequences to test code/software that uses sequence data
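Several of these uses, such as completing missing spots in a dataset, come down to swapping generated values in for missing ones. A minimal sketch, assuming a generator that takes a count as its first argument (as ch_name(10) does further down): fill_missing() and the stand-in fake_names() are hypothetical helpers, not charlatan functions.

```r
# Hypothetical helper (not part of charlatan): replace NA entries in a
# vector with values from a generator function that takes a count `n`.
fill_missing <- function(x, generator) {
  missing <- is.na(x)
  if (any(missing)) {
    x[missing] <- generator(sum(missing))
  }
  x
}

# Stand-in generator used here so the sketch runs without charlatan.
fake_names <- function(n) paste("Person", seq_len(n))

people <- c("Ada", NA, "Grace", NA)
fill_missing(people, fake_names)
#> [1] "Ada"      "Person 1" "Grace"    "Person 2"
```

With charlatan loaded, fill_missing(people, ch_name) would fill the gaps with fake person names instead.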

Reasons to use charlatan:

  • Lightweight, with few dependencies
  • A relatively comprehensive range of data types, with more being added
  • Many languages (locales) supported, with more being added
  • Useful R features, such as creating entire fake data.frames

Installation

CRAN version

install.packages("charlatan")

dev version

devtools::install_github("ropensci/charlatan")
library("charlatan")

high-level function

fraudster() provides a single entry point for all fake data operations:

x <- fraudster()
x$job()
#> [1] "Accountant, chartered"
x$name()
#> [1] "Rosalyn Berge"
x$color_name()
#> [1] "SaddleBrown"

locale support

More locales are being added over time. For example, job data is available in several locales:

ch_job(locale = "en_US", n = 3)
#> [1] "Energy engineer"     "Biomedical engineer" "Cabin crew"
ch_job(locale = "fr_FR", n = 3)
#> [1] "Spécialiste des affaires réglementaires en chimie"
#> [2] "Chargé d'études naturalistes"                     
#> [3] "Ingénieur production en mécanique"
ch_job(locale = "hr_HR", n = 3)
#> [1] "Dokumentarist"                           
#> [2] "Stručni suradnik u predškolskoj ustanovi"
#> [3] "Pregledač vagona"
ch_job(locale = "uk_UA", n = 3)
#> [1] "Складальник" "Психолог"    "Електрик"
ch_job(locale = "zh_TW", n = 3)
#> [1] "商業設計"     "娛樂事業人員" "日式廚師"

For colors:

ch_color_name(locale = "en_US", n = 3)
#> [1] "LightCoral" "LightGreen" "Brown"
ch_color_name(locale = "uk_UA", n = 3)
#> [1] "Синій Клейна" "Червоний"     "Бурштиновий"
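The examples above return color names. For hex and RGB values, the underlying idea can be sketched in base R: draw three channels in the range 0 to 255, format them as #RRGGBB, and convert back to RGB with grDevices::col2rgb(). random_hex_colors() is a hypothetical helper, not charlatan's implementation.

```r
# Hypothetical helper (not part of charlatan): one row of three random
# 0-255 channel values per color, formatted as an uppercase #RRGGBB string.
random_hex_colors <- function(n = 1) {
  channels <- matrix(sample(0:255, 3 * n, replace = TRUE), ncol = 3)
  sprintf("#%02X%02X%02X", channels[, 1], channels[, 2], channels[, 3])
}

set.seed(1)
hex <- random_hex_colors(3)
hex
grDevices::col2rgb(hex)  # 3 x n integer matrix: rows red, green, blue
```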

More coming soon ...

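generate a dataset

ch_generate() builds an entire fake dataset. By default it returns a tibble of 10 rows with name, job and phone_number columns; pass the column types you want and n to customize it, as in the second call.
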
ch_generate()
#> # A tibble: 10 x 3
#>    name                job                              phone_number      
#>    <chr>               <chr>                            <chr>             
#>  1 Eryn Friesen        Education officer, museum        1-112-369-7012x566
#>  2 Katie Herman-Walter Financial trader                 005.938.2589x20920
#>  3 Vicente Mertz       Recruitment consultant           04743829816       
#>  4 Florentino Kiehn    Pharmacologist                   678.897.4173x91600
#>  5 Miss Janay O'Keefe  Production assistant, television 05168743452       
#>  6 Bo Torp             Health and safety inspector      375-455-1811x437  
#>  7 Ewart Wehner        Consulting civil engineer        (032)662-4042x708…
#>  8 Mrs. Brett Torphy   Engineer, land                   218.141.0453      
#>  9 Dr. Hezzie Crist    Electrical engineer              390-729-4668x99745
#> 10 Nasir Kirlin        Health promotion specialist      903-573-8975x42413
ch_generate('job', 'phone_number', n = 30)
#> # A tibble: 30 x 2
#>    job                                   phone_number       
#>    <chr>                                 <chr>              
#>  1 TEFL teacher                          270.623.5377       
#>  2 Engineer, structural                  646.072.1164       
#>  3 Surveyor, land/geomatics              02205572110        
#>  4 Optician, dispensing                  (661)832-6241x552  
#>  5 Education administrator               398.091.8589x283   
#>  6 Health promotion specialist           1-392-380-9439     
#>  7 Medical laboratory scientific officer 533-660-7324       
#>  8 Public relations account executive    505.267.2292x8952  
#>  9 Designer, television/film set         1-687-621-5404x4993
#> 10 Broadcast presenter                   284.869.0060       
#> # ... with 20 more rows
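Conceptually, a generated dataset is one column per data type, each filled with the same number of rows. The base-R sketch below illustrates that idea with a hypothetical generate_df() that takes named generator functions. It returns a plain data.frame, whereas ch_generate() returns a tibble, and it is not how charlatan implements ch_generate().

```r
# Hypothetical sketch (not charlatan's code): build one column per named
# generator function, each called with the same number of rows `n`.
generate_df <- function(..., n = 10) {
  generators <- list(...)
  stopifnot(length(generators) > 0, !is.null(names(generators)),
            all(names(generators) != ""))
  columns <- lapply(generators, function(gen) gen(n))
  as.data.frame(columns, stringsAsFactors = FALSE)
}

# Stand-in generators so the sketch runs without charlatan.
set.seed(7)
df <- generate_df(
  id = function(n) seq_len(n),
  score = function(n) round(runif(n, 0, 100)),
  n = 5
)
df
```

With charlatan loaded, something like generate_df(name = ch_name, job = ch_job, n = 5) should give a similar table, assuming both functions take the count as their first argument, as ch_name(10) and ch_job(10) do below.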

person name

ch_name()
#> [1] "Dr. Trace Paucek"
ch_name(10)
#>  [1] "Ms. Savannah Hickle" "Farrah Bayer"        "Roslyn Upton"       
#>  [4] "Pearley Wisozk"      "Miss Alzina Hane"    "Stefan Runte"       
#>  [7] "Dr. Shaylee Von"     "Richie Schowalter"   "Elvin Bailey"       
#> [10] "Marissa Gerlach"

phone number

ch_phone_number()
#> [1] "777.064.5901"
ch_phone_number(10)
#>  [1] "06888960467"        "(443)980-6761"      "516.753.6850x8631" 
#>  [4] "03547276672"        "870-472-2879x17047" "1-187-034-5312x104"
#>  [7] "(309)739-2098x608"  "631-587-8491"       "791.141.4061"      
#> [10] "(265)600-8703"
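The outputs above vary in layout (###.###.####, (###)###-####x###, and so on). Python's faker, which charlatan borrows from, produces such numbers by filling format templates in which each # becomes a random digit. A base-R sketch of that substitution follows. numerify() and the templates are hypothetical and made up for illustration; they are not charlatan's code or its actual locale formats.

```r
# Hypothetical helper: replace every "#" in a template with a random digit,
# in the spirit of faker-style format templates. Other characters are kept.
numerify <- function(template) {
  chars <- strsplit(template, "")[[1]]
  hashes <- chars == "#"
  chars[hashes] <- as.character(sample(0:9, sum(hashes), replace = TRUE))
  paste(chars, collapse = "")
}

templates <- c("###.###.####", "(###)###-####x###", "1-###-###-####")
set.seed(3)
vapply(templates, numerify, character(1), USE.NAMES = FALSE)
```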

job

ch_job()
#> [1] "Higher education lecturer"
ch_job(10)
#>  [1] "Actuary"                          "Administrator, local government" 
#>  [3] "Race relations officer"           "Adult nurse"                     
#>  [5] "Optometrist"                      "Engineer, electronics"           
#>  [7] "Immigration officer"              "Cytogeneticist"                  
#>  [9] "Exercise physiologist"            "Education officer, environmental"

credit cards

ch_credit_card_provider()
#> [1] "Maestro"
ch_credit_card_provider(n = 4)
#> [1] "Voyager"          "VISA 16 digit"    "VISA 16 digit"   
#> [4] "American Express"
ch_credit_card_number()
#> [1] "3337390179998936975"
ch_credit_card_number(n = 10)
#>  [1] "4186144087035319"    "676356787920083"     "3428496399306990"   
#>  [4] "4422068536379628"    "4549679282912533"    "4366618045376"      
#>  [7] "4617373184622"       "6011518817655950980" "3088479828588119186"
#> [10] "4486361463875038"
ch_credit_card_security_code()
#> [1] "215"
ch_credit_card_security_code(10)
#>  [1] "826"  "612"  "531"  "257"  "2063" "034"  "082"  "399"  "099"  "981"
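Credit card numbers are conventionally validated with the Luhn (mod 10) checksum. This README does not state whether every number from ch_credit_card_number() passes it; the base-R sketch below, with a hypothetical luhn_valid() helper, lets you check.

```r
# Luhn (mod 10) checksum: starting from the rightmost digit, double every
# second digit, subtract 9 from any doubled value above 9, then sum all
# digits. The number is valid if the sum is divisible by 10.
luhn_valid <- function(number) {
  if (!grepl("^[0-9]+$", number)) return(FALSE)  # digits only
  digits <- rev(as.integer(strsplit(number, "")[[1]]))
  even_pos <- seq_along(digits) %% 2 == 0
  digits[even_pos] <- digits[even_pos] * 2
  digits[digits > 9] <- digits[digits > 9] - 9
  sum(digits) %% 10 == 0
}

luhn_valid("79927398713")  # TRUE, a standard Luhn test value
luhn_valid("79927398710")  # FALSE
```

For example, vapply(ch_credit_card_number(n = 10), luhn_valid, logical(1)) would test a batch of generated numbers.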

Meta

  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for charlatan in R by running citation(package = 'charlatan')
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
