charlatan #94
Comments
Editor checks:
Editor comments
── GP charlatan ────────────────────────────────
It is good practice to
✖ write unit tests for all functions, and all package code in general. 37% of code
✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and
✖ avoid long code lines, it is bad for readability. Also, many people prefer
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...)
✖ fix this R CMD check NOTE: Namespaces in Imports field not imported from: ‘R6’
|
Thanks @ganderson and @tjmahr for agreeing to review. |
Whoops, that's @geanders |
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Documentation
The package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 8
Review Comments
This package is a tool for generating fake data. Users can create fake data of a given type by using functions that start with
Charlatan is modeled after Python's Faker library, which in turn draws inspiration from PHP Faker, Ruby Faker and Perl Faker. Unfortunately, the name Faker has been taken on CRAN. (It is also worth noting that wakefield is another R package for fake data. Wakefield provides data frames of fake data, whereas this package mainly provides vectors of fake data.)
Overall, this package is well designed and well structured. The code is cleanly written and formatted, and it takes advantage of R idioms. This clear, expressive style made reading the code a breeze. That said, I found a few bugs and inconsistencies while reviewing it.
This package uses the R6 object system to create classes of data generators. (Presumably, using objects and methods made porting the code from Python rather straightforward.) The base class is the BaseProvider
library(charlatan)
bp <- BaseProvider$new()
bp
#> <BaseProvider>
#> Public:
#> bothify: function (text = "## ??")
#> check_locale: function (x)
#> clone: function (deep = FALSE)
#> lexify: function (text = "????")
#> numerify: function (text = "###")
#> random_digit: function ()
#> random_digit_not_null: function ()
#> random_digit_not_null_or_empty: function ()
#> random_digit_or_empty: function ()
#> random_element: function (x)
#> random_int: function (min = 0, max = 9999)
#> random_letter: function ()
bp$random_digit()
#> [1] 0
bp$numerify("I have ## friends")
#> [1] "I have 75 friends"
This package makes extensive use of the
set.seed(22)
bp$random_element(10)
#> [1] 4
Perhaps,
The method
str(args(bp$random_int))
#> function (min = 0, max = 9999)
It can never generate the maximum value, and this fact isn't noted anywhere.
set.seed(22)
range(replicate(bp$random_int(min = 0, max = 99), n = 100000))
#> [1] 0 98
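One way to make the upper bound reachable is to draw from the closed range with `sample.int()` and shift the result by `min`. The following is a minimal sketch in base R under that assumption; the helper name `random_int_inclusive` is hypothetical and is not taken from charlatan's source.

```r
# Hypothetical helper, not part of charlatan: draws one integer from the
# closed interval [min, max], so `max` itself can be returned.
random_int_inclusive <- function(min = 0, max = 9999) {
  stopifnot(length(min) == 1, length(max) == 1, min <= max)
  # sample.int(k, 1) returns a value in 1..k; shift it onto min..max
  min + sample.int(max - min + 1, size = 1) - 1
}

set.seed(22)
draws <- replicate(10000, random_int_inclusive(min = 0, max = 99))
range(draws)
```

With this construction every value from `min` to `max` has the same probability, which matches what the argument names suggest to a user.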
I'm curious why the digit-or-blank generators don't just sample from a larger set. The code for
bp$random_digit_not_null_or_empty
#> function() {
#> if (sample(0:1, size = 1) == 1) {
#> sample(1:9, size = 1)
#> } else {
#> ''
#> }
#> }
#> <environment: 0x0000000008c7dd58>
# Why not sample with this instead?
bp$random_element(c(1:9, ""))
#> [1] "9"
NumericsProvider
This class generates random numbers. It demonstrates nice features of the R6-based design:
np <- NumericsProvider$new()
np
#> <NumericsProvider>
#> Inherits from: <BaseProvider>
#> Public:
#> beta: function (n = 1, shape1, shape2, ncp = 0)
#> bothify: function (text = "## ??")
#> check_locale: function (x)
#> clone: function (deep = FALSE)
#> double: function (n = 1, mean = 0, sd = 1)
#> integer: function (n = 1, min = 1, max = 1000)
#> lexify: function (text = "????")
#> lnorm: function (n = 1, mean = 0, sd = 1)
#> norm: function (n = 1, mean = 0, sd = 1)
#> numerify: function (text = "###")
#> random_digit: function ()
#> random_digit_not_null: function ()
#> random_digit_not_null_or_empty: function ()
#> random_digit_or_empty: function ()
#> random_element: function (x)
#> random_int: function (min = 0, max = 9999)
#> random_letter: function ()
#> unif: function (n = 1, min = 0, max = 9999)
This class powers the various random number generators in the package. The functions
ch_norm
#> function(n = 1, mean = 0, sd = 1) {
#> assert(n, c('integer', 'numeric'))
#> NumericsProvider$new()$norm(n, mean, sd)
#> }
#> <environment: namespace:charlatan>
The package rightly calls upon R's built-in number generators (
args(runif)
#> function (n, min = 0, max = 1)
#> NULL
args(NumericsProvider$new()$unif)
#> function (n = 1, min = 0, max = 9999)
#> NULL
Here I think the package should follow R's defaults and take advantage of what the user may already know about the
The
ch_integer(n = 10000, min = 1, max = 1000)
#> Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
It is also odd that the
PersonProvider
This class implements the package's impressive (and fun) random name generator. It can generate locale-specific names. It has one method:
set.seed(100)
person <- PersonProvider$new()
person$render()
#> [1] "Devan Lubowitz"
person$render()
#> [1] " Konopelski"
person$render()
#> [1] "Dr. Gladis Little MD"
ch_name(4)
#> [1] "Stanley Gottlieb" "Ollie Ortiz MD" "Ms. Meaghan Lesch"
#> [4] "Jordy Rohan"
As the example above shows, the names can vary in format. (Sometimes there is a blank first name; this might be a bug.) This function works by randomly selecting a name format and populating the format with random names/affixes.
# Formats
person$formats
#> [1] "{{first_name_female}} {{last_names}}"
#> [2] "{{first_names_female}} {{last_names}}"
#> [3] "{{first_names_female}} {{last_names}}"
#> [4] "{{first_names_female}} {{last_names}}"
#> [5] "{{first_names_female}} {{last_names}}"
#> [6] "{{prefixes_female}} {{first_names_female}} {{last_names}}"
#> [7] "{{first_names_female}} {{last_names}} {{suffixes_female}}"
#> [8] "{{prefixes_female}} {{first_names_female}} {{last_names}} {{suffixes_female}}"
#> [9] "{{first_names_male}} {{last_names}}"
#> [10] "{{first_names_male}} {{last_names}}"
#> [11] "{{first_names_male}} {{last_names}}"
#> [12] "{{first_names_male}} {{last_names}}"
#> [13] "{{first_names_male}} {{last_names}}"
#> [14] "{{prefixes_male}} {{first_names_male}} {{last_names}}"
#> [15] "{{first_names_male}} {{last_names}} {{suffixes_male}}"
#> [16] "{{prefixes_male}} {{first_names_male}} {{last_names}} {{suffixes_male}}"
# Possible slot fillers
str(person$person)
#> List of 8
#> $ first_names : chr [1:7203] "Aaden" "Aarav" "Aaron" "Ab" ...
#> $ first_names_male : chr [1:3271] "Aaden" "Aarav" "Aaron" "Ab" ...
#> $ first_names_female: chr [1:3932] "Aaliyah" "Abagail" "Abbey" "Abbie" ...
#> $ last_names : chr [1:473] "Abbott" "Abernathy" "Abshire" "Adams" ...
#> $ prefixes_female : chr [1:4] "Mrs." "Ms." "Miss" "Dr."
#> $ prefixes_male : chr [1:2] "Mr." "Dr."
#> $ suffixes_female : chr [1:4] "MD" "DDS" "PhD" "DVM"
#> $ suffixes_male : chr [1:11] "Jr." "Sr." "I" "II" ...
The package's locale-specificity is just a matter of selecting the appropriate formats and slot fillers for a locale. This implementation is a clean design win. The locale-specific names are stored as variables in the package's namespace, and they are stored in R scripts. Thus, there are .R files that contain vectors with thousands of names.
r_dir <- rprojroot::find_rstudio_root_file("R/")
list.files(r_dir, "person-provider-")
#> [1] "person-provider-bg_BG.R" "person-provider-cs_CZ.R"
#> [3] "person-provider-de_AT.R" "person-provider-de_DE.R"
#> [5] "person-provider-dk_DK.R" "person-provider-en_US.R"
#> [7] "person-provider-es_ES.R" "person-provider-fa_IR.R"
#> [9] "person-provider-fi_FI.R" "person-provider-fr_CH.R"
#> [11] "person-provider-fr_FR.R" "person-provider-hr_HR.R"
#> [13] "person-provider-it_IT.R"
I wonder if using a
The documentation for
There is a bug in how double last names (as in Spanish) are generated.
set.seed(103)
spanish <- PersonProvider$new(locale = "es_ES")
# Double last names are common in Spanish
spanish$formats
#> [1] "{{first_name_male}} {{last_name}} {{last_name}}"
#> [2] "{{first_name_male}} {{last_name}} {{last_name}}"
#> [3] "{{first_name_male}} {{last_name}} {{last_name}}"
#> [4] "{{first_name_male}} {{last_name}} {{last_name}}"
#> [5] "{{first_name_male}} {{last_name}} {{last_name}}"
#> [6] "{{first_name_male}} {{last_name}} {{last_name}}"
#> [7] "{{first_name_male}} {{last_name}}"
#> [8] "{{first_name_male}} {{prefix}} {{last_name}}"
#> [9] "{{first_name_male}} {{last_name}}-{{last_name}}"
#> [10] "{{first_name_male}} {{first_name_male}} {{last_name}} {{last_name}}"
#> [11] "{{first_name_female}} {{last_name}} {{last_name}}"
#> [12] "{{first_name_female}} {{last_name}} {{last_name}}"
#> [13] "{{first_name_female}} {{last_name}} {{last_name}}"
#> [14] "{{first_name_female}} {{last_name}} {{last_name}}"
#> [15] "{{first_name_female}} {{last_name}} {{last_name}}"
#> [16] "{{first_name_female}} {{last_name}} {{last_name}}"
#> [17] "{{first_name_female}} {{last_name}}"
#> [18] "{{first_name_female}} {{prefix}} {{last_name}}"
#> [19] "{{first_name_female}} {{last_name}}-{{last_name}}"
#> [20] "{{first_name_female}} {{first_name_female}} {{last_name}} {{last_name}}"
# The two last names should be different
spanish$render()
#> [1] "Jose Antonio Lloret Lloret"
spanish$render()
#> [1] "Javier Gras Gras"
This behavior is a problem with
pluck_names <- charlatan:::pluck_names
fmt <- "{{last_name}} {{last_name}} {{last_name}} {{last_name}}"
dat <- lapply(
spanish$person[pluck_names(fmt, spanish$person)],
sample,
size = 1)
str(dat)
#> List of 4
#> $ last_name: chr "Alegre"
#> $ last_name: chr "Delgado"
#> $ last_name: chr "Heras"
#> $ last_name: chr "Cordero"
whisker::whisker.render(fmt, data = dat)
#> [1] "Alegre Alegre Alegre Alegre"
ColorProvider
This class generates colors. Its hex-color generator uses the
cp <- ColorProvider$new()
cp$color_name()
#> [1] "PowderBlue"
cp$safe_color_name()
#> [1] "fuchsia"
# Some locale sensitivity for color names, but not safe ones
cp2 <- ColorProvider$new(locale = "uk_UA")
cp2$color_name()
#> [1] "<U+0424><U+0456><U+043E><U+043B><U+0435><U+0442><U+043E><U+0432><U+0438><U+0439>"
cp2$safe_color_name()
#> [1] "maroon"
set.seed(26)
cp$hex_color()
#> [1] "#43f66"
It pads zeros onto strings shorter than 6 characters to create the familiar six-digit format. But padding is done on the right side, so it will never generate "#0000ff". It also counts character length after appending the pound sign, so the above example shows an incorrect hex color.
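Both padding problems go away if the number is formatted with a fixed width and leading zeros before the pound sign is attached. A minimal sketch in base R, assuming a uniform draw over all 24-bit colors is acceptable; the name `hex_color_sketch` is hypothetical and this is not charlatan's implementation.

```r
# Hypothetical helper, not charlatan's code: draws one of the 256^3 colors
# and formats it as "#" plus exactly six lowercase hex digits.
hex_color_sketch <- function() {
  value <- sample.int(256^3, size = 1) - 1L  # integer in 0..16777215
  # "%06x" left-pads with zeros, so small values such as 255 give "0000ff"
  sprintf("#%06x", value)
}

set.seed(26)
hex_color_sketch()
```

Because the width is applied to the digits only, the result always has seven characters including the pound sign.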
rgb(0, 0, 0, maxColorValue = 255)
#> [1] "#000000"
rgb(255, 255, 255, maxColorValue = 255)
#> [1] "#FFFFFF"
sample_col <- function() sample(0:255, 1)
rgb(sample_col(), sample_col(), sample_col(), maxColorValue = 255)
#> [1] "#4ADFCC"
A similar strategy underlies
set.seed(26)
cp$safe_hex_color()
#> [1] "#4400#4400"
For this method, I think generating three separate hex digits, duplicating them, and concatenating them would be a cleaner implementation. I also find conflicting information about which colors are "safe". Is this method's definition of safe colors standard?
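The three-digit approach can be sketched in a few lines of base R. The helper name `safe_hex_sketch` and its `digits` argument are hypothetical; which digits count as "safe" is left to the caller, since that definition is the open question here.

```r
# Hypothetical helper, not charlatan's implementation: builds "#rrggbb"
# from three hex digits, each written twice (e.g. "4", "0", "c" -> "#4400cc").
safe_hex_sketch <- function(digits = c(0:9, letters[1:6])) {
  picked <- sample(as.character(digits), size = 3, replace = TRUE)
  # paste0(picked, picked) doubles each digit; collapse joins the three pairs
  paste0("#", paste0(picked, picked, collapse = ""))
}

set.seed(26)
safe_hex_sketch()
# One commonly cited web-safe palette restricts the digits like this:
safe_hex_sketch(digits = c("0", "3", "6", "9", "c", "f"))
```

Since the pound sign is added once at the end, the output cannot contain a second "#" the way "#4400#4400" does.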
set.seed(133)
colors <- replicate(n = 1000, cp$rgb_color())
#> Error: precBits >= 2 are not all TRUE
R has its own family of accepted color names, so those might be a natural extension point.
sample(colors(), 3)
#> [1] "springgreen" "gray40" "gray32"
CoordinateProvider
CoordinateProvider does what it says. It powers
cp <- CoordinateProvider$new()
CoordinateProvider
#> <CoordinateProvider> object generator
#> Public:
#> lon: function ()
#> lat: function ()
#> position: function (bbox = NULL)
#> clone: function (deep = FALSE)
#> Private:
#> rnd: function ()
#> coord_in_bbbox: function (bbox)
#> Parent env: <environment: namespace:charlatan>
#> Locked objects: TRUE
#> Locked class: FALSE
#> Portable: TRUE
cp$lat()
#> [1] -21.11343
cp$lon()
#> [1] 123.7104
cp$position()
#> [1] 72.99215 12.02990
It can generate coordinates within a bounding box. The box is not checked, so users can get invalid coordinates as a result.
# Specify a bad box
cp$position(c(-12000, 0, 0, 30000))
#> [1] -9588.538 16289.230
CreditCardProvider
This class looks good, but there is some commented-out Python/R code in place. It looks like the class produces legitimate credit card numbers (i.e., they pass an appropriate checksum that real credit card numbers have). This detail highlights an important use case for this package: generating fake data for testing code, as one could test some credit-card-validating function with this package.
DateTimeProvider
This class powers
DateTimeProvider$new()$unix_time()
#> [1] 530920243
DateTimeProvider$new()$date_time()
#> [1] "2016-07-26 14:40:45 CDT"
There is a typo in
body(DateTimeProvider$new()$century)
#> {
#> super$random_element(self$cenuries)
#> }
TaxonomyProvider
This class randomly samples from prepackaged genus/species names. The basis for the random genus/species names is well detailed in the hidden
set.seed(22)
TaxonomyProvider$new()$genus()
#> [1] "Dialium"
TaxonomyProvider$new()$epithet()
#> [1] "kovacevii"
# A species is just genus() + epithet()
set.seed(22)
TaxonomyProvider$new()$species()
#> [1] "Dialium kovacevii"
Do-one-thing generators
CurrencyProvider
This class randomly samples a vector of currency abbreviations.
DOIProvider
This class's main job is to randomly select a DOI format and populate the format with characters/integers. This class does not use its inherited
JobProvider
The class produces occupation titles using the same locale-specificity covered earlier. It just randomly samples occupations from a vector with each locale's occupation names.
set.seed(36)
JobProvider$new(locale = "fr_FR")$render()
#> [1] "Intégrateur web"
ch_job(n = 3)
#> [1] "Pharmacist, community" "Restaurant manager, fast food"
#> [3] "Designer, textile"
PhoneNumberProvider
This class randomly selects a format and then populates that format with digits.
head(PhoneNumberProvider$new()$formats, 3)
#> [1] "+##(#)##########" "+##(#)##########" "0##########"
PhoneNumberProvider$new()$render()
#> [1] "1-094-032-6500x4515"
ch_phone_number(4)
#> [1] "333-862-9754" "(681)356-2399x715" "1-612-259-7099x07938"
#> [4] "720.912.3811"
SequenceProvider
This class generates gene sequences by sampling letters and concatenating them. I feel the user-facing function should be named
FraudsterClient
This class wraps all the
y <- fraudster(locale = "fr_FR")
y
#> <fraudster>
#> locale: fr_FR
y$job()
#> [1] "Aquaculteur"
y$color_name()
#> Error in sample.int(length(x), size, replace, prob): invalid first argument
y$name()
#> [1] "Teddie McCullough III"
Unfinished providers
I didn't closely review the code for these classes, as they appear to be unfinished and are not user-facing like the other ones. I am documenting them here for completeness. The package contains exported, in-progress code for generating addresses with
It also contains the
# I think the same whisker::whisker.render() bug is happening here
set.seed(27)
company_provider()$company()
#> [1] "Boehm, Boehm and Boehm"
company_provider()$company()
#> [1] "Hansen, Hansen and Hansen"
company_provider()$company()
#> [1] "Jacobi Inc,and Sons,LLC,Group,PLC,Ltd"
company_provider()$bs()
#> [1] "optimize clicks-and-mortar platforms"
company_provider()$bs()
#> [1] "reinvent extensible applications"
company_provider()$catch_phrase()
#> [1] "Object-based optimal structure"
company_provider()$catch_phrase()
#> [1] "Seamless zero-defect archive"
set.seed(10)
MissingDataProvider$new()$make_missing(letters)
#> [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA "o" "p" "q"
#> [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
I feel like it should randomly determine n,
|
Thanks for the excellent and thorough review, @tjmahr! |
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Documentation
The package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 4
Review Comments
This package allows a user to create synthetic data of names, addresses, jobs, phone numbers, credit card numbers, taxonomies, etc., in several different languages. The package offers two interfaces, either through
There are cases where unreasonable values will be generated by the functions in this package (e.g., phone numbers with illegal area codes, weird names for people, etc.), which is to some extent unavoidable when generating synthetic data. Depending on the intended applications of this package, this may or may not be important. It would be useful to get an idea of more of the intended uses of this package to help assess if any of these unreasonable values (some of which I point out in comments below) would be likely to introduce problems when using this package.
The package name is great (and memorable). Coding style is excellent throughout. The vignette rendered without error locally and all examples also ran without error.
Functions
Vignette
Helpfiles
|
@geanders thanks so much for the review! will respond soon |
response to @tjmahr - thx much for your review!
BaseProvider
see ropensci/charlatan#12 - looking into it
see ropensci/charlatan#13 - good catch
see ropensci/charlatan#14 - good catch, will fix
NumericsProvider
This is b/c i was following what
see above comment
PersonProvider
see ropensci/charlatan#15 looking into it
I've thought about storing as data - as binary I don't like as the data may need to be updated, changed - so possibly
right, a symptom of submitting this for review early in development :) will fix
see ropensci/charlatan#16 thanks, fixing
ColorProvider
will fix see ropensci/charlatan#18
thanks, will look into it, see ropensci/charlatan#17
thanks, for all those see ropensci/charlatan#19
CoordinateProvider
CreditCardProvider
Right, just not done yet
DateTimeProvider
will fix that, see ropensci/charlatan#21
Right, just not done yet
will fix, see ropensci/charlatan#22
TaxonomyProvider
will do, see ropensci/charlatan#23
Do-one-thing generators
DOIProvider
good idea, see ropensci/charlatan#24
SequenceProvider
probably, see ropensci/charlatan#25
FraudsterClient
right, again, a symptom of submitting early in dev for review, will try to fail better on that ropensci/charlatan#26
good catch, see ropensci/charlatan#26
Unfinished providers
right, again symptom of submitting for review earlyish
will fix that to make sure same format as others, see ropensci/charlatan#27
will add that, see ropensci/charlatan#28
|
response to @geanders - thx much for your review!
@geanders What do you mean by illegal? e.g., with area codes, do you mean it generates an area code that doesn't exist? the rules in the pkg for PhoneNumberProvider are only for the formats the string takes, not what combinations are in the real set. I don't think our goal here is to create real phone numbers or real anything else. I think that would in fact not be a good idea. However, cases where "illegal" values are created with respect to not following a specific format then we'd want to fix that.
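To make the distinction between format rules and real-world validity concrete, here is a minimal sketch of a format filler in base R. It assumes, as the format strings shown earlier in the review suggest, that "#" marks a digit slot; the function name `fill_format` is hypothetical and is not the package's `numerify` method.

```r
# Hypothetical stand-in for a format filler: replaces every "#" in a
# template with an independently drawn digit and leaves other characters alone.
fill_format <- function(template) {
  chars <- strsplit(template, "", fixed = TRUE)[[1]]
  is_slot <- chars == "#"
  # One independent draw per slot; nothing constrains groups of digits,
  # so a block such as "000" is as likely as any other three digits.
  chars[is_slot] <- sample(0:9, size = sum(is_slot), replace = TRUE)
  paste(chars, collapse = "")
}

set.seed(1)
fill_format("+##(#)##########")
fill_format("###-###-####")
```

A filler like this guarantees the shape of the string only. Excluding particular combinations would need an extra rule on top of the format.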
I don't have any specific applications in mind. I think the pkg could be used in A LOT of different use cases: generating datasets in another pkg, datasets for teaching, pkg test suites, generating datasets with a certain structure for a very specific use case, etc. I'll add more to the docs about what possible use cases are ropensci/charlatan#32
Functions
sorry, that should be
sure, removed (commented out eg, to show what happens when there's an invalid variable input
sounds good to add it in there, see ropensci/charlatan#33
Right, will fix, see ropensci/charlatan#34
I'll have a look at this, but i'm only following what the Python library did, see https://github.com/joke2k/faker/blob/master/faker/providers/person/fr_FR/__init__.py I'll see why they don't have any accents in last names - see ropensci/charlatan#35
@geanders What are examples of different titles for female vs. male?
yep, just haven't gotten to it yet, as you can see at https://github.com/joke2k/faker/tree/master/faker/providers there's a lot in there I haven't gotten to yet in this pkg, see ropensci/charlatan#36
these were started from this discussion ropensci/charlatan#11 - yes, they aren't adding much value, but i think there's more to do with them
right, datetime provider is very incomplete - there's many methods not added yet - including date times between a range of dates, see https://github.com/joke2k/faker/blob/master/faker/providers/date_time/__init__.py#L362 - so it will be added in time
i will add
Vignette
Thanks for the feedback on this, will improve it and finish off ropensci/charlatan#37
Good points. Will add to docs an overview on this. Much of the decisions were made in the python client, but I can see if they have any info on that.
I imagine b/c the maintainer is from europe https://github.com/joke2k (though not GB)
thx, fixed.
issue opened, ropensci/charlatan#38
Good idea. ropensci/charlatan#39 maybe a global toggle (for certain variables where appropriate) would be good b/c don't want messy on by default
Helpfiles
will do, i'm a vary baad spellller
right, fixed.
fixed, thanks |
update man files for new roxygen2 version, require roxygen2 (>= 6.0.1) bump dev version
Am I allowed to elaborate comments in a review? Anyway, this implementation of
charlatan::ch_integer(100, min = 1, max = 10)
#> Error in sample.int(length(x), size, replace, prob) :
#> cannot take a sample larger than the population when 'replace = FALSE'
This makes it behave in unexpected ways compared to its sibling functions. In this way, it's not really a random integer generator, although it's lumped in with those other number generators, but a function that shuffles a pile of numbers and draws from that shuffled pile. |
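For comparison, a generator that draws with replacement behaves like the sibling functions: each value is drawn independently, so `n` can exceed the number of distinct values in the range. A minimal sketch in base R; the name `random_integers` is hypothetical and this is not the package's code.

```r
# Hypothetical alternative, not charlatan's code: draws n integers from
# min..max with replacement, so n may exceed the number of distinct values.
random_integers <- function(n = 1, min = 1, max = 1000) {
  stopifnot(n >= 0, min <= max)
  min + sample.int(max - min + 1, size = n, replace = TRUE) - 1
}

set.seed(1)
x <- random_integers(100, min = 1, max = 10)
length(x)
table(x)
```

The only change from a shuffle-and-draw approach is `replace = TRUE`, which removes the dependence between draws.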
@tjmahr Certainly! We encourage ongoing conversations throughout the review process. |
For this, I mean certain number combinations that you'd never see (like an area code of 000). I agree that a package shouldn't create "real" phone numbers, but clearly the package is trying to create things that look realistic-- otherwise the package could just generate random character strings, with locale-specific encoding, for the names and job titles, right? Do you have links to any examples of the Python version in use? (or I could not be lazy and just google that myself.) I think my main question is how realistic everything generated needs to be to be useful in anticipated applications. If it doesn't need to be very realistic, just recognizable as a try at names, job titles, etc., I think these little things like occasionally weird names are fine. If they need to be plausibly realistic most of the time, it may be worth the effort to try to catch a few more of these issues.
accompagnateur / accompagnatrice, acheteur / acheteuse, administrateur / administratrice, etc. Well over half of the jobs listed have different titles for masculine / feminine. Again, this question would come down to how realistic output data needs to look, which will depend on anticipated uses of the package. As with the comment on phone numbers, perhaps this is not a problem if the data just needs to be recognizable as a dataset of names, jobs, etc., rather than one that could pass as a real dataset. Otherwise, it sounds like you've got good plans that respond to the rest of my comments. |
thx for your comment. maybe this is a better fit for the numerics set of fxns https://github.com/ropenscilabs/charlatan/blob/master/R/base-provider.R#L74-L76 |
thanks for your comments. Opened an issue ropensci/charlatan#40 to keep this in mind and discuss about it with community, etc. The short answer is I don't know the answer right now. I sort of want to keep in line with what the Python library is doing so there's a large amount of similarity in what the two libs do. But if there's good reason to deviate and most people want that we can change - or if we can have a toggle to switch on or off more realistic values thanks for the examples for female/male job titles. I'll have a look at that and fix - i don't want these to be just male, i just wasn't familiar enough with french to know the difference |
Hi @sckott, just checking in on the status of this package. Are you waiting for any answers to your questions to reviewers? |
no, not waiting on answers. have addressed many issues so far, but a few more to go (label: review) https://github.com/ropenscilabs/charlatan/issues?q=is%3Aissue+label%3Areview+is%3Aopen |
@tjmahr @geanders Okay, sorry for the long delay. I've now dealt with nearly all of your points - see issues with I've moved a few of your issues/or parts of your issues to the next milestone https://github.com/ropenscilabs/charlatan/milestone/2 - let me know if you think they're important enough we need to deal with them before going to CRAN. and @noamross let me know if you see anything I should fix/make better |
I am OK with the items moved to the 0.2 milestone if the reviewers are. I note that test coverage is still modest, including in some areas (colors), where your reviewers identified bugs or inconsistencies. Can you bring that up? @tjmahr and @geanders, please take a look at the updated version and let us know if @sckott has addressed your comments to your satisfaction. |
This has satisfactorily addressed my comments, and I'm ok with the items moved to the 0.2 milestone, as well. |
Will roll out more tests |
Okay, I've added more tests - now up to 91% coverage (yes, caveat that that doesn't necessarily mean every import. thing is tested) https://github.com/ropenscilabs/charlatan#charlatan anything else? @noamross |
I read over the closed issues and the code changes. I did not give the package the intense kind of code review and interactive testing as I did for the first review, but I made two pull requests (ropensci/charlatan#43 ropensci/charlatan#47) where the code didn't look quite right. I think the package is ready for users, but the maintainers should be ready to make quick fixes and supporting unit tests if a generator demonstrates some systematic error. |
thanks for the PR @tjmahr ready to make quick fixes indeed! |
Thank you for your responses, @geanders and @tjmahr! @sckott: This looks good as soon as you finish addressing ropensci/charlatan#47 . |
@noamross okay, we're all set on that PR |
all good @noamross ? |
thanks everyone for your work that helped to improve the package! |
@noamross draft post ropensci/roweb#310 |
Summary
charlatan makes fake data. It borrows data and modifies code from Python's faker library. charlatan adds more fake data types, in particular to target core audience of scientists.
locale support - done for a few data types; more work to be done on others. Most complete locale support in JobProvider/ch_job and PersonProvider/name.
bottom of Collate cut off for brevity.
https://github.com/ropenscilabs/charlatan
Students in a classroom learning R, or e.g., in a Software Carpentry type bootcamp - since it's lightweight wrt dependencies, it should be easy to install everywhere. Also, if you're doing modeling/simulations, perhaps you could use this to generate some fake data to run models on.
not that i know of
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md with a high-level description in the package root or in inst/.
Detail
Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Note: This package isn't quite complete in the sense that there's more types of data to add to the package, but the overall package API is set (but can be changed if needed), and tests are there (though more needed) and vignette exists - Anyway, submitting earlyish to get reviewer feedback