duplicate coercion to / copy data.table #149
Comments
Agree this needs to be more standardised and could cause issues if users take certain actions. As discussed, because the user-exposed part of the package is so large and the cost of checking in every function is quite small, I don't think we should differentiate between internal and external functions, at least for now, until the design settles down.
Carl, can you post your prototype here for discussion/visibility?
sure. the goal is to have something that:
so it's a little bit of a mix of type checking => object coercion. the idea would be to use this wherever a distinct copy is needed, with guaranteed elements, with some light transformation potentially required (e.g. what can be accomplished with a single vector argument). some additional thoughts worth exploring: optionally enabling non-copy behavior, column type checking.
the other new function here:
Any thoughts on waking this back up? I like the suggested approach, with the caveat that it needs to be a little clearer what exactly it's doing (so a nice clear name etc.). Perhaps the next stage is to make a draft PR?
Still happy to handle this @pearsonca or shall I unassign and look for fresh blood? No problem either way.
mmm, would a project for this weekend work timing-wise?
that would be really great. This is currently in
Done in #239
For many of the functions that receive data, roughly the following snippet is repeated:
This (appropriately) ensures that internal workings of enw (e.g. addition / manipulation of columns) do not leak out into the user-space view of the data. That is: no side effects == good.
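To make the side-effect concern concrete, here is a minimal sketch of the leak and of a coerce-and-copy guard. It is not the package's actual snippet: the function names `add_flag_leaky` and `add_flag_safe` and the `flag` column are hypothetical, chosen only for illustration.

```r
library(data.table)

# Hypothetical function: adds a column by reference, so the caller's
# data.table is modified as a side effect.
add_flag_leaky <- function(obs) {
  obs[, flag := TRUE]
  obs[]
}

# Hypothetical function: coerces to data.table and takes an explicit copy
# first, so internal manipulation stays internal.
add_flag_safe <- function(obs) {
  obs <- data.table::copy(data.table::as.data.table(obs))
  obs[, flag := TRUE]
  obs[]
}

user_dt <- data.table(id = 1:3)

safe_out <- add_flag_safe(user_dt)
flag_after_safe <- "flag" %in% names(user_dt)

leaky_out <- add_flag_leaky(user_dt)
flag_after_leaky <- "flag" %in% names(user_dt)
```

After the safe call the user's table should still have only `id`; after the leaky call it also carries `flag`.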
However: that snippet isn't universally applied (e.g., enw_design) or desirable (e.g., when calling public-facing functions internally after a coerce/copy has already been made). It probably should be applied universally at the public boundary (though I'm open to a user-beware approach, as long as that's declared). For internal calls, the desirability question is mostly a performance consideration (e.g., not passing by reference basically defeats a substantial part of data.table's performance value): how much of an issue is it for what we think of as the large end of inputs to support?
I think there's a reasonable DRYing solution here for the boundary issue which also enables the performance solution (though isn't sufficient to accomplish that end). Something like:
...which then gets used at public-facing boundary functions:
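A rough sketch of what such a helper and its use at a boundary could look like, under stated assumptions: the helper name `internalize_dt`, its `required_cols` argument, the function `boundary_example`, and the column names are all hypothetical and are not the prototype discussed in this thread.

```r
library(data.table)

# Hypothetical helper: coerce the input to a data.table, take a distinct
# copy, and check that the required columns are present.
internalize_dt <- function(data, required_cols = character(0)) {
  dt <- data.table::copy(data.table::as.data.table(data))
  missing_cols <- setdiff(required_cols, names(dt))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", toString(missing_cols))
  }
  dt
}

# Hypothetical public-facing function: the helper is called once at the
# boundary, after which by-reference operations are safe.
boundary_example <- function(obs) {
  obs <- internalize_dt(obs, required_cols = c("reference_date", "confirm"))
  obs[, cum_confirm := cumsum(confirm)]
  obs[]
}

user_df <- data.frame(
  reference_date = as.Date("2024-01-01") + 0:2,
  confirm = c(1, 2, 3)
)
out <- boundary_example(user_df)
```

Putting the column check in the same helper keeps each boundary function to a single line of input handling, at the cost of one copy per public call.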
To address a performance gap (if it exists), you'd want to convert all the public-facing functions to thin wrappers à la
and then have the internal functions skip the internalizer step. That's some tedious copy pasta, but that could plausibly be automated as part of a build process (i.e., for every function declaration matching enw_...internal, create a function without the .internal, with all the same roxygen, and add the @export tag).
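The thin-wrapper idea can be sketched as follows. This is an illustration under assumptions, not the package's code: `internalize_dt`, `add_total.internal`, `add_share.internal`, and `add_share` are hypothetical names, and the `.internal` suffix simply mirrors the naming convention suggested above.

```r
library(data.table)

# Hypothetical boundary helper: coerce and copy exactly once.
internalize_dt <- function(data) {
  data.table::copy(data.table::as.data.table(data))
}

# Internal workers: assume they already own a data.table, so they
# modify it by reference and never copy.
add_total.internal <- function(obs) {
  obs[, total := sum(confirm)]
  obs[]
}

add_share.internal <- function(obs) {
  obs <- add_total.internal(obs)  # internal-to-internal call, no extra copy
  obs[, share := confirm / total]
  obs[]
}

# Public wrapper: the only place the copy happens.
add_share <- function(obs) {
  add_share.internal(internalize_dt(obs))
}

user_dt <- data.table(confirm = c(1, 3))
res <- add_share(user_dt)
```

One design caveat with this naming: a function called `f.internal` can look like an S3 method of a generic `f` for class `internal`, so a different suffix or prefix may be preferable in practice.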