Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternatives for handling NA inputs #225

Merged
merged 10 commits into from Mar 26, 2020
Merged

Alternatives for handling NA inputs #225

merged 10 commits into from Mar 26, 2020

Conversation

stitam
Copy link
Contributor

@stitam stitam commented Mar 19, 2020

This PR is meant to contribute to the discussion on issue #224.

The interesing part is in chebi.R. The issue is that if the input is invalid, e.g. "balloon" (unknown chemical) or NA, the structure of the output will be different, which makes vectorisation difficult. As a reprex, chebi_comp_entity(c("CHEBI:89301","CHEBI:89302","balloon",NA)) will return a nicely formatted list for the first two entries and something else for the last two. This makes sapply(test, function(x) x[["properties"]]$inchikey) return NULL-s, and unable to coerse the output into a character vector.

My approach for handling invalid inputs is to return an empty object with the same structure but all NA-s. I tried two alternatives: in get_chebiid() I defined the structure twice within the function. In chebi_comp_entity() this would have been distrubingly complex, so I created a separate, non exported function, which generates an object with the correct structure, but all NA-s inside, and called this function in lines 226 and 251. Forcing to return an empty object for invalid inputs allowed sapply() to extract inchikeys and coerce the output to a character vector.

I think the second approach is more elegant and less error prone. What do you think about the these empty objects? Can you suggest other solutions?

PR task list:

  • Update NEWS
  • Add tests (if appropriate)
  • Update documentation with devtools::document()
  • Check package passed

@codecov-io
Copy link

codecov-io commented Mar 19, 2020

Codecov Report

Merging #225 into master will not change coverage by %.
The diff coverage is 0.00%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master    #225   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files          19      19           
  Lines        1587    1675   +88     
======================================
- Misses       1587    1675   +88     
Impacted Files Coverage Δ
R/alanwood.R 0.00% <0.00%> (ø)
R/chebi.R 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d02b3bc...04afc4f. Read the comment docs.

R/alanwood.R Show resolved Hide resolved
Copy link
Collaborator

@Aariq Aariq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by match == "na". Maybe you could clarify what that "does" or see if my suggestions allow you to get rid of that code entirely.

R/chebi.R Show resolved Hide resolved
R/chebi.R Show resolved Hide resolved
R/chebi.R Outdated Show resolved Hide resolved
R/chebi.R Outdated Show resolved Hide resolved
R/chebi.R Outdated Show resolved Hide resolved
@stitam stitam mentioned this pull request Mar 25, 2020
@stitam stitam merged commit 5170f7d into ropensci:master Mar 26, 2020
@stitam stitam deleted the nas branch March 26, 2020 09:26
@stitam stitam restored the nas branch March 26, 2020 09:27
@stitam stitam mentioned this pull request Mar 26, 2020
4 tasks
@stitam stitam added this to the RC2019F milestone Sep 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants