New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
class2tree replacement #634
Conversation
replace class2tree.R
fix the classification data type
Subsequent to the initial commit we added
We don't fully understand why the other tests fail for us though. |
thanks @gedankenstuecke and @trvinh - having a look travis failure should just be due to that Travis doesn't share environment variables on pull request builds - so no worries, do check that it passes locally for you though and i'll check as well |
opened issue for tests #635 basically, in R, you can do e.g., in the taxize directory, start R, then library(devtools)
library(testthat)
load_all()
# to test an individual test file
test_file("tests/testthat/test-class2tree.R")
# to run all tests
devtools::test() may want to run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this! made some specific comments inline, a few more general ones:
- please make width of both code and comments be 80 characters - i am not yet consistent in this pkg, but trying to get there
- since a lot of code is being added, i imagine there's more edge cases? do add a few more tests to test suite
R/class2tree.R
Outdated
dat <- rbind.fill(lapply(input, class2tree_helper)) | ||
df <- dat[ , !apply(dat, 2, function(x) any(is.na(x))) ] | ||
if (!inherits(df, "data.frame")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine to remove this, but is there now no chance of resulting in no ranks in common?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uhm, I don't think so. Since we are taking into account even the root rank, so 2 taxa will always have at least one rank in common. Therefore I think checking for !inherits(df, "data.frame")
not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks
R/class2tree.R
Outdated
colnames(mTaxonDf) <- c(taxonName[1],"rank") | ||
|
||
### merge with index2RankDf (Df contains all available ranks from input data) | ||
fullRankIDdf <- merge(fullRankIDdf,mTaxonDf, by=c("rank"), all.x = T) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please always use TRUE
and FALSE
throughout instead of T
and F
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check and make all the changes that you requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx
R/class2tree.R
Outdated
joinedDf <- within(joinedDf, rankDf[rankDf=='no rank'] <- paste0("norank_",idDf[rankDf=='no rank'])) | ||
|
||
df <- data.frame(t(data.frame(rev(joinedDf$rankDf))), stringsAsFactors = FALSE) | ||
outDf <- data.frame(tip = x[nrow(x), "name"], df, stringsAsFactors = FALSE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this fxn and the next below don't explicitly return any output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sckott think you don't mean that they should use return()
statement, but just have implicit return, ie., last line
outDf
Thanks for the hints on how to run single files. I looked desperately for that earlier today and couldn't find it. The I got some errors/warnings when running the whole of it
Also there were some tests that waited for user-input somehow when I ran your commands, I just proceeded by pressing From the logs:
|
@gedankenstuecke see additional comment above about not cran #634 (comment) |
tests do currently take a long time to run, so you may only want to deal with the tests for this fxn. (eventually we will mock HTTP requests and it will be much better, but not here yet) yes, many tests go through a |
Ah, i ran the tests before the edit was made about |
I started addressing @sckott's comments and made sure we consistently use |
Okay, I think it should now address all of @sckott's comments π
|
Fixes following things: * removes CamelCase for snake_thing * renames variables to remove confusing `list` references that describe data frames * fixes output of `class2tree$classification` to be the expected one.
Before we gave out the internal, wrongly labelled levels. Now we have the right labels and return NA for missing data as expected.
We had overlooked some details on how the classification is returned from |
all looks good, thanks much @gedankenstuecke and @trvinh ! |
thanks so much, @sckott, you made my day :) And also thanks to @gedankenstuecke, we did it π |
Indeed, great first contribution @trvinh ! (& thx of course to you too @gedankenstuecke ) |
As discussed in #611, this update will use the full NCBI Taxonomy information to create the hierarchical clustering.
Description
The old
class2tree
function did not takeunnamed
ranks into account to cluster the species. This led to the trees being unresolved for many splits as the named taxonomy levels were shared between them. The newclass2tree
function makes full use of the NCBI Taxonomy string, including theunnamed
ranks, leading to higher resolution trees that have less multifurcations.After many months of discussion @trvinh and I (mostly him!) have figured out how to do this in R instead of Python! π
Related Issue
#611
Example
A full example can be found in this gist
In short:
will now yield
As we just updated an existing function (and β to be honest β as we so far couldn't find out how to run the tests, some update on the docs on that would be appreciated) we didn't include new tests.