Envest/calculate and plot delta kappa #72
We chatted about this yesterday and this looks good overall and consistent with that convo. I had a couple comments on potential improvements, but nothing that I feel I must re-review.
3-plot_category_kappa.R
left_join(null_array.list[[pair_index]],
          by = c("perc.seq", "classifier", "norm.method")) %>%
  mutate(delta_kappa = kappa.x - kappa.y) %>% # regular kappa - null kappa
I would use `suffix` here so that this can be a little more explicit:
left_join(null_array.list[[pair_index]],
          by = c("perc.seq", "classifier", "norm.method"),
          suffix = c(".true", ".null")) %>%
  mutate(delta_kappa = kappa.true - kappa.null) %>% # regular kappa - null kappa
If you make this change here, don't forget to apply it to the seq step below too!
Oooh I like that!
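A small self-contained sketch of what `suffix` buys here, using two toy data frames as stand-ins for one regular/null pair. The kappa values and the `classifier_1`/`norm_1` labels are made up for illustration, not real results. With `suffix = c(".true", ".null")`, the overlapping `kappa` column comes out as `kappa.true` and `kappa.null` instead of `kappa.x` and `kappa.y`:

```r
library(dplyr)

# Toy stand-ins for one regular/null pair of kappa tables.
# Values and labels are invented for illustration only.
true_kappa <- data.frame(
  perc.seq    = c(0, 50),
  classifier  = "classifier_1",
  norm.method = "norm_1",
  kappa       = c(0.80, 0.60)
)
null_kappa <- data.frame(
  perc.seq    = c(0, 50),
  classifier  = "classifier_1",
  norm.method = "norm_1",
  kappa       = c(0.10, 0.20)
)

delta_df <- true_kappa %>%
  left_join(null_kappa,
            by = c("perc.seq", "classifier", "norm.method"),
            # name the two kappa columns explicitly instead of .x / .y
            suffix = c(".true", ".null")) %>%
  mutate(delta_kappa = kappa.true - kappa.null)  # regular kappa - null kappa

print(delta_df)
```

The same `suffix` argument would go on the seq join as well, so both branches end up with identically named columns.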
# check that we have ordered pairs of regular and null files for array and seq
array_seeds <- stringr::str_sub(array.files, -8, -5)
null_array_seeds <- stringr::str_sub(null_array.files, -8, -5)
seq_seeds <- stringr::str_sub(seq.files, -8, -5)
null_seq_seeds <- stringr::str_sub(null_seq.files, -8, -5)
Hmmm... my instinct here is to suggest detecting the seeds by the strings we expect to surround them, but that seems potentially just as brittle to file name changes as the indexing.
Agreed. Not sure what the best solution is. One mitigation is that we have previously forced the seed to be a four-digit number, so as long as it's the last thing to come before `.tsv`, it will be captured here.
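One possible alternative to fixed `str_sub()` offsets, assuming the four-digit seed is always the last thing before `.tsv`: anchor a regex on that ending, so a name that doesn't fit returns `NA` instead of four arbitrary characters. The file names and the `extract_seed()` helper below are hypothetical; this is a sketch, not what the script currently does.

```r
# Hypothetical file names; the real script builds these vectors from its results directory.
array.files      <- c("results/array_kappa_1234.tsv", "results/array_kappa_5678.tsv")
null_array.files <- c("results/null_array_kappa_1234.tsv", "results/null_array_kappa_5678.tsv")

# Hypothetical helper: capture exactly four digits sitting immediately before a
# trailing ".tsv". Names that don't follow that pattern yield NA rather than
# whatever four characters happen to occupy positions -8 to -5.
extract_seed <- function(files) {
  stringr::str_match(files, "(\\d{4})\\.tsv$")[, 2]
}

array_seeds      <- extract_seed(array.files)
null_array_seeds <- extract_seed(null_array.files)

# Fail loudly if any name didn't match, or if regular and null files are not paired in order.
stopifnot(!anyNA(array_seeds), identical(array_seeds, null_array_seeds))
```

This is still tied to the naming convention, but a convention change surfaces as an `NA` failure instead of silently mismatched seeds.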
Brings #51 to a point where we can run all the scripts in full... fun weekend!
This PR modifies the existing `3-plot_category_kappa.R` to calculate delta kappa, then reuses the same plotting code. The `--null_model` option tells the script to read both the regular-model and null-model results and take the difference in their kappa values (regular kappa minus null kappa). Throughout the script, a FALSE value for `null_model` produces the same processing and output as before, while a TRUE value for `null_model` adds extra steps inside `if()` statement blocks.
:delta::kappa: Thanks!