-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GAMBLR hotfixes and improvements. #142
Conversation
…ted get_gambl_colours on rmorin branch
R/database.R
Outdated
#' @param qend End coordinate of the range you are restricting to. Required parameter if region is not specified. | ||
#' @param projection Selected genome projection for returned Cn segments. | ||
#' @param this_seq_type Seq type for returned Cn segments. Currently, only genome is supported. Capture samples will be added once processed through CNV protocols. | ||
#' @param remove_chr_prefix Prepend all chromosome names with chr (required by some downstream analyses). Default is FALSE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the description should be not "Prepend ...", but rather "Remove ..."? Otherwise does not sound to match the parameter name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was confused about that too. with_chr_prefix seemed reasonable to me because it means the user wants the result to have chr prefix there. Prepend/remove implies an action will be done. The "with" naming implies it will be done, when necessary, if requested (if that makes any sense)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are good points, I will revert the name change here. However, I do still think the old name is somewhat misleading. This is because this parameter only has one condition in the code, if set to FALSE (default) chr prefixes will be removed. At least to me, this implies that if set to TRUE, chr prefixes should be available. But that's not how the function operates. I can easily update this so that if set to TRUE, "chr" will be added (if not already there, i.e if returned segments are in respect to hg38). Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that makes sense. Perhaps this functionality just wasn't added when this function was updated to support hg38?
R/database.R
Outdated
#' segments_region_grch37 = get_cn_segments(chromosome = "chr8", | ||
#' qstart = 128723128, | ||
#' qend = 128774067, | ||
#' projection = "grch37", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Projection and genome here are same as default, so maybe let's drop them from example to show the "minimum usage"? The next example then shows how to modify different projection and I think the flatfiles are not available for capture anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Examples should make use of defaults implicitly and only show parameters that are forcing the function to run in non-default mode. This makes it easier to see the simplest way to use the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I'll go over the examples and ensure parameters called with default values are not shown.
} | ||
|
||
#get wildcards from this_seq_type (lazy) | ||
seq_type = this_seq_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think this might not be needed here to keep the function simple, and this_seq_type
can be used as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating a variable with the name that matches the wildcard is necessary when we use glue. Don't change this unless we're certain glue isn't being used later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Precisely, that is indeed why that variable was created. I will not implement the edit suggested by Kostia.
|
||
#deal with chr prefixes for region, based on selected genome projection. | ||
if(projection == "grch37"){ | ||
if(grepl("chr", chromosome)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this ifelse is not needed. The gsub
will not complain when the "chr" is not in the string. Instead, the user here will get warning that only first element is used since the chromosome
is a vector. I think this part can be reworked to something like
always gsub chr substring
if(projection == "hg38"){
add the chr prefix back using paste0
}
This way, there are no ifelse checks so the code is simpler, prefix is properly handled, and user will get no warnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The else part is needed because it's prepending chr if (and only if ) it's missing though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I meant that it is only checking the first occurrence. Here it is str_detect but grepl I think works the same way. So If there will be a mix of prefixed and non-prefixed chromosome names, the way it is currently implemented might not always deal with this properly
Warning message:
In if (!str_detect(genes$chromosome, "chr")) { :
the condition has length > 1 and only the first element will be used
R/database.R
Outdated
#This isn't yet standardized in the db so it's just a workaround "for now". | ||
|
||
#sanity check type of qstart en qend | ||
if(!is.numeric(qstart)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ifelse checks can be resource consuming on large data frames with lots of rows. Here is a different intention but it shows that the checks are very slow. Instead, nothing bad will happen if we just always mutate the qstart and qend to numeric regardless of what they initially are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is a good point Kostia. Thanks!
R/viz.R
Outdated
@@ -2041,7 +2049,9 @@ ashm_multi_rainbow_plot = function(regions_bed, | |||
p = muts_anno %>% | |||
ggplot() + | |||
geom_point(aes(x = start, y = sample_id, colour = classification), alpha = 0.4, size = 0.6) + | |||
theme(axis.text.y = element_blank()) + | |||
labs(title = "", subtitle = "", x = "", y = "Sample") + | |||
theme_cowplot() + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we switch to theme_Morons()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed we can!
@@ -2041,7 +2049,9 @@ ashm_multi_rainbow_plot = function(regions_bed, | |||
p = muts_anno %>% | |||
ggplot() + | |||
geom_point(aes(x = start, y = sample_id, colour = classification), alpha = 0.4, size = 0.6) + | |||
theme(axis.text.y = element_blank()) + | |||
labs(title = "", subtitle = "", x = "", y = "Sample") + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am more of a fan of specifying the empty ones in the theme()
and setting to element_blank()
. The reason is that when we set it to nothing, it is still processed as "invisible string" and the space for that `` string is allocated on the plot, creating unnecessary white space. Instead, when set through the theme, the title is removed completely and the plotting area has minimum of white space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I follow and agree that this is probably the correct way of doing this. Thanks for the comment.
R/viz.R
Outdated
geom_text(data = bed, aes(x = midpoint, y = height, label = name), size = 2.5, angle = 90) + | ||
guides(color = guide_legend(reverse = TRUE, override.aes = list(size = 3))) | ||
} | ||
|
||
p = p + | ||
labs(title = "", subtitle = "", x = "", y = "Sample") + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comments as above for the multirainbow plot
@@ -3437,7 +3462,7 @@ fancy_v_count = function(this_sample, | |||
} | |||
|
|||
#add chr prefix if missing | |||
if(!str_detect(maf$Chromosome, "chr")){ | |||
if(!str_detect(maf$Chromosome, "chr")[1]){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same logic can be applied here for the chr handling
@@ -3694,7 +3719,7 @@ fancy_v_sizedis = function(this_sample, | |||
} | |||
|
|||
#add chr prefix if missing | |||
if(!str_detect(maf$Chromosome, "chr")){ | |||
if(!str_detect(maf$Chromosome, "chr")[1]){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Adam! I think all the comments were addressed! I have also tested in on the GSC and I think it is ready to go!
This PR includes a variety of hotfixes and updates to existing functions. Including;
assign_cn_to_ssm
now takesthis_seq_type
parameter.get_cn_segments
now has support for flat files.prettyOncoplot
can now deal with silent mutations.test_functions.R
.readr
.fancy_v_count
so that it correctly callsassign_cn_to_ssm
.get_gene_expression
update).fancy_x_plot
functions that threw unnecessary writing messages.ashm_rainbow_plot
(fixing fill, margins, axis titles, ticks, etc.).For more info, please see the commit messages below.
This branch was recently merged with the current master (22/11). Many merge conflicts were flagged, and even though the majority of these conflicts could be resolved automatically, a handful had to be resolved manually.
I have also tested this branch using the examples available in
test_functions.R
. The functions listed in this script runs without errors and are producing the expected outputs. With that being said, I would highly recommend at least one additional tester to also confirm this given the number of merge conflicts and updates to commonly used GAMBLR functions, and perhaps other examples that are not (yet) represented in test_functions.RAfter this PR has been approved, I will close the corresponding issues on this repo.
If this PR should still be active upon my return from Sweden, I have some additional improvements that I am currently working on. But these could also very well be on a future PR.
Let me know if there are any questions.
Thanks,