Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GAMBLR hotfixes and improvements. #142

Merged
merged 32 commits into from
Dec 9, 2022
Merged

GAMBLR hotfixes and improvements. #142

merged 32 commits into from
Dec 9, 2022

Conversation

mattssca
Copy link
Contributor

This PR includes a variety of hotfixes and updates to existing functions. Including;

  • assign_cn_to_ssm now takes this_seq_type parameter.
  • get_cn_segments now has support for flat files.
  • prettyOncoplot can now deal with silent mutations.
  • More examples added to test_functions.R.
  • Improved instructions on how to run remote (README).
  • Supressing annoying warning messages from readr.
  • Updates to fancy_v_count so that it correctly calls assign_cn_to_ssm.
  • Updates to the config (redundant line from get_gene_expression update).
  • Hotfix for fancy_x_plot functions that threw unnecessary writing messages.
  • Adding SSM thematic vignette.
  • Improvements to ashm_rainbow_plot (fixing fill, margins, axis titles, ticks, etc.).
  • Added new thematic vignette (CN/SV), in progress...

For more info, please see the commit messages below.

This branch was recently merged with the current master (22/11). Many merge conflicts were flagged, and even though the majority of these conflicts could be resolved automatically, a handful had to be resolved manually.

I have also tested this branch using the examples available in test_functions.R. The functions listed in this script runs without errors and are producing the expected outputs. With that being said, I would highly recommend at least one additional tester to also confirm this given the number of merge conflicts and updates to commonly used GAMBLR functions, and perhaps other examples that are not (yet) represented in test_functions.R

After this PR has been approved, I will close the corresponding issues on this repo.

If this PR should still be active upon my return from Sweden, I have some additional improvements that I am currently working on. But these could also very well be on a future PR.

Let me know if there are any questions.

Thanks,

R/database.R Outdated
#' @param qend End coordinate of the range you are restricting to. Required parameter if region is not specified.
#' @param projection Selected genome projection for returned Cn segments.
#' @param this_seq_type Seq type for returned Cn segments. Currently, only genome is supported. Capture samples will be added once processed through CNV protocols.
#' @param remove_chr_prefix Prepend all chromosome names with chr (required by some downstream analyses). Default is FALSE.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the description should be not "Prepend ...", but rather "Remove ..."? Otherwise does not sound to match the parameter name

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was confused about that too. with_chr_prefix seemed reasonable to me because it means the user wants the result to have chr prefix there. Prepend/remove implies an action will be done. The "with" naming implies it will be done, when necessary, if requested (if that makes any sense)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are good points, I will revert the name change here. However, I do still think the old name is somewhat misleading. This is because this parameter only has one condition in the code, if set to FALSE (default) chr prefixes will be removed. At least to me, this implies that if set to TRUE, chr prefixes should be available. But that's not how the function operates. I can easily update this so that if set to TRUE, "chr" will be added (if not already there, i.e if returned segments are in respect to hg38). Does that make sense?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that makes sense. Perhaps this functionality just wasn't added when this function was updated to support hg38?

R/database.R Outdated
#' segments_region_grch37 = get_cn_segments(chromosome = "chr8",
#' qstart = 128723128,
#' qend = 128774067,
#' projection = "grch37",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Projection and genome here are same as default, so maybe let's drop them from example to show the "minimum usage"? The next example then shows how to modify different projection and I think the flatfiles are not available for capture anyways

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Examples should make use of defaults implicitly and only show parameters that are forcing the function to run in non-default mode. This makes it easier to see the simplest way to use the function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I'll go over the examples and ensure parameters called with default values are not shown.

}

#get wildcards from this_seq_type (lazy)
seq_type = this_seq_type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think this might not be needed here to keep the function simple, and this_seq_type can be used as is

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a variable with the name that matches the wildcard is necessary when we use glue. Don't change this unless we're certain glue isn't being used later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Precisely, that is indeed why that variable was created. I will not implement the edit suggested by Kostia.


#deal with chr prefixes for region, based on selected genome projection.
if(projection == "grch37"){
if(grepl("chr", chromosome)){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this ifelse is not needed. The gsub will not complain when the "chr" is not in the string. Instead, the user here will get warning that only first element is used since the chromosome is a vector. I think this part can be reworked to something like

always gsub chr substring
if(projection == "hg38"){
    add the chr prefix back using paste0
}

This way, there are no ifelse checks so the code is simpler, prefix is properly handled, and user will get no warnings

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else part is needed because it's prepending chr if (and only if ) it's missing though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I meant that it is only checking the first occurrence. Here it is str_detect but grepl I think works the same way. So If there will be a mix of prefixed and non-prefixed chromosome names, the way it is currently implemented might not always deal with this properly

Warning message:
In if (!str_detect(genes$chromosome, "chr")) { :
  the condition has length > 1 and only the first element will be used

R/database.R Outdated
#This isn't yet standardized in the db so it's just a workaround "for now".

#sanity check type of qstart en qend
if(!is.numeric(qstart)){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ifelse checks can be resource consuming on large data frames with lots of rows. Here is a different intention but it shows that the checks are very slow. Instead, nothing bad will happen if we just always mutate the qstart and qend to numeric regardless of what they initially are.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is a good point Kostia. Thanks!

R/viz.R Outdated
@@ -2041,7 +2049,9 @@ ashm_multi_rainbow_plot = function(regions_bed,
p = muts_anno %>%
ggplot() +
geom_point(aes(x = start, y = sample_id, colour = classification), alpha = 0.4, size = 0.6) +
theme(axis.text.y = element_blank()) +
labs(title = "", subtitle = "", x = "", y = "Sample") +
theme_cowplot() +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we switch to theme_Morons() here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed we can!

@@ -2041,7 +2049,9 @@ ashm_multi_rainbow_plot = function(regions_bed,
p = muts_anno %>%
ggplot() +
geom_point(aes(x = start, y = sample_id, colour = classification), alpha = 0.4, size = 0.6) +
theme(axis.text.y = element_blank()) +
labs(title = "", subtitle = "", x = "", y = "Sample") +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am more of a fan of specifying the empty ones in the theme() and setting to element_blank(). The reason is that when we set it to nothing, it is still processed as "invisible string" and the space for that `` string is allocated on the plot, creating unnecessary white space. Instead, when set through the theme, the title is removed completely and the plotting area has minimum of white space

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I follow and agree that this is probably the correct way of doing this. Thanks for the comment.

R/viz.R Outdated
geom_text(data = bed, aes(x = midpoint, y = height, label = name), size = 2.5, angle = 90) +
guides(color = guide_legend(reverse = TRUE, override.aes = list(size = 3)))
}

p = p +
labs(title = "", subtitle = "", x = "", y = "Sample") +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as above for the multirainbow plot

@@ -3437,7 +3462,7 @@ fancy_v_count = function(this_sample,
}

#add chr prefix if missing
if(!str_detect(maf$Chromosome, "chr")){
if(!str_detect(maf$Chromosome, "chr")[1]){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same logic can be applied here for the chr handling

@@ -3694,7 +3719,7 @@ fancy_v_sizedis = function(this_sample,
}

#add chr prefix if missing
if(!str_detect(maf$Chromosome, "chr")){
if(!str_detect(maf$Chromosome, "chr")[1]){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here 😃

Copy link
Contributor

@Kdreval Kdreval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Adam! I think all the comments were addressed! I have also tested in on the GSC and I think it is ready to go!

@Kdreval Kdreval merged commit 07c6ffa into master Dec 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants