GAMBLR hotfixes and improvements. #142

mattssca · 2022-11-22T23:05:00Z

This PR includes a variety of hotfixes and updates to existing functions, including:

assign_cn_to_ssm now takes this_seq_type parameter.
get_cn_segments now has support for flat files.
prettyOncoplot can now deal with silent mutations.
More examples added to test_functions.R.
Improved instructions on how to run remote (README).
Suppressing annoying warning messages from readr.
Updates to fancy_v_count so that it correctly calls assign_cn_to_ssm.
Updates to the config (redundant line from get_gene_expression update).
Hotfix for fancy_x_plot functions that threw unnecessary warning messages.
Adding SSM thematic vignette.
Improvements to ashm_rainbow_plot (fixing fill, margins, axis titles, ticks, etc.).
Added new thematic vignette (CN/SV), in progress...

For more info, please see the commit messages below.

This branch was recently merged with the current master (22/11). Many merge conflicts were flagged, and even though the majority of these conflicts could be resolved automatically, a handful had to be resolved manually.

I have also tested this branch using the examples available in test_functions.R. The functions listed in this script run without errors and produce the expected outputs. That said, given the number of merge conflicts and the updates to commonly used GAMBLR functions, I would highly recommend that at least one additional tester confirm this, ideally also with examples that are not (yet) represented in test_functions.R.

After this PR has been approved, I will close the corresponding issues on this repo.

If this PR is still open when I return from Sweden, I have some additional improvements that I am currently working on. These could just as well go into a future PR.

Let me know if there are any questions.

Thanks,

Kdreval · 2022-11-29T21:20:33Z

R/database.R

+#' @param qend End coordinate of the range you are restricting to. Required parameter if region is not specified.
+#' @param projection Selected genome projection for returned Cn segments.
+#' @param this_seq_type Seq type for returned Cn segments. Currently, only genome is supported. Capture samples will be added once processed through CNV protocols.
+#' @param remove_chr_prefix Prepend all chromosome names with chr (required by some downstream analyses). Default is FALSE. 


Maybe the description should be "Remove ..." rather than "Prepend ..."? Otherwise it does not seem to match the parameter name.

Yeah, I was confused about that too. with_chr_prefix seemed reasonable to me because it means the user wants the result to have the chr prefix. Prepend/remove implies an action will be done. The "with" naming implies it will be done, when necessary, if requested (if that makes any sense).

Those are good points, I will revert the name change here. However, I still think the old name is somewhat misleading. The parameter has only one condition in the code: if set to FALSE (default), chr prefixes will be removed. At least to me, this implies that if set to TRUE, chr prefixes should be present. But that's not how the function operates. I can easily update this so that if set to TRUE, "chr" will be added (if not already there, i.e., if the returned segments are relative to hg38). Does that make sense?

Yes that makes sense. Perhaps this functionality just wasn't added when this function was updated to support hg38?

Kdreval · 2022-11-29T21:22:45Z

R/database.R

+#' segments_region_grch37 = get_cn_segments(chromosome = "chr8",
+#'                                          qstart = 128723128,
+#'                                          qend = 128774067,
+#'                                          projection = "grch37", 


Projection and genome here are the same as the defaults, so maybe let's drop them from the example to show the "minimum usage"? The next example then shows how to select a different projection, and I think the flat files are not available for capture anyway.

I agree. Examples should rely on defaults implicitly and only show parameters that force the function to run in a non-default mode. This makes it easier to see the simplest way to use the function.

Indeed, I'll go over the examples and ensure parameters called with default values are not shown.

Kdreval · 2022-11-29T21:24:49Z

R/database.R

+  }
+
+  #get wildcards from this_seq_type (lazy)
+  seq_type = this_seq_type


I think this might not be needed here; to keep the function simple, this_seq_type can be used as is.

Creating a variable with a name that matches the wildcard is necessary when we use glue. Don't change this unless we're certain glue isn't being used later.

Precisely, that is indeed why that variable was created. I will not implement the edit suggested by Kostia.
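The point about glue can be illustrated with a small, self-contained sketch. The path template and the `resolve_path` helper below are hypothetical and are not taken from the GAMBLR code base or config; the only thing assumed from the thread is that a `{seq_type}` wildcard is filled in by glue, which looks up a variable with that exact name in the calling environment.

```r
library(glue)

# Hypothetical path template; the real GAMBLR config entries may look different.
path_template = "cn_segments/{seq_type}--{projection}.tsv"

# Hypothetical helper that mirrors the pattern discussed above.
resolve_path = function(this_seq_type = "genome", projection = "grch37"){
  # glue resolves {seq_type} by name, so a local variable
  # with exactly that name has to exist before glue() is called
  seq_type = this_seq_type
  glue(path_template)
}

resolved = resolve_path(this_seq_type = "genome", projection = "grch37")
print(resolved)
```

Without the `seq_type = this_seq_type` line, the wildcard would have no matching variable inside the function, which is why the assignment is kept even though it looks redundant.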

Kdreval · 2022-11-29T21:32:16Z

R/database.R

+
+  #deal with chr prefixes for region, based on selected genome projection.
+  if(projection == "grch37"){
+    if(grepl("chr", chromosome)){


I think this if/else is not needed. gsub will not complain when "chr" is not in the string. Instead, as currently written, the user will get a warning that only the first element is used, since chromosome is a vector. I think this part can be reworked to something like:

always gsub chr substring
if(projection == "hg38"){
  add the chr prefix back using paste0
}

This way, there are no if/else checks, so the code is simpler, the prefix is properly handled, and the user will get no warnings.

The else part is needed though, because it prepends chr if (and only if) it's missing.

Right, I meant that it only checks the first element. Here it is str_detect, but I think grepl works the same way. So if there is a mix of prefixed and non-prefixed chromosome names, the current implementation might not always deal with this properly:

Warning message: In if (!str_detect(genes$chromosome, "chr")) { : the condition has length > 1 and only the first element will be used
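A minimal sketch of the vectorised approach suggested in this thread, using only base R. The helper name `standardize_chromosome` is hypothetical and not part of GAMBLR, and the pattern is anchored as `^chr` here (an assumption; the comment above only says to gsub the chr substring) so that only a leading prefix is stripped.

```r
# Hypothetical helper, not a GAMBLR function.
standardize_chromosome = function(chromosome, projection = "grch37"){
  # always strip a leading "chr"; this is a no-op for names without the prefix
  chromosome = gsub("^chr", "", chromosome)

  # hg38 coordinates are expected to carry the prefix, so add it back for every element
  if(projection == "hg38"){
    chromosome = paste0("chr", chromosome)
  }
  return(chromosome)
}

# a mix of prefixed and non-prefixed names, the case raised in the review
mixed = c("chr8", "8", "chrX")

grch37_names = standardize_chromosome(mixed, projection = "grch37")
hg38_names = standardize_chromosome(mixed, projection = "hg38")

print(grch37_names) # "8" "8" "X"
print(hg38_names)   # "chr8" "chr8" "chrX"
```

Because gsub and paste0 operate on the whole vector, no condition of length greater than one is ever passed to `if`, and mixed input ends up with a consistent prefix.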

Kdreval · 2022-11-29T21:35:41Z

R/database.R

-  #This isn't yet standardized in the db so it's just a workaround "for now".
+
+  #sanity check type of qstart en qend
+  if(!is.numeric(qstart)){


The ifelse checks can be resource-intensive on large data frames with many rows. Here is an example with a different intention, but it shows that the checks are very slow. Nothing bad will happen if we instead always mutate qstart and qend to numeric, regardless of what they initially are.

Yes, that is a good point Kostia. Thanks!
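As a rough illustration of the unconditional conversion, here is a base R sketch on a toy data frame. The data frame and its second row are made up for the example; only the first pair of coordinates comes from the documentation example in this PR. The comment above mentions mutate; base R assignment is used here purely to keep the sketch free of dependencies.

```r
# Toy input: qstart arrives as character, qend is already numeric.
regions = data.frame(chromosome = c("8", "8"),
                     qstart = c("128723128", "128700000"), # second row is a made-up value
                     qend = c(128774067, 128710000),       # second row is a made-up value
                     stringsAsFactors = FALSE)

# convert unconditionally; as.numeric leaves numeric columns unchanged
regions$qstart = as.numeric(regions$qstart)
regions$qend = as.numeric(regions$qend)

str(regions)
```

One caveat worth keeping in mind: as.numeric returns NA (with a warning) for strings that cannot be parsed as numbers, so malformed input would still surface, just later.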

Kdreval · 2022-11-29T21:46:11Z

R/viz.R

@@ -2041,7 +2049,9 @@ ashm_multi_rainbow_plot = function(regions_bed,
  p = muts_anno %>%
        ggplot() +
        geom_point(aes(x = start, y = sample_id, colour = classification), alpha = 0.4, size = 0.6) +
-        theme(axis.text.y = element_blank()) +
+        labs(title = "", subtitle = "", x = "", y = "Sample") +
+        theme_cowplot() + 


Can we switch to theme_Morons() here?

Indeed we can!

Kdreval · 2022-11-29T21:49:24Z

R/viz.R

@@ -2041,7 +2049,9 @@ ashm_multi_rainbow_plot = function(regions_bed,
  p = muts_anno %>%
        ggplot() +
        geom_point(aes(x = start, y = sample_id, colour = classification), alpha = 0.4, size = 0.6) +
-        theme(axis.text.y = element_blank()) +
+        labs(title = "", subtitle = "", x = "", y = "Sample") +


I am more of a fan of specifying the empty ones in theme() and setting them to element_blank(). The reason is that when we set a label to "", it is still processed as an "invisible string" and space for that empty string is allocated on the plot, creating unnecessary white space. When it is set through the theme instead, the title is removed completely and the plotting area has a minimum of white space.

Yes, I follow and agree that this is probably the correct way of doing this. Thanks for the comment.
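The two approaches from this thread can be put side by side in a small ggplot2 sketch. The toy data frame is invented for illustration; the aesthetics mirror the diff above, and the white-space behaviour is as described in the review comment rather than something this snippet measures.

```r
library(ggplot2)

# Toy data standing in for muts_anno (made-up values).
toy = data.frame(start = c(100, 200, 300),
                 sample_id = c("A", "B", "C"),
                 classification = c("x", "y", "x"))

p = ggplot(toy) +
  geom_point(aes(x = start, y = sample_id, colour = classification), alpha = 0.4, size = 0.6)

# approach in the diff: empty strings for the labels that should not be shown
p_labs = p +
  labs(title = "", subtitle = "", x = "", y = "Sample")

# approach suggested in the review: drop the elements through the theme
p_theme = p +
  labs(y = "Sample") +
  theme(plot.title = element_blank(),
        plot.subtitle = element_blank(),
        axis.title.x = element_blank())
```

Printing `p_labs` and `p_theme` next to each other is a quick way to check how much extra margin the empty strings reserve on a given device.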

Kdreval · 2022-11-29T21:54:01Z

R/viz.R

      geom_text(data = bed, aes(x = midpoint, y = height, label = name), size = 2.5, angle = 90) +
      guides(color = guide_legend(reverse = TRUE, override.aes = list(size = 3)))
  }
+
+  p = p +
+    labs(title = "", subtitle = "", x = "", y = "Sample") +


Same comments as above for the multi-rainbow plot.

Kdreval · 2022-11-29T21:55:53Z

R/viz.R

@@ -3437,7 +3462,7 @@ fancy_v_count = function(this_sample,
  }

  #add chr prefix if missing
-  if(!str_detect(maf$Chromosome, "chr")){
+  if(!str_detect(maf$Chromosome, "chr")[1]){


Same logic can be applied here for the chr handling

Kdreval · 2022-11-29T21:56:08Z

R/viz.R

@@ -3694,7 +3719,7 @@ fancy_v_sizedis = function(this_sample,
  }

  #add chr prefix if missing
-  if(!str_detect(maf$Chromosome, "chr")){
+  if(!str_detect(maf$Chromosome, "chr")[1]){


and here 😃

Kdreval

Thanks Adam! I think all the comments were addressed! I have also tested it on the GSC and I think it is ready to go!

mattssca added 29 commits October 25, 2022 12:49

removing redundant line from config (tidy_expression_file)

941306d

hot-fix for assign_cn_to_ssm (adding this_seq_type) + merge with udpa…

aa9f64f

…ted get_gambl_colours on rmorin branch

updates to get_cn_segments

30e28eb

updates to prettyOncoplot to deal with Silent mutations

4979832

adding examples to test_functions.R script

249ebad

updating readme, with more detailed instruction on how to run GAMBLR …

15c1ada

…remote

updating package documentation

eb1cacd

suppress annoying readr messages

3e8b6f6

suppress annoying readr messages for one trailing script

5c7f43c

reverting with_chr_prefix change

1ef877d

updating examples

55929b7

updating documentation

9658fb9

hot-fix for fancy_v_count that internally calls assign_cn_to_ssm

aee7d84

removing redundant example

d37d2aa

hot-fix for fancy plot function that threw warnings

75051f4

updating package description

ea8e54a

adding ssm themed vignette

61534c3

hotfixes for various plotting functions

efe0fd2

updating documentation for get_cn_segments

d67fff2

updating SSM vignette

e15c4ed

adding new vignette, CN/SV tutorial (work in progress)

eb6d35c

udpates to a collection of plotting functions

898716c

updating SSM vignette

d67e2e6

adding new vignette (CN/SV)

4b2a096

Updating package doc

ed7d1ae

master merge

f5d438f

updating pacckage documeentation

55000fc

fixing automerge issue

a517c74

updating test functions

3789f22

Kdreval reviewed Nov 29, 2022

mattssca added 2 commits December 6, 2022 15:33

updates based on review comments

5f1b0e2

hot-fix for get_sample_cn_segments

6dfee38

mattssca mentioned this pull request Dec 8, 2022

bug in get_sample_cn_segments #144

Closed

updates to cn/sv vignette

2b26c6b

Kdreval approved these changes Dec 9, 2022

Kdreval merged commit 07c6ffa into master Dec 9, 2022


Comment timeline: mattssca opened the PR on Nov 22, 2022; Kdreval left review comments on Nov 29, 2022, with one follow-up on Dec 9, 2022; rdmorin and mattssca replied on Dec 6, 2022.
