plot updates, vignette & bug fixes, etc. #85

mattssca · 2022-04-14T21:08:11Z

Changes in this PR include the following updates:

fancy_cnlohbar - updated colours (to match gambl palette for CN states), optional parameter to include CN state = 2, adding a second y-axis to annotate the number of nucleotides affected by each CN state. Both y-axis are now in log10 (since CN = 2 is now included). nNucleotides in each CN state are now plotted as geom_points over the corresponding CN count bar.
fancy_sv_chrdistplot - Now uses the newly added palette for DEL and INS.
fancy_snv_ chrdistplot - Name and description of this plot have now been adjusted for SNVs (previously stated SNPs).
fancy_svbar - Now uses the newly added palette for DEL and INS.
fancy_vplot - Now uses the newly added palette for DEL and INS.
fancy_ideogram - Added in this PR. Colours have also been updated to match the defined palette for CN states. Previously, the plot annotated CN states up to 10+, but since the defined colour palette only goes up to CN state = 6, thus all CN states => 6 are annotated the same.
New palette added to get_gambl_colours, all_colours[["indels"]] = c("DEL" = "#53B1FC", "INS" = "#FC9C6D").
Updating get_codin_ssm to have these_samples_metadata parameter
Fixing bug in get_ssm_bt_region (caused an error in ashm_rainbow_plot in the vignette). Were Streamlined = TRUE was wrongly embedded inside the function, causing the function to always return only two columns. This was removed, streamlined and basic_columns were also set to FALSE by default.
lollipopPlot example in the vignette was updated to use get_coding_ssm (instead of get_ssm_by_gene), now works the intended way again.
Regenerated package documentation.
A lot of changes related to removing trailing white space in all scripts (no new code in portal.R, preprocessing_io.R and web.R).

Outstanding things that will be addressed in my next PR are examples in the vignette that are relying on the database for reading data. Examples of newly incorporated plotting functions will also be added.

Pull Request Checklists

Important: When opening a pull request, keep only the applicable checklist and delete all other sections.

Checklist for all PRs

Required

I tested the new code for my use case (please provide a reproducible example of how you tested the new functionality)
I ensured all dplyr functions that commonly conflict with other packages are fully qualified.

This can be checked and addressed by running check_functions.pl and responding to the prompts. Test your code after you do this.

I generated the documentation and checked for errors relating to the new function (e.g. devtools::document()) and added NAMESPACE and all other modified files in the root directory and under man.

Optional but preferred with PRs

I updated and/or successfully knitted a vignette that relies on the modified code (which ones?)

Checklist for New Functions

Required

I documented my function using ROxygen style.)
All parameters for the function are described in the documentation and the function has a decriptive title.

Example:

#' Use GISTIC2.0 scores output to reproduce maftools::chromoplot with more flexibility
#'
#' @param scores output file scores.gistic from the run of GISTIC2.0
#' @param genes_to_label optional. Provide a data frame of genes to label (if mutated). The first 3 columns must contain chromosome, start, and end coordinates. Another required column must contain gene names and be named `gene`. (truncated for example)
#' @param cutoff optional. Used to determine which regions to color as aberrant. Must be float in the range [0-1]. (truncated for example)

My function uses a library that isn't already a dependency of GAMBLR and I made the package aware of this dependency using the function documentation import statment.

Example:

#' @return nothing
#' @export
#' @import tidyverse ggrepel

Checklist for changes to existing code

I added/removed arguments to a function and updated documentation for all changed/new arguments
I tested the new code for compatability with existing functionality in the Master branch (please provide a reprex of how you tested the original functionality)

…new plotting functions to vignette, package doc etc.

mattssca · 2022-04-22T16:18:15Z

In this commit the following updates have been made:

Removed LOH from ideogram (replaced with CN0 and CN1)
New ideogram plotting function for visualizing multisample CN segments. This function is based and expanded on the previous ideogram function. The function currently accepts 2, 3 or 4 samples.
New examples were added to the vignette to showcase the latest additions in terms of plotting functions within GAMBL
Examples in the vignette that are using database data have been updated to read from flat files instead.
Examples in the vignette that calls no longer available functions have been removed or replaced.

…elper function)

… all fancy_x_plots

Kdreval

Thanks Adam! Some small updates here, mostly regarding converting colors and coordinates to call of GAMBLR functions or objects.

Kdreval · 2022-04-28T04:21:58Z

R/annotation.R

@@ -51,9 +51,9 @@ annotate_ssm_blacklist = function(mutations_df,
  for(b in blacklist_files){
    full_path = paste0(project_base,b)
    lifted_blacklist = read_tsv(full_path, col_names = c("chrpos", "blacklist_count"), show_col_types = FALSE)
-    lifted_blacklist = lifted_blacklist %>% 


in this file, there are only space/tab edits - is that right?

Kdreval · 2022-04-28T04:24:07Z

R/database.R


-    all_meta = get_gambl_metadata(from_flatfile = from_flatfile)
+  all_meta = get_gambl_metadata(from_flatfile = from_flatfile)


I think this is the line I mentioned here: #87. Can you please add seq_type=seq_type for this call?
Also, it looks like it will overwrite whatever the metadata user provided in these_samples_metadata handled in lines 1646-1651 here.

I think line 1653 needs to be removed, this was a mistake on my end. Ryan asked me to add this code and I should've probably removed the mentioned line. to not overwrite.

Kdreval · 2022-04-28T04:27:41Z

R/portal.R

@@ -34,100 +34,100 @@ setup_study = function(short_name = "GAMBL",
                       project_name = "gambl_minus_icgc",
                       description = "GAMBL data without ICGC",
                       cancer_type = "mixed",
-                       data_cancer_type = "brca", 


the portal.R is also just space/tab related I think

Kdreval · 2022-04-28T04:35:31Z

R/preprocessing_io.R

@@ -998,7 +998,7 @@ liftover_bedpe = function(bedpe_file,
    output = rtracklayer::export(first_ok, format = "bed") %>%
      read_tsv(col_names = c("chrom", "start", "end", "score", "strand", "nothing", "s1", "e1", "junk", "more", "stuff", "nada")) %>%
      dplyr::select("chrom", "start", "end")
-    
+


same goes here in this file, only space/tabs. Just making sure I did not miss relevant changes 😃

Yes, same thing here. Just space/tab edits :)

Kdreval · 2022-04-28T04:39:56Z

R/utilities.R

+  cn_segments = subset(cn_segments, CN != 2)
+
+  #update CN annotations (if present in cn_segment data).
+  if("0" %in% cn_segments$CN){


I wonder if this can be simplified to a unified call:

cn_segments$CN = paste0("cn_", cn_segments$CN , "_sample", samplen)

Yes, indeed! Nice, good suggestion.

Kdreval · 2022-04-28T04:54:54Z

R/viz.R

+  l_cn_seg = maf
+  l_cn_seg$lenght = l_cn_seg$end - l_cn_seg$start
+  l_cn_seg$CN = as.factor(l_cn_seg$CN)
+  l_cn_seg = dplyr::filter(l_cn_seg, CN != 0)


Is this filtering on neutral segments without CNV? Then this needs to be CN != 2, since this is not log ratios but the absolute CN

Another good catch, when I developed these functions I wrongly assumed that CN states 0 and 1 always would be interpreted as LOH events. I've made updates to this and the plots (i.e ideograms) now show CN states 0 and 1. So, this line needs to be updated so that CN0 is not removed.

Kdreval · 2022-04-28T04:56:53Z

R/viz.R

@@ -3045,8 +3066,396 @@ fancy_vplot = function(this_sample,
    labs(title = paste0(this_sample), subtitle = plot_subtitle, x = "", y = "Variant Size") +
    geom_violin(trim = FALSE, scale = "width", color = NA) +
    stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "black") +
-    scale_fill_manual("", values = c("DEL" = "#E6856F", "INS" = "#3A8799")) +
+    scale_fill_manual("", values = c("DEL" = "#53B1FC", "INS" = "#FC9C6D")) +


also needs conversion to a function call here

Kdreval · 2022-04-28T04:58:42Z

R/viz.R

+  }
+
+  #grch37 coordinates
+  grch37_end = c(249250621, 243199373, 198022430, 191154276, 180915260, 171115067, 159138663, 146364022, 141213431, 135534747, 135006516, 133851895, 115169878, 107349540, 102531392, 90354753, 81195210, 78077248, 59128983, 63025520, 48129895, 51304566)


There are chromosome arms coordinates objects not as part of GAMBLR, maybe this should be converted to using those instead?

Yes, I could add ref/lcr-modules-references/genomes/grch37/genome_fasta/main_chromosomes.bed to the config.yml and then get the coordinates from the bed directly. This would also allow for use of different genome projections. What do you think?

It is already added to GAMBLR 😃 You can check these out:

GAMBLR::chromosome_arms_grch37 GAMBLR::chromosome_arms_hg38

Oh, nice! That's perfect, thanks.

Kdreval · 2022-04-28T05:04:40Z

R/viz.R

+      geom_segment(data = segment_data, aes(x = chr_start, xend = chr_end, y = y + 0.5, yend = yend + 0.5, label = chr), color = "white", lineend = "butt", size = 2, stat = "identity", position = position_dodge()) + #white space between chromosome groups (bottom)
+      annotate(geom = "text", x = anno_dist, y = segment_data$yend, label = segment_data$chr, color = "black", size = 7, hjust = 1) + #chr labels
+      labs(title = plot_title, subtitle = plot_sub) + #plot titles
+      scale_colour_manual(name = "", values = c(CN0 = "#4393C3", CN1 = "#92C5DE", CN3 = "#D6604DFF", CN4 = "#B2182BFF", CN5 = "#67001FFF", CN6_or_more = "#380015FF")) + #colours and legend


Can we convert it here to get_gambl_colors call?

Kdreval · 2022-04-28T05:07:50Z

R/viz.R

+  }
+
+  #plot
+  ggplot() +


this function does not seem to return any object. Sometimes it makes it difficult if you want to store a plot in a variable and then combine multiple plots with something like ggpubr:: ggarrange when arranging a multiplot figure. Like the next function below fancy_multisamp_ideogram can we store the plot here in something like p and then return it?

Absolutely!

Also, this function should also be updated to use the newly added helper function to subset cn states for the sample.

mattssca

Thanks for all your comments Kostia, I will address the changes requested, as well as some other minor things I just caught. There will be a new commit later today (or early tomorrow) in regards to this. Let me know if there. are any questions at your end, related to my replies.

mattssca · 2022-04-28T17:22:27Z

R/annotation.R

@@ -51,9 +51,9 @@ annotate_ssm_blacklist = function(mutations_df,
  for(b in blacklist_files){
    full_path = paste0(project_base,b)
    lifted_blacklist = read_tsv(full_path, col_names = c("chrpos", "blacklist_count"), show_col_types = FALSE)
-    lifted_blacklist = lifted_blacklist %>% 


mattssca · 2022-04-28T17:22:37Z

R/database.R


-    all_meta = get_gambl_metadata(from_flatfile = from_flatfile)
+  all_meta = get_gambl_metadata(from_flatfile = from_flatfile)


I think line 1653 needs to be removed, this was a mistake on my end. Ryan asked me to add this code and I should've probably removed the mentioned line. to not overwrite.

mattssca · 2022-04-28T17:23:23Z

R/preprocessing_io.R

@@ -998,7 +998,7 @@ liftover_bedpe = function(bedpe_file,
    output = rtracklayer::export(first_ok, format = "bed") %>%
      read_tsv(col_names = c("chrom", "start", "end", "score", "strand", "nothing", "s1", "e1", "junk", "more", "stuff", "nada")) %>%
      dplyr::select("chrom", "start", "end")
-    
+


Yes, same thing here. Just space/tab edits :)

mattssca · 2022-04-28T17:33:30Z

R/utilities.R

+  cn_segments = subset(cn_segments, CN != 2)
+
+  #update CN annotations (if present in cn_segment data).
+  if("0" %in% cn_segments$CN){


Yes, indeed! Nice, good suggestion.

mattssca · 2022-04-28T17:34:44Z

R/utilities.R

+  }
+
+  #convert to factor.
+  cn_segments$CN = as.factor(cn_segments$CN)


Exactly, the order of the levels doesn't matter, more so that they in fact are factors.

mattssca · 2022-04-28T17:45:34Z

R/viz.R

+  l_cn_seg = maf
+  l_cn_seg$lenght = l_cn_seg$end - l_cn_seg$start
+  l_cn_seg$CN = as.factor(l_cn_seg$CN)
+  l_cn_seg = dplyr::filter(l_cn_seg, CN != 0)


Another good catch, when I developed these functions I wrongly assumed that CN states 0 and 1 always would be interpreted as LOH events. I've made updates to this and the plots (i.e ideograms) now show CN states 0 and 1. So, this line needs to be updated so that CN0 is not removed.

mattssca · 2022-04-28T17:48:18Z

R/viz.R

+  }
+
+  #grch37 coordinates
+  grch37_end = c(249250621, 243199373, 198022430, 191154276, 180915260, 171115067, 159138663, 146364022, 141213431, 135534747, 135006516, 133851895, 115169878, 107349540, 102531392, 90354753, 81195210, 78077248, 59128983, 63025520, 48129895, 51304566)


Yes, I could add ref/lcr-modules-references/genomes/grch37/genome_fasta/main_chromosomes.bed to the config.yml and then get the coordinates from the bed directly. This would also allow for use of different genome projections. What do you think?

mattssca · 2022-04-28T17:48:41Z

R/viz.R

+  }
+
+  #plot
+  ggplot() +


Absolutely!

mattssca · 2022-04-28T17:48:55Z

R/viz.R

+      geom_segment(data = segment_data, aes(x = chr_start, xend = chr_end, y = y + 0.5, yend = yend + 0.5, label = chr), color = "white", lineend = "butt", size = 2, stat = "identity", position = position_dodge()) + #white space between chromosome groups (bottom)
+      annotate(geom = "text", x = anno_dist, y = segment_data$yend, label = segment_data$chr, color = "black", size = 7, hjust = 1) + #chr labels
+      labs(title = plot_title, subtitle = plot_sub) + #plot titles
+      scale_colour_manual(name = "", values = c(CN0 = "#4393C3", CN1 = "#92C5DE", CN3 = "#D6604DFF", CN4 = "#B2182BFF", CN5 = "#67001FFF", CN6_or_more = "#380015FF")) + #colours and legend


mattssca · 2022-04-28T17:55:23Z

R/viz.R

+  }
+
+  #plot
+  ggplot() +


Also, this function should also be updated to use the newly added helper function to subset cn states for the sample.

mattssca · 2022-04-28T21:51:57Z

Changes in this commit include fixes to issues/suggestions raised in the recent PR review.

Use global colours - The following plotting function have been updated to read palette from get_gambl_colours(): fancy_sv_chrdistplot, fancy_sv_bar, fancy_vplot, fancy_ideogram, fancy_multisamp_ideogram.
Renaming of variable returned after calling get_sample_cn_segments in fancy_cnbar (not a maf returned)
New parameter was added to fancy_cnbar, allowing the user to specify the cut-off value for maximum CN states to be retrieved.
CN0 is now retained in fancy_cnbar.
Chromosome tables for both ideograms are updated, now using GAMBLR::chromosome_arms_grch37 to retrieve this information.
fancy_ideogram now stores the plot inside a variable (p) allowing the user to combine multiple plots when arranging a multiplot figure.
fancy_ideogram is now also using the newly added helper function (subset_cnstates).
Above mentioned helper function is also simplified to a unified call.
Overwriting line (all_meta) to database.R has been removed.
Package documentation has been regenerated to reflect added/updated parameters.

…ng sample-level report (pdf)

Kdreval · 2022-04-29T01:22:08Z

R/viz.R

@@ -2942,48 +2947,48 @@ fancy_svbar = function(this_sample,
 fancy_cnbar = function(this_sample,
                       plot_subtitle = "n CNV Segments (barplots, left y-axis), n Affected bases for each CN state",
                       chr_select = paste0("chr", c(1:22)),
+                       cutoff = 160,


is this a typo or intended to be 160 for the CN state?

Oh, I just put a really high number as a placeholder and forgot to change it back to something more realistic.

Kdreval

Thanks Adam!

plot updates, vignette & bug fixes, etc.

504c137

This was referenced Apr 14, 2022

ashm_rainbow_plot example in vignette is broken #75

Closed

Vignette refers to some functionality that is deprecated or gone #77

Closed

Ideogram CN0/1 and LOH update + new ideogram for multisample. Adding …

192feac

…new plotting functions to vignette, package doc etc.

mattssca added 7 commits April 22, 2022 09:19

Merge branch 'master' into cmattsson-dev

c39d0ad

fancy_multisamp_ideogram: fixing issues, condensed code (e.g adding h…

5e0e6d9

…elper function)

fixing no new line

b28314a

Merge branch 'master' into cmattsson-dev

f802c93

adding closing bracket to subset_cnstates...

d977054

updating flat_file compatibility and setting flat_file to default for…

8d31a4c

… all fancy_x_plots

updating package documentation

143d182

Kdreval self-requested a review April 28, 2022 04:19

Kdreval reviewed Apr 28, 2022

View reviewed changes

mattssca commented Apr 28, 2022

View reviewed changes

fixes to issues/suggestions raised in recent PR

693ac94

mattssca requested a review from Kdreval April 28, 2022 21:52

updating plot titles felxibillity and adding new function for compili…

abeff03

…ng sample-level report (pdf)

Kdreval reviewed Apr 29, 2022

View reviewed changes

updating default cutoff value for fancy_cnbar

0c3f58b

Kdreval approved these changes Apr 29, 2022

View reviewed changes

mattssca merged commit 90d4c67 into master Apr 29, 2022


		all_meta = get_gambl_metadata(from_flatfile = from_flatfile)
		all_meta = get_gambl_metadata(from_flatfile = from_flatfile)

plot updates, vignette & bug fixes, etc. #85

plot updates, vignette & bug fixes, etc. #85

Conversation

mattssca commented Apr 14, 2022 • edited Loading

Pull Request Checklists

Checklist for all PRs

Required

Optional but preferred with PRs

Checklist for New Functions

Required

Checklist for changes to existing code

mattssca commented Apr 22, 2022

Kdreval left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattssca left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattssca commented Apr 28, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Kdreval left a comment

Choose a reason for hiding this comment

mattssca commented Apr 14, 2022 •

edited

Loading

mattssca commented Apr 28, 2022 •

edited

Loading