-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
plot updates, vignette & bug fixes, etc. #85
Conversation
…new plotting functions to vignette, package doc etc.
In this commit the following updates have been made:
|
… all fancy_x_plots
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Adam! Some small updates here, mostly regarding converting colors and coordinates to call of GAMBLR functions or objects.
@@ -51,9 +51,9 @@ annotate_ssm_blacklist = function(mutations_df, | |||
for(b in blacklist_files){ | |||
full_path = paste0(project_base,b) | |||
lifted_blacklist = read_tsv(full_path, col_names = c("chrpos", "blacklist_count"), show_col_types = FALSE) | |||
lifted_blacklist = lifted_blacklist %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this file, there are only space/tab edits - is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Sir!
R/database.R
Outdated
|
||
all_meta = get_gambl_metadata(from_flatfile = from_flatfile) | ||
all_meta = get_gambl_metadata(from_flatfile = from_flatfile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the line I mentioned here: #87. Can you please add seq_type=seq_type for this call?
Also, it looks like it will overwrite whatever the metadata user provided in these_samples_metadata
handled in lines 1646-1651 here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think line 1653 needs to be removed, this was a mistake on my end. Ryan asked me to add this code and I should've probably removed the mentioned line. to not overwrite.
@@ -34,100 +34,100 @@ setup_study = function(short_name = "GAMBL", | |||
project_name = "gambl_minus_icgc", | |||
description = "GAMBL data without ICGC", | |||
cancer_type = "mixed", | |||
data_cancer_type = "brca", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the portal.R
is also just space/tab related I think
@@ -998,7 +998,7 @@ liftover_bedpe = function(bedpe_file, | |||
output = rtracklayer::export(first_ok, format = "bed") %>% | |||
read_tsv(col_names = c("chrom", "start", "end", "score", "strand", "nothing", "s1", "e1", "junk", "more", "stuff", "nada")) %>% | |||
dplyr::select("chrom", "start", "end") | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same goes here in this file, only space/tabs. Just making sure I did not miss relevant changes 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, same thing here. Just space/tab edits :)
R/utilities.R
Outdated
cn_segments = subset(cn_segments, CN != 2) | ||
|
||
#update CN annotations (if present in cn_segment data). | ||
if("0" %in% cn_segments$CN){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this can be simplified to a unified call:
cn_segments$CN = paste0("cn_", cn_segments$CN , "_sample", samplen)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed! Nice, good suggestion.
R/viz.R
Outdated
l_cn_seg = maf | ||
l_cn_seg$lenght = l_cn_seg$end - l_cn_seg$start | ||
l_cn_seg$CN = as.factor(l_cn_seg$CN) | ||
l_cn_seg = dplyr::filter(l_cn_seg, CN != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this filtering on neutral segments without CNV? Then this needs to be CN != 2, since this is not log ratios but the absolute CN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another good catch, when I developed these functions I wrongly assumed that CN states 0 and 1 always would be interpreted as LOH events. I've made updates to this and the plots (i.e ideograms) now show CN states 0 and 1. So, this line needs to be updated so that CN0 is not removed.
R/viz.R
Outdated
@@ -3045,8 +3066,396 @@ fancy_vplot = function(this_sample, | |||
labs(title = paste0(this_sample), subtitle = plot_subtitle, x = "", y = "Variant Size") + | |||
geom_violin(trim = FALSE, scale = "width", color = NA) + | |||
stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "black") + | |||
scale_fill_manual("", values = c("DEL" = "#E6856F", "INS" = "#3A8799")) + | |||
scale_fill_manual("", values = c("DEL" = "#53B1FC", "INS" = "#FC9C6D")) + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also needs conversion to a function call here
R/viz.R
Outdated
} | ||
|
||
#grch37 coordinates | ||
grch37_end = c(249250621, 243199373, 198022430, 191154276, 180915260, 171115067, 159138663, 146364022, 141213431, 135534747, 135006516, 133851895, 115169878, 107349540, 102531392, 90354753, 81195210, 78077248, 59128983, 63025520, 48129895, 51304566) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are chromosome arms coordinates objects not as part of GAMBLR, maybe this should be converted to using those instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I could add ref/lcr-modules-references/genomes/grch37/genome_fasta/main_chromosomes.bed
to the config.yml and then get the coordinates from the bed directly. This would also allow for use of different genome projections. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is already added to GAMBLR 😃 You can check these out:
GAMBLR::chromosome_arms_grch37
GAMBLR::chromosome_arms_hg38
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, nice! That's perfect, thanks.
R/viz.R
Outdated
geom_segment(data = segment_data, aes(x = chr_start, xend = chr_end, y = y + 0.5, yend = yend + 0.5, label = chr), color = "white", lineend = "butt", size = 2, stat = "identity", position = position_dodge()) + #white space between chromosome groups (bottom) | ||
annotate(geom = "text", x = anno_dist, y = segment_data$yend, label = segment_data$chr, color = "black", size = 7, hjust = 1) + #chr labels | ||
labs(title = plot_title, subtitle = plot_sub) + #plot titles | ||
scale_colour_manual(name = "", values = c(CN0 = "#4393C3", CN1 = "#92C5DE", CN3 = "#D6604DFF", CN4 = "#B2182BFF", CN5 = "#67001FFF", CN6_or_more = "#380015FF")) + #colours and legend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we convert it here to get_gambl_colors call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed!
R/viz.R
Outdated
} | ||
|
||
#plot | ||
ggplot() + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function does not seem to return any object. Sometimes it makes it difficult if you want to store a plot in a variable and then combine multiple plots with something like ggpubr:: ggarrange when arranging a multiplot figure. Like the next function below fancy_multisamp_ideogram
can we store the plot here in something like p
and then return it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this function should also be updated to use the newly added helper function to subset cn states for the sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your comments Kostia, I will address the changes requested, as well as some other minor things I just caught. There will be a new commit later today (or early tomorrow) in regards to this. Let me know if there. are any questions at your end, related to my replies.
@@ -51,9 +51,9 @@ annotate_ssm_blacklist = function(mutations_df, | |||
for(b in blacklist_files){ | |||
full_path = paste0(project_base,b) | |||
lifted_blacklist = read_tsv(full_path, col_names = c("chrpos", "blacklist_count"), show_col_types = FALSE) | |||
lifted_blacklist = lifted_blacklist %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Sir!
R/database.R
Outdated
|
||
all_meta = get_gambl_metadata(from_flatfile = from_flatfile) | ||
all_meta = get_gambl_metadata(from_flatfile = from_flatfile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think line 1653 needs to be removed, this was a mistake on my end. Ryan asked me to add this code and I should've probably removed the mentioned line. to not overwrite.
@@ -998,7 +998,7 @@ liftover_bedpe = function(bedpe_file, | |||
output = rtracklayer::export(first_ok, format = "bed") %>% | |||
read_tsv(col_names = c("chrom", "start", "end", "score", "strand", "nothing", "s1", "e1", "junk", "more", "stuff", "nada")) %>% | |||
dplyr::select("chrom", "start", "end") | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, same thing here. Just space/tab edits :)
R/utilities.R
Outdated
cn_segments = subset(cn_segments, CN != 2) | ||
|
||
#update CN annotations (if present in cn_segment data). | ||
if("0" %in% cn_segments$CN){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed! Nice, good suggestion.
} | ||
|
||
#convert to factor. | ||
cn_segments$CN = as.factor(cn_segments$CN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, the order of the levels doesn't matter, more so that they in fact are factors.
R/viz.R
Outdated
l_cn_seg = maf | ||
l_cn_seg$lenght = l_cn_seg$end - l_cn_seg$start | ||
l_cn_seg$CN = as.factor(l_cn_seg$CN) | ||
l_cn_seg = dplyr::filter(l_cn_seg, CN != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another good catch, when I developed these functions I wrongly assumed that CN states 0 and 1 always would be interpreted as LOH events. I've made updates to this and the plots (i.e ideograms) now show CN states 0 and 1. So, this line needs to be updated so that CN0 is not removed.
R/viz.R
Outdated
} | ||
|
||
#grch37 coordinates | ||
grch37_end = c(249250621, 243199373, 198022430, 191154276, 180915260, 171115067, 159138663, 146364022, 141213431, 135534747, 135006516, 133851895, 115169878, 107349540, 102531392, 90354753, 81195210, 78077248, 59128983, 63025520, 48129895, 51304566) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I could add ref/lcr-modules-references/genomes/grch37/genome_fasta/main_chromosomes.bed
to the config.yml and then get the coordinates from the bed directly. This would also allow for use of different genome projections. What do you think?
R/viz.R
Outdated
} | ||
|
||
#plot | ||
ggplot() + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely!
R/viz.R
Outdated
geom_segment(data = segment_data, aes(x = chr_start, xend = chr_end, y = y + 0.5, yend = yend + 0.5, label = chr), color = "white", lineend = "butt", size = 2, stat = "identity", position = position_dodge()) + #white space between chromosome groups (bottom) | ||
annotate(geom = "text", x = anno_dist, y = segment_data$yend, label = segment_data$chr, color = "black", size = 7, hjust = 1) + #chr labels | ||
labs(title = plot_title, subtitle = plot_sub) + #plot titles | ||
scale_colour_manual(name = "", values = c(CN0 = "#4393C3", CN1 = "#92C5DE", CN3 = "#D6604DFF", CN4 = "#B2182BFF", CN5 = "#67001FFF", CN6_or_more = "#380015FF")) + #colours and legend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed!
R/viz.R
Outdated
} | ||
|
||
#plot | ||
ggplot() + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this function should also be updated to use the newly added helper function to subset cn states for the sample.
Changes in this commit include fixes to issues/suggestions raised in the recent PR review.
|
…ng sample-level report (pdf)
R/viz.R
Outdated
@@ -2942,48 +2947,48 @@ fancy_svbar = function(this_sample, | |||
fancy_cnbar = function(this_sample, | |||
plot_subtitle = "n CNV Segments (barplots, left y-axis), n Affected bases for each CN state", | |||
chr_select = paste0("chr", c(1:22)), | |||
cutoff = 160, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a typo or intended to be 160 for the CN state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I just put a really high number as a placeholder and forgot to change it back to something more realistic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Adam!
Changes in this PR include the following updates:
fancy_cnlohbar - updated colours (to match gambl palette for CN states), optional parameter to include CN state = 2, adding a second y-axis to annotate the number of nucleotides affected by each CN state. Both y-axis are now in log10 (since CN = 2 is now included). nNucleotides in each CN state are now plotted as geom_points over the corresponding CN count bar.
fancy_sv_chrdistplot - Now uses the newly added palette for DEL and INS.
fancy_snv_ chrdistplot - Name and description of this plot have now been adjusted for SNVs (previously stated SNPs).
fancy_svbar - Now uses the newly added palette for DEL and INS.
fancy_vplot - Now uses the newly added palette for DEL and INS.
fancy_ideogram - Added in this PR. Colours have also been updated to match the defined palette for CN states. Previously, the plot annotated CN states up to 10+, but since the defined colour palette only goes up to CN state = 6, thus all CN states => 6 are annotated the same.
New palette added to get_gambl_colours, all_colours[["indels"]] = c("DEL" = "#53B1FC", "INS" = "#FC9C6D").
Updating get_codin_ssm to have these_samples_metadata parameter
Fixing bug in get_ssm_bt_region (caused an error in ashm_rainbow_plot in the vignette). Were Streamlined = TRUE was wrongly embedded inside the function, causing the function to always return only two columns. This was removed, streamlined and basic_columns were also set to FALSE by default.
lollipopPlot example in the vignette was updated to use get_coding_ssm (instead of get_ssm_by_gene), now works the intended way again.
Regenerated package documentation.
A lot of changes related to removing trailing white space in all scripts (no new code in portal.R, preprocessing_io.R and web.R).
Outstanding things that will be addressed in my next PR are examples in the vignette that are relying on the database for reading data. Examples of newly incorporated plotting functions will also be added.
Pull Request Checklists
Important: When opening a pull request, keep only the applicable checklist and delete all other sections.
Checklist for all PRs
Required
I tested the new code for my use case (please provide a reproducible example of how you tested the new functionality)
I ensured all dplyr functions that commonly conflict with other packages are fully qualified.
This can be checked and addressed by running
check_functions.pl
and responding to the prompts. Test your code after you do this.devtools::document()
) and addedNAMESPACE
and all other modified files in the root directory and underman
.Optional but preferred with PRs
Checklist for New Functions
Required
I documented my function using ROxygen style.)
All parameters for the function are described in the documentation and the function has a decriptive title.
Example:
import
statment.Example:
Checklist for changes to existing code
I added/removed arguments to a function and updated documentation for all changed/new arguments
I tested the new code for compatability with existing functionality in the Master branch (please provide a reprex of how you tested the original functionality)