fix: Updated the get-transcript-info.R file and its dependencies #73

Merged
merged 156 commits into from
Jun 14, 2023
Changes from 154 commits
849ed48
feat: added skeleton for 3-prime RNA-seq analysis
johanneskoester Jun 2, 2022
f8702fd
add a rule for obtaining the maximum read length after trimming
johanneskoester Jun 22, 2022
55a1fe6
remove unnecessary parameter
johanneskoester Jun 22, 2022
2bc7a0a
add script
johanneskoester Jun 22, 2022
381cfbc
add env
johanneskoester Jun 22, 2022
33dbb58
test push
Jun 23, 2022
fa2e575
added 3prime reference seq fetch code
Jun 27, 2022
e560178
updated the script get-3prime-seqs.R with 'coding' seqs
Jul 7, 2022
5eaa67d
added script for histogram and 3prime from cds
Jul 26, 2022
d88dbdd
added heatmap script for top 50 var genes
Aug 3, 2022
391af4c
added histogram and heatmap scripts
Aug 5, 2022
bfbcdfb
Modified the histogram plot and heatmap scripts
Aug 17, 2022
a1cce49
Updated histogram plot and dependency workflows
Aug 23, 2022
e651835
modified workflow with is-3-prime-rna-seq: true/false
manuelphilip Aug 26, 2022
d990322
Merge branch '3-prime-rna' of https://github.com/snakemake-workflows/…
manuelphilip Aug 26, 2022
e5426c8
resolved merge issue
manuelphilip Aug 26, 2022
2e5be41
modified script get_3prime-seq.py
manuelphilip Aug 26, 2022
5a179bf
updated workflow that includes filtering out non-canonical transcript…
manuelphilip Sep 1, 2022
5697b3c
fixed unfiltering of canonical transcripts and added rst file for pat…
manuelphilip Sep 2, 2022
26efc57
updated QC plot and workflow to get aligned reads from canonical tran…
manuelphilip Sep 28, 2022
42aa1ae
updated cutadapt rule for 3prime reads and dependencies
manuelphilip Oct 4, 2022
4412dd1
updated config file and dependencies
manuelphilip Oct 5, 2022
81c90ff
update code plot_ind-transcripts_histogram.py and its dependencies
manuelphilip Oct 6, 2022
e34a894
renamed file plot_ind-transcripts_histogram.py
manuelphilip Oct 6, 2022
e486860
Added `plot-qc: all` to config file
manuelphilip Oct 6, 2022
664b48e
Merge branch 'main' into 3-prime-rna
johanneskoester Oct 11, 2022
24e52ed
Update workflow/envs/QC.yaml
manuelphilip Oct 11, 2022
b2f6d4b
Update workflow/envs/aligned_pos.yaml
manuelphilip Oct 11, 2022
72daaa0
Update workflow/envs/canonical_reads.yaml
manuelphilip Oct 27, 2022
9139320
Update workflow/envs/get_canonical_ids.yaml
manuelphilip Oct 27, 2022
0d4faab
Update config/config.yaml
manuelphilip Oct 27, 2022
eba5fb4
Update config/config.yaml
manuelphilip Oct 27, 2022
739cd14
Update workflow/envs/heatmap.yaml
manuelphilip Oct 27, 2022
695236d
Update workflow/envs/pysam.yaml
manuelphilip Oct 27, 2022
ca55ffe
Update workflow/envs/r-fasta.yaml
manuelphilip Oct 27, 2022
16811e3
Update workflow/report/plot-QC.rst
manuelphilip Oct 27, 2022
e1954d3
Update workflow/rules/common.smk
manuelphilip Oct 27, 2022
40c1ac9
Update workflow/scripts/sleuth-diffexp.R
manuelphilip Oct 27, 2022
f31cb99
Added bwa rule and updated workflow and its dependencies.
manuelphilip Oct 28, 2022
a57424a
Merge branch '3-prime-rna' of github.com:snakemake-workflows/rna-seq-…
manuelphilip Oct 28, 2022
c204a20
Added spia-datavzd to report and updated workflow and its dependencies
manuelphilip Nov 3, 2022
cdd0fc6
updated and renamed `plot_ind-transcripts_histogram.py` to `plot-ind…
manuelphilip Nov 3, 2022
aa90b63
modified the workflow and auxiliary files for the QC plot
manuelphilip Nov 16, 2022
835d05c
Updated kallisto rules and fix bugs in QC-plot script
manuelphilip Nov 24, 2022
1c1081a
Add datavzrd tables for diffexp, go_terms and updated dependencies
manuelphilip Nov 30, 2022
7aa5a84
updated `config.schema.yaml` file and fix `spia datavzrd bugs`
manuelphilip Dec 1, 2022
2680e4d
updated config.yaml
manuelphilip Dec 1, 2022
f7ef4dd
updated datavzrd version, diff exp tables
manuelphilip Dec 1, 2022
5e99ab5
Merge branch 'main' into 3-prime-rna
johanneskoester Dec 2, 2022
6add9aa
fixes
johanneskoester Dec 2, 2022
10d960f
fix
johanneskoester Dec 2, 2022
890620f
fix formatting of cutadapt rules
johanneskoester Dec 2, 2022
b5be7d7
minor
johanneskoester Dec 2, 2022
1dcdac4
fix config access for 3-prime-rna-seq keys
johanneskoester Dec 2, 2022
bdb9ec7
added labels to datavzrd output
johanneskoester Dec 2, 2022
f2796ce
fix lints
johanneskoester Dec 2, 2022
17c531c
fix quotes
johanneskoester Dec 2, 2022
0dbdd83
fix quant for non-3-prime data
johanneskoester Dec 2, 2022
50574b7
fixes and categories
johanneskoester Dec 2, 2022
f0577eb
Fix folder path in `kallisto_quant` rule
manuelphilip Dec 2, 2022
2ada6a2
updated datavzrd rule with volcano plot
manuelphilip Dec 5, 2022
a4a4a95
Merge branch '3-prime-rna' of github.com:snakemake-workflows/rna-seq-…
manuelphilip Dec 5, 2022
141a7d6
Fix kallisto input once `3-prime-rna-seq` is set to false. Added volc…
manuelphilip Dec 9, 2022
c448806
fix formatting rule datavzrd
manuelphilip Dec 9, 2022
9f95844
fix snakemake format for rule datavzrd
manuelphilip Dec 9, 2022
80d7a97
fix bugs when `3-prime-rna-seq` set to `false`
manuelphilip Dec 15, 2022
bee1b6c
fix formatting
manuelphilip Dec 15, 2022
f50673c
fix `vega_plot_volcano.py` bug
manuelphilip Dec 15, 2022
748ac56
Create config.yaml
manuelphilip Feb 1, 2023
79bc8d7
Update config.yaml
manuelphilip Feb 1, 2023
8eee57f
fixed formatting
manuelphilip Feb 1, 2023
d21b87e
updated cutadapt wrapper version
manuelphilip Feb 1, 2023
6dc9af5
fix cutadapt se
manuelphilip Feb 1, 2023
d7c364b
added `extra` options to `params` in cutadapt rule
manuelphilip Feb 1, 2023
789f296
fix cutadapt `se` params
manuelphilip Feb 1, 2023
99a18bd
fixed cutadapt `se` bug
manuelphilip Feb 1, 2023
4e7d22f
Update config.yaml
manuelphilip Feb 1, 2023
19c4793
Added test case file for 3-prime-RNA data
manuelphilip Feb 1, 2023
78c476b
Updated folder path of .test config and snakefile
manuelphilip Feb 1, 2023
4e1a913
updated .test raw fastq file path
manuelphilip Feb 2, 2023
5c5346e
Fix `batch_effect` to `Batch_effect` in config.yaml
manuelphilip Feb 2, 2023
7287bf3
Fix bugs
manuelphilip Feb 2, 2023
38caa9c
updated main.yaml
manuelphilip Feb 2, 2023
edc8ddd
Fix go significant terms sorting
manuelphilip Feb 2, 2023
c92afc3
Added 3prime specific smk files/updated dependencies
manuelphilip Feb 7, 2023
0f67ecb
Merge branch 'main' into 3-prime-rna
manuelphilip Feb 7, 2023
6fac807
updated workflow wrappers and dependencies
manuelphilip Feb 9, 2023
3bd7b92
added `pre-define-genelist` in .test/3prime-config/config.yaml
manuelphilip Feb 9, 2023
b7fd3e7
added `pre-define-genelist` in .test/config.yaml
manuelphilip Feb 9, 2023
e37fed3
updated `.test/3-prime-config/config.yaml`
manuelphilip Feb 9, 2023
fab67c4
fix 3prime smk file bugs
manuelphilip Feb 9, 2023
24432fd
Merge branch '3-prime-rna' of github.com:snakemake-workflows/rna-seq-…
manuelphilip Feb 9, 2023
3721489
updated `go-enrichment-template.yaml` file
manuelphilip Feb 9, 2023
a555328
Merge branch 'main' into 3-prime-rna
manuelphilip Feb 10, 2023
d6023b9
Updated differential expression heatmap script and dependencies
manuelphilip Feb 13, 2023
b8bf9d1
updated `.test` folder config.yaml file
manuelphilip Feb 13, 2023
e8dee5e
Merge branch '3-prime-rna' of github.com:snakemake-workflows/rna-seq-…
manuelphilip Feb 13, 2023
f7d6438
fix formatting
manuelphilip Feb 13, 2023
ecafb97
Fix .test/config.yaml file
manuelphilip Feb 13, 2023
c7997b8
fix fgsea path in `.test/3-prime-config/`
manuelphilip Feb 13, 2023
dfd6b25
fix fgsea path in `.test/3-prime-config/`
manuelphilip Feb 14, 2023
78ea54b
Update config/config.yaml
manuelphilip Feb 14, 2023
119b2e1
Update workflow/rules/common.smk
manuelphilip Feb 14, 2023
b753f89
Update workflow/rules/common.smk
manuelphilip Feb 14, 2023
1dec9bb
Update workflow/rules/common.smk
manuelphilip Feb 14, 2023
39e32f0
Updated differential heatmap script and its dependencies
manuelphilip Feb 15, 2023
33b571f
fix formatting `common.smk`
manuelphilip Feb 15, 2023
5a3a3f7
Update workflow/report/plot-heatmap.rst
manuelphilip Feb 17, 2023
d48ad74
Update workflow/rules/diffexp.smk
manuelphilip Feb 17, 2023
0d60d7c
updated differential expression rule and script
manuelphilip Feb 21, 2023
887fcff
updated differential expression/spia script and dependencies
manuelphilip Feb 22, 2023
16f4439
updated main.yaml for free space
manuelphilip Feb 23, 2023
c41fc28
fix main.yaml
manuelphilip Feb 23, 2023
0b905dd
updated main.yaml
manuelphilip Feb 23, 2023
41573ff
added `remove-android` and `remove-haskell` to get more disk space
manuelphilip Feb 23, 2023
0c3e033
fix formatting main.yaml
manuelphilip Feb 23, 2023
23d521d
fix formatting `main.yaml`
manuelphilip Feb 23, 2023
ea92da8
update main.yaml
manuelphilip Feb 23, 2023
3b4cf95
update main.yaml
manuelphilip Feb 23, 2023
b7508fb
update main.yaml
manuelphilip Feb 23, 2023
67dea25
update main.yaml
manuelphilip Feb 23, 2023
f47d6b2
updated main.yaml
manuelphilip Feb 23, 2023
f6b10b8
updated main.yaml file
manuelphilip Feb 23, 2023
2cee69f
updated main.yaml
manuelphilip Feb 23, 2023
ce72da3
updated main.yaml
manuelphilip Feb 23, 2023
e0a565f
update `main.yaml` file
manuelphilip Mar 9, 2023
e6a6598
fix `main.yaml` file and `snakefile`
manuelphilip Mar 9, 2023
2b61454
Fix path for 3-prime `volcano_plot` in test path
manuelphilip Mar 9, 2023
f4cafd6
fix `main.yaml` file
manuelphilip Mar 9, 2023
f54c1d3
check space of `.test`
manuelphilip Mar 9, 2023
6db5cdb
check `.test` space
manuelphilip Mar 9, 2023
9ed3f07
updated `.test` folder space
manuelphilip Mar 9, 2023
68e60b4
check space in `.test`
manuelphilip Mar 9, 2023
f5ac8f4
updated `main.yaml` file
manuelphilip Mar 10, 2023
5a5c475
Updated `args` for number of cores due to lack of memory
manuelphilip Mar 10, 2023
caf308b
Split into individual jobs in `main.yaml`
manuelphilip Mar 10, 2023
68bd3c3
Fix `main.yaml`
manuelphilip Mar 10, 2023
a4fc8a0
Fix heatmap plot width in pdf
manuelphilip Mar 10, 2023
bff05ce
Fix heatmap width while plotting in pdf
manuelphilip Mar 10, 2023
e2471fd
updated config.yaml/diffexp.smk and dependencies based on the comments
Mar 14, 2023
0dd5406
updated snakemake report removed redundant files/labels defined.
Mar 22, 2023
4871dd6
updated label config
Mar 29, 2023
07f54cc
fix formatting
Mar 29, 2023
fb9dc11
Updated Volcano plot/enrichment and spia datavzrd tables
Apr 27, 2023
8639c25
Fix formatting diffexp.smk
Apr 27, 2023
764af83
fix spia output file
Apr 28, 2023
52f1dc2
Updated datavzrd version and its dependencies
May 5, 2023
a40a218
Fix diffexp.smk formatting
May 5, 2023
9a31873
updated `main.yaml` file
May 5, 2023
0c78cf6
Set pandas version in `QC.yaml` to 1
May 5, 2023
484eaf0
Fix `transcript-info.R` script for 3-prime-rna data and dependencies
Jun 1, 2023
3c4b6c9
Merge branch 'main' into 3-prime-rna
manuelphilip Jun 2, 2023
cec0380
Fixed Conflicts
manuelphilip Jun 2, 2023
c8d0d2b
updated `get-transcript-info.R` for redundancy and code format
manuelphilip Jun 6, 2023
b9439fd
Added else case if `mane_select` not present in `get-transcript-info.…
manuelphilip Jun 13, 2023
96c9747
Modified if else condition of `mane_select` in get-transcript-info.R
manuelphilip Jun 14, 2023
1 change: 1 addition & 0 deletions workflow/envs/biomart.yaml
@@ -4,3 +4,4 @@ channels:
dependencies:
- bioconductor-biomart =2.46
- r-tidyverse =1.3
- r-dplyr =1.0.9
9 changes: 9 additions & 0 deletions workflow/resources/datavzrd/diffexp-template.yaml
@@ -92,6 +92,15 @@ views:
domain:
- 0.0
- 1.0
chromosome_name:
optional: true
display-mode: hidden
transcript_mane_select:
optional: true
display-mode: hidden
ensembl_transcript_id_version:
optional: true
display-mode: hidden
test_stat:
display-mode: hidden
rss:
2 changes: 1 addition & 1 deletion workflow/rules/quant_3prime.smk
@@ -15,7 +15,7 @@ if is_3prime_experiment:

rule kallisto_3prime_index:
input:
fasta="resources/transcriptome.3prime.fasta",
fasta="resources/transcriptome_clean.3prime.fasta",
output:
index="results/kallisto_3prime/transcripts.3prime.idx",
log:
1 change: 1 addition & 0 deletions workflow/rules/ref.smk
@@ -36,6 +36,7 @@ rule get_transcript_info:
params:
species=get_bioc_species_name(),
version=config["resources"]["ref"]["release"],
three_prime_activated=is_3prime_experiment,
log:
"logs/get_transcript_info.log",
conda:
132 changes: 76 additions & 56 deletions workflow/scripts/get-transcript-info.R
@@ -1,16 +1,17 @@
log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")
log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")

library("biomaRt")
library("tidyverse")
library("dplyr")

# this variable holds a mirror name until
# useEnsembl succeeds ("www" is last, because
# useEnsembl succeeds ("www" is last, because
# of very frequent "Internal Server Error"s)
mart <- "useast"
rounds <- 0
while ( class(mart)[[1]] != "Mart" ) {
while (class(mart)[[1]] != "Mart") {
mart <- tryCatch(
{
# done here, because error function does not
@@ -39,69 +40,88 @@ while ( class(mart)[[1]] != "Mart" ) {
}
# hop to next mirror
mart <- switch(mart,
useast = "uswest",
uswest = "asia",
asia = "www",
www = {
# wait before starting another round through the mirrors,
# hoping that intermittent problems disappear
Sys.sleep(30)
"useast"
}
)
useast = "uswest",
uswest = "asia",
asia = "www",
www = {
# wait before starting another round through the mirrors,
# hoping that intermittent problems disappear
Sys.sleep(30)
"useast"
}
)
}
)
}
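The mirror rotation in the `switch` call above can be sketched as follows. This is a minimal illustration in Python (the language underlying the workflow's Snakemake rules) rather than the R used in the script; `NEXT_MIRROR` and `next_mirror` are hypothetical names, and the `sleep` parameter is injected only so the pause can be observed without actually waiting.

```python
import time

# Mirror order copied from the switch statement; "www" wraps back to "useast".
NEXT_MIRROR = {
    "useast": "uswest",
    "uswest": "asia",
    "asia": "www",
    "www": "useast",
}


def next_mirror(current, sleep=time.sleep, pause_seconds=30):
    """Return the mirror to try after `current` (hypothetical helper).

    When the last mirror ("www") has failed, pause before starting a new
    round, hoping that intermittent server problems have cleared.
    """
    if current == "www":
        sleep(pause_seconds)
    return NEXT_MIRROR[current]
```

Passing a stand-in for `sleep` (for example `calls.append`) shows that the 30-second pause happens only when wrapping from `"www"` back to `"useast"`.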
three_prime_activated <- snakemake@params[["three_prime_activated"]]

attributes <- c("ensembl_transcript_id",
"ensembl_gene_id",
"external_gene_name",
"description")

has_canonical <- "transcript_is_canonical" %in% biomaRt::listAttributes(mart=mart)$name

if (has_canonical) {
attributes <- c(attributes, "transcript_is_canonical")
has_canonical <-
"transcript_is_canonical" %in% biomaRt::listAttributes(mart = mart)$name
# Check if three_prime_activated is set, or else only whether transcripts are canonical
if (has_canonical && three_prime_activated) {
attributes <- c(attributes, "transcript_is_canonical", "chromosome_name",
"transcript_mane_select", "ensembl_transcript_id_version")
}else if (has_canonical) {
attributes <- c(attributes, "transcript_is_canonical")
}

t2g <- biomaRt::getBM(
attributes = attributes,
mart = mart,
useCache = FALSE
)
if (!has_canonical) {
t2g <- t2g %>% add_column(transcript_is_canonical = NA)
attributes = attributes,
mart = mart,
useCache = FALSE
)
# Set columns as NA if three_prime_activated is set to false or if the transcripts are not canonical
if (!has_canonical || !three_prime_activated) {
t2g <- t2g %>% add_column(chromosome_name = NA, transcript_mane_select = NA,
ensembl_transcript_id_version = NA)
}else if (!has_canonical) {
t2g <- t2g %>% add_column(transcript_is_canonical = NA)
}
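Read literally, the `else if (!has_canonical)` branch in the new padding block cannot run: its preceding condition `!has_canonical || !three_prime_activated` is already true whenever `has_canonical` is false, so `transcript_is_canonical` would not be added in that case. One way to avoid depending on branch order is to pad each optional column independently. The following is a minimal sketch of that idea in Python rather than R; `OPTIONAL_COLUMNS` and `pad_missing_columns` are hypothetical names, and rows are modelled as plain dictionaries instead of a data frame.

```python
# Optional columns that the later rename step expects to exist.
OPTIONAL_COLUMNS = (
    "transcript_is_canonical",
    "chromosome_name",
    "transcript_mane_select",
    "ensembl_transcript_id_version",
)


def pad_missing_columns(rows, optional=OPTIONAL_COLUMNS):
    """Return copies of `rows` in which every absent optional column is None.

    Values already present in a row are kept; only missing keys are filled,
    so the result does not depend on which flags were set upstream.
    """
    defaults = {column: None for column in optional}
    return [{**defaults, **row} for row in rows]
```

With this shape, a row that already carries `transcript_is_canonical` keeps its value, while the three 3-prime-specific columns are filled with `None` when they were not requested.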

t2g <- t2g %>%
rename( target_id = ensembl_transcript_id,
ens_gene = ensembl_gene_id,
ext_gene = external_gene_name,
gene_desc = description,
canonical = transcript_is_canonical
) %>%
mutate_at(
vars(gene_desc),
function(values) { str_trim(map(values, function (v) { str_split(v, r"{\[}")[[1]][1]})) } # remove trailing source annotation (e.g. [Source:HGNC Symbol;Acc:HGNC:5])
) %>%
mutate_at(
vars(canonical),
function(values) {
as_vector(
map(
str_trim(values),
function(v) {
if (is.na(v)) {
NA
} else if (v == "1") {
TRUE
} else if (v == "0") {
FALSE
}
}
)
)
rename(
target_id = ensembl_transcript_id,
ens_gene = ensembl_gene_id,
ext_gene = external_gene_name,
gene_desc = description,
canonical = transcript_is_canonical,
chromosome_name = chromosome_name,
transcript_mane_select = transcript_mane_select,
ensembl_transcript_id_version = ensembl_transcript_id_version,
) %>%
mutate_at(
vars(gene_desc),
function(values) {
str_trim(map(values, function(v) {
str_split(v, r"{\[}")[[1]][1]
}))
} # remove trailing source annotation (e.g. [Source:HGNC Symbol;Acc:HGNC:5])
) %>%
mutate_at(
vars(canonical,),
function(values) {
as_vector(
map(
str_trim(values),
function(v) {
if (is.na(v)) {
NA
} else if (v == "1") {
TRUE
} else if (v == "0") {
FALSE
}
}
)

)
}
)
# Filter transcripts that are canonical and MANE-selected, and filter out chromosomes that are defined as "patch"
if (three_prime_activated && has_canonical) {
t2g <- t2g %>%
filter(!str_detect(chromosome_name, "patch|PATCH")) %>%
filter(str_detect(transcript_mane_select, ""))
}
write_rds(t2g, file = snakemake@output[[1]], compress = "gz")
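The 3-prime filter near the end of the script can be sketched as follows, again in Python rather than R. This assumes that the intent of `str_detect(transcript_mane_select, "")` is to keep only rows with a non-empty MANE Select value, which the diff does not state explicitly. `keep_for_3prime` and `filter_3prime` are hypothetical names, and the chromosome and transcript identifiers in the usage note are made up for illustration.

```python
import re

# Same alternation as the str_detect() call on chromosome_name in the script.
PATCH_PATTERN = re.compile(r"patch|PATCH")


def keep_for_3prime(row):
    """Decide whether a transcript row survives the 3-prime filter (sketch).

    A row is dropped if its chromosome name marks a patch scaffold or if it
    has no MANE Select identifier (missing or empty string).
    """
    chromosome = row.get("chromosome_name") or ""
    mane_select = row.get("transcript_mane_select") or ""
    on_patch = PATCH_PATTERN.search(chromosome) is not None
    return (not on_patch) and mane_select != ""


def filter_3prime(rows):
    """Return only the rows that pass keep_for_3prime."""
    return [row for row in rows if keep_for_3prime(row)]
```

For example, a row with `chromosome_name` `"1"` and a non-empty `transcript_mane_select` is kept, while a row on a chromosome named `"EXAMPLE_PATCH"` or a row with an empty `transcript_mane_select` is dropped.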