improvements rnaseq pipeline #2

cokelaer · 2019-12-10T09:48:41Z

Future

Duplication statistics: are duplicates due to high coverage or to PCR? Are they spread over the transcriptome or localized on a set of genes? How are they distributed at the gene scale?
Add a column with list of genes corresponding to each GO term enriched (as present for KEGG)
lncRNA analysis
https://www.tandfonline.com/doi/full/10.1080/15476286.2021.1899673
CircRNA analysis
https://www.sciencedirect.com/science/article/pii/S1672022921000292
tRNA abundance/modification
https://www.sciencedirect.com/science/article/pii/S1097276521000484?via%3Dihub
Gene fusion detection
https://genome.cshlp.org/content/31/3/448.short?rss=1
WGCNA and meta analysis
https://journals.plos.org/ploscompbiol/article?id=10.1371%2Fjournal.pcbi.1008976&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+ploscompbiol%2FNewArticles+%28PLOS+Computational+Biology+-+New+Articles%29
Include STRING?
https://string-db.org/

Oct 2021

use new sequana-wrappers

Those requested features are for the rnadiff analysis, not sequana_rnaseq:

if possible, provide results with and without independent filtering
when using --force (rnadiff), we should remove previous DGE results, otherwise they will be added to the HTML reports
design: add an 'alias name' column
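
A minimal sketch of the --force cleanup requested above, assuming the DGE results of one rnadiff run live in a single output directory. The function name `reset_output_dir` and the directory layout are hypothetical and are not part of the sequana API.

```python
import shutil
from pathlib import Path


def reset_output_dir(outdir, force=False):
    """Return an empty output directory, removing stale results if force is set.

    Hypothetical helper: without force, an existing directory is an error so
    that previous results are never silently mixed with new ones.
    """
    outdir = Path(outdir)
    if outdir.exists():
        if not force:
            raise FileExistsError(f"{outdir} exists; use force to overwrite")
        # Remove previous DGE results so they do not end up in the new report
        shutil.rmtree(outdir)
    outdir.mkdir(parents=True)
    return outdir
```

Deleting the whole directory is the simplest option; a finer-grained variant could remove only the files of previous comparisons.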

April/May/June 2021

if pvalue == 0, we should set it to a small non-zero value so that the point can be seen in the volcano plot
fastp tool to complement existing cutadapt trimming tool
add html entry point for the enrichment (if several comparisons) or several enrichments
refactorise sequana enrichment, maybe to have a syntax such as: sequana enrichment panther
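
One possible way to handle the pvalue == 0 item above, as a sketch only: before taking -log10 for the volcano plot, zero p-values are replaced by a small positive floor. Using the smallest non-zero p-value in the data as the floor is an assumption made here, and `neg_log10_pvalues` is a hypothetical helper, not the rnadiff implementation.

```python
import math
import sys


def neg_log10_pvalues(pvalues, floor=None):
    """Return -log10(p) for each p-value, replacing zeros by a positive floor."""
    positive = [p for p in pvalues if p > 0]
    if floor is None:
        # Assumption: reuse the smallest non-zero p-value observed; if there
        # is none, fall back to the smallest positive normalized float.
        floor = min(positive) if positive else sys.float_info.min
    return [-math.log10(p if p > 0 else floor) for p in pvalues]
```

With this choice, genes with a p-value of exactly 0 are drawn at the same height as the most significant non-zero gene instead of being dropped at infinity.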

March 2021

better filtering for multiqc
main summary.html should have more features/summary/plots
check rnaseqc gtf input [catch missing GTF in the main.py and rnaseq.rules]. added a converter in sequana
- gtf input (from GFF) for the prokaryotes case
- gtf input (from GFF) for the eukaryotes case
salmon for eukaryotes tested on mm10
check the rnaseqc multiqc module. No need for the biomics fork anymore.

Jan 2021

Bug fix: the mark duplicates switch is now applied correctly for the qc and other steps
Better GFF handling: custom GFF files with several feature types are supported, with sanity checks of the user's choice of attribute and feature
Checked rna_sqc functionality and provided a gff2gtf parser in sequana.

Dec 2020

Fix issue of seg fault for bacterial genomes with star aligner
fastq_screen should work now. The only contaminant searched for is phiX; other genomes should be handled by the users (meaning they must build the index themselves). Searching for phiX with fastq_screen is now the default behaviour, since the code should work out of the box
fix missing workflow image in the report.
add strandness plot in ./outputs directory and add the image in the summary plot
bowtie1/star/bowtie2 indexing are now stored in their own sub-directories
provide way to disable rRNA search
fix issue related to star index rule bug in sequana
rnadiff option is now set automatically to one_factor
add option --run to execute the pipeline without manual checking (batch mode)

Oct-Nov 2020

star index: we may get the following warning:
--genomeSAindexNbases 14 is too large for the genome size=4456448,
which may cause seg-fault at the mapping step. Re-run genome generation with
recommended --genomeSAindexNbases 10
a more generic title in the multiqc_config
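
The value recommended in the STAR warning above follows the rule given in the STAR manual for small genomes, min(14, log2(GenomeLength)/2 - 1). A minimal sketch of that rule is shown here; the function name `sa_index_nbases` is hypothetical, and how the pipeline obtains the genome length is left out.

```python
import math


def sa_index_nbases(genome_length):
    """Suggested --genomeSAindexNbases for STAR, capped at the default of 14.

    Implements min(14, log2(GenomeLength)/2 - 1), truncated to an integer.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return min(14, int(math.log2(genome_length) / 2 - 1))
```

For the genome size quoted in the warning (4456448), this gives the recommended value of 10.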

Sept 2020

Add tolerance for feature_counts in the pipeline and config file after fixing sequana featurecounts functions (v0.9.17)

Aug 2020

do_indexing option is now pre-filled when instantiating the pipeline.
the salmon option validateMappings is deprecated and should be removed
salmon indexing included
refactorise the way feature counts are handled: no longer in the onsuccess section, but with simpler code from @khourhin, now included in sequana and in this pipeline as of version 0.9.16.

June/July 2020

Fix R1/R2 issue for rRNA
add mark duplicates in cluster config and set to False by default
add paired option for feature counts when paired data is provided.
add option to skip the fastqc on the raw data. This will be the default; the fastqc on the filtered data is kept by default.
cleanup the multiqc option to exclude fastqc_samples (to not clash with fastqc_filtered)

April-May 2020

if the input genome is larger than about 4 billion bases (4 Gb), the bowtie2 index files have the extension .bt2l (not .bt2); the sequana rule bowtie2_mapping should therefore be updated, and this pipeline as well.
add input to the rnadiff analysis in ./rnadiff
a faster --help option
a --from-project option to import existing pipeline
an HTML custom front page
add feature counts as a single file
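
For the bowtie2 extension item above, one option is to detect the extension from the index files actually present rather than from the genome size. This is a sketch only: `bowtie2_index_extension` is a hypothetical helper, not the sequana rule, and it assumes the usual bowtie2 naming where the first index file is `<prefix>.1.bt2` or `<prefix>.1.bt2l`.

```python
from pathlib import Path


def bowtie2_index_extension(prefix):
    """Return '.bt2l' or '.bt2' depending on which bowtie2 index exists."""
    prefix = str(prefix)
    # Large indexes (genomes above about 4 billion bases) use the .bt2l suffix
    if Path(prefix + ".1.bt2l").exists():
        return ".bt2l"
    if Path(prefix + ".1.bt2").exists():
        return ".bt2"
    raise FileNotFoundError(f"no bowtie2 index found for prefix {prefix}")
```

A mapping rule could then build its expected input file names from the returned extension instead of hard-coding .bt2.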

Jan 2020 - April 2020

integrate the biomix scripts to make the link with the differential analysis
add feature counts in separate directory ready to use by rnadiff
integrate salmon

Dec 2019 - Jan 2020

cokelaer self-assigned this Dec 10, 2019

cokelaer assigned khourhin Feb 26, 2020

cokelaer changed the title ~~improvements~~ improvements rnaseq pipeline Feb 16, 2022
