improvements rnaseq pipeline #2

Open
57 of 65 tasks
cokelaer opened this issue Dec 10, 2019 · 0 comments
cokelaer commented Dec 10, 2019

Future

oct 2021

  • use new sequana-wrappers

Those requested features are for the rnadiff analysis, not sequana_rnaseq:

  • if possible, provide results with and without independent filtering
  • when using --force (rnadiff), we should delete previous DGE results; otherwise they will be added to the HTML reports
  • design: add an 'alias name' column

April/May/June 2021

  • if pvalue == 0, set a small non-zero value so that the point can be seen in the volcano plot
  • fastp tool to complement existing cutadapt trimming tool
  • add html entry point for the enrichment (if several comparisons) or several enrichments
  • refactorise sequana enrichment, maybe to have a syntax such as "sequana enrichment panther"
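
The pvalue == 0 item above can be illustrated with a minimal sketch. It assumes the p-values are available as a plain list of floats; the function name `clip_zero_pvalues` and the choice of floor (the smallest non-zero p-value observed) are illustrative assumptions, not the rnadiff implementation.

```python
import math
import sys


def clip_zero_pvalues(pvalues, floor=None):
    """Replace p-values equal to 0 so that -log10(p) stays finite.

    Hypothetical helper: by default the floor is the smallest non-zero
    p-value in the input, falling back to the smallest positive normal
    float when no positive value exists.
    """
    positive = [p for p in pvalues if p > 0]
    if floor is None:
        floor = min(positive) if positive else sys.float_info.min
    # Only exact zeros are replaced; other values (including NaN) are kept.
    return [floor if p == 0 else p for p in pvalues]


pvalues = [0.0, 1e-300, 0.05]
clipped = clip_zero_pvalues(pvalues)
# The y-axis of a volcano plot is usually -log10(p-value).
neg_log10 = [-math.log10(p) for p in clipped]
```

With this choice, genes whose p-value underflowed to 0 sit at the top of the plot, level with the most significant non-zero gene, instead of being dropped at infinity.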

march 2021

  • better filtering for multiqc
  • main summary.html should have more features/summary/plots
  • check rnaseqc gtf input [catch missing GTF in the main.py and rnaseq.rules]. added a converter in sequana
    • gtf input (from GFF) for the prokaryotes case
    • gtf input (from GFF) for the eukaryotes case
  • salmon for eukaryotes tested on mm10
  • check the rnaseqc multiqc module. No need for the biomics fork anymore.

Jan 2021

  • Bug fix: the mark duplicates switch is now applied correctly for the QC and other steps
  • Better GFF handling: a custom GFF can now provide several feature types, with sanity checks of the user's choice of attribute and feature
  • Checked the rnaseqc functionality and provided a gff2gtf parser in sequana.

Dec 2020

  • Fix issue of seg fault for bacterial genomes with star aligner
  • fastq_screen should work now. The only contaminant searched for is phix; other genomes must be handled by the users (meaning they build the index themselves). The fastq_screen search for phix is now the default behaviour, so that the code works out of the box
  • fix missing workflow image in the report.
  • add strandness plot in ./outputs directory and add the image in the summary plot
  • bowtie1/star/bowtie2 indexing are now stored in their own sub-directories
  • provide way to disable rRNA search
  • fix issue related to star index rule bug in sequana
  • rnadiff option is now set automatically to one_factor
  • add option --run to execute the pipeline without manual checking (batch mode)

Oct-Nov 2020

  • star index: we may get the following warning.
    --genomeSAindexNbases 14 is too large for the genome size=4456448,
    which may cause seg-fault at the mapping step. Re-run genome generation with
    recommended --genomeSAindexNbases 10
  • a more generic title in the multiqc_config
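
The warning quoted above follows the rule given in the STAR manual for small genomes, where --genomeSAindexNbases should be scaled down to min(14, log2(GenomeLength)/2 - 1). A minimal sketch of that calculation is shown below; the function name is hypothetical and this is not the code used by the pipeline.

```python
import math


def recommended_sa_index_nbases(genome_length):
    """Return min(14, log2(genome_length)/2 - 1), truncated to an integer.

    Hypothetical helper following the rule stated in the STAR manual;
    the lower bound of 1 is an added safeguard for very short inputs.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    value = int(math.log2(genome_length) / 2 - 1)
    return max(1, min(14, value))


# Genome size taken from the warning message above.
bacterial = recommended_sa_index_nbases(4456448)
```

For the genome size in the warning this gives 10, the value STAR recommends in its message.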

Sept 2020

  • Add tolerance for feature_counts in the pipeline and config file after fixing sequana featurecounts functions (v0.9.17)

Aug 2020

  • do_indexing option is now pre-filled when instantiating the pipeline.
  • salmon option validateMappings is deprecated and should be removed
  • salmon indexing included
  • refactorise the way feature counts are handled: no longer in the onsuccess section, but with simpler code from @khourhin, now included in sequana and in this pipeline as of version 0.9.16.

June/july 2020

  • Fix R1/R2 issue for rRNA
  • add mark duplicates in cluster config and set to False by default
  • add paired option for feature counts when paired data is provided.
  • add option to skip the fastqc on the raw data. This will be the default; the fastqc on the filtered data is kept by default.
  • cleanup the multiqc option to exclude fastqc_samples (to not clash with fastqc_filtered)

April-May 2020

  • if the input genome is larger than about 4 billion bases, the bowtie2 index files have the extension .bt2l (not .bt2); the sequana rule bowtie2_mapping and this pipeline should therefore be updated.
  • add input to the rnadiff analysis in ./rnadiff
  • a faster --help option
  • a --from-project option to import existing pipeline
  • a HTML custom front page
  • add feature counts as a single file
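
For the .bt2 / .bt2l item above, one possible approach is to detect which extension is present on disk rather than hard-coding it. The sketch below assumes the standard bowtie2 index layout, where the first index file is named `<prefix>.1.bt2` or `<prefix>.1.bt2l`; the function name and the temporary-directory demo are illustrative only, not the sequana rule.

```python
import tempfile
from pathlib import Path


def bowtie2_index_extension(prefix):
    """Return ".bt2" or ".bt2l" depending on which index exists, else None.

    Hypothetical helper: only the first index file (<prefix>.1<ext>) is
    checked, so a partially built index would not be detected as broken.
    """
    for ext in (".bt2", ".bt2l"):
        if Path(f"{prefix}.1{ext}").exists():
            return ext
    return None


# Demo with a fake large index created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    prefix = Path(tmp) / "genome"
    Path(f"{prefix}.1.bt2l").touch()
    detected = bowtie2_index_extension(prefix)
```

A rule could then build its expected output names from the detected extension instead of assuming .bt2.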

Jan 2020 - April 2020

  • integrate the biomix scripts to make the link with the differential analysis
  • add feature counts in separate directory ready to use by rnadiff
  • integrate salmon

Dec 2019 - Jan 2020

  • fix the RNAseQC rule, which is broken at the moment
  • check for rRNA feature name presence in the GFF
  • check for feature count type provided by the user
  • check config with schema
  • fix read tag
  • possibility to switch off cutadapt
  • fixing the bowtie2 config/pipeline conflict name (see explanation of the naming convention in the config and pipeline when using bowtie2_mapping rule #3)
  • Fix indexing issue: indexing is done even when not asked for, or vice versa; when indexing is set to False, the pipeline fails with a cryptic message. We will provide better handling of the check on whether or not indexing has been done.
  • include the schema file
  • parameter output-directory should be renamed output_directory in the multiqc section
  • handle the stdout correctly in the fastqc, bowtie2 and bowtie1 rules
  • allow rRNA feature and/or files with meaningful error message if the 2 options conflict
  • better multiconfig report (text/title)
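
The GFF checks listed above (rRNA feature name presence, feature type provided by the user) could be sketched as follows. This is an illustration only: it assumes tab-separated GFF lines with the feature type in the third column, and the function names and error message are hypothetical, not taken from sequana.

```python
def gff_feature_types(lines):
    """Collect the feature types (third column) found in GFF lines."""
    features = set()
    for line in lines:
        # Skip comments, directives and empty lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            features.add(fields[2])
    return features


def check_feature(lines, feature):
    """Raise a readable error if the requested feature type is absent."""
    found = gff_feature_types(lines)
    if feature not in found:
        raise ValueError(
            f"Feature {feature!r} not found in GFF; available: {sorted(found)}"
        )
    return True


# Made-up records for illustration.
sample = [
    "##gff-version 3",
    "\t".join(["chr1", "src", "gene", "1", "900", ".", "+", ".", "ID=gene1"]),
    "\t".join(["chr1", "src", "rRNA", "1000", "2500", ".", "+", ".", "ID=rrna1"]),
]
available = gff_feature_types(sample)
```

Running such a check before the pipeline starts lets a typo in the feature name fail early with a list of valid choices, rather than producing empty counts later.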
cokelaer self-assigned this Dec 10, 2019
cokelaer changed the title from "improvements" to "improvements rnaseq pipeline" Feb 16, 2022