Releases: iquasere/MOSCA

Merging of paired reads when no assembly is performed

30 Jan 09:36

Previously, MOSCA called genes directly from the preprocessed reads.
Now, it merges paired-end reads first, and then calls the genes on the merged reads.
During gene calling, MOSCA still treats the data as reads (-complete=0), not as complete genomes (-complete=1).

Update on sortmerna functions

SortMeRNA databases have been updated, and are now provided as a tar file containing multiple database files. Each of these databases can be used separately for a specific type of search. MOSCA now provides the sortmerna_database parameter, which sets which database will be used:

  • if fast, MOSCA will use the smr_v4.3_fast_db.fasta database.
  • if default, MOSCA will use the smr_v4.3_default_db.fasta database.
  • if sensitive, MOSCA will use the smr_v4.3_sensitive_db.fasta database.
  • if sensitive_with_rfam, MOSCA will use the smr_v4.3_sensitive_db_rfam_seeds.fasta database.

Only one database file can be used at a time.
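
The choice described above amounts to a simple lookup from the parameter value to a single database file. The sketch below is only an illustration, not MOSCA's actual code; the names SORTMERNA_DATABASES and sortmerna_database_file are hypothetical.

```python
# Hypothetical lookup table; option names and file names follow the list above.
SORTMERNA_DATABASES = {
    "fast": "smr_v4.3_fast_db.fasta",
    "default": "smr_v4.3_default_db.fasta",
    "sensitive": "smr_v4.3_sensitive_db.fasta",
    "sensitive_with_rfam": "smr_v4.3_sensitive_db_rfam_seeds.fasta",
}


def sortmerna_database_file(option: str) -> str:
    """Return the single database file selected by sortmerna_database."""
    try:
        return SORTMERNA_DATABASES[option]
    except KeyError:
        raise ValueError(f"unknown sortmerna_database value: {option!r}") from None
```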

minimum_read_length parameter split for MG and MT

The minimum length of reads kept for further analysis can now be set separately for MG and MT, with the minimum_mg_read_length and minimum_mt_read_length parameters.

Added minimum_envs folder and contents

It contains commands and resources for updating the environments when needed.

Also, some fixes

  • Converting readcounts (for MG and MT) to int was turning them all to zeros (because they are normalized). MOSCA now keeps them as float.
  • Disabled printing of MOSCA's TXT logo. It doesn't work on the tests, for reasons still unknown.
  • Fix on Summary Report, now rows have information for both "Name" and "Sample" levels (before, there were rows for "Name" and rows for "Sample").
  • Another fix on Summary Report, counting annotated genes was not done properly.
  • When not performing assembly, General Report was not importing the readcounts correctly. Now, it does.
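
The readcounts fix in the list above comes down to integer truncation. A minimal illustration, assuming normalized values below 1 (the numbers are made up, not taken from MOSCA output):

```python
# Normalized readcounts are fractional, so int() truncates them toward zero.
normalized = [0.12, 0.57, 0.93]  # made-up values, for illustration only

as_int = [int(value) for value in normalized]      # every value becomes 0
as_float = [float(value) for value in normalized]  # values are preserved
```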

Added default parameters JSON

15 Jan 10:32

I hadn't updated MOSCA's recipe in Bioconda to include the new default_config.json file. This release has no code updates, but serves to include the file in MOSCA's recipe.

Default parameters, input sanitization and final reports updates

04 Jan 10:31

MOSCA now has default parameters

These default parameters are set by the default_config.json file.

Input quality checking

Implemented checking for invalid names in experiments - names can't start with a number, including a float (e.g., 5AA or .5Name).
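
A check of this kind could look like the sketch below. It is not MOSCA's implementation; is_valid_name is a hypothetical helper, and the rule encoded here (reject a leading digit, or a leading dot followed by a digit) is an assumption based on the two examples given.

```python
import re

# Assumption: a name is invalid if it starts with a digit,
# or with a dot immediately followed by a digit (a float-like prefix).
_STARTS_WITH_NUMBER = re.compile(r"^\.?\d")


def is_valid_name(name: str) -> bool:
    """Return False for names such as '5AA' or '.5Name'."""
    return _STARTS_WITH_NUMBER.match(name) is None
```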

Updates on final reports

Renamed Protein report to General report.
New report - Expression. This report includes only expressed genes.
Technical report was renamed to Versions. It is now also exported as an Excel file, because it contains information on every environment.

Implemented minimum value imputation

Implemented for MP analysis, but it is not yet available as an option. For now, it is a feature in preparation.
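
For reference, one common form of minimum value imputation replaces each missing value with the smallest observed value. The sketch below illustrates that idea only; it is an assumption about the general technique, not the code in MOSCA, and impute_minimum is a hypothetical name.

```python
import math


def _is_missing(value) -> bool:
    """Treat None and NaN as missing."""
    return value is None or math.isnan(value)


def impute_minimum(values):
    """Replace missing values with the smallest observed value."""
    observed = [v for v in values if not _is_missing(v)]
    if not observed:
        return list(values)  # nothing to impute from
    minimum = min(observed)
    return [minimum if _is_missing(v) else v for v in values]
```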

No more build_deps in Dockerfile

It's no longer needed; conda handles it all.

Dependencies update

  • Pinned snakemake version to <8 - some of its new functionality is incompatible with MOSCA's implementation.
  • Added pandas as dependency - mosca.py now has functions that require it.
  • Updated to the newest versions of UPIMAPI, reCOGnizer and KEGGCharter - this allowed removing the parameters related to database download.

Blocked MGMT test

GitHub Actions doesn't provide enough disk space for it.

Also, several fixes

  • Fix on DE handling multiple samples
  • Fix on KEGGCharter handling multiple samples
  • Fix on multi_sheet excel handling multiple samples and numbering
  • Fix on converting RAW spectra to MGF outside a container environment
  • MOSCA now prints snakemake command properly
  • Fix on adding normalized matrices to entry report
  • Several fixes on summary report
  • Necessary repairs on EC numbers and KEGG IDs, as UPIMAPI returns them in a format that is not compatible with KEGGCharter
  • Fix on inputting mods to generate_parameters_file function

Reintroduction of MOSCA into Bioconda

28 Apr 14:39

Reintroduction of MOSCA into Bioconda

Since MOSCA 1.3.6, the list of dependencies of the pipeline has become too complex for conda to manage.
This release makes use of snakemake environments to simplify the minimal environment required to install MOSCA. MOSCA now only requires snakemake.

Now MOSCA uses snakemake's rules

All the rules have been moved to corresponding .smk files. This has greatly simplified the main script.
Script files can no longer be run through the command line, however. The interface is now with snakemake directly.
This is a first step towards producing a web service.

Added schema for validating config.json

config.schema.yaml checks that all needed information is present, and in the correct format, in the input config file.

New parameter

metaproteomics_add_reference_proteomes: new option that allows skipping the search for reference proteomes of the organisms identified. Skipping it saves a lot of time during Peptide-to-Spectrum matching.

Tests have been reformatted

The complete MGMP test has been reintroduced; however, it still fails because of excessive disk usage. It'll be a problem for another time.

Several fixes and improvements

params.method was not being correctly read on de_analysis.R.
config.json is now explicitly required.
tmp directory when handling SortMeRNA is created inside SortMeRNA output directory.
Removed pandas warnings concerning reading files without low_memory=False.
Memory allocated in metaproteomics is now specified in G instead of M.
Removed UPIMAPI apt dependencies - they are no longer needed.
Fix on reading method for normalization.
Fix on parsing conditions in de_analysis.R.

Metaproteogenomics - a new level of omics analyses

16 Jan 22:51

New workflow of metaproteomics analyses, based on metagenomics (MG) results.

This new layer of analysis allows spectra - both in raw and standard formats - to be inputted to MOSCA for metaproteomics (MP) analysis.

MOSCA's MP workflow is as follows:

1. Database construction

A database is built from MG results, aiming to include all sequences that can possibly be in the datasets. This includes:

  • the genes identified by FragGeneScan on the MG gene calling step
  • reference proteomes retrieved from UniProt of the taxa identified in the annotation step with UPIMAPI
  • the cRAP database
  • the protease sequence - for now, Trypsin is the only sequence automatically available; all others must be inputted manually

This database will then be submitted for a first round of Peptide-to-Spectrum matching with SearchCLI and PeptideShaker. All proteins with at least one Peptide-to-Spectrum match (PSM) are collected for the final database - the metaproteogenomics database.

2. Peptide-to-Spectrum matching

SearchCLI is used for obtaining PSMs from inputted spectra, using as reference the database constructed in the previous step. SearchCLI is used with three search engines - X!Tandem, MyriMatch and MS-GF+. More engines might be added in the future.

3. Protein inference

PeptideShaker is used for protein inference and quantification, based on spectracounts. PSMs are selected at a 5 % local False Discovery Rate, and only peptides with two or more PSMs and proteins with two or more identified peptides are kept for further analysis.
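
The two count-based criteria can be illustrated with the sketch below. PeptideShaker applies its own logic internally; this is only a simplified picture of the rule, with the FDR step left out. The function name and input layout are hypothetical, and each PSM is assumed to be assigned to a single protein.

```python
from collections import Counter, defaultdict


def filter_identifications(psms, min_psms=2, min_peptides=2):
    """Keep peptides with enough PSMs, then proteins with enough peptides.

    psms: iterable of (protein, peptide) pairs, one pair per PSM that
    already passed the FDR threshold (the FDR step is not reproduced here).
    """
    psm_counts = Counter(psms)  # number of PSMs per (protein, peptide)
    peptides_by_protein = defaultdict(set)
    for (protein, peptide), count in psm_counts.items():
        if count >= min_psms:
            peptides_by_protein[protein].add(peptide)
    return {
        protein: sorted(peptides)
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    }
```

In this sketch, a protein with two peptides of which only one has two PSMs is dropped, because just one of its peptides survives the first criterion.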

4. Normalization, imputation and differential protein expression analysis

Spectracounts are normalized with Variance Stabilizing Normalization. Missing values are imputed using Local Least Squares Imputation.

Normalized and imputed spectracounts are then submitted for differential protein expression analysis with Reproducibility-Optimized Test Statistics. Log2foldchange and p-values are retrieved for reporting.

5. Metabolic pathway representation and final reporting

All the following steps are performed as closely as possible to the metatranscriptomics (gene expression) analysis.

Metabolic maps are built with KEGGCharter, showing protein expression levels from MP and genomic potential from MG.

Final reports include all results from MG, and report on differential expression analysis of proteins.

Other updates

MOSCA's workflow has grown by around 40 %.

MOSCA is now compatible, through UPIMAPI, with the UniProt updates released six months ago. This includes parsing the taxonomic columns, so that taxonomic kronas can still be produced.

Snakemake conda environments are now used, instead of one single environment. This has made it possible again to build MOSCA's environments, and may signal the return of MOSCA to Bioconda.

Re-added KEGGCharter to workflow

31 May 17:47

KEGGCharter is again run from "MOSCA_Entry_Report".
Changed its output filename in the rule because the tool now only outputs in TSV.

Also some fixes in environment.yml

  • fixed perl version
  • added subversion

Stand-alone metatranscriptomics workflow implemented

24 May 10:21

Metatranscriptomics can be used as reference without metagenomics

  • If MG is not inputted, MT will be used for the MG part of MOSCA's workflow - assembly, binning, gene calling and annotation.
  • Trinity and RNAspades now available as assembler options
  • rule join_reads now considers possibility of MT as reference

Changes in config.json

  • experiments.tsv integrated into config.json as a parameter (list of dictionaries)
  • adapted config.json column names to MOSGUITO
  • New parameter - "suffix"
    • This parameter specifies a suffix that follows the _R1/_R2 special characters in file names. MOSCA will assume that those characters are followed by the "suffix" (e.g., _L001 would serve for the files mg_R1_L001.fq and mg_R2_L001.fq)
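
The effect of "suffix" on the expected file names can be sketched as follows. The helper paired_read_files is hypothetical and only mirrors the example given above; it is not part of MOSCA.

```python
def paired_read_files(name: str, suffix: str = "", extension: str = "fq"):
    """Build the expected forward and reverse file names for a sample."""
    return tuple(f"{name}_R{mate}{suffix}.{extension}" for mate in (1, 2))
```

With name "mg" and suffix "_L001", this yields mg_R1_L001.fq and mg_R2_L001.fq.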

Adaptations for new versions of tools

  • SortMeRNA 4 fully implemented
  • Always gzips SortMeRNA output
  • UPIMAPI used directly instead of DIAMOND
    • MOSCA now accepts UPIMAPI's three options for database: "taxids", "uniprot" or "swissprot"
  • Small adjustment on CI to allow running reCOGnizer with mini cdd.tar.gz
  • Pinned krona version (to 2.5) for compatibility with MaxBin2 - MaxBin2 dependencies are presenting problems for higher versions, and krona's more recent versions would force the installation of those broken dependencies

Added technical files, removed old scripts

  • added .gitignore
  • join_information.py deprecated, replaced by mosca_tools functions and rules in Snakefile

Changes in environment and CI files

  • install.bash no longer installs mamba
  • added gmcloser to environment.yml
  • added simplified cdd.tar.gz for CI
  • added test for complete workflow of MOSCA
  • new default for max-ref-number with metaquast - is now 0 to allow running CI

Miscellaneous fixes

  • fix on snakefile - checks if "Name" in "experiments" is ""
  • bins and DE results go to the folders of their respective "samples"
  • several fixes on reporting
  • fix on alignment functions in mosca_tools.py
  • fix on de_analysis.R
  • fix on obtaining directories for Illumina adapters and rRNA databases on preprocessing step

Fixed high quality bins evaluation

03 Aug 09:33

MOSCA was incorrectly evaluating the high quality bins.

Best probability threshold is now written at the end of iterative binning.

Assigned minus 1 thread in Snakefile for quantification rule.

  • Allows upimapi to run simultaneously.

metaSPAdes upgraded to version 3.15 to avoid running out of memory.

Fixed some bugs in name assignment.

Iterative binning for best binning

26 Jul 16:42

do_iterative_binning option now available!

  • Iterative binning cycles between MaxBin and CheckM - MaxBin obtains the bins, CheckM checks their quality
  • Iterative binning cycles through many probability thresholds to determine the value that gives the best binning
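
The cycle can be pictured as a search over thresholds. The sketch below is a rough outline under stated assumptions, not MOSCA's code: bin_with_threshold and count_high_quality are hypothetical callables standing in for the MaxBin and CheckM steps, and "best" is assumed to mean the largest number of high quality bins.

```python
def best_binning_threshold(thresholds, bin_with_threshold, count_high_quality):
    """Try each probability threshold and keep the best-scoring one.

    bin_with_threshold(threshold) stands in for the binning step (MaxBin's role).
    count_high_quality(bins) stands in for the quality check (CheckM's role).
    """
    best_threshold, best_score = None, -1
    for threshold in thresholds:
        bins = bin_with_threshold(threshold)
        score = count_high_quality(bins)
        if score > best_score:  # ties keep the earliest threshold
            best_threshold, best_score = threshold, score
    return best_threshold, best_score
```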

New option for differential expression - minimum_fold_change!

  • padj is now determined for up or down expression beyond the minimum fold change, instead of for any difference from 0

Can now be installed from source code

14 Jun 09:10

Automatic setup from source code is now functional, and suggested installation method is through the bash script.