Commit

desc for deprecated software (#3339)
* adding the software description to the GUI for deprecated software

* clean docs

* add workflow not active message

* Update qiita_pet/support_files/doc/source/processingdata/qp-fastp-minimap2.rst

Co-authored-by: Daniel McDonald <d3mcdonald@eng.ucsd.edu>

---------

Co-authored-by: Charles Cowart <42684307+charles-cowart@users.noreply.github.com>
Co-authored-by: Daniel McDonald <d3mcdonald@eng.ucsd.edu>
3 people committed Dec 13, 2023
1 parent 35d9335 commit 30eb772
Showing 8 changed files with 60 additions and 4 deletions.
1 change: 1 addition & 0 deletions qiita_pet/handlers/artifact_handlers/base_handlers.py
@@ -228,6 +228,7 @@ def artifact_summary_get_request(user, artifact_id):
'processing_parameters': proc_params.values,
'command_active': cmd.active,
'software_deprecated': sw.deprecated,
'software_description': sw.description
}
else:
processing_info = {}
@@ -243,6 +243,12 @@ def test_artifact_summary_get_request(self):
private_download_button % 2),
'processing_info': {
'command_active': True, 'software_deprecated': False,
'software_description': ('Quantitative Insights Into '
'Microbial Ecology (QIIME) is an '
'open-source bioinformatics '
'pipeline for performing '
'microbiome analysis from raw DNA '
'sequencing data'),
'command': 'Split libraries FASTQ',
'processing_parameters': {
'max_barcode_errors': '1.5', 'sequence_max_n': '0',
2 changes: 1 addition & 1 deletion qiita_pet/handlers/software.py
@@ -146,7 +146,7 @@ def _default_parameters_parsing(node):

workflows.append(
{'name': w.name, 'id': w.id, 'data_types': w.data_type,
'description': w.description,
'description': w.description, 'active': w.active,
'parameters_sample': wparams['sample'],
'parameters_prep': wparams['prep'],
'nodes': nodes, 'edges': edges})
@@ -77,6 +77,12 @@ subsequent meta-analyses. We currently provide the several options for your conv
- auto-detect adapters and **rat** + phix filtering. Includes Norway rat (*Rattus norvegicus*) reference `GCF_000001895.5 (Rnor_6.0) <https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000001895.5/>`_. `GCF_000001895.5 fna <https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/895/GCA_000001895.4_Rnor_6.0/GCA_000001895.4_Rnor_6.0_genomic.fna.gz>`_
- auto-detect adapters only filtering. Only includes the two adapter sequences noted above.

For more information about the versions in this plugin, visit:

.. toctree::

qp-fastp-minimap2.rst

Note that the command produces up to 6 output artifacts based on the aligner and database selected:

- Alignment Profile: contains the raw alignment file and the no rank classification BIOM table
@@ -0,0 +1,37 @@
Adapter and host filtering
==========================

At the end of August 2023, we discovered that the parameters used by
qp-fastp-minimap2 did not trigger application of adapter filtering. By default,
fastp performs autodetection of adapters and filtering for single-end data. By
default, fastp does not perform these operations on paired-end data. This behavior
was not expected by us. It was discovered when manually assessing replicated
sequences, which, on examination by BLAST against NT, were reported to be adapters.

Adapter filtering for paired-end data with fastp requires either specifying the
exact adapters to remove (i.e., no autodetection) or explicitly passing ``--detect_adapter_for_pe``. Qiita previously indicated to users that the
qp-fastp-minimap2 plugin was performing adapter autodetection and filtering.
However, because this flag was not specified, that behavior did not occur.

In the metagenomic dataset the adapters were discovered in, we observed a few
sequences with high replication, with assignments to a few genomes in RS210.
The coverage of those genomes, using all metagenomic short reads, was constrained
to very specific regions. The replicated sequences exhibited high identity to
known adapters. As such, we suspect the replicated sequences we observed were
adapters. We suspect the observed genomes either suffer from adapter contamination
themselves, or the constructs used in the samples we examined were derived from
real organisms. Although we cannot differentiate this definitively in the data
we examined, in either case these short reads are likely artifactual.

For the dataset we examined, removal of these false positives was important
for the biological interpretation of the results. However, whether the removal
is important likely depends on the dataset and question.

qp-fastp-minimap2 has been updated to perform adapter filtering on paired-end data.
The fastp autodetection is compile-time limited to `the first 256k sequences <https://github.com/OpenGene/fastp/blob/7784d047fdf0a8df4211967156f5c97920c6d2e8/src/evaluator.cpp#L410-L417>`_.
Because of this, we opted for a more conservative approach of not relying on
autodetection and instead we now test all adapters that fastp is aware of. Specifically,
we now provide fastp a known adapters FASTA which is a serialized representation
of their `known adapter list <https://github.com/OpenGene/fastp/blob/7784d047fdf0a8df4211967156f5c97920c6d2e8/src/knownadapters.h#L11>`_.

The new command is named: `Adapter and host filtering v2023.12`.
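
The approach described in this file (serializing a known adapter list to FASTA and handing it to fastp instead of relying on autodetection) can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: the function names and the adapter names and sequences are hypothetical, and it assumes the FASTA is passed through fastp's ``--adapter_fasta`` option, since the actual invocation is not shown in this commit.

```python
def adapters_to_fasta(adapters):
    """Serialize {name: sequence} pairs as FASTA text (hypothetical helper)."""
    records = []
    for name, seq in adapters.items():
        # One header line and one sequence line per adapter.
        records.append(">{}\n{}".format(name, seq.upper()))
    return "\n".join(records) + "\n"


def fastp_paired_end_cmd(r1, r2, out1, out2, adapter_fasta=None):
    """Build a fastp argument list for paired-end reads (hypothetical helper).

    With adapter_fasta set, fastp is asked to trim every sequence in that
    file; otherwise paired-end autodetection is requested explicitly, since
    fastp does not autodetect adapters for paired-end data by default.
    """
    cmd = ["fastp", "-i", r1, "-I", r2, "-o", out1, "-O", out2]
    if adapter_fasta is not None:
        cmd += ["--adapter_fasta", adapter_fasta]
    else:
        cmd += ["--detect_adapter_for_pe"]
    return cmd


if __name__ == "__main__":
    # Placeholder sequences for illustration only; these are not real adapters.
    example = {"example_adapter_1": "acgtacgtacgt", "example_adapter_2": "TTGCATTGCA"}
    print(adapters_to_fasta(example), end="")
    print(" ".join(fastp_paired_end_cmd("R1.fastq.gz", "R2.fastq.gz",
                                        "R1.trimmed.fastq.gz", "R2.trimmed.fastq.gz",
                                        adapter_fasta="known_adapters.fna")))
```

Building the command as a list rather than a single string keeps file names with spaces intact if the list is later passed to ``subprocess.run``.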
1 change: 1 addition & 0 deletions qiita_pet/templates/artifact_ajax/artifact_summary.html
@@ -125,6 +125,7 @@ <h4>
{% if processing_info['software_deprecated'] %}
<div class="alert alert-danger" role="alert">
Danger, the software that generated this artifact was produced by a software version with a known bug and the results are wrong, please re-run with the newer version.
{% raw processing_info['software_description'] %}
</div>
{% elif not processing_info['command_active'] %}
<div class="alert alert-warning" role="alert">
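
The branch order in this template hunk (deprecated software first, then inactive command) can be mirrored in plain Python to show how the new `software_description` key is consumed. This is only a sketch: `summary_banner` and both message constants are hypothetical, the warning text is a placeholder because the template's actual wording is cut off in this diff, and `.get()` is used where the template indexes the dict directly.

```python
# Hypothetical message constants; the real wording lives in the template.
DEPRECATED_MSG = "The software that generated this artifact is deprecated."
INACTIVE_MSG = "The command that generated this artifact is no longer active."


def summary_banner(processing_info):
    """Return (css_class, message) for the artifact summary, or None.

    Mirrors the template: a deprecated software version takes precedence
    over an inactive command, and the software description is appended to
    the deprecation message.
    """
    if not processing_info:
        # artifact_summary_get_request uses {} when there is no processing info.
        return None
    if processing_info.get("software_deprecated"):
        description = processing_info.get("software_description") or ""
        return ("alert-danger", (DEPRECATED_MSG + " " + description).strip())
    if not processing_info.get("command_active", True):
        return ("alert-warning", INACTIVE_MSG)
    return None


if __name__ == "__main__":
    info = {"command_active": True, "software_deprecated": True,
            "software_description": "See the plugin notes for details."}
    print(summary_banner(info))
```

Note that the template emits the description with `{% raw ... %}`, which in Tornado templates bypasses autoescaping, so any HTML stored in the software description is rendered as markup rather than shown literally.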
5 changes: 5 additions & 0 deletions qiita_pet/templates/workflows.html
@@ -87,6 +87,11 @@ <h5>Hover on the spheres to get more information</h5>
<div class="row">
<div class="col-sm-7" style="background-color: #DCDCDC; height: 650px" id="workflow_{{i}}"></div>
<div class="col-sm-5">
{% if not w['active'] %}
<h3 style="color:red">
~~ NOT ACTIVE ~~
</h3>
{% end %}
<h4>
Application: {{', '.join(w['data_types'])}} ->
{% if w['parameters_sample'] or w['parameters_prep'] %}
6 changes: 3 additions & 3 deletions qiita_pet/test/test_software.py
@@ -175,7 +175,7 @@ def test_retrive_workflows(self):
{'name': 'FASTQ upstream workflow', 'id': 1, 'data_types': ['16S', '18S'],
'description': 'This accepts html <a href="https://qiita.ucsd.edu">Qiita!'
'</a><br/><br/><b>BYE!</b>',
'parameters_sample': {}, 'parameters_prep': {},
'active': True, 'parameters_sample': {}, 'parameters_prep': {},
'nodes': [
['params_1', 1, 'Split libraries FASTQ', 'Defaults', {
'max_bad_run_length': '3', 'min_per_read_length_fraction': '0.75',
@@ -199,7 +199,7 @@ def test_retrive_workflows(self):
['params_2', 'output_params_2_OTU table | BIOM']]},
{'name': 'FASTA upstream workflow', 'id': 2, 'data_types': ['18S'],
'description': 'This is another description',
'parameters_sample': {}, 'parameters_prep': {},
'active': False, 'parameters_sample': {}, 'parameters_prep': {},
'nodes': [
['params_3', 2, 'Split libraries', 'Defaults with Golay 12 barcodes', {
'min_seq_len': '200', 'max_seq_len': '1000',
@@ -226,7 +226,7 @@ def test_retrive_workflows(self):
['params_4', 'output_params_4_OTU table | BIOM']]},
{'name': 'Per sample FASTQ upstream workflow', 'id': 3,
'data_types': ['ITS'], 'description': None,
'parameters_sample': {}, 'parameters_prep': {},
'active': True, 'parameters_sample': {}, 'parameters_prep': {},
'nodes': [
['params_5', 1, 'Split libraries FASTQ', 'per sample FASTQ defaults', {
'max_bad_run_length': '3', 'min_per_read_length_fraction': '0.75',