Merged
1 change: 1 addition & 0 deletions qiita_pet/handlers/artifact_handlers/base_handlers.py
@@ -228,6 +228,7 @@ def artifact_summary_get_request(user, artifact_id):
'processing_parameters': proc_params.values,
'command_active': cmd.active,
'software_deprecated': sw.deprecated,
+ 'software_description': sw.description
}
else:
processing_info = {}
@@ -243,6 +243,12 @@ def test_artifact_summary_get_request(self):
private_download_button % 2),
'processing_info': {
'command_active': True, 'software_deprecated': False,
+ 'software_description': ('Quantitative Insights Into '
+ 'Microbial Ecology (QIIME) is an '
+ 'open-source bioinformatics '
+ 'pipeline for performing '
+ 'microbiome analysis from raw DNA '
+ 'sequencing data'),
'command': 'Split libraries FASTQ',
'processing_parameters': {
'max_barcode_errors': '1.5', 'sequence_max_n': '0',
2 changes: 1 addition & 1 deletion qiita_pet/handlers/software.py
@@ -146,7 +146,7 @@ def _default_parameters_parsing(node):

workflows.append(
{'name': w.name, 'id': w.id, 'data_types': w.data_type,
- 'description': w.description,
+ 'description': w.description, 'active': w.active,
'parameters_sample': wparams['sample'],
'parameters_prep': wparams['prep'],
'nodes': nodes, 'edges': edges})
@@ -77,6 +77,12 @@ subsequent meta-analyses. We currently provide the several options for your conv
- auto-detect adapters and **rat** + phix filtering. Includes Norway rat (*Rattus norvegicus*) reference `GCF_000001895.5 (Rnor_6.0) <https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000001895.5/>`_. `GCF_000001895.5 fna <https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/895/GCA_000001895.4_Rnor_6.0/GCA_000001895.4_Rnor_6.0_genomic.fna.gz>`_
- auto-detect adapters only filtering. Only includes the two adapter sequences noted above.

+ For more information about the versions in this plugin, visit:

+ .. toctree::

+ qp-fastp-minimap2.rst

Note that the command produces up to 6 output artifacts based on the aligner and database selected:

- Alignment Profile: contains the raw alignment file and the no rank classification BIOM table
@@ -0,0 +1,37 @@
Adapter and host filtering
==========================

At the end of August 2023, we discovered that the parameters used by
qp-fastp-minimap2 did not trigger adapter filtering. By default, fastp
autodetects and filters adapters for single-end data, but it does not do so
for paired-end data. We did not expect this behavior. We discovered it while
manually assessing replicated sequences, which BLAST searches against NT
identified as adapters.

Adapter filtering for paired-end data with fastp requires either specifying the
exact adapters to remove (i.e., no autodetection) or explicitly passing
``--detect_adapter_for_pe``. Qiita previously indicated to users that the
qp-fastp-minimap2 plugin performed adapter autodetection and filtering.
However, because this flag was not specified, that behavior did not occur.
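A minimal Python sketch of the difference described above, assuming a plain paired-end fastp invocation. The helper name `fastp_pe_args` and the file names are hypothetical and are not taken from the plugin; `-i`/`-I` (R1/R2 inputs), `-o`/`-O` (R1/R2 outputs) and `--detect_adapter_for_pe` are fastp options. The sketch only builds the argument list; it does not run fastp.

```python
from typing import List


def fastp_pe_args(r1: str, r2: str, out1: str, out2: str,
                  detect_adapter_for_pe: bool) -> List[str]:
    """Build a fastp command line for paired-end input (hypothetical helper)."""
    # -i/-I are the R1/R2 inputs, -o/-O the R1/R2 outputs.
    args = ["fastp", "-i", r1, "-I", r2, "-o", out1, "-O", out2]
    if detect_adapter_for_pe:
        # fastp only autodetects adapter sequences for paired-end data
        # when this flag is given.
        args.append("--detect_adapter_for_pe")
    return args


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    before = fastp_pe_args("s1_R1.fastq.gz", "s1_R2.fastq.gz",
                           "s1_R1.trimmed.fastq.gz", "s1_R2.trimmed.fastq.gz",
                           detect_adapter_for_pe=False)
    after = fastp_pe_args("s1_R1.fastq.gz", "s1_R2.fastq.gz",
                          "s1_R1.trimmed.fastq.gz", "s1_R2.trimmed.fastq.gz",
                          detect_adapter_for_pe=True)
    print(" ".join(before))
    print(" ".join(after))
```

Passing either list to `subprocess.run(args, check=True)` would execute it, provided fastp is on the PATH.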

In the metagenomic dataset where the adapters were discovered, we observed a few
highly replicated sequences that were assigned to a few genomes in RS210.
The coverage of those genomes, using all metagenomic short reads, was confined
to very specific regions. The replicated sequences exhibited high identity to
known adapters, so we suspect they were adapters. This suggests that either the
observed genomes themselves suffer from adapter contamination, or the constructs
used in the samples we examined were derived from real organisms. Although we
cannot definitively distinguish between these possibilities in the data we
examined, in either case these short reads are likely artifactual.
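The screen described above can be approximated with a short Python sketch. This is not the procedure used in the analysis, which relied on BLAST against NT; it swaps in a much cruder check: count exact-duplicate reads, then flag any highly replicated read that contains the first k bases of a known adapter. The function names, the choice of k = 12 and the toy reads are hypothetical; the adapter shown is the commonly cited Illumina TruSeq read 1 adapter, included only as example input.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_replicated(reads: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n most frequent exact read sequences with their counts."""
    return Counter(reads).most_common(n)


def adapter_hits(seq: str, adapters: Dict[str, str], k: int = 12) -> List[str]:
    """Names of adapters whose first k bases occur as an exact substring of seq.

    A crude exact-match screen, far less sensitive than an alignment search.
    """
    return [name for name, adapter in adapters.items() if adapter[:k] in seq]


if __name__ == "__main__":
    adapters = {"TruSeq_Read1": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"}
    # Toy data: one read replicated 50 times that carries adapter
    # sequence, plus two singletons.
    reads = (["ACGTACGTAGATCGGAAGAGCACACGTC"] * 50
             + ["TTGACCGATTACA", "GGCATTACGATC"])
    for seq, count in top_replicated(reads, n=3):
        print(count, seq, adapter_hits(seq, adapters))
```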

For the dataset we examined, removing these false positives was important
for the biological interpretation of the results. However, whether the removal
matters likely depends on the dataset and the question being asked.

qp-fastp-minimap2 has been updated to perform adapter filtering on paired-end data.
fastp's adapter autodetection is limited at compile time to `the first 256k sequences <https://github.com/OpenGene/fastp/blob/7784d047fdf0a8df4211967156f5c97920c6d2e8/src/evaluator.cpp#L410-L417>`_.
Because of this, we opted for a more conservative approach: rather than relying on
autodetection, we now test against all adapters that fastp knows about. Specifically,
we now provide fastp with a FASTA file of known adapters, which is a serialized
representation of fastp's `known adapter list <https://github.com/OpenGene/fastp/blob/7784d047fdf0a8df4211967156f5c97920c6d2e8/src/knownadapters.h#L11>`_.
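A minimal Python sketch of the serialization step, assuming the known adapters are available as a mapping from adapter name to sequence (the exact layout of fastp's `knownadapters.h` is not reproduced here). The function name and the two example entries, the Illumina TruSeq read 1 and read 2 adapters, are illustrative only and are not the full list. The resulting text would be written to a file and handed to fastp through its `--adapter_fasta` option.

```python
from typing import Dict


def adapters_to_fasta(adapters: Dict[str, str]) -> str:
    """Serialize a {name: sequence} mapping into FASTA text (hypothetical helper).

    Each adapter becomes one record: a '>' header line carrying the name,
    followed by the sequence on its own line.
    """
    records = []
    for name, seq in adapters.items():
        records.append(f">{name}\n{seq.upper()}\n")
    return "".join(records)


# Example entries only; fastp's known adapter list is much longer.
EXAMPLE_ADAPTERS = {
    "Illumina_TruSeq_Read1": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
    "Illumina_TruSeq_Read2": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
}

if __name__ == "__main__":
    print(adapters_to_fasta(EXAMPLE_ADAPTERS), end="")
```

Uppercasing the sequences is a defensive choice made in this sketch, not something the plugin is documented to do.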

The new command is named ``Adapter and host filtering v2023.12``.
1 change: 1 addition & 0 deletions qiita_pet/templates/artifact_ajax/artifact_summary.html
@@ -125,6 +125,7 @@ <h4>
{% if processing_info['software_deprecated'] %}
<div class="alert alert-danger" role="alert">
Danger, the software that generated this artifact was produced by a software version with a known bug and the results are wrong, please re-run with the newer version.
+ {% raw processing_info['software_description'] %}
</div>
{% elif not processing_info['command_active'] %}
<div class="alert alert-warning" role="alert">
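To make the precedence in the `artifact_summary.html` hunk concrete, here is a small Python sketch that mirrors its visible `{% if %}`/`{% elif %}` branches. The function name is hypothetical, and the `.get()` defaults for missing keys (for example, the empty `processing_info` dict the handler builds when there is no processing information) are an assumption of the sketch; the template itself indexes the keys directly.

```python
from typing import Any, Dict, Optional, Tuple


def artifact_alert(processing_info: Dict[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Return (Bootstrap alert class, extra text) for an artifact summary.

    Mirrors the branch order in artifact_summary.html: a deprecated
    software version takes precedence over an inactive command.
    """
    if processing_info.get("software_deprecated", False):
        # The danger alert also shows the software description.
        return "alert-danger", processing_info.get("software_description")
    if not processing_info.get("command_active", True):
        return "alert-warning", None
    return None, None
```

In the real template the description is emitted with `{% raw %}`, which in Tornado templates bypasses autoescaping, so any HTML in the description is rendered as markup.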
5 changes: 5 additions & 0 deletions qiita_pet/templates/workflows.html
@@ -87,6 +87,11 @@ <h5>Hover on the spheres to get more information</h5>
<div class="row">
<div class="col-sm-7" style="background-color: #DCDCDC; height: 650px" id="workflow_{{i}}"></div>
<div class="col-sm-5">
+ {% if not w['active'] %}
+ <h3 style="color:red">
+ ~~ NOT ACTIVE ~~
+ </h3>
+ {% end %}
<h4>
Application: {{', '.join(w['data_types'])}} ->
{% if w['parameters_sample'] or w['parameters_prep'] %}
6 changes: 3 additions & 3 deletions qiita_pet/test/test_software.py
@@ -175,7 +175,7 @@ def test_retrive_workflows(self):
{'name': 'FASTQ upstream workflow', 'id': 1, 'data_types': ['16S', '18S'],
'description': 'This accepts html <a href="https://qiita.ucsd.edu">Qiita!'
'</a><br/><br/><b>BYE!</b>',
- 'parameters_sample': {}, 'parameters_prep': {},
+ 'active': True, 'parameters_sample': {}, 'parameters_prep': {},
'nodes': [
['params_1', 1, 'Split libraries FASTQ', 'Defaults', {
'max_bad_run_length': '3', 'min_per_read_length_fraction': '0.75',
@@ -199,7 +199,7 @@ def test_retrive_workflows(self):
['params_2', 'output_params_2_OTU table | BIOM']]},
{'name': 'FASTA upstream workflow', 'id': 2, 'data_types': ['18S'],
'description': 'This is another description',
- 'parameters_sample': {}, 'parameters_prep': {},
+ 'active': False, 'parameters_sample': {}, 'parameters_prep': {},
'nodes': [
['params_3', 2, 'Split libraries', 'Defaults with Golay 12 barcodes', {
'min_seq_len': '200', 'max_seq_len': '1000',
@@ -226,7 +226,7 @@ def test_retrive_workflows(self):
['params_4', 'output_params_4_OTU table | BIOM']]},
{'name': 'Per sample FASTQ upstream workflow', 'id': 3,
'data_types': ['ITS'], 'description': None,
- 'parameters_sample': {}, 'parameters_prep': {},
+ 'active': True, 'parameters_sample': {}, 'parameters_prep': {},
'nodes': [
['params_5', 1, 'Split libraries FASTQ', 'per sample FASTQ defaults', {
'max_bad_run_length': '3', 'min_per_read_length_fraction': '0.75',