Commit

Merge e07d915 into 35d9335
antgonza committed Dec 13, 2023
2 parents 35d9335 + e07d915 commit a076c5a
Showing 5 changed files with 51 additions and 0 deletions.
1 change: 1 addition & 0 deletions qiita_pet/handlers/artifact_handlers/base_handlers.py
@@ -228,6 +228,7 @@ def artifact_summary_get_request(user, artifact_id):
'processing_parameters': proc_params.values,
'command_active': cmd.active,
'software_deprecated': sw.deprecated,
'software_description': sw.description
}
else:
processing_info = {}
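The handler change adds one more key to the `processing_info` dictionary. The sketch below reproduces that shape using hypothetical `Software` and `Command` dataclasses as stand-ins for Qiita's own objects (only the attributes read in the hunk are modelled, and the `'command'` key being filled from `cmd.name` is an assumption based on the test expectations), so it illustrates the data flow rather than reproducing the actual handler.

```python
from dataclasses import dataclass


@dataclass
class Software:
    # Hypothetical stand-in for Qiita's software object.
    name: str
    deprecated: bool
    description: str


@dataclass
class Command:
    # Hypothetical stand-in for Qiita's command object.
    name: str
    active: bool


def build_processing_info(cmd, sw, parameters):
    """Assemble the summary dict with the keys seen in the diff and its test."""
    return {
        'command': cmd.name,  # assumed source of the 'command' value
        'processing_parameters': parameters,
        'command_active': cmd.active,
        'software_deprecated': sw.deprecated,
        # Added by this commit: lets the template show the software's
        # description next to the deprecation warning.
        'software_description': sw.description,
    }


info = build_processing_info(
    Command('Split libraries FASTQ', active=True),
    Software('QIIME', deprecated=False,
             description='Quantitative Insights Into Microbial Ecology'),
    {'max_barcode_errors': '1.5', 'sequence_max_n': '0'})
print(info['software_description'])
```

In the template, `{% raw ... %}` is Tornado's syntax for emitting an expression without autoescaping, so any HTML in the description is rendered as markup rather than shown literally.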
@@ -243,6 +243,12 @@ def test_artifact_summary_get_request(self):
private_download_button % 2),
'processing_info': {
'command_active': True, 'software_deprecated': False,
'software_description': ('Quantitative Insights Into '
'Microbial Ecology (QIIME) is an '
'open-source bioinformatics '
'pipeline for performing '
'microbiome analysis from raw DNA '
'sequencing data'),
'command': 'Split libraries FASTQ',
'processing_parameters': {
'max_barcode_errors': '1.5', 'sequence_max_n': '0',
@@ -77,6 +77,12 @@ subsequent meta-analyses. We currently provide the several options for your conv
- auto-detect adapters and **rat** + phix filtering. Includes Norway rat (*Rattus norvegicus*) reference `GCF_000001895.5 (Rnor_6.0) <https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000001895.5/>`_. `GCF_000001895.5 fna <https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/895/GCA_000001895.4_Rnor_6.0/GCA_000001895.4_Rnor_6.0_genomic.fna.gz>`_
- auto-detect adapters only filtering. Only includes the two adapter sequences noted above.

For more information about the versions in this plugin, visit:

.. toctree::

qp-fastp-minimap2.rst

Note that the command produces up to 6 output artifacts based on the aligner and database selected:

- Alignment Profile: contains the raw alignment file and the no rank classification BIOM table
@@ -0,0 +1,37 @@
Adapter and host filtering
==========================

At the end of August 2023, we discovered that the parameters used by
qp-fastp-minimap2 did not trigger adapter filtering. By default, fastp
performs adapter autodetection and filtering for single-end data, but not for
paired-end data, which we did not expect. We found this while manually
assessing replicated sequences that, when searched with BLAST against NT,
were reported to be adapters.

Adapter filtering for paired-end data with fastp requires either specifying the
exact adapters to remove (i.e., no autodetection) or explicitly passing ``--detect_adapter_for_pe``.
Qiita previously indicated to users that the qp-fastp-minimap2 plugin was performing
adapter autodetection and filtering. However, because this flag was not specified, that filtering did not occur.
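To make the difference concrete, here is a minimal sketch of assembling a paired-end fastp invocation with and without the flag. The helper name `fastp_pe_args` and the file names are hypothetical placeholders, and this is not the plugin's actual command line; only `-i`, `-I`, `-o`, `-O`, and `--detect_adapter_for_pe` are real fastp options.

```python
def fastp_pe_args(r1, r2, out1, out2, detect_adapters=True):
    """Build a fastp argument list for paired-end reads (hypothetical helper).

    For single-end input fastp autodetects adapters by default; for
    paired-end input that only happens when --detect_adapter_for_pe is given.
    """
    args = ['fastp', '-i', r1, '-I', r2, '-o', out1, '-O', out2]
    if detect_adapters:
        args.append('--detect_adapter_for_pe')
    return args


# Roughly the situation before the fix: no adapter autodetection for PE data.
old = fastp_pe_args('R1.fastq.gz', 'R2.fastq.gz', 'out_R1.fastq.gz',
                    'out_R2.fastq.gz', detect_adapters=False)
# Autodetection explicitly enabled for paired-end input.
new = fastp_pe_args('R1.fastq.gz', 'R2.fastq.gz', 'out_R1.fastq.gz',
                    'out_R2.fastq.gz')
print(' '.join(old))
print(' '.join(new))
```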

In the metagenomic dataset in which the adapters were discovered, we observed a
few highly replicated sequences that were assigned to a few genomes in RS210.
The coverage of those genomes, using all metagenomic short reads, was confined
to very specific regions. The replicated sequences exhibited high identity to
known adapters, so we suspect they were adapters. We further suspect that either
the observed genomes themselves suffer from adapter contamination, or the
constructs used in the samples we examined were derived from real organisms.
Although we cannot definitively distinguish between these possibilities in the
data we examined, in either case these short reads are likely artifactual.
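The triage described above (find highly replicated reads, then check whether they resemble adapters) can be sketched in a few lines. This swaps BLAST against NT for plain exact k-mer matching against a small example adapter set (the Illumina TruSeq read 1 and read 2 adapter sequences), and the `min_copies` and `k` thresholds are arbitrary assumptions chosen for illustration, not values used in the analysis.

```python
from collections import Counter

# Example adapter set: Illumina TruSeq read 1 and read 2 adapter sequences.
ADAPTERS = {
    'truseq_r1': 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCA',
    'truseq_r2': 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT',
}


def adapter_kmer_index(adapters, k):
    """Map every k-mer of every adapter to the first adapter containing it."""
    index = {}
    for name, seq in adapters.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], name)
    return index


def replicated_adapter_hits(reads, min_copies=3, k=12):
    """Return {read: (copies, adapter)} for replicated, adapter-like reads."""
    index = adapter_kmer_index(ADAPTERS, k)
    hits = {}
    for read, copies in Counter(reads).items():
        if copies < min_copies:
            continue  # not replicated enough to be suspicious
        for i in range(len(read) - k + 1):
            name = index.get(read[i:i + k])
            if name is not None:
                hits[read] = (copies, name)
                break  # one shared k-mer is enough to flag the read
    return hits


reads = (['ACGTAGATCGGAAGAGCACACGTCTG'] * 5   # replicated, adapter-like
         + ['TTTTGGGGCCCCAAAA'] * 10           # replicated, not adapter-like
         + ['ACGTACGTACGTACGT'])               # seen only once
hits = replicated_adapter_hits(reads)
print(hits)  # prints {'ACGTAGATCGGAAGAGCACACGTCTG': (5, 'truseq_r1')}
```

Exact k-mer matching is far cruder than a BLAST search, but it is enough to show why highly replicated reads with high identity to adapters stand out.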

For the dataset we examined, removing these false positives was important
for the biological interpretation of the results. However, whether removal
matters likely depends on the dataset and the question being asked.

qp-fastp-minimap2 has been updated to perform adapter filtering on paired-end data.
fastp's adapter autodetection is limited at compile time to `the first 256k sequences <https://github.com/OpenGene/fastp/blob/7784d047fdf0a8df4211967156f5c97920c6d2e8/src/evaluator.cpp#L410-L417>`_.
Because of this, we opted for a more conservative approach: instead of relying on
autodetection, we now test all adapters that fastp is aware of. Specifically,
we now provide fastp with a FASTA file of known adapters, which is a serialized
representation of fastp's `known adapter list <https://github.com/OpenGene/fastp/blob/7784d047fdf0a8df4211967156f5c97920c6d2e8/src/knownadapters.h#L11>`_.
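A minimal sketch of the serialization step follows. It assumes a small name-to-sequence mapping in place of fastp's full known adapter list (the two entries and their names are illustrative, not copied from `knownadapters.h`), and the helper names and file paths are hypothetical. fastp's real `--adapter_fasta` option takes such a FASTA file and trims every sequence in it from both reads of a pair.

```python
import os
import tempfile


def write_adapter_fasta(adapters, path):
    """Serialize a name -> sequence mapping as FASTA (hypothetical helper)."""
    with open(path, 'w') as fh:
        for name, seq in adapters.items():
            fh.write(f'>{name}\n{seq}\n')


def fastp_pe_args_with_fasta(r1, r2, out1, out2, adapter_fasta):
    # Supplying every adapter explicitly avoids depending on autodetection,
    # which only inspects the first reads of the input.
    return ['fastp', '-i', r1, '-I', r2, '-o', out1, '-O', out2,
            '--adapter_fasta', adapter_fasta]


# Illustrative subset only; fastp's own list is much longer.
adapters = {
    'example_truseq_read1': 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCA',
    'example_truseq_read2': 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT',
}

fasta_path = os.path.join(tempfile.mkdtemp(), 'known_adapters.fna')
write_adapter_fasta(adapters, fasta_path)
args = fastp_pe_args_with_fasta('R1.fastq.gz', 'R2.fastq.gz',
                                'out_R1.fastq.gz', 'out_R2.fastq.gz',
                                fasta_path)

with open(fasta_path) as fh:
    fasta_text = fh.read()
print(fasta_text, end='')
print(' '.join(args))
```

Supplying the list explicitly means trimming does not depend on which adapters happen to appear among the reads fastp samples for autodetection.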

The new command is named `Adapter and host filtering v2023.12`.
1 change: 1 addition & 0 deletions qiita_pet/templates/artifact_ajax/artifact_summary.html
@@ -125,6 +125,7 @@ <h4>
{% if processing_info['software_deprecated'] %}
<div class="alert alert-danger" role="alert">
Danger, the software that generated this artifact was produced by a software version with a known bug and the results are wrong, please re-run with the newer version.
{% raw processing_info['software_description'] %}
</div>
{% elif not processing_info['command_active'] %}
<div class="alert alert-warning" role="alert">
