Update pangolin: Add database usage options #3620

Merged
merged 8 commits into from
Apr 25, 2021
27 changes: 27 additions & 0 deletions tools/pangolin/fetch_latest_pangolearn.py
@@ -0,0 +1,27 @@
#!/usr/bin/env python

import json
import os
import tarfile

# rely on the fact that pangolin itself uses the requests module
import requests

response = requests.get(
    "https://api.github.com/repos/cov-lineages/pangoLEARN/releases/latest"
)
if response.status_code == 200:
    details = json.loads(response.text)
    response = requests.get(details["tarball_url"])
    if response.status_code == 200:
        with open("pangolearn.tgz", "wb") as handle:
            handle.write(response.content)
        tf = tarfile.open("pangolearn.tgz")
        pl_path = tf.next().name
        tf.extractall()
        tf.close()
        os.rename(os.path.join(pl_path, "pangoLEARN"), "datadir")
    else:
        response.raise_for_status()
else:
    response.raise_for_status()
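The script above does two things: it downloads the latest release tarball, then moves the `pangoLEARN` package out of the tarball's top-level directory into `datadir`. A minimal sketch of the same flow, split into two functions so the unpacking step can be exercised without network access, might look like the following. The function names, the `workdir` parameter, and the `timeout` value are illustrative assumptions and are not part of this PR.

```python
import os
import tarfile

API_URL = "https://api.github.com/repos/cov-lineages/pangoLEARN/releases/latest"


def download_latest_tarball(dest_path, api_url=API_URL, timeout=60):
    """Download the latest release tarball to dest_path (needs network access)."""
    import requests  # imported lazily so install_datadir() works without it

    release = requests.get(api_url, timeout=timeout)
    release.raise_for_status()  # raises requests.HTTPError on any 4xx/5xx status
    tarball = requests.get(release.json()["tarball_url"], timeout=timeout)
    tarball.raise_for_status()
    with open(dest_path, "wb") as handle:
        handle.write(tarball.content)
    return dest_path


def install_datadir(tarball_path, workdir=".", package="pangoLEARN", dest="datadir"):
    """Unpack the tarball in workdir and move <top-level dir>/<package> to dest."""
    with tarfile.open(tarball_path) as tf:
        # The first member is the archive's top-level directory, whose name
        # is not known in advance.
        top_level = tf.getnames()[0].split("/")[0]
        tf.extractall(path=workdir)
    target = os.path.join(workdir, dest)
    os.rename(os.path.join(workdir, top_level, package), target)
    return target
```

One behavioural difference from the script in this PR: the sketch calls `raise_for_status()` unconditionally, so any error status raises immediately, while any non-error status other than 200 would proceed instead of being skipped silently.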
67 changes: 65 additions & 2 deletions tools/pangolin/pangolin.xml
@@ -8,8 +8,16 @@
        <requirement type="package" version="0.22.0">csvtk</requirement>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
        #if str($db.source) == "download"
            python '$__tool_directory__/fetch_latest_pangolearn.py' &&
        #else if str($db.source) == "builtin"
            ln -s $db.db_release.fields.path datadir &&
        #end if
        pangolin
        --threads \${GALAXY_SLOTS:-1}
        #if str($db.source) == "download" or str($db.source) == "builtin"
            --datadir 'datadir'
        #end if
        $alignment
        --outfile report.csv
        --max-ambig $max_ambig
@@ -28,6 +36,27 @@
            value="0.5" min="0" max="1" help="Maximum proportion of Ns allowed for pangolin to attempt assignment" />
        <param argument="--min-length" type="integer" label="Minimum query length allowed"
            value="10000" min="0" help="Minimum query length allowed for pangolin to attempt assignment"/>
        <conditional name="db">
            <param type="select" name="source" label="pangoLEARN source" help="Where to find the pangoLEARN database">
                <option value="download">Download latest from web</option>
                <option value="builtin">Use database from Galaxy server</option>
                <option value="default">Use default database built in to pangolin (not recommended)</option>
            </param>
            <when value="download">
            </when>
            <when value="builtin">
                <param name="db_release" label="pangoLEARN release" type="select">
                    <options from_data_table="pangolearn">
                        <column name="value" index="0" />
                        <column name="name" index="1" />
                        <column name="path" index="3" />
                        <filter type="sort_by" column="0"/>
                    </options>
                </param>
            </when>
            <when value="default">
            </when>
        </conditional>
    </inputs>
    <outputs>
        <data name="output1" format="tabular" label="pangolin on ${on_string}">
@@ -42,14 +71,41 @@
    <tests>
        <test expect_num_outputs="1">
            <param name="input1" value="test1.fasta"/>
            <output name="output1" file="result1.tsv" ftype="tabular" />
            <conditional name="db">
                <param name="source" value="download" />
            </conditional>
            <output name="output1">
                <assert_contents>
                    <has_text text="B.1.1" />
                    <has_text text="passed_qc" />
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="2">
            <param name="alignment" value="--alignment" />
            <param name="input1" value="test1.fasta" />
            <output name="output1" file="result1.tsv" ftype="tabular" />
            <conditional name="db">
                <param name="source" value="download" />
            </conditional>
            <output name="output1">
                <assert_contents>
                    <has_text text="B.1.1" />
                    <has_text text="passed_qc" />
                </assert_contents>
            </output>
            <output name="align1" file="aln1.fasta" ftype="fasta" />
        </test>
        <test expect_num_outputs="1">
            <param name="input1" value="test1.fasta"/>
            <conditional name="db">
                <param name="source" value="builtin" />
            </conditional>
            <output name="output1">
                <assert_contents>
                    <has_text text="2021-04-21" />
                </assert_contents>
            </output>
        </test>
    </tests>
    <help><![CDATA[

@@ -58,6 +114,13 @@
`Pangolin <https://cov-lineages.org/pangolin.html>`_ (Phylogenetic Assignment of Named Global Outbreak LINeages)
is used to assign a SARS-CoV-2 genome sequence the most likely lineage based on the PANGO nomenclature system.

Pangolin uses the `pangoLEARN <https://github.com/cov-lineages/pangoLEARN>`_ stored model for lineage assignment. This
model is updated more frequently than the pangolin tool itself. In general, one should use the most recent model for
lineage assignment, and by default this tool downloads the latest version of the model before pangolin runs. A
pangoLEARN data manager exists so that the Galaxy admin can download specific versions of the pangoLEARN model as
required. Finally, the pangolin tool can use its default built-in model, but this is **not recommended** because the
default model rapidly becomes out of date.

    ]]></help>
    <citations>
        <citation type="bibtex">
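The `<command>` template in `pangolin.xml` branches on `$db.source` twice: once to prepare `datadir` and once to decide whether to pass `--datadir`. As a rough illustration of the resulting shell command, the sketch below mirrors only the part of the template visible in this diff (through `--max-ambig`). The function name and its parameters are hypothetical, and the remaining pangolin options are left out because the diff does not show them.

```python
import shlex


def build_command(source, db_path=None, threads=1, alignment=False,
                  max_ambig=0.5, tool_dir="."):
    """Mirror the visible part of the Cheetah template as a shell string."""
    if source not in ("download", "builtin", "default"):
        raise ValueError(f"unknown pangoLEARN source: {source!r}")

    steps = []
    if source == "download":
        # Fetch the latest model into ./datadir before pangolin runs.
        script = f"{tool_dir}/fetch_latest_pangolearn.py"
        steps.append(f"python {shlex.quote(script)}")
    elif source == "builtin":
        if db_path is None:
            raise ValueError("the 'builtin' source needs a data table path")
        # Link the server-side model directory to ./datadir.
        steps.append(f"ln -s {shlex.quote(db_path)} datadir")

    pangolin = ["pangolin", "--threads", str(threads)]
    if source in ("download", "builtin"):
        pangolin += ["--datadir", "datadir"]
    if alignment:
        pangolin.append("--alignment")
    pangolin += ["--outfile", "report.csv", "--max-ambig", str(max_ambig)]
    steps.append(" ".join(shlex.quote(arg) for arg in pangolin))
    return " && ".join(steps)


print(build_command("builtin", db_path="/data/pangolearn/2021-04-23"))
```

Unlike the template, which passes `$db.db_release.fields.path` to `ln -s` unquoted, the sketch quotes the path with `shlex.quote`. That is a design choice of the sketch and guards against paths containing spaces.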
2 changes: 2 additions & 0 deletions tools/pangolin/test-data/2021-04-23/__init__.py
@@ -0,0 +1,2 @@
_program = "pangoLEARN"
__version__ = "2021-04-21"
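Note that the test-data directory is named `2021-04-23` while its `__version__` is `2021-04-21`; the third test in `pangolin.xml` asserts on `2021-04-21`, presumably because pangolin reports the model's own version string. A small sketch for checking which version a given `datadir` declares, without importing it as a package, is shown below. The helper name is hypothetical, and it assumes `__init__.py` sits directly inside the data directory, as it does in this test data.

```python
import re
from pathlib import Path

# Matches a line such as: __version__ = "2021-04-21"
VERSION_RE = re.compile(r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def read_pangolearn_version(datadir):
    """Return the __version__ string declared in <datadir>/__init__.py."""
    text = Path(datadir, "__init__.py").read_text()
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no __version__ found in {datadir}/__init__.py")
    return match.group(1)
```

Calling `read_pangolearn_version("tools/pangolin/test-data/2021-04-23")` on a checkout of this PR should return the version string declared in the file above.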
Binary file not shown.
Binary file not shown.
13,285 changes: 13,285 additions & 0 deletions tools/pangolin/test-data/2021-04-23/data/decision_tree_rules.txt

Large diffs are not rendered by default.
