Implement shed_diff. #33

Merged
merged 5 commits into from Dec 6, 2014

jmchilton commented Dec 5, 2014

Inspired by script from @peterjc - https://gist.github.com/peterjc/13653e6907d75c470d01.

By default, this compares local changes against the main Tool Shed repository defined by [.][tool][_]shed.yml, but command line options allow many other comparisons. Some of these are demonstrated below:

Default against main tool shed:

% planemo shed_diff
wget -q --recursive -O - 'https://toolshed.g2.bx.psu.edu/repository/download?repository_id=b6b97c236de89252&changeset_revision=default&file_type=gz' | tar -xzf - -C /tmp/tool_shed_diff_CuRq5U/_toolshed_ --strip-components 1
mkdir "/tmp/tool_shed_diff_CuRq5U/_local_"; tar -xzf "/tmp/tmpdVW07c" -C "/tmp/tool_shed_diff_CuRq5U/_local_"; rm -rf /tmp/tmpdVW07c
cd "/tmp/tool_shed_diff_CuRq5U"; diff -r _local_ _toolshed_
diff -r _local_/count_covariates.xml _toolshed_/count_covariates.xml
7d6
<    <version_command>echo "A REALLY OLD OPEN SOURCE VERSION OF GATK"</version_command>
diff -r _local_/tool_dependencies.xml _toolshed_/tool_dependencies.xml
4c4
<       <repository name="package_gatk_1_4" owner="devteam" prior_installation_required="False" />
---
>       <repository changeset_revision="ec95ec570854" name="package_gatk_1_4" owner="devteam" prior_installation_required="False" toolshed="http://toolshed.g2.bx.psu.edu" />
7c7
<       <repository name="package_samtools_0_1_18" owner="devteam" prior_installation_required="False" />
---
>       <repository changeset_revision="171cd8bc208d" name="package_samtools_0_1_18" owner="devteam" prior_installation_required="False" toolshed="http://toolshed.g2.bx.psu.edu" />
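
For anyone who wants to see the comparison step in isolation: the last command in the output above is a plain `diff -r` over two extracted directories. Below is a minimal Python sketch of that kind of recursive comparison. It is not the code in this PR (which shells out to `diff`, as the output shows), `diff_trees` is a hypothetical name, and it emits unified diffs rather than the classic format shown above.

```python
import difflib
import filecmp
import os


def diff_trees(left, right):
    """Return diff lines for two directory trees (hypothetical helper, not planemo code)."""
    lines = []
    cmp = filecmp.dircmp(left, right)
    # Entries present on only one side, similar to diff -r's "Only in ..." lines.
    for name in sorted(cmp.left_only):
        lines.append("Only in %s: %s" % (left, name))
    for name in sorted(cmp.right_only):
        lines.append("Only in %s: %s" % (right, name))
    # Compare shared files by content; dircmp's own comparison is shallow (stat based).
    _, mismatch, _ = filecmp.cmpfiles(left, right, cmp.common_files, shallow=False)
    for name in sorted(mismatch):
        a_path = os.path.join(left, name)
        b_path = os.path.join(right, name)
        with open(a_path, errors="replace") as a, open(b_path, errors="replace") as b:
            diff = difflib.unified_diff(a.readlines(), b.readlines(), a_path, b_path)
            lines.extend(line.rstrip("\n") for line in diff)
    # Recurse into directories that exist on both sides.
    for name in sorted(cmp.common_dirs):
        lines.extend(diff_trees(os.path.join(left, name), os.path.join(right, name)))
    return lines
```

Calling `diff_trees("_local_", "_toolshed_")` from inside the temporary directory would give roughly the same information as the `diff -r` call, assuming both trees have already been extracted there.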

Check local diff against test tool shed.

% planemo shed_diff --shed_target testtoolshed
wget -q --recursive -O - 'https://testtoolshed.g2.bx.psu.edu/repository/download?repository_id=4dd15c58c2ade087&changeset_revision=default&file_type=gz' | tar -xzf - -C /tmp/tool_shed_diff_LWnNZt/_testtoolshed_ --strip-components 1
mkdir "/tmp/tool_shed_diff_LWnNZt/_local_"; tar -xzf "/tmp/tmpNKEpuO" -C "/tmp/tool_shed_diff_LWnNZt/_local_"; rm -rf /tmp/tmpNKEpuO
cd "/tmp/tool_shed_diff_LWnNZt"; diff -r _local_ _testtoolshed_
diff -r _local_/count_covariates.xml _testtoolshed_/count_covariates.xml
7d6
<    <version_command>echo "A REALLY OLD OPEN SOURCE VERSION OF GATK"</version_command>
diff -r _local_/tool_dependencies.xml _testtoolshed_/tool_dependencies.xml
4c4
<       <repository name="package_gatk_1_4" owner="devteam" prior_installation_required="False" />
---
>       <repository changeset_revision="0cc94f66d00e" name="package_gatk_1_4" owner="devteam" prior_installation_required="False" toolshed="http://testtoolshed.g2.bx.psu.edu" />
7c7
<       <repository name="package_samtools_0_1_18" owner="devteam" prior_installation_required="False" />
---
>       <repository changeset_revision="c0f72bdba484" name="package_samtools_0_1_18" owner="devteam" prior_installation_required="False" toolshed="http://testtoolshed.g2.bx.psu.edu" />
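
The only thing that changes between the two runs above is which shed the tarball is fetched from. As a rough illustration, the download URLs in the `wget` lines can be rebuilt as sketched below. The base URLs and query parameters are copied from that output; the `SHED_URLS` mapping, the `download_url` name, and treating `toolshed` as the short name of the main shed (inferred from the `_toolshed_` directory label) are assumptions for illustration, not planemo's API.

```python
from urllib.parse import urlencode

# Base URLs as they appear in the wget lines above; the keys are assumed short names.
SHED_URLS = {
    "toolshed": "https://toolshed.g2.bx.psu.edu",
    "testtoolshed": "https://testtoolshed.g2.bx.psu.edu",
}


def download_url(shed_target, repository_id, changeset_revision="default"):
    """Build a tarball URL shaped like the ones in the output above (hypothetical helper)."""
    query = urlencode([
        ("repository_id", repository_id),
        ("changeset_revision", changeset_revision),
        ("file_type", "gz"),
    ])
    return "%s/repository/download?%s" % (SHED_URLS[shed_target], query)
```

For example, `download_url("testtoolshed", "4dd15c58c2ade087")` reproduces the URL fetched in the second run.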

Check difference between test and main for this repository.

% planemo shed_diff --shed_target_source testtoolshed
wget -q --recursive -O - 'https://toolshed.g2.bx.psu.edu/repository/download?repository_id=b6b97c236de89252&changeset_revision=default&file_type=gz' | tar -xzf - -C /tmp/tool_shed_diff_Aa9wj3/_toolshed_ --strip-components 1
wget -q --recursive -O - 'https://testtoolshed.g2.bx.psu.edu/repository/download?repository_id=4dd15c58c2ade087&changeset_revision=default&file_type=gz' | tar -xzf - -C /tmp/tool_shed_diff_Aa9wj3/_testtoolshed_ --strip-components 1
cd "/tmp/tool_shed_diff_Aa9wj3"; diff -r _testtoolshed_ _toolshed_
diff -r _testtoolshed_/tool_dependencies.xml _toolshed_/tool_dependencies.xml
4c4
<       <repository changeset_revision="0cc94f66d00e" name="package_gatk_1_4" owner="devteam" prior_installation_required="False" toolshed="http://testtoolshed.g2.bx.psu.edu" />
---
>       <repository changeset_revision="ec95ec570854" name="package_gatk_1_4" owner="devteam" prior_installation_required="False" toolshed="http://toolshed.g2.bx.psu.edu" />
7c7
<       <repository changeset_revision="c0f72bdba484" name="package_samtools_0_1_18" owner="devteam" prior_installation_required="False" toolshed="http://testtoolshed.g2.bx.psu.edu" />
---
>       <repository changeset_revision="171cd8bc208d" name="package_samtools_0_1_18" owner="devteam" prior_installation_required="False" toolshed="http://toolshed.g2.bx.psu.edu" />
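
Each downloaded tarball is unpacked with `tar -xzf - -C <dir> --strip-components 1`, which drops the top-level wrapper directory so both trees line up for the diff. A pure-Python approximation of that step might look like the sketch below. It uses only the standard `tarfile` module; `extract_stripped` is a hypothetical name and this is not what the PR itself does (the PR pipes to `tar`, as shown above).

```python
import io
import os
import tarfile


def extract_stripped(tar_bytes, dest, strip=1):
    """Extract a gzipped tarball into dest, dropping the first `strip` path components."""
    os.makedirs(dest, exist_ok=True)
    dest_root = os.path.realpath(dest)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            parts = [p for p in member.name.split("/") if p not in ("", ".")][strip:]
            if not parts or not (member.isfile() or member.isdir()):
                continue  # skip the wrapper directory itself, plus links and devices
            target = os.path.realpath(os.path.join(dest_root, *parts))
            if os.path.commonpath([dest_root, target]) != dest_root:
                continue  # refuse paths that would escape dest
            if member.isdir():
                os.makedirs(target, exist_ok=True)
            else:
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with tar.extractfile(member) as src, open(target, "wb") as out:
                    out.write(src.read())
```

Skipping links and refusing escaping paths is a deliberate simplification here, since the archive comes from a remote server.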

Ignore YAML file and just check difference between main and test tool shed for arbitrary repository.

% planemo shed_diff --owner peterjc --name blast_rbh --shed_target_source testtoolshed
wget -q --recursive -O - 'https://toolshed.g2.bx.psu.edu/repository/download?repository_id=d5dd1c5d2070513e&changeset_revision=default&file_type=gz' | tar -xzf - -C /tmp/tool_shed_diff_II0eAD/_toolshed_ --strip-components 1
wget -q --recursive -O - 'https://testtoolshed.g2.bx.psu.edu/repository/download?repository_id=c053d26daf6271bf&changeset_revision=default&file_type=gz' | tar -xzf - -C /tmp/tool_shed_diff_II0eAD/_testtoolshed_ --strip-components 1
cd "/tmp/tool_shed_diff_II0eAD"; diff -r _testtoolshed_ _toolshed_
diff -r _testtoolshed_/tools/blast_rbh/blast_rbh.py _toolshed_/tools/blast_rbh/blast_rbh.py
35c35
<     print "BLAST RBH v0.1.6"
---
>     print "BLAST RBH v0.1.5"
110c110
<     if blast_type not in ["blastp", "blastp-fast", "blastp-short"]:
---
>     if blast_type not in ["blastp", "blastp-short"]:
332c332
<     sys.stderr.write("Warning: Sequences with tied best hits found, you may have duplicates/clusters\n")
---
>     sys.stderr.write("Warning: Sequencies with tied best hits found, you may have duplicates/clusters\n")
diff -r _testtoolshed_/tools/blast_rbh/blast_rbh.xml _toolshed_/tools/blast_rbh/blast_rbh.xml
1c1
< <tool id="blast_reciprocal_best_hits" name="BLAST Reciprocal Best Hits (RBH)" version="0.1.6">
---
> <tool id="blast_reciprocal_best_hits" name="BLAST Reciprocal Best Hits (RBH)" version="0.1.5">
48d47
<                     <option value="blastp-fast">blastp-fast - Uses longer words as described by Shiryev et al (2007)</option>
167c166
<             <param name="nucl_type" value="blastp-fast"/>
---
>             <param name="nucl_type" value="blastp"/>
diff -r _testtoolshed_/tools/blast_rbh/README.rst _toolshed_/tools/blast_rbh/README.rst
65d64
< v0.1.6  - Offer the new blastp-fast task added in BLAST+ 2.2.30.
diff -r _testtoolshed_/tools/blast_rbh/tool_dependencies.xml _toolshed_/tools/blast_rbh/tool_dependencies.xml
4c4
<         <repository changeset_revision="268128adb501" name="package_biopython_1_64" owner="biopython" toolshed="https://testtoolshed.g2.bx.psu.edu" />
---
>         <repository changeset_revision="5477a05cc158" name="package_biopython_1_64" owner="biopython" toolshed="https://toolshed.g2.bx.psu.edu" />
7c7
<         <repository changeset_revision="f69b90d89b62" name="package_blast_plus_2_2_30" owner="iuc" toolshed="https://testtoolshed.g2.bx.psu.edu" />
---
>         <repository changeset_revision="0fe5d5c28ea2" name="package_blast_plus_2_2_30" owner="iuc" toolshed="https://toolshed.g2.bx.psu.edu" />
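
Taken together, the four runs above suggest a simple rule for which two trees end up on each side of `diff -r`: the right side is always the target shed, and the left side is the local working copy unless a source shed is given. The sketch below encodes that reading of the output. The `comparison_labels` function and its parameter defaults are assumptions inferred from the directory names shown, not planemo's actual option handling.

```python
def comparison_labels(shed_target="toolshed", shed_target_source=None):
    """Return the (left, right) directory labels handed to diff -r (hypothetical helper)."""
    # With no source shed, the left side is the local working copy.
    left = shed_target_source if shed_target_source else "local"
    return "_%s_" % left, "_%s_" % shed_target


# The three option combinations demonstrated above.
print(comparison_labels())
print(comparison_labels(shed_target="testtoolshed"))
print(comparison_labels(shed_target_source="testtoolshed"))
```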

Closes #27.

jmchilton added some commits Dec 5, 2014

Refactor code for downloading and streaming tars out of project_init.
I think this will prove useful for shed_diff stuff - too many low-level details in project_init anyway.

Infrastructure for downloading tool shed tarballs.
(Ahead of the shed_diff command, this adds a shed_download command, which probably isn't useful generally, so I excluded it from the docs.)

Implement shed_diff command.
jmchilton commented Dec 5, 2014

My original idea was to actually upload the tar to the tool shed - but not commit it to the repository - and pull it back down with changeset and tool shed information. This would provide a cleaner diff and potentially reflect more robustly what is going to happen. I think that can be an iteration 2 thing though - especially because this variant has the nice feature that it doesn't require tool shed credentials, so it should remain an option.

jmchilton added a commit that referenced this pull request Dec 6, 2014

@jmchilton jmchilton merged commit 3f8de7a into master Dec 6, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed

@jmchilton jmchilton deleted the shed_diff branch Dec 6, 2014
