Implement shed_diff. #33

Merged
merged 5 commits into from
Dec 6, 2014

jmchilton
Member

Inspired by a script from @peterjc: https://gist.github.com/peterjc/13653e6907d75c470d01.

By default, this compares the local changes against the main Tool Shed repository defined by [.][tool][_]shed.yml, but command-line options allow many other comparisons. Some of these are demonstrated below:
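The bracketed parts of [.][tool][_]shed.yml are optional, so the pattern covers names such as shed.yml, .shed.yml and tool_shed.yml. The Python sketch below expands that pattern literally (which also admits odd names like _shed.yml); the helper names and the order in which candidates are tried are assumptions for illustration, not planemo's actual lookup logic.

```python
import itertools
import os


def shed_yaml_candidates():
    """Expand the pattern [.][tool][_]shed.yml into every concrete filename.

    Each bracketed part is optional, giving 2 * 2 * 2 = 8 combinations.
    """
    names = []
    for dot, tool, underscore in itertools.product(["", "."], ["", "tool"], ["", "_"]):
        names.append(f"{dot}{tool}{underscore}shed.yml")
    return names


def find_shed_yaml(directory):
    """Return the path of the first candidate file found in directory, or None.

    Hypothetical helper: candidates are tried in the order produced above.
    """
    for name in shed_yaml_candidates():
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```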

Default comparison against the main Tool Shed:

```
% planemo shed_diff
wget -q --recursive -O - 'https://toolshed.g2.bx.psu.edu/repository/download?repository_id=b6b97c236de89252&changeset_revision=default&file_type=gz' | tar -xzf - -C /tmp/tool_shed_diff_CuRq5U/_toolshed_ --strip-components 1
mkdir "/tmp/tool_shed_diff_CuRq5U/_local_"; tar -xzf "/tmp/tmpdVW07c" -C "/tmp/tool_shed_diff_CuRq5U/_local_"; rm -rf /tmp/tmpdVW07c
cd "/tmp/tool_shed_diff_CuRq5U"; diff -r _local_ _toolshed_
diff -r _local_/count_covariates.xml _toolshed_/count_covariates.xml
7d6
<    <version_command>echo "A REALLY OLD OPEN SOURCE VERSION OF GATK"</version_command>
diff -r _local_/tool_dependencies.xml _toolshed_/tool_dependencies.xml
4c4
<       <repository name="package_gatk_1_4" owner="devteam" prior_installation_required="False" />
---
>       <repository changeset_revision="ec95ec570854" name="package_gatk_1_4" owner="devteam" prior_installation_required="False" toolshed="http://toolshed.g2.bx.psu.edu" />
7c7
<       <repository name="package_samtools_0_1_18" owner="devteam" prior_installation_required="False" />
---
>       <repository changeset_revision="171cd8bc208d" name="package_samtools_0_1_18" owner="devteam" prior_installation_required="False" toolshed="http://toolshed.g2.bx.psu.edu" />
```
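Each remote side of the comparison is fetched from the Tool Shed's repository/download endpoint, as the wget lines in these examples show. The Python sketch below rebuilds that URL from a shed target name and an encoded repository id. The SHED_URLS mapping only covers the two sheds that appear in this PR, and the function name download_url is hypothetical.

```python
from urllib.parse import urlencode

# Base URLs for the two shed targets used in the examples in this PR.
SHED_URLS = {
    "toolshed": "https://toolshed.g2.bx.psu.edu",
    "testtoolshed": "https://testtoolshed.g2.bx.psu.edu",
}


def download_url(shed_target, repository_id, changeset_revision="default", file_type="gz"):
    """Build the tarball download URL seen in the wget commands (hypothetical helper)."""
    base = SHED_URLS[shed_target]
    # urlencode preserves dict insertion order, matching the query string order above.
    query = urlencode({
        "repository_id": repository_id,
        "changeset_revision": changeset_revision,
        "file_type": file_type,
    })
    return f"{base}/repository/download?{query}"
```

When such a URL is passed to wget in a shell, it must stay quoted, as in the commands above; otherwise each & would be read as the shell's background operator.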

Compare local changes against the Test Tool Shed:

```
% planemo shed_diff --shed_target testtoolshed
wget -q --recursive -O - 'https://testtoolshed.g2.bx.psu.edu/repository/download?repository_id=4dd15c58c2ade087&changeset_revision=default&file_type=gz' | tar -xzf - -C /tmp/tool_shed_diff_LWnNZt/_testtoolshed_ --strip-components 1
mkdir "/tmp/tool_shed_diff_LWnNZt/_local_"; tar -xzf "/tmp/tmpNKEpuO" -C "/tmp/tool_shed_diff_LWnNZt/_local_"; rm -rf /tmp/tmpNKEpuO
cd "/tmp/tool_shed_diff_LWnNZt"; diff -r _local_ _testtoolshed_
diff -r _local_/count_covariates.xml _testtoolshed_/count_covariates.xml
7d6
<    <version_command>echo "A REALLY OLD OPEN SOURCE VERSION OF GATK"</version_command>
diff -r _local_/tool_dependencies.xml _testtoolshed_/tool_dependencies.xml
4c4
<       <repository name="package_gatk_1_4" owner="devteam" prior_installation_required="False" />
---
>       <repository changeset_revision="0cc94f66d00e" name="package_gatk_1_4" owner="devteam" prior_installation_required="False" toolshed="http://testtoolshed.g2.bx.psu.edu" />
7c7
<       <repository name="package_samtools_0_1_18" owner="devteam" prior_installation_required="False" />
---
>       <repository changeset_revision="c0f72bdba484" name="package_samtools_0_1_18" owner="devteam" prior_installation_required="False" toolshed="http://testtoolshed.g2.bx.psu.edu" />
```
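The `tar -xzf - ... --strip-components 1` step in the download commands drops the first path component of every archive member, which suggests the Tool Shed tarball wraps the repository in a single top-level directory. A rough Python equivalent using the standard tarfile module is sketched below. It only handles regular files and directories, skips members that would escape the destination, and the function name extract_stripped is an assumption.

```python
import os
import tarfile


def extract_stripped(archive_path, dest, strip=1):
    """Extract a .tar.gz into dest, dropping `strip` leading path components.

    Roughly mirrors `tar -xzf ARCHIVE -C DEST --strip-components STRIP`
    for regular files and directories; links and special files are ignored.
    """
    with tarfile.open(archive_path, "r:gz") as tf:
        for member in tf.getmembers():
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            parts = parts[strip:]
            # Skip members that vanish after stripping or that try to leave dest.
            if not parts or ".." in parts:
                continue
            target = os.path.join(dest, *parts)
            if member.isdir():
                os.makedirs(target, exist_ok=True)
            elif member.isfile():
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with tf.extractfile(member) as src, open(target, "wb") as out:
                    out.write(src.read())
```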

Compare the Test Tool Shed and main Tool Shed copies of this repository:

```
% planemo shed_diff --shed_target_source testtoolshed
wget -q --recursive -O - 'https://toolshed.g2.bx.psu.edu/repository/download?repository_id=b6b97c236de89252&changeset_revision=default&file_type=gz' | tar -xzf - -C /tmp/tool_shed_diff_Aa9wj3/_toolshed_ --strip-components 1
wget -q --recursive -O - 'https://testtoolshed.g2.bx.psu.edu/repository/download?repository_id=4dd15c58c2ade087&changeset_revision=default&file_type=gz' | tar -xzf - -C /tmp/tool_shed_diff_Aa9wj3/_testtoolshed_ --strip-components 1
cd "/tmp/tool_shed_diff_Aa9wj3"; diff -r _testtoolshed_ _toolshed_
diff -r _testtoolshed_/tool_dependencies.xml _toolshed_/tool_dependencies.xml
4c4
<       <repository changeset_revision="0cc94f66d00e" name="package_gatk_1_4" owner="devteam" prior_installation_required="False" toolshed="http://testtoolshed.g2.bx.psu.edu" />
---
>       <repository changeset_revision="ec95ec570854" name="package_gatk_1_4" owner="devteam" prior_installation_required="False" toolshed="http://toolshed.g2.bx.psu.edu" />
7c7
<       <repository changeset_revision="c0f72bdba484" name="package_samtools_0_1_18" owner="devteam" prior_installation_required="False" toolshed="http://testtoolshed.g2.bx.psu.edu" />
---
>       <repository changeset_revision="171cd8bc208d" name="package_samtools_0_1_18" owner="devteam" prior_installation_required="False" toolshed="http://toolshed.g2.bx.psu.edu" />
```
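Once both sides are unpacked, the comparison itself is a plain `diff -r` between two directories. For readers who want the same summary programmatically, here is a Python sketch that reports files present on only one side (what GNU `diff -r` prints as "Only in" lines) and files whose contents differ. It uses filecmp.cmp with shallow=False so file contents, not stat metadata, decide equality; it does not produce line-level hunks, and the function name diff_trees is hypothetical.

```python
import filecmp
import os


def diff_trees(left, right):
    """Compare two directory trees, e.g. _testtoolshed_ and _toolshed_.

    Returns (only_left, only_right, differing) as sorted lists of paths
    relative to each root.
    """
    def files_under(root):
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                found.add(os.path.relpath(os.path.join(dirpath, name), root))
        return found

    left_files = files_under(left)
    right_files = files_under(right)
    # shallow=False forces a byte-by-byte comparison when sizes match.
    differing = sorted(
        rel for rel in left_files & right_files
        if not filecmp.cmp(os.path.join(left, rel), os.path.join(right, rel), shallow=False)
    )
    return sorted(left_files - right_files), sorted(right_files - left_files), differing
```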

Ignore the YAML file and compare the main and Test Tool Shed copies of an arbitrary repository:

```
% planemo shed_diff --owner peterjc --name blast_rbh --shed_target_source testtoolshed
wget -q --recursive -O - 'https://toolshed.g2.bx.psu.edu/repository/download?repository_id=d5dd1c5d2070513e&changeset_revision=default&file_type=gz' | tar -xzf - -C /tmp/tool_shed_diff_II0eAD/_toolshed_ --strip-components 1
wget -q --recursive -O - 'https://testtoolshed.g2.bx.psu.edu/repository/download?repository_id=c053d26daf6271bf&changeset_revision=default&file_type=gz' | tar -xzf - -C /tmp/tool_shed_diff_II0eAD/_testtoolshed_ --strip-components 1
cd "/tmp/tool_shed_diff_II0eAD"; diff -r _testtoolshed_ _toolshed_
diff -r _testtoolshed_/tools/blast_rbh/blast_rbh.py _toolshed_/tools/blast_rbh/blast_rbh.py
35c35
<     print "BLAST RBH v0.1.6"
---
>     print "BLAST RBH v0.1.5"
110c110
<     if blast_type not in ["blastp", "blastp-fast", "blastp-short"]:
---
>     if blast_type not in ["blastp", "blastp-short"]:
332c332
<     sys.stderr.write("Warning: Sequences with tied best hits found, you may have duplicates/clusters\n")
---
>     sys.stderr.write("Warning: Sequencies with tied best hits found, you may have duplicates/clusters\n")
diff -r _testtoolshed_/tools/blast_rbh/blast_rbh.xml _toolshed_/tools/blast_rbh/blast_rbh.xml
1c1
< <tool id="blast_reciprocal_best_hits" name="BLAST Reciprocal Best Hits (RBH)" version="0.1.6">
---
> <tool id="blast_reciprocal_best_hits" name="BLAST Reciprocal Best Hits (RBH)" version="0.1.5">
48d47
<                     <option value="blastp-fast">blastp-fast - Uses longer words as described by Shiryev et al (2007)</option>
167c166
<             <param name="nucl_type" value="blastp-fast"/>
---
>             <param name="nucl_type" value="blastp"/>
diff -r _testtoolshed_/tools/blast_rbh/README.rst _toolshed_/tools/blast_rbh/README.rst
65d64
< v0.1.6  - Offer the new blastp-fast task added in BLAST+ 2.2.30.
diff -r _testtoolshed_/tools/blast_rbh/tool_dependencies.xml _toolshed_/tools/blast_rbh/tool_dependencies.xml
4c4
<         <repository changeset_revision="268128adb501" name="package_biopython_1_64" owner="biopython" toolshed="https://testtoolshed.g2.bx.psu.edu" />
---
>         <repository changeset_revision="5477a05cc158" name="package_biopython_1_64" owner="biopython" toolshed="https://toolshed.g2.bx.psu.edu" />
7c7
<         <repository changeset_revision="f69b90d89b62" name="package_blast_plus_2_2_30" owner="iuc" toolshed="https://testtoolshed.g2.bx.psu.edu" />
---
>         <repository changeset_revision="0fe5d5c28ea2" name="package_blast_plus_2_2_30" owner="iuc" toolshed="https://toolshed.g2.bx.psu.edu" />
```
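With --owner and --name, the command has to turn an owner/name pair into the encoded repository_id used in the download URL (for example d5dd1c5d2070513e for peterjc/blast_rbh on the main Tool Shed, per the wget line above). One simple approach is to filter a repository listing, as sketched below. It assumes each entry is a dict with "id", "name" and "owner" keys; how the listing is fetched is not shown, and this is not necessarily how planemo resolves ids. The function name find_repository_id is hypothetical.

```python
def find_repository_id(repositories, owner, name):
    """Return the encoded id of the repository matching owner and name.

    `repositories` is assumed to be a list of dicts with "id", "name" and
    "owner" keys, e.g. parsed from a Tool Shed repository listing.
    Raises LookupError if nothing matches.
    """
    for repo in repositories:
        if repo.get("owner") == owner and repo.get("name") == name:
            return repo["id"]
    raise LookupError(f"No repository {owner}/{name} found")
```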

Closes #27.

I think this will prove useful for shed_diff; there are too many low-level details in project_init anyway.
(Ahead of the shed_diff command, this adds a shed_download command, which probably isn't generally useful, so I excluded it from the docs.)
@jmchilton
Member Author

My original idea was to actually upload the tar to the Tool Shed (without committing it to the repository) and pull it back down with changeset and Tool Shed information. This would provide a cleaner diff and potentially reflect more robustly what is going to happen. I think that can be an iteration 2 thing, though, especially because this variant has the nice feature of not requiring Tool Shed credentials, so it should remain an option.

jmchilton added a commit that referenced this pull request Dec 6, 2014
jmchilton merged commit 3f8de7a into master Dec 6, 2014
jmchilton deleted the shed_diff branch December 6, 2014 00:44
Development

Successfully merging this pull request may close these issues.

Command to compare repositories in different Tool Sheds
1 participant