Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Galaxy Workflow support for GenomeSpace #1814

Merged
merged 15 commits into from Aug 18, 2017

Conversation

Projects
None yet
6 participants
@nuwang
Copy link
Member

commented Feb 27, 2016

This pull request adds workflow integration for GenomeSpace import/export tools. It's been a long time coming since the original mailing list post almost 2 years ago: http://dev.list.galaxyproject.org/Genomespace-Integration-with-workflows-td4664513.html. I also talked to @jmchilton during GCC 2014 about this - and for various reasons, it's been languishing in the backburner since.

This is a patch against the latest dev branch. We've coordinated with Ted Liefield (@liefield) from the GenomeSpace team to get some support functions added into GenomeSpace for obtaining an auth-token that can be used for performing the import/export. The open-id based method that @blankenberg added to the importer/exporter also works. There's a slight tension between security and ease-of-use, but Ted reckons that since the token is relatively short-lived and revokable, it doesn't pose a significant security threat. If there are any problems here, we can modify it again to mandate openid login, but at the cost of being less user-friendly. (the current version allows for direct import/export from GenomeSpace with only a GenomeSpace login)

The GenomeSpace importer/exporter itself has been rewritten as a standalone pip installable tool, available here: https://github.com/gvlproject/python-genomespaceclient. We hope to transfer that code back back into GenomeSpace or Galaxy as a set of Python bindings + commandline client for GenomeSpace.

There's a 3 minute video of how things work here:
https://www.youtube.com/watch?v=5QPtWS_ab0I

Feedback appreciated.

@@ -0,0 +1,23 @@
// Provides support for interacting with the GenomeSpace File Browser popup dialogue
define([], function() {

This comment has been minimized.

Copy link
@nuwang

nuwang Feb 27, 2016

Author Member

Not sure whether this folder is the best location for this support module.

// tool form templates
return {
openFileBrowser: function( options ) {
var GS_UPLOAD_URL = "https://gsui.genomespace.org/jsui/upload/loadUrlToGenomespace.html?getLocation=true";

This comment has been minimized.

Copy link
@nuwang

nuwang Feb 27, 2016

Author Member

Could this location be made centrally configurable somehow? If so, that would be pretty useful since it could be made configurable at install time to point to the Australian mirror for example.

This comment has been minimized.

Copy link
@jmchilton

jmchilton Aug 10, 2017

Member

Galaxy.config may be available - that is like a global configuration object that you can populate with options from galaxy.ini and such. It is worth looking into.

This comment has been minimized.

Copy link
@nuwang

nuwang Aug 17, 2017

Author Member

@jmchilton Thanks. I've added a galaxy.ini config setting for the GenomeSpace URL now: 04a7e76

@bgruening

This comment has been minimized.

Copy link
Member

commented Feb 27, 2016

This is awesome! Great work @nuwang!

self.value = value or ""

def get_html(self, prefix=""):
return unicodify('<script src="https://gsui.genomespace.org/jsui/upload/gsuploadwindow.js"></script>'

This comment has been minimized.

Copy link
@nuwang

nuwang Feb 27, 2016

Author Member

When we originally started working on this, the tool parameters were still being rendered on the server side. Since it's now being rendered on the client-side, we've added a backbone component for it, but also left this bit as is since workflows don't fully support client-side rendering yet. Once that conversion is done, I hope that this bit can be nuked, preferably from orbit :-)

@nuwang

This comment has been minimized.

Copy link
Member Author

commented Feb 27, 2016

@bgruening Thanks! @ykowsar made the original modifications to GenomeSpace that enabled this to happen, and @liefield tested and mainlined the code.

@nuwang nuwang force-pushed the gvlproject:genomespace_workflows branch from d7f7a90 to 6a83c5d Feb 27, 2016

@martenson martenson added this to the 16.07 milestone Apr 12, 2016

@martenson martenson modified the milestones: 16.10, 16.07 Jul 27, 2016

@martenson martenson modified the milestones: 17.01, 16.10 Nov 16, 2016

@martenson martenson modified the milestones: 17.05, 17.01 Jan 12, 2017

@martenson martenson requested a review from blankenberg Jan 12, 2017

@nuwang nuwang force-pushed the gvlproject:genomespace_workflows branch from e51828c to 6115bfd Mar 22, 2017

@martenson martenson modified the milestones: 17.09, 17.05 Apr 26, 2017

@nuwang

This comment has been minimized.

Copy link
Member Author

commented May 24, 2017

@blankenberg @martenson How's this looking? About 2 months ago, I made another set of fixes to make sure the merge could be done smoothly (all pylint warnings fixed, merge issues fixed) but it seems to have drifted out of sync again. It would be great if we can do something about this pull request because it's been languishing for quite a while now. Let me know if you'd like to see any more changes and any comments etc. I can make another update with the merge issues resolved.

@nuwang

This comment has been minimized.

Copy link
Member Author

commented Jun 9, 2017

@blankenberg @martenson @afgane I've just made some commits updating the branch to latest dev and fixing all merge issues.

@nuwang nuwang force-pushed the gvlproject:genomespace_workflows branch 2 times, most recently from 684cd55 to 357f802 Aug 9, 2017

@jmchilton

This comment has been minimized.

Copy link
Member

commented Aug 10, 2017

I think we need to build the client before merging - do you mind if I do that and push to your branch at somepoint before merging the PR?

@nuwang

This comment has been minimized.

Copy link
Member Author

commented Aug 10, 2017

@jmchilton Thanks, that'd be great. I didn't do that because I was afraid it would trigger merge conflicts more often. There's also a minor issue with compressed file handling. Simon and I thought this would fix that: 357f802 but the files remain stubbornly compressed. Is there a different way to trigger Galaxy's built in capabilities for this?

@nuwang nuwang force-pushed the gvlproject:genomespace_workflows branch from 357f802 to a87a336 Aug 14, 2017

@nuwang

This comment has been minimized.

Copy link
Member Author

commented Aug 17, 2017

@jmchilton We've identified the issue with compressed file handling and have logged an issue here: #4443
So we think the issue is fixed as far as this pull request goes.

I'll rebuild the client and push it in now.

@nuwang nuwang force-pushed the gvlproject:genomespace_workflows branch from bbf18f0 to bcc3f6b Aug 17, 2017

@nuwang

This comment has been minimized.

Copy link
Member Author

commented Aug 17, 2017

@jmchilton I've merged in the latest master, but haven't done a client rebuild, because that looks like something that should be done during the PR merge?

@mvdbeek
Copy link
Member

left a comment

I'd keep compressed files compressed if galaxy can deal with it. I would think this becomes the default in the near future, but probably not for 17.09.

'ffn': 'fasta',
'fastq': 'fastqsanger',
'fq': 'fastqsanger',
'fqz': 'fastqsanger'}

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Aug 17, 2017

Member

If I understand this implementation correctly, all datatypes that are sniffed successfully will not be modified by this mapping, so I'd remove all fasta/q*, gff, gtf, *txt etc. types. In fact I'd probably eliminate this step, or, if you want to force things like gtf/gff, which are difficult to sniff and where you are certain that the genomespace extensions is correct, you'd have to change the order for this to occur before letting galaxy sniff the datatype.

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Aug 17, 2017

Member

Also note that you can't force files to be of a datatype that they are not. Forcing fastqsager.gz to be fastqsanger will only change the datatype, not the content, so almost all tools (tools that discriminate between compressed and uncompressed files, and tools which will trigger an automatic conversion because they only support uncompressed) will fail on this dataset.

This comment has been minimized.

Copy link
@nuwang

nuwang Aug 17, 2017

Author Member

@mvdbeek Thanks for reviewing. I've kept the original implementation logic as is, and have only refactored it (but have added more types). The specific line where filetype is determined is in the determine_file_type() method here: https://github.com/galaxyproject/galaxy/pull/1814/files/e44d67f1944aa43c9d28cd9efffca88cfecd51d5#diff-a9715b311879402b8b86a4013e4ec56bR152

The order of resolution is:

  1. Respect genomespace supplied type
  2. If no type, try to sniff type
  3. Still no type, use filename extension
  4. Still nothing, default to 'data'

There's no forced coercion unless the file is misnamed. Does that sound ok?

However, one thing I did change was to call sniff.handle_uploaded_dataset() to trigger decompression here prior to determining file type: https://github.com/galaxyproject/galaxy/pull/1814/files/e44d67f1944aa43c9d28cd9efffca88cfecd51d5#diff-a9715b311879402b8b86a4013e4ec56bR208

Given the current implementation of sniff.handle_uploaded_dataset(), only fasta.gz files will be decompressed. Everything else will be left as is.

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Aug 17, 2017

Member

Sorry, I don't know how I got the order wrong, so that solves the one part, but you should remove the extensions that are not problematic in galaxy IMO.

So what you're saying is that only fasta.gz will be uncompressed ? Why only fasta.gz ? Or is that a typo and you meant fastq.gz ? Or is that because fasta is an example of a non-compressed datatype ? Normally compressed datatypes like fastq.gz should not be uncompressed by handle_uploaded_dataset.

Now the important thing is that if you respect the genomespace supplied type, which maps compressed fastq (fqz) to fastqsanger, you must uncompress the file. But it would be very good if that doesn't happen and you would not special-case compressed or uncompressed fastqs.

This comment has been minimized.

Copy link
@nuwang

nuwang Aug 17, 2017

Author Member

@mvdbeek You're right - I missed that completely - thanks for catching that. I'll remove these newly added types.

This comment has been minimized.

Copy link
@nuwang

nuwang Aug 17, 2017

Author Member

@mvdbeek This has been fixed now.

#set $token = $URL.split("^")[1] if "^" in $URL and $URL.split("^")[1] else $__user__.preferences.get( 'genomespace_token', None )
#assert $input_file, Exception( 'You must select a valid input file.' )
#assert $token, Exception( 'Invalid token. You must be logged into GenomeSpace through OpenID.' )
--json_parameter_file "${output_file1}"

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Aug 18, 2017

Member

single quotes


#assert $input_file, Exception( 'You must select a valid input file.' )
#assert $token, Exception( 'Invalid token. You must be logged into GenomeSpace through OpenID or select a valid file via the GenomeSpace browse dialog.' )
--json_parameter_file "${output_file1}"

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Aug 18, 2017

Member

single quotes

--json_parameter_file "${output_file1}"
--galaxy_root $__root_dir__
--data_conf $__datatypes_config__
--token "${token}"

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Aug 18, 2017

Member

single quotes

@nuwang nuwang force-pushed the gvlproject:genomespace_workflows branch 2 times, most recently from 749eb94 to c3612fb Aug 18, 2017

@nuwang

This comment has been minimized.

Copy link
Member Author

commented Aug 18, 2017

@mvdbeek Please take a look now. I think all the issues you've flagged have been fixed. Thanks again for the thorough review.

@mvdbeek
Copy link
Member

left a comment

Excellent, thank you very much for this contribution @nuwang, and sorry that it took such a long time to review this.

@nuwang nuwang force-pushed the gvlproject:genomespace_workflows branch from 07b14b1 to 3240197 Aug 18, 2017

@nuwang

This comment has been minimized.

Copy link
Member Author

commented Aug 18, 2017

I've rebased so that the client rebuild step is the last commit. I've also enabled edits from maintainers so that the source branch can be edited: https://github.com/gvlproject/galaxy/tree/genomespace_workflows

@martenson

This comment has been minimized.

Copy link
Member

commented Aug 18, 2017

Thank you very much @nuwang for the contribution and @mvdbeek for the detailed review and @jmchilton for the thumbs up.

@martenson martenson merged commit fd51a61 into galaxyproject:dev Aug 18, 2017

5 of 6 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
api test Build finished. 281 tests run, 0 skipped, 0 failed.
Details
framework test Build finished. 161 tests run, 0 skipped, 0 failed.
Details
integration test Build finished. 44 tests run, 0 skipped, 0 failed.
Details
lgtm analysis: JavaScript 8 new alerts
Details
toolshed test Build finished. 579 tests run, 0 skipped, 0 failed.
Details
@jmchilton

This comment has been minimized.

Copy link
Member

commented Aug 18, 2017

Thanks for sticking with this @nuwang - awesome contribution! I feel like I dropped the ball on this and I apologize to everyone, but thanks for picking it up and doing such an awesome job with the review @mvdbeek.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.