Aquainfra importer #108
Wow, cool. I need @wm75 here; he is the expert on those data_sources.
Thank you @MarkusKonk for this PR! I'll let Björn and Wolfgang comment, but it seems to me that such a "data importer" can be hard to maintain over the years on both the Galaxy side and the data provider side, can't it? I often think it is better and easier to have "data import tools" that use the data provider's API directly, for example. Does this comment make sense? Thank you for your work!
<tool id="aquainfra_importer" name="AquaINFRA Importer" tool_type="data_source" version="1.0" profile="22.05">
<description>downloads content via the AquaINFRA interaction platform</description>
<command><![CDATA[
python '$__tool_directory__/data_source.py' '$output' $__app__.config.output_size_limit '$output'
When installing from the toolshed, this will only work if you also provide the data_source.py
script together with the .xml because this literally expects the .xml and the .py file in the same directory.
If you haven't customized anything you can just copy over https://github.com/galaxyproject/galaxy/blob/dev/tools/data_source/data_source.py here.
Thanks for the quick review. I just added the file to the PR.
Cool, I'm now checking things on my own instance to see how it behaves :-)
But the AquaINFRA part of this is not implemented yet? Or should it be?
Yeah, I wasn't sure what to do first. The problem is that only very few datasets have metadata that includes a direct download link. Most either have no link or redirect to the data provider's website, where you need to accept conditions, log in, or similar. Ideally, the "Import to Galaxy" button would be on the data provider's website, but I don't see that happening in the near future.
Give me some time to update the platform and I will show you an example record to test the import to Galaxy.
Thanks for the hint. Do you mean that the platform passes the parameters to the data source tool, which uses them to fetch the data via the API directly from the provider?
Thank you for your rapid feedback. I mean "just" having a Galaxy tool that uses the data provider API to build a command line that imports data into Galaxy. An example is here: https://github.com/galaxyecology/tools-ecology/tree/master/tools/spocc
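For illustration, the API-driven approach described in this comment could be sketched roughly as follows. This is a minimal sketch, not the spocc tool or any existing Galaxy script: the function names `build_query_url` and `fetch_to_file`, the endpoint, and the query parameters are all hypothetical, and a real tool would need to follow the specific provider's API.

```python
import urllib.parse
import urllib.request


def build_query_url(base_url, params):
    """Return base_url with params appended as a URL-encoded query string."""
    # Sorting keeps the generated URL stable between runs (hypothetical choice).
    query = urllib.parse.urlencode(sorted(params.items()))
    return f"{base_url}?{query}" if query else base_url


def fetch_to_file(url, output_path, max_bytes=None, chunk_size=64 * 1024):
    """Stream url into output_path; raise ValueError if it exceeds max_bytes."""
    written = 0
    with urllib.request.urlopen(url) as response, open(output_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            written += len(chunk)
            if max_bytes is not None and written > max_bytes:
                raise ValueError(f"download exceeds {max_bytes} bytes")
            out.write(chunk)
    return written
```

A Galaxy tool wrapper would then call such a script from its command section with the user's parameters and the output path. The trade-off raised in this thread still applies: the tool breaks whenever the provider changes its API.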
My 5 cents: both approaches have their pros and cons. In the end, for me it boils down to this: do we have a good contact that we trust on the data-repo side? Do they inform us about upcoming internal changes, and are they willing to work with us more closely in the future? If this Q can be answered with a
Ah, got it. I think we will follow both directions. The data source is useful for people who don't know where to find data or who are still at the beginning of their search. They start with the platform, search for data, find it, and then import it to Galaxy. Your way would be more useful for people who want to create, for example, a subset of the data. I am pretty sure we will need to cover both use cases in the project.
Co-authored-by: Wolfgang Maier <maierw@posteo.de>
@MarkusKonk is zip the only thing the remote server can return? On the Galaxy side you could get way more sophisticated than that.
And the failing test is about flake8 linting of data_source.py, which is apparently configured differently here than in the galaxy repo.
I finally managed to create a proper example for the data import from aquainfra to galaxy. Here are two examples:
Both have an "Import to Galaxy" button. I changed the Galaxy tool just a bit. It now has "auto" as an output type. It worked well with zip files, JSON, and GeoJSON.
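To make the idea behind an "auto" output type concrete, here is a small sketch of content-based format guessing for the three formats mentioned. It is an illustration only and does not reproduce Galaxy's own datatype sniffers; the function name `guess_datatype` and the returned labels are assumptions chosen for this example.

```python
import json
import zipfile

# Values of the top-level "type" member defined for GeoJSON objects (RFC 7946).
GEOJSON_TYPES = {
    "FeatureCollection", "Feature", "GeometryCollection",
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon",
}


def guess_datatype(path):
    """Guess 'zip', 'geojson', 'json', or fall back to generic 'data'."""
    if zipfile.is_zipfile(path):
        return "zip"
    try:
        with open(path, encoding="utf-8") as handle:
            document = json.load(handle)
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Neither a zip archive nor parseable JSON text.
        return "data"
    kind = document.get("type") if isinstance(document, dict) else None
    if isinstance(kind, str) and kind in GEOJSON_TYPES:
        return "geojson"
    return "json"
```

Checking for zip first matters because a zip archive is binary and would otherwise only be rejected after a failed attempt to decode it as text.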
I have the same data_source.py as here (https://github.com/galaxyproject/galaxy/blob/dev/tools/data_source/data_source.py), but I am still getting a linting error.
I guess this one is not linted :)
Ok, I can take that and fix it :) It's not lint
@bgruening
Co-authored-by: Wolfgang Maier <maierw@posteo.de>
Wolfgang is busy this week and on vacation for the next 2 weeks, so let's merge it so you can test it further. He can look at it in more detail when he is back.
This tool is a data source for importing datasets from the AquaINFRA Interaction platform. Users are redirected to the platform, where they can search for datasets. Some of the datasets have an "Import to Galaxy" button, which redirects back to Galaxy, where the download starts.
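The redirect back to Galaxy can be sketched from the platform's side as follows. This is a hedged illustration, not the AquaINFRA platform's code: it assumes the usual Galaxy data source convention, in which Galaxy hands the remote site a return address (commonly as a `GALAXY_URL` parameter) and the remote site sends the user back to it with a `URL` parameter pointing at the file. The function name `build_import_link` and the example addresses are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_import_link(galaxy_url, download_url):
    """Return galaxy_url with the dataset location added as a 'URL' parameter.

    galaxy_url is the return address received from Galaxy; any query
    parameters it already carries (such as a tool id) are preserved.
    """
    parts = urlsplit(galaxy_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("URL", download_url))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Both addresses are made up for the example.
    print(build_import_link(
        "https://galaxy.example.org/tool_runner?tool_id=aquainfra_importer",
        "https://data.example.org/files/lakes.zip",
    ))
```

The download address is percent-encoded when it is appended, so a dataset link that itself contains `?` or `&` does not get mixed up with Galaxy's own parameters.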