Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add data manager for pangolin-data #4633

Merged
merged 19 commits into from
Jul 20, 2022
Merged

Add data manager for pangolin-data #4633

merged 19 commits into from
Jul 20, 2022

Conversation

pvanheus
Copy link
Contributor

@pvanheus pvanheus commented Jun 28, 2022

FOR CONTRIBUTOR:

  • - I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • - License permits unrestricted use (educational + commercial)
  • - This PR adds a new tool or tool collection
  • - This PR updates an existing tool or tool collection
  • - This PR does something else (explain below)

@pvanheus
Copy link
Contributor Author

pvanheus commented Jun 28, 2022

This is a data manager for the pangolin-data repository used for the pangolin tool. As discussed with @wm75 each data source for pangolin should be managed separately - this just handles the pangolin-data, other DMs for constellations and (eventually - once pangolin itself is updated) pangolin-assigmment will come later. Essentially this replaces the older "pangoLEARN" data manager since pangolin no longer uses the pangoLEARN data.

Data tables produced by this DM:

@pvanheus pvanheus added the wip label Jun 29, 2022
@pvanheus pvanheus removed the wip label Jun 30, 2022
@wm75
Copy link
Contributor

wm75 commented Jul 4, 2022

@pvanheus I cannot currently planemo serve the DM so presumably there's something wrong (name or ID mismatch between the different DM/data table files, or something along these lines) still. Or is that not a problem for you?

@pvanheus
Copy link
Contributor Author

pvanheus commented Jul 4, 2022

@pvanheus I cannot currently planemo serve the DM so presumably there's something wrong (name or ID mismatch between the different DM/data table files, or something along these lines) still. Or is that not a problem for you?

It works for me. When I am in the top level directory, then planemo serve --docker --galaxy_root MYGALAXYROOT data_manager/pangolin_data_dm.xml works for me.

@wm75
Copy link
Contributor

wm75 commented Jul 4, 2022

Hmm, ok, then I need to debug this on my end first. Except for --docker I'm running the same command as you, but the DM doesn't show up in the tools panel for me.

works now :)

@wm75
Copy link
Contributor

wm75 commented Jul 4, 2022

In the meantime, I have started working on version 4.1.1 of the pangolin wrapper, which will already offer your new data tables. So before we merge this one here, I'd like to test the use of these tables on the pangolin tool wrapper side. Should be done tomorrow hopefully.

@wm75 wm75 mentioned this pull request Jul 5, 2022
5 tasks
@pvanheus pvanheus requested a review from wm75 July 9, 2022 08:23
@pvanheus pvanheus added the virology Issues related to better support for virology of SARS-CoV-2, MPXV and more. label Jul 9, 2022
@wm75
Copy link
Contributor

wm75 commented Jul 13, 2022

I do appreciate your hard work on this @pvanheus !
cov-lineages/pangolin#474 got merged just now, however, and so a much simpler solution would be something along these lines:

  • add an input dataset param at the top of the DM, which admins need to populate with the downloaded file from the pangolin repo
  • for each of the three data table choices populate a select list of options from the contents of that file with something similar to, e.g.:
    <param name="constellations_version" type="select">
      <options from_dataset="pangolin_compatibility_file" separator=",">
       <column name="name" index="1"/>
       <column name="value" index="1"/>
       <column name="min_scorpio_version" index="3"/>
       <filter type="static_value" column="0" value="constellations"/>
      </options>
    </param>
    

You will still have to turn the selected version into a github download link and all that, but you'll have a direct source of truth for available versions and their min requirements on the pangolin and scorpio side of things.

@wm75
Copy link
Contributor

wm75 commented Jul 13, 2022

Populating the options from a csv file is somewhat an undocumented hack maybe atm (see galaxyproject/galaxy#14324) but in the very worst case the data manager help would have to tell users to run a csv to tabular conversion first on the downloaded file.

@pvanheus
Copy link
Contributor Author

@wm75 please have a look at the new version of the DM PR.

@wm75
Copy link
Contributor

wm75 commented Jul 18, 2022

I've changed the requirements a bit, but looks good to me.

Personally, what I would do is decrease the nesting of the interface a bit by presenting three top-level options per data source:
Download latest version, Download specific version, Do not download.
We could also do this later if you feel like you've spent enough time on this by now.

Copy link
Contributor

@wm75 wm75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool, thanks!
Just tiny things left to do here.

@wm75
Copy link
Contributor

wm75 commented Jul 20, 2022

One more minor thing: I just noticed that pangolin-data/-assignment version 1.2.133 cannot be installed because the github release tags for them are v1.2.133.2 for pangolin-data and v1.2.133.1 for pangolin-assignment.
I guess we can leave that as is for now since the error message you're getting from the DM is clear enough (alternatives would be to fix the upstream csv file, and ask if those two minor versions are actually compatible with each other, or to introduce something like a tag_translation dictionary into the DM).
Maybe we can fix the upstream file with its next update.

@pvanheus
Copy link
Contributor Author

pvanheus commented Jul 20, 2022

One more minor thing: I just noticed that pangolin-data/-assignment version 1.2.133 cannot be installed because the github release tags for them are v1.2.133.2 for pangolin-data and v1.2.133.1 for pangolin-assignment. I guess we can leave that as is for now since the error message you're getting from the DM is clear enough (alternatives would be to fix the upstream csv file, and ask if those two minor versions are actually compatible with each other, or to introduce something like a tag_translation dictionary into the DM). Maybe we can fix the upstream file with its next update.

My goodness! pangolin-data versions that look like PANGO lineages.

I had a look over on Github and its a very confusing situation:

image

The version is called v1.2.133 in the comment and v1.2.133.1 in the release.

Since its such an old release, I'd ignore it except it gets included if people select to download all versions (and then causes an error).

Copy link
Contributor

@wm75 wm75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a tough journey, but the result looks great to me!

Thanks @pvanheus!

@bgruening bgruening merged commit 902cce0 into galaxyproject:master Jul 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
virology Issues related to better support for virology of SARS-CoV-2, MPXV and more.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants