GEMINI load 0.18.1.0 fails for missing yml configuration file and database index #165

Open
4 tasks done
jennaj opened this issue Oct 8, 2018 · 37 comments
Assignees
Labels
functionality usegalaxy.org tool/dependency/function fix usegalaxy.org reference data CVMFS / IDC / Refgenie

Comments

@jennaj
Member

jennaj commented Oct 8, 2018

ORG was recently updated to v 0.18.1.0. It needs to have the updated 2018 annotation indexes added. Choosing either 2014/2015 or 2016 annotation causes failures.

Workaround for users: Use the tool at https://usegalaxy.eu. It will fail at https://usegalaxy.org.

Current list of tasks: https://github.com/galaxyproject/usegalaxy-playbook/projects/2#card-15351010


OLD, keeping for tracking history

To fix the problem:

  • Finish two honeybee genome indexes. Involves fixing BWA-Mem DM runtime problems.
  • Run https://toolshed.g2.bx.psu.edu/view/iuc/data_manager_gemini_database_downloader/172815da3d41
    to get the 2018 indexes installed (DM already installed at Test DM server)
  • Test indexes "On test server"
    - [ ] Update all current DMs
    - [ ] Make list of missing DMs and add
    - [ ] Skip this: Clean up data tables so that only the most current Gemini indexes are listed. If users report problems, inform them to use the most current index by date.
    - [ ] Retest tool on Test to see if all install/dependency issues are resolved
  • Cleanup Test duplicates in __dbkeys__ table(s)
    - [ ] Push the new indexes to CVMFS so they are available at ORG.
    - [ ] Test indexes "On Main"
    - [ ] Skip this: Consider updating Gemini wrapper so that it better handles what databases to list. It seems like the tool version 0.18.1.0 is dependent on a specific version of the annotation created by the updated data manager version 0.18.1.0. If this is expected ongoing (tool and index version dependent on each other) -- should the DM remove all prior indexes and replace them with the new one?
    - [ ] Simplify this: Next time we update Gemini, updating the DM and creating indexes should be part of the install process. When the tool is updated, ping to have the existing "most current" index tested and, if it fails, create a new one using the matching, newer DM version.

The v 0.18.1.0 wrapper fails at both ORG and EU when the 2014/5 or 2016 indexes are selected, but they are listed as being available on the tool form. I can't think of another tool that works this way: normally, if an index is listed, it is compatible with the tool wrapper version.

ORG test history: https://usegalaxy.org:/u/jen/h/test-history-gemini

EU test history: https://usegalaxy.eu/u/jenj/h/test-history-gemini-tutorial

ping @davebx @natefoo @bgruening

@jennaj jennaj added the functionality usegalaxy.org tool/dependency/function fix usegalaxy.org label Oct 8, 2018
@jennaj jennaj added this to Unprioritized issues impacting main across repos in Tracking of issues impacting Main via automation Oct 8, 2018
@jennaj jennaj moved this from Unprioritized issues impacting main across repos to To test on Main in Tracking of issues impacting Main Oct 12, 2018
@jennaj jennaj moved this from To test on Main to Items to consider for weekly projects in Tracking of issues impacting Main Oct 12, 2018
@jmchilton
Member

Am I assigned this ticket to run the data manager or fix the tool wrapper?

@jmchilton
Member

If this is expected ongoing (tool and index version dependent on each other) -- should the DM remove all prior indexes and replace them with the new one?

I would imagine not, right - that would break compatibility? It would be better to augment the data manager to add another column with something describing a schema version and have the tool filter the data table on compatible schema versions. I don't know if we can add a new column to an existing data table definition though - we might need to update the tools and data manager to use a new gemini_databases_versioned name with 5 columns instead of gemini_databases with 4 columns.

We could also embed the gemini version in the name of the tables produced - maybe that would be enough to indicate what users should do?
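
Editor's sketch of the versioned-table idea above, in Python. It is only an illustration under stated assumptions: the column order (value, dbkey, name, path, version), the version strings, the dates, and the paths are all made up here and are not taken from the actual data manager or .loc files.

```python
from typing import List, NamedTuple


class GeminiDbEntry(NamedTuple):
    """One row of a hypothetical 5-column data table (layout is an assumption)."""
    value: str
    dbkey: str
    name: str
    path: str
    version: str  # the proposed extra column describing the schema version


def parse_loc_line(line: str) -> GeminiDbEntry:
    """Split one tab-separated row into its five fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError("expected 5 tab-separated columns, got %d" % len(fields))
    return GeminiDbEntry(*fields)


def compatible_entries(entries: List[GeminiDbEntry], schema_version: str) -> List[GeminiDbEntry]:
    """Keep only the rows whose version column matches what the tool expects."""
    return [entry for entry in entries if entry.version == schema_version]


# Illustrative rows only; none of these values come from a real server.
rows = [
    "2016-01-01\thg19\tGEMINI annotations (2016-01-01)\t/data/gemini/2016\t0.18",
    "2018-07-04\thg19\tGEMINI annotations (2018-07-04)\t/data/gemini/2018\t0.20",
]
entries = [parse_loc_line(row) for row in rows]
selected = compatible_entries(entries, "0.20")
print([entry.value for entry in selected])
```

With a filter like this, an index that does not match the tool's expected version would simply not be offered on the tool form, instead of being listed and then failing.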

@martenson martenson added this to Week of October 15 in Bugs being fixed Oct 15, 2018
@jennaj jennaj moved this from Items to consider for weekly projects to WIP in Tracking of issues impacting Main Oct 19, 2018
@jennaj
Member Author

jennaj commented Oct 19, 2018

@jmchilton Is the plan to fix the tool then reinstall that version?

If that will take a while, we should get test-datamanager's data tables reset to cvmfs/main content in the meantime. The server needs to be prepped before any new indexes can be created (for this wrapper, or any others).

@natefoo Is this something John or I can do, or something that only you can do?

Regarding version info: Users will still get confused if we just list the dates -- we don't include this for other tools, and it assumes too much technical understanding from users. I would suggest filtering for valid indexes based on this info and including it in the display/dropdown.

Data versions matter for reproducibility overall. I think that adding a versioned loc/table for all existing data tables/locs would be a big step forward. We've needed that for a while and will definitely find it useful later on once indexes are shared across usegalaxy.* servers (and shared with users).

We could add for this tool to have an example for GUI enhancement dev to display this to users and to model other uses against (determine best universal metadata format/content). For older indexes, we could probably capture this info from cvmfs index file creation dates.

@jmchilton
Member

I see nothing in the above script that particularly ties the files being created to the version of Gemini being used to install it. And it sounds like with some more recent versions of Gemini the reference data structures have stabilized across versions a bit. Since 98% of people are going to use the latest version of the Gemini tool and 98% of those uses are going to want to select the latest database - I'd recommend just running the latest data manager and seeing if the usage problems go away. If not, the next step would probably be to just manually change those data manager entries so the latest version is on top and the text in there gives some indication that older indices have problems with newer versions of the tool - @natefoo is it possible to manually change the data manager generated table texts - not the keys but the display value?

If after these two changes, there are still ongoing problems - we could consider introducing some sort of versioning to the data entries - but we would have to maintain that ourselves since Gemini doesn't seem to track this at all or have any concept of that. This is challenging because none of us is really expert enough to maintain that and it would require a lot more expertise to run the data manager I think. Also Gemini itself doesn't seem actively developed at a rapid pace - so this is a lot of overhead to handle future versions that may never materialize or may only materialize very slowly over many years. With any luck though the above small changes would reduce the usage problems to zero and be enough to workaround things for now.
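
Editor's sketch of the manual reordering suggested above: newest entry on top, with a warning appended to the display text of older entries while the keys stay untouched. The field names, the ISO-date keys, and the wording of the note are assumptions for illustration, not the real table contents.

```python
from datetime import date

# Hypothetical wording; the thread does not specify the actual text.
OLD_INDEX_NOTE = " (older index, may fail with the newest tool version)"


def newest_first(entries):
    """Sort entries by their date key, newest first, and flag the older ones.

    Only the display name is changed; the key ("value") is left as-is.
    """
    ordered = sorted(
        entries,
        key=lambda entry: date.fromisoformat(entry["value"]),
        reverse=True,
    )
    relabelled = []
    for position, entry in enumerate(ordered):
        updated = dict(entry)  # copy so the input list is not modified
        if position > 0:
            updated["name"] = entry["name"] + OLD_INDEX_NOTE
        relabelled.append(updated)
    return relabelled


# Made-up entries; the dates are placeholders, not real index dates.
entries = [
    {"value": "2016-01-01", "name": "GEMINI annotations 2016-01-01"},
    {"value": "2018-01-01", "name": "GEMINI annotations 2018-01-01"},
    {"value": "2014-01-01", "name": "GEMINI annotations 2014-01-01"},
]
reordered = newest_first(entries)
for entry in reordered:
    print(entry["value"], entry["name"], sep="\t")
```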

@jmchilton
Member

we don't include this for other tools and assumes too much technical understanding from users

Other tools have well-behaved, reproducible index generation; this tool does not. We're pulling a bunch of random files down from S3. Anything other than this is a misrepresentation of what is happening and we shouldn't do that just to simplify things. The date simplifies things as much as we can I think - and while it does require some technical understanding from the users - the users should understand this is what is happening and when their index data was generated. If the goal is to hide these details from the users - we should eliminate the selection altogether and always just use the latest data we have available - and while that is good for usability it is really not good for reproducibility or transparency - hence the dates.

@jennaj
Member Author

jennaj commented Oct 23, 2018

@jmchilton Thanks, I agree with all of this. I'd like to make the Gemini index, use the same date format for the naming, and all indexes can stay in the pull-down.

In order for me to create the new index (and others), the server where we create data indexes needs to have the data tables refreshed to reflect the content of cvmfs (same data as available on main). In the test tables, there are currently duplicates and failed indexing jobs that left partial data. The server is test-datamanager and I believe has the same tables as the test server does. @davebx thinks this is possible and @natefoo has done that in the past.

In short, go back to the original list of todo items above, and drop the last two. The first step is to reset the test tables back to canonical content. Could we do that this week? The tool is not usable with the currently available indexes (at org -- eu already has an updated index).

@jennaj jennaj moved this from Week of October 15 to WIP in Bugs being fixed Oct 24, 2018
@jennaj
Member Author

jennaj commented Nov 13, 2018

Gemini data creation is failing. The data is downloaded and appears to be intact, but the YAML could not be written. This causes Gemini load to still fail.

The Gemini DM ends in the history with a "green" dataset but has this in the stdout: https://gist.github.com/jennaj/0a4866aa58e082bb7ae352fde2b7ad4b

@bgruening Does this look familiar, or do you know the best way to fix it? I'm guessing we'll need to remove the indexes already downloaded, plus remove that run from the Gemini loc file/data table, then fix some permissions issue (??), and after that run the DM again from scratch. Or can we just fix the permissions for the YAML write and rerun? My concern is creating duplicates or leaving an index on the tool form's "Choose a gemini annotation database" menu that is not usable.

cc @jmchilton @natefoo

@jennaj
Member Author

jennaj commented Nov 13, 2018

Ok -- we are upgrading the Gemini load tool on Test. A version mismatch between the tool/index may be the problem. I'm not quite sure how/when the YAML is used; it might be extra (for Ephemeris?). After upgrading I'll run it again and see what happens.

@jennaj
Member Author

jennaj commented Nov 13, 2018

@bgruening The indexes are still problematic for us, even when using the updated load tool. Looks like the YAML file is needed. Could you help us to configure this correctly? Not sure if it will be @jmchilton or @natefoo doing that, or possibly @davebx. Guessing we need to wipe everything we have now, get the config correct, then rerun... ?

@nekrut nekrut moved this from WIP to Week of October 29 in Bugs being fixed Nov 13, 2018
@nekrut nekrut moved this from Week of October 29 to WIP in Bugs being fixed Nov 13, 2018
@nekrut nekrut moved this from WIP to Week of Nov 12 in Bugs being fixed Nov 13, 2018
@nekrut nekrut moved this from Week of Nov 12 to WIP in Bugs being fixed Nov 30, 2018
@jennaj
Member Author

jennaj commented Nov 30, 2018

Since we cannot figure out how to configure the indexes, and since Gemini only supports hg19, we decided to drop it from main.

Moved checklist to card: https://github.com/galaxyproject/usegalaxy-playbook/projects/2#card-15351010

- [ ] Remove (not just hide) all Gemini tools/versions. None are functional, so no use in "hiding" for prior use. We will point people to Galaxy EU or other sites to use these tools. @jmchilton
- [ ] Remove the corresponding Gemini DM @jmchilton
- [ ] Remove Gemini data currently in cvmfs -- or rather, the locs/tool-config files in cvmfs, the reference data itself is already not there (old or new) from what anyone can tell @jennaj
- [ ] Remove Gemini partially installed DM data at the Test server. The DM failed for some step (yaml config) and runs against it error out for a missing file that was not downloaded. @jennaj

@jennaj jennaj removed this from WIP in Bugs being fixed Nov 30, 2018
@wm75

wm75 commented Dec 6, 2018

This may come a bit late, but I started working on updating the gemini tool suite and on a fix for the data manager as part of this. Based on some first experiments, it looks as if the data index / load tool issue is relatively easy to fix so, if it hasn't happened yet, you might want to reconsider the removal of the tools from main.

@jennaj
Member Author

jennaj commented Dec 7, 2018

Ok, I'll get some feedback. These are really useful tools (imo) and we do have Gemini users. If they can be fixed up, then it seems totally reasonable to install the updated versions. We were not sure if these were supported anymore for future updates.

We only have one version of the suite now, two older indexes that are in locs but not actually in cvmfs so won't work with the old indexes, and one newer index created (that is buggy) from the DM version that matches the suite version but it is just on our test server, not published to main/cvmfs.

So no way to do the first Gemini step (load), in any combination of tool version/index version at main. The older tool/index versions don't work at EU either, only the newest tool with the newest index. If curious, I ran tests of all combos available at both servers in the test history above.

We could decide (for main):

  1. now: strip all the prior tools/data. none will be useful going forward.
  2. later (once suite updated): install the new versions + build indexes

The first step would need to be done at some point anyway: keeping the old tools/indexes just takes up space and creates an opportunity for job failures that we can predict (and prevent by just not making them available). But users might wonder where they went, or whether they are being fixed. Maybe leaving the old ones up until replaced (if not too long!) would be better. That is more in line with what we do with other tools that have bugs we expect to be fixed. However, most tool bugs are not entirely fatal (e.g., just problems with specific functions, sometimes working in earlier versions), so there is some workaround to offer. These tools have no workaround except to go use them at EU. (And we all know that downloading/transferring histories is tricky .. collection issues and all that .. but being worked on :) .. still, it is another thing to explain that is not quite working.) It piles on when 2 or more issues are involved. People tune out, as you have if you stopped reading by now, lol.

@wm75 Is there an estimate yet on when you'd be ready to publish the update to the MTS? Just general ballpark. If over 2-3 months, we might want to strip now. When tools don't work (at all, no workaround), confidence in the rest of the site drops. People don't always bother to find out why things are not working, especially if they are new users (of Galaxy entirely, or of main specifically). I fear they'll just move on, frustrated, and not look back...

We really appreciate your help and feedback with all of this - thank you!

@wm75

wm75 commented Dec 7, 2018

Is there an estimate yet on when you'd be ready to publish the update to the MTS? Just general ballpark. If over 2-3 months, we might want to strip now. When tools don't work (at all, no workaround), confidence in the rest of the site drops.

I agree, but luckily it won't be that long. After 2 days of working on this I think I have a PR almost ready (will reference it here, when it's pushed to tools-iuc)
That said, I'm planning to go with John's initial suggestion to introduce an additional version column into a new gemini_versioned_databases table as part of the update to the data manager. This and the .yaml file fix will be incompatible with old tool versions, meaning you will have to build new indexes anyway.
Still, you could leave the non-functional version up for a couple more days for consistency, then switch to the new data_manager and tools.

@wm75

wm75 commented Dec 7, 2018

So the relevant PR is galaxyproject/tools-iuc#2204

The corresponding new versions of the data manager and the gemini tools are also available from the TTS and appear to work well (though about half of the gemini tools still attempt to install the old data manager due to some metadata issue).
From our side (usegalaxy.eu), this PR is just the first step towards a gemini update to v0.20.1, but if you want to get a functional v0.18.1 at .org right now, it may be the easiest way forward (although I hope to get the v0.20.1 update ready this year still).

@jennaj
Member Author

jennaj commented Dec 11, 2018

@jmchilton @nekrut How do you feel about adding tools + DM from the TTS to main? ^^

@jennaj
Member Author

jennaj commented Dec 11, 2018

@jmchilton removed from ticket
@davebx added

Plan: Remove everything Gemini in GUI/data for now at main. Tools, DM, Data.

@jennaj jennaj assigned davebx and jennaj and unassigned jmchilton Dec 11, 2018
@bgruening
Member

@jennaj don't add anything from test. These tools and DM will appear soon on the MTS.

@jennaj
Member Author

jennaj commented Dec 12, 2018

@bgruening Thanks for the advice! We'll remove the older version/DM for now and once in MTS again can install tools/build indexes fresh using the updated version/DM. :)

@wm75

wm75 commented Jan 23, 2019

Wrappers for GEMINI 0.20.1 are now available through the MTS. Since the majority of tools in the suite have received major updates and some tools have been merged into one, I'd recommend uninstalling the old tools, then reinstalling only the latest version.
Everything's on usegalaxy.eu already, so you can have a look there before deciding how to move forward.
The DM has been updated and fixed, so installing the annotation data should no longer be problematic and should not require any manual tweaks.

@wm75

wm75 commented Jan 23, 2019

@jennaj, @davebx For fully functional gemini query and gemini actionable_mutations tools, you will have to patch a broken hyperlink in the gemini source. It's a single line of code, as mentioned here:
arq5x/gemini#912 (comment)

Björn patched it on the EU-Server, but that's the only modification we did to the whole suite of tools.

@jennaj
Member Author

jennaj commented Jan 23, 2019

@jennaj jennaj added the reference data CVMFS / IDC / Refgenie label Apr 17, 2019
@jennaj
Member Author

jennaj commented Apr 17, 2019

The current list of to-do items is in this card. We are waiting for Test to get stable before doing the next steps (reinstalling tools + DM, etc): https://github.com/galaxyproject/usegalaxy-playbook/projects/2#card-15351010

@natefoo
Member

natefoo commented Apr 17, 2019

Test should be stable enough to attempt tool installs.

@jennaj
Member Author

jennaj commented Apr 18, 2019

Didn't install correctly. @davebx made a PR to fix the issue -- it should go into 19.05. Once on main, we can try again.

https://github.com/galaxyproject/galaxy/pull/7770/files

@jennaj
Member Author

jennaj commented Apr 19, 2019

Ok - tools installed and DM running!

@jennaj
Member Author

jennaj commented May 7, 2019

@jmchilton
Member

That’s a Python 3 error

@wm75

wm75 commented May 7, 2019

yes, that's right. The DM opens the file in binary mode, but tries to write a string to it. It must have escaped my attention when I updated the DM, otherwise I would have fixed it back then.

@jennaj
Member Author

jennaj commented May 7, 2019

Ok, good that we know what is going on. In private convo I guessed Py3 to @davebx (wow, got one right!)

I think he is fixing it? ping @davebx

@jennaj
Member Author

jennaj commented May 7, 2019

Updated: Asked him if he wants a ticket ... or do you, @wm75? Against the IUC repo? Impacts everyone now with 19.05 coming out soon, yes?

@wm75

wm75 commented May 8, 2019

adding @nsoranzo
The problematic pattern
open( filename, 'wb' ).write( json.dumps( data_manager_dict ) )
is widely used in DMs so this problem will not be unique to GEMINI, but will manifest itself with most DMs you run under Python3.
IMHO, we should have a PR that fixes the issue everywhere.
The most obvious solution seems to be to open the output file in text mode unless you are concerned about encoding issues.
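
Editor's sketch of the failure and the text-mode fix described above. The content of data_manager_dict and the temporary output file are placeholders chosen for this example; only the open/write pattern itself comes from the comment.

```python
import json
import os
import tempfile

# Placeholder payload; a real data manager would fill this with table entries.
data_manager_dict = {"data_tables": {"gemini_databases": []}}

# Temporary output file standing in for the data manager's JSON output path.
fd, filename = tempfile.mkstemp(suffix=".json")
os.close(fd)

# Under Python 3, json.dumps() returns str, but a file opened with "wb"
# accepts only bytes-like objects, so the quoted pattern raises TypeError.
try:
    with open(filename, "wb") as handle:
        handle.write(json.dumps(data_manager_dict))
    binary_write_error = None
except TypeError as exc:
    binary_write_error = exc

# Fix along the lines suggested above: open the output file in text mode.
with open(filename, "w") as handle:
    json.dump(data_manager_dict, handle, sort_keys=True)

# Read the file back to confirm the JSON round-trips.
with open(filename) as handle:
    round_trip = json.load(handle)

os.remove(filename)
print("binary-mode write failed:", binary_write_error is not None)
print("text-mode round trip ok:", round_trip == data_manager_dict)
```

If encoding is a concern, an alternative would be to keep binary mode and write json.dumps(...).encode("utf-8") instead; that variant is not shown here.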

@nsoranzo
Member

nsoranzo commented May 8, 2019

There's already a WIP PR fixing these from @mvdbeek : galaxyproject/tools-iuc#2032

Impacts everyone now with 19.05 coming out soon, yes?

@jennaj No, unless someone installs Galaxy using Python3 (not the default yet).

@wm75

wm75 commented May 8, 2019

Excellent, thanks!

@jennaj
Member Author

jennaj commented May 9, 2019

Hum, Ok, then we are stuck for adding the index (or any new indexes created by impacted DMs) for cvmfs content that flows from Test > Main/CVMFS until the Python3 fixes are made. We've already upgraded :/ But good to know root problem!

cc @natefoo @davebx @jmchilton Unless we can make our test-datamanager server work as a clone of main instead of test? Or .. I don't know .. some other ideas? Could/should I run this directly on main?

@jennaj
Member Author

jennaj commented May 9, 2019

@natefoo made a ticket for the Gemini DM fix here galaxyproject/tools-iuc#2408

@jmchilton could consider that for a weekly project unless someone else picks it up first, seems like we'll need it before anyone else

7 participants