Len file missing for indexed genomes at http://usegalaxy.org #2884

Closed
jennaj opened this issue Sep 1, 2016 · 15 comments

@jennaj
Member

jennaj commented Sep 1, 2016

Problem

Data migration incomplete for new indexed genomes. Certain tools will fail.

Tools impacted

Trackster
Extract Genomic DNA
Wig/BedGraph-to-bigWig
Others that require a .len file

Genomes with issue

rn6
danRer10

Related: #2530

Master Genome Ticket: https://github.com/galaxyproject/galaxy/issues/1470

Workaround: Obtain the rn6 or danRer10 reference genome from UCSC, FTP it to Galaxy, and use it as a Custom Reference Genome. Help for FTP and Custom Genomes can be found on the Galaxy support wiki: https://wiki.galaxyproject.org/Support

ping @natefoo

@jennaj
Member Author

jennaj commented Sep 15, 2016

Hello - another ping @natefoo or maybe @martenson ?

@jennaj
Member Author

jennaj commented Nov 11, 2016

@galaxyproject/guac Could we please work to resolve this soon? It has been open a while, the fix is clear (it just needs to be done), and it will help users. Thx!

@natefoo
Member

natefoo commented Nov 16, 2016

I could not find any DM-generated length files so I set up a new space for these two and generated them. They should be available on Main now.
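
For readers unfamiliar with length files, here is a minimal sketch of how one could be derived from a genome FASTA. It assumes a `.len` file is plain text with one tab-separated `name<TAB>length` line per sequence; the function names `fasta_lengths` and `write_len` are hypothetical, and this is not necessarily how the files on Main were generated.

```python
import io
from typing import Dict, TextIO


def fasta_lengths(handle: TextIO) -> Dict[str, int]:
    """Return {sequence name: length} for a FASTA stream, in file order."""
    lengths: Dict[str, int] = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def write_len(lengths: Dict[str, int], out: TextIO) -> None:
    """Write one 'name<TAB>length' line per sequence (assumed .len layout)."""
    for name, length in lengths.items():
        out.write(f"{name}\t{length}\n")
```

In practice the input would be an opened FASTA file for a build such as rn6 or danRer10, and the output a file named after the build.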

@jennaj
Member Author

jennaj commented Mar 24, 2017

@natefoo Trackster is not working for dm6 again (it is not in the trackster builds list). danRer10 is fine.

@jennaj jennaj removed the triage label Mar 24, 2017
@natefoo
Member

natefoo commented Sep 6, 2017

I'm finally setting up the UCSC build and chrom length fetcher to run automatically and update them in CVMFS once a month, which should take care of this (including dm6). I don't think this has been run automatically since Test and Main moved to TACC.

@natefoo
Member

natefoo commented Sep 7, 2017

Okay, Test is now using the updated data from UCSC. This includes new builds for datasets, new builds recognized for the UCSC browsers ("display at" links) and new chrom length files. I'll do Main tomorrow morning after Anton's workshop.

I set the UCSC updating process up to run on the first Wednesday of the month. Any new genomes added to UCSC at that time will appear in Test/Main once they are restarted after that date (that part will still be manual, but they're typically restarted every few days for tool upgrades, etc.).
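
As an illustration of the "first Wednesday of the month" rule, a scheduled job that runs daily (or every Wednesday) could guard itself with a check like the one below. This is only a sketch; the function name `is_first_wednesday` is hypothetical and the actual scheduling lives in the playbook commit referenced below.

```python
import datetime


def is_first_wednesday(day: datetime.date) -> bool:
    """True when `day` is the first Wednesday of its month.

    date.weekday() counts Monday as 0, so Wednesday is 2; the first
    Wednesday always falls on one of the first seven days of the month.
    """
    return day.weekday() == 2 and day.day <= 7
```

For example, `is_first_wednesday(datetime.date(2017, 9, 6))` is true, while the same call for 13 September 2017 is false.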

xref: galaxyproject/usegalaxy-playbook@f3257f3

@bgruening
Member

That's cool, @natefoo! Are indices also built in the same process? Is a mechanism needed to build or contribute them? I'm thinking about contribution possibilities, now that we have more users and mirrors than usegalaxy.org.

@natefoo
Member

natefoo commented Sep 7, 2017

This is now also done for Main.

xref: galaxyproject/usegalaxy-playbook@319ff47

@natefoo
Member

natefoo commented Sep 7, 2017

@bgruening No, no indices. This would be more complicated since they use DMs; I'd need to write a BioBlend script and wrap starting and stopping a dedicated Galaxy server with it. I'd like to do this, though!

@bgruening
Member

Oh @natefoo no! We have this for you: https://github.com/galaxyproject/ephemeris/blob/master/ephemeris/run_data_managers.py

And this is included in Galaxy Docker, so you can do: https://github.com/galaxyproject/training-material/blob/master/topics/chip-seq/docker/Dockerfile.old#L26

In combination with a mountpoint, you could start, run, deploy this automatically I think.

@natefoo
Member

natefoo commented Sep 7, 2017

!!!!!!!

You are my hero, @bgruening!

@jennaj
Member Author

jennaj commented Oct 27, 2017

Hi - Some tools still use the fasta_indexes.loc file (Freebayes is an example). The new DM databases are not being populated into this table (example: danRer10).

Any idea what is going wrong? Does the fetch DM need a change?

ping @blankenberg

@blankenberg
Member

blankenberg commented Oct 27, 2017

@jennaj Freebayes uses samtools indexes, which are contained in the rather ambiguously named fasta_indexes table. Are you building samtools indexes from the all_fasta table with the SAM FASTA index data manager?

@jennaj
Member Author

jennaj commented Oct 27, 2017

This was from a user question. I reviewed the tip files and it calls the fasta_indexes.loc which is used to populate the fasta_indexes table, right? I checked our data and this is not populated for the genomes we had to do some manual fixes for. I did run SAMTools indexes on them with the DM but don't see the data in the sam_fa_indexes.loc/table now. It is in the picard_indexes.loc/table. I always run both and in order with other DMs: fetch (w/ create dbkey in this case), sam, picard, 2bit, then others. Rats - I wonder what went wrong or I did wrong. danRer10 is an example. Will look into what actually happened - the DM history is still intact.
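
A quick way to confirm whether a build made it into a given .loc file is to scan the file for the dbkey. The sketch below assumes .loc files are tab-separated text with `#` comment lines; it deliberately matches the dbkey against any column rather than assuming a column order, and the function name `loc_entries_for` is hypothetical.

```python
from typing import Iterable, List


def loc_entries_for(lines: Iterable[str], dbkey: str) -> List[List[str]]:
    """Return the rows (split into fields) in which some field equals dbkey."""
    matches: List[List[str]] = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' comments (assumed .loc conventions).
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split("\t")
        if dbkey in fields:
            matches.append(fields)
    return matches
```

Passing the lines of an opened `fasta_indexes.loc` together with `"danRer10"` would return an empty list if the data manager never wrote an entry for that build.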

@jennaj jennaj closed this as completed Oct 27, 2017
@jennaj jennaj reopened this Oct 27, 2017
@jennaj
Member Author

jennaj commented Oct 30, 2017

We can close this out.

Updating the processes for adding indexes to use ephemeris/run_data_managers.py on main is tracked in this ticket now: galaxyproject/usegalaxy-playbook#38

Duplicates in our data are in this ticket: galaxyproject/usegalaxy-playbook#55
