Stating the expected headers #13

Closed
carbocation opened this issue Oct 21, 2021 · 2 comments

Comments

@carbocation

In the locuszoom document about preparing data, which is linked from localzoom, the specific names of the expected data columns are not stated. It would be useful to list the names that can be automatically detected, so that we can more easily tell how to munge our file headers for easy compatibility with localzoom.

@abought
Member

abought commented Oct 21, 2021

Thanks for the feedback. We just added that link last night, and it's nice to hear that new features are noticed!

I agree about adding this to the documentation for a future release. As a general rule, we try to auto-detect columns using heuristics from a survey of common file formats; if the parser isn't recognizing your data, let us know and we will try to make improvements. Our goal is to handle a variety of common files out of the box.

For example, your bug report reminded me that some of the fields in the GWAS catalog standard file format were not being correctly detected, and this will be improved in a future release.

For your own reference until I can find time to revise the docs:

  • We try to handle many possible column names, but as a result, we err on the side of being strict about the meaning of fields used to draw a plot. For example, some GWAS programs output "effect" alleles without orienting them to a reference genome; since ref/alt are required to calculate LD information, our parser will not attempt to auto-guess allele columns from that kind of output.
  • For a list of sample columns that our site expects to see internally, you can download one of the sample harmonized GWAS files output by the pipeline at my.locuszoom.org, e.g.: https://my.locuszoom.org/gwas/236887/data/
  • Historical note: originally, we tried to have presets for common programs, but oftentimes, popular programs give different output based on options or across versions. Alas for data munging!
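
To make the idea of strict, heuristic column detection concrete, here is a minimal sketch in Python. It is not the actual LocusZoom/LocalZoom parser: the synonym table, the field names, and the functions `normalize`, `guess_columns`, and `missing_fields` are all hypothetical and only illustrate the general approach. Note that "effect"/"other" allele headers are deliberately left unmapped, mirroring the strictness about ref/alt described above.

```python
import re

# Hypothetical synonym table for illustration only.
# This is NOT the list of names that LocusZoom actually recognizes.
SYNONYMS = {
    "chrom": {"chr", "chrom", "chromosome"},
    "pos": {"pos", "position", "bp"},
    "ref": {"ref", "refallele", "referenceallele"},
    "alt": {"alt", "altallele", "alternateallele"},
    "pvalue": {"p", "pval", "pvalue"},
}


def normalize(header):
    """Lowercase a header and drop punctuation, so 'P-value' becomes 'pvalue'."""
    return re.sub(r"[^a-z0-9]", "", header.lower())


def guess_columns(headers):
    """Map each recognized field to the index of the first matching header."""
    found = {}
    for index, header in enumerate(headers):
        key = normalize(header)
        for field, names in SYNONYMS.items():
            if key in names and field not in found:
                found[field] = index
    return found


def missing_fields(headers):
    """List the fields that could not be detected and need manual renaming."""
    return sorted(set(SYNONYMS) - set(guess_columns(headers)))


headers = ["CHR", "BP", "effect_allele", "other_allele", "P-value"]
print(guess_columns(headers))   # chrom, pos, and pvalue are detected
print(missing_fields(headers))  # ref and alt are reported as missing
```

A check like this only tells you which headers would need renaming under the assumed synonym table; it cannot tell you whether an effect allele is actually the ref or the alt allele.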

@abought
Member

abought commented May 28, 2022

I've made some improvements to the parser to handle the EBI format since our last conversation. I'll close this ticket for now.

There really isn't one exhaustive header list (because our parser tries to be flexible), so I think overly prescriptive docs might not be the way to go.

If anyone thinks that their file format could stand to be better supported, feel free to reach out with a list of example headers and we could try to improve the auto-detect features. Otherwise, I hope the example files and notes are useful. :)
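
For anyone who wants to send along example headers, a small sketch like the one below may help pull them out of a file. It assumes a tab-delimited text file, optionally gzipped, whose first non-comment line is the header row; the function name `read_headers` and the `##` comment convention are assumptions for illustration, not part of any LocusZoom tooling.

```python
import gzip


def read_headers(path, delimiter="\t"):
    """Return the column names from the first header line of a text file.

    Assumes lines starting with '##' are metadata comments and that a single
    leading '#' on the header row (e.g. '#chrom') should be dropped.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip() and not line.startswith("##"):
                return line.rstrip("\r\n").lstrip("#").split(delimiter)
    return []
```

Calling `read_headers("my_gwas.tsv.gz")` (a hypothetical filename) returns a plain list of column names that can be pasted into an issue.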
