REF, ALT, QUAL, FILTER, INFO fields stored multiple times for each sample? #63

Closed
shubhamchandak94 opened this issue Aug 13, 2019 · 2 comments

Comments

@shubhamchandak94

Hi,
If I import a VCF file containing many samples into GenomicsDB, say one from the 1000 Genomes Project, are the fields that are common to all samples (the ones mentioned in the subject line) stored in each TileDB cell? Or are they stored just once per variant?
Regards,
Shubham

@kgururaj
Contributor

kgururaj commented Aug 15, 2019

Hello Shubham,
Yes, all the common fields are stored in every TileDB cell for each sample.

GenomicsDB was developed primarily for storing variant data from many individual samples (many VCFs, each VCF with 1 sample, say from the output of a variant caller) and then jointly querying/processing the data. It doesn't work well when the variant data from multiple samples is already combined into a single VCF.

We have seen this issue before and given it some thought. In the end, I gave up trying to make multi-sample VCF import into GenomicsDB efficient.
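To make the duplication concrete, here is a rough Python sketch of the layout described above. It is only a toy model, not GenomicsDB or TileDB code: the `Cell` class and the helpers `cells_for_variant` and `site_level_copies` are hypothetical names made up for illustration, and it assumes one cell per (sample, position) pair with the site-level fields copied into each cell.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Cell:
    """Toy stand-in for one cell: one sample at one genomic position."""
    sample: str            # row of the cell (assumed: one row per sample)
    position: int          # column of the cell (genomic position)
    ref: str               # site-level fields, repeated in every cell
    alt: Tuple[str, ...]
    qual: float
    filter: str
    info: str
    genotype: str          # the only truly per-sample field in this toy model


def cells_for_variant(position: int, ref: str, alt: Tuple[str, ...],
                      qual: float, filt: str, info: str,
                      genotypes: Dict[str, str]) -> List[Cell]:
    """Expand one multi-sample VCF record into one cell per sample."""
    return [
        Cell(sample, position, ref, alt, qual, filt, info, gt)
        for sample, gt in genotypes.items()
    ]


def site_level_copies(cells: List[Cell]) -> Counter:
    """Count how many cells carry each distinct set of site-level fields."""
    return Counter(
        (c.position, c.ref, c.alt, c.qual, c.filter, c.info) for c in cells
    )


if __name__ == "__main__":
    # One variant, three samples (made-up values).
    cells = cells_for_variant(
        position=10177, ref="A", alt=("AC",), qual=100.0, filt="PASS",
        info="AC=2;AN=6",
        genotypes={"S1": "0|1", "S2": "0|0", "S3": "1|0"},
    )
    for site, n in site_level_copies(cells).items():
        print(f"site {site[0]}: site-level fields stored {n} times")
```

In this model the number of copies of REF, ALT, QUAL, FILTER and INFO grows linearly with the number of samples in the record, which is why a VCF that already combines thousands of samples is a poor fit.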

@shubhamchandak94
Author

Thanks for the clarification!
