Ag1000G phase 3 public dataset #468
alimanfoo asked this question in Show and tell
Thanks @alimanfoo and congratulations on the milestone!
On the data not being in the native sgkit format: why not? Is there anything we could do to make it more likely that you'd choose our format in the future?
Hi all,
I just wanted to share that we've released a new public dataset of genome variation data from the Anopheles gambiae 1000 Genomes Project (Ag1000G). The data include genome-wide SNP calls for 2,784 wild-caught mosquitoes, plus some crosses. Documentation about the data is available here.
As part of the data release I've created a Python package to make accessing the data easier, particularly the zarr data in the cloud. The data are not in the native sgkit format, but it is straightforward to create an xarray dataset from them, and I've added a function to do this (example here). I haven't documented this publicly yet, as I thought I'd share it here first to check that the dataset looks right. Feedback welcome.
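For readers wondering what such a conversion involves, here is a minimal sketch. It is not the function in the package above. It assumes a scikit-allel-style layout (per-contig `variants/POS`, `variants/REF`, `variants/ALT` and `calldata/GT` arrays, plus a top-level `samples` array); that layout, the function name `to_sgkit_dataset` and the toy data are all hypothetical. The output follows sgkit's variable and dimension naming (`variant_position`, `variant_allele`, `call_genotype`, `call_genotype_mask`, `sample_id`; dimensions `variants`, `samples`, `ploidy`, `alleles`), though details such as how contig names are stored may differ between sgkit versions.

```python
import numpy as np
import xarray as xr


def to_sgkit_dataset(store, contig):
    """Wrap scikit-allel-style genotype arrays as an sgkit-style xarray Dataset.

    Hypothetical input layout (nested dicts of NumPy arrays here; a zarr group
    laid out the same way supports the same nested key access):
      store["samples"]                  (samples,)                  sample IDs
      store[contig]["variants"]["POS"]  (variants,)                 positions
      store[contig]["variants"]["REF"]  (variants,)                 reference allele
      store[contig]["variants"]["ALT"]  (variants, n_alt)           alternate alleles
      store[contig]["calldata"]["GT"]   (variants, samples, ploidy) genotypes, -1 = missing
    """
    variants = store[contig]["variants"]
    calldata = store[contig]["calldata"]

    # Load everything eagerly; "[:]" works on NumPy and zarr arrays alike.
    pos = np.asarray(variants["POS"][:])
    ref = np.asarray(variants["REF"][:])
    alt = np.asarray(variants["ALT"][:])
    gt = np.asarray(calldata["GT"][:])
    sample_id = np.asarray(store["samples"][:])

    # sgkit keeps REF and ALT in a single (variants, alleles) array, REF first.
    alleles = np.column_stack([ref, alt])

    # With a single contig, every variant points at index 0 of attrs["contigs"].
    variant_contig = np.zeros(pos.shape[0], dtype=np.int64)

    return xr.Dataset(
        data_vars={
            "variant_contig": ("variants", variant_contig),
            "variant_position": ("variants", pos),
            "variant_allele": (("variants", "alleles"), alleles),
            "sample_id": ("samples", sample_id),
            "call_genotype": (("variants", "samples", "ploidy"), gt),
            # Negative genotype codes mark missing calls.
            "call_genotype_mask": (("variants", "samples", "ploidy"), gt < 0),
        },
        attrs={"contigs": [contig]},
    )


# Toy example (made-up data): two variants, three samples, diploid genotypes.
store = {
    "samples": np.array(["S1", "S2", "S3"]),
    "2L": {
        "variants": {
            "POS": np.array([10, 25]),
            "REF": np.array(["A", "C"]),
            "ALT": np.array([["T", "G", "C"], ["A", "G", "T"]]),
        },
        "calldata": {
            "GT": np.array(
                [[[0, 0], [0, 1], [-1, -1]],
                 [[1, 1], [0, 0], [0, 2]]],
                dtype=np.int8,
            ),
        },
    },
}
ds = to_sgkit_dataset(store, "2L")
print(ds)
```

The sketch reads arrays eagerly with `[:]`, which is fine for toy data but not for a genome-scale cloud store. There you would more likely wrap each zarr array with `dask.array.from_zarr`, so nothing is downloaded until a computation actually needs it.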
Hoping this will be a good opportunity to try out sgkit with these new data; there are lots of analyses we'd like to do.
Cheers,
Alistair