Copy script to convert vcf to zarr to this repo #35

Closed
tnguyensanger opened this issue Jun 25, 2020 · 0 comments

Comments

@tnguyensanger
Contributor

Copy script from https://github.com/malariagen/legacy_pipelines/blob/master/prod-tools/scripts/vcf_to_zarr.py to this repository.

@gbggrant All the legacy scripts used by vrpipe (the legacy workflow management system) can be found in https://github.com/malariagen/legacy_pipelines in case you want to refer to them.

The legacy vcf_to_zarr.py script needs edits for the new vector pipelines, which we will review in a PR:

  • Previous versions of zarr (until at least zarr v2.1.4) set permissions on files underneath a zarr directory to read-write by user only, regardless of your umask. The newest version, zarr v2.4.0, obeys your umask. The legacy vcf_to_zarr.py script contains code that traverses the directory tree to set permissions explicitly. This code is no longer required.
  • The function vcf_to_zarr.zip_zarr() zips the converted zarr using the Python package zipfile with zip64 extensions disabled. This was required on legacy machines, which used old versions of zip that did not support zip64 extensions. zipfile requires zip64 extensions when the zipped file is > 4GiB (according to the v3.8.3 zipfile documentation, https://docs.python.org/3/library/zipfile.html). At the time, we were creating small zip files < 4GiB, so it was not an issue. Either we enable zip64 extensions to allow for larger zip files, or we scrap the Python package zipfile and use a recent version of zip directly, whichever turns out to be easier.
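For the first option, here is a rough sketch of zipping a zarr directory with zip64 extensions enabled. It is not the legacy zip_zarr() code: the function signature, the file names in the usage note, and the choice of ZIP_STORED (no recompression, since zarr chunks are normally already compressed) are all assumptions for illustration. The only zipfile behaviour relied on is the allowZip64 argument of zipfile.ZipFile.

```python
import os
import zipfile


def zip_zarr(zarr_dir, zip_path, allow_zip64=True):
    """Zip a zarr directory tree (hypothetical helper, not the legacy code).

    With allow_zip64=False, zipfile raises zipfile.LargeZipFile once the
    archive would need zip64 extensions.
    """
    with zipfile.ZipFile(
        zip_path,
        mode="w",
        compression=zipfile.ZIP_STORED,  # assumption: store chunks as-is
        allowZip64=allow_zip64,
    ) as zf:
        for root, dirs, files in os.walk(zarr_dir):
            dirs.sort()  # walk subdirectories in a deterministic order
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store paths relative to the zarr root so the archive
                # does not contain the enclosing directory name.
                arcname = os.path.relpath(full_path, zarr_dir)
                zf.write(full_path, arcname)
    return zip_path
```

Usage would look something like zip_zarr("sample.zarr", "sample.zarr.zip"), where both paths are placeholders.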
@tnguyensanger tnguyensanger mentioned this issue Jun 25, 2020
2 tasks
alimanfoo added a commit that referenced this issue Jul 28, 2020
* Addresses #35.  Copy https://github.com/malariagen/legacy_pipelines/blob/master/prod-tools/scripts/vcf_to_zarr.py in its original form to this repo

* rework and test

* update spec

* clean up

* fix missing MQ

* fix nit

Co-authored-by: Alistair Miles <alimanfoo@googlemail.com>