Hypothesis testing for vcf2zarr #249
I'd be very happy to move it into this repo, but maybe it is actually something of more general use? How much trouble would packaging it separately be?
Here's a branch that uses hypothesis-vcf to generate VCFs: main...tomwhite:bio2zarr:hypothesis-vcf-tests. It's been passing for thousands of generated examples, which gives me confidence that vcf2zarr handles lots of edge cases. But when I ran it again just now, it found a failing VCF that needs looking into. Perhaps we should run it as a separate GitHub Actions workflow, or even manually for the moment. I've also had to modify the VCF-generating code (currently in sgkit), so that's probably not quite ready to release separately yet either.
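The idea being exercised here (generate many random VCFs, convert them, and check that nothing is lost) can be sketched without Hypothesis or vcf2zarr. In this minimal, stdlib-only sketch, `random_vcf` (a sites-only VCF generator) and `to_columns` (a toy stand-in for the conversion) are hypothetical names, not part of hypothesis-vcf or bio2zarr; the property checked is that every data line can be rebuilt from the converted columns.

```python
import random

# Hypothetical stand-ins: neither function below is part of hypothesis-vcf or vcf2zarr.
HEADER = (
    "##fileformat=VCFv4.3\n"
    "##contig=<ID=1>\n"
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth">\n'
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)


def random_vcf(rng: random.Random, max_records: int = 5) -> str:
    """Generate a small sites-only VCF with random positions, alleles and DP values."""
    lines = [HEADER]
    pos = 0
    for _ in range(rng.randint(0, max_records)):
        pos += rng.randint(1, 100)  # strictly increasing, so records stay sorted
        ref = rng.choice("ACGT")
        alt = rng.choice([b for b in "ACGT" if b != ref])
        dp = rng.choice([".", str(rng.randint(0, 1000))])  # "." means missing
        lines.append(f"1\t{pos}\t.\t{ref}\t{alt}\t.\tPASS\tDP={dp}\n")
    return "".join(lines)


def to_columns(vcf_text: str) -> dict:
    """Toy 'conversion': parse data lines into per-field columns."""
    cols = {"pos": [], "ref": [], "alt": [], "dp": []}
    for line in vcf_text.splitlines():
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        _chrom, pos, _id, ref, alt, _qual, _filt, info = line.split("\t")
        cols["pos"].append(int(pos))
        cols["ref"].append(ref)
        cols["alt"].append(alt)
        value = info.split("=", 1)[1]
        cols["dp"].append(None if value == "." else int(value))
    return cols


def check_roundtrip(vcf_text: str) -> None:
    """Property: the converted columns reproduce every data line of the input."""
    cols = to_columns(vcf_text)
    data_lines = [line for line in vcf_text.splitlines() if not line.startswith("#")]
    assert len(data_lines) == len(cols["pos"])
    for i, line in enumerate(data_lines):
        dp = "." if cols["dp"][i] is None else str(cols["dp"][i])
        rebuilt = f"1\t{cols['pos'][i]}\t.\t{cols['ref'][i]}\t{cols['alt'][i]}\t.\tPASS\tDP={dp}"
        assert rebuilt == line, (line, rebuilt)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(1000):
        check_roundtrip(random_vcf(rng))
    print("1000 examples passed")
```

Hypothesis adds what this sketch lacks, notably smarter generation and automatic shrinking of a failing VCF down to a minimal example, which is what makes a failure like the one above easier to look into.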
I think it would be useful generally, and could be listed on https://hypothesis.readthedocs.io/en/latest/strategies.html. It would need minimal packaging and just a README for documentation, I think.
I've moved the hypothesis-vcf code into its own repository at https://github.com/tomwhite/hypothesis-vcf. If that looks OK, I'd like to move it under https://github.com/sgkit-dev.
LGTM. I think it would be a great addition to sgkit-dev.
The hypothesis-vcf code is now in https://github.com/sgkit-dev/hypothesis-vcf. Thanks for fixing #251 @jeromekelleher. I've rebased and rerun the code in my branch at https://github.com/tomwhite/hypothesis-vcf and it hasn't found any more problems. What do you think the next step is? Have a CI job that runs just the hypothesis tests once a day?
I'm not sure this would do anything different from just adding a hypothesis job as part of normal CI. If we tune it to run for < 30 seconds with a different seed each time, it shouldn't get in the way, and it should give us good coverage. We're not expecting it to break, so it shouldn't create noise for contributors.
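A time-boxed run with a fresh seed on every CI invocation might look like the stdlib-only sketch below. `run_with_budget` is a hypothetical helper, not a Hypothesis API; the point it illustrates is that the seed is drawn at random but reported on failure, so a failing run can be replayed exactly. Hypothesis itself varies its random choices between runs by default and caps the number of examples through its `settings`, so in practice this would likely be configuration rather than code.

```python
import os
import random
import time
from typing import Callable, Optional, Tuple


def run_with_budget(
    check: Callable[[random.Random], None],
    budget_seconds: float = 30.0,
    seed: Optional[int] = None,
    max_examples: Optional[int] = None,
) -> Tuple[int, int]:
    """Call `check` repeatedly until the time budget (or example cap) runs out.

    Hypothetical helper. Returns (seed, examples_run). A fresh seed is drawn
    when none is given, and it is reported on failure so the run can be replayed.
    """
    if seed is None:
        seed = int.from_bytes(os.urandom(4), "little")  # differs on every CI run
    rng = random.Random(seed)  # all examples draw from this one stream
    deadline = time.monotonic() + budget_seconds
    examples = 0
    while time.monotonic() < deadline:
        if max_examples is not None and examples >= max_examples:
            break
        try:
            check(rng)
        except AssertionError as exc:
            raise AssertionError(
                f"failed on example {examples}; replay with seed={seed}"
            ) from exc
        examples += 1
    return seed, examples
```

Passing the seed from a failing CI log back in as `seed=` replays the same sequence of examples locally, because every example is drawn from a single seeded stream.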
I just had a quick look at the timings, and each call to
Is it much better with
Yes! It takes around 30 seconds with the default number of examples. So we could probably just use that.
Should that be the default if you're not using the CLI?
The issue is that it's using a home-grown synchronous executor to do it (line 82 in d192054).
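For context, a synchronous executor runs each submitted task immediately in the calling process instead of handing it to a worker pool. A plausible reason this is faster for the many tiny VCFs a Hypothesis test generates is that it skips process start-up and pickling. The sketch below is a hypothetical implementation built on `concurrent.futures.Executor`, not the code referenced above; `SynchronousExecutor`, `make_executor` and `worker_processes` are illustrative names.

```python
import concurrent.futures


class SynchronousExecutor(concurrent.futures.Executor):
    """Hypothetical executor that runs every task inline, in the calling thread."""

    def submit(self, fn, /, *args, **kwargs):
        future = concurrent.futures.Future()
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:  # store the error on the future, as a pool would
            future.set_exception(exc)
        else:
            future.set_result(result)
        return future  # already completed, so .result() never blocks


def make_executor(worker_processes: int) -> concurrent.futures.Executor:
    """Use in-process execution for 0 workers, otherwise a process pool (illustrative)."""
    if worker_processes <= 0:
        return SynchronousExecutor()
    return concurrent.futures.ProcessPoolExecutor(max_workers=worker_processes)
```

Because the `Executor` base class supplies `map()`, `shutdown()` and context-manager support on top of `submit()`, code written against the pool interface can switch between the two without changes.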
Makes sense.
It would be good to use the Hypothesis strategy for generating VCF files that's in sgkit against vcf2zarr, to check for corner cases in conversion.
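Part of what makes Hypothesis good at finding corner cases is that its strategies tend to over-sample boundary and special values rather than drawing uniformly. A minimal stdlib sketch of that idea for a VCF Integer field follows; `biased_int` and `info_integer_value` are hypothetical helpers, and the lower bound assumes the VCF 4.3 rule that the smallest eight 32-bit integers are reserved for BCF (worth checking against the spec before relying on it).

```python
import random

# Assumption: VCF 4.3 disallows the lowest eight int32 values (reserved by BCF).
VCF_INT_MIN = -(2**31) + 8
VCF_INT_MAX = 2**31 - 1


def biased_int(rng: random.Random, lo: int, hi: int, boundary_prob: float = 0.3) -> int:
    """Draw an int in [lo, hi], picking a boundary value with probability boundary_prob."""
    if lo > hi:
        raise ValueError("lo must be <= hi")
    if rng.random() < boundary_prob:
        # Endpoints, their neighbours, and zero (if in range) are classic corner cases.
        edges = {lo, hi, min(lo + 1, hi), max(hi - 1, lo)}
        if lo <= 0 <= hi:
            edges.add(0)
        return rng.choice(sorted(edges))
    return rng.randint(lo, hi)


def info_integer_value(rng: random.Random, missing_prob: float = 0.2) -> str:
    """Text for a hypothetical VCF Integer field: '.' (missing) or a boundary-biased int."""
    if rng.random() < missing_prob:
        return "."
    return str(biased_int(rng, VCF_INT_MIN, VCF_INT_MAX))


if __name__ == "__main__":
    rng = random.Random(0)
    print([info_integer_value(rng) for _ in range(10)])
```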
I wonder if we should move the Hypothesis VCF code to this repo, or release as a separate package (it may be of general interest)?