Optional genomic-scale for popgen stats #893
A consideration is whether specifying … A dimension of length 1 would potentially simplify some things programmatically, but may be confusing to users. It may also complicate post hoc generalization of methods such as … The absence of a genomic dimension when …
The approach we've taken in tskit to the "extra" dimensions question is to drop them by default where possible, but always to keep them if the user has specified explicit windows (or anything else that changes the output dimensions). That seemed like a good balance between doing the "right" thing by default and giving the user full control, without changing the number of dimensions arbitrarily when the user changes the number of windows. Would this general approach map here?
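To make that convention concrete, here is a minimal sketch in plain NumPy. It is not tskit's implementation; `windowed_sum` and its signature are hypothetical. When no windows are given, the window dimension is dropped; when the user passes explicit breakpoints, it is kept even if there is only one window.

```python
import numpy as np


def windowed_sum(values, positions, windows=None):
    """Sum per-site values over genomic windows (hypothetical helper).

    If ``windows`` is None, the whole genome is treated as a single window
    and the window dimension is dropped, giving a 0-d result.
    If the user passes explicit window breakpoints, the window dimension
    is always kept, even when there is only one window.
    """
    values = np.asarray(values, dtype=float)
    positions = np.asarray(positions)
    drop_window_dim = windows is None
    if windows is None:
        # One half-open window [min, max + 1) spanning every site.
        windows = [positions.min(), positions.max() + 1]
    windows = np.asarray(windows)
    # Index of the window each site falls in (windows are half-open [start, stop)).
    idx = np.searchsorted(windows, positions, side="right") - 1
    n_windows = len(windows) - 1
    # Ignore sites that fall outside all of the requested windows.
    in_range = (idx >= 0) & (idx < n_windows)
    result = np.bincount(idx[in_range], weights=values[in_range], minlength=n_windows)
    if drop_window_dim:
        # Default case: no user-specified windows, so drop the extra dimension.
        return result[0]
    return result
```

With explicit breakpoints `[0, 50]` the result still has shape `(1,)`, so downstream code sees the same number of dimensions whether the user asks for one window or many.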
In general, the …
That makes sense.
In this context, would …
Maybe we could sketch out a few different usages here, mixing default and non-default options, to see which approach is most sensible. I'm having trouble seeing exactly how the …
I've started on one possible implementation here, just to see what it looks like: https://github.com/tomwhite/sgkit/tree/by-genome
No window.
An alternative would be to have a …
Interested to hear people's thoughts. [BTW, the naming of the existing …
This looks good, @tomwhite; I personally prefer the first option. I wonder if we should use a more specific argument name rather than …
Looking at this again, I think I prefer the second option. The … The following table shows how statistics are grouped (per variant, per window, or whole genome) depending on the value of the …
With the second option there is no … The second option is simpler to introduce into the codebase, since it just needs the …
SGTM, @tomwhite; I'll cast my vote for adding the …
SGTM too, but this may require some reworking of the …
See discussion in #888
Most methods within `popgen.py` are automatically applied over windows, and default to per-variant values if the dataset has not been windowed. It would be useful to optionally apply these methods over the whole genome, and potentially per contig. Additionally, some methods such as `identity_by_state` and `Weir_Goudet_beta` are currently applied to the whole genome, but it would also be useful to apply them within windows or contigs.

An option that allows more user control, whilst keeping sensible defaults, would be to introduce a `by` parameter that specifies the genomic scale (e.g., one of `{"genome", "windows", "variants"}`). The specified scale would determine the genomic dimension and shape of the resulting values.
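A rough sketch of how such a `by` parameter might dispatch, using plain NumPy arrays rather than sgkit's xarray datasets. Everything here is hypothetical: `aggregate_by_scale`, the per-variant `window_index` array, and the `"genomes"` dimension name are illustrative, not sgkit API. Leaving `by` unset reproduces the current default (windows if the dataset is windowed, otherwise variants), while an explicit value overrides it.

```python
import numpy as np

VALID_SCALES = {"genome", "windows", "variants"}


def aggregate_by_scale(per_variant, window_index=None, by=None):
    """Aggregate a per-variant statistic at a chosen genomic scale (hypothetical).

    Returns ``(dim_name, values)``, where ``dim_name`` names the genomic
    dimension of the result and ``values`` has one entry along it.
    """
    if by is None:
        # Current default: windows if the dataset is windowed, else variants.
        by = "variants" if window_index is None else "windows"
    if by not in VALID_SCALES:
        raise ValueError(f"by must be one of {sorted(VALID_SCALES)}, got {by!r}")
    per_variant = np.asarray(per_variant, dtype=float)
    if by == "variants":
        # No aggregation: one value per variant.
        return "variants", per_variant
    if by == "genome":
        # Keep a length-1 genomic dimension (one of the options discussed in
        # this thread) so the number of dimensions does not depend on `by`.
        return "genomes", per_variant.sum(keepdims=True)
    # by == "windows": sum the statistic within each window.
    if window_index is None:
        raise ValueError("by='windows' requires a windowed dataset")
    window_index = np.asarray(window_index)
    n_windows = int(window_index.max()) + 1
    return "windows", np.bincount(window_index, weights=per_variant, minlength=n_windows)
```

Per-contig aggregation could plausibly follow the same `bincount` pattern, with a per-variant contig index in place of the window index.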