Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constants for dimensions and variables #17

Closed
eric-czech opened this issue Jul 6, 2020 · 3 comments
Closed

Constants for dimensions and variables #17

eric-czech opened this issue Jul 6, 2020 · 3 comments

Comments

@eric-czech
Copy link
Collaborator

Can we live without constants for dimension names in datasets (cf. https://github.com/pystatgen/sgkit/pull/10)?

Specifically I mean these in api.py:

DIM_VARIANT = "variants"
DIM_SAMPLE = "samples"
DIM_PLOIDY = "ploidy"
DIM_ALLELE = "alleles"
DIM_GENOTYPE = "genotypes"

Using these to build/append a dataset involves something like this:

data_vars = { "variant/contig": ([DIM_VARIANT], variant_contig)}

What I'm wondering is if we're going to use constants, why stop at the dimension names? It could look like this:

data_vars = { f"{VAR_GROUP_VARIANT}/{VAR_NAME_CONTIG}": ([DIM_VARIANT], variant_contig)}

but I doubt any of us would prefer it. I think the two biggest advantages of the constants are:

  1. Preventing typos
  2. Allowing us to change the names

Of these, 2 seems unlikely to be important (and we'll probably not use the constants in examples/documentation anyhow) and 1 might eventually be solved with things like python/typing#28 (comment) and pydata/xarray#3967. I'm not going to hold my breath for that, but I do think it's worth asking whether or not we would all prefer this instead:

data_vars = { "variant/contig": (["variants"], variant_contig)}

As a user, I think I would be happy to make my own constants or use mypy Literal types if I wanted some static safety and to just live with the risk of typos otherwise. As a contributor, I'm not so sure but I'm leaning towards preferring the latter. Any thoughts?

@jeromekelleher
Copy link
Collaborator

I don't have a strong opinion here.

@alimanfoo
Copy link
Collaborator

Similar, no strong opinion. If just using strings, any typos should get caught in unit tests.

@eric-czech
Copy link
Collaborator Author

Thanks guys. I've stopped using them in the functions I'm working on and will stick with that from now on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants