Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert demographic models to Demes YAML files #1233

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

grahamgower
Copy link
Member

Closes #962.


Lots of decisions needed as to how we proceed, but this shows the basic shape of things. Bikeshedding welcome!

@grahamgower
Copy link
Member Author

Some issues arising from the Demes conversion:

  • Population IDs might change (Demes imposes an ancestors-before-descendents ordering on populations). I guess we could do some mapping to reorder populations again after loading the YAML file. We probably have to do this so we don't break backwards compatibility too badly. But maybe we should deprecate use of population IDs in favour of names anyway?
  • Many models are written using a MassMigration style, where ancestral populations retain the id of an extant population. These might be better modelled with a distinct ancestral population. E.g. PonAbe/TwoSpecies_2L11. Such an update will break the population IDs as indicated in the previous point.
  • Generation times in stdpopsim are a bit of a mess. Some discussion exists in clarify use of generation time and units of rates #1197. There are now many failing tests because the QC models just copied the (old) main models' generation_time attribute. I'm tempted to just yank out the generation time QC tests, because they're fairly meaningless (they resemble this pattern: "does value a match value b, which was copied from a in order to make the test pass"). At least with Demes, the generation time has units, which must be explicitly stated, and all times are expressed in those units. So we'll want to decide on which units each model should have. "years" and "generations" are the most appropriate, but in general we should try to match the originating publication so things like "thousands of years" or "days" might pop up here or there.
  • Some of the failing QC model tests can be fixed with judicious updates of the populoation_id_map list of dicts. But we'll need to fix Demography.from_old_style() error with events at same time as a MassMigration tskit-dev/msprime#2047 for this.

@jeromekelleher
Copy link
Member

Generally looks good to me. I'm pleasantly surprised with how well the autoconversions have gone.

I think we're just going to have to break the population IDs and there's not a lot we can do about that. We'll need to think through the consequences and see how we manage the breakage.

We haven't done a release in 2 years though, so I think there's going to be a pretty painful process of getting things up to release quality anyway.

@petrelharp
Copy link
Contributor

Yes, this looks great! But, I'm (again) a bit lost with the time units conversion. Could you write down what your philosophy is here? I guess I'm trying to sort out in which time units should values be written in:

  • the .yaml file
  • the stdpopsim catalog as displayed in the docs
  • msprime simulations, internally
  • SLiM simulations, internally (this is generations, i think)
  • the output tree sequence

@grahamgower
Copy link
Member Author

Yes, this looks great! But, I'm (again) a bit lost with the time units conversion. Could you write down what your philosophy is here?

It would be nice to clean up the time units, but completely fixing that might be a bit beyond the scope of the present PR. Your input as to what we should aspire to here would be very valuable!

I guess I'm trying to sort out in which time units should values be written in:

  • the .yaml file

I want the time units in the .yaml file to reflect the description of the demographic model from the originating publication(s). I think this is the least confusing option, and least error prone for new models going forward. Publications for the current stdpopsim demographic models will need to be checked manually, and I've not done that yet. This should (eventually) happen in this PR. We have an in_generations() method in demes-python to convert from these time units into generations.

  • the stdpopsim catalog as displayed in the docs

I'm not sure. But either choice should be easy to implement once conversion to YAML is complete. I don't think we need to touch this in the current PR.

  • msprime simulations, internally
  • SLiM simulations, internally (this is generations, i think)
  • the output tree sequence

I think these should all remain in generations. So, no change. In this PR, the conversion to generations will probably happen when the model is read in from the catalog (to keep the internal representation as an msprime.Demography for now). Later, we could pivot away from this and maybe do the conversion to generations when/as needed for each simulation engine.

There is some weirdness/redundancy in time unit conversion in the slim engine, as discussed in #1197. We should fix this, but that is beyond the scope of the current PR. Ideally, this PR won't touch the slim engine at all. In some future PR, we can convert the Demes model to a SLiM representation directly instead of first converting via msprime.DemographyDebugger. I have some demes->SLiM experiments here in case you're interested. But it's not a drop-in replacement for what we have in stdpopsim.

@petrelharp
Copy link
Contributor

That sounds great. What's the role of the time_conversion dict, then?

@grahamgower
Copy link
Member Author

It's an ad-hoc data structure used here to convert the stdpopsim demographic model into the time units used in the original publication, so that those units can be used when converting to the Demes YAML file. This info isn't properly recorded in the demographic models (those are all in generations). I think the stdpopsim.DemographicModel.generation_time field is in units of years, but to be honest I just don't trust those values (mostly because the units haven't been recorded), so I want to go back to the publications and record the details in the time_conversion dict of the demes conversion script. When I do that, I'll just interactively rebase to edit that table and reconvert from DemographicModel -> Demes YAML. It's also why I've included the conversion script in the maintenance/ folder for now---this script won't really be needed after the PR is done and merged, but it's useful for documenting the process while the PR is in draft form and changing things as/when needed during PR development and review.

@andrewkern
Copy link
Member

one question i have on the time unit conventions now -- should we specify in the documentation which units (gens vs years) that models should be implemented in?

looking at #1197 it seems like given the redundancy that @grahamgower built into the slim engine, the issue before was more about style and clarity than correctness. i'd say that we could force devs to just, say, do everything in generations for all future models.

@grahamgower
Copy link
Member Author

Please just ignore what I did with conversion of time units for the SLiM engine - it was a mistake which we should fix at some point. It's confusing and unnecessary.

We should definitely be clear in the dev docs about how we want folks to write models. We could force everyone to write models in generations, but do you think this is the best thing to do for model clarity and correctness? It means that model writers will convert units (probably manually) from what is written in a publication.

@petrelharp
Copy link
Contributor

We should definitely be clear in the dev docs about how we want folks to write models. We could force everyone to write models in generations, but do you think this is the best thing to do for model clarity and correctness? It means that model writers will convert units (probably manually) from what is written in a publication.

I'm with you - specifying things in the units of the original publication is the way to go.

@petrelharp
Copy link
Contributor

Fyi, @mufernando is using a yaml over in #1256

@grahamgower
Copy link
Member Author

If anyone is interested in picking up this PR and pushing it over the line, please shout out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

convert demographic models to Demes YAML files
4 participants