Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Papio Anubis genome, recombination map, and demographic model #1216

Merged
merged 1 commit into from
May 16, 2022

Conversation

saurabhbelsare
Copy link
Contributor

@saurabhbelsare saurabhbelsare commented Apr 13, 2022

This fork adds the species Papio Anubis (olive baboon). The sources for the data are:

Assembly: https://doi.org/10.1093/gigascience/giaa134
Mutation rate and generation time: https://doi.org/10.1371/journal.pbio.3000838
Recombination rate/map and demographic model: https://doi.org/10.1093/gbe/evac040

Caveat1: The recombination rate/map was only estimated for the autosomes. Hence I've added the recombination rate for the X chromosome as the mean rate from the autosomes.
Caveat2: I have named the demographic model as PAP_1A22 because I'm not sure what the correct naming convention is.
Caveat3: I have listed a placeholder url following the amazonaws.com format for the genetic map. I have all the map files required for this locally, but am not sure how to upload them to that url.

Copy link
Member

@grahamgower grahamgower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @saurabhbelsare, this looks really good, thanks! I think the pre-commit/black stuff should be done as a separate pull request, but the rest looks good.

I don't know what's going on with the failing tests though. Do the tests fail when you run them on your computer?

rev: 20.8b1
rev: 22.3.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there are reason to change this in this pull request? I recall the black project did make a change to their output recently, so perhaps you could provide a separate pull request for this instead?

black
git+https://github.com/psf/black
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is right, as a specific version would be preferred. In any event, we can discuss this in another pull request dealing only with updating black/pre-commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Graham, I didn't intend to push any of these changes for black and the version. I only made these changes locally for myself because I ran into this error while trying to commit and used the solution they suggested in that thread. I didn't add the precommit yaml file or development.txt to my commit/pull request. Not sure how they got pushed.

I fixed all the issues with tests on my computer before I submitted the pull request, and didn't have any tests failing at that point. How should I fix these issues? Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm off to bed now, but I'll take a closer look tomorrow and see if I can figure out the failing tests.

@grahamgower
Copy link
Member

grahamgower commented Apr 22, 2022

Hi @saurabhbelsare,
There is a test which tries to do a minimal simulation for each species, which is using all the memory when simulating PonAnu (and then the test is killed when memory runs out). It looks like the problem is a typo in the mutation rate, which makes it many orders of magnitude too big!

@grahamgower
Copy link
Member

Hi again @saurabhbelsare. I've put the update to black in a separate pull request (#1224). After that's merged, you should be able to remove those changes from this pull request and rebase your PapAnu bits.

There is a failing test regarding the ensembl release number. I'm not sure what the best thing to do here is... do you need that specific version? @jeromekelleher @andrewkern do you recall what was the process here for developers? Should all the species be updated to the given ensembl release in the event that a newly added species needs a newer ensembl release? Or can we handle mismatched versions between species gracefully (ie., just fix the test here and move on, as someone will do the update of other species later)?

And, @saurabhbelsare, to answer your initial questions (sorry, I missed these somehow)...

Caveat1: The recombination rate/map was only estimated for the autosomes. Hence I've added the recombination rate for the X chromosome as the mean rate from the autosomes.

I think this is fine.

Caveat2: I have named the demographic model as PAP_1A22 because I'm not sure what the correct naming convention is.

The stdpopsim developer docs have this to say on the matter:

Demographic models are named using a combination of a descriptive name, information about the simulation, and information about the publication it was presented in. Specifically we use ${SomethingDescriptive}_${number_of_populations}${first_author_initial}${two_digit_date} where the descriptive text is meant to capture something about the model (i.e. an admixture model, a population crash, etc.) and the number of populations is the number of populations implemented in the model (not necessarily the number from which samples are drawn). For author initial we will use a single letter, the 1st, until an ID collision, in which case we will include the 2nd letter, and so forth.

So it would be nice to change the "PAP" in the model id to something more informative. Maybe a descriptor of the population that was used? Or the method? Or something descriptive from the publication?

Caveat3: I have listed a placeholder url following the amazonaws.com format for the genetic map. I have all the map files required for this locally, but am not sure how to upload them to that url.

This will need to be uploaded by the AWS keymaster, @andrewkern.

@saurabhbelsare
Copy link
Contributor Author

I have updated the name of the demographic model and shared the recombination map with Andy. l'll update the url as soon as he uploads the map files to aws and then push the changes here.

@petrelharp
Copy link
Contributor

Ping @andrewkern - have you put the maps on AWS?

@saurabhbelsare saurabhbelsare force-pushed the olive_baboon branch 2 times, most recently from 42fe845 to 75347ae Compare May 10, 2022 18:40
@saurabhbelsare
Copy link
Contributor Author

Hi @jeromekelleher, @andrewkern,
I'm having a test fail, because in my stdpopsim/catalog/ensembl_info.py, the release of ensembl is 106, while the release in the main stdpopsim repo is 103, so the merge tests are conflicting and failing. I'm not sure why my file has the version 106, where it came from, since I haven't edited that file. @petrelharp suggested that you could weigh in on why this is happening and what is the best fix for it?

@andrewkern
Copy link
Member

I've uploaded the PapAnu map to AWS. It can be found here:

https://stdpopsim.s3.us-west-2.amazonaws.com/genetic_maps/PapAnu/papio_anubis_genetic_map.tar.gz

@saurabhbelsare, you should be able to go ahead and edit the URL in your PR to reflect this

as for the issue with the ensembl release, i need to look in to this a bit. stay tuned.

@grahamgower
Copy link
Member

grahamgower commented May 16, 2022

Hey @saurabhbelsare. I opened #1248 about the ensembl release. For your PR, just edit the release number to make the test pass (either in the ensembl_release.py or in the test, it doesn't matter).

id="SinglePopSMCpp_1W22",
description="SMC++ estimates of N(t) for Papio Anubis individuals",
long_description="""
These estimates were obtained from a sample Papio Anubis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
These estimates were obtained from a sample Papio Anubis
These estimates were obtained from a sample pf Papio Anubis

id="Pyrho_PAnubis1.0",
description="Pyrho inferred genetic map for Papio Anubis",
long_description="""
These estimates were obtained from a sample Papio Anubis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
These estimates were obtained from a sample Papio Anubis
These estimates were obtained from a sample of Papio Anubis

common_name="Olive baboon",
genome=_genome,
generation_time=11,
population_size=335505, # Most recent from Wall et al demographic model
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to put a note about where you got this number in the QC issue, so the person doing QC knows where you got it from.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants