template for requesting addition of a new species #772

Merged

petrelharp
Contributor

This would be a way for someone to compile the relevant information for adding a new species without doing the coding.

@codecov

codecov bot commented Mar 2, 2021

Codecov Report

Merging #772 (8026455) into main (e88fb3d) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #772   +/-   ##
=======================================
  Coverage   99.62%   99.62%           
=======================================
  Files          34       34           
  Lines        2413     2413           
  Branches      298      298           
=======================================
  Hits         2404     2404           
  Misses          4        4           
  Partials        5        5           

Member

@grahamgower left a comment

LGTM. Maybe we don't need the chromosome list though?


**Chromosome structure:**

- [] list of chromosomes with *name* and *length* (in bp)
Member

These are coming from ensembl now, so this isn't strictly necessary, right?

Member

Speaking of which, the Ensembl ID is a necessary bit of info.

Member

@jeromekelleher left a comment

LGTM, but we should give more info on how to get the Ensembl ID, etc.



**Recombination rates:**

- [] genetic map (as a .csv) of recombination rates **(optional)**
Member

Or HapMap format? We don't want people to have to convert to CSV if the map is already in the right format.

@petrelharp
Contributor Author

> Speaking of which, the Ensembl ID is a necessary bit of info.

Hm. So, here are the species that are apparently available on Ensembl:

I think that's all? I can't find a list of Ensembl sites.

And, what do we do about species not in Ensembl (e.g., there are no Mimulus, looks like)? Do we say that we're only for species with annotations uploaded to Ensembl? Any idea how difficult getting a new species on there is?

@jeromekelleher
Member

I guess we could put the data files in by hand for species that aren't in Ensembl. We'd have to introduce some level of QC then, though.

> Any idea how difficult getting a new species on there is?

No - can't imagine it's a quick process though, by the time the data gets into a release.

@petrelharp
Contributor Author

> I guess we could put the data files in by hand for species that aren't in Ensembl. We'd have to introduce some level of QC then, though.

Can we do this currently or would this require a lot of changes to the infrastructure? In other words, is this something we want to allow in the "adding to the zoo" workshop?

@petrelharp
Contributor Author

Ok, I've updated this - I think it's OK now, unless we think we will never want to support non-Ensembl species.

@petrelharp
Contributor Author

More Ensembl info, from this page:

> Ensembl does not produce genome assemblies, instead we provide annotation on genome assemblies that have been deposited into the INSDC (GenBank, ENA, DDBJ) and are publicly available. We select species to annotate on a case-by-case basis ...

@jeromekelleher
Member

> Can we do this currently or would this require a lot of changes to the infrastructure? In other words, is this something we want to allow in the "adding to the zoo" workshop?

No, probably not. I think we should focus on Ensembl species for the workshop.

Member

@jeromekelleher left a comment

LGTM!

@petrelharp petrelharp merged commit b9bbee7 into popsim-consortium:main Mar 4, 2021