add support for genomic identifiers #37

Merged
merged 1 commit into from
Aug 17, 2018

afrubin (Contributor) commented Apr 26, 2018

I have implemented validators and URL resolvers for several common genomic databases. Each has test cases, and the tests pass when run locally.

lnielsen (Member) commented May 3, 2018

Thanks for the PR - do you have links to some formal descriptions of the identifiers?

afrubin (Contributor Author) commented May 4, 2018

Here are links to the formal descriptions that I used to write the validators. The URL resolvers were primarily based on information from http://identifiers.org/.

I have merged the 1.0.1 changes and corrected the omission of a valid BioProject accession format.

NOTE: The RefSeq validator pattern could, and arguably should, be stricter about the number of digits (6 or 9) that follow each prefix where the standard specifies it, but that would make the regex considerably longer. The NZ_ pattern is unusual, and I couldn't find clear documentation for its format, but NZ_[A-Z]{4}\d+ seems to be correct.
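
As a rough illustration of the trade-off described in this note, the Python sketch below contrasts a loose pattern with a stricter one. The pattern names and the `is_refseq_like` helper are hypothetical and are not the validators merged in this PR. The generic two-letter prefix is an assumption made for brevity; only the `NZ_[A-Z]{4}\d+` form and the 6-or-9 digit counts come from the comment itself.

```python
import re

# Hypothetical patterns for illustration; not the validators from this PR.
# Loose: any two-letter prefix, an underscore, then one or more digits.
LOOSE = re.compile(r"^[A-Z]{2}_\d+$")
# Strict: same prefix shape, but exactly 6 or exactly 9 digits.
STRICT = re.compile(r"^[A-Z]{2}_(?:\d{6}|\d{9})$")
# NZ_ form as quoted in the comment: four letters followed by digits.
NZ = re.compile(r"^NZ_[A-Z]{4}\d+$")


def is_refseq_like(value, strict=False):
    """Return True if value matches the NZ_ form or the chosen prefix pattern."""
    if NZ.match(value):
        return True
    pattern = STRICT if strict else LOOSE
    return bool(pattern.match(value))


print(is_refseq_like("NM_000546"))             # six digits after the prefix
print(is_refseq_like("NM_0005", strict=True))  # four digits, strict mode
print(is_refseq_like("NZ_ABCD01000001"))       # NZ_ form
```

A real strict validator would presumably need one alternative per prefix family rather than a single shared digit rule, which is where the extra regex length mentioned above would come from.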

lnielsen self-assigned this May 6, 2018
afrubin (Contributor Author) commented Jun 9, 2018

@lnielsen Any update on the status of this PR?

lnielsen added this to In progress in Invenio Sprint 2018 Week 33-34 via automation Jul 23, 2018
lnielsen moved this from In progress to Pending review in Invenio Sprint 2018 Week 33-34 Jul 23, 2018
* Adds SRA, BioProject, and BioSample identification.

* Adds Ensembl, UniProt, RefSeq, GenBank/RefSeq identification.

* Adds ENA BioProject format identification.

lnielsen (Member) commented

Sorry for holding up the PR - I'm ready to merge it now, just one last clarification. Who is the legal copyright holder of your contribution? Is it you personally, or is it e.g. your employer?

lnielsen moved this from Pending review to Pending merge in Invenio Sprint 2018 Week 33-34 Aug 14, 2018
afrubin (Contributor Author) commented Aug 16, 2018

Thanks for coming back to this!

> Who is the legal copyright holder of your contribution?

The legal copyright holder is me, Alan F Rubin.

lnielsen merged commit fb0afb3 into inveniosoftware:master Aug 17, 2018
Invenio Sprint 2018 Week 33-34 automation moved this from Pending merge to Done Aug 17, 2018