Add DOI and WikipediaVersioned Extractor #1049

shawntanzk · 2022-03-23T16:34:31Z

Adding DOI and WikipediaVersioned link extractors (two things used quite often in several OBO Foundry ontologies).
I followed patterns found by searching the repo, so this probably needs a proper review to make sure I didn't do anything stupid.
Tagging @hkir-dev & @matentzn
Related to obophenotype/uberon#1874

matentzn · 2022-03-23T18:14:30Z

Have you tested this or just guessed it from the source code?

Can we add OMIM, OMIMPS, DOI, ORCID?

Or better yet, add support for an external prefix map?

shawntanzk · 2022-03-23T19:59:31Z

@matentzn guessed it, lol. I can try building it, but I don't currently have Eclipse installed, and even if I did, I'm not sure I'd know how to handle things. I was mostly doing this because I had some free time and thought I'd try to take some workload off @hkir-dev. I was hoping a mapping already existed, but after searching I realised the extractors are defined this way, so while I was at it I made these changes following the existing pattern. Hence it really needs a proper looking over.

> Or better yet, add support for an external prefix map?

I'd be 100% keen for this, but it might be beyond my skill set.

hkir-dev · 2022-03-23T20:11:25Z

Good progress, @shawntanzk. It seems we need to fix the DOI regex (though a perfectly working one seems a little difficult to write). We can work on it and test together.
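To illustrate why a perfect DOI regex is hard, here is a minimal sketch of one possible matcher. It is not the pattern used in this PR: the class name `DoiLinkSketch`, the pattern itself, and the choice of `https://doi.org/` as the resolver base are assumptions made for illustration. Real DOI suffixes are more permissive than this pattern allows for in every context (for example, trailing punctuation in free text would be captured as part of the suffix).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the extractor added in this PR.
class DoiLinkSketch {

    // Assumed pattern: a case-insensitive "DOI:" prefix, then "10.",
    // a 4-9 digit registrant code, a slash, and a non-whitespace suffix.
    private static final Pattern DOI =
            Pattern.compile("(?i)doi:\\s*(10\\.\\d{4,9}/\\S+)");

    // Assumed resolver base URL.
    private static final String RESOLVER = "https://doi.org/";

    // Returns the resolver URL if the whole value looks like a DOI reference.
    static Optional<String> toUrl(String value) {
        Matcher m = DOI.matcher(value.trim());
        return m.matches()
                ? Optional.of(RESOLVER + m.group(1))
                : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(toUrl("DOI:10.1000/182"));
        System.out.println(toUrl("UBERON:0007329"));
    }
}
```

Because the suffix is matched with `\S+`, the sketch accepts anything after the slash, which is the kind of trade-off that makes a strict pattern difficult.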

@matentzn, providing an external map might be the ultimate solution, but it would require a lot of changes (config menu updates, config file validation and processing, etc.). It would be good to proceed iteratively and have a first version with the current design.

matentzn · 2022-03-23T20:13:10Z

Sounds good @hkir-dev. Can you then perhaps add OMIM, OMIMPS, DOI, ORCID, and Orphanet? This would make some of our power users very happy!

shawntanzk · 2022-03-24T10:10:57Z

OMIM: https://omim.org/entry/
OMIMPS: https://www.omim.org/phenotypicSeries/
ORCID: https://orcid.org/
Orphanet: https://www.orpha.net/consor/www/cgi-bin/OC_Exp.php?Expert=
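The prefix-to-URL pairs above could be applied roughly as follows. This is a sketch only: the class `PrefixLinkSketch`, the method `expand`, and the `PREFIX:localId` input shape are assumptions for illustration, not the design of the extractors in this PR. The sample identifiers in `main` are illustrative values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: expands "PREFIX:localId" using the URL bases listed above.
class PrefixLinkSketch {

    private static final Map<String, String> PREFIXES = new LinkedHashMap<>();
    static {
        PREFIXES.put("OMIM", "https://omim.org/entry/");
        PREFIXES.put("OMIMPS", "https://www.omim.org/phenotypicSeries/");
        PREFIXES.put("ORCID", "https://orcid.org/");
        PREFIXES.put("Orphanet",
                "https://www.orpha.net/consor/www/cgi-bin/OC_Exp.php?Expert=");
    }

    // Returns base URL + local id, or empty if the prefix is unknown
    // or the value has no non-empty prefix and local id around a colon.
    static Optional<String> expand(String curie) {
        int colon = curie.indexOf(':');
        if (colon <= 0 || colon == curie.length() - 1) {
            return Optional.empty();
        }
        String base = PREFIXES.get(curie.substring(0, colon));
        return base == null
                ? Optional.empty()
                : Optional.of(base + curie.substring(colon + 1));
    }

    public static void main(String[] args) {
        System.out.println(expand("OMIM:154700"));
        System.out.println(expand("ORCID:0000-0002-1825-0097"));
    }
}
```

A table like this is also roughly what an external prefix map would supply from configuration rather than from code.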

hkir-dev · 2022-03-25T14:53:36Z

Unit and manual tests completed; the new extractors are working as expected.

hkir-dev · 2022-03-25T15:07:56Z

However, I observed a malfunction in the already existing OboFoundryLinkExtractor:
UBERON:0007329 is redirected to identifiers.org/UBERON:0007329 instead of purl.obolibrary.org/obo/UBERON_0007329.

I suspect the order of IdentifiersDotOrgLinkExtractor and OboFoundryLinkExtractor is the cause, but it seems this order was changed intentionally: dd514de

Independently of the current PR, OBO link extraction requires further analysis and a separate bug report.
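The suspected ordering effect can be sketched as a first-match-wins chain. This is an illustration of the hypothesis only, not the actual Protégé code: the class `ExtractorOrderSketch`, both matching patterns, and the first-match-wins behavior are assumptions. The two functions are simplified stand-ins for the real extractors.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of how extractor order could change the resulting link.
class ExtractorOrderSketch {

    // Stand-in for a generic extractor: accepts any PREFIX:ID value.
    static Optional<String> identifiersDotOrg(String v) {
        return v.matches("[A-Za-z]+:\\S+")
                ? Optional.of("https://identifiers.org/" + v)
                : Optional.empty();
    }

    // Stand-in for an OBO extractor: accepts UPPERCASE:digits and builds a PURL.
    static Optional<String> oboFoundry(String v) {
        return v.matches("[A-Z]+:\\d+")
                ? Optional.of("http://purl.obolibrary.org/obo/" + v.replace(':', '_'))
                : Optional.empty();
    }

    static final List<Function<String, Optional<String>>> GENERIC_FIRST =
            List.of(ExtractorOrderSketch::identifiersDotOrg,
                    ExtractorOrderSketch::oboFoundry);

    static final List<Function<String, Optional<String>>> OBO_FIRST =
            List.of(ExtractorOrderSketch::oboFoundry,
                    ExtractorOrderSketch::identifiersDotOrg);

    // Assumed behavior: the first extractor that produces a link wins.
    static Optional<String> firstMatch(
            List<Function<String, Optional<String>>> chain, String value) {
        for (Function<String, Optional<String>> extractor : chain) {
            Optional<String> link = extractor.apply(value);
            if (link.isPresent()) {
                return link;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(GENERIC_FIRST, "UBERON:0007329"));
        System.out.println(firstMatch(OBO_FIRST, "UBERON:0007329"));
    }
}
```

Under these assumptions, placing the generic extractor first shadows the OBO one for every OBO-style identifier, which would match the observed redirect.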

shawntanzk · 2022-03-25T15:30:44Z

Thanks @hkir-dev. I see you cleaned up some of my silly mistakes too, heh. Thanks heaps!

matentzn

Excellent @hkir-dev! Looks great!

matthewhorridge · 2022-05-23T17:46:37Z

This is a great contribution! Thanks very much @shawntanzk. We really appreciate it!

shawntanzk added 2 commits March 23, 2022 14:47

Add DOI and WikipediaVersioned Extractor

69b533b

file name typo

eaebf4e

hkir-dev and others added 3 commits March 24, 2022 10:13

fixed DOI regex and some test cases

e7d9980

added OMIM, OMIMPS, ORCID, Orphanet

e9d0aa9

Tests completed and javadocs updated

4fbe1d0

matentzn approved these changes Mar 25, 2022

matthewhorridge merged commit 0aab1be into protegeproject:master May 23, 2022

shawntanzk mentioned this pull request May 23, 2022

Check that WikipediaVersioned links resolve obophenotype/uberon#1874

Closed

shawntanzk mentioned this pull request Aug 16, 2023

Addresses #2103 Edit 'pain receptor cell' obophenotype/cell-ontology#2104

Merged
