new command: mint #763

balhoff · 2020-11-06T18:26:41Z

In order to support more decentralized development workflows, I'd like to propose a new command robot mint.

Problem: it’s a real pain to work with different OBO ID spaces at different times in Protégé. Even if Protégé handled that better, it would still be a pain to manage ID ranges for different editors, especially for someone making a drive-by pull request on an ontology.

Solution: robot mint

This command would support a workflow like this:

Ontology contributors generate new terms using a pattern (the patterns are configurable):

http://purl.obolibrary.org/temp#$uuid
e.g., http://purl.obolibrary.org/temp#CC0DB546-71C7-4E73-9C54-CD75A4BFB111

Do work on branches, create PR, merge to master.

Have a separate action, like a dedicated Jenkins job or person, which runs robot mint, only on stuff that’s been merged into master.

robot mint --minted-id-prefix "http://purl.obolibrary.org/obo/GO_" replaces UUID IRIs with the next GO_NNNNNNN IDs in the sequence

The mint command would also insert an annotation minted_from to link to the UUID IRI to be used to resolve issues where someone accidentally continues work on a branch. Maybe we can request a property for this in OMO. In any case it would be a command-line option. minted_from annotations can be filtered from release products.

mint would not generate a new permanent IRI if a term already exists with a minted_from value for the temp IRI.
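To make the intended behavior concrete, here is a minimal Python sketch of the rewriting step. It is not the ROBOT implementation: it works on plain (subject, predicate, object) string tuples rather than OWL axioms, and the `mint` function, the `next_id` argument, and the `MINTED_FROM` IRI are all hypothetical names chosen for illustration.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

TEMP_PREFIX = "http://purl.obolibrary.org/temp#"
MINTED_PREFIX = "http://purl.obolibrary.org/obo/GO_"
# Hypothetical placeholder; the real property would be a command-line option.
MINTED_FROM = "http://example.org/minted_from"


def mint(triples: Iterable[Triple], next_id: int, pad_width: int = 7) -> List[Triple]:
    """Replace temp IRIs with sequential IDs and record minted_from links."""
    triples = list(triples)
    # Temp IRIs that already have a permanent IRI from an earlier run.
    mapping = {o: s for s, p, o in triples if p == MINTED_FROM}
    # Every temp IRI mentioned anywhere, sorted so the result is deterministic.
    temp_iris = sorted(
        {t for triple in triples for t in triple if t.startswith(TEMP_PREFIX)}
    )
    new_links = []
    for temp in temp_iris:
        if temp not in mapping:  # never mint twice for the same temp IRI
            mapping[temp] = f"{MINTED_PREFIX}{next_id:0{pad_width}d}"
            new_links.append((mapping[temp], MINTED_FROM, temp))
            next_id += 1
    rewritten = []
    for s, p, o in triples:
        if p == MINTED_FROM:
            rewritten.append((s, p, o))  # keep the link to the temp IRI intact
        else:
            rewritten.append((mapping.get(s, s), mapping.get(p, p), mapping.get(o, o)))
    return rewritten + new_links
```

Running it a second time on output that still contains a stray temp IRI (for example from a stale branch) reuses the existing minted_from mapping instead of allocating a new ID.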

The mint command could also be used to implement provisional term workflows (not sure how much distinction there really is from the main use case). There could be a mint option for whether to leave temp IRIs in as declared, deprecated, entities (e.g. classes) or to erase and only leave as minted_from annotation IRI values.

If we come to agreement on this, I will implement it.

Related ticket in GO: geneontology/go-ontology#13812


balhoff · 2020-11-06T18:27:08Z

Preliminary usage output (removed common ROBOT options for clarity):

usage: robot mint --input <file> --minted-id-prefix <iri-prefix>
             --temp-id-prefix <iri-prefix> --minted-from-property <iri>
             --min-id <integer> --max-id <integer> --pad-width <integer>
             --keep-deprecated <bool> --output <file>
    --keep-deprecated <arg>        keep temporary terms in the ontology as
                                   deprecated entities
    --max-id <arg>                 fail if no identifier can be minted
                                   less than or equal to this number
    --min-id <arg>                 start minted identifiers from the max
                                   of either this number or the highest
                                   identifier found which is less than or
                                   equal to max-id
    --minted-from-property <arg>   property IRI used to link minted
                                   identifiers to temporary identifiers
    --minted-id-prefix <arg>       IRI prefix to prepend to minted
                                   identifiers
    --pad-width <arg>              apply leading zeroes to minted
                                   identifiers up to this width
    --temp-id-prefix <arg>         IRI prefix indicating temporary
                                   identifiers
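As a rough illustration of how --min-id, --max-id, and --pad-width might interact, here is a small Python sketch. The function name `next_minted_id` is hypothetical, and it assumes that "start from the highest identifier found" means continuing one past that identifier; the actual command may resolve this differently.

```python
import re
from typing import Iterable


def next_minted_id(
    existing_iris: Iterable[str],
    minted_prefix: str,
    min_id: int,
    max_id: int,
    pad_width: int,
) -> str:
    """Return the next permanent IRI, or raise if the range is exhausted."""
    pattern = re.compile(re.escape(minted_prefix) + r"(\d+)")
    used = []
    for iri in existing_iris:
        match = pattern.fullmatch(iri)
        # Identifiers above max-id belong to someone else's range; ignore them.
        if match and int(match.group(1)) <= max_id:
            used.append(int(match.group(1)))
    # Assumption: continue one past the highest used ID, but never below min-id.
    candidate = max(min_id, max(used) + 1) if used else min_id
    if candidate > max_id:
        raise ValueError(f"no identifier available at or below {max_id}")
    return f"{minted_prefix}{candidate:0{pad_width}d}"
```

With existing terms GO_0000005 and GO_0009999, a min-id of 1, and a max-id of 100, this would return the IRI ending in GO_0000006, since GO_0009999 lies outside the range.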

balhoff · 2020-11-06T18:32:01Z

The command would be pretty configurable, but it would be beneficial for OBO to adopt a prefix convention for temporary term IDs. This would save editors from needing to reconfigure Protégé between contributions to different ontologies. If the IDs look like http://purl.obolibrary.org/temp#CC0DB546-71C7-4E73-9C54-CD75A4BFB111, we could redirect http://purl.obolibrary.org/temp to a page describing the workflow and explaining that IDs of this form are not officially published or usable. I think it would be good for the namespace of the target ontology NOT to be included, in order to allow editing of multiple ontologies as just described, and also to prevent a temporary ID from looking at all like a "GO term" (or a term from whatever ontology).
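For tools other than Protégé, generating an IRI of this shape is straightforward. A minimal Python sketch, assuming the http://purl.obolibrary.org/temp# prefix proposed above is adopted (the function name `new_temp_iri` is just for illustration):

```python
import uuid

# Proposed shared prefix for temporary IDs; not yet an agreed OBO convention.
TEMP_NAMESPACE = "http://purl.obolibrary.org/temp#"


def new_temp_iri() -> str:
    """Return a temporary term IRI built from a random (version 4) UUID."""
    # Upper-cased to match the example IRI in this thread.
    return TEMP_NAMESPACE + str(uuid.uuid4()).upper()
```

Because the UUID is random, two editors working on separate branches will in practice never generate the same temporary IRI, and nothing in the IRI ties it to a particular ontology.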

matentzn · 2020-11-06T18:51:39Z

This is a great idea. Would it be possible to make robot mint sensitive to the id-ranges.owl files we already have in our repos? By sensitive I mean that an optional parameter like -u nico would pick the next ID in the range specified for that user in the ID ranges file (the default would be -u robot, which would look for an entry for robot in the id-ranges file). Otherwise we could get conflicts with people working in "the old style", using patterns or OBO-Edit, where a random ID may not be necessary, or even possible.

dosumis · 2020-11-06T19:24:51Z

I like it. The main potential issue I can see is ID ranges. Could we really have GitHub actions tied to a user? OTOH, this reduces the need for ID ranges in the first place. As long as everyone on a project uses this, we shouldn't get any ID clashes anyway. Maybe we should look around for a small-ish project to test it on. @Clare72, would you be interested in testing this on a FlyBase ontology?

balhoff · 2020-11-06T19:37:43Z

I had imagined not using ID ranges, or at least restricting them the way Nico suggests, such that all robot-minted IRIs fall in a specified range and folks using the old way work in separate ranges. I think we could add parsing of id-ranges.owl to support Nico's idea.
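A sketch of what the range lookup might look like in Python, once the ranges have been read from id-ranges.owl. Parsing that file is left out here; the dict layout, the function names, and the example ranges are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (lowest ID, highest ID)


def range_for_user(ranges: Dict[str, Range], user: str = "robot") -> Range:
    """Look up the ID range allocated to a user, defaulting to 'robot'."""
    if user not in ranges:
        raise KeyError(f"no ID range allocated to {user!r}")
    return ranges[user]


def overlapping_neighbours(ranges: Dict[str, Range]) -> List[Tuple[str, str]]:
    """Report pairs of users whose ranges overlap, comparing neighbours by start."""
    ordered = sorted(ranges.items(), key=lambda item: item[1][0])
    clashes = []
    for (name_a, (_, high_a)), (name_b, (low_b, _)) in zip(ordered, ordered[1:]):
        if low_b <= high_a:
            clashes.append((name_a, name_b))
    return clashes


# Hypothetical allocations, as they might appear after parsing the ranges file.
example_ranges = {"nico": (1, 9999), "jim": (10000, 19999), "robot": (20000, 29999)}
```

The range returned for robot could then supply the --min-id and --max-id values, keeping minted IDs apart from those created by editors working in their own ranges.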

jamesaoverton · 2020-11-06T20:21:03Z

This is a good idea and a well thought out proposal. Much appreciated @balhoff !

I could definitely use something like this for some of my projects. There are some workflows in Python that I've started on, and I'll give some thought to how this could fit.

@balhoff When do you see this being run in the workflow? It seems like it has to run after a PR has been merged to master, otherwise multiple PRs could claim the same IDs. If we trust it, it could potentially run as a GitHub Action after merge to master.

We started talking to the Protege team about project configuration across multiple tools, and this could be a good fit: INCATools/ontology-development-kit#328

It would be convenient to use existing ID ranges. Maybe we could try to determine the ID range based on the author of the last git commit.

One more possible feature that I want to float is linking temporary IDs to issues, but I'm not sure how that would fit in this design. OBI has never used an ID ranges file, and has recently been using a Google Sheet for term ID reservations. It includes a column linking back to the relevant issue, and I really like that.

balhoff · 2020-11-07T14:35:14Z

@jamesaoverton:

> When do you see this being run in the workflow? It seems like it has to run after a PR has been merged to master, otherwise multiple PRs could claim the same IDs. If we trust it, it could potentially run as a GitHub Action after merge to master.

Yes, this is what I have in mind. Only one person or job would do the minting, off of master. Something I've been thinking about is how this will interact with nightly snapshots, which GO builds and publishes every day from master. There will be points on master when minting has not yet happened. We could have a separate snapshot branch which the minting job pushes to each time it runs mint, and publish from there instead.

> Maybe we could try to determine the ID range based on the author of the last git commit.

I think this is problematic since multiple PRs may have been merged between minting commits. I don't yet see a use case for minting IDs from an ID range tied to authors of particular commits. Since ID ranges are there to avoid collisions, if the minting job is given its own ID range this will not be a problem. If anyone is using ID ranges for provenance, they should just use a creator annotation property instead.

> One more possible feature that I want to float is linking temporary IDs to issues, but I'm not sure how that would fit in this design.

We're using the term tracker item relation in GO to link terms to issues (with a value of type xsd:anyURI). If you use this on the temporary ID, the annotation would migrate to the official ID when you run robot mint.

balhoff · 2020-11-07T18:39:15Z

> The mint command would also insert an annotation minted_from to link to the UUID IRI to be used to resolve issues where someone accidentally continues work on a branch. Maybe we can request a property for this in OMO. In any case it would be a command-line option. minted_from annotations can be filtered from release products.

I think we could use http://purl.org/dc/terms/replaces for the default relation from minted IRIs to temporary IRIs.

balhoff mentioned this issue Nov 6, 2020

Skeleton of mint command #764

Draft

5 tasks

balhoff added the enhancement label Nov 6, 2020

balhoff self-assigned this Nov 6, 2020

balhoff mentioned this issue Nov 6, 2020

Need SOP for avoiding ID clashes when 1 editor has multiple branches awaiting merge geneontology/go-ontology#13812

Closed

balhoff mentioned this issue Jan 15, 2021

new term: "co-roosting with", subtype of "ecologically co-occuring with" oborel/obo-relations#419

Closed

balhoff mentioned this issue Feb 26, 2024

Creating a new class with an auto-allocated ID INCATools/kgcl#56

Open
