new command: mint #763

Open
balhoff opened this issue Nov 6, 2020 · 8 comments

@balhoff
Contributor

balhoff commented Nov 6, 2020

In order to support more decentralized development workflows, I'd like to propose a new command robot mint.

Problem: it’s a real pain to work with different OBO ID spaces at different times in Protégé. Even if they made that work better, it would still continue to be a pain to manage id ranges for different editors, especially for someone making a drive-by pull request on an ontology.

Solution: robot mint

This command would support a workflow like this:

Ontology contributors generate new terms using a pattern (the patterns are configurable):

  • http://purl.obolibrary.org/temp#$uuid
  • e.g., http://purl.obolibrary.org/temp#CC0DB546-71C7-4E73-9C54-CD75A4BFB111
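As an illustration of the pattern above, a minimal Python sketch that generates one temporary IRI. The function name `new_temp_iri` is hypothetical, and the prefix is just the example prefix from this proposal; since the patterns are meant to be configurable, it is a parameter.

```python
import uuid

# Example prefix taken from the proposal; assumed to be configurable.
TEMP_PREFIX = "http://purl.obolibrary.org/temp#"


def new_temp_iri(prefix: str = TEMP_PREFIX) -> str:
    """Return a temporary IRI: the prefix followed by a random (version 4) UUID."""
    # Upper-cased only to match the style of the example IRI above.
    return prefix + str(uuid.uuid4()).upper()


print(new_temp_iri())
```

Because the identifier is random, two contributors working on separate branches can each create terms without coordinating ID ranges.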

Do work on branches, create PR, merge to master.

Have a separate action, like a dedicated Jenkins job or person, which runs robot mint, only on stuff that’s been merged into master.

Running robot mint --minted-id-prefix "http://purl.obolibrary.org/obo/GO_" replaces each UUID IRI with the next GO_NNNNNNN ID in the sequence.

The mint command would also insert an annotation minted_from to link to the UUID IRI to be used to resolve issues where someone accidentally continues work on a branch. Maybe we can request a property for this in OMO. In any case it would be a command-line option. minted_from annotations can be filtered from release products.

mint would not generate a new permanent IRI if a term already exists with a minted_from value for the temp IRI.
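The behaviour described so far can be sketched roughly as follows. This is not ROBOT code: it models the ontology as plain `(subject, predicate, object)` string tuples, only rewrites subjects and objects, and uses a placeholder `MINTED_FROM` IRI, all of which are assumptions made for illustration.

```python
MINTED_FROM = "http://example.org/minted_from"  # hypothetical placeholder property


def mint(triples, temp_prefix, minted_prefix, next_id, pad_width=7):
    """Replace temporary IRIs with sequential IDs and record minted_from links."""
    # Temp IRI -> permanent IRI for terms that were minted on an earlier run.
    mapping = {o: s for s, p, o in triples if p == MINTED_FROM}
    # Temp IRIs that still need a permanent ID (sorted for a stable order).
    pending = sorted({t for s, p, o in triples if p != MINTED_FROM
                      for t in (s, o)
                      if t.startswith(temp_prefix) and t not in mapping})
    links = []
    for temp in pending:
        minted = f"{minted_prefix}{next_id:0{pad_width}d}"
        next_id += 1
        mapping[temp] = minted
        links.append((minted, MINTED_FROM, temp))
    # Rewrite everything except the minted_from links themselves.
    rewritten = [t if t[1] == MINTED_FROM
                 else (mapping.get(t[0], t[0]), t[1], mapping.get(t[2], t[2]))
                 for t in triples]
    return rewritten + links


GO = "http://purl.obolibrary.org/obo/GO_"
TEMP = "http://purl.obolibrary.org/temp#"
temp = TEMP + "CC0DB546-71C7-4E73-9C54-CD75A4BFB111"

first = mint([(temp, "rdfs:label", "new term")], TEMP, GO, 100)
# Someone keeps working on an old branch and the temp IRI reappears:
second = mint(first + [(temp, "rdfs:comment", "late edit")], TEMP, GO, 101)
print(second)
```

In the second run the temp IRI already has a minted_from link, so the late edit is attached to the existing permanent IRI and no new ID is consumed.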

The mint command could also be used to implement provisional term workflows (not sure how much distinction there really is from the main use case). There could be a mint option controlling whether to keep temp IRIs in the ontology as declared, deprecated entities (e.g. classes) or to erase them and leave them only as minted_from annotation values.

If we come to agreement on this, I will implement it.

Related ticket in GO: geneontology/go-ontology#13812

@balhoff
Contributor Author

balhoff commented Nov 6, 2020

Preliminary usage output (removed common ROBOT options for clarity):

usage: robot mint --input <file> --minted-id-prefix <iri-prefix>
             --temp-id-prefix <iri-prefix> --minted-from-property <iri>
             --min-id <integer> --max-id <integer> --pad-width <integer>
             --keep-deprecated <bool> --output <file>
    --keep-deprecated <arg>        keep temporary terms in the ontology as
                                   deprecated entities
    --max-id <arg>                 fail if no identifier can be minted
                                   less than or equal to this number
    --min-id <arg>                 start minted identifiers from the max
                                   of either this number or the highest
                                   identifier found which is less than or
                                   equal to max-id
    --minted-from-property <arg>   property IRI used to link minted
                                   identifiers to temporary identifiers
    --minted-id-prefix <arg>       IRI prefix to prepend to minted
                                   identifiers
    --pad-width <arg>              apply leading zeroes to minted
                                   identifiers up to this width
    --temp-id-prefix <arg>         IRI prefix indicating temporary
                                   identifiers
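One possible reading of how --min-id, --max-id and --pad-width interact, as a small Python sketch. The function name is hypothetical, and it assumes that "start from the highest identifier found" means one past the highest existing ID that is less than or equal to max-id.

```python
def next_minted_id(existing_ids, min_id, max_id, pad_width):
    """Return the next numeric ID as a zero-padded string, or fail past max_id."""
    # Only existing IDs within the upper bound are considered.
    in_range = [i for i in existing_ids if i <= max_id]
    highest = max(in_range, default=min_id - 1)
    candidate = max(min_id, highest + 1)
    if candidate > max_id:
        raise ValueError(f"no identifier available less than or equal to {max_id}")
    return str(candidate).zfill(pad_width)


# 9000 is above max-id, so it is ignored when looking for the highest ID.
print(next_minted_id([5, 12, 9000], min_id=10, max_id=100, pad_width=7))
```

With these inputs the highest existing ID in range is 12, so the sketch returns 13 padded to seven digits.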

@balhoff
Contributor Author

balhoff commented Nov 6, 2020

The command would be pretty configurable, but it would be beneficial for OBO to adopt a prefix convention for temporary term IDs. This would save editors from needing to reconfigure Protégé between contributions to different ontologies. If the IDs look like http://purl.obolibrary.org/temp#CC0DB546-71C7-4E73-9C54-CD75A4BFB111, we could redirect http://purl.obolibrary.org/temp to a page that describes the workflow and explains that IDs of this form are not officially published or usable. I think it would be good for the namespace of the target ontology NOT to be included, both to allow editing of multiple ontologies as just described and to prevent a temporary ID from looking at all like a "GO term" (or a term from whatever ontology).

@matentzn
Contributor

matentzn commented Nov 6, 2020

This is a great idea. Would it be possible to make ROBOT mint sensitive to the id-ranges.owl files we already have in our repos? By sensitive I mean that an optional parameter like -u nico would pick the next ID in the range specified by the ID ranges file (the default would be -u robot, which would look for an entry for robot in the id-ranges file). Otherwise we could get conflicts with people working in "the old style", using patterns or OBO-Edit, where a random ID may not be necessary, or possible.
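A rough sketch of the range lookup suggested here, in Python. Parsing id-ranges.owl is left out entirely: the ranges are assumed to be already loaded into a dict keyed by user name, the function name is hypothetical, and "next ID" is interpreted as the first unused ID in the user's range.

```python
def next_id_for_user(ranges, used_ids, user="robot"):
    """Return the first unused numeric ID in the range allocated to `user`."""
    if user not in ranges:
        raise KeyError(f"no ID range allocated to {user!r}")
    low, high = ranges[user]
    used = set(used_ids)
    for candidate in range(low, high + 1):
        if candidate not in used:
            return candidate
    raise ValueError(f"ID range for {user!r} is exhausted")


# Illustrative ranges only; real values would come from the id-ranges file.
ranges = {"robot": (1000, 1999), "nico": (2000, 2999)}
used = [1000, 1001, 2000]
print(next_id_for_user(ranges, used))          # default user "robot"
print(next_id_for_user(ranges, used, "nico"))
```

Keeping the minting job in its own range means editors who assign IDs by hand in their ranges would not collide with it.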

@dosumis

dosumis commented Nov 6, 2020

I like it. The main potential issue I can see is ID ranges. Could we really have GitHub actions tied to a user? OTOH, this reduces the need for ID ranges in the first place. As long as everyone on a project uses this, we shouldn't get any ID clashes anyway. Maybe we should look around for a small-ish project to test it on. @Clare72, would you be interested in testing this on a FlyBase ontology?

@balhoff
Contributor Author

balhoff commented Nov 6, 2020

I had imagined not using ID ranges, or at least restricting the way Nico says, such that all robot minted IRIs are in a specified range and folks using the old way would work in separate ranges. I think we could add parsing of id-ranges.owl to support Nico's idea.

@jamesaoverton
Member

This is a good idea and a well thought out proposal. Much appreciated @balhoff !

I could definitely use something like this for some of my projects. There are some workflows in Python that I've started on, and I'll give some thought to how this could fit.

@balhoff When do you see this being run in the workflow? It seems like it has to run after a PR has been merged to master, otherwise multiple PRs could claim the same IDs. If we trust it, it could potentially run as a GitHub Action after merge to master.

We started talking to the Protege team about project configuration across multiple tools, and this could be a good fit: INCATools/ontology-development-kit#328

It would be convenient to use existing ID ranges. Maybe we could try to determine the ID range based on the author of the last git commit.

One more possible feature that I want to float is linking temporary IDs to issues, but I'm not sure how that would fit in this design. OBI has never used an ID ranges file, and has recently been using a Google Sheet for term ID reservations. It includes a column linking back to the relevant issue, and I really like that.

@balhoff
Contributor Author

balhoff commented Nov 7, 2020

@jamesaoverton:

> When do you see this being run in the workflow? It seems like it has to run after a PR has been merged to master, otherwise multiple PRs could claim the same IDs. If we trust it, it could potentially run as a GitHub Action after merge to master.

Yes, this is what I have in mind. Only one person or job would do the minting, off of master. Something I've been thinking about is how this will interact with nightly snapshots, which GO builds and publishes every day from master. There will be points on master when minting has not yet happened. We could have a separate snapshot branch which the minting job pushes to each time it runs mint, and publish from there instead.

> Maybe we could try to determine the ID range based on the author of the last git commit.

I think this is problematic since multiple PRs may have been merged between minting commits. I don't yet see a use case for minting IDs from an ID range tied to authors of particular commits. Since ID ranges are there to avoid collisions, if the minting job is given its own ID range this will not be a problem. If anyone is using ID ranges for provenance, they should just use a creator annotation property instead.

> One more possible feature that I want to float is linking temporary IDs to issues, but I'm not sure how that would fit in this design.

We're using the term tracker item relation in GO to link terms to issues (with a value of type xsd:anyURI). If you use this on the temporary ID, the annotation would migrate to the official ID when you run robot mint.

@balhoff
Contributor Author

balhoff commented Nov 7, 2020

> The mint command would also insert an annotation minted_from to link to the UUID IRI to be used to resolve issues where someone accidentally continues work on a branch. Maybe we can request a property for this in OMO. In any case it would be a command-line option. minted_from annotations can be filtered from release products.

I think we could use http://purl.org/dc/terms/replaces for the default relation from minted IRIs to temporary IRIs.
