providers: provide DataCite-like DOI locally #125

Merged

Conversation

fenekku
Contributor

@fenekku fenekku commented Oct 22, 2019

It would probably be good to have someone from @inveniosoftware/architects review this too, because it generates a PID (DOI), which is a core element of Invenio, and it does so in a non-trivial way (I think).

Contributor

@ntarocco ntarocco left a comment

Great work!!! 💯 🚀

Could I ask you to change the code slightly to make it even more reusable?
I would like to use base32 for normal recids, not only DOIs.
123456789 -> ABCD-EDFG

Can we "plug it in" somehow in recordid.py? Optionally, to avoid breaking backward compatibility.

@fenekku fenekku force-pushed the 124_generate_datacite_like_doi branch 3 times, most recently from 547cef3 to df0febf Compare October 23, 2019 20:11
@fenekku
Contributor Author

fenekku commented Oct 23, 2019

Had to add min_length to guarantee the encoded length; without it, tests were failing on Travis.
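
To illustrate why a minimum length matters, here is a minimal sketch, not the code from this PR. It assumes a Crockford-style lowercase alphabet and a hypothetical `encode` helper: a random integer with leading zero bits encodes to fewer characters unless the result is left-padded.

```python
# Assumed alphabet: Crockford-style base32 (no i, l, o, u), lowercase.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def encode(number, min_length=0):
    """Encode a non-negative integer in base32, left-padded with '0'."""
    if number < 0:
        raise ValueError("number must be non-negative")
    digits = ""
    while number:
        number, remainder = divmod(number, 32)
        digits = ALPHABET[remainder] + digits
    # Without padding, small values yield fewer characters than expected.
    return (digits or "0").rjust(min_length, "0")


print(encode(31))                # z
print(encode(31, min_length=8))  # 0000000z
```

A test that asserts a fixed encoded length would pass for most random values and fail occasionally for small ones, which would look like an intermittent CI failure.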

Member

@ppanero ppanero left a comment

LGTM, tho in the separate module as discussed IRL.

@fenekku fenekku force-pushed the 124_generate_datacite_like_doi branch 4 times, most recently from 0c5c6ea to 058dc1c Compare November 4, 2019 19:44
@ppanero ppanero added this to In progress in InvenioRDM November Board via automation Nov 5, 2019
@ppanero ppanero moved this from In progress to In Review in InvenioRDM November Board Nov 5, 2019
@fenekku fenekku force-pushed the 124_generate_datacite_like_doi branch from 058dc1c to c467623 Compare November 5, 2019 15:38
- Generate a random, configurable length, base32, URI-friendly,
  hyphen-separated, optionally checksummed DOI suffix
- Cross-document
- Closes inveniosoftware#124
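
The suffix described in the commit message above could be sketched roughly as follows. This is an illustration under assumptions, not the merged implementation: the function name `generate_suffix`, the default length of 8, the group size of 4, and the Crockford-style alphabet are all hypothetical, and the optional checksum is left out.

```python
import secrets

# Assumed alphabet: Crockford-style base32 (no i, l, o, u), URI-friendly.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def generate_suffix(length=8, split_every=4):
    """Return `length` random base32 characters, hyphenated in groups.

    A `split_every` below 1 disables hyphenation.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    chars = "".join(secrets.choice(ALPHABET) for _ in range(length))
    if split_every < 1:
        return chars
    groups = [chars[i:i + split_every] for i in range(0, length, split_every)]
    return "-".join(groups)


print(generate_suffix())       # random, shaped like xxxx-xxxx
print(generate_suffix(10, 5))  # random, shaped like xxxxx-xxxxx
```

Drawing characters directly from the alphabet always yields exactly `length` characters, so no padding step is needed in this variant.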
@fenekku fenekku force-pushed the 124_generate_datacite_like_doi branch from c467623 to 9e569ce Compare November 5, 2019 16:36
Contributor

@ntarocco ntarocco left a comment

LGTM! Good job! 👍 🚀 💯

Question, maybe for @lnielsen too: in the long run, do we want recids in Invenio to use this new base32 format? If so, shall we create another RecordId provider that generates such IDs?
In our case, we will have to re-create that provider in both ILS and the future CDS... I am wondering if it makes sense to put it here so that it is available to everyone...

Member

@lnielsen lnielsen left a comment

There's an overall issue here that I'm sorry I didn't detect earlier. We need a PIDProvider that's able to generate internal identifiers (base32) - a bit similar to RecordIdProvider. The DataCite provider should generate the DOI by taking the internal identifier and using it as the suffix.
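
As a rough sketch of that split of responsibilities (the function name and the `10.1234` prefix are placeholders, not part of invenio-pidstore), the DOI would simply be composed from a configured prefix and the already-generated internal identifier:

```python
def doi_from_internal_id(prefix, internal_id):
    """Build a DOI of the form '<prefix>/<suffix>' from an internal ID."""
    if not prefix.startswith("10."):
        raise ValueError("a DOI prefix starts with '10.'")
    if not internal_id:
        raise ValueError("internal_id must not be empty")
    return f"{prefix}/{internal_id}"


# Placeholder prefix and identifier, for illustration only.
print(doi_from_internal_id("10.1234", "abcd-efgh"))  # 10.1234/abcd-efgh
```

With this shape, the identifier generation lives in one provider and the DOI provider only adds the prefix.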

Member

@ppanero ppanero left a comment

🚀 Great job!

A few minor points: please check the test comments in depth, as there might be an inconsistency in the "default" length/split.

Approved: Assuming passing tests :)

@fenekku fenekku removed this from In Review in InvenioRDM November Board Nov 7, 2019
@fenekku fenekku force-pushed the 124_generate_datacite_like_doi branch from db52720 to 28cbe26 Compare November 7, 2019 15:57
@fenekku
Contributor Author

fenekku commented Nov 7, 2019

Thanks for the review, @ppanero. Changes made. It just needs an architect to approve as well.

Contributor

@ntarocco ntarocco left a comment

LGTM! Fantastic work! Thanks for this! :)

A couple of comments:

  • I guess the V2 naming/pattern and deprecation of current RecordIdProvider was agreed with @lnielsen and/or @ppanero?
  • I understand that the chance of a collision between randomly generated base32 IDs is probably very small, but I am wondering whether we should check for one, e.g. (pseudocode):
while RecordIdProviderV2.get(pid_value):
    pid_value = cls.generate_id(options)

kwargs['pid_value'] = pid_value

I fear we might have the db raising exception if the generated id is already in the table...

run-tests.sh Outdated Show resolved Hide resolved
@fenekku fenekku force-pushed the 124_generate_datacite_like_doi branch from 28cbe26 to 8154467 Compare November 11, 2019 14:05
@fenekku
Contributor Author

fenekku commented Nov 11, 2019

Fixed the bash header.

1- Naming is always going to be hard. I think it's fine.
2- Still not sure if I understand the concerns about this:

I fear we might have the db raising exception if the generated id is already in the table...

I don't fear it... I hope for it? We do want an exception if the ID is already in the table, and we will get one according to: https://github.com/inveniosoftware/invenio-pidstore/blob/master/invenio_pidstore/models.py#L163
We are not trying to recover from it, because the user can just retry. If we tried to recover ourselves, it seems to me it would lead to a rabbit hole of other extremely rare edge cases (I am open to ideas for doing recovery sanely if people have them). Checking for existence beforehand would be a consistent performance hit for an incredibly rare 'payoff'. I'd rather move on given the time spent on the PR...
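
The general pattern can be sketched with the standard-library `sqlite3` module standing in for the real database. The `pid` table and `create_pid` helper below are hypothetical and are not the invenio-pidstore model; the point is only that a uniqueness constraint rejects a duplicate on insert, with no lookup beforehand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the (pid_type, pid_value) pair must be unique.
conn.execute(
    "CREATE TABLE pid ("
    "pid_type TEXT NOT NULL, "
    "pid_value TEXT NOT NULL, "
    "UNIQUE (pid_type, pid_value))"
)


def create_pid(pid_type, pid_value):
    """Insert a PID; the constraint raises on duplicates."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO pid (pid_type, pid_value) VALUES (?, ?)",
            (pid_type, pid_value),
        )


create_pid("recid", "abcd-efgh")
try:
    create_pid("recid", "abcd-efgh")  # same value again
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("duplicate rejected:", exc)
else:
    duplicate_rejected = False
```

A check-then-insert loop would add a query to every creation and would still leave a window between the check and the insert, so the constraint is needed either way.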

@fenekku fenekku force-pushed the 124_generate_datacite_like_doi branch from 8154467 to 20f37d0 Compare November 11, 2019 14:35
Member

@lnielsen lnielsen left a comment

Regarding retrying on collisions: I would leave it as is, which is consistent with how we deal with UUIDs, e.g. for records.

@fenekku
Contributor Author

fenekku commented Nov 15, 2019

@lnielsen Build is passing and it has been approved: this can be merged.

@fenekku
Contributor Author

fenekku commented Nov 18, 2019

@lnielsen: to be more explicit, neither Pablo nor I can merge this, so if an architect could merge it, it would unblock us :)

@ntarocco ntarocco merged commit 5316a47 into inveniosoftware:master Nov 18, 2019
InvenioRDM October Board automation moved this from In Review to Done Nov 18, 2019

Successfully merging this pull request may close these issues.

pid: generate random alphanumeric PIDs
4 participants