Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

global: fix unicode issues #34

Merged
merged 1 commit into from
Nov 22, 2017
Merged

Conversation

slint
Copy link
Member

@slint slint commented Nov 17, 2017

No description provided.

@@ -51,6 +51,12 @@
('doi.org/10.1016/j.epsl.2011.11.037', ['doi', 'handle'],
'10.1016/j.epsl.2011.11.037',
'https://doi.org/10.1016/j.epsl.2011.11.037'),
('10.1016/üникóδé-дôΐ', ['doi', 'handle'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting case - is this an actual DOI?

Copy link
Member Author

@slint slint Nov 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the spec it could be, though from a brief search (from Zenodo records and the web in general) I didn't find any solid example of such DOIs.

(I discovered the issue/error by accident, (user forgot an em-dash in a DOI), and saw that the spec supported UTF-8 characters in DOIs)

@@ -51,6 +51,12 @@
('doi.org/10.1016/j.epsl.2011.11.037', ['doi', 'handle'],
'10.1016/j.epsl.2011.11.037',
'https://doi.org/10.1016/j.epsl.2011.11.037'),
('10.1016/üникóδé-дôΐ', ['doi', 'handle'],
'10.1016/üникóδé-дôΐ',
'https://doi.org/10.1016/üникóδé-дôΐ'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand DOI character encoding specification we would need to URL-encode the non-ASCII characters.

@lnielsen lnielsen added this to the v0.3.0 milestone Nov 20, 2017
@lnielsen lnielsen assigned lnielsen and slint and unassigned lnielsen Nov 20, 2017
@slint slint removed their assignment Nov 21, 2017
@diegodelemos diegodelemos self-assigned this Nov 21, 2017
@diegodelemos diegodelemos removed their assignment Nov 21, 2017
@krzysztof krzysztof assigned krzysztof and unassigned krzysztof Nov 21, 2017
@krzysztof krzysztof self-requested a review November 21, 2017 14:04
@@ -32,25 +32,38 @@
('http://www.example.org/ark:/13030/tqb3kh97gh8w', ['ark', 'url'], '', ''),
('10.1016/j.epsl.2011.11.037', ['doi', 'handle'],
'10.1016/j.epsl.2011.11.037',
'https://doi.org/10.1016/j.epsl.2011.11.037'),
'https://doi.org/10.1016%2Fj.epsl.2011.11.037'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct according to the DOI spec? I think all ascii characters should probably stay as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to this part of the spec, (and the final sentence of this section), it's better for forward slashes to be url-encoded

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is funny - so 2.5.2.4 puts slash in the recommend category for encoding. Just below in 2.6.2 it says:

The DOI name "10.1006/jmbi.1998.2354" would be made an actionable link as "https://doi.org/10.1006/jmbi.1998.2354".

(i.e. no encoding) :S

Let's discuss IRL - not fully sure what the impact would be.

@lnielsen lnielsen self-assigned this Nov 22, 2017
@slint slint force-pushed the unicode-fix branch 2 times, most recently from c9ebe91 to ed803c0 Compare November 22, 2017 10:05
* Handles unicode characters in DOI.

* URL-encodes generated DOI URLs
@lnielsen lnielsen merged commit 84ab6fa into inveniosoftware:master Nov 22, 2017
@slint slint deleted the unicode-fix branch September 27, 2019 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants