Un-Text::Unidecode author names in commits #40

Closed
avar opened this issue Feb 22, 2010 · 3 comments

Comments

@avar

avar commented Feb 22, 2010

A commit like this one has my name as "AEvar Arnfjord Bjarmason" instead of "Ævar Arnfjörð Bjarmason". The name is presumably converted by PAUSE.

Please convert the commits to use original author names if possible.

@schwern
Contributor

schwern commented Feb 22, 2010

The issue is likely that CPANPLUS (which I'm using to get the author info) is drawing from authors/01mailrc.txt.gz which in your case says:

alias AVAR       "AEvar Arnfjord Bjarmason <avar@cpan.org>"

gitpan can only be as accurate as its data. It would need a better source. Do you know of one? Perhaps where search.cpan.org is getting it? There is /authors/00whois.html but I don't know if that's canonical or complete.
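
For illustration, here is a minimal Python sketch of pulling the ID, name and email out of a 01mailrc line shaped like the one quoted above. The function name and the regular expression are hypothetical, not taken from CPANPLUS or gitpan, and the pattern assumes every entry follows the same `alias ID "Name <email>"` shape.

```python
import re

# Assumed shape of one 01mailrc line: alias <PAUSEID> "<name> <email>"
ALIAS_RE = re.compile(r'^alias\s+(\S+)\s+"(.*?)\s*<([^<>]*)>"\s*$')


def parse_mailrc_line(line):
    """Return (pause_id, name, email), or None if the line does not match."""
    match = ALIAS_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


entry = parse_mailrc_line('alias AVAR       "AEvar Arnfjord Bjarmason <avar@cpan.org>"')
print(entry)  # the name field is already the ASCII transliteration
```

Whatever parser is used, the name that comes out of this file is already ASCII-fied, so the original spelling cannot be recovered from it.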

@avar
Author

avar commented Feb 22, 2010

PAUSE itself maintains this data, but I don't know if it's exported anywhere. If I go to "Edit Account Info" at PAUSE, there's a "Full Name" field, which in my case is "Ævar Arnfjörð Bjarmason", and an "ASCII transliteration of Full Name" field, whose default, "AEvar Arnfjord Bjarmason", is supplied by the Text::Unidecode module.

@schwern
Contributor

schwern commented Aug 13, 2014

The proper Unicode names are in 02authors.txt.gz, which is pretty easy to parse. Parse::CPAN::Authors (PCA), which gitpan now uses to get author information, currently reads from 01mailrc.txt.gz. It can be replaced with a home-rolled version or, better yet, PCA could be extended to accept 02authors.txt.gz.
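
As a rough idea of what a home-rolled reader could look like, here is a Python sketch (gitpan itself is Perl, so this is only an illustration). The field layout is an assumption: it treats each line as tab-separated with the PAUSE ID, email and full name as the first three fields, which should be checked against the real file. The function names are hypothetical.

```python
import gzip
import io


def parse_authors(stream):
    """Map PAUSE ID -> full name from an iterable of text lines."""
    # ASSUMPTION: fields are tab-separated, in the order
    # PAUSE ID, email, full name, then anything else (ignored here).
    names = {}
    for line in stream:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pause_id, _email, full_name = fields[:3]
        names[pause_id] = full_name
    return names


def load_authors(path):
    """Read a gzipped authors file, decoding as UTF-8 to keep non-ASCII names."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return parse_authors(handle)


# Sample line built from the name and address quoted in this thread.
sample = io.StringIO("AVAR\tavar@cpan.org\tÆvar Arnfjörð Bjarmason\t\n")
print(parse_authors(sample))
```

Splitting the parsing from the file handling keeps the parser testable on in-memory text, and decoding explicitly as UTF-8 is what lets names like "Ævar Arnfjörð Bjarmason" survive untouched.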

@schwern schwern added this to the Pre-launch 2.0 milestone Aug 13, 2014
schwern added a commit that referenced this issue Sep 20, 2014
01mailrc has ASCII-fied names. 02authors contains authors' real names.
Parse::CPAN::Authors only reads 01mailrc, so I've written my own parser
for 02authors.

For #40
@schwern schwern closed this as completed Sep 20, 2014