Un-Text::Unidecode author names in commits #40

avar opened this Issue Feb 22, 2010 · 3 comments



avar commented Feb 22, 2010

A commit like this one has my name as "AEvar Arnfjord Bjarmason" instead of "Ævar Arnfjörð Bjarmason". It was presumably converted by PAUSE.

Please convert the commits to use original author names if possible.


schwern commented Feb 22, 2010

The issue is likely that CPANPLUS (which I'm using to get the author info) is drawing from authors/01mailrc.txt.gz which in your case says:

alias AVAR       "AEvar Arnfjord Bjarmason <avar@cpan.org>"

gitpan can only be as accurate as its data. It would need a better source. Do you know of one? Perhaps where search.cpan.org is getting it? There is /authors/00whois.html but I don't know if that's canonical or complete.
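For illustration, here is a minimal sketch of reading one 01mailrc-style alias line like the one quoted above. The regular expression and the function name `parse_mailrc_line` are assumptions made for this example; this is not the code CPANPLUS or gitpan uses. It shows that the name field carries whatever the file holds, which here is already the ASCII form.

```python
import re

# Assumed shape of one line, based on the example quoted in this thread:
#   alias PAUSEID   "Full Name <email>"
ALIAS_RE = re.compile(r'^alias\s+(\S+)\s+"(.*)\s+<([^<>]*)>"\s*$')


def parse_mailrc_line(line):
    """Return a dict with id, name and email, or None if the line does not match."""
    m = ALIAS_RE.match(line)
    if m is None:
        return None
    pause_id, name, email = m.groups()
    return {"id": pause_id, "name": name, "email": email}


record = parse_mailrc_line(
    'alias AVAR       "AEvar Arnfjord Bjarmason <avar@cpan.org>"'
)
```

Nothing in the parsed record can recover "Ævar Arnfjörð Bjarmason"; the original spelling has to come from a different source.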

avar commented Feb 22, 2010

PAUSE itself maintains this data, but I don't know if it's exported anywhere. If I go to "Edit Account Info" at PAUSE, there's a "Full Name" field, which in my case is "Ævar Arnfjörð Bjarmason", and an "ASCII transliteration of Full Name" field, whose default value, "AEvar Arnfjord Bjarmason", is supplied by the Text::Unidecode module.


schwern commented Aug 13, 2014

The proper Unicode names are in 02authors.txt.gz, which is pretty easy to parse. Parse::CPAN::Authors (PCA), which gitpan is now using to get author information, currently reads from 01mailrc.txt.gz. It can be replaced with a home-rolled version or, better yet, PCA could be extended to accept 02authors.txt.gz.
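A rough sketch of what such a parser might look like follows. It assumes each line of 02authors.txt.gz is tab-separated as PAUSE ID, email, full name, homepage, and that the file is UTF-8. That layout is an assumption, not something confirmed in this thread, so check it against the real file first. The function names and the sample line are hypothetical.

```python
import gzip
import io


def parse_authors(stream):
    """Parse lines assumed to be: PAUSE ID <tab> email <tab> full name <tab> homepage."""
    authors = {}
    for raw in stream:
        line = raw.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        # Pad short lines so a missing trailing field does not raise.
        fields += [""] * (4 - len(fields))
        pause_id, email, name, homepage = fields[:4]
        authors[pause_id] = {"email": email, "name": name, "homepage": homepage}
    return authors


def load_authors(path):
    """Read a gzip-compressed authors file, decoding it as UTF-8 (assumed encoding)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return parse_authors(fh)


# Hypothetical sample line in the assumed layout.
sample = io.StringIO("AVAR\tavar@cpan.org\tÆvar Arnfjörð Bjarmason\t\n")
authors = parse_authors(sample)
```

Keeping the line parser separate from the gzip handling makes it easy to test against an in-memory sample before pointing it at a real mirror.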

schwern added this to the Pre-launch 2.0 milestone Aug 13, 2014

schwern added a commit that referenced this issue Sep 20, 2014

Use 02authors for getting author information to fix Unicode names.
01mailrc has ASCII-fied names.  02authors contains author's real names.
Parse::CPAN::Authors only read 01mailrc, so I've written my own parser
for 02authors.

For #40

schwern closed this Sep 20, 2014
