-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix PEP 0 name parsing #1386
Fix PEP 0 name parsing #1386
Conversation
surname = " ".join(name_parts[-2:]) | ||
name.update(forename=forename, surname=surname) | ||
|
||
# handles double surnames after a middle initial (e.g. N. Vander Weele) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tough, a name like R. David Murray
should give Murray
as a surname for the PEP index.
That said, a name written in Japanese English convention like INADA Naoki
should return INADA
as surname.
In short, it’s not generally possible to parse world names into US categories of «first», «middle», «last»
I guess we have to accept imperfect for now, add special cases when we notice problems (what PEP will add the first Name MacName Sr
🙂), and maybe someday rework the system to have the proper way: metadata (or some author index dict) should include full name and short name for PEP 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Luckily (?!) in PEP 458 R. David Murray
is listed without the full stop in his first initial, so the name is still parsed correctly. I don't think there's a good solution for INADA Naoki
beyond adding a special-case exception
In short, it’s not generally possible to parse world names into US categories of «first», «middle», «last»
Or UK, Australia, Canada etc 😜. But I get your point. I'm reminded of this post about names - hopefully #40 doesn't apply to us...
I think that your last suggestion having some sort of lookup table is probably the best solution, as in all the PEPs there are still only a relativley small number of authors (248) - it's quite late here so will add that feature tommorow. It also keeps special cases etc. out of the code to keep it from becoming knobbly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be OK if this PEP fixed the most egregious cases (III
) without covering 100% of possibilities 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gives me something to do! Latest commit adds such a metadata lookup and therefore simplifies the name parsing code.
This should make it so that names can be correctly entered into AUTHORS.csv and PEP 0 will reflect this. I've also identified some duplicate entries (e.g. P.J. Eby
& Phillip J. Eby
, Greg
and Gregory
Ewing, Jim J. Jewett
& Jim Jewett
, Martin v. Löwis
& Martin von Löwis
). Is it acceptable to modify PEP headers to canonicalise these names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would add multiple entries to the data file rather than editing historical documents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest data file (exceptions rather than full mapping) doesn’t de-duplicate these entries, should it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data file is checked first (in init), so unsure where duplicates would propogate from?
Always good to be preventative but not sure I understand this one, sorry!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe my comment doesn’t make sense!
I don’t have a clear picture of the current behaviour of the code, so I wondered if the change from full data file to exceptions data file did preserve the feature you added of normalizing the duplicate names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha! I forgot to add that back in, you're right - have done so now (only adding the less used variant and mapping it to the 'cannonical' variant, to keep the file smaller)
Is there anything outstanding to do on this one @merwok? / What would next steps be? |
Is there a preview up somewhere? If not, I’ll have a look on my laptop. |
I haven't set one up, sorry |
532f361
to
7a0b5b5
Compare
Hum, I understand that the purpose of this PEP is to fix:
"Ernest W. Durbin III" becomes "III". How is the CSV file generated? How is it supposed to be maintained? I would prefer to only store "exceptions" in this file: if a name is I'm not sure why my name "Victor Stinner" is rendered as "Stinner, Victor". Why not simply copying the name unchanged? Is it a convention? Also I'm not sure if it's a good idea to provide a long list of email addresses. |
I think Eric also gave examples of surnames from a non-anglicised tradition (e.g.
I'd be happy to render the name unchanged, but don't want to make unilateral changes for obvious reasons! Rendering the name unchanged would make all of these workarounds unnecessary, so would be easier from a maintenance perspective. Having done a bit of looking, the Line 177 in 1fbe18e
This seems to have originated from @warsaw in commit b7ac9d0 (Aug 2000). I might suggest that if the Authors/Owners key were to be changed, the names by each PEP should be updated to a similar format.
True! |
… as per Victor's suggestion
Implemented in latest commit. I haven't removed the long list of names/emails (under heading Authors/Owners), but can do this. I wonder if it should be in a new PR though, to limit scope. Happy to do this if you'd like! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your efforts!
Hi! I will finalize this as soon as I can then turn to the Sphinx PRs! |
Thanks! Seems @hugovk is also doing a bunch of pep-infra PRs, so if mine get approval will look to ensure no conflict. |
I don't think there will be much overlap, if any, but feel free to ping me if you've any questions! |
@merwok Is there a list of outstanding concerns to help finalize this PR? I'd enthusiastically work on them. it looks like per Zen this ended up going the explicit route for overrides, is it just cleaning up the attempt at automatic handling? |
Re what this PR does, in comparison to #1385 this is pretty single-issue! I added an explicit overrides file, and cleaned up a bit of the automatic handling. |
I only wanted to test the branch locally to check the output, but I’ve had no time so far. @ewdurbin please feel free to take over to approve and merge! thanks for the help. |
closing |
This fixes name parsing in PEP 0 for generation of the indicies, for example
Ernest W. Durbin III
,The Python core team and community
,Eric N. Vander Weele
.This fix is also included in PR #1385, but as per @merwok's suggestion in #2 I'm creating this new pull request for the single issue.
Thanks,
Adam