Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(person): Improvements to Dutch name generation, in particular regarding affixes #1778

Merged
merged 9 commits into from Feb 12, 2023
Merged

fix(person): Improvements to Dutch name generation, in particular regarding affixes #1778

merged 9 commits into from Feb 12, 2023

Conversation

amarillion
Copy link
Contributor

Addresses #1777, plus some minor improvements to Dutch name generation.
I've based this on PR #1637 (fullname name patterns) assuming that will be approved as well.

@amarillion amarillion requested a review from a team as a code owner January 24, 2023 15:50
@ST-DDT ST-DDT added s: on hold Blocked by something or frozen to avoid conflicts p: 1-normal Nothing urgent m: person Something is referring to the person module labels Jan 24, 2023
@codecov
Copy link

codecov bot commented Jan 24, 2023

Codecov Report

Merging #1778 (aa41222) into next (d5b78c7) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             next    #1778      +/-   ##
==========================================
- Coverage   99.64%   99.63%   -0.02%     
==========================================
  Files        2347     2346       -1     
  Lines      235657   235735      +78     
  Branches     1145     1142       -3     
==========================================
+ Hits       234811   234863      +52     
- Misses        824      850      +26     
  Partials       22       22              
Impacted Files Coverage Δ
src/locales/nl/person/index.ts 100.00% <ø> (ø)
src/locales/nl/person/female_prefix.ts 100.00% <100.00%> (ø)
src/locales/nl/person/last_name.ts 100.00% <100.00%> (ø)
src/locales/nl/person/male_prefix.ts 100.00% <100.00%> (ø)
src/locales/nl/person/suffix.ts 100.00% <100.00%> (ø)
src/modules/internet/user-agent.ts 89.34% <0.00%> (-7.11%) ⬇️
src/modules/location/index.ts 98.70% <0.00%> (-0.22%) ⬇️

@amarillion amarillion changed the title #1777 Improvements to Dutch name generation, in particular regarding affixes fix(person): Improvements to Dutch name generation, in particular regarding affixes Jan 24, 2023
@ST-DDT ST-DDT linked an issue Jan 27, 2023 that may be closed by this pull request
10 tasks
@ST-DDT ST-DDT added needs rebase There is a merge conflict s: on hold Blocked by something or frozen to avoid conflicts and removed s: on hold Blocked by something or frozen to avoid conflicts labels Feb 2, 2023
@ST-DDT
Copy link
Member

ST-DDT commented Feb 2, 2023

#1637 has been merged. Please rebase.

@ST-DDT ST-DDT removed the s: on hold Blocked by something or frozen to avoid conflicts label Feb 2, 2023
@amarillion
Copy link
Contributor Author

Rebased onto next

matthewmayer
matthewmayer previously approved these changes Feb 5, 2023
@import-brain import-brain removed the needs rebase There is a merge conflict label Feb 5, 2023
Copy link
Member

@import-brain import-brain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put the last name list in alphabetical order

@amarillion
Copy link
Contributor Author

I've sorted the list of names ASCIIbetically. NB this is not the Dutch sorting order.

@import-brain
Copy link
Member

I've sorted the list of names ASCIIbetically. NB this is not the Dutch sorting order.

Thank you! @faker-js/maintainers Should we sort the names ASCIIbetically or the Dutch way?

@amarillion
Copy link
Contributor Author

amarillion commented Feb 6, 2023

Oh, and I misspoke - it's not ASCIIbetical (because that would imply case-sensitive sorting - Z before a). It's currently sorted with regular English case-insensitive sorting.

So, in a nice example of "be careful what you wish for" we have the following sorting options:

  1. case sensitive sorting (ASCIIbetical): Kok < Visser < de Leeuw
  2. case insensitive sorting (English): de Leeuw < Kok < Visser
  3. case insensitive sorting (Dutch): Kok < de Leeuw < Visser

The one that is implemented with the latest commit is #2. So, which sorting method should I use? Personally, I don't mind either way.

@ST-DDT
Copy link
Member

ST-DDT commented Feb 6, 2023

One of the reasons why we want the sorted lists is to aid in the review process to detect similar or identical entries faster.
So I think a case intensive sort is definitely required.
For the start, we should use (2) English locale. We just have to make sure that (our tests/scripts) this also works for non-English users.
@amarillion The "Dutch case insensitive sorting" applies only to names or is the alphabet itself sorted differently from the English one? Everything that is not easily applicable to the entire locale, is a non-goal.

@ST-DDT ST-DDT added the s: accepted Accepted feature / Confirmed bug label Feb 6, 2023
src/locales/nl/person/last_name.ts Outdated Show resolved Hide resolved
@amarillion
Copy link
Contributor Author

amarillion commented Feb 7, 2023

I've added two more sources for the family names.

Since the purpose of sorting is detecting duplicates, I think the current English sorting is fine and we can leave it at that.

@import-brain import-brain dismissed their stale review February 8, 2023 22:38

list is now sorted

@ST-DDT
Copy link
Member

ST-DDT commented Feb 8, 2023

I've added two more sources for the family names.

Well, you added more links, but you didn't add any new names, so it is basically still the same list from the first source in the list.
(And thus doesn't solve the legal implications)

ff4dee0 (#1778)

@amarillion
Copy link
Contributor Author

I don't get what you mean by legal implications. It's quite obvious that wikipedia doesn't own this list, and that the original source is in fact from https://www.meertens.knaw.nl/nfb/documenten/top100.pdf, as stated in the article.

A list of "top 100 most occurring family names in 2007" are facts, and facts are in fact not copyrightable.

@amarillion
Copy link
Contributor Author

However, if it helps I've made a small modification to the list so it's no longer exactly the same as the source.

@ST-DDT
Copy link
Member

ST-DDT commented Feb 9, 2023

A list of "top 100 most occurring family names in 2007" are facts, and facts are in fact not copyrightable.

IANAL: The facts itself might not be, but the work to compile it may be or at least, some claim it to be: https://blog.polco.us/copyright-apply-survey

Thanks for your efford to address this, but 1% differnce might not be enough.

@matthewmayer
Copy link
Contributor

Maybe go back to the original nl last_name list and just make some changes to add some more tussenvoegsel names, using your own knowledge/judgement rather than relying on a single source?

@amarillion
Copy link
Contributor Author

All right, I made a major modification to the list. It's now a fully original compilation of names, a random selection of various sources created by me personally. Any resemblance with any other list you might find is purely coincidental, of course lists of common names are bound to have overlap.

Maybe go back to the original nl last_name list and just make some changes to add some more tussenvoegsel names, using your own knowledge/judgement rather than relying on a single source?

Funnily enough, after adding the appropriate tussenvoegsels, nearly every name on the original list is also in the top 100 list. I only detected two differences: 'Klein', which I added, and 'Stichting', which isn't a family name.

@ST-DDT ST-DDT requested review from matthewmayer, import-brain and a team February 11, 2023 12:12
@ST-DDT ST-DDT merged commit fbd0db5 into faker-js:next Feb 12, 2023
@amarillion amarillion deleted the 1777-dutch-names branch February 13, 2023 08:06
@amarillion
Copy link
Contributor Author

Nice work guys, I appreciate your attention to detail!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
m: person Something is referring to the person module p: 1-normal Nothing urgent s: accepted Accepted feature / Confirmed bug
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Treatment of the "tussenvoegsel", the family name affix in the Dutch locale
4 participants