feat: locale aware sorting #2906

Closed


matthewmayer
Contributor

@matthewmayer matthewmayer commented May 17, 2024

Draft implementation for #2905

Uses the locale key to customize the sort order.

Requires Intl

The actual changes are in scripts/generate-locales.ts
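
As a rough illustration of the idea (not the code from scripts/generate-locales.ts), locale-keyed sorting could be sketched as below. The helper names `toBcp47` and `sortEntries` are hypothetical, and mapping folder-style keys such as `en_US` to BCP 47 tags by replacing `_` with `-` is an assumption; keys that are not valid language tags would need separate handling.

```typescript
// Hypothetical helper: Intl expects BCP 47 tags ("en-US"), while locale
// keys may use underscores ("en_US"). This mapping is an assumption.
function toBcp47(localeKey: string): string {
  return localeKey.replace(/_/g, '-');
}

// Hypothetical helper: returns a sorted copy of the entries, ordered by
// the collation rules Intl provides for the given locale key.
function sortEntries(localeKey: string, entries: readonly string[]): string[] {
  const collator = new Intl.Collator(toBcp47(localeKey));
  return [...entries].sort(collator.compare);
}

console.log(sortEntries('de', ['Zebra', 'Äpfel', 'Apfel']));
// [ 'Apfel', 'Äpfel', 'Zebra' ]
```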


netlify bot commented May 17, 2024

Deploy Preview for fakerjs ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 5137f05
🔍 Latest deploy log https://app.netlify.com/sites/fakerjs/deploys/6663fe670c9c200008549cd4
😎 Deploy Preview https://deploy-preview-2906.fakerjs.dev


codecov bot commented May 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.96%. Comparing base (567d66d) to head (5137f05).
Report is 2 commits behind head on next.

Additional details and impacted files
@@            Coverage Diff             @@
##             next    #2906      +/-   ##
==========================================
+ Coverage   99.95%   99.96%   +0.01%     
==========================================
  Files        2987     2987              
  Lines      216037   216038       +1     
  Branches      951      604     -347     
==========================================
+ Hits       215943   215966      +23     
+ Misses         94       72      -22     
Files Coverage Δ
src/locales/ar/commerce/department.ts 100.00% <100.00%> (ø)
src/locales/ar/date/month.ts 100.00% <100.00%> (ø)
src/locales/ar/date/weekday.ts 100.00% <100.00%> (ø)
src/locales/ar/vehicle/manufacturer.ts 100.00% <100.00%> (ø)
src/locales/ar/vehicle/model.ts 100.00% <100.00%> (ø)
src/locales/az/color/human.ts 100.00% <100.00%> (ø)
src/locales/az/commerce/department.ts 100.00% <100.00%> (ø)
src/locales/az/commerce/product_name.ts 100.00% <100.00%> (ø)
src/locales/az/company/prefix.ts 100.00% <100.00%> (ø)
src/locales/az/date/weekday.ts 100.00% <100.00%> (ø)
... and 208 more

... and 2 files with indirect coverage changes

scripts/generate-locales.ts Outdated Show resolved Hide resolved
@ST-DDT ST-DDT added p: 1-normal Nothing urgent c: locale Permutes locale definitions c: infra Changes to our infrastructure or project setup labels May 17, 2024
@ST-DDT ST-DDT added this to the v9.0 milestone May 17, 2024
@ST-DDT ST-DDT linked an issue May 17, 2024 that may be closed by this pull request
@matthewmayer
Contributor Author

I want to see if the team thinks this approach is desirable before getting too nitpicky with this PR.

@ST-DDT
Member

ST-DDT commented May 17, 2024

I don't see any drawbacks.

@matthewmayer
Contributor Author

Possible drawbacks

  • Most other tools are not locale-aware, e.g. if I select a bunch of lines in VS Code and sort them, I'll get a different order.
  • It won't work in environments with no Intl support.
  • Changes in ICU between different versions of Node might cause different sort orders.
  • Case-insensitive sorting makes it harder to spot items whose casing is inconsistent with other entries.
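
The first and last drawbacks can be seen in a few lines. This is only a sketch with a made-up word list, not data from the repository; it compares the default `Array.prototype.sort()` (UTF-16 code unit order) with an `Intl.Collator` for `en`.

```typescript
// Made-up sample data with mixed casing.
const words: string[] = ['cherry', 'Zebra', 'apple', 'Banana'];

// Default sort: compares UTF-16 code units, so uppercase letters come first.
const byCodeUnit = [...words].sort();

// Locale-aware sort: casing is only a tie-breaker, so mixed-case entries interleave.
const byLocale = [...words].sort(new Intl.Collator('en').compare);

console.log(byCodeUnit); // [ 'Banana', 'Zebra', 'apple', 'cherry' ]
console.log(byLocale); // [ 'apple', 'Banana', 'cherry', 'Zebra' ]
```

In the code-unit result the capitalized entries cluster at the top, which makes inconsistent casing easy to spot; in the locale-aware result they blend in.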

@ST-DDT
Member

ST-DDT commented May 18, 2024

Thanks for listing all the potential drawbacks.

  • Most other tools are not locale-aware, e.g. if I select a bunch of lines in VS Code and sort them, I'll get a different order.

True, but that is the case even without locale-aware sorting, as those tools sometimes treat upper- and lowercase differently. Words with suffixes are even worse, because the trailing ' messes up their order.

I consider this a low barrier of entry, as we require Node for building anyway, and Node should always come with Intl included AFAIK.
What do the others think?

@ST-DDT
Member

ST-DDT commented Jun 6, 2024

Team Decision

  • We would like to have this for v9.0
  • @matthewmayer Could you please continue this PR?

@matthewmayer matthewmayer marked this pull request as ready for review June 7, 2024 09:59
@matthewmayer matthewmayer requested a review from a team as a code owner June 7, 2024 09:59
@ST-DDT
Member

ST-DDT commented Jun 8, 2024

CI doesn't seem to pass. Please run pnpm run preflight.

@xDivisionByZerox
Member

xDivisionByZerox commented Jun 8, 2024

CI doesn't seem to pass. Please run pnpm run preflight.

I did run preflight on this branch and didn't see any changes 🤔

When I switched my node version to 22 the script emitted changes, which I find very interesting TBH.

Diff
diff --git a/src/locales/lv/commerce/department.ts b/src/locales/lv/commerce/department.ts
index 605dd1c1..22af106b 100644
--- a/src/locales/lv/commerce/department.ts
+++ b/src/locales/lv/commerce/department.ts
@@ -4,9 +4,9 @@ export default [
   'Auto',
   'Bakaleja',
   'Bērnu',
+  'Datoru',
   'Dārglietu',
   'Dārzkopības',
-  'Datoru',
   'Elektronikas',
   'Filmu',
   'Grāmatu',

@xDivisionByZerox xDivisionByZerox requested review from a team June 8, 2024 11:43
@matthewmayer
Contributor Author

I guess Node 22 ships a slightly different ICU version than Node 20, causing a different sort order in lv. That was one of the potential drawbacks I noted in #2906 (comment)

@matthewmayer matthewmayer added the do NOT merge yet Do not merge this PR into the target branch yet label Jun 8, 2024
@matthewmayer
Contributor Author

matthewmayer commented Jun 8, 2024

I'm flagging this as do not merge yet. I think it's quite bad if different users on different Node versions end up with different generated locale files.

Although there's only one example currently, there may well be more if you run normalization on Node 20 and 22 across all files, not just the modules that currently have normalization enabled. (I only have Node 20 installed locally at the moment and my Node 22 is borked, so I can't easily test this; perhaps @xDivisionByZerox you could try?)

@matthewmayer
Contributor Author

Note that Node will update ICU versions even within a major Node version

https://github.com/nodejs/node/blob/v20.0.0/tools/icu/current_ver.dep - 72.1
https://github.com/nodejs/node/blob/v20.14.0/tools/icu/current_ver.dep - 75.1

@matthewmayer
Contributor Author

nvm use 20.0.0
node
> 'Datoru'.localeCompare('Dārglietu', 'lv')
1
nvm use 20.14.0
node
> 'Datoru'.localeCompare('Dārglietu', 'lv')
-1
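
The same check can be run as a small script that also records which ICU version produced the result. This is a sketch, assuming it runs under Node (where `process.versions.icu` is available); no expected output is given because, as the session above shows, the result depends on the ICU data in use.

```typescript
// `process` is read via globalThis so the snippet also compiles without Node typings.
const icuVersion: string = (globalThis as any).process?.versions?.icu ?? 'unknown';

// Same comparison as in the REPL session: negative means 'Datoru' sorts first.
const comparison: number = 'Datoru'.localeCompare('Dārglietu', 'lv');

console.log(`ICU ${icuVersion}: 'Datoru' vs 'Dārglietu' in lv -> ${comparison}`);
```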

@ST-DDT
Member

ST-DDT commented Jun 8, 2024

I'm flagging this as do not merge yet. I think it's quite bad if different users on different Node versions end up with different generated locale files.

So do you think we should generally not sort in a locale-aware manner, or are you thinking about alternative solutions, e.g. explicitly importing a specific ICU version that we do not update during major versions?

@Shinigami92
Member

So do you think we should generally not sort in a locale-aware manner, or are you thinking about alternative solutions, e.g. explicitly importing a specific ICU version that we do not update during major versions?

I would still like to vote for locale-aware sorting.
Maybe we need to set our pipeline to node:22 for that specific check? (It already is...) What I mean is: we format it once and set .nvmrc to use node:22, or something like that.

@ST-DDT
Member

ST-DDT commented Jun 8, 2024

Is there a way to download the icu stuff as a dependency?

@matthewmayer
Contributor Author

So do you think we should generally not sort in a locale-aware manner, or are you thinking about alternative solutions, e.g. explicitly importing a specific ICU version that we do not update during major versions?

I would still like to vote for locale-aware sorting.

Maybe we need to set our pipeline to node:22 for that specific check? (It already is...) What I mean is: we format it once and set .nvmrc to use node:22, or something like that.

That wouldn't be sufficient, as the sort order can change even between minor versions of the same major Node release.
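
To make the point concrete: a pin would have to target the ICU version itself, not the Node major. A hypothetical guard might look like the sketch below. The constant `EXPECTED_ICU` and the helper `icuMatches` are made up for illustration (75.1 is simply the value reported for v20.14.0 earlier in this thread), and nothing like this exists in the PR.

```typescript
// Illustrative pin only; not a value used by the project.
const EXPECTED_ICU = '75.1';

// Hypothetical helper: true only when the runtime reports exactly the pinned ICU version.
function icuMatches(actual: string | undefined, expected: string): boolean {
  return actual === expected;
}

// `process` is read via globalThis so the snippet also compiles without Node typings.
const actualIcu: string | undefined = (globalThis as any).process?.versions?.icu;

if (!icuMatches(actualIcu, EXPECTED_ICU)) {
  console.warn(
    `ICU ${actualIcu ?? 'unknown'} differs from pinned ${EXPECTED_ICU}; locale-aware sort order may differ.`
  );
}
```

Even with such a guard, contributors on any other Node release would be unable to regenerate identical files, which is the concern raised above.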

@matthewmayer
Contributor Author

The more I read about it, the more I feel that the ICU data is too unstable to be implicitly used in snapshot tests, and is likely to cause irritating issues for new contributors in future. See, for example, nodejs/node#51090 for a similar issue, where changes in ICU data in a semver-minor version broke tests for many projects.

A naive Unicode sort, while nonsensical for some languages, is at least stable.
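
For illustration, a naive sort of this kind can be sketched as follows. It compares UTF-16 code units and does not consult ICU collation data at all, so the result does not depend on the Node or ICU version. The function name `compareCodeUnits` is made up; the sample entries are taken from the lv diff above.

```typescript
// Plain string comparison in JavaScript compares UTF-16 code units;
// no locale data is involved.
function compareCodeUnits(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

const departments: string[] = ['Dārglietu', 'Dārzkopības', 'Datoru', 'Bērnu'];
const sorted = [...departments].sort(compareCodeUnits);

console.log(sorted);
// [ 'Bērnu', 'Datoru', 'Dārglietu', 'Dārzkopības' ]
```

Here 'Datoru' precedes 'Dārglietu' only because 't' has a lower code unit than 'ā', which is the "nonsensical but stable" trade-off described above.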

@ST-DDT
Member

ST-DDT commented Jun 9, 2024

I checked the ICU code and it is C++, so we cannot easily depend on a fixed version AFAICT.

@matthewmayer
Contributor Author

I'll let you discuss in next meeting. My vote would be to abandon this.

@ST-DDT ST-DDT added the s: needs decision Needs team/maintainer decision label Jun 10, 2024
@xDivisionByZerox
Member

After reviewing the discussions in this PR, I would suggest dropping this feature.

A naive Unicode sort, while nonsensical for some languages, is at least stable.

I believe this is an excellent summary of the problem we face.

@ST-DDT
Member

ST-DDT commented Jun 11, 2024

Should we add a comment to the relevant code section to raise awareness of this?

@Shinigami92
Member

Labels
c: infra Changes to our infrastructure or project setup c: locale Permutes locale definitions do NOT merge yet Do not merge this PR into the target branch yet p: 1-normal Nothing urgent s: needs decision Needs team/maintainer decision
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check whether the locale data should use locale aware sorting
4 participants