Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

County method used with uniqueArray helper crashes faker. #2207

Closed
8 of 10 tasks
jviall opened this issue Jun 9, 2023 · 10 comments · Fixed by #2239
Closed
8 of 10 tasks

County method used with uniqueArray helper crashes faker. #2207

jviall opened this issue Jun 9, 2023 · 10 comments · Fixed by #2239
Labels
c: bug Something isn't working has workaround Workaround provided or linked m: helpers Something is referring to the helpers module s: accepted Accepted feature / Confirmed bug
Milestone

Comments

@jviall
Copy link

jviall commented Jun 9, 2023

Pre-Checks

Describe the bug

Using the county method with the uniqueArray completely crashes not just faker but my entire runtime. In my case, I discovered it in a React + Storybook project, where I switched from using country to using county, and my entire browser became unresponsive.

Attached is a stackblitz (based on the one linked in the docs), that does the same thing, comparing the country method to the county method using the uniqueArray helper, where generating some unique countries works fine, but counties do not.

Minimal reproduction code

https://stackblitz.com/edit/faker-js-demo-rqnnk9?file=index.ts

Additional Context

I personally use faker in a Vite + Storybook + React environment, using ES Modules targeting an ESNext build. However, it is reproduce-able (as shown in the stackblitz) in plain JS modules.

Environment Info

System:
    OS: macOS 13.2.1
    CPU: (10) arm64 Apple M2 Pro
    Memory: 63.45 MB / 32.00 GB
    Shell: 5.8.1 - /bin/zsh
  Binaries:
    Node: 18.16.0 - ~/Library/Caches/fnm_multishells/22781_1678232715369/bin/node
    Yarn: 3.4.1 - /usr/local/bin/yarn
    npm: 9.5.1 - ~/Library/Caches/fnm_multishells/22781_1678232715369/bin/npm
  Browsers:
    Chrome: 114.0.5735.106
    Firefox: 112.0.2
    Safari: 16.3
  npmPackages:
    @faker-js/faker: 8.0.2 => 8.0.2

Which module system do you use?

  • CJS
  • ESM

Used Package Manager

pnpm

@jviall jviall added c: bug Something isn't working s: pending triage Pending Triage labels Jun 9, 2023
@matthewmayer
Copy link
Contributor

There are not 20 different unique counties in the default en locale. Try using a different locale eg en_US or en_GB.

@matthewmayer
Copy link
Contributor

I wonder if uniqueArray should have some logic where when passed a method, it aborts if it has tried generating say 1000x the requested number of items in the array.

@ST-DDT
Copy link
Member

ST-DDT commented Jun 10, 2023

The array part is safe.

if (Array.isArray(source)) {
const set = new Set<T>(source);
const array = Array.from(set);
return this.shuffle(array).splice(0, length);
}

That can be used as a workaround.

But for the function input we have to add the max retry logic.

const set = new Set<T>();
try {
if (typeof source === 'function') {
while (set.size < length) {
set.add(source());
}
}
} catch {
// Ignore
}
return Array.from(set);

@ST-DDT ST-DDT added has workaround Workaround provided or linked s: accepted Accepted feature / Confirmed bug m: helpers Something is referring to the helpers module and removed s: pending triage Pending Triage labels Jun 10, 2023
@matthewmayer
Copy link
Contributor

per #1944 we should also do something about the https://github.com/faker-js/faker/blob/next/src/locales/en/location/county.ts file - either remove it (and have it throw an error), or replace it with more realistic data like https://github.com/faker-js/faker/blob/next/src/locales/en_US/location/county.ts or https://github.com/faker-js/faker/blob/next/src/locales/en_GB/location/county.ts

@jviall
Copy link
Author

jviall commented Jun 12, 2023

There are not 20 different unique counties in the default en locale. Try using a different locale eg en_US or en_GB.

Interesting, I assumed that the default locale included all possible locales. Could there be additional documentation explaining 1) the default locale's data and 2) how choosing a locale impacts the size of the dataset you'll generate from?

@ST-DDT
Copy link
Member

ST-DDT commented Jun 12, 2023

Could there be additional documentation explaining 1) the default locale's data

It's English. Not specific to any country. So you might get both color or colour.

  1. how choosing a locale impacts the size of the dataset you'll generate from?

IMO this is a hard one because it litterally generates data for the thing you selected. So en is any English. en_US is American English + America specific things and uses the general English stuff where no specifc values exists.
As for the quantity, we cannot set a lower limit because some countries only a handful of a kind (e.g. cities or counties). We have/plan for an upper limit of 500? elements per raw data list, excluding merged ones.

So instead of jugling with more precise and hard to maintain definitions I would rather try to fix issues such as these on a data level.
We could add a jsdoc tag on each method with their used data as a key such as location.county however that probably doesnt help find the respective source data and would need additional documentation, that needs additional referencing in order to be found. Dont get me wrong, I'm not against adding documentation. But adding good usable documentation is hard. If you have a suggestion, please let us know.

@jviall
Copy link
Author

jviall commented Jun 20, 2023

@ST-DDT I'm mostly asking for clarity on @matthewmayer's comment:

There are not 20 different unique counties in the default en locale. Try using a different locale eg en_US or en_GB.

I never tried this, but if it's true, why is it that when picking en there seem to be a smaller data set than specifying en_US or en_GB?

@matthewmayer
Copy link
Contributor

matthewmayer commented Jun 21, 2023

en_GB and en_US have a longer list of counties (real British counties and popular US county names)
https://github.com/faker-js/faker/blob/next/src/locales/en_GB/location/county.ts
https://github.com/faker-js/faker/blob/next/src/locales/en_US/location/county.ts

en only has 6 (a truncated list of British counties)
https://github.com/faker-js/faker/blob/next/src/locales/en/location/county.ts

this is a historical glitch which we havent got around to fixing yet. There's a lot of legacy data issues like this which are getting gradually cleaned up.

@xDivisionByZerox
Copy link
Member

After reading through this thread I'm still not sure how this would be solved. Should en/location/county be a combination of county values from en_GB and en_US or should it not exist at all?

@matthewmayer
Copy link
Contributor

I think it should probably just be the same as en_US as the en states are US states.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: bug Something isn't working has workaround Workaround provided or linked m: helpers Something is referring to the helpers module s: accepted Accepted feature / Confirmed bug
Projects
No open projects
Status: Todo
Development

Successfully merging a pull request may close this issue.

4 participants