Fix bug on phone number generator for en-US locale caused by incorrect .yml file structure #2924

Merged
merged 1 commit into from Mar 28, 2024

@aprescott aprescott commented Mar 27, 2024

Motivation / Background

PhoneNumber.cell_phone expects an i18n key of cell_phone.formats, but the en-US.yml file currently has faker.phone_number.cell_phone instead of faker.cell_phone.

The result of that mismatch is that cell_phone will ignore the defined formats, and potentially generate invalid US numbers (e.g. those with an area code beginning with 1) when the locale is en-US.

31d99d1 reworked YAML structure and appears to have inadvertently moved the key.

You can see the change in US number behavior in a console:

Faker::Config.locale = "en-US"
Faker::PhoneNumber.translate("faker.cell_phone.formats")

On 3.3.0 this returns

["###-###-####", "(###) ###-####", "###.###.####", "### ### ####"]

because it's the fallback value within the en (not en-US) locale file. The correct value should be the formats outdented in this commit.
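
The fallback described above can be illustrated with a minimal sketch. It uses plain nested hashes and a hypothetical `lookup` helper rather than Faker's real locale files or the I18n gem, and the en-US format string is a placeholder, so treat it as a model of the behavior, not the actual implementation:

```ruby
# Simplified stand-in for the locale data: plain nested hashes, not Faker's real files.
LOCALES = {
  'en' => {
    'faker' => {
      'cell_phone' => { 'formats' => ['###-###-####', '(###) ###-####'] }
    }
  },
  'en-US' => {
    'faker' => {
      # Misplaced: nested under phone_number instead of directly under faker.
      'phone_number' => {
        'cell_phone' => { 'formats' => ['<en-US specific format>'] }
      }
    }
  }
}.freeze

# Hypothetical lookup: try the requested locale first, then fall back to 'en'.
def lookup(key, locale, locales = LOCALES)
  path = key.split('.')
  locales.dig(locale, *path) || locales.dig('en', *path)
end

# 'en-US' has no faker.cell_phone key in this model, so the 'en' formats come back.
lookup('faker.cell_phone.formats', 'en-US')
```

Once the `cell_phone` entry sits directly under `faker` in the en-US data, the first `dig` succeeds and the fallback is never consulted.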

Fixes #2922 (although it's currently closed).

Additional information

See #2922 (comment).

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug, refactor something, or add a feature.
  • Tests and Rubocop are passing before submitting your proposed changes.

If you're proposing a new generator or locale:

  • Double-check the existing generators documentation to make sure the new generator you want to add doesn't already exist.
  • You've reviewed and followed the Contributing guidelines.

en_translations = I18n.translate('faker.cell_phone.formats', locale: :en, raise: true)
en_us_translations = I18n.translate('faker.cell_phone.formats', locale: Faker::Config.locale, raise: true)

refute_equal en_translations, en_us_translations

Contributor Author

I made up this test to try and capture the essence of the problem, but I'm open to writing the test in some other way if there's a better pattern to follow.

It seems that the root cause of the problem is that there was a .yml file with a structure different than the "schema" that's expected of all files. So in that sense this feels like almost the wrong thing to test because it's too isolated. I'm not familiar with the codebase though, and this seems to cover the broken case.

@thdaraujo thdaraujo Mar 28, 2024

Yes, I understand the intent, but I'm not sure this is the right test to add either.

I would love to see a test that could catch some obviously wrong en-US numbers. None of the existing tests caught this issue.

My understanding is that test_validity_of_phone_method_output was supposed to catch things like this, but it did not. It is also very hard to read due to the complex regex.

Maybe we could add a simpler test that does some straightforward sanity checks or rejects impossible area codes, etc.?

If not, that's fine. We should remove this test, and I'll open a separate issue specifically for improving the coverage of en-US phone numbers.
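
A sanity check along those lines could look roughly like the sketch below. `plausible_us_number?` is a hypothetical helper, not part of Faker, and it only checks one rule of the North American Numbering Plan: area codes and exchange codes start with a digit from 2 to 9. It assumes the input has no extension and will not catch every invalid number.

```ruby
# Hypothetical helper, not part of Faker: a coarse sanity check for US (NANP) numbers.
# Only rule checked: area code and exchange code must each start with 2-9.
def plausible_us_number?(number)
  digits = number.gsub(/\D/, '') # keep digits only
  # Drop a leading country code of 1 when present (e.g. E.164 input).
  digits = digits[1..] if digits.length == 11 && digits.start_with?('1')
  return false unless digits.length == 10

  area_code = digits[0, 3]
  exchange_code = digits[3, 3]
  area_code.match?(/\A[2-9]/) && exchange_code.match?(/\A[2-9]/)
end

plausible_us_number?('(212) 555-0123') # => true
plausible_us_number?('123-456-7890')   # => false, area code starts with 1
```

A test could call this on a batch of generated numbers and fail on the first implausible one, which is easier to read than a single large regex.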

Contributor Author

Since test_validity_of_phone_method_output only tests phone_number and not a method that calls cell_phone, it doesn't cover the broken case. A test like test_validity_of_phone_method_output against cell_phone would cover it, but maybe not deterministically: I think numbers generated from the fallback en cell_phone.formats might look just like those from en-US's cell_phone.formats, except when they have a leading 1. That might not be true though.

As it's unclear how exactly to test this, I'll go with removing my test from this PR!

Contributor Author

Removed!

@aprescott aprescott force-pushed the en-us-yml-cell-phones-fix branch 2 times, most recently from a36a8d8 to 24a4b19 Compare March 27, 2024 18:12
@aprescott

A short-term workaround until this is fixed is to explicitly provide the corrected .yml content in your application's i18n files:

en-US:
  faker:
    cell_phone:
      formats:
        - "#{PhoneNumber.area_code}-#{PhoneNumber.exchange_code}-#{PhoneNumber.subscriber_number}"
        - "(#{PhoneNumber.area_code}) #{PhoneNumber.exchange_code}-#{PhoneNumber.subscriber_number}"
        - "#{PhoneNumber.area_code}-#{PhoneNumber.exchange_code}-#{PhoneNumber.subscriber_number}"
        - "#{PhoneNumber.area_code}.#{PhoneNumber.exchange_code}.#{PhoneNumber.subscriber_number}"
        - "#{PhoneNumber.area_code}-#{PhoneNumber.exchange_code}-#{PhoneNumber.subscriber_number}"
        - "(#{PhoneNumber.area_code}) #{PhoneNumber.exchange_code}-#{PhoneNumber.subscriber_number}"
        - "#{PhoneNumber.area_code}-#{PhoneNumber.exchange_code}-#{PhoneNumber.subscriber_number}"
        - "#{PhoneNumber.area_code}.#{PhoneNumber.exchange_code}.#{PhoneNumber.subscriber_number}"
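
Since the original bug was a misplaced key, it may be worth checking that an override file has the expected nesting before relying on it. The sketch below uses only Ruby's standard `yaml` library on an abbreviated inline copy of the override (two formats instead of eight); where the file lives and how it is loaded in your application are not covered here.

```ruby
require 'yaml'

# Abbreviated copy of the override above. The quoted heredoc keeps the
# #{...} placeholders as literal text instead of interpolating them.
override = <<~'YML'
  en-US:
    faker:
      cell_phone:
        formats:
          - "#{PhoneNumber.area_code}-#{PhoneNumber.exchange_code}-#{PhoneNumber.subscriber_number}"
          - "(#{PhoneNumber.area_code}) #{PhoneNumber.exchange_code}-#{PhoneNumber.subscriber_number}"
YML

data = YAML.safe_load(override)

# The formats should sit at en-US -> faker -> cell_phone -> formats ...
formats = data.dig('en-US', 'faker', 'cell_phone', 'formats')
# ... and not under en-US -> faker -> phone_number -> cell_phone.
misplaced = data.dig('en-US', 'faker', 'phone_number', 'cell_phone')

raise 'cell_phone.formats missing or empty' if formats.nil? || formats.empty?
raise 'cell_phone is nested under phone_number' unless misplaced.nil?
```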

@theycallmeswift

LGTM! Thanks for investigating. I was banging my head against the wall on that mistagged release during bisect.

@thdaraujo thdaraujo changed the title Move en-US.yml's cell_phone entry to the correct position Fix bug on phone number generator for en-US locale Mar 28, 2024
@thdaraujo thdaraujo changed the title Fix bug on phone number generator for en-US locale Fix bug on phone number generator for en-US locale caused by incorrect .yml file structure Mar 28, 2024
@thdaraujo thdaraujo left a comment

nice catch, thanks for fixing this!

I left a suggestion about the test, let me know if it makes sense.

@aprescott

@thdaraujo thanks for the review! Let me know if I need to add a CHANGELOG entry or anything. Happy to make any other revisions.

@thdaraujo

Can confirm that this generates mostly valid numbers for en-US (only 1 fail in 100k).

Here's an example test run (based on @theycallmeswift 's reproduction script in #2922):

# frozen_string_literal: true

require 'bundler/inline'

gemfile(true) do
  source 'https://rubygems.org'

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  # gem 'faker', '3.3.0' # fails
  # gem 'faker', '3.2.3' # passes
  gem 'faker', '3.3.0.1-dev' # code from this PR
  gem 'phonelib'
  gem 'minitest'
end

require 'minitest/autorun'

Faker::Config.locale = 'en-US'

class BugTest < Minitest::Test
  def test_fail_on_first_bad_number
    @tester = Faker::PhoneNumber

    100_000.times do
      number = @tester.cell_phone_in_e164

      assert Phonelib.valid?(number), "Expected #{number} to be a valid phone number"
    end
  end

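  def test_numbers
    @tester = Faker::PhoneNumber

    # n is only used in the failure message; the sample below draws 10_000 numbers.
    n = 100_000

    # One true/false validity result per generated number.
    results = Array.new(10_000) { Phonelib.valid?(@tester.cell_phone_in_e164) }
    tally = results.tally

    # The message prints the valid count followed by the invalid count.
    assert results.all?, "Expected #{n} valid phone numbers, but only #{tally[true]} of #{tally[false]} valid"
  end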
end

__END__

Run options: --seed 46684

# Running:

FF

Finished in 101.336742s, 0.0197 runs/s, 570.4052 assertions/s.

  1) Failure:
BugTest#test_fail_on_first_bad_number [phone_test.rb:28]:
Expected +15056418096 to be a valid phone number

  2) Failure:
BugTest#test_numbers [phone_test.rb:40]:
Expected 100000 valid phone numbers, but only 9999 of 1 valid

2 runs, 57803 assertions, 2 failures, 0 errors, 0 skips
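
Note that the failure message in test_numbers interpolates `tally[true]` and then `tally[false]`, so "9999 of 1 valid" reads as 9,999 valid and 1 invalid out of the 10,000 numbers sampled. A clearer summary could be produced by something like the following sketch; `validity_summary` is a hypothetical helper, not part of the script above:

```ruby
# Hypothetical helper: summarise an array of true/false validity checks
# as "valid of total", using the actual sample size.
def validity_summary(results)
  valid = results.count(true)
  total = results.size
  "#{valid} of #{total} valid (#{total - valid} invalid)"
end

validity_summary([true, true, false]) # => "2 of 3 valid (1 invalid)"
```

Counting with `count(true)` also avoids a `nil` in the message when every number is valid and `tally[false]` is missing.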

@thdaraujo thdaraujo merged commit e15e606 into faker-ruby:main Mar 28, 2024
8 checks passed
@aprescott aprescott deleted the en-us-yml-cell-phones-fix branch March 28, 2024 13:42
@stefannibrasil stefannibrasil mentioned this pull request Apr 2, 2024
Successfully merging this pull request may close these issues.

Faker 3.3.0 generating invalid phone numbers
3 participants