Improve `#deterministically_verify` helper #2828

erichmachado · 2023-09-15T01:34:06Z

Motivation / Background

This Pull Request has been created because I was trying to fix a test using the #deterministic_verify helper with the random parameter set and saw that my test was failing with the default depth value (2).

I then realized a few issues with the current implementation:

The documentation does not reflect the implementation. It documents expecting an Integer value for random while it expects a Random instance internally.
It does not guarantee the same Random#rand iteration across depth iterations if a custom random value is set.
1. This makes any runs with depth > 1 fail the internal assertion due to different values being generated across depth iterations.
2. It's easy to verify this, just set random to a custom value on any existing deterministic_verify block on the codebase.
It does not protect the existing Faker::Config.random state from being mutated across #deterministic_verify runs.
1. It's also easy to verify this, just place a debugger call before and after each #deterministic_verify block and inspect the value of Faker::Config.random around it.

I've addressed those issues above by:

Updated the implementation to receive an Integer through a seed parameter instead of a Random instance through random.
1. The documentation was updated accordingly and also to reflect the default values.
2. The implementation was also simplified to define both default values on the method signature instead resolving internally to a default instance.
Ensuring that if seed is set then it will always use a new Random instance before calling subject_proc.
1. This ensures the same random#rand iteration for each internal depth iteration.
2. This is the same guarantee provided when random (before) / seed (now) is not set, thus making those iterations generate the same values (as expected).
Relying on MiniTest's Object#stub to prevent mutating the Faker::Config.random state inside depth iterations.

I've also included a few other minor improvements:

The internal assertions won't run against values generated on the same depth iteration anymore (this was a waste of cycles).
1. The assertions were comparing the generated instances against themselves for equality before this change, now they will only be compared against other instances.
2. This might help partially address Use of PRNG can cause pathological performance in 'bisect' tools #2720 also.
Removed the multi-line block chain from the implementation for clarity and to comply with linter rules instead of silencing it.
Replaced inject with map since there was (apparently) no benefit in using the earlier after the changes above.
Use yield instead of manually calling &block. The reason is that we get an automatic LocalJumpError if no block is provided, so it was possible to remove the manual raise from the implementation.

Additional information

Checklist

Before submitting the PR make sure the following are checked:

This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
- I consider it multiple changes, but with a similar scope, so I've decided to put all in one PR. Please let me know if multiple PRs are preferred.
- I do believe it's better to deliver it all in one go in this particular case.
Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
Tests are added or updated if you fix a bug, refactor something, or add a feature.
- I'm not sure if this requires a test or not; there were no tests currently - the existing test suite will rely on/exercise this code, though.
Tests and Rubocop are passing before submitting your proposed changes.

If you're proposing a new generator:

Open an issue first for discussion before you write any code.
Double-check the existing generators documentation to make sure the new generator you want to add doesn't already exist.
You've reviewed and followed the Documentation guidelines.

…en :random is provided

…:Config.random state on each :depth iteration

…ns on a generated value against itself

…helper implementation

…lementation

…thod signature

…erify helper method

stefannibrasil

This is a great contribution. Thank you @erichmachado for the detailed description.

We started using this helper more a few weeks ago, so probably that put this helper to real test. I liked how it reads much better now.

LGTM. I'd like to get a review from @thdaraujo as well.

stefannibrasil · 2023-09-29T23:48:13Z

test/test_helper.rb

-# @param depth [Integer] the depth of deterministic comparisons to run.
-# @param random [Integer] A random number seed; Used to override the default.
+# @param depth [Integer] the depth of deterministic comparisons to run; the default value is 2.
+# @param seed [Integer] A random number seed; Used to override the default value which is 42.


Oh, I like that name. It says more about it is.

erichmachado · 2023-10-03T14:29:42Z

This is a great contribution. Thank you @erichmachado for the detailed description.

We started using this helper more a few weeks ago, so probably that put this helper to real test. I liked how it reads much better now.

LGTM. I'd like to get a review from @thdaraujo as well.

Thank you for your time and for your review @stefannibrasil. No problem, let's wait for @thdaraujo's review as well. 🙇‍♂️

thdaraujo

Thank you @erichmachado for the great contribution and the in-depth description. I agree with most of your points, and I think this will greatly improve this helper method.

I also agree that we should prevent generators from changing the random instance, but I think we still want to catch those things. So I think instead of stubbing it, we should do something different. I left some questions and suggestions, let me know what you think!

Thanks!

thdaraujo · 2023-10-03T16:32:55Z

test/test_helper.rb

-  raise 'need block' unless block_given?
+def deterministically_verify(subject_proc, depth: 2, seed: 42)
+  results = depth.times.map do
+    Faker::Config.stub :random, Random.new(seed) do


My understanding is that you want to stub random here to guarantee that every iteration will use the same random instance.

I think the desired outcome is that generators should never override/change the random generator during their execution, and we want to have an easy way to assert that with deterministically_verify.

In my view, if we stub random here, I think we might not be able to catch generators that try to set a new value for random again during their execution, so deterministically_verify would allow them to pass. Does that make sense?

Maybe a different approach could be to set the random instance like you're doing it here. And them maybe stub the setter Faker::Config.random= to raise an error if any generator tries to override/change the random generator during the iteration.

Or, maybe an even simpler approach would be to check that the seed is the same before and after the iteration to make sure the generator never changes it. What do you think?

cc @stefannibrasil

Oh, that's a great point! +1 to check the seed is the same before and after the iteration.

Abut checking this on multiple threads: @erichmachado are you up for working on that as well? There are these tests that might be helpful: https://github.com/faker-ruby/faker/blob/main/test/test_default_locale.rb and https://github.com/faker-ruby/faker/blob/main/test/faker/default/test_faker_unique_generator.rb Otherwise, we can create a new issue for that. Thanks!

Hi @thdaraujo, thank you for your time and your review. Much appreciated!

I'll try to explain the rationale behind it below. Please, let me know if you still have questions.

My understanding is that you want to stub random here to guarantee that every iteration will use the same random instance.

Guaranteeing that every internal depth iteration has the same random sequence ("instance") is actually achieved by ensuring that each one of those iterations starts with a fresh Random instance initialized with the same seed value. This is sufficient to ensure that the RNG sequence will be replayed correctly on each cycle.

The motivation to iterate on depth on the initial implementation apparently was to ensure that the code under test was using a deterministic RNG algorithm, since iterating multiple times with the same seed should generate the same output in this case - and I believe this is sound.

The problem was that before we were using a memoized random instance whenever the random: parameter (which was renamed as seed: now) was set; and this was preventing the RNG from generating the same output across depth iterations, because it had already started the sequence in the previous cycles:

Faker::Config.random = random || Random.new(42)

Which we could have fixed with an update like this:

Faker::Config.random = Random.new(seed) || Random.new(42)

And if we assign 42 as a default value for seed on the method signature it could then be abbreviated to:

Faker::Config.random = Random.new(seed)

But here lies another issue: without stubbing, the Faker::Config.random state will be touched every time a depth iteration runs, which I believe to be an undesired behaviour since it is prone to leak that across specs (and that's actually what I was able to confirm via debugging).

Stubbing guarantees that every internal depth iteration will have access to its own Random instance without any side-effect to the global Faker::Config.random state, which addresses that issue.

For example, it's possible to verify that the Faker::Config.random state is leaking across specs with a simple check, as described on item 3 above (replicating it here for reference):

It does not protect the existing Faker::Config.random state from being mutated across #deterministic_verify runs.

It's also easy to verify this, just place a debugger call before and after each #deterministic_verify block and inspect the value of Faker::Config.random around it.

Ideally, just set the Faker::Config.random to a known value before a #deterministic_verify block call on an existing spec and check which value it will have after it.

I assume we don't want it to mutate just because the helper was called, even though we expect the code exercised inside the block to have access to the alternate RNG while running.

In my view, if we stub random here, I think we might not be able to catch generators that try to set a new value for random again during their execution, so deterministically_verify would allow them to pass. Does that make sense?

Based on the analysis above, I believe this change will add to the protection in this direction, but not for the generators (library) code but rather for the specs themselves.

If we want to ensure that library code is not mutating global state, then I recommend relying on a partial double assertion on Faker::Config.random= or an around(:each) catch-all spec instead, in order to guard against updates to Faker::Config.random - like you've mentioned above. 👍 (and I'm happy to help with another contribution there if you like)

In summary, both approaches would complement each other, from my understanding. One is protecting the Faker::Config.random from unintended mutation on the library code itself while the other is protecting against unintended mutations during spec runs.

oh yeah, now I get it. Thanks for the in-depth explanation!

I haven't considered this part when I was reading your proposal:

But here lies another issue: without stubbing, the Faker::Config.random state will be touched every time a depth iteration runs, which I believe to be an undesired behaviour since it is prone to leak that across specs (and that's actually what I was able to confirm via debugging).

Stubbing guarantees that every internal depth iteration will have access to its own Random instance without any side-effect to the global Faker::Config.random state, which addresses that issue.

And if you're interested in working on the guard against generators changing the Faker::Config.random, that would be super cool. Thanks!

thdaraujo · 2023-10-03T16:43:53Z

(by the way, this also reminds me that we should probably create a similar verifier for generators running on different threads that could verify they all return the same values for the same given random seed. But that's out of scope for this discussion. I think this would have helped when we were working on #1700 )

cc: @stefannibrasil

thdaraujo

this is great, thank you!

thdaraujo · 2023-10-07T00:28:50Z

test/test_helper.rb

-    results << subject_proc.call.freeze.tap(&block)
-  end.repeated_combination(2) { |(first, second)| assert_equal first, second }
-  # rubocop:enable Style/MultilineBlockChain
+  results.combination(2) { |(first, second)| assert_equal first, second }


nice catch! 😎

erichmachado added 6 commits September 14, 2023 21:18

Ensure that the #deterministically_verify helper works as expected wh…

2d69e18

…en :random is provided

Prevent the #deterministically_verify helper from mutating the Faker:…

b218d94

…:Config.random state on each :depth iteration

Prevent the #deterministically_verify helper from performing assertio…

3c21e19

…ns on a generated value against itself

Remove the multi-line block chain from the #deterministically_verify …

a06f8f3

…helper implementation

Replace #inject with #map on the #deterministically_verify helper imp…

afa915e

…lementation

Use yield instead of &block on the #deterministically_verify helper

48771fd

erichmachado force-pushed the em-deterministically-verify-helper-improvements branch from d345583 to 48771fd Compare September 15, 2023 01:52

erichmachado added 3 commits September 14, 2023 22:02

Move the default value of :random to the #deterministically_verify me…

68bf990

…thod signature

Provide :seed instead of :random instance to the #deterministically_v…

bebd0ac

…erify helper method

Update #deterministically_verify helper method documentation

f91d210

stefannibrasil approved these changes Sep 29, 2023

View reviewed changes

thdaraujo reviewed Oct 3, 2023

View reviewed changes

thdaraujo approved these changes Oct 7, 2023

View reviewed changes

thdaraujo changed the title ~~Some #deterministically_verify helper improvements~~ Improve #deterministically_verify helper Oct 7, 2023

thdaraujo merged commit 67e6511 into faker-ruby:main Oct 7, 2023
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve `#deterministically_verify` helper #2828

Improve `#deterministically_verify` helper #2828

erichmachado commented Sep 15, 2023 •

edited

stefannibrasil left a comment

stefannibrasil Sep 29, 2023

erichmachado commented Oct 3, 2023

thdaraujo left a comment

thdaraujo Oct 3, 2023 •

edited

stefannibrasil Oct 4, 2023

erichmachado Oct 4, 2023 •

edited

thdaraujo Oct 7, 2023 •

edited

thdaraujo commented Oct 3, 2023

thdaraujo left a comment

thdaraujo Oct 7, 2023

Improve #deterministically_verify helper #2828

Improve #deterministically_verify helper #2828

Conversation

erichmachado commented Sep 15, 2023 • edited

Motivation / Background

Additional information

Checklist

stefannibrasil left a comment

Choose a reason for hiding this comment

stefannibrasil Sep 29, 2023

Choose a reason for hiding this comment

erichmachado commented Oct 3, 2023

thdaraujo left a comment

Choose a reason for hiding this comment

thdaraujo Oct 3, 2023 • edited

Choose a reason for hiding this comment

stefannibrasil Oct 4, 2023

Choose a reason for hiding this comment

erichmachado Oct 4, 2023 • edited

Choose a reason for hiding this comment

thdaraujo Oct 7, 2023 • edited

Choose a reason for hiding this comment

thdaraujo commented Oct 3, 2023

thdaraujo left a comment

Choose a reason for hiding this comment

thdaraujo Oct 7, 2023

Choose a reason for hiding this comment

Improve `#deterministically_verify` helper #2828

Improve `#deterministically_verify` helper #2828

erichmachado commented Sep 15, 2023 •

edited

thdaraujo Oct 3, 2023 •

edited

erichmachado Oct 4, 2023 •

edited

thdaraujo Oct 7, 2023 •

edited