BUG: Ensure SeedSequence 0-padding does not collide with spawn keys #16551

rkern · 2020-06-09T16:04:45Z

The implicit 0-padding that is done to small entropy inputs to make them the
size of the internal pool conflicts with the spawn keys, which start with an
appended 0.

In order to maintain stream compatibility with unspawned SeedSequences, we
explicitly 0-pad short inputs out to the pool size only if the spawn key is
provided, and thus would trigger the bug. This should minimize the impact on
users that were not encountering the bug.

Fixes #16539 The implicit 0-padding that is done to small entropy inputs to make them the size of the internal pool conflicts with the spawn keys, which start with an appended 0. In order to maintain stream compatibility with unspawned `SeedSequence`s, we explicitly 0-pad short inputs out to the pool size only if the spawn key is provided, and thus would trigger the bug. This should minimize the impact on users that were not encountering the bug.

seberg

Looks fine to me. Do we have to limit the number of spawned children to something like 2**32 or maybe 2**64:

In [6]: SeedSequence(42, spawn_key=(2**32,)).generate_state(4)                                                          
Out[6]: array([3908812709, 3341582407, 3454793571,  679120907], dtype=uint32)

In [7]: SeedSequence(42, spawn_key=(0,1)).generate_state(4)                                                             
Out[7]: array([3908812709, 3341582407, 3454793571,  679120907], dtype=uint32)

This may be a bit hypothetical, but just noticed that it is possible...

seberg · 2020-06-09T17:17:46Z

numpy/random/tests/test_seed_sequence.py

+    # Ensure that large integers are inserted in little-endian fashion to avoid
+    # trailing 0s.
+    ss0 = SeedSequence(42)
+    ss1 = SeedSequence(42 << 32)


Should this test (also) spawn a single child? Otherwise, does it actually test the trailing 0s, since the length would differ?

This one isn't related to #16539, just explicitly testing that we did design the conversion of large integers to uint32 words correctly. There were two obvious ways to do it, and the other (perhaps more-obvious) way would have also caused a problem.

The way we convert 42 << 32 into a uint32 array for consumption by SeedSequence converts it to [0, 42]. The other way to do it ("big-endian") would have converted it to [42, 0], which would mean that 42, 42 << 32, 42 << 64 and 42 << 96 all would generate the same states.

I relalize, I figured that in one case (if this was the "bad" version) this would be [42,] and in the other [42, 0] and I assume that this would still give a different result due to the hashing procedure being sensitive to trailing 0s as well.
But with the 0 appending from the spawn key, it becomes [42, ..., 0] in both cases, where ... is filled with 0s.
In any case, I don't mind, it is still a good test.

The problem is that the concatenated words of entropy (entropy and spawn_key, as returned by get_assembled_entropy()) will still get 0-padded to pool_size if it's shorter. It just happens implicitly.

numpy/numpy/random/bit_generator.pyx

Lines 359 to 361 in 7a82f14

# Our pool size is bigger than our entropy, so just keep

# running the hash out.

mixer[i] = hashmix(0, hash_const)

The hashing algorithm is only sensitive to different number of trailing 0s for trailing 0s beyond the pool_size. This is testing for a potential design flaw (which I avoided) even without the spawn_key.

seberg · 2020-06-09T17:50:28Z

One thing is whether we should add release notes for this. Which I guess means in this case we have to edit the 1.19 release notes directly, not sure how to best do t hat.

charris · 2020-06-09T18:03:50Z

If it has a release note it will be easy to pick up from doc/release/upcoming_changes/ in the 1.19.x branch.

rkern · 2020-06-09T18:10:05Z

Do we have to limit the number of spawned children to something like 2**32

Notionally, yes, we should prevent spawning if it would go past 2**32.

seberg · 2020-06-09T20:35:33Z

Yeah, its a different (and very corner case) issue in any case, thanks. I think we can just merge this, just would like to make you/Chuck a call on whether to put it into 1.19 and how the release note should be.

charris · 2020-06-10T16:09:06Z

Still needs a release note. It can be removed from master after backporting. There will also be a release note for the default rng change, so this isn't unique. The PR could also be made against the 1.19 branch and forward ported.

charris · 2020-06-10T17:00:56Z

That release note file will be overwritten, the real one is in the maintenance/1.19.x branch. I can work with this, but it will be easier to deal with if next time it is a file in doc/release/upcoming_changes, that way it will show up when backported instead of causing conflicts with the working file.

charris · 2020-06-10T17:23:21Z

Thinking it over, the best bet is probably to make the default rng change against maintenance/1.19.x and then we can forward port it to master.

rkern · 2020-06-10T17:54:58Z

Are you referring to the changes in this PR?

charris · 2020-06-10T18:08:05Z

Thanks Robert.

charris · 2020-06-10T18:47:50Z

@rkern I was thinking of the upcoming change of the default generator. But this was easy enough to backport, I just cherry-picked the first commit and cut and pasted the notes insert, so that works.

rkern · 2020-06-10T18:52:20Z

Thanks! Sorry for the faffing around.

… in numpy 1.19 (numpy/numpy#16551); fix some pytest warnings

charris added 00 - Bug 09 - Backport-Candidate PRs tagged should be backported component: numpy.random labels Jun 9, 2020

seberg reviewed Jun 9, 2020

View reviewed changes

DOC: Add release note.

e1db72c

charris merged commit 073316d into master Jun 10, 2020

charris mentioned this pull request Jun 10, 2020

BUG: Ensure SeedSequence 0-padding does not collide with spawn keys #16563

Merged

seberg deleted the fix/seed-sequence-zeros branch June 10, 2020 19:31

seberg restored the fix/seed-sequence-zeros branch June 10, 2020 19:31

seberg deleted the fix/seed-sequence-zeros branch June 16, 2020 14:07

charris removed the 09 - Backport-Candidate PRs tagged should be backported label Jun 20, 2020

cloudbopper added a commit to cloudbopper/anamod that referenced this pull request Oct 20, 2020

Update gold files after upgrading dependencies since SeedSequence fix…

fbb6087

… in numpy 1.19 (numpy/numpy#16551); fix some pytest warnings

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Ensure SeedSequence 0-padding does not collide with spawn keys #16551

BUG: Ensure SeedSequence 0-padding does not collide with spawn keys #16551

rkern commented Jun 9, 2020

seberg left a comment •

edited

seberg Jun 9, 2020

rkern Jun 9, 2020

rkern Jun 9, 2020

seberg Jun 9, 2020

rkern Jun 9, 2020

seberg commented Jun 9, 2020

charris commented Jun 9, 2020

rkern commented Jun 9, 2020

seberg commented Jun 9, 2020

charris commented Jun 10, 2020

charris commented Jun 10, 2020

charris commented Jun 10, 2020

rkern commented Jun 10, 2020

charris commented Jun 10, 2020

charris commented Jun 10, 2020

rkern commented Jun 10, 2020

	# Our pool size is bigger than our entropy, so just keep
	# running the hash out.
	mixer[i] = hashmix(0, hash_const)

BUG: Ensure SeedSequence 0-padding does not collide with spawn keys #16551

BUG: Ensure SeedSequence 0-padding does not collide with spawn keys #16551

Conversation

rkern commented Jun 9, 2020

seberg left a comment • edited

Choose a reason for hiding this comment

seberg Jun 9, 2020

Choose a reason for hiding this comment

rkern Jun 9, 2020

Choose a reason for hiding this comment

rkern Jun 9, 2020

Choose a reason for hiding this comment

seberg Jun 9, 2020

Choose a reason for hiding this comment

rkern Jun 9, 2020

Choose a reason for hiding this comment

seberg commented Jun 9, 2020

charris commented Jun 9, 2020

rkern commented Jun 9, 2020

seberg commented Jun 9, 2020

charris commented Jun 10, 2020

charris commented Jun 10, 2020

charris commented Jun 10, 2020

rkern commented Jun 10, 2020

charris commented Jun 10, 2020

charris commented Jun 10, 2020

rkern commented Jun 10, 2020

seberg left a comment •

edited