Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gerrychain.random sets random seed globally #364

Closed
pjrule opened this issue Aug 30, 2021 · 3 comments
Closed

gerrychain.random sets random seed globally #364

pjrule opened this issue Aug 30, 2021 · 3 comments
Assignees

Comments

@pjrule
Copy link
Contributor

pjrule commented Aug 30, 2021

gerrychain.random sets a default random seed (2018) globally:

random.seed(seed)

This is bad—for starters, the state of other packages could be affected without users immediately noticing.

@InnovativeInventor
Copy link
Member

It turns out this is a more serious problem than it looks at first glance (in other words, I encountered a real-world use case where this matters). In particular, when searching for seed plans, the recursive_seed_part algo occasionally gets stuck. This wouldn't be that bad of a problem if the RNG properly randomized itself. However, as the RNG reseeds itself on every import, it'll never get un-stuck in certain settings. To be clear, this scenario that I encountered is a two-parter and is a result of:

  1. recursive seed part gets stuck, sometimes
  2. the RNG resets itself on each import of GerryChain

This leads to recursive_seed_part being permanently stuck when used a certain way. I think the solution would be either to:
a) disable the default random seed setting function (do we really need this when tools like pcompress exist?)
b) randomly seed the RNG only for seed partition generation (this seems tricky to do)

@InnovativeInventor InnovativeInventor self-assigned this Mar 18, 2022
@InnovativeInventor
Copy link
Member

Ah, so it looks like my bug with recursive_seed_part is caused by this line at the top of tree.py:

from .random import random

peterrrock2 added a commit that referenced this issue Jan 12, 2024
Regarding the deletion of the random.py file: this is
a fix for issue #364.

The problem that the inclusion of the
random.py file was supposed to address was the issue of
reproducibility of Markov chain, and the idea was to set
the global random seed to 2018 at any point where we
would like to import the random module internally within
the gerrychain package. However, this approach also causes
the trees that are generated by the tree.py file to be
fixed if the user does not set the random seed after
the import of the gerrychain package. So an import pattern of

import random
random.seed(0)
import gerrychain
print(random.random())
print(random.random())

will output

0.5331579307274593
0.02768951210200299

as opposed to the expected

0.8444218515250481
0.7579544029403025

will actually force the random seed to be 2018 rather than the
expected 0. This can often cause issues in jupyter notebooks
where the user is not aware that the random seed has been forcibly
set to 2018 after the import of gerrychain. Instead, it is best
to allow to user to set the random seed themselves, and to not
forcibly set the random seed within the gerrychain package
since that can affect the execution of other packages and
can cause the chain to hang when the 2018 seed does not produce
a valid tree.

This issue does not appear if we remove the random.py file
and instead use the random module from the standard library
within the tree.py and accept.py files. This is because of
how python handles successive imports of the same module.
Consider the following snipit:

import random
random.seed(0)
import random
print(random.random())
print(random.random())

This will output

0.8444218515250481
0.7579544029403025

as expected. This is because the random module is only imported once
and then places its name in the internal list of imported modules.
Subsequent imports of the random module within the same python
session will not will simply retrieve the module from the list
and will not re-execute the code contained within the module.
Thus, the random seed is only set once and not reset when the
random module is imported again.

In terms of reproducibility, this means that the user will
be required to set the random seed themselves if they want
to reproduce the same chain, but this is a relatively
standard expectation, and will be required when we move
the package over to a rust backend in the future.
@peterrrock2 peterrrock2 mentioned this issue Feb 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants