Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove randomness dependency on PYTHONHASHSEED #363

Closed
adamchainz opened this issue Jul 12, 2016 · 13 comments
Closed

Remove randomness dependency on PYTHONHASHSEED #363

adamchainz opened this issue Jul 12, 2016 · 13 comments
Labels

Comments

@adamchainz
Copy link
Contributor

We're using faker extensively in our test suite with factory-boy to generate dummy data. I love it! To help with repeatability in tests, I developed the pytest plugin pytest-randomly to allow re-using the random seed between test runs, so if they fail we can reuse the seed to get back the same random data from faker and debug the test.

Unfortunately we're moving from Python 2 to 3 and this includes activating PYTHONHASHSEED by default. We're currently trying to add it on Python 2 so that we're ready.

Since PYTHONHASHSEED means that dictionary iteration order is random, it means that anything that depends on this breaks. This is generally good as if you really depend on the iteration order, you should use a different data structure.

Unfortunately the way Faker generates some random data (specifically I've found random_element when using a dict) is affected by dictionary iteration order. This means that fixing the random seed for the random module isn't enough, and one would need to fix PYTHONHASHSEED too. What a pain! It's also quite hard to do since there isn't a way inside a python process to query what the value of PYTHONHASHSEED is, unless it was explicitly set as an environment variable as opposed to set to 'random'.

I think it would be better if Faker just didn't depend on dictionary iteration order, so its random data is also repeatable. This should be possible in most cases by swapping dict for OrderedDict.

@fcurella
Copy link
Collaborator

Thank for the report @adamchainz

I think random_element method is good the way it currently is.

The issue is with specific providers that declare weighted elements as dicts instead of OrderedDict. We need to audit and fix them.

Additionally, I think this is something that should be mentioned in docs, including examples and docstrings.

@fcurella fcurella added the bug label Jul 12, 2016
@adamchainz
Copy link
Contributor Author

Just realized OrderedDict is 2.7+ but Faker still supports 2.6. Lists of tuples is probably the way to go here then? Unless you want to add a backport dependency for OrderedDict from somewhere.

@fcurella
Copy link
Collaborator

I wouldn't exclude dropping support for 2.6, but I need @joke2k 's opinion on that.

@kevin-brown
Copy link

kevin-brown commented Jul 12, 2016

You could always introduce a Python 2.6 dependency on the ordereddict package and fall back to it.

OrderedDict for older versions of python

@adamchainz
Copy link
Contributor Author

Thanks @kevin-brown , probably the best solution

@fcurella
Copy link
Collaborator

FWIW, dropping support for python 2.6 would also allows us to stop including importing importlib in our requirements.

@jeffwidman
Copy link
Contributor

jeffwidman commented Aug 4, 2016

+1 for dropping 2.6. Anyone who wants to stay on 2.6 at this point is more likely just maintaining an app and not writing something new, so they can just use an old version of faker. Not true everywhere, I can think of several bigco's that are stuck on versions between 2.4-2.6 for major codebases, but they all have efforts underway to switch to 2.7.

adamchainz pushed a commit to adamchainz/faker that referenced this issue Aug 4, 2016
Ref joke2k#363, required so we can use `OrderedDict` without an extra dependency, plus it's super old and insecure.
adamchainz pushed a commit to adamchainz/faker that referenced this issue Aug 4, 2016
Ref joke2k#363, required so we can use `OrderedDict` without an extra dependency, plus it's super old and insecure.
fcurella pushed a commit that referenced this issue Aug 9, 2016
Ref #363, required so we can use `OrderedDict` without an extra dependency, plus it's super old and insecure.
fcurella added a commit that referenced this issue Sep 9, 2016
Address #363. Replace `dicts` with `OrderedDict`s
@fcurella
Copy link
Collaborator

fcurella commented Sep 9, 2016

I think this might have been fixed in #384. Leaving this open for now in case I've missined something. I will close next week is there's nothing wrong

adamchainz pushed a commit to adamchainz/faker that referenced this issue Sep 11, 2016
Only allow providers to use `OrderedDict`s, to avoid any more `PYTHONHASHSEED` problems. Ref joke2k#363.
@adamchainz
Copy link
Contributor Author

Awesome, thanks for the tedious PR. I started on one but there were so many dicts to convert, then I was away for a bit. One thing my local changes make is to ban dicts in random_element and only allow OrderedDicts, see #385.

fcurella pushed a commit that referenced this issue Sep 16, 2016
Only allow providers to use `OrderedDict`s, to avoid any more `PYTHONHASHSEED` problems. Ref #363.
adamchainz pushed a commit to adamchainz/faker that referenced this issue Sep 23, 2016
Only allow providers to use `OrderedDict`s, to avoid any more `PYTHONHASHSEED` problems. Ref joke2k#363.
adamchainz pushed a commit to adamchainz/faker that referenced this issue Sep 23, 2016
Only allow providers to use `OrderedDict`s, to avoid any more `PYTHONHASHSEED` problems. Ref joke2k#363.
adamchainz pushed a commit to adamchainz/faker that referenced this issue Sep 29, 2016
Only allow providers to use `OrderedDict`s, to avoid any more `PYTHONHASHSEED` problems. Ref joke2k#363.
adamchainz pushed a commit to adamchainz/faker that referenced this issue Sep 29, 2016
Only allow providers to use `OrderedDict`s, to avoid any more `PYTHONHASHSEED` problems. Ref joke2k#363.
fcurella pushed a commit that referenced this issue Jul 15, 2019
Only allow providers to use `OrderedDict`s, to avoid any more `PYTHONHASHSEED` problems. Ref #363.

# Conflicts:
#	faker/providers/__init__.py
#	faker/tests/__init__.py
@timschwab
Copy link

Today I was informed by the error message that the random_element provider does not accept a dict as an argument but instead requires an OrderedDict. That message sent me here, which is why I'm deciding to resurrect a long dead issue to ask if this constraint is still needed. As of Python 3.7, iteration over a dict is guaranteed to be in insertion order. This makes the two objects nearly identical. I would be happy to submit a PR to remove this restriction if you guys agree.

@fcurella
Copy link
Collaborator

@timschwab Thanks for pointing that out. Being able to use dict instead of OrderedDict sure sounds nice! Feel free to submit your PR!

@fdemmer
Copy link
Contributor

fdemmer commented Aug 28, 2023

@timschwab did you ever get around to writing that change?

@timschwab
Copy link

Hey @fdemmer unfortunately no, I got pulled away from the project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants