
monkeypatch doesn't patch a module if it has already been imported by production code. #2229

Closed
JohanLorenzo opened this issue Feb 2, 2017 · 26 comments


JohanLorenzo commented Feb 2, 2017

Thank you for this great testing library 😃

If I understand #762 correctly, a patch should take effect even in already-imported modules when pytest's monkeypatch patches a module attribute, like this:

monkeypatch.setattr('pytest_bug.b.foo', foo_patch)

even if the module has been imported this way:

from pytest_bug.b import foo

I have a reduced test case in this repo: https://github.com/JohanLorenzo/pytest-bug-2229. The README contains instructions about how to run it.

I checked the contexts via PDB:

  • When you're in the test, foo is correctly patched.
  • However, when you're in the function you'd like to test, foo hasn't been changed.

Environment

  • Python 3.5.3 (On Arch Linux)
  • pip list: appdirs==1.4.0,packaging==16.8,py==1.4.32,pyparsing==2.1.10,pytest==3.0.6,pytest-bug==0.0.1,pytest-bug-2229==0.0.1,six==1.10.0
@nicoddemus
Member

Hi @JohanLorenzo,

That's a common gotcha with mocking in Python; it is not particular to monkeypatch but also affects other mocking libraries, including unittest.mock. I found an article that explains this common issue here: http://alexmarandon.com/articles/python_mock_gotchas — see the "Patching in the wrong place" section.
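To make the "patching in the wrong place" gotcha concrete, here is a self-contained sketch using hypothetical modules `a` and `b` (built in memory with `types.ModuleType` purely for illustration): `from b import foo` copies the binding into the consumer's namespace, so a patch must target the consumer, not the defining module.

```python
import sys
import types

# Simulate module b, which defines foo()
b = types.ModuleType("b")
b.foo = lambda: "real"
sys.modules["b"] = b

# Simulate module a, which does "from b import foo" and calls it
a = types.ModuleType("a")
a.foo = b.foo                    # what "from b import foo" effectively does
a.use_foo = lambda: a.foo()      # production code calling the local name
sys.modules["a"] = a

# Patching the defining module does NOT affect a's copy of the binding
b.foo = lambda: "patched"
assert a.use_foo() == "real"

# Patching where the name is looked up ("a.foo") does work
a.foo = lambda: "patched"
assert a.use_foo() == "patched"
```

With monkeypatch this corresponds to patching the attribute on the module that *uses* `foo` rather than the one that defines it.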

Closing this for now, but feel free to ask further questions!

@JohanLorenzo
Author

Okay, I'll stick to from pytest_bug import b from now on. Thanks for the link!

@nicoddemus
Member

No problem! 👍

@spearsem

spearsem commented Jul 8, 2019

@nicoddemus I'd like to add that the standard "patching in the wrong place" advice doesn't help if you need to patch a module-level variable before the module is imported, because setting that variable has a side effect at the moment the module itself is imported.

For example consider something like this:

# my_module.py
import os

if os.environ.get("MY_SETTING") is not None:
    important_variable = 'foo'
else:
    important_variable = 'bar'

def do_something():
    if important_variable == 'foo':
        return True
    return False

Suppose in my tests I want to monkeypatch the OS environment variable 'MY_SETTING' before any test imports my_module.py. I may even want to patch it in several different ways, importing my_module.py with different patched values that trigger different "start up" side effects.

For example, with a conftest.py similar to this:

# conftest.py
import pytest


@pytest.fixture(scope='session', autouse=True)
def apply_my_setting_for_tests():
    from _pytest.monkeypatch import MonkeyPatch
    m = MonkeyPatch()
    m.setenv('MY_SETTING', 'BAZ')
    yield m
    m.undo()

But this does not appear to actually apply the patching; apparently pytest collects and imports the module first, leaving no opportunity to trigger different on-import side effects.

@RonnyPfannschmidt
Member

@spearsem even if you want that, it's completely against the basics of the Python module system,
so it's a "good luck, figure out that hack yourself and support it yourself" kind of thing

@spearsem

spearsem commented Jul 8, 2019

@RonnyPfannschmidt how is it "against" the module system? This pattern is actually quite common in Python. For example, in scientific computing, your production model might be something very large loaded in memory, while a toy fixture model that loads quickly exists just for tests. You can't load both, and you can't load on demand per function, since even re-loading the small model per function call might be too slow. So you need to "load once", but the exact loading logic may depend on external factors. Often this way of managing loaded data sets is simply dictated to the programmer, e.g. by third parties, so needing to handle it while testing is not optional. You often cannot control whether this kind of side-effectful import happens; you just have to live with third-party code that foists it onto you.

@RonnyPfannschmidt
Member

@spearsem loading of specific datasets in those cases is done by invoking functions, not by reloading modules after changing env vars

@RonnyPfannschmidt
Member

Another valid approach is configuring and skipping tests on different test runs

@spearsem

spearsem commented Jul 8, 2019

@RonnyPfannschmidt what I am asking is more generic. Suppose you're given a third party module that has important side-effects that occur based on settings it automatically detects during import time. You can't change this, but you need your tests to run with different values for those autodetected, on-import settings. What do you do currently?

@RonnyPfannschmidt
Member

@spearsem

short term: run the test runner multiple times
long term: try to completely remove the need for those dependencies, as their global configuration is a liability

@nicoddemus
Member

Hi @spearsem,

@RonnyPfannschmidt is correct in the sense that unfortunately there's not much monkeypatch can actually do within standard Python rules. I guess it is possible to hack something together using reload instead, but that brings its own set of problems too.

If it is something you can change, I agree that the best long term solution is to stop doing module-level stuff that depends on global variables, as this is very brittle in the sense that it is hard to make sure nobody imports the module at the wrong moment.

The short term solution is to run the test runner multiple times, as @RonnyPfannschmidt suggests: you can use the testdir fixture with its runpytest_subprocess method. This way you get full control over when the module is actually imported. There are tons of examples in pytest's own test suite on how to use testdir. 👍
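For completeness, the reload route mentioned earlier can be sketched as follows. This is a hedged illustration that writes the thread's my_module example to a temporary file so the sketch is self-contained; in a real test you would use monkeypatch.setenv plus importlib.reload, with all the caveats about reload noted above.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# my_module mirrors the example earlier in the thread; it is written to a
# temporary file purely so this sketch is runnable on its own.
SRC = textwrap.dedent("""
    import os
    important_variable = 'foo' if os.environ.get('MY_SETTING') is not None else 'bar'

    def do_something():
        return important_variable == 'foo'
""")

os.environ.pop("MY_SETTING", None)          # start from a clean environment
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "my_module.py"), "w") as f:
    f.write(SRC)
sys.path.insert(0, tmp)

import my_module
assert my_module.do_something() is False    # MY_SETTING unset at first import

# In a test this would be monkeypatch.setenv("MY_SETTING", "1")
os.environ["MY_SETTING"] = "1"
importlib.reload(my_module)                 # reruns the module-level logic
assert my_module.do_something() is True
```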

@RonnyPfannschmidt
Member

My suggestion is to actually run pytest twice on the outside; pytester is more for testing pytest plugins, less for controlling pytest executions

@espears1

espears1 commented Aug 19, 2020

@RonnyPfannschmidt unfortunately, I don't think your advice would work for most mainstream, recommended, common workflows. For example, storing all config in the environment is one of the core principles of 12-factor apps.

It's extremely common to need to read os.environ and set module-level constants at import time. For example, it may be critical that a setting cannot change between function calls (so os.environ shouldn't be re-consulted on subsequent calls), and you may have module-initialization logic that has to run at import time and is customized by the state of the environment. Some use cases may have the luxury of replacing this with function calls or a clever use of lru_cache, but it's deeply unreasonable to say all use cases must refactor from first principles to do it that way.

This is especially common with third-party dependencies that can't be omitted or replaced, so the developer may have no control over whether checks of os.environ happen at import time. The advice to remove such dependencies in the long term would be literally intractable for the strong majority of users.

Instead, what is needed is a way for pytest to sandbox the import mechanism in tests, so that for the duration of the test, any uses of reload or manipulations of the sys.modules cache don't have a wider impact on other test cases, parallel test executors, etc., and can be cleaned up / restored after the fact.

@RonnyPfannschmidt
Member

12-factor does not preclude/change those details; the key question is: do you want easy-looking code or testable code?

Personally, I can't find any acceptable reason for not having a central, testable entry point for configuration over littering it all over the place, just to trade controlled dependencies for random globals.

My general personal experience with magical globals for configuration is, that it breeds horrible bugs that are hard to test and hard to debug in production.

If the mainstream opinion was that such a situation is fine, then it'd be another good example for the polemic phrase "the majority is always wrong".

@espears1

Reading from environ at import time is not in conflict with a central config. In the particular use case I am trying to work around in pytest, everything is contained in a single file called constants.py, where all config for the project is sourced. Some of it has to be read from the environment before we can import things like tensorflow, because tensorflow automatically detects things like hardware, optimized CPU instructions, and configured compilers that need to be configured correctly at import time.

@espears1

This is an illustrative example: tensorflow/tensorflow#1258 (comment)

@RonnyPfannschmidt
Member

That seems to me like an intentionally horrendous globals dependency in TensorFlow, which should be solved via a supporting pytest plugin that deals with that problem

@espears1

That's certainly one perspective you can take. For a huge group of other people who have used this type of pattern in scientific computing for decades, it's just "good software that works and uses environment variables smoothly", and it should be something any test runner can generically handle out of the box. There does seem to be a huge gulf between the two perspectives. But as I don't develop on tensorflow or the many other scientific tools that use this same pattern, I can't really say what their response would be; I can only imagine it would be equal in magnitude to yours but in the opposite direction, and based on at least as much professional software design experience.

@RonnyPfannschmidt
Member

That's why a plugin should contain that difference in design, which stems from different design goals driven by different target audiences

@nicoddemus
Member

Aside from preferences, this bears mentioning:

Instead, what is needed is a way for pytest to sandbox the import mechanism in tests, so that for the duration of the test, any uses of reload or manipulations of the sys.modules cache don't have a wider impact on other test cases, parallel test executors, etc., and can be cleaned up / restored after the fact.

This is far more difficult to accomplish than it appears at first glance, and it might bring its own host of problems given the heterogeneity of code bases out there.

This utility doesn't need to be implemented by pytest, mind you; the implementation could be pytest-agnostic and integrated with pytest via a plugin:

import pytest

def pytest_configure(config):
    config._store["import_sandbox"] = ImportSandbox()

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_setup(item):
    sandbox = item.config._store["import_sandbox"]
    sandbox.snapshot()
    yield

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_teardown(item):
    yield
    sandbox = item.config._store["import_sandbox"]
    sandbox.restore()

Or something along those lines.
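For illustration only, the hypothetical ImportSandbox above could be as simple as snapshotting and restoring sys.modules. This is a hedged sketch; as noted, a real implementation would be far harder, since it would need to cope with partially initialized modules, C extensions, and state that escapes the module cache.

```python
import sys

class ImportSandbox:
    """Hypothetical helper: discard imports made between snapshot() and restore()."""

    def __init__(self):
        self._saved = None

    def snapshot(self):
        # Copy the current module cache
        self._saved = dict(sys.modules)

    def restore(self):
        # Drop anything imported since the snapshot and put back replaced entries
        sys.modules.clear()
        sys.modules.update(self._saved)
```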

@graingert
Member

Instead, what is needed is a way for pytest to sandbox the import mechanism in tests

I think pytest-forked will do this for you; I actually use billiard.get_context("spawn") for this sort of thing in my projects.
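The fresh-process idea can be illustrated with just the stdlib (a hedged sketch; pytest-forked or a "spawn" multiprocessing context achieve similar isolation within a test run): a new interpreter imports everything from scratch, so import-time environment checks see exactly the value you set.

```python
import os
import subprocess
import sys

# Run a brand-new interpreter with a modified environment; any module it
# imports executes its import-time logic against that environment.
env = dict(os.environ, MY_SETTING="test-value")
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MY_SETTING'])"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
assert result.stdout.strip() == "test-value"
```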

@Emptyless

Emptyless commented Jul 10, 2023

edit: disclaimer, do not use this in production. This is highly experimental / POC and not intended to be used in any real world code.

Although this might go against the "patching in the wrong place" advice from the start of the issue, it is possible to patch builtins.__import__ from a top-level conftest.py and track the imports. For example, let's assume there is some module module_a with a function some_func; based on the Stack Overflow answer, patch __import__:

# test/conftest.py
import builtins
from unittest.mock import MagicMock

import pytest

patches = ['module_a.some_func']

def _import(name, *args, **kwargs):
    # Record every module that does "from module_a import some_func"
    if name == 'module_a' and len(args) > 3 and args[2][0] == 'some_func':
        patches.append(f"{args[0]['__name__']}.some_func")
    return original_import(name, *args, **kwargs)


original_import = builtins.__import__
builtins.__import__ = _import

@pytest.fixture(scope="function", autouse=True)
def some_func_mock(monkeypatch):
    def raise_unconfigured(*args, **kwargs):
        raise RuntimeError(f"unmocked module_a.some_func call for {args}, {kwargs}")

    _some_func_mock = MagicMock()
    _some_func_mock.side_effect = raise_unconfigured
    for patch in patches:
        monkeypatch.setattr(patch, _some_func_mock)
    return _some_func_mock

This is very rough and doesn't take into account from X import Y as Z style imports, etc., but it serves to illustrate how one might solve this. I feel this could very well be generalized and become part of the monkeypatch fixture in some style like:

monkeypatch.setattr(patch, _some_func_mock, globally=True)

Where globally=True (spelled that way since global is a reserved keyword) could be an optional parameter to denote that all previously imported references should also be patched.

@RonnyPfannschmidt
Member

It's a community disservice to post footguns of that level without appropriate warnings

@Emptyless

It's a community disservice to post footguns of that level without appropriate warnings

I've added a more explicit disclaimer. Do you also have feedback on the proposed design?

@RonnyPfannschmidt
Member

It's a partial solution that makes bad factorings less painful, and it will seriously break if any extra indirection is added at some point. I strongly recommend ensuring such a hack is never part of a main branch; the cleanup cost is abysmal.

@orenl

orenl commented Sep 19, 2024

@spearsem, @espears1, for cases that depend on os.environ, I find pytest hooks useful for controlling the environment prior to any loading (which removes the need to patch the module pre-load). For example:

import os
from typing import Dict

import pytest

@pytest.hookimpl()
def pytest_sessionstart(session: pytest.Session) -> None:
    test_env: Dict[str, str] = {
        'ENV_VAR_1': 'value 1',
        'ENV_VAR_2': 'value 2',
        ...
    }
    for key, val in test_env.items():
        os.environ[key] = val

As a side note, you may want to sanitize the environment of other variables, to ensure that tests get exactly the same environment everywhere (including in a docker setup, if applicable).
