Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Establish a uniform way to clear all caches in a given module #80666

Open
serhiy-storchaka opened this issue Mar 30, 2019 · 14 comments
Open

Establish a uniform way to clear all caches in a given module #80666

serhiy-storchaka opened this issue Mar 30, 2019 · 14 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir tests Tests in the Lib/test dir type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

BPO 36485
Nosy @brettcannon, @rhettinger, @ezio-melotti, @voidspace, @serhiy-storchaka, @boxed, @animalize
PRs
  • bpo-36485: Add sys.clear_caches(). #12632
  • bpo-36485: Add the cachesreg module. #12639
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-03-30.11:56:59.909>
    labels = ['tests', '3.8', 'type-feature', 'library']
    title = 'Establish a uniform way to clear all caches in a given module'
    updated_at = <Date 2019-04-02.20:34:12.894>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2019-04-02.20:34:12.894>
    actor = 'rhettinger'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)', 'Tests']
    creation = <Date 2019-03-30.11:56:59.909>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36485
    keywords = ['patch']
    message_count = 14.0
    messages = ['339190', '339227', '339232', '339234', '339237', '339250', '339251', '339254', '339255', '339301', '339311', '339312', '339355', '339366']
    nosy_count = 7.0
    nosy_names = ['brett.cannon', 'rhettinger', 'ezio.melotti', 'michael.foord', 'serhiy.storchaka', 'Anders.Hovm\xc3\xb6ller', 'malin']
    pr_nums = ['12632', '12639']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue36485'
    versions = ['Python 3.8']

    @serhiy-storchaka
    Copy link
    Member Author

    Some modules have caches. There is a need to clear all tests before running tests. Brett proposed to add in all modules with caches a function with the standardized name which is responsible for clearing module related caches. [1]

    The proposed PR adds a new function clear_caches() in the sys module. It iterates all imported modules and calls function __clearcache__() if it is defined.

        def clear_caches():
            for mod in reversed(list(sys.modules.values())):
                if hasattr(mod, '__clearcache__'):
                    mod.__clearcache__()

    clear_caches() will be used in test.regrtest and can be used in user code. The PR defines also function __clearcache__ for modules which are cleared manually in the current code.

    This is a preliminary implementation, bikeshedding is welcome.

    [1] https://mail.python.org/pipermail/python-ideas/2019-March/056165.html

    @serhiy-storchaka serhiy-storchaka added 3.8 only security fixes stdlib Python modules in the Lib dir tests Tests in the Lib/test dir type-feature A feature request or enhancement labels Mar 30, 2019
    @rhettinger
    Copy link
    Contributor

    Not sure that I agree there is a testing need to clear all caches regardless of what they do. Test code should explicitly state whether it relies on a particular cache being cleared at some particular point in time.

    Also, the concept of "need to clear all caches" isn't well-formed. Would you want sys.intern caches to be cleared? What about the internal caches in SQLite?

    And do you think polling for a new magic attribute is the right approach? We would get looser coupling and better control by letting modules register themselves as needed.

    --- re.py ---

    sys.register_cache_clear_function(callback=purge, doc='pattern cache and re cache')

    --- ipaddress.py --

    sys.register(IPv4Address.is_private.is_getter.cache_clear, doc='check for private networks)

    @voidspace
    Copy link
    Contributor

    An auto-magic cache clearing mechanism is really tempting. I tend to agree with Raymond though, if code needs and progress a cache clearing mechanism it should be treated and accessible.

    They're are probably some problematic caches still within unittest however. Do test results still keep alive all tracebacks until test reporting?

    @voidspace
    Copy link
    Contributor

    On 30 Mar 2019, at 23:48, Michael Foord <report@bugs.python.org> wrote:

    Michael Foord <michael@voidspace.org.uk> added the comment:

    An auto-magic cache clearing mechanism is really tempting. I tend to agree with Raymond though, if code needs and progress a cache clearing mechanism it should be treated and accessible.

    • exposes (not progress)
    • tested (not treated)

    Sorry.

    They're are probably some problematic caches still within unittest however. Do test results still keep alive all tracebacks until test reporting?

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue36485\>


    @voidspace
    Copy link
    Contributor

    Tests codify knowledge about the system under test, so it doesn't matter that the test suite has to know how to clear caches. It's specifically a good thing that the test writer knows which caches exist and need clearing, and how to do it. The harder thing mighty be determining what scope to do the clearing (per test, class or module) bit unittest exposes hooks for fixtures at those points for anything that needs doing automatically.

    @animalize
    Copy link
    Mannequin

    animalize mannequin commented Mar 31, 2019

    I suggest the documentation be written in more detail.

    For example, in __clearcache__'s section, state explicitly that this magic function is for module-level cache, and it will be invoked by sys.clear_caches().

    Maybe also introduce the background: some caches may grow unlimitedly, sys.clear_caches() gives the user a chance to empty them.

    @serhiy-storchaka
    Copy link
    Member Author

    My initial idea was to add a lightweight module cachesreg with two functions: register() and clear_caches(). PR 12639 implements it.

    But I like Brett's idea more, because it is simpler. The only disadvantage of it is that if you make a typo in __clearcache__, this function will be silently ignored.

    I thought also about different levels of cachesreg.cachesreg.register() could take two arguments -- the level and the clearing function. Then cachesreg.clear_caches() could allow to clear only caches of specified level and smaller/larger.

    Both PRs add clearing callbacks only for modules which already cleared in regrtests. There are more caches, and with implementing any of these ideas it will be easier to add clearing caches in other modules.

    @boxed
    Copy link
    Mannequin

    boxed mannequin commented Mar 31, 2019

    I think this is a great idea. We would have needed this many times for tests over the years.

    @animalize
    Copy link
    Mannequin

    animalize mannequin commented Mar 31, 2019

    My initial idea was to add a lightweight module cachesreg with two functions: register() and clear_caches().

    If it only has two functions, it could be a sub-module sys.cachesreg

    Or a lifecycle module, as the name, dedicated to such kind of functions. Register callback functions for memory low, poweroff system, etc.
    I don't want lifecycle module, just provide a possibility.

    @brettcannon
    Copy link
    Member

    RE: "And do you think polling for a new magic attribute is the right approach?": my thinking behind that idea is that by standardizing the function name it's easy to tell if there's a cache, but you can also do away with the registration with a 3 lines of code. To me, the priority is clearing caches on a per-module basici and having a clear-all mechanism can be beneficial, not the other way around.

    @rhettinger
    Copy link
    Contributor

    [Brett]

    To me, the priority is clearing caches on a per-module basici
    and having a clear-all mechanism can be beneficial, not the
    other way around.

    That makes more sense.

    I'm changing the title to match the actual feature request and intent:

    "Add a way to clear all caches" -> "Establish a uniform way to clear all caches in a given module"

    @rhettinger rhettinger changed the title Add a way to clear all caches Establish a uniform way to clear all caches in a given module Apr 2, 2019
    @rhettinger
    Copy link
    Contributor

    Quick question, would the existing sys.reload() logic suffice?

    -- mymodule.py --

    cache = {}                  # On reload, this would clear the cache
    
    def f(x):
        if x in cache:
            return cache[x]
        y = x**2
        cache[x] = y
        return y

    @brettcannon
    Copy link
    Member

    Did you mean importlib.reload() instead of sys.reload()?

    And technically it would *if* you're okay with the other side-effects of reloading, e.g. making sure no one has a reference to any objects from the module's namespace which won't change in-place (e.g. if you stored a reference to the cache in some code then the reload wouldn't clear it for the stored reference).

    @rhettinger
    Copy link
    Contributor

    Did you mean importlib.reload() instead of sys.reload()?

    Yes.

    And technically it would *if* you're okay with the other
    side-effects of reloading,

    If you want to go forward with this, go for it. I would like to be able to explain to another person why this is needed, but personally can't visualize a circumstance where a person is testing module, doesn't know how to use the existing cache clearing APIs, but needs to clear caches (not sure why), and doesn't either know or want what happens on import. I've never seen this situation arise, but if it's something you want, I won't stand it the way.

    BTW, if you're going to have some sort of clear_all(), perhaps it should cover the sys.intern() dictionary and string hashes as well. AFAICT, there's nothing special about a regex cache that gives it a greater need to be cleared

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir tests Tests in the Lib/test dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants