Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend cache invalidation to include changes in other files (take 2) #8396

Open
wants to merge 66 commits into
base: main
Choose a base branch
from

Conversation

luk-f-a
Copy link
Contributor

@luk-f-a luk-f-a commented Aug 25, 2022

This PR extends the cache invalidation mechanism to detect changes in functions imported from other files and used in the main function. It does that by including the timestamp of the file where any called function resides.

This PR is still a draft.

This approach can deal with the problem described in #8375 (comment)
Implemented:

  • invalidation when a dispatcher calls another dispatcher, including tests
  • invalidation when a cfunc calls another dispatcher, including tests
  • invalidation when a ufunc call another ufunc, including tests
  • invalidation of user-defined overloads, including tests
  • user documentation
  • developer documentation

Notes for reviewer:
The key idea is to extend the mechanism that already exists to invalidate the cache when the file of the main function changes, to invalidate the cache the file of secondary functions change. For each file identified, the FileStamp will be saved in the cache. Those changes can be found in caching.py (methods save, load and related methods).
Calculating the FileStamp is easy, the larger changes in this PR are related to figuring which functions (are therefore files) are dependencies of the main function. At the center of that task is the function get_function_dependencies in caching.py. This function uses the typemap information to identify which functions are called by the main function. After each secondary function is identified, then they are queried to retrieve those other functions on which the secondary functions might depend. This querying is done against a cache_deps_info method in the function-like object (dispatcher, cfunc, overload, etc).
The process is iterative, but it is cached to avoid recalculating the dependencies every time.
The above describes the "saving" part of the caching, but not the "loading" part of the caching. Since the saving part relies on typing information which is not available during loading, the loading part works exclusively with the FileStamps which were stored in the cache. Each file identified as a dependency will be checked against the stored FileStamp. Any mismatch invalidates the cache.

Limitations (included in the documentation):

  • functions overloaded without the overload decorator.
  • numba-inlined functions
  • guvectorize, due to problems with JITLibrary
  • parfors

Notes:
When cfunc calls another cfunc, or a dispatcher calls a cfunc, the cache does not work. Therefore it's not necessary to take into account in the invalidation.
Invalidation does not work with guvectorize, which uses JITLibrary, which do not have type_annotation information
parfors: functions like 'sum__parallel__impl____locals____sum__1__v2__val.2' not found in typemap. There's a workaround to ignore those cases, but invalidation does not work.

@esc
Copy link
Member

esc commented Aug 7, 2023

As release manager, I have decided that this isn't mature enough to be ready for the upcoming 0.58 release (next week) and as such I am moving this into our backlog

@esc esc modified the milestones: Numba 0.58 RC, 0.59.0-rc1 Aug 7, 2023
@dlee992
Copy link
Contributor

dlee992 commented Aug 9, 2023

Hi, does anyone have time to review this? I can give my preliminary review first, since this patch is really useful! Having it can avoid our pains when we use numba intensively. cc @stuartarchibald @guilhermeleobas

@guilhermeleobas
Copy link
Contributor

guilhermeleobas commented Aug 9, 2023

Hi @dlee992, I believe this PR is waiting on a second review from Stuart. But feel free to give your comments as well.

@dlee992
Copy link
Contributor

dlee992 commented Aug 9, 2023

Got it. I tested it locally, really cool to have it!

deps[py_file] = get_source_stamp(py_file)
# retrieve all dependencies of the function in fc_ty
indirect_deps = get_deps_info(fc_ty, sig)
deps.update(indirect_deps)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @luk-f-a . After testing, I met an error in this line:

  File "numba/core/dispatcher.py", line 1012, in compile
    self._cache.save_overload(sig, cres)
  File "numba/core/caching.py", line 705, in save_overload
    self._save_overload(sig, data)
  File "numba/core/caching.py", line 714, in _save_overload
    deps_filestamps = get_function_dependencies(data)
  File "numba/core/caching.py", line 826, in get_function_dependencies
    deps.update(indirect_deps)
TypeError: 'NoneType' object is not iterable

Sry, I can't provide a concrete testcase for this. But I think we should at least add a None check for indirect_deps to avoid this error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error means get_deps_info could return None at a certain situation.

if getattr(sys, 'frozen', False):
st = os.stat(sys.executable)
else:
st = os.stat(filepath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this line should also be combined with a try catch, since filepath could not exist, then os.stat() will throw an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A potential case is that when two threads are running:

  1. Thread 1 generates some files, which are saved as metadata in caching;
  2. Thread 2 does some cache cleaning, which removes the generated files by Thread 1;
  3. Thread 1 uses the legacy metadata in os.stat(filepath), ah, an error happens.

@dlee992
Copy link
Contributor

dlee992 commented Aug 25, 2023

Ah, I found another important bug about this PR, besides my comments above. will give a reproducer later. It seems failed to recognise one kind of local import into a cache file's dependencies.

Updates: turned out this PR doesn't work with structuref and/or structref's overload_method defined in another file. It can't detect structref's overload_method's change.

Updates again: I think this PR also needs to handle overload_method, which is an instance of BoundFunction and not handled by this PR. Need to Clarify that this PR doesn't handle this, or add the implementation for it! @luk-f-a

@dlee992
Copy link
Contributor

dlee992 commented Aug 28, 2023

With this #9168, we can add BoundFunction (i.e., overload_method's type template) in dep_types in caching.py. Then add some implementations for handling this type, then we can detect overload_method related cache invalidation as well!

I can contribute this feature to this PR, but how to do this on Luk's branch? just clone it and push to here?

@luk-f-a
Copy link
Contributor Author

luk-f-a commented Aug 30, 2023

@dlee992 thanks for your comments. I'll look into it as soon as possible.

@dlee992
Copy link
Contributor

dlee992 commented Aug 31, 2023

@luk-f-a You're welcome. I can give a gist of my local patch for adding overload_method to dependency analysis.

BTW, a @njit(inline="always") function (called by other njit function) doesn't show up in the dependency graph, is it expected?

here is my impl for caching: https://gist.github.com/dlee992/517b9fa1ff86d3b3c7524e85918e7b5a, just FYI. Oh, note: must also apply the changes in #9168

@dlee992
Copy link
Contributor

dlee992 commented Apr 25, 2024

I found another bug after using my patch to support overload_method in this dependency graph.

Need another patch to fetch these infos back.

BTW, can I directly push my patches into this PR? Didn't do this before, and not sure how to do this.

Perhaps, I can continue to maintain this PR if anyone is interested.

@dlee992
Copy link
Contributor

dlee992 commented Apr 26, 2024

Yeah, I made a patch to support overload_method correctly for constructing "complete" cache dependency graphs.
Not sure how to push into this PR, or if no one objects, I can open a new PR to push this forward?

The test case is:

""""""
import os
from functools import partial

from numba import njit, types
from numba.extending import overload_method

from study.numba.test_ol_method import add
from study.numba.test_ol_method2 import minus
from study.numba.test_ol_method3 import float_minus

os.environ["NUMBA_DEBUG_CACHE"] = "1"
# os.environ["NUMBA_DEBUG_TYPEINFER"] = "1"

njit_cache = partial(njit, cache=True)
ol_method_cache = partial(overload_method, cache=True)

# 1st way
@njit_cache
def fake_minus(a, b):
    return minus(a, b)

@njit_cache
def fake_float_minus(a, b):
    return float_minus(a, b)

@ol_method_cache(types.Integer, "minus")
def ol_minus(a, b):
    if isinstance(b, types.Integer):
        return fake_minus.py_func

    if isinstance(b, types.Float):
        return fake_float_minus.py_func

@njit_cache
def foo(a, b, d):
    e = a.minus(b)
    f = a.minus(d)
    return e, f

print(foo(3, 2, 2.0))

and define minus and float_minus in other files:

@njit_cache
def minus(a, b):
    return a - b

Currently, this PR can't handle this case. ex: foo didn't know it depends on minus defined in another file test_ol_method2.

@guilhermeleobas
Copy link
Contributor

BTW, can I directly push my patches into this PR? Didn't do this before, and not sure how to do this.

You can create a branch based on Lucio's branch and push as a new pull request, or just submit your patch and add a comment saying it needs #8396 to be merged first.

I'd like to put this one up for discussion after the 0.60 release. Since it is being opened for a while, it would be good to have some closure on this one.

@luk-f-a
Copy link
Contributor Author

luk-f-a commented May 1, 2024

hi @dlee992 thanks for making improvements to the PR. I have to be realistic and unfortunately I won't be able to do further work on this PR.
@guilhermeleobas you are right. I think this PR is a valuable addition but back in 2022 it was not possible to get it reviewed and merged due to other, higher priorities for the core devs. I hope the work does not get lost and it becomes useful at some point in the future. I don't have time to work on it, but it's fine for me if anyone else wants to take over.

@dlee992
Copy link
Contributor

dlee992 commented May 24, 2024

Thanks! @luk-f-a .
However, after 2nd thought, I prefer we should stick to this PR, since

  1. this one has already supported jit and overload decently.
  2. this one has already been pretty big, adding more feature into this is not realistic, and possibly unfriendly to reviewers.

If this is reviewed and finally merged, I can separately open a small PR to extend overload_method feature.

Thoughts?

@luk-f-a
Copy link
Contributor Author

luk-f-a commented May 26, 2024

@dlee992 what you describe about limiting scope makes a lot of sense. Are you planning to take over the work with the current scope and work the PR to completion?

@dlee992
Copy link
Contributor

dlee992 commented May 28, 2024

@luk-f-a. I can work on parts of this current PR, I mean if someone gave some review comments, I can work on them if I understand the original code changes. But I need to admit that I don't fully understand all of them. But will try.

Another issue that is delaying this forward is the concern from #9316. There still has an unfinished discussion if numba should use per-file cache rather than per-function cache.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants