Enable caching functions arising from exec of strings. #6406

Closed

Conversation

stuartarchibald
Contributor

As title.

@stuartarchibald
Contributor Author

This needs more thought about uniquely capturing a FunctionType instance.

  • globals
  • closures
  • anything else in .__code__ that is captured and can change.

The file path is also a bit strange and needs looking at.

It also needs tests.

@stuartarchibald stuartarchibald force-pushed the wip/cache_string_functions_2 branch 2 times, most recently from 93eaa15 to e0e0f14 Compare December 16, 2020 12:23
Comment on lines +338 to +340
for x in py_func.__code__.co_consts:
    buf.append(str(x))
const_bytes = (''.join(buf)).encode('UTF-8')
Member

Why not just do a repr(co_consts).encode()?
Calling str on individual elements and joining the results makes the following look the same:

  • (1, 1)
  • (11,)
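
To make the collision concrete, here is a small sketch. The helper names `joined_str_key` and `repr_key` are made up for illustration and are not taken from the PR:

```python
def joined_str_key(consts):
    # Mirrors the approach in the diff: str() each constant, then concatenate.
    return ''.join(str(x) for x in consts).encode('UTF-8')


def repr_key(consts):
    # Suggested alternative: repr() of the whole tuple keeps element boundaries.
    return repr(consts).encode('UTF-8')


a, b = (1, 1), (11,)
print(joined_str_key(a) == joined_str_key(b))  # True: both are b'11'
print(repr_key(a) == repr_key(b))              # False
```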

Contributor Author

@stuartarchibald stuartarchibald Dec 16, 2020

WIP :), will fix, thanks!

Contributor Author

From a vague memory, I think I put this in when messing about locally as it was an easy thing to trigger as part of the identifier.

for x in py_func.__code__.co_consts:
    buf.append(str(x))
const_bytes = (''.join(buf)).encode('UTF-8')
data = py_func.__code__.co_code + const_bytes
Member

This won't pick up changes in function name, function arguments, etc. Maybe consider all states in https://github.com/python/cpython/blob/66d3b589c44fcbcf9afe1e442d9beac3bd8bcd34/Objects/codeobject.c#L989-L1012

Might also need the function type signature as this does not store the value of default args.
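
As a rough sketch of what a broader key could look like (the helpers `_code_state` and `function_key` are hypothetical and not part of Numba; the choice of fields is an assumption, not the PR's implementation), one could combine more of the code object's state with the function's defaults:

```python
import hashlib
import types


def _code_state(code):
    # Collect code-object fields that affect behaviour. Nested code objects
    # (e.g. inner functions) are expanded recursively because their repr()
    # contains a memory address and is not stable across runs.
    consts = tuple(
        _code_state(c) if isinstance(c, types.CodeType)
        else (type(c).__name__, repr(c))
        for c in code.co_consts
    )
    return (
        code.co_name, code.co_argcount, code.co_kwonlyargcount,
        code.co_flags, code.co_code, consts, code.co_names,
        code.co_varnames, code.co_freevars, code.co_cellvars,
    )


def function_key(py_func):
    # Defaults live on the function object, not on the code object.
    # Assumption: the defaults have a stable repr() (true for ints, strings...).
    state = (
        _code_state(py_func.__code__),
        repr(py_func.__defaults__),
        repr(py_func.__kwdefaults__),
    )
    return hashlib.sha256(repr(state).encode('UTF-8')).hexdigest()


ns1, ns2 = {}, {}
exec("def f(x, y=1):\n    return x + y\n", ns1)
exec("def f(x, y=2):\n    return x + y\n", ns2)
f1, f2 = ns1['f'], ns2['f']

# The bytecode is identical, so a key built from co_code alone cannot
# distinguish the two functions; the default values are what differ.
print(f1.__code__.co_code == f2.__code__.co_code)  # True
print(function_key(f1) == function_key(f2))        # False
```

This deliberately leaves out free variables and globals, which need separate treatment.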

Contributor Author

thanks, I've been considering #6406 (comment) and generally poking at the contents of codeobject to see what can/cannot be easily worked out.

Contributor Author

My main concern here is freevars and globals, and wiring through enough data to achieve accurate and correct behaviour.
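
A minimal plain-Python illustration of why freevars are awkward for a key derived from the code object (the names `make_adder`, `add5` and `add7` are only for this example):

```python
def make_adder(n):
    def add(x):
        return x + n  # n is a free variable held in a closure cell
    return add


add5 = make_adder(5)
add7 = make_adder(7)

# Both closures share one code object, so anything computed from
# __code__ alone is identical for the two functions...
print(add5.__code__ is add7.__code__)      # True
print(add5.__code__.co_freevars)           # ('n',)

# ...while the captured values, and therefore the results, differ.
print(add5.__closure__[0].cell_contents)   # 5
print(add7.__closure__[0].cell_contents)   # 7
print(add5(3), add7(3))                    # 8 10
```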

Member

I am just not sure if we can handle freevars and globals as is. I am open to the idea of making cacheable functions handle freevars and globals differently than non-cacheable functions. I'm thinking of making them linked at load time.

@dlee992
Contributor

dlee992 commented Jul 15, 2022

Wow, I feel this will be a nice feature. SDC also uses many string-formatted jit functions via exec, and a single compilation can take > 10s, which is kind of unacceptable (PS: I'm not sure whether the lack of this caching feature is the main cause of the long compilation times in SDC).

Any updates on this PR?

Resolves conflicts in:
	numba/core/caching.py
	numba/tests/test_dispatcher.py
Comment on lines +1174 to +1182
f = mod.str_closure1
self.assertPreciseEqual(f(3), 6) # 3 + 3 = 6
f = mod.str_closure2
self.assertPreciseEqual(f(3), 8) # 3 + 5 = 8
f = mod.str_closure3
self.assertPreciseEqual(f(3), 10) # 3 + 7 = 10
f = mod.str_closure4
self.assertPreciseEqual(f(3), 12) # 3 + 9 = 12
self.check_pycache(5) # 1 nbi, 4 nbc
Contributor Author

Normally when the environment is rebuilt as part of cache replay the module for the function is loaded so as to provide globals etc. These tests are failing because there's no module associated with a str exec function and the function refers to a variable that is in the globals supplied to exec.
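
The situation can be reproduced without Numba. In this sketch (the source string and the global name `K` are invented for illustration), a function created by exec has no importable module, and its behaviour depends on the globals dict passed to exec:

```python
src = "def f(x):\n    return x + K\n"

g1 = {'K': 3}
g2 = {'K': 5}
exec(src, g1)
exec(src, g2)
f1, f2 = g1['f'], g2['f']

# Same source and same bytecode, but different results because K is
# looked up in the globals dict that was supplied to exec.
print(f1.__code__.co_code == f2.__code__.co_code)  # True
print(f1(3), f2(3))                                # 6 8

# There is no module to re-import on cache replay: the globals dict has
# no '__name__', and the "file" is just the placeholder '<string>'.
print(f1.__module__)             # None
print(f1.__code__.co_filename)   # <string>
print(f1.__globals__ is g1)      # True
```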

@stuartarchibald
Contributor Author

I think most of the test failures in the above are because the cache file presence check is looking in a __pycache__ based on the imported module containing the exec(str) functions, whereas the _StringSrcCacheLocator inherits from _UserWideCacheLocator which caches in the user-wide cache directory.

@github-actions

This pull request is marked as stale as it has had no activity in the past 3 months. Please respond to this comment if you're still interested in working on this. Many thanks!

@github-actions github-actions bot added the stale Marker label for stale issues. label Mar 21, 2023
@esc esc removed their request for review March 21, 2023 07:53
@stuartarchibald stuartarchibald removed the stale Marker label for stale issues. label Mar 21, 2023
@github-actions

This pull request is marked as stale as it has had no activity in the past 3 months. Please respond to this comment if you're still interested in working on this. Many thanks!

@github-actions github-actions bot added the stale Marker label for stale issues. label Jun 20, 2023
@github-actions github-actions bot added the abandoned - stale PRs automatically closed due to stale status label Jun 27, 2023
@github-actions github-actions bot closed this Jun 27, 2023
@gmarkall
Member

Reopening as there's still interest in this work.

@gmarkall gmarkall reopened this Jun 27, 2023
@github-actions github-actions bot removed abandoned - stale PRs automatically closed due to stale status stale Marker label for stale issues. labels Jun 28, 2023
@github-actions

This pull request is marked as stale as it has had no activity in the past 3 months. Please respond to this comment if you're still interested in working on this. Many thanks!

@github-actions github-actions bot added the stale Marker label for stale issues. label Sep 27, 2023
@github-actions github-actions bot added the abandoned - stale PRs automatically closed due to stale status label Oct 4, 2023
@github-actions github-actions bot closed this Oct 4, 2023
@dlee992
Contributor

dlee992 commented Apr 30, 2024

I would like to catch up with this abandoned PR. If I can find a workable solution based on it, I will open a PR to continue.

I feel this draft already works? Does it just need merging with main and fixing some corner cases?

Labels
2 - In Progress abandoned - stale PRs automatically closed due to stale status stale Marker label for stale issues.

5 participants