Optional fastmath optimizations via env var #290
Conversation
Awesome, very cool, thank you!
Surprising that the benchmarks don't do better, tbh. Do we know whether the initial results in #287 were an anomaly? I thought this might make up for #256 (though I note that your results on Intel are 0.30x vs 0.11x on my ARM...)
I was trying to think about whether it's possible to change the setting at runtime, so that we could add this as a parameter in the benchmarks rather than running the benchmark script twice.
I think it's possible, but not easy — it would require something like
Lines 79 to 96 in fa851cc
```python
@property
def target(self):
    if self._target_cpu:
        return "cpu"
    else:
        if _is_in_unsafe_thread_pool():
            logger.debug(
                "Numbagg detected that we're in a thread pool with workqueue threading. "
                "As a result, we're turning off parallel support to ensure numba doesn't abort. "
                "This will result in lower performance on parallelizable arrays on multi-core systems. "
                "To enable parallel support, run outside a multithreading context, or install TBB or OpenMP. "
                "Numbagg won't re-check on every call — restart your python session to reset the check. "
                "For more details, check out https://numba.readthedocs.io/en/stable/developer/threading_implementation.html#caveats"
            )
            self._target_cpu = True
            return "cpu"
        else:
            return "parallel"
```
numbagg/numbagg/test/test_benchmark.py
Lines 59 to 63 in 86ce31d
```python
@pytest.fixture
def clear_numba_cache(func):
    func.gufunc.cache_clear()
    yield
```
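The caching behavior that fixture works around is why changing the setting at runtime isn't free: the compiled gufunc is cached, so flipping an option only takes effect once the cache is cleared. A minimal stand-in using `functools.lru_cache` illustrates this (`FASTMATH` and `compile_count` here are illustrative names, not numbagg internals):

```python
from functools import lru_cache

# Illustrative stand-ins, not numbagg internals: a module-level setting
# and a counter so we can observe when "compilation" actually happens.
FASTMATH = False
compile_count = 0

@lru_cache(maxsize=None)
def gufunc():
    """Pretend to be an expensive numba compilation, cached like numbagg's."""
    global compile_count
    compile_count += 1
    return f"compiled(fastmath={FASTMATH})"

first = gufunc()      # compiles with FASTMATH=False
cached = gufunc()     # cache hit: no recompilation
FASTMATH = True       # flipping the setting alone does nothing...
stale = gufunc()      # ...the stale compilation is still served
gufunc.cache_clear()  # this is why the fixture calls cache_clear()
fresh = gufunc()      # now recompiles with the new setting
```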
How do tests do with the flag enabled?
I'll prospectively merge so we can test it some more.
Thank you very much @frazane!
@max-sixty purely guessing, but I think the problem with …
Overall I think this is an interesting area to explore, but the perf gains aren't that high or widespread. So let's leave this in and see whether we can find any that are: if there are cases where it's 5x faster, that totally changes the calculus on whether we try to promote this path for users.
With this PR, `fastmath` optimizations can be optionally enabled via a `NUMBAGG_FASTMATH` environment variable. A warning is always issued when it is used. Importantly, we don't simply use `fastmath=True`, but specify a set of flags that should not result in unsafe behavior, only possibly in reduced precision. The "no nans" and "no infs" fastmath flags are not used. See also the discussion in #287.

In my benchmarks I only observe a 2x performance improvement on `nansum` and `nanmean` (for double-precision floats; in micro-benchmarks I saw 4x for single precision), smaller performance improvements on `nanstd` and `nanvar`, and no noticeable differences, or even slightly worse results, on all other aggregations.

Closes #287
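A minimal sketch of the gating described above (the helper name and exact flag set are assumptions, not numbagg's actual code): numba's `@njit` accepts `fastmath` either as a bool or as a set of LLVM fastmath flag names, so the unsafe `nnan`/`ninf` flags can simply be left out of the set.

```python
import os
import warnings

# Assumed flag set: LLVM fastmath flags that relax precision but preserve
# NaN/inf semantics ("nnan" and "ninf" are deliberately excluded).
_SAFE_FASTMATH_FLAGS = frozenset({"nsz", "arcp", "contract", "afn", "reassoc"})

def fastmath_option(env=None):
    """Return the value to pass as numba's `fastmath=` compile option."""
    env = os.environ if env is None else env
    if env.get("NUMBAGG_FASTMATH", "").lower() in ("1", "true"):
        warnings.warn("NUMBAGG_FASTMATH is enabled: results may lose precision.")
        return set(_SAFE_FASTMATH_FLAGS)
    return False

# Usage (sketch, with numba installed):
#   @numba.njit(fastmath=fastmath_option())
#   def nansum(a): ...
```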
Tests
Tests are passing, including those for correctness, when `NUMBAGG_FASTMATH=true`, which should indicate it's safe to use. This also includes tests for large arrays (1'000'000 elements).

Benchmark: Linux system with 8 skylake-avx512 CPUs
With `NUMBAGG_FASTMATH=true`:

*(Benchmark table comparing against pandas, bottleneck, and numpy for `bfill`, `ffill`, and the `group_nan*`, `move_*`, `move_exp_*`, and `nan*` functions; numeric results not preserved.)*
With `NUMBAGG_FASTMATH=false`:

*(Same benchmark table with fastmath disabled; numeric results not preserved.)*