Combined parfor chunking and caching PRs. #7625
Conversation
…re calling set_parallel_chunksize.
…size, 2) set chunksize back to the default of 0 and then after the gufunc returns, restore the chunksize back to the previously saved value. This way, the current thread gets its default chunksize behavior inside the parallel region but goes back to its previous value when the region is over.
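The save/restore behaviour described above can be sketched in plain Python (illustrative only, not Numba's actual implementation; all names here are hypothetical stand-ins):

```python
_chunksize = 0  # module-level "current chunksize"; 0 means default behavior

def set_parallel_chunksize(n):
    """Set the chunksize and return the previous value (mirrors the PR's API)."""
    global _chunksize
    if n < 0:
        raise ValueError("chunksize must be non-negative")
    prev, _chunksize = _chunksize, n
    return prev

def run_gufunc_region(body):
    # 1) save the caller's chunksize, 2) reset to the default (0) for the
    # parallel region, then restore the saved value once the region is over.
    saved = set_parallel_chunksize(0)
    try:
        return body()
    finally:
        set_parallel_chunksize(saved)
```

Because the setter returns the previous value, the save/restore needs no separate getter, which matches the later change of making set_parallel_chunksize return the previous chunk size.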
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
More details on how actual chunksize can differ from specification. Moved code examples in docs to tests/doc_examples/test_parallel_chunksize.py. Export (g,s)et_parallel_chunksize from numba.np.ufunc. Fix withcontext parallel_chunksize doc string. Change set_parallel_chunksize to return previous chunk size. Use that return value to remove need for get_parallel_chunksize in some places. Raise exception if negative value to set_parallel_chunksize.
…, the full reduction array is passed to all gufunc workers. They each get their threadid to work on just their slice of the full reduction array. This simplifies some of the internal reduction code. This frees the reduction array length from any association with the size of the schedule.
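As an illustration of the layout described above (plain Python with hypothetical helper names, not Numba's gufunc machinery): every worker receives the full reduction array but writes only the slice indexed by its thread id, and the per-thread slices are combined after the parallel region.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, num_threads):
    # One partial-result slot per worker thread; the array length is tied
    # to the thread count, not to the size of the schedule.
    partial = np.zeros(num_threads, dtype=data.dtype)
    chunks = np.array_split(data, num_threads)

    def worker(tid):
        # Each worker reduces its chunk into its own slice of `partial`.
        partial[tid] = chunks[tid].sum()

    with ThreadPoolExecutor(max_workers=num_threads) as ex:
        list(ex.map(worker, range(num_threads)))
    # Final cross-thread combination happens outside the parallel region.
    return partial.sum()
```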
…e. Use the dynamic thread count when constructing the schedule so that the parallel=True function can be correctly cacheable.
…n array allocation.
The latest changes look good to me.
Thanks for the update @DrTodd13. I just noticed there's a near-duplicate cache test file name that needs addressing (see inline comment). I also reviewed all the outstanding queries that got lost in the long review and have commented on those; all are resolved with the exception of #7625 (comment), which is still of concern. I'm going to give this a run through the build farm now, on the basis that public CI will be sufficient to deal with the minor change resulting from merging the cache test files. Thanks again!
numba/tests/parfor_cache_usecases.py
Outdated
@@ -0,0 +1,37 @@
"""
I just noticed that there's another file called parfors_cache_usecases.py (the difference is the "s" after "parfor"). I think this file should be merged into that one and the corresponding cache test updated to reflect the change.
See IntelLabs#72
@@ -464,3 +466,70 @@ def _mutate_with_block_callee(blocks, blk_start, blk_end, inputs, outputs):
        block=ir.Block(scope=scope, loc=loc),
        outputs=outputs,
    )

class _ParallelChunksize(WithContext):
I think as it's private we can just move it later as needed.
@@ -648,6 +663,18 @@ def impl():
    return impl


@intrinsic
def _iget_num_threads(typingctx):
    _launch_threads()
parfor lowering could use get_num_threads, but I'm reluctant to add more typing queries into lowering; it makes things harder to debug.
static void
add_task(void *fn, void *args, void *dims, void *steps, void *data)
{
    add_task_internal(fn, args, dims, steps, data, 0);
}
Agree.
gufunc_txt += " " + param_dict[var] + \
    "=" + param_dict[arr] + "[" + gufunc_thread_id_var + "]\n"
Thanks for explaining.
get_num_threads = cgutils.get_or_insert_function(
    builder.module,
    llvmlite.ir.FunctionType(llvmlite.ir.IntType(types.intp.bitwidth), []),
    "get_num_threads")

num_threads = builder.call(get_num_threads, [])
current_chunksize = builder.call(get_chunksize, [])
I think we covered this in #7625 (comment)
@DrTodd13 I've opened IntelLabs#72 to address the duplication of test files, please could you take a look and if you approve merge in, many thanks.
RE the outstanding comment from: #7625 (comment), PR #8186 has 991a965, which removes the proposed unification method on the …
gpuci run tests (just running this as there are some changes to …
Refactor parfor cache tests to make use of existing code.
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
one unresolved comment: https://github.com/numba/numba/pull/7625/files#r904748398
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
A windows test failed:
It's strange that this should suddenly start failing. It looks like a minor numerical error, probably just from using reductions/accumulation in here: numba/tests/parfors_cache_usecases.py, lines 9 to 25 at 33e94b0.
I think the "fix" is to use …
…ly equal check for output of summations.
Thanks for all your work on this @DrTodd13!
smoketesting at BFID …
🎉
This replaces #6025 and #7522. There was overlap between these two PRs around using the dynamic thread count so rather than delaying the merge I went ahead and combined them.
This combined PR provides an API for selecting a parfor chunk size to deal with load balancing issues, and it eliminates all use of static thread counts in generated parfor code. Thus, parfor code (even with reductions) is now cacheable, and if you change the chunksize or thread count after reloading from cache, the new values are applied correctly.
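To illustrate how a chunk size changes the schedule (a hypothetical sketch, not Numba's actual scheduler): with the default chunksize of 0 the iteration space is split roughly evenly across the threads, while a positive chunksize hands out fixed-size chunks, which helps load balancing when iteration costs vary.

```python
def schedule(n_iters, num_threads, chunksize=0):
    """Return (start, stop) pairs covering range(n_iters)."""
    if chunksize == 0:
        # Default: one roughly equal chunk per thread.
        base, rem = divmod(n_iters, num_threads)
        sizes = [base + (1 if i < rem else 0) for i in range(num_threads)]
    else:
        # User-selected chunksize: hand out fixed-size chunks until the
        # iteration space is exhausted (the last chunk may be smaller).
        sizes, left = [], n_iters
        while left > 0:
            sizes.append(min(chunksize, left))
            left -= sizes[-1]
    bounds, start = [], 0
    for s in sizes:
        bounds.append((start, start + s))
        start += s
    return bounds
```

Note that with a positive chunksize the number of chunks is decoupled from the thread count, which is why smaller chunks let faster threads pick up extra work.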
Closes #2556
Closes #3144