Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArrayBuilder: replace shared with unique #1155

Merged
merged 20 commits into from Dec 21, 2021

Conversation

ianna
Copy link
Collaborator

@ianna ianna commented Nov 17, 2021

No description provided.

@ianna ianna marked this pull request as draft November 17, 2021 11:27
@codecov
Copy link

codecov bot commented Nov 17, 2021

Codecov Report

Merging #1155 (bce33ab) into main (9955401) will increase coverage by 1.53%.
The diff coverage is 83.23%.

Impacted Files Coverage Δ
src/awkward/_v2/_connect/pyarrow.py 83.33% <0.00%> (-0.48%) ⬇️
src/awkward/_v2/_reducers.py 96.48% <ø> (ø)
src/awkward/_v2/forms/recordform.py 66.46% <0.00%> (ø)
src/awkward/_v2/operations/convert/ak_to_numpy.py 100.00% <ø> (ø)
...c/awkward/_v2/operations/structure/ak_unflatten.py 80.00% <ø> (ø)
src/awkward/_v2/types/arraytype.py 80.76% <0.00%> (-0.72%) ⬇️
src/awkward/_v2/contents/unmaskedarray.py 56.27% <51.72%> (+0.82%) ⬆️
src/awkward/_v2/contents/indexedarray.py 59.75% <63.23%> (+0.44%) ⬆️
src/awkward/_v2/_typetracer.py 69.19% <63.58%> (-0.36%) ⬇️
src/awkward/_v2/contents/emptyarray.py 69.54% <74.19%> (+0.26%) ⬆️
... and 31 more

@ianna ianna force-pushed the ianna/array-builder-to-use-unique-pointers branch from f66aafb to 6608898 Compare December 1, 2021 08:18
@ianna
Copy link
Collaborator Author

ianna commented Dec 2, 2021

Just for the record - as of the last commit: a reference C++ test is the same as in #690 (without a snapshot call). It takes 235.00 ms compared with 16.97 s reported previously.
Screenshot 2021-12-02 at 17 45 33

@jpivarski
Copy link
Member

Just for the record - as of the last commit: a reference C++ test is the same as in #690 (without a snapshot call). It takes 235.00 ms compared with 16.97 s reported previously.

Wait, what? Replacing std::shared_ptr with std::unique_ptr is a 72× speedup? On what test?

I was remembering something much smaller, like a 50% improvement or something like that.

@ianna
Copy link
Collaborator Author

ianna commented Dec 2, 2021

Just for the record - as of the last commit: a reference C++ test is the same as in #690 (without a snapshot call). It takes 235.00 ms compared with 16.97 s reported previously.

Wait, what? Replacing std::shared_ptr with std::unique_ptr is a 72× speedup? On what test?

I was remembering something much smaller, like a 50% improvement or something like that.

There were two changes so far: removing temporary objects and moving shared pointers instead of copying them. The std::shared_ptr reference count is atomic, moving it is not and is hundred times faster than copying it. Besides, we were incrementing/decrementing the same pointer. I will need to run more tests and do further cleanup, but it looks like we may not need to go much further in eliminating the shared pointers :-)

Here is the test I used:

  ak::ArrayBuilder myarray(ak::ArrayBuilderOptions(1024, 2.0));

  for (int64_t i = 0; i < 10000000; i++) {

    // populate builder with lists
    myarray.beginrecord();
    myarray.field_check("one");
    myarray.boolean(true);
    myarray.field_check("two");
    myarray.integer(1);
    myarray.field_check("three");
    myarray.real(1.1);
    myarray.endrecord();

    myarray.beginrecord();
    myarray.field_check("one");
    myarray.boolean(false);
    myarray.field_check("two");
    myarray.integer(2);
    myarray.field_check("three");
    myarray.real(2.2);
    myarray.endrecord();

    myarray.beginrecord();
    myarray.field_check("one");
    myarray.boolean(true);
    myarray.field_check("two");
    myarray.integer(3);
    myarray.field_check("three");
    myarray.real(3.3);
    myarray.endrecord();
  }

@jpivarski
Copy link
Member

That's jaw-dropping.

I knew there was a difference, but if I had known it was such a big difference, we would have done this much earlier. Fortunately, though, there will be a version 1.7.0 out soon that we can use as a baseline for studies. If that was the bottleneck (and it's so large that it must be), then my specialized JSON thing (#1165) is probably unnecessary—discovering the type dynamically will probably be as fast as knowing the type ahead of time (because I wrongly thought that it was the type-querying that was slowing it down, rather than the std::shared_ptr). We'll see.

For reasons of keeping things module, the ForthOutputBuffer doesn't use ArrayBuilder's GrowableBuffer. It has its own std::shared_ptr<void> pointers:

https://github.com/scikit-hep/awkward-1.0/blob/ae58a75a9aa5cb7106d58a22ac7bd504657a74d2/include/awkward/forth/ForthOutputBuffer.h#L352

and the resize update is here:

https://github.com/scikit-hep/awkward-1.0/blob/ae58a75a9aa5cb7106d58a22ac7bd504657a74d2/src/libawkward/forth/ForthOutputBuffer.cpp#L748-L749

The LayoutBuilder, AwkwardForth, and SpecializedJSON won't see speedups until ForthOutputBuffer is updated in the same way as GrowableBuffer.

Actually, in this PR's diff, I don't see any changes to GrowableBuffer's std::shared_ptr<void>. What you've updated is the BuilderPtr, which is definitely related to the type-discovery overhead (an unnecessary overhead, as it turns out), not the fact that snapshots share data with builders. Wasn't there also a gain to be had from replacing the pointers to the data themselves with std::unique_ptr? That's why we have to change the snapshot logic to copy, rather than view—to allow the builder's buffers to be unshared pointers. Was I wrong in thinking that that was important? (That's what I thought was the ~50% gain, from a study you did about a year ago.)

@ianna
Copy link
Collaborator Author

ianna commented Dec 3, 2021

I think, it a bit premature to draw conclusions. The next commit drops shared pointer copies for the field_check and its relative execution drops from 27.2% to 7.3%

@ianna
Copy link
Collaborator Author

ianna commented Dec 9, 2021

@jpivarski - it looks like the failure is unrelated to this PR - the test fails in a clean area:

======================================================================== FAILURES ========================================================================
_________________________________________________________________ test_numpyarray_grad_3 _________________________________________________________________

    def test_numpyarray_grad_3():
        def func_numpyarray_3(x):
            return x[::-1]
    
        value_jvp, jvp_grad = jax.jvp(
            func_numpyarray_3, (test_numpyarray_jax,), (test_numpyarray_tangent_jax,)
        )
        jit_value = jax.jit(func_numpyarray_3)(test_numpyarray)
        value_vjp, vjp_func = jax.vjp(func_numpyarray_3, test_numpyarray_jax)
    
>       assert ak.to_list(value_jvp) == [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

tests/test_0793-jax-element-wise-ops.py:73: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
awkward/operations/convert.py:1020: in to_list
    return [to_list(x) for x in array]
awkward/operations/convert.py:1020: in <listcomp>
    return [to_list(x) for x in array]
awkward/operations/convert.py:1020: in to_list
    return [to_list(x) for x in array]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = DeviceArray(9., dtype=float64)

    def __iter__(self):
      if self.ndim == 0:
>       raise TypeError("iteration over a 0-d array")  # same as numpy error
E       TypeError: iteration over a 0-d array

../../../anaconda3/lib/python3.8/site-packages/jax/_src/device_array.py:245: TypeError
============================================================== 1 failed, 17 passed in 4.05s ==============================================================

@jpivarski
Copy link
Member

@jpivarski - it looks like the failure is unrelated to this PR

Yes, I think something happened in a new JAX version. @ctrl-stormy will be replacing the JAX implementation next spring—it will be based on a different strategy (bottom-up, rather than top-down)—so I think we can disable these tests. I've already done that in #1183, though I didn't get it cleanly in any one commit.

If you add

pytestmark = pytest.mark.skip(
    reason="Top-down JAX tests disabled; to be replaced by bottom-up."
)

to tests/test_0645-from-jax.py, tests/test_0645-jax-refcount.py, tests/test_0645-to-jax.py, and tests/test_0793-jax-element-wise-ops.py, immediately after the imports and before the jax = ... lines (as in 5b0a273), then it will merge cleanly with the modification I made to do the same.

I don't know if these tests will be reinstated as-is when the JAX bottom-up implementation is done or if that implementation will have different tests, but this will at least turn them off for now without losing memory that they're a to-do item. Before releasing Awkward 2.0, we'll sweep through all the skipped tests, reinstating as many as we can, and we'll be reminded of this one at that time.

@ianna
Copy link
Collaborator Author

ianna commented Dec 10, 2021

As it is now, GrowableBuffer uses unique_ptr instead of shared_ptr. Builders return nullptr instead of shared_from_this in some cases - the overall time went down from 17.25 s to 10.91 s10.46 s (in the next commit), but the needed extra check did introduce some cost and I'm working on it.

The test run in a clean area on the left, the same test with this PR on the right.

Screenshot 2021-12-10 at 16 34 36

@jpivarski
Copy link
Member

Okay, it's significant (factor of 2), but not gigantic (factor of 10). In that case, it's still valuable to write specialized readers for various formats (like #1165: in its sample task, this shared_ptrunique_ptr improvement would take 40 seconds to 20 seconds (estimated), and the specialized JSON with shared_ptr takes it to 10 seconds—removing shared_ptr there might not make as much of a difference because I don't think we're incrementing/decrementing those pointers (worth trying anyway), and RapidJSON parsing by itself can't get any faster than 6 seconds).

More importantly, we know why: the atomic increment/decrement (shared_ptr) or null check (unique_ptr) on every step is the speedbump, but it's a necessary speedbump because it has to check to see if the tree is changing. With specialized readers, the set of buffers/tree does not change. The specialized JSON reader has to check JSON format at every step to see if it needs to complain about wrong formats (despite the disclaimer, it does some checking), but binary formats don't even do that—they just assume the spec and go. In that case, all we have as a bottleneck is Forth VM overhead, which is minimized by reducing the number of instructions ("~5 ns/instruction").

@ianna ianna force-pushed the ianna/array-builder-to-use-unique-pointers branch from 7522a74 to b39300c Compare December 20, 2021 15:09
@ianna
Copy link
Collaborator Author

ianna commented Dec 20, 2021

Hmm... this is strange (MacOS and py310):

================= 1770 passed, 68 skipped, 1 warning in 53.38s =================
Fatal Python error: Segmentation fault

Current thread 0x000000011185edc0 (most recent call first):
  Garbage-collecting
  <no Python frame>
/Users/runner/work/_temp/3e675f7f-b1be-4603-bf91-708afe92f1cf.sh: line 1: 11033 Segmentation fault: 11  python -m pytest -vv -rs tests
##[error]Bash exited with code '139'.
Finishing: Test

@ianna ianna marked this pull request as ready for review December 21, 2021 09:13
@ianna ianna requested a review from jpivarski December 21, 2021 09:13
@ianna ianna merged commit 6cdbdc4 into main Dec 21, 2021
@ianna ianna deleted the ianna/array-builder-to-use-unique-pointers branch December 21, 2021 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants