Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query] Broadcast with array of array byte streams #10766

Merged
merged 7 commits into from
Sep 28, 2021

Conversation

johnc1231
Copy link
Contributor

I ran into issues when broadcasting a very large struct of ndarrays for a huge linear regression, where the the total size was more than MAX_INT bytes. To solve this, I've changed SerializableRegionValue to use ArrayOfByteArrayOutputStream and ArrayOfByteInputStream, which create nested arrays of bytes instead of just one array, removing any maximum length issues.

PRing now for tests, also running benchmarks.

@tpoterba
Copy link
Contributor

tpoterba commented Aug 9, 2021

Woah, so fast!

I think this is going to have poor performance until you implement the batched readers writers that look like

    public void write(byte b[], int off, int len) throws IOException
    public int read(byte b[], int off, int len) throws IOException

@johnc1231
Copy link
Contributor Author

I agree that would be faster, but do we use those when we encode / decode broadcasted data? We should be if we aren't, but since it's working currently my suspicion is we aren't using them? Unless there's a default implementation where they work in terms of the single byte versions?

@johnc1231
Copy link
Contributor Author

The gist suggests if anything things got slightly faster after doing this, though admittedly the sentinels all got faster too.: https://gist.github.com/johnc1231/cb10f22dad676bb30da680b9e2178614

@tpoterba
Copy link
Contributor

This is really baffling. Maybe this code isn't really used in benchmarks in local mode? I would expect that every write/read from these buffers for SerializableRegionValue would go through the array/buffer read calls, because we use StreamBlockBuffer for the buffer spec:

  val wireSpec: BufferSpec = LEB128BufferSpec(
    BlockingBufferSpec(32 * 1024,
      LZ4SizeBasedBlockBufferSpec("fast", 32 * 1024,
        256,
        new StreamBlockBufferSpec)))

We won't call write/read on the output/input buffers except to write/read blocks.

@johnc1231
Copy link
Contributor Author

johnc1231 commented Aug 10, 2021

Maybe it's because OutputStream.write has a default implementation in Java?

    public void write(byte b[], int off, int len) throws IOException {
        if (b == null) {
            throw new NullPointerException();
        } else if ((off < 0) || (off > b.length) || (len < 0) ||
                   ((off + len) > b.length) || ((off + len) < 0)) {
            throw new IndexOutOfBoundsException();
        } else if (len == 0) {
            return;
        }
        for (int i = 0 ; i < len ; i++) {
            write(b[off + i]);
        }
    }

So that comes baked in using the single byte write method

@tpoterba
Copy link
Contributor

Maybe it's because OutputStream.write has a default implementation in Java?

Yeah, that default implementation should be WAY slower than an implementation that uses System.arrayCopy though. That's why I'm surprised you don't see performance regressions.

@johnc1231 johnc1231 marked this pull request as ready for review August 12, 2021 19:51
@johnc1231 johnc1231 added the WIP label Aug 12, 2021
@johnc1231 johnc1231 force-pushed the johnc-array-of-array-byte-streams branch 2 times, most recently from d5b52d8 to 87f60ac Compare September 3, 2021 20:03
@johnc1231
Copy link
Contributor Author

Ok, wrote the multi byte versions, running benchmarks.

@johnc1231
Copy link
Contributor Author

johnc1231 commented Sep 14, 2021

https://gist.github.com/johnc1231/7667f228d022c636d42924dbd181bc96

Seems like not a lot of change, though it's definitely trending slower instead of trending faster the way the earlier one did.

Copy link
Contributor

@tpoterba tpoterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one comment and ready to go

buf += new ByteArrayOutputStream(initialBufferCapacity)

protected var bytesInCurrentArray = 0
protected var currentArray = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of having a currentArray index, can we just have a currentBuilder: ByteArrayOutputStream? I think having to do a lookup by index in the BoxedArrayBuilder in buffer might be a performance hit, and the code will probably be simpler regardless.

@johnc1231 johnc1231 force-pushed the johnc-array-of-array-byte-streams branch from 87f60ac to d35e646 Compare September 28, 2021 16:14
@johnc1231 johnc1231 removed the WIP label Sep 28, 2021
@johnc1231
Copy link
Contributor Author

Addressed, want me to benchmark again?

@tpoterba
Copy link
Contributor

nah, ship it.

@danking danking merged commit 8cb5fb7 into hail-is:main Sep 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants