
Optimise parsing incoming payload #683

Merged: 2 commits merged into grpc:master on Dec 2, 2019

Conversation

@travikk (Contributor) commented Nov 19, 2019

It seems redundant to copy the whole byteSource before parsing it.

In our case, with format: binary and a payload size of 50 MB, this creates a lot of overhead. In our tests, that slice adds roughly 10 extra seconds of processing time for a 50 MB payload.

I’m keen to hear your opinions about this change.
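
For context, the change under discussion is roughly the following (a sketch of the call site in grpcwebclientreadablestream.js; byteSource is the incoming Uint8Array):

    // Current behaviour: copy the whole Uint8Array into a plain Array before
    // handing it to the stream parser. For a 50 MB payload this copy is costly.
    var messages = self.parser_.parse([].slice.call(byteSource));

    // Proposed: skip the copy and pass the bytes through directly.
    var messages = self.parser_.parse(byteSource);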

@linux-foundation-easycla (bot) commented Nov 19, 2019

CLA Check
The committers are authorized under a signed CLA.

@stanley-cheung (Collaborator) commented Nov 19, 2019

There's an error when running blaze test javascript/net/grpc/web/...

ERROR: /tmpfs/src/github/grpc-web/javascript/net/grpc/web/BUILD.bazel:152:1: Compiling 128 JavaScript files to javascript/net/grpc/web/grpcwebclientbase_test_bin.js failed (Exit 1)
javascript/net/grpc/web/grpcwebclientreadablestream.js:140: ERROR - actual parameter 1 of module$contents$grpc$web$GrpcWebStreamParser_GrpcWebStreamParser.prototype.parse does not match formal parameter
found   : Uint8Array
required: (Array<number>|ArrayBuffer|string)
    var messages = self.parser_.parse(byteSource);
                                      ^
  ProTip: "JSC_TYPE_MISMATCH" or "checkTypes" can be added to the `suppress` attribute of:
  //javascript/net/grpc/web:grpcwebclientreadablestream
  Alternatively /** @suppress {checkTypes} */ can be added to the source file.

1 error(s), 0 warning(s), 97.3% typed

You probably need a type cast there.
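
For example, one way past the checker (a sketch, not necessarily the exact fix that was merged) is to hand the parser something its current signature already accepts, or to force the type with an unchecked cast:

    // Option A: pass the underlying ArrayBuffer, which matches the declared
    // (Array<number>|ArrayBuffer|string) parameter type.
    var messages = self.parser_.parse(byteSource.buffer);

    // Option B: keep passing the Uint8Array and silence the checker with an
    // unchecked Closure cast (only safe if parse() really handles Uint8Array).
    var messages = self.parser_.parse(
        /** @type {!ArrayBuffer} */ (/** @type {?} */ (byteSource)));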

@stanley-cheung merged commit 5c65dc4 into grpc:master on Dec 2, 2019
@travikk (Contributor, Author) commented Dec 4, 2019

@stanley-cheung brilliant, thanks for your time on this! Do you think you guys could make a release any time soon which would include this fix?

@stanley-cheung (Collaborator)

@travikk Actually, there is some test breakage after incorporating this change. I am still trying to figure out what's wrong (the tests are internal, so the breakage wasn't discovered when the Kokoro tests ran for this PR). So there is still some work to do here. Will keep you posted.

@stanley-cheung (Collaborator)

Some partial findings on the test failure:

  • When we switch to passing byteSource.buffer into the stream parser (instead of passing a copied Array version of byteSource, which is a Uint8Array), I get new flaky test failures (they fail about 5-15 times per 20 runs).

  • After a lot of debugging, I found that the byteLength property of byteSource.buffer does not actually match byteSource.byteLength!

  • byteSource is a Uint8Array, and byteSource.buffer is its underlying backing ArrayBuffer. In this particular test case, byteSource comes from a call to googCrypt.decodeStringToUint8Array.

  • Looking at the googCrypt.decodeStringToUint8Array source code, I realized that it first allocates the underlying ArrayBuffer with an approxByteLength (https://github.com/google/closure-library/blob/master/closure/goog/crypt/base64.js#L342), which can be larger than the actual size of the data. Once the actual size is known (the variable outLen), a subarray of that length is returned.

  • The problem is that the returned Uint8Array has a length of outLen, but the underlying backing ArrayBuffer still has a byteLength of approxByteLength. In our parser's parse function, if the passed-in input is an ArrayBuffer, we create a Uint8Array view of it, and that view now has a length of approxByteLength, which is not good: the parser reads past the meaningful data. (Hence the flakiness; sometimes we get lucky and the extra bytes are harmless, but sometimes they cause the parser to hang.)

So I am actually not sure what the solution here should be. The previous approach works because [].slice.call(byteSource) copies exactly outLen (i.e. the correct number of) bytes into the new Array. But we can't just pass byteSource.buffer, because that backing ArrayBuffer may actually be longer.
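
To illustrate the mismatch with made-up sizes (not values from the actual test):

    // decodeStringToUint8Array over-allocates the backing buffer, then
    // returns a subarray of the real decoded length.
    var approxByteLength = 64;                     // over-estimate
    var backing = new Uint8Array(approxByteLength);
    var outLen = 50;                               // actual decoded size
    var byteSource = backing.subarray(0, outLen);

    console.log(byteSource.length);                // 50
    console.log(byteSource.buffer.byteLength);     // 64 -- larger than the data

    // Re-wrapping the backing buffer loses the real length, so a parser
    // working on this view reads past the meaningful bytes.
    console.log(new Uint8Array(byteSource.buffer).length);   // 64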

@stanley-cheung (Collaborator)

One potential fix is to change the GrpcWebStreamParser.parse function to accept a Uint8Array instead of an ArrayBuffer. That may be a safer path going forward anyway.

@travikk (Contributor, Author) commented Dec 5, 2019

> One potential fix is to change the GrpcWebStreamParser.parse function to accept a Uint8Array instead of an ArrayBuffer. That may be a safer path going forward anyway.

Yeah, I was thinking about doing this, because then you could just pass the byteSource into the parse function. The reason I didn't was that I wasn't exactly sure how to change the type annotation of the parse function.

@stanley-cheung (Collaborator)

Slight bummer: we are implementing this StreamParser interface, so it wasn't easy to change that.
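
For reference, the shape of that interface is roughly the following (reconstructed from the compiler error above, not the verbatim Closure Library source; the return type is illustrative):

    /** @interface */
    var StreamParser = function() {};

    /**
     * Note the parameter union does not include Uint8Array.
     * @param {string|!ArrayBuffer|!Array<number>} input
     * @return {?Array<!Object>}
     */
    StreamParser.prototype.parse = function(input) {};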

@stanley-cheung (Collaborator)

In any case, I will probably have to revert this and figure out a different solution altogether. Sorry about that.

@travikk (Contributor, Author) commented Dec 5, 2019

I see. So changing the StreamParser interface to also accept Uint8Array is not feasible?

@stanley-cheung (Collaborator)

That involves submitting a change to the Google Closure Library itself, which has its own separate process and hurdles. Not impossible, but I will have to pursue this from the inside :)

@travikk (Contributor, Author) commented Dec 5, 2019

Thanks, I appreciate the effort!

stanley-cheung added a commit to stanley-cheung/grpc-web that referenced this pull request Dec 5, 2019
@stanley-cheung mentioned this pull request Dec 5, 2019
stanley-cheung added a commit that referenced this pull request Dec 5, 2019
@travikk (Contributor, Author) commented Jan 9, 2020

@stanley-cheung any update on this that you can report? Perhaps we should create an issue out of this PR since the work isn't done yet?

@shicks commented Feb 19, 2020

The type signature on StreamParser.parse is basically a lie, and I'd rather not change it further (though we can if necessary). Doing so would break all downstream implementations.

That said, what is the problem with widening the parameter type in your GrpcWebStreamParser? It's allowed to widen a parameter type in a subclass (parameter types are contravariant, so this is actually narrowing the type of the function). The only concern would be if you're calling the parse() method using a reference typed as StreamParser instead of typed as your concrete subclass. But it looks like that's not what's happening here. @stanley-cheung have you tried simply widening the parameter to see if it breaks anything on your end?
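
For illustration, a widened override might look roughly like this (a sketch with Closure annotations; parseBytes_ is a hypothetical helper, and string handling is omitted):

    /**
     * Widened override: the StreamParser interface declares
     * (string|!ArrayBuffer|!Array<number>), and the subclass additionally
     * accepts Uint8Array. Parameter types are contravariant, so accepting
     * more types in the override is type-safe.
     * @param {string|!ArrayBuffer|!Array<number>|!Uint8Array} input
     * @override
     */
    GrpcWebStreamParser.prototype.parse = function(input) {
      // Normalize to a Uint8Array view of exactly the right length, copying
      // only when the input is a plain Array of bytes.
      var bytes;
      if (input instanceof Uint8Array) {
        bytes = input;                  // no copy; length matches the data
      } else if (input instanceof ArrayBuffer) {
        bytes = new Uint8Array(input);  // view over the whole buffer
      } else {
        bytes = new Uint8Array(/** @type {!Array<number>} */ (input));
      }
      return this.parseBytes_(bytes);   // hypothetical helper for this sketch
    };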

@travikk (Contributor, Author) commented Feb 20, 2020

To this end, I raised another PR @stanley-cheung: #734
