RE: upload cancelled with "stream error: stream no longer needed" #563

Closed
avdv opened this issue Feb 8, 2024 · 5 comments

@avdv
Contributor

avdv commented Feb 8, 2024

For the following BUCK file:

# frontend/BUCK

filegroup(
    name = "assets",
    srcs = glob([
        "public/**/*",
        "src/**/*.css",
        "src/**/*.svg",
        "src/**/*.mp4",
        "src/**/*.wav",
        "src/**/*.png",
        "src/**/*.jpg",
        "src/**/*.gif",
        "src/webfonts/**/*",
    ]),
)

genrule(
    name = "assets_pack",
    srcs = [":assets"],
    out = "assets.tar",
    bash = """tar -cf $OUT -C $(location :assets) .""",
)

when using remote execution (we are currently using the bazel-remote-worker locally), we see the following error:

Action failed: root//frontend:assets_pack (genrule)
Internal error (stage: remote_upload_error): Remote Execution Error (GRPC-SESSION-ID): RE: upload: status: Cancelled, message: "h2 protocol error: http2 error: stream error received: stream no longer needed", details: [], metadata: MetadataMap { headers: {} }: transport error: http2 error: stream error received: stream no longer needed: stream error received: stream no longer needed
stdout:
stderr:
Build ID: ab46755d-0762-43ef-bf04-0c44d8a0d44a
Network: (GRPC-SESSION-ID)
Jobs completed: 49. Time elapsed: 0.7s.
Cache hits: 0%. Commands: 2 (cached: 0, remote: 1, local: 1)
BUILD FAILED
Failed to build 'root//frontend:assets_pack (prelude//platforms:default#524f8da68ea2a374)'

This seems to be related to the structure of the srcs of the filegroup, since the public folder contains symlinks into the src folder:

$ ls -lh public/icons public/webfonts 
lrwxrwxrwx 1 claudio users 17 Feb  8 09:54 public/icons -> ../src/img/icons/
lrwxrwxrwx 1 claudio users 16 Jan  9 09:14 public/webfonts -> ../src/webfonts/
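One way to spot directory symlinks like these before they end up in a filegroup's srcs is a small script. This is a Python sketch, not part of buck2; the function name `find_dir_symlinks` is hypothetical. It walks the tree without following links and reports every symlink that resolves to a directory:

```python
import os
from pathlib import Path


def find_dir_symlinks(root):
    """Return sorted (link, target) pairs for symlinks under `root` that point to directories.

    os.walk is called with followlinks=False, so a symlinked directory is
    listed in `dirnames` but never descended into.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # is_symlink() inspects the link itself; is_dir() follows it.
            if path.is_symlink() and path.is_dir():
                found.append((str(path.relative_to(root)), os.readlink(path)))
    return sorted(found)
```

Run from the repository root, `find_dir_symlinks("frontend")` should report `public/icons` and `public/webfonts` for the tree shown above.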

After removing either public/icons or public/webfonts, the upload succeeds. Once it has succeeded, it also succeeds after the symlinks are restored:

λ buck2 build frontend:assets_pack
Action failed: root//frontend:assets_pack (genrule)
Internal error (stage: remote_upload_error): Remote Execution Error (GRPC-SESSION-ID): RE: upload: status: Cancelled, message: "h2 protocol error: http2 error: stream error received: stream no longer needed", details: [], metadata: MetadataMap { headers: {} }: transport error: http2 error: stream error received: stream no longer needed: stream error received: stream no longer needed
stdout:
stderr:
Build ID: beedf529-fa52-4f0b-b8e4-537c41bff186
Network: (GRPC-SESSION-ID)
Jobs completed: 3. Time elapsed: 0.0s.
Cache hits: 0%. Commands: 1 (cached: 0, remote: 1, local: 0)
BUILD FAILED
Failed to build 'root//frontend:assets_pack (prelude//platforms:default#524f8da68ea2a374)'

λ rm frontend/public/icons

λ buck2 build frontend:assets_pack
File changed: root//tmp/work/upload/9328a678-7e97-4f98-b572-dfee848bb396
File changed: root//tmp/work/upload/3b446ffe-23ae-4c2f-8801-517b05c8079e
File changed: root//tmp/cas/7c7d23fb-0fed-41da-990f-08cc16686099
42 additional file change events
Build ID: a8233c76-c3f7-408f-8265-908114ccec49
Network: (GRPC-SESSION-ID)
Jobs completed: 12. Time elapsed: 3.8s.
Cache hits: 0%. Commands: 1 (cached: 0, remote: 1, local: 0)
BUILD SUCCEEDED

λ git restore frontend/public/icons 

λ buck2 build frontend:assets_pack
File changed: root//tmp/cas/1d2c4833-5d3a-4c32-b2e6-db13d3bce4e4
File changed: root//tmp/cas/cas/0a/0a735d55159999d4db9f3460b43d2e24e6116af22998f6f7aada76d7cfb36416
File changed: root//tmp/cas/e3f9ae6a-89be-40e1-8733-c48114f22217
1094 additional file change events
Build ID: 5b6ddadf-aa0a-41d5-a06a-d0044f8ff168
Network: (GRPC-SESSION-ID)
Jobs completed: 12. Time elapsed: 3.6s.
Cache hits: 0%. Commands: 1 (cached: 0, remote: 1, local: 0)
BUILD SUCCEEDED

Also, I noticed that the symlinks are not preserved in the buck-out/v2/gen/root/524f8da68ea2a374/frontend/__assets__ directory:

ls -lhd buck-out/v2/gen/root/524f8da68ea2a374/frontend/__assets__/assets/public/{webfonts,icons}
drwxr-xr-x 1 claudio users 1.2K Feb  8 10:46 buck-out/v2/gen/root/524f8da68ea2a374/frontend/__assets__/assets/public/icons
drwxr-xr-x 1 claudio users  718 Feb  8 10:46 buck-out/v2/gen/root/524f8da68ea2a374/frontend/__assets__/assets/public/webfonts

Is this to be expected?

BTW, I am using buck2 aa5cc9e36218b3afcad06608d91f9e8baa1d5c88e0b2a2f561b1b695a320afc7, the 2024-01-02 pre-release.

cc: @aherrmann

@JakobDegen
Contributor

JakobDegen commented Feb 9, 2024

So I don't know if this is the cause of this specific error, but symlinks in sources are basically completely unsupported. They sometimes kind of work, but you're in basically untested ground here so I'm not surprised that something has broken.

We do support symlinks in outputs; depending on what exactly it is you're doing, it's possible that that may offer a path to working around this limitation.

@avdv
Contributor Author

avdv commented Feb 9, 2024

Thanks for your quick response!

> So I don't know if this is the cause of this specific error, but symlinks in sources are basically completely unsupported. They sometimes kind of work, but you're in basically untested ground here so I'm not surprised that something has broken.

OK, fair enough. In this case, it seems to indicate a problem with handling the HTTP/2 responses gracefully. I would guess that the upstream server "sees" that some files are already uploaded and responds by cancelling the stream, which perhaps should just be ignored?!

> We do support symlinks in outputs; depending on what exactly it is you're doing, it's possible that that may offer a path to working around this limitation.

The same problem turns up when we use the output of yarn install as input to another action running remotely. My current workaround is to create a tarball inside a local action, and then explicitly unpack the tarball before doing the real work of the action.
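The tarball workaround can be illustrated outside of buck2 to show why it helps: tar records each symlink as a link entry, so the remote action receives a single regular file, and unpacking it restores the links. This is a Python sketch using the standard `tarfile` module, not the actual genrule; `pack_tree` and `unpack_tree` are hypothetical helper names:

```python
import os
import tarfile


def pack_tree(src_dir, tar_path):
    """Pack src_dir into an uncompressed tar, like `tar -cf OUT -C src_dir .`.

    tarfile.add() defaults to dereference=False, so symlinks are stored as
    link entries instead of copies of their targets.
    """
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src_dir, arcname=".")


def unpack_tree(tar_path, dest_dir):
    """Extract the tarball into dest_dir, recreating the recorded symlinks."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute links and links that escape dest_dir.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

Relative links such as `../src/webfonts` stay inside the extracted tree, so the stricter `data` filter still accepts them.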

@JakobDegen
Contributor

> The same problem turns up when we use the output of yarn install as input to another action running remotely. My current workaround is to create a tarball inside a local action, and then explicitly unpack the tarball before doing the real work of the action.

Oof, yeah this sounds very likely to be a bug. This probably requires figuring out exactly what the buck2-RE communication looks like and which of the two is out of spec (guessing that it's us is a good default). I think we probably have logging that can help with that?

@avdv
Contributor Author

avdv commented Feb 29, 2024

I turned up logging for gRPC calls in the bazel-remote-worker (it's explicitly disabled in code in order to avoid "Received DATA frame for an unknown stream 1521" error messages) and got this:

240229 08:43:39.770:WT 17 [io.grpc.netty.NettyServerStream$TransportState.deframeFailed] Exception processing message
io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: gRPC message exceeds maximum size 4194304: 4594008
        at io.grpc.Status.asRuntimeException(Status.java:526)
        at io.grpc.internal.MessageDeframer.processHeader(MessageDeframer.java:391)
        at io.grpc.internal.MessageDeframer.deliver(MessageDeframer.java:271)
        at io.grpc.internal.MessageDeframer.request(MessageDeframer.java:161)
        at io.grpc.internal.AbstractStream$TransportState$1RequestRunnable.run(AbstractStream.java:236)
        at io.grpc.netty.NettyServerStream$TransportState$1.run(NettyServerStream.java:202)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)

The maximum batch size is set to 4 * 1000 * 1000. I don't know why the message is so much larger than that limit. Maybe it's because symlinks are involved?
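The two limits can be compared directly. A quick Python check, assuming the 4 * 1000 * 1000 batch limit counts only blob payload bytes while the server's 4194304 limit (4 * 1024 * 1024) applies to the whole encoded gRPC message:

```python
# Sizes taken from the comment and server log above.
batch_payload_limit = 4 * 1000 * 1000  # batch size limit (assumed: payload bytes only)
server_limit = 4 * 1024 * 1024         # 4194304, the limit in the server log
observed_message = 4594008             # size of the rejected message

# Bytes left for protobuf framing when a batch is filled to the payload limit.
headroom = server_limit - batch_payload_limit
# How far the rejected message went beyond the payload limit.
excess = observed_message - batch_payload_limit

print(headroom)  # 194304
print(excess)    # 594008
```

Under that assumption, a batch filled close to the payload limit is rejected as soon as its framing overhead exceeds 194,304 bytes.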

@avdv
Contributor Author

avdv commented Mar 4, 2024

> The maximum batch size is set to 4 * 1000 * 1000. I don't know why the message is so much larger than that limit. Maybe it's because symlinks are involved?

Oh, it's probably just because of the overhead. For each datum, at least 72 extra bytes are needed to transmit the hash and the compressor enum value. In one example request that I looked at, there were 7410 entries in one batch, which already adds up to 533520 bytes (7410 * 72), plus a few more bytes for encoding each element of the requests field.
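The per-entry overhead can be estimated from the protobuf wire format. This is a rough Python model, not buck2's code, assuming the Remote Execution API layout of `BatchUpdateBlobsRequest` (`repeated Request requests = 2`, each `Request` holding `Digest digest = 1` and `bytes data = 2`, and `Digest` holding `string hash = 1` and `int64 size_bytes = 2`), a 64-character hex SHA-256 hash, and the default identity compressor, which proto3 leaves out of the encoding. It ignores `instance_name` and other top-level fields; the helper names are hypothetical:

```python
def varint_len(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a protobuf varint."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def delimited_len(payload_len: int) -> int:
    """Size of a length-delimited field: 1-byte tag + varint length + payload."""
    return 1 + varint_len(payload_len) + payload_len


def entry_len(blob_size: int, hash_hex_len: int = 64) -> int:
    """Encoded size of one `requests` entry carrying a blob of blob_size bytes."""
    # Digest { string hash = 1; int64 size_bytes = 2; }
    digest = delimited_len(hash_hex_len)
    if blob_size:  # proto3 omits zero-valued scalars and empty bytes
        digest += 1 + varint_len(blob_size)
    # Request { Digest digest = 1; bytes data = 2; } (identity compressor omitted)
    request = delimited_len(digest)
    if blob_size:
        request += delimited_len(blob_size)
    # BatchUpdateBlobsRequest { repeated Request requests = 2; }
    return delimited_len(request)


def overhead(blob_sizes) -> int:
    """Bytes spent on framing rather than on blob data for one batch."""
    return sum(entry_len(size) for size in blob_sizes) - sum(blob_sizes)
```

In this model a non-empty blob costs at least 74 bytes of framing, and `overhead([540] * 7410)` is 570570 bytes: the same order of magnitude as the estimate above, and well beyond the 194,304 bytes between 4 * 1000 * 1000 and 4194304.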

We have increased the max inbound message size for the bazel-remote-worker to an arbitrarily high number in order to work around that issue and have not seen this error again.

Closing, since I think nothing is to be done here.

avdv closed this as completed Mar 4, 2024
avdv closed this as not planned Mar 4, 2024
Labels: None yet
Projects: None yet
Development: No branches or pull requests
2 participants