Scylla appears to have crashed in test test_wasm.py::test_fib_called_on_null (on aarch64) #9387
Cannot decode the backtrace since the build failed and the reloc packages weren't uploaded.
@bhalevy can the Jenkins script be changed to upload the packages if a coredump occurs, so that next time it will be possible to decode? Without a decoded backtrace, and given that the crash happened on aarch64, pinpointing this issue would be quite hard.
@psarna I suggest you open an issue on scylla-pkg. I think it should be possible.
@psarna you have access to an aarch64 machine. It's a lot better to reproduce, fix, and test locally than to incur continuous integration latency.
By default we delete the workspace after the build has finished. I am now running https://jenkins.scylladb.com/job/scylla-master/job/build/765/ with |
@psarna @bhalevy https://github.com/scylladb/scylla-pkg/pull/2416 is the fix. It appears that we do upload the reloc to S3 right after we finish building Scylla; we just didn't publish the metadata file at the same stage. This PR fixes that.
@psarna @bhalevy it looks like https://jenkins.scylladb.com/view/master/job/scylla-master/job/build/760/artifact/testlog/aarch64_release/run.9.log was removed from Jenkins due to our history limitation. But I think that https://jenkins.scylladb.com/view/master/job/scylla-master/job/build/764/ is the same (also marked this build as
Let me know if you need any more information.
@yaronkaikov the link to the relocatable package at downloads.scylladb.com gives me a |
In any case, I proceeded with working directly on the aarch64 machine and managed to reproduce the issue after running wasm tests in a tight loop for a few minutes. The backtrace actually varies, but here's one of them:
I'm worried by the fact that the problem arises from |
... and the reason why it's the |
Another error that sometimes happens is an "OOM" during wasm compilation triggering an abort(), while the memory used by the Scylla binary is nowhere near OOM conditions...
According to http://docs.wasmtime.dev/api/src/wasmtime_runtime/traphandlers/unix.rs.html, wasmtime traps are implemented with a |
Meanwhile, I ran the same test in a loop on my x86_64 - it kept passing for 40+ minutes |
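For anyone who wants to repeat the "run it in a loop until it breaks" approach, here is a minimal sketch in Python. The helper name `run_until_failure` and its parameters are hypothetical, and the exact test command used in this thread is not stated, so the command has to be supplied by the caller.

```python
import subprocess
import sys


def run_until_failure(cmd, max_iterations):
    """Run cmd repeatedly and report when it first fails.

    Returns the 1-based iteration number of the first non-zero exit
    status, or None if every iteration passed.
    """
    for iteration in range(1, max_iterations + 1):
        # Output is discarded here; redirect to a file instead if the
        # logs of the failing run are needed.
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return iteration
    return None


# Stand-in command that always succeeds, only to show the call shape.
always_ok = [sys.executable, "-c", "import sys; sys.exit(0)"]
print(run_until_failure(always_ok, 3))
```

In practice `cmd` would be whatever command runs the wasm tests locally. An intermittent crash like this one may need many iterations, so a generous `max_iterations` is the safer default.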
There seems to be a problem with libwasmtime.a dependency on aarch64, causing occasional segfaults during tests - specifically, tests which exercise the path for halting wasm execution due to fuel exhaustion. As a temporary measure, wasm is disabled on this architecture to unblock the flow. Refs scylladb#9387
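To make "halting execution due to fuel exhaustion" concrete, here is a toy sketch of fuel metering in plain Python. It is not how wasmtime implements fuel or traps. The names `FuelMeter`, `OutOfFuel` and `run_fib` are invented for illustration, and the cost of one fuel unit per call is an assumption.

```python
class OutOfFuel(Exception):
    """Toy stand-in for the trap raised when the fuel budget runs out."""


class FuelMeter:
    def __init__(self, fuel):
        self.fuel = fuel

    def consume(self, units=1):
        # Refuse to continue once the remaining budget is too small.
        if self.fuel < units:
            raise OutOfFuel("fuel exhausted")
        self.fuel -= units


def fib(n, meter):
    meter.consume()  # assumed cost model: one unit per call
    if n < 2:
        return n
    return fib(n - 1, meter) + fib(n - 2, meter)


def run_fib(n, fuel):
    """Return ("ok", value) on success or ("out_of_fuel", None) if halted."""
    meter = FuelMeter(fuel)
    try:
        return ("ok", fib(n, meter))
    except OutOfFuel:
        return ("out_of_fuel", None)


print(run_fib(5, 15))
print(run_fib(30, 100))
```

The point is that the host has to unwind cleanly out of the guest computation when the budget is exceeded. According to the thread, that halting path is the one the crashing tests exercise on aarch64.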
There seems to be a problem with libwasmtime.a dependency on aarch64, causing occasional segfaults during tests - specifically, tests which exercise the path for halting wasm execution due to fuel exhaustion. As a temporary measure, wasm is disabled on this architecture to unblock the flow. Refs #9387 Closes #9414
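A per-architecture switch of the kind described above could look roughly like the following sketch. This is not the actual patch. The set `WASM_DISABLED_ARCHS` and the function `wasm_enabled` are hypothetical names, and the only architecture listed is the one named in this thread.

```python
import platform

# Assumption: architectures on which wasm support is switched off.
# Only aarch64 is mentioned in this thread.
WASM_DISABLED_ARCHS = {"aarch64"}


def wasm_enabled(machine=None):
    """Return True if wasm support should be built for this machine.

    machine defaults to the value reported by platform.machine().
    """
    if machine is None:
        machine = platform.machine()
    return machine not in WASM_DISABLED_ARCHS


print(wasm_enabled("x86_64"))
print(wasm_enabled("aarch64"))
```

Keeping the list in one place means the workaround can be reverted by emptying a single set once the underlying problem is fixed.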
This series adds the implementation and usage of rust wasmtime bindings.

The WASM UDFs introduced by this patch are interruptible and use memory allocated using the seastar allocator.

This series includes #11102 (the first two commits) because #11102 required disabling wasm UDFs completely. This patch disables them in the middle of the series, and enables them again at the end.

After this patch, `libwasmtime.a` can be removed from the toolchain.

This patch also removes the workaround for #9387, but it hasn't been tested with ARM yet - if the ARM test causes issues I'll revert this part of the change.

Closes #11351

* github.com:scylladb/scylladb:
  build: remove references to unused c bindings of wasmtime
  test: assert that WASM allocations can fail without crashing
  wasm: limit memory allocated using mmap
  wasm: add configuration options for instance cache and udf execution
  test: check that wasmtime functions yield
  wasm: use the new rust bindings of wasmtime
  rust: add Wasmtime bindings
  rust: add build profiles more aligned with ninja modes
  rust: adjust build according to cxxbridge's recommendations
  tools: toolchain: dbuild: prepare for sharing cargo cache
Appears to have been addressed by 02c9968
Seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/build/760/artifact/testlog/aarch64_release/run.9.log