-
Notifications
You must be signed in to change notification settings - Fork 14
Sanitizers do not work with JNI #34
Comments
I started with investigation how ASAN worked for omniscidb |
@alexbaden do you have smth in mind already? |
There are link errors when linking ArrowBasedExecuteTest with ASAN, namely _start absence in Scrt.o; comparing the linking command line to the one which works |
@ienkovich and I suspect JNI will be the biggest problem getting ASAN to work with HDK. So, our idea was to drop Calcite out of the unit tests when compiling with ASAN or TSAN on. We will intercept the payload from Calcite (optimized RA tree), write it to disk, then run the ASAN binary using that payload (bypassing SQL parse and Calcite optimization). A first PR to support this was already merged: https://github.com/intel-ai/omniscidb/pull/430 |
Right. So far it is supported only in ArrowBasedExecuteTest. You can run it with |
There are some old reports that ASAN works with JNI with instrumentation, I wonder if we can get something out of it. google/sanitizers#271 |
With ArrowBasedExecuteTest related tests excluded and ASAN_OPTIONS="detect_leaks=0" most other tests pass. I can commit this to CI if no one objects to the exclusion. |
What errors are you getting with detect leaks on? Only limited to JNI? We might try a suppression in that case. |
Most of leaks are reported in libjvm.so and originate from standard jni memory allocation routines; there is a report that jvm can be rebuilt in a way that the leaks are shown correctly. |
Did you try suppressing all of libjvm? https://clang.llvm.org/docs/AddressSanitizer.html#suppressing-reports-in-external-libraries |
thanks for the link, will try |
Tried suppressions feature, it did not work for me yet. Also tried LSAN_OPTIONS Looking further |
We can avoid JNI by using cached Calcite responses. So far it is supported only in ArrowBasedExecuteTest. You can run it with |
I have tried both options, total Calcite parsing time decreased from 28263 ms to 3753 ms; seems working |
ASAN aborted after the first testcase of
|
How to reproduce:
|
I've rerun the tests with ASAN_OPTIONS="detect_leaks=0" and ArrowBasedExecuteTest excluded. A new memory problem was found in UdfTest.
|
The error in |
Here is a generic test run log |
Ok, thanks! I have an idea of where the first issue is coming from, but fixing it opened up some additional issues. Will look at the UDF test issue afterwards. |
TSAN run results |
ASAN should be fixed w/ https://github.com/intel-ai/omniscidb/commit/676b290418968a1d4a45b5d54cd5666718ff849e |
@alexbaden thanks, now all test cases of
|
The guys from TBB blame |
@aregm said tbb had improved, so maybe we need to update. We could also add a suppression: https://github.com/intel-ai/omniscidb/blob/jit-engine/config/asan.suppressions for TSAN I would prefer not to do that, but for ASAN it's probably ok. |
With exception to TBB and JNI it seems there are no ASAN issues in most tests. The following tests have JNI issues, @ienkovich maybe Ilya's option is needed here:
|
Let's split ASAN into two remaining tasks:
|
|
I've heard there are plans to merge |
There are plans to do that, but I don't know that it will happen in the next few weeks. |
Here is a PR which is required to add tests to HDK build, https://github.com/intel-ai/omniscidb/pull/581 |
When running tests in HDK, I get multiple problems, see the log below. Some of them are related to function |
The implementation of |
The following patch seems to fix the issue
|
The next problem is at
I need to correct the class path somehow Upd. This patch fixes the class path
|
The problem with paths is moved to #120 |
It was good @Garra1980 suggested to try launching ASAN with omniscidb. Another problem manifested itself - Github virtual machines do not have enough virtual memory for ASAN to start due to their virtualization settings.
|
I have tried a small test and it worked; is it possible to scale a data size via option for https://github.com/leshikus/ghtest/actions/runs/3585577524/jobs/6033734506 |
ASAN works now |
Needs investigation.
The text was updated successfully, but these errors were encountered: