Server crashes when running tests on postgres built with ASAN #520

Closed
saygoodbyye opened this issue Apr 17, 2024 · 2 comments

@saygoodbyye

Hello! On postgres (REL_16_STABLE) built with the following configure options:

CC="clang" CPPFLAGS="-Og -fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -fno-sanitize=nonnull-attribute -fstack-protector" \
LDFLAGS='-fsanitize=address -fsanitize=undefined' \
./configure --enable-tap-tests --enable-debug --enable-cassert --quiet --prefix="$PGPREFIX"

I get a server crash when running the pgvector (master) tests as shown below:

ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:halt_on_error=1:disable_coredump=0:strict_string_checks=1:check_initialization_order=1:strict_init_order=1:detect_odr_violation=0 make installcheck

regression.out:

# +++ regress install-check in  +++
# using postmaster on Unix socket, default port
ok 1         - bit_functions                              12 ms
ok 2         - btree_halfvec                              21 ms
ok 3         - btree_sparsevec                            20 ms
ok 4         - btree_vector                               19 ms
ok 5         - cast                                       25 ms
ok 6         - copy                                       22 ms
ok 7         - halfvec_functions                          25 ms
ok 8         - halfvec_input                              12 ms
ok 9         - hnsw_bit_hamming                           22 ms
ok 10        - hnsw_bit_jaccard                           17 ms
ok 11        - hnsw_halfvec_cosine                        20 ms
ok 12        - hnsw_halfvec_ip                            19 ms
ok 13        - hnsw_halfvec_l2                            22 ms
ok 14        - hnsw_options                               16 ms
ok 15        - hnsw_sparsevec_cosine                      19 ms
ok 16        - hnsw_sparsevec_ip                          19 ms
ok 17        - hnsw_sparsevec_l2                          30 ms
ok 18        - hnsw_unlogged                              19 ms
ok 19        - hnsw_vector_cosine                         20 ms
ok 20        - hnsw_vector_ip                             20 ms
ok 21        - hnsw_vector_l2                             22 ms
not ok 22    - ivfflat_bit_hamming                       464 ms
# (test process exited with exit code 2)
not ok 23    - ivfflat_bit_jaccard                        38 ms
# (test process exited with exit code 2)
not ok 24    - ivfflat_halfvec_cosine                      7 ms
# (test process exited with exit code 2)
not ok 25    - ivfflat_halfvec_ip                          9 ms
# (test process exited with exit code 2)
not ok 26    - ivfflat_halfvec_l2                          8 ms
# (test process exited with exit code 2)
not ok 27    - ivfflat_options                             6 ms
# (test process exited with exit code 2)
not ok 28    - ivfflat_unlogged                            7 ms
# (test process exited with exit code 2)
not ok 29    - ivfflat_vector_cosine                       6 ms
# (test process exited with exit code 2)
not ok 30    - ivfflat_vector_ip                           6 ms
# (test process exited with exit code 2)
not ok 31    - ivfflat_vector_l2                           6 ms
# (test process exited with exit code 2)
not ok 32    - sparsevec_functions                         9 ms
# (test process exited with exit code 2)
not ok 33    - sparsevec_input                             7 ms
# (test process exited with exit code 2)
not ok 34    - vector_functions                            7 ms
# (test process exited with exit code 2)
ok 35        - vector_input                               21 ms
1..35
# 13 of 35 tests failed.

backtrace:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140640746571712) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140640746571712) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140640746571712, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007fe97ba69476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fe97ba4f7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000563aeff36807 in __sanitizer::Abort() ()
#6  0x0000563aeff346a1 in __sanitizer::Die() ()
#7  0x0000563aeff48ff3 in __ubsan_handle_type_mismatch_v1_abort ()
#8  0x00007fe97907d3b8 in hamming_distance (fcinfo=<optimized out>) at src/bitvector.c:49
#9  0x0000563af122b4fb in FunctionCall2Coll (flinfo=<optimized out>, collation=<optimized out>, arg1=<optimized out>, arg2=<optimized out>) at fmgr.c:1132
#10 0x00007fe9790b198d in InitCenters (index=0x7fe979449fc8, samples=0x625000055600, centers=0x625000055268, lowerBound=0x625000052a50) at src/ivfkmeans.c:57
#11 ElkanKmeans (index=0x7fe979449fc8, samples=0x625000055600, centers=0x625000055268, type=IVFFLAT_TYPE_BIT) at src/ivfkmeans.c:497
#12 IvfflatKmeans (index=0x7fe979449fc8, samples=0x625000055600, centers=0x625000055268, type=IVFFLAT_TYPE_BIT) at src/ivfkmeans.c:755
#13 0x00007fe9790a8f49 in ComputeCenters (buildstate=0x7ffcace47d80) at src/ivfbuild.c:466
#14 BuildIndex (heap=heap@entry=0x7fe9794471d8, index=0x7fe979449fc8, indexInfo=<optimized out>, buildstate=buildstate@entry=0x7ffcace47d80, forkNum=forkNum@entry=MAIN_FORKNUM) at src/ivfbuild.c:1027
#15 0x00007fe9790a884b in ivfflatbuild (heap=0x7fe9794471d8, index=<optimized out>, indexInfo=<optimized out>) at src/ivfbuild.c:1046
#16 0x0000563af026d89c in index_build (heapRelation=<optimized out>, indexRelation=0x7fe979449fc8, indexInfo=<optimized out>, isreindex=<optimized out>, parallel=<optimized out>) at index.c:3042
#17 0x0000563af0269d49 in index_create (heapRelation=0x1edddd, indexRelationName=<optimized out>, indexRelationId=24998, parentIndexRelid=<optimized out>, parentConstraintId=<optimized out>, relFileNumber=<optimized out>, indexInfo=<optimized out>,
    indexColNames=<optimized out>, accessMethodObjectId=<optimized out>, tableSpaceId=<optimized out>, collationObjectId=<optimized out>, classObjectId=<optimized out>, coloptions=<optimized out>, reloptions=<optimized out>, flags=<optimized out>,
    constr_flags=<optimized out>, allow_system_table_mods=<optimized out>, is_internal=<optimized out>, constraintId=<optimized out>) at index.c:1265
#18 0x0000563af04dc8e2 in DefineIndex (relationId=<optimized out>, stmt=<optimized out>, indexRelationId=<optimized out>, parentIndexId=<optimized out>, parentConstraintId=<optimized out>, total_parts=<optimized out>, is_alter_table=<optimized out>,
    check_rights=<optimized out>, check_not_in_use=<optimized out>, skip_build=<optimized out>, quiet=<optimized out>) at indexcmds.c:1166
#19 0x0000563af0de32f2 in ProcessUtilitySlow (pstate=<optimized out>, pstmt=<optimized out>, queryString=<optimized out>, context=<optimized out>, params=<optimized out>, params@entry=0x0, queryEnv=<optimized out>, queryEnv@entry=0x0,
    dest=<optimized out>, qc=0x7ffcace495e0) at utility.c:1553
#20 0x0000563af0ddf5fb in standard_ProcessUtility (pstmt=0x6250000087e8, queryString=<optimized out>, readOnlyTree=<optimized out>, context=2074860028, params=<optimized out>, queryEnv=<optimized out>, dest=<optimized out>, qc=<optimized out>)
    at utility.c:1078
#21 0x0000563af0dde67d in ProcessUtility (pstmt=0x1edddd, pstmt@entry=0x6250000087e8, queryString=0x1edddd <error: Cannot access memory at address 0x1edddd>, readOnlyTree=<optimized out>, context=2074860028, params=0x7ffcace47520, queryEnv=0x1,
    dest=<optimized out>, qc=<optimized out>) at utility.c:530
#22 0x0000563af0ddd46d in PortalRunUtility (portal=0x625000028218, pstmt=0x6250000087e8, isTopLevel=<optimized out>, setHoldSnapshot=false, dest=0x625000008aa8, qc=0x7ffcace495e0) at pquery.c:1158
#23 0x0000563af0ddb479 in PortalRunMulti (portal=0x1edddd, portal@entry=0x625000028218, isTopLevel=68, setHoldSnapshot=108, dest=dest@entry=0x625000008aa8, altdest=altdest@entry=0x625000008aa8, qc=0x7ffcace495e0) at pquery.c:1315
#24 0x0000563af0dd9935 in PortalRun (portal=portal@entry=0x625000028218, count=count@entry=9223372036854775807, isTopLevel=false, run_once=true, dest=0x7fe97babd9fc <__GI___pthread_kill+300>, dest@entry=0x625000008aa8, altdest=0x7ffcace47520,
    altdest@entry=0x625000008aa8, qc=0x7ffcace495e0) at pquery.c:791
#25 0x0000563af0dd5421 in exec_simple_query (query_string=<optimized out>) at postgres.c:1274
#26 0x0000563af0dcfb03 in PostgresMain (dbname=dbname@entry=0x62900001b370 "contrib_regression", username=username@entry=0x62900001b358 "test") at postgres.c:4633
#27 0x0000563af0badd69 in BackendRun (port=port@entry=0x615000011b00) at postmaster.c:4464
#28 0x0000563af0baab1b in BackendStartup (port=0x615000011b00) at postmaster.c:4192
#29 ServerLoop () at postmaster.c:1782
#30 0x0000563af0ba56b6 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at postmaster.c:1466
#31 0x0000563af07e1036 in main (argc=3, argv=0x603000000340) at main.c:198

And if I run prove_installcheck with the same options:

test/t/034_distance_functions.pl ............ ok
test/t/035_ivfflat_bit_build_recall.pl ...... Dubious, test returned 29 (wstat 7424, 0x1d00)
No subtests run

Test Summary Report
-------------------
test/t/035_ivfflat_bit_build_recall.pl    (Wstat: 7424 Tests: 0 Failed: 0)
  Non-zero exit status: 29
  Parse errors: No plan found in TAP output
Files=35, Tests=854, 557 wallclock secs ( 0.21 usr  0.07 sys + 86.69 cusr 34.91 csys = 121.88 CPU)
Result: FAIL
make: *** [Makefile:60: prove_installcheck] Error 1

035_ivfflat_bit_build_recall.log:

src/ivfkmeans.c:485:4: runtime error: member access within misaligned address 0x6250000508b7 for type 'varattrib_4b', which requires 4 byte alignment
0x6250000508b7: note: pointer points here
 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00
             ^
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/ivfkmeans.c:485:4 in
2024-04-17 04:49:27.083 UTC [2022607] LOG:  server process (PID 2022661) was terminated by signal 6: Aborted
2024-04-17 04:49:27.083 UTC [2022607] DETAIL:  Failed process was running: CREATE INDEX idx ON tst USING ivfflat (v bit_hamming_ops);

Best regards,
Egor Chindyaskin
Postgres Professional: https://postgrespro.com/

@ankane
Member

ankane commented Apr 17, 2024

Hi @saygoodbyye, thanks for reporting! I added that functionality earlier today (04af15c), but it isn't great that this wasn't caught. Can you see if the patch below fixes it?

--- a/src/ivfbuild.c
+++ b/src/ivfbuild.c
@@ -342,7 +342,7 @@ GetItemSize(IvfflatType type, int dimensions)
        else if (type == IVFFLAT_TYPE_HALFVEC)
                return HALFVEC_SIZE(dimensions);
        else if (type == IVFFLAT_TYPE_BIT)
-               return VARBITTOTALLEN(dimensions);
+               return MAXALIGN(VARBITTOTALLEN(dimensions));
        else
                elog(ERROR, "Unsupported type");
 }
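
To illustrate why rounding the item size matters, here is a minimal sketch, not pgvector code. It assumes that items are packed back to back at a stride of `GetItemSize()` bytes starting from a max-aligned base, and that `VARBITTOTALLEN` is the data bytes plus a 4-byte varlena header and a 4-byte bit-length field. All `sketch_` / `SKETCH_` names are hypothetical stand-ins, and the alignment of 8 is an assumption for a typical 64-bit build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's MAXIMUM_ALIGNOF / MAXALIGN;
 * 8 is assumed here, as on a typical 64-bit build. */
#define SKETCH_MAXIMUM_ALIGNOF 8
#define SKETCH_MAXALIGN(len) \
	(((uintptr_t) (len) + (SKETCH_MAXIMUM_ALIGNOF - 1)) & \
	 ~((uintptr_t) (SKETCH_MAXIMUM_ALIGNOF - 1)))

/* Assumed to mirror VARBITTOTALLEN: bytes of bit data, plus a 4-byte
 * varlena header, plus a 4-byte bit-length field. */
static size_t
sketch_varbit_total_len(size_t bits)
{
	return (bits + 7) / 8 + 4 + 4;
}

/* Byte offset of item i when fixed-size items are packed back to back. */
static size_t
sketch_item_offset(size_t item_size, size_t i)
{
	return item_size * i;
}

/* Nonzero when an offset from a max-aligned base keeps 4-byte alignment,
 * which is what the UBSan report says 'varattrib_4b' requires. */
static int
sketch_is_4byte_aligned(size_t offset)
{
	return offset % 4 == 0;
}
```

With a hypothetical 3-bit value, `sketch_varbit_total_len(3)` is 9, so the second item would start at offset 9 and any 4-byte header access on it would be misaligned. `SKETCH_MAXALIGN(9)` is 16, which puts every item back on an aligned boundary, consistent with the patch above.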

@ankane ankane closed this as completed in cf57081 Apr 17, 2024
@ankane
Member

ankane commented Apr 17, 2024

Confirmed that was the issue. Pushed a fix in the commit above, and will add UBSan to CI.
