Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Fix use-after-free case on nmslib search path #1304

Closed
jmazanec15 opened this issue Nov 13, 2023 · 0 comments
Closed

[BUG] Fix use-after-free case on nmslib search path #1304

jmazanec15 opened this issue Nov 13, 2023 · 0 comments
Assignees
Labels
bug Something isn't working

Comments

@jmazanec15
Copy link
Member

jmazanec15 commented Nov 13, 2023

Description

When querying and using the nmslib engine, we ran into a segfault with the following error:


# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f4a9aa3e4a4, pid=26, tid=1428
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.6+10 (17.0.6+10) (build 17.0.6+10)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (17.0.6+10, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libopensearchknn_nmslib.so+0xa14a4]  similarity::Hnsw<float>::Search(similarity::KNNQuery<float>*, int) const+0x4
#
# Core dump will be written. Default location: /opt/amazon/distribution/local/opensearch-1.0.0/core.26
#
# An error report file with more information is saved as:
# logs/hs_err_pid26.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

This maps to this line in nmslib: https://github.com/nmslib/nmslib/blob/master/similarity_search/src/method/hnsw.cc#L721. In the disassembled lib, we see:

00000000000a14a0 <_ZNK10similarity4HnswIfE6SearchEPNS_8KNNQueryIfEEi>:
   a14a0:   48 8b 47 08             mov    0x8(%rdi),%rax
   a14a4:   48 8b 48 08             mov    0x8(%rax),%rcx

which suggests its segfaulting on call to:

        if (this->data_.empty()...

A good explanation for this is that we are passing in a local variable for this->data_ to an arg that requires a reference: https://github.com/opensearch-project/k-NN/blob/main/jni/include/nmslib_wrapper.h#L51. This creates a use-after-free scenario where the variable is popped off the stack but then referenced later.

this->data_ is actually a dummy variable that does not store any data. This is because for nmslib the graph construction is optimized to contain vectors interleaved with level 1 neighbors. Never the less, it is used here as an empty check.

The commit where this was introduced was a refactoring I had done awhile ago related to clang tidy warnings: opendistro-for-elasticsearch/k-NN@7640186. Im still digging into why I made this change.

Suggested fix

Just move data back from local to struct variable. This will tie the lifetime of data with the IndexWrapper.

Open Questions

This error never seems to occur. We have run millions of queries without seeing any segfaults, but it seems like a pretty clear this is a bug. The question is why don't we see this issue more frequently. A couple guesses:

  1. The stack memory is very rarely reclaimed by OS and overwritten by process - the former seems reasonable, but the latter seems a little bit more far-fetched. However, it is possible that the call this->_data.empty() typically will return false when junk is provided in the bits. This would explain infrequency of seeing this error.
  2. The compiler optimizes away this pass by reference. Seems unlikely but not sure.

Another open question is why I made the change in the first place. Seems clang-tidy may have suggested it or I may have misinterpreted clang-tidy warnings. I will investigate this as well.

Will need to do more investigation here.

@jmazanec15 jmazanec15 added the bug Something isn't working label Nov 13, 2023
@jmazanec15 jmazanec15 self-assigned this Nov 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant