Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk allocate objects for nmslib index creation #773

Merged
merged 3 commits into from Apr 26, 2023

Conversation

jmazanec15
Copy link
Member

Description

For nmslib, we need to allocate objects that wrap vectors with metadata. This PR updates the objects to be allocated in one buffer as opposed to allocating objects individually. This was shown to prevent memory fragmentation.

For this change, I ran the same experiments as were run for #772 and they showed significant improvements in the RSS of the process after the indexing workload was run.

Configuration RSS beginning (KB) RSS after indexing (KB) Vectors Indexed (vectors)
nmslib-default (baseline) 8984748 11125772 648500
nmslib with bulk update 8972132 9439540 628000

Specifically, the change in RSS decreases by about 78% with this change in the experiment above.

In addition to this, fix bug in CMakeLists.txt file preventing unit tests from being created.

Issues Resolved

#772

Check List

  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@jmazanec15 jmazanec15 added Enhancements Increases software capabilities beyond original client specifications backport 2.x labels Feb 20, 2023
@jmazanec15 jmazanec15 requested a review from a team February 20, 2023 21:49
@codecov-commenter
Copy link

codecov-commenter commented Feb 20, 2023

Codecov Report

Merging #773 (3c1f120) into main (52ea93f) will not change coverage.
The diff coverage is n/a.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff            @@
##               main     #773   +/-   ##
=========================================
  Coverage     85.14%   85.14%           
  Complexity     1079     1079           
=========================================
  Files           151      151           
  Lines          4370     4370           
  Branches        390      390           
=========================================
  Hits           3721     3721           
  Misses          472      472           
  Partials        177      177           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@@ -66,7 +66,7 @@ target_include_directories(${TARGET_LIB_COMMON} PRIVATE ${CMAKE_CURRENT_SOURCE_D
set_target_properties(${TARGET_LIB_COMMON} PROPERTIES SUFFIX ${LIB_EXT})
set_target_properties(${TARGET_LIB_COMMON} PROPERTIES POSITION_INDEPENDENT_CODE ON)

if (WIN32)
if (NOT "${WIN32}" STREQUAL "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is if (WIN32) and if (NOT "${WIN32}" STREQUAL "") are same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIN32 is not a variable it is just the string "WIN32". "${WIN32}" is a variable.

@jmazanec15 jmazanec15 requested a review from vamshin March 28, 2023 18:30
Signed-off-by: John Mazanec <jmazane@amazon.com>
For nmslib, we need to allocate objects that wrap vectors with metadata.
This commit updates the objects to be allocated in one buffer as opposed
to allocating objects individually. This was shown to prevent memory
fragmentation.

Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
@jmazanec15
Copy link
Member Author

Failure on windows related to: opensearch-project/OpenSearch#7080. Please review

@navneet1v
Copy link
Collaborator

Let's make sure that we run the perf tests to see how much gain we are getting.

@jmazanec15 jmazanec15 merged commit ef44fca into opensearch-project:main Apr 26, 2023
29 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 26, 2023
For nmslib, we need to allocate objects that wrap vectors with metadata.
This commit updates the objects to be allocated in one buffer as opposed
to allocating objects individually. This was shown to prevent memory
fragmentation.

Signed-off-by: John Mazanec <jmazane@amazon.com>
(cherry picked from commit ef44fca)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Enhancements Increases software capabilities beyond original client specifications
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants