enhance: Improve GetVectorById of Sparse Float Vector #33209

Merged
merged 1 commit into milvus-io:master from mmap-sparse-raw on Jun 12, 2024

Conversation

zhengbuqian
Collaborator

@zhengbuqian zhengbuqian commented May 21, 2024

issue: #29419

  • Support mmap of raw data for sparse float vectors

For getting vectors from the chunk cache, I added a unit test but marked it as skipped due to a known issue. I have tested it locally.

@sre-ci-robot sre-ci-robot added the size/L Denotes a PR that changes 100-499 lines. label May 21, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement labels May 21, 2024

codecov bot commented May 21, 2024

Codecov Report

Attention: Patch coverage is 19.48052% with 62 lines in your changes missing coverage. Please review.

Project coverage is 82.18%. Comparing base (ee73e62) to head (46570e0).
Report is 106 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #33209      +/-   ##
==========================================
+ Coverage   82.15%   82.18%   +0.02%     
==========================================
  Files        1012     1003       -9     
  Lines      128957   129221     +264     
==========================================
+ Hits       105944   106195     +251     
- Misses      19038    19040       +2     
- Partials     3975     3986      +11     
Files                                              Coverage Δ
internal/core/src/segcore/SegmentSealedImpl.h      80.00% <100.00%> (ø)
internal/core/src/storage/Util.cpp                 78.60% <100.00%> (-0.65%) ⬇️
internal/core/src/mmap/Utils.h                     68.29% <0.00%> (-15.58%) ⬇️
internal/core/src/storage/ChunkCache.cpp           69.23% <15.38%> (-9.35%) ⬇️
internal/core/src/mmap/Column.h                    84.71% <21.05%> (-6.33%) ⬇️
internal/core/src/segcore/SegmentSealedImpl.cpp    75.74% <9.37%> (-1.39%) ⬇️

... and 219 files with indirect coverage changes

@buqian-zilliz buqian-zilliz force-pushed the mmap-sparse-raw branch 2 times, most recently from 2ee55e6 to 2b4a913 Compare May 22, 2024 09:31
Contributor

mergify bot commented May 22, 2024

@zhengbuqian E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@zhengbuqian
Collaborator Author

/run-cpu-e2e

Contributor

mergify bot commented May 23, 2024

@zhengbuqian E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@zhengbuqian zhengbuqian added the PR | need cherry-pick need cherry pick to other branches label May 23, 2024
Contributor

mergify bot commented May 23, 2024

@zhengbuqian E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@zhengbuqian
Collaborator Author

/run-cpu-e2e

Contributor

mergify bot commented May 23, 2024

@zhengbuqian E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@zhengbuqian
Collaborator Author

/run-cpu-e2e

@zhengbuqian
Collaborator Author

zhengbuqian commented May 24, 2024

The e2e run was blocked by #33268.

It just passed, so the test seems to be flaky.

@zhengbuqian
Collaborator Author

The unit test in test_sealed.cpp is skipped due to #33210.

@@ -123,6 +128,7 @@ class ColumnBase {
         AssertInfo(data_ != MAP_FAILED,
                    "failed to create file-backed map, err: {}",
                    strerror(errno));
+        madvise(data_, mapped_size, MADV_WILLNEED);
Contributor

Calling madvise here may increase memory usage. Perhaps it's best not to add it for now.

Collaborator Author

@zhengbuqian zhengbuqian May 27, 2024

ack, removed.
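
For context on the concern above, here is a minimal sketch of a file-backed, read-only mapping with an optional MADV_WILLNEED hint. It is not the ColumnBase code: `MapFile` and its `prefetch` flag are hypothetical names chosen for illustration. MADV_WILLNEED asks the kernel to read the pages ahead of use, so memory can grow for data that is never accessed, which is presumably why the reviewer preferred to leave it out.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Milvus): maps `size` bytes of `path`
// read-only and, if `prefetch` is set, hints the kernel to read ahead.
inline const char*
MapFile(const std::string& path, size_t size, bool prefetch) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) {
        throw std::runtime_error(std::string("open failed: ") +
                                 strerror(errno));
    }
    void* data = mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
    int saved_errno = errno;
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (data == MAP_FAILED) {
        throw std::runtime_error(
            std::string("failed to create file-backed map, err: ") +
            strerror(saved_errno));
    }
    if (prefetch) {
        // Advisory only: the kernel may start loading the whole range now,
        // so memory can rise before any byte is actually read.
        madvise(data, size, MADV_WILLNEED);
    }
    return static_cast<const char*>(data);
}
```

Without the hint, pages are faulted in lazily on first access, so memory tracks what is actually read.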

    virtual const char*
    Data() const {
        return data_;
    }

    // MmappedData() returns the mmaped address
    const char*
    MmappedData() const {
Contributor

Is there any difference compared to Data()?

Collaborator Author

Data() points at an array of elements, while MmappedData() points at the entire mmap-ed memory. They are the same for dense vectors but not the same for sparse vectors. SparseFloatColumn overrides Data() to return something different.
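
A minimal sketch of the distinction described above, using simplified stand-ins rather than the real classes. `ColumnSketch`, `SparseColumnSketch`, `SparseRowView`, and the back-to-back row layout are all assumptions made for illustration; the actual ColumnBase and SparseFloatColumn may be organized differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-row view into the mapped buffer.
struct SparseRowView {
    const char* bytes;  // start of the row inside the mapped region
    size_t size;        // row length in bytes
};

// Simplified stand-in for a dense column: elements live directly in the
// mapped region, so Data() and MmappedData() return the same address.
class ColumnSketch {
 public:
    explicit ColumnSketch(const char* mapped) : data_(mapped) {
    }
    virtual ~ColumnSketch() = default;

    // Pointer to an array of elements.
    virtual const char*
    Data() const {
        return data_;
    }

    // Pointer to the start of the whole mapped region.
    const char*
    MmappedData() const {
        return data_;
    }

 protected:
    const char* data_;
};

// Simplified stand-in for a sparse column: rows have variable length, so
// the element array is a separate vector of views into the mapped bytes.
class SparseColumnSketch : public ColumnSketch {
 public:
    // Assumes rows are serialized back to back with the given byte sizes.
    SparseColumnSketch(const char* mapped, const std::vector<size_t>& row_sizes)
        : ColumnSketch(mapped) {
        size_t offset = 0;
        for (size_t n : row_sizes) {
            rows_.push_back({mapped + offset, n});
            offset += n;
        }
    }

    // Returns the array of row views, not the raw mapped bytes.
    const char*
    Data() const override {
        return reinterpret_cast<const char*>(rows_.data());
    }

 private:
    std::vector<SparseRowView> rows_;
};
```

In this sketch a caller that needs the raw mapped bytes uses MmappedData(), while a caller that iterates over rows uses Data() and casts it to the element type.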

@bigsheeper
Contributor

/lgtm for the GetVector part

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this pull request Jun 6, 2024
issue: #29419
pr: #33209 

codecov will fail because the newly added unit test in test_sealed.cpp is
skipped due to #33210

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Contributor

@congqixia congqixia left a comment

/lgtm

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: congqixia, zhengbuqian

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yanliang567 yanliang567 added ci-passed manual-pass manually set pass before ci-passed labeled labels Jun 12, 2024
@sre-ci-robot sre-ci-robot merged commit 8cb3505 into milvus-io:master Jun 12, 2024
14 of 15 checks passed
@zhengbuqian zhengbuqian deleted the mmap-sparse-raw branch June 12, 2024 02:10
@czs007 czs007 removed the PR | need cherry-pick need cherry pick to other branches label Jun 25, 2024
yellow-shine pushed a commit to yellow-shine/milvus that referenced this pull request Jul 2, 2024
Labels
approved ci-passed dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement lgtm manual-pass manually set pass before ci-passed labeled size/L Denotes a PR that changes 100-499 lines.
6 participants