enhance: Avoid merging insert data when buffering insert msgs #33562

Merged

congqixia
Contributor

@congqixia congqixia commented Jun 3, 2024

See also #33561

This PR:

  • Uses zero copy when buffering insert messages
  • Makes storage.InsertCodec support serializing multiple insert data chunks into the same batch of binlog files

Signed-off-by: Congqi Xia congqi.xia@zilliz.com
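
The difference between merging and zero-copy buffering can be sketched roughly as follows. This is an illustration only, not the code from this PR: `chunk`, `mergingBuffer`, `chunkedBuffer`, and `serialize` are hypothetical names, and a plain `[]int64` stands in for real field data.

```go
package main

import "fmt"

// chunk is a hypothetical stand-in for the data carried by one insert message.
type chunk struct {
	ids []int64
}

// mergingBuffer copies every incoming chunk into one growing slice.
type mergingBuffer struct {
	ids []int64
}

func (b *mergingBuffer) Buffer(c *chunk) {
	b.ids = append(b.ids, c.ids...) // row data is copied on every call
}

// chunkedBuffer keeps a reference to each chunk; no row data is copied.
type chunkedBuffer struct {
	chunks []*chunk
	rows   int
}

func (b *chunkedBuffer) Buffer(c *chunk) {
	b.chunks = append(b.chunks, c) // only the pointer is stored
	b.rows += len(c.ids)
}

// serialize writes all chunks into a single output batch, analogous to
// serializing several chunks into the same batch of binlog files.
func serialize(chunks []*chunk) []int64 {
	total := 0
	for _, c := range chunks {
		total += len(c.ids)
	}
	out := make([]int64, 0, total)
	for _, c := range chunks {
		out = append(out, c.ids...)
	}
	return out
}

func main() {
	buf := &chunkedBuffer{}
	buf.Buffer(&chunk{ids: []int64{1, 2}})
	buf.Buffer(&chunk{ids: []int64{3}})
	fmt.Println(buf.rows, serialize(buf.chunks)) // prints "3 [1 2 3]"
}
```

In this sketch the only copy happens once, at serialization time, instead of on every buffered message.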

@sre-ci-robot sre-ci-robot added the size/L Denotes a PR that changes 100-499 lines. label Jun 3, 2024
@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: congqixia

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mergify mergify bot added dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement labels Jun 3, 2024
Contributor

mergify bot commented Jun 3, 2024

@congqixia E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.


@congqixia
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Jun 3, 2024

@congqixia E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

return ib.currentBuffer().buffer
}
// no error assumed, buffer created before
result, _ := storage.NewInsertDataWithCap(ib.collSchema, int(ib.rows))
Contributor

Maybe we don't really need to return a real *storage.InsertData?
Could we return an insertDataIterator instead, so that we avoid one extra copy?

Contributor Author

Good point. Working on it

Contributor Author

This would be a major refactoring in storage. We will implement the iterator in the next PR.
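
For readers following the thread, the iterator idea suggested above might look roughly like this. It is a hypothetical sketch, not something this PR implements: `chunkIterator`, `chunk`, and `countRows` are invented names, and the real storage types are not used.

```go
package main

import "fmt"

// chunk is a hypothetical stand-in for one buffered piece of insert data.
type chunk struct {
	ids []int64
}

// chunkIterator (hypothetical) hands out buffered chunks one at a time,
// so a consumer never needs a single merged copy of all rows.
type chunkIterator struct {
	chunks []*chunk
	pos    int
}

// Next returns the next chunk, or false once all chunks have been visited.
func (it *chunkIterator) Next() (*chunk, bool) {
	if it.pos >= len(it.chunks) {
		return nil, false
	}
	c := it.chunks[it.pos]
	it.pos++
	return c, true
}

// countRows consumes the iterator without copying any row data.
func countRows(it *chunkIterator) int {
	n := 0
	for c, ok := it.Next(); ok; c, ok = it.Next() {
		n += len(c.ids)
	}
	return n
}

func main() {
	it := &chunkIterator{chunks: []*chunk{
		{ids: []int64{1, 2}},
		{ids: []int64{3}},
	}}
	fmt.Println(countRows(it)) // prints 3
}
```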

See also milvus-io#33561

This PR:
- Adds a new param item for insert buffer chunk size
- Pre-allocates each insert buffer to prevent frequent `growslice` calls

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
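
The effect of pre-allocation on slice growth can be illustrated with a small, self-contained sketch. It is not taken from this PR: `countGrowths` is a hypothetical helper that counts how often `append` has to reallocate the backing array, which is when the Go runtime calls `growslice`.

```go
package main

import "fmt"

// countGrowths appends n values to s and reports how many times the
// backing array was reallocated (each reallocation changes the capacity).
func countGrowths(s []int64, n int) int {
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, int64(i))
		if cap(s) != before {
			growths++
		}
	}
	return growths
}

func main() {
	const rows = 10000
	// Starting from a nil slice, append has to grow the array repeatedly.
	fmt.Println("no preallocation:", countGrowths(nil, rows))
	// With capacity reserved up front, no reallocation is needed.
	fmt.Println("preallocated:", countGrowths(make([]int64, 0, rows), rows))
}
```

The exact growth count without pre-allocation depends on the Go runtime version; with enough capacity reserved up front it is zero.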
@congqixia congqixia force-pushed the enhance/preallocate_insert_buffer branch from b48248e to ca7bae6 Compare June 11, 2024 03:33
Contributor

mergify bot commented Jun 11, 2024

@congqixia E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>

codecov bot commented Jun 11, 2024

Codecov Report

Attention: Patch coverage is 72.02073% with 54 lines in your changes missing coverage. Please review.

Project coverage is 80.90%. Comparing base (a1232fa) to head (2605393).
Report is 23 commits behind head on master.

@@            Coverage Diff             @@
##           master   #33562      +/-   ##
==========================================
+ Coverage   80.88%   80.90%   +0.01%     
==========================================
  Files        1051     1058       +7     
  Lines      134863   135201     +338     
==========================================
+ Hits       109088   109387     +299     
- Misses      21590    21621      +31     
- Partials     4185     4193       +8     
| Files | Coverage Δ |
|---|---|
| internal/datanode/importv2/util.go | 77.48% <100.00%> (+1.32%) ⬆️ |
| internal/datanode/syncmgr/serializer.go | 100.00% <100.00%> (ø) |
| internal/datanode/syncmgr/storage_serializer.go | 91.86% <100.00%> (+0.41%) ⬆️ |
| internal/datanode/writebuffer/insert_buffer.go | 100.00% <100.00%> (ø) |
| internal/datanode/writebuffer/segment_buffer.go | 96.42% <100.00%> (+3.57%) ⬆️ |
| internal/datanode/writebuffer/write_buffer.go | 88.37% <100.00%> (+0.54%) ⬆️ |
| internal/indexnode/chunkmgr_mock.go | 42.39% <ø> (ø) |
| internal/storage/insert_data.go | 89.23% <100.00%> (+0.04%) ⬆️ |
| internal/util/importutilv2/binlog/field_reader.go | 63.63% <100.00%> (ø) |
| internal/datanode/syncmgr/storage_v2_serializer.go | 82.22% <50.00%> (+0.13%) ⬆️ |
... and 1 more

... and 49 files with indirect coverage changes

@mergify mergify bot added the ci-passed label Jun 11, 2024
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
@mergify mergify bot removed the ci-passed label Jun 11, 2024
@congqixia congqixia changed the title from "enhance: Pre-allocate insert buffer capacity for writebuffer" to "enhance: Avoid merging insert data when buffering insert msgs" Jun 11, 2024
Contributor

mergify bot commented Jun 11, 2024

@congqixia ut workflow job failed, comment rerun ut can trigger the job again.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
@sre-ci-robot sre-ci-robot added size/XL Denotes a PR that changes 500-999 lines. and removed size/L Denotes a PR that changes 100-499 lines. labels Jun 11, 2024
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
@tedxu
Collaborator

tedxu commented Jun 12, 2024

/lgtm

@yanliang567 yanliang567 added ci-passed manual-pass manually set pass before ci-passed labeled labels Jun 13, 2024
@sre-ci-robot sre-ci-robot merged commit 512ea6b into milvus-io:master Jun 13, 2024
14 of 15 checks passed
congqixia added a commit to congqixia/milvus that referenced this pull request Jun 13, 2024
See also milvus-io#33561 milvus-io#33562

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
congqixia added a commit to congqixia/milvus that referenced this pull request Jun 13, 2024
See also milvus-io#33561 milvus-io#33562

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
sre-ci-robot pushed a commit that referenced this pull request Jun 17, 2024
See also #33561 #33562

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
congqixia added a commit to congqixia/milvus that referenced this pull request Jun 26, 2024
…-io#33562)

See also milvus-io#33561

This PR:
- Uses zero copy when buffering insert messages
- Makes `storage.InsertCodec` support serializing multiple insert data
chunks into the same batch of binlog files

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
congqixia added a commit to congqixia/milvus that referenced this pull request Jun 26, 2024
yellow-shine pushed a commit to yellow-shine/milvus that referenced this pull request Jul 2, 2024
yellow-shine pushed a commit to yellow-shine/milvus that referenced this pull request Jul 2, 2024
Labels
approved ci-passed dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement lgtm manual-pass manually set pass before ci-passed labeled size/XL Denotes a PR that changes 500-999 lines.
5 participants