Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhance: Add a config item for partition name as regexp feature and disable it by default #29154

Merged

Conversation

congqixia
Copy link
Contributor

@congqixia congqixia commented Dec 12, 2023

See also #29177
Add a config item for partition name as regexp feature and disable it by default

Partition name work as regexp is legacy logic from 1.x and shall not be
supported any more

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
@sre-ci-robot sre-ci-robot added size/S Denotes a PR that changes 10-29 lines. approved labels Dec 12, 2023
@mergify mergify bot added dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement labels Dec 12, 2023
Copy link
Contributor

mergify bot commented Dec 12, 2023

@congqixia E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@congqixia
Copy link
Contributor Author

/run-cpu-e2e

Copy link
Contributor

mergify bot commented Dec 12, 2023

@congqixia E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@congqixia
Copy link
Contributor Author

/run-cpu-e2e

Copy link
Contributor

mergify bot commented Dec 12, 2023

@congqixia E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@congqixia
Copy link
Contributor Author

/run-cpu-e2e

Copy link

codecov bot commented Dec 13, 2023

Codecov Report

Merging #29154 (c969891) into master (cb75e73) will decrease coverage by 0.06%.
Report is 10 commits behind head on master.
The diff coverage is 91.66%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #29154      +/-   ##
==========================================
- Coverage   82.02%   81.96%   -0.06%     
==========================================
  Files         862      861       -1     
  Lines      121926   122118     +192     
==========================================
+ Hits       100004   100091      +87     
- Misses      18600    18683      +83     
- Partials     3322     3344      +22     
Files Coverage Δ
pkg/util/paramtable/component_param.go 98.18% <100.00%> (+<0.01%) ⬆️
internal/proxy/task_search.go 79.51% <89.28%> (+0.28%) ⬆️

... and 28 files with indirect coverage changes

Copy link
Contributor

mergify bot commented Dec 13, 2023

@congqixia E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Copy link
Contributor

mergify bot commented Dec 13, 2023

@congqixia E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
@congqixia congqixia added the behavior-change pr has behavior change label Dec 13, 2023
@mergify mergify bot added the ci-passed label Dec 13, 2023
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
@sre-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: congqixia

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot added size/L Denotes a PR that changes 100-499 lines. and removed size/S Denotes a PR that changes 10-29 lines. labels Dec 13, 2023
Copy link
Contributor

mergify bot commented Dec 13, 2023

@congqixia Please associate the related issue to the body of your Pull Request. (eg. “issue: #”)

@congqixia congqixia changed the title enhance: remove legacy partition regexp logic enhance: Add a config item for partition name as regexp feature and disable it by default Dec 13, 2023
@mergify mergify bot added the ci-passed label Dec 13, 2023
Copy link
Member

@yah01 yah01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@sre-ci-robot sre-ci-robot merged commit 9407be6 into milvus-io:master Dec 14, 2023
14 checks passed
congqixia added a commit to congqixia/milvus that referenced this pull request Dec 14, 2023
…isable it by default (milvus-io#29154)

See also milvus-io#29177
Add a config item for partition name as regexp feature and disable it by
default

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
congqixia added a commit to congqixia/milvus that referenced this pull request Dec 14, 2023
…isable it by default (milvus-io#29154)

See also milvus-io#29177
Add a config item for partition name as regexp feature and disable it by
default

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
sre-ci-robot pushed a commit that referenced this pull request Dec 14, 2023
… feature (#29154) (#29183)

Cherry pick from master
pr: #29154 
See also #29177
Add a config item for partition name as regexp feature and disable it by
default

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
@NiftyliuS
Copy link

NiftyliuS commented Feb 2, 2024

In the release notes you have:
To reduce resource consumption, regular expression searches in partitions have been discontinued. However, this feature can be re-enabled through configuration (see #29154 for details).

Can you please add an explanation of how to re-enable this feature? I cant update to 2.3.4 because this breaks a filter i have on a partition key field...

PS: this disables automatic partitions expr

@xiaofan-luan
Copy link
Collaborator

In the release notes you have: To reduce resource consumption, regular expression searches in partitions have been discontinued. However, this feature can be re-enabled through configuration (see #29154 for details).

Can you please add an explanation of how to re-enable this feature? I cant update to 2.3.4 because this breaks a filter i have on a partition key field...

PS: this disables automatic partitions expr

you need to set proxy.partitionNameRegexp to true

@xiaofan-luan
Copy link
Collaborator

In fact we don't recommend to enable this features since this might a serious performnace problem

@NiftyliuS
Copy link

"you need to set proxy.partitionNameRegexp to true" - you mean in the mivlus.yaml? is there an env param to override this?

Yes i get it that its not recommended,
I did read the notes on performance, the problem is that currently i have milvus setup using automatic partition feature.
Disabling thing means i cant filter by document_id at all, for any value.

And since i initially setup the partitions automatically using is_partition_key it means i have to redo the code and migrate the collection to a new one entirely...

For me this is a breaking change that came in a patch version...

Speaking of - what is the correct way to address document_id in my case? i intend to have 60.000 + books vectorized and i do want to be able to search all of them at once ( standard search ) as well as a search in a specific book only.
I cant create partition per book, the limit is 10k partitions right?

Should i just set 128 partitions and hash the document_id to one of them and use that for search?
help is much appreciated

@xiaofan-luan
Copy link
Collaborator

"you need to set proxy.partitionNameRegexp to true" - you mean in the mivlus.yaml? is there an env param to override this?

Yes i get it that its not recommended, I did read the notes on performance, the problem is that currently i have milvus setup using automatic partition feature. Disabling thing means i cant filter by document_id at all, for any value.

And since i initially setup the partitions automatically using is_partition_key it means i have to redo the code and migrate the collection to a new one entirely...

For me this is a breaking change that came in a patch version...

Speaking of - what is the correct way to address document_id in my case? i intend to have 60.000 + books vectorized and i do want to be able to search all of them at once ( standard search ) as well as a search in a specific book only. I cant create partition per book, the limit is 10k partitions right?

Should i just set 128 partitions and hash the document_id to one of them and use that for search? help is much appreciated

you have to config this in milvus.yaml

@xiaofan-luan
Copy link
Collaborator

can if you use partition keys, then each time you should just search with one partition key right?
you can directly specify which partition key you are gonna to use and there is no need to use a regular expression

@NiftyliuS
Copy link

In the auto partition key mode the new flag disables searching by document_id entirely.
It creates 64 partitions named _default0-63

when not specifying partitions document_id is ignored and i get 0 results.
when specifying partitions i get an error manually setting partitions is not allowed when using partition mode
(i will update with the exact message once i get to the laptop)

I think you are right, setting up a new collection with manual partitions might be the way to go...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved behavior-change pr has behavior change ci-passed dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement lgtm size/L Denotes a PR that changes 100-499 lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants