fix: regex query can't handle text with newline #32569

Merged
5 commits merged into milvus-io:master on Apr 26, 2024

Conversation

longjiquan
Contributor

issue: #32482

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
@sre-ci-robot sre-ci-robot added the size/L Denotes a PR that changes 100-499 lines. label Apr 24, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/bug Issues or changes related a bug labels Apr 24, 2024

codecov bot commented Apr 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.75%. Comparing base (dcc15e3) to head (8dbac4d).
Report is 19 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #32569      +/-   ##
==========================================
- Coverage   81.83%   81.75%   -0.08%     
==========================================
  Files         999      991       -8     
  Lines      124070   124622     +552     
==========================================
+ Hits       101529   101890     +361     
- Misses      18666    18850     +184     
- Partials     3875     3882       +7     
Files Coverage Δ
internal/core/src/common/RegexQuery.cpp 100.00% <100.00%> (ø)
internal/core/src/common/RegexQuery.h 100.00% <100.00%> (ø)
internal/core/src/exec/expression/UnaryExpr.cpp 83.38% <100.00%> (-0.01%) ⬇️
internal/core/src/exec/expression/UnaryExpr.h 82.08% <100.00%> (+4.94%) ⬆️

... and 236 files with indirect coverage changes

@mergify mergify bot added the ci-passed label Apr 24, 2024
@alexanderguzhva
Contributor

#31758

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
@mergify mergify bot removed the ci-passed label Apr 24, 2024
}

std::string
quote_meta(const std::string& s) {
Contributor

This always allocates new memory, potentially many times. Would a precheck, as in the Go implementation, be better?
https://cs.opensource.google/go/go/+/refs/tags/go1.22.2:src/regexp/regexp.go;l=726
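
For illustration, the precheck idea could look roughly like the sketch below. It is not the code from this PR: the metacharacter set is an assumption copied from the set Go's `regexp.QuoteMeta` escapes, and the function bodies are hypothetical. Only the names `is_special` and `quote_meta` come from the diff.

```cpp
#include <cstddef>
#include <string>

// Assumed metacharacter set (the same characters Go's regexp.QuoteMeta escapes).
inline bool is_special(char c) {
    static const std::string kSpecial = R"re(\.+*?()|[]{}^$)re";
    return kSpecial.find(c) != std::string::npos;
}

// Escapes regex metacharacters in s, scanning first so that inputs
// without any metacharacter skip the escaping loop entirely.
std::string quote_meta(const std::string& s) {
    // Precheck: find the first character that needs escaping.
    std::size_t i = 0;
    while (i < s.size() && !is_special(s[i])) {
        ++i;
    }
    if (i == s.size()) {
        return s;  // nothing to escape: a single copy, no extra work
    }

    // Worst case from position i onward: every character gets a backslash.
    std::string out;
    out.reserve(2 * s.size() - i);
    out.append(s, 0, i);  // copy the clean prefix in one step
    for (; i < s.size(); ++i) {
        if (is_special(s[i])) {
            out.push_back('\\');
        }
        out.push_back(s[i]);
    }
    return out;
}
```

With this shape, `quote_meta("abc")` returns the input as is, while `quote_meta("a.b")` yields `a\.b` after a single `reserve` call.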

Contributor

Could you link the reference code in a comment?

} else {
if (c == '\\') {
- escapeMode = true;
+ escape_mode = true;
} else if (c == src) {
result += replacement;
Contributor

too much string concatenation.

Contributor

@chyezh chyezh Apr 25, 2024

Should replace % but not \\% with [\\s\\S]*?
\\% should be %.

But as I see it, the current implementation doesn't do this.
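
A minimal sketch of the behavior requested above, assuming the rule is "an unescaped `%` becomes `[\s\S]*`, and `\%` becomes a literal `%`". The names `translate_pattern_match_to_regex`, `is_special`, and `escape_mode` appear in the diff, but the bodies here are hypothetical, the metacharacter set is an assumption, and `like_match` is an illustrative helper built on `std::regex` rather than on the matcher this PR uses. The sketch emits the greedy `[\s\S]*`; for a whole-string match a lazy quantifier would give the same yes/no answer.

```cpp
#include <regex>
#include <string>

// Assumed metacharacter set; '%' is not a regex metacharacter.
inline bool is_special(char c) {
    static const std::string kSpecial = R"re(\.+*?()|[]{}^$)re";
    return kSpecial.find(c) != std::string::npos;
}

// Translates a LIKE-style pattern into a regex string (sketch only).
std::string translate_pattern_match_to_regex(const std::string& pattern) {
    std::string r;
    r.reserve(2 * pattern.size());  // a hint only; '%' expands to 7 chars
    bool escape_mode = false;
    for (char c : pattern) {
        if (escape_mode) {
            // Previous char was a backslash: emit c as a literal character.
            if (is_special(c)) {
                r.push_back('\\');
            }
            r.push_back(c);
            escape_mode = false;
        } else if (c == '\\') {
            escape_mode = true;
        } else if (c == '%') {
            r += "[\\s\\S]*";  // any run of characters, newlines included
        } else {
            if (is_special(c)) {
                r.push_back('\\');
            }
            r.push_back(c);
        }
    }
    return r;  // a trailing lone backslash is dropped in this sketch
}

// Hypothetical helper: whole-string match of text against the pattern.
bool like_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(
        text, std::regex(translate_pattern_match_to_regex(pattern)));
}
```

Under these assumptions, the pattern `Hello%` matches text that continues past a newline, and `100\%` matches only the literal string `100%`.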

Contributor

Too much string concatenation.
Use std::stringstream?

@czs007
Contributor

czs007 commented Apr 25, 2024

Use boost::regex, as @alexanderguzhva suggested.

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
is_special(char c);

std::string
quote_meta(const std::string& s);
Contributor

redundant code

};

template <>
inline bool
RegexMatcher::operator()(const std::string& operand) {
- return std::regex_match(operand, r_);
+ return boost::regex_match(operand, r_);
Contributor

Add a comment about the boost corner case?
`.` doesn't match \n,
but `.*` matches \n.
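
For comparison, the newline behavior of the standard-library engine can be checked with a few lines. This sketch uses `std::regex` only, with its default ECMAScript grammar, so it says nothing about how `boost::regex` treats `.`; `full_match` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// True when the whole of text matches the given ECMAScript pattern.
bool full_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// In the ECMAScript grammar `.` excludes line terminators, so:
//   full_match("a\nb", "a.*b")        is false
//   full_match("a\nb", "a[\\s\\S]*b") is true
//   full_match("a b",  "a.*b")        is true
```

The character class `[\s\S]` covers whitespace and non-whitespace alike, so it is a common way to express "any character, newline included" without relying on engine-specific flags.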

- const std::string& replacement) {
- std::string result;
+ translate_pattern_match_to_regex(const std::string& pattern) {
+ std::string r;
Contributor

Use a string builder to reduce memory allocations.

Contributor

mergify bot commented Apr 25, 2024

@longjiquan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
@longjiquan
Contributor Author

fixed, cc @chyezh

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
@chyezh
Contributor

chyezh commented Apr 25, 2024

\lgtm

@chyezh chyezh added the lgtm label Apr 25, 2024
Contributor

mergify bot commented Apr 25, 2024

@longjiquan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@longjiquan
Contributor Author

/run-cpu-e2e

@czs007
Contributor

czs007 commented Apr 26, 2024

/approve
/lgtm

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: czs007, longjiquan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit ccce1e9 into milvus-io:master Apr 26, 2024
15 checks passed
@longjiquan longjiquan deleted the fix-regex-newline branch April 26, 2024 06:22
Labels
approved ci-passed dco-passed DCO check passed. kind/bug Issues or changes related a bug lgtm size/L Denotes a PR that changes 100-499 lines.
5 participants