fix(sql): fix backslash escape mechanism in LIKE and ILIKE operator #3006

Merged
merged 13 commits into from
Mar 8, 2023

Conversation

SiyaoIsHiding
Contributor

Regarding this issue: #2623 (comment) by @bziobrowski
Feature: A backslash placed before the wildcards _ and % makes them literal. An even number of backslashes does not make them literal, because the backslashes escape each other in pairs and the wildcard stays a wildcard.

Test cases explained as follows:

| Test | Value stored | Pattern searched | Outcome | Explanation |
| --- | --- | --- | --- | --- |
| 1 | `The path is \_ignore` | `The path is \_ignore` | Empty | A backslash makes the underscore following it literal |
| 2 | `The path is \_ignore` | `The path is \\_ignore` | `The path is \_ignore` | A backslash makes another backslash following it literal |
| 3 | `The path is \_ignore` | `The path is \\\_ignore` | `The path is \_ignore` | Both the backslash and the underscore become literal |
| 4 | `\\?\D:\path` | `\\\\_\\%` | `\\?\D:\path` | An even number of backslashes does not make the wildcards literal |
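The odd/even rule in the test cases can be checked with a small parity count. The following is only an illustrative sketch, not code from this pull request; the class name `BackslashParitySketch` and the method `isEscaped` are hypothetical.

```java
final class BackslashParitySketch {
    /**
     * Returns true when the character at {@code index} is preceded by an odd
     * number of consecutive backslashes, i.e. when it is escaped.
     */
    static boolean isEscaped(String pattern, int index) {
        int backslashes = 0;
        // Walk left from the character and count the unbroken run of backslashes.
        for (int i = index - 1; i >= 0 && pattern.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        // Backslashes pair up from the left; one left over escapes the character.
        return (backslashes & 1) == 1;
    }
}
```

With the Test 4 pattern, the underscore follows four backslashes and the percent sign follows two, so neither is escaped and both remain wildcards.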

@SiyaoIsHiding SiyaoIsHiding changed the title Fixing the Backslash Escape Mechanism in LIKE and ILIKE Operator fix(sql): the Backslash Escape Mechanism in LIKE and ILIKE Operator Feb 23, 2023
@bziobrowski
Contributor

Please merge from master and fix the conflict.

# Conflicts:
#	core/src/test/java/io/questdb/griffin/engine/functions/regex/LikeFunctionFactoryTest.java
@SiyaoIsHiding
Contributor Author

Thank you very much for your suggestions!

Also, I am not sure I fully understood your request, so I wrote and tested three versions that all pass the tests, in case you would like me to switch to a different one.

  1. The version in the pull request follows your suggestion to "move the whole condition just before the last else".
  2. This version follows your suggestion to "handle second character not in "_%" in nested if/else":
        for (int i = 0; i < len; i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                if (i + 1 < len) {
                    i += 1;
                    c = pattern.charAt(i);
                    if (c == '_' || c == '%') {
                        sink.put(c); // escaped wildcard: emit it as a literal
                    } else if (c == '\\') {
                        sink.put("\\\\"); // escaped backslash: emit a regex-escaped backslash
                    } else if ("[](){}.*+?$^|#".indexOf(c) != -1) {
                        sink.put("\\\\\\"); // literal backslash, then escape the regex metacharacter
                        sink.put(c);
                    } else {
                        sink.put("\\\\"); // keep the backslash as a literal before an ordinary character
                        sink.put(c);
                    }
                } else {
                    sink.put("\\\\"); // the backslash is the last character
                }
            } else if (c == '_') {
                sink.put('.');
            } else if (c == '%') {
                sink.put(".*?");
            } else if ("[](){}.*+?$^|#".indexOf(c) != -1) {
                sink.put("\\");
                sink.put(c);
            } else {
                sink.put(c);
            }
        }
  3. Here is another version that works:
        for (int i = 0; i < len; i++) {
            char c = pattern.charAt(i);
            if (c == '_') {
                sink.put('.');
            } else if (c == '%') {
                sink.put(".*?");
            } else if (c == '\\' && i + 1 < len && "_%".indexOf(pattern.charAt(i + 1)) != -1) {
                i += 1;
                sink.put(pattern.charAt(i));
            } else if (c == '\\' && i + 1 < len && pattern.charAt(i + 1) == '\\') {
                i += 1;
                sink.put("\\\\");
            } else if ("[](){}.*+?$^|#\\".indexOf(c) != -1) {
                sink.put("\\");
                sink.put(c);
            } else {
                sink.put(c);
            }
        }

@bziobrowski
Contributor

What I meant is that the logic could be optimized.
Once you find \ there are the following options, depending on the next character:

  • if end of string - throw error like PostgreSQL
  • if next char is \ - add \
  • else print next char (includes '_' and '%' )
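The three options above can be sketched as a standalone converter. This is a minimal illustration under stated assumptions, not the code that was merged: it writes to a `StringBuilder` instead of the `sink` used in the pull request, the names `LikeEscapeSketch`, `toRegex`, `appendLiteral` and `like` are hypothetical, and the choice of `IllegalArgumentException` and `Pattern.DOTALL` is an assumption made for the example.

```java
import java.util.regex.Pattern;

final class LikeEscapeSketch {
    // Characters that must be escaped to be literal in a Java regex
    // (same set as in the versions above, plus the backslash itself).
    private static final String REGEX_SPECIALS = "[](){}.*+?$^|#\\";

    /** Converts a LIKE pattern to a regex, treating backslash as the escape character. */
    static String toRegex(String pattern) {
        StringBuilder sink = new StringBuilder();
        int len = pattern.length();
        for (int i = 0; i < len; i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                if (++i == len) {
                    // Option 1: a trailing escape character is an error.
                    throw new IllegalArgumentException("LIKE pattern must not end with escape character");
                }
                // Options 2 and 3: the next character is taken literally,
                // whether it is a backslash, a wildcard, or anything else.
                appendLiteral(sink, pattern.charAt(i));
            } else if (c == '_') {
                sink.append('.');
            } else if (c == '%') {
                sink.append(".*?");
            } else {
                appendLiteral(sink, c);
            }
        }
        return sink.toString();
    }

    // Appends c so that the regex engine matches it literally.
    private static void appendLiteral(StringBuilder sink, char c) {
        if (REGEX_SPECIALS.indexOf(c) != -1) {
            sink.append('\\');
        }
        sink.append(c);
    }

    /** Hypothetical helper: full-string match of value against a LIKE pattern. */
    static boolean like(String value, String pattern) {
        return Pattern.compile(toRegex(pattern), Pattern.DOTALL).matcher(value).matches();
    }
}
```

Routing every escaped character through one `appendLiteral` call removes the separate branches for `_`, `%` and `\`. Note that, unlike version 2 above, this sketch drops the backslash before an ordinary character instead of keeping it as a literal.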

bziobrowski previously approved these changes Mar 2, 2023
@SiyaoIsHiding
Contributor Author

Thx for your detailed explanation. I will modify it today.

@bluestreak01 bluestreak01 changed the title fix(sql): the Backslash Escape Mechanism in LIKE and ILIKE Operator fix(sql): fix backslash escape mechanism in LIKE and ILIKE operator Mar 8, 2023
@bluestreak01 bluestreak01 merged commit 3dcd5f4 into questdb:master Mar 8, 2023