POSIX regular expressions operators #41

Merged: roji merged 1 commit into npgsql:dev from PSeON:dev on Jul 8, 2016

@PSeON
Contributor

@PSeON PSeON commented Jul 5, 2016

Hello. This pull request adds two POSIX regular expression operators ("~" and "~*").
https://www.postgresql.org/docs/current/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
Tests included.
Please let me know what you think. Thanks.

@roji
Member

roji commented Jul 5, 2016

Thanks for submitting this!

The basic work looks good, but there are several important discrepancies between .NET regex and PostgreSQL regex, and it's important for the EF6 provider to provide .NET behavior.

I recommend you take a look at the work I've done on regex in the EFCore provider (see NpgsqlRegexIsMatchTranslator).

 {"operator_tsquery_contains",Operator.QueryContains},
-{"operator_tsquery_is_contained",Operator.QueryIsContained}
+{"operator_tsquery_is_contained",Operator.QueryIsContained},
+{"regex_is_match",Operator.RegexIsMatch},
Contributor

I think it would be more consistent to change regex_is_match to operator_regex_is_match.

@PSeON
Contributor Author

PSeON commented Jul 5, 2016

Agree. operator_regex_is_match looks more appropriate.

@PSeON
Contributor Author

PSeON commented Jul 6, 2016

The regular expression functions now match .NET behaviour, as in the EFCore provider.

@roji
Member

roji commented Jul 6, 2016

@PSeON thanks, I promise to review ASAP. In the meantime, can you please squash the two commits and force-push for a cleaner history?

@PSeON
Contributor Author

PSeON commented Jul 6, 2016

Done.

@roji
Member

roji commented Jul 6, 2016

@rwasef1830, if you can help review this and provide more comments that would be greatly appreciated!

/// otherwise, <see langword="false"/>.
/// </returns>
[DbFunction("Npgsql", "regex_is_match")]
public static bool RegexIsMatch(string input, string pattern)
Contributor

@rwasef1830 rwasef1830 Jul 6, 2016

I think that if this (and its overload) were named MatchRegex, it would be consistent with the "Match" method, which generates the @@ operator.

I would also rename the values in the operators and DbFunction attributes ... etc. to make it consistent with the match operator. (eg: operator_regex_is_match changes to operator_regex_match).

Also, I don't think the documentation for each parameter and return value adds any additional value. In order to free ourselves from the burden of keeping it in sync with postgresql docs, the link to the manual in the <summary> is enough. This also makes it consistent with the rest of the methods in this class.

@rwasef1830
Contributor

rwasef1830 commented Jul 6, 2016

@PSeON in addition to my notes on the commit, I find the behavior difference between the overload without RegexOptions and the one with to be slightly misleading.

There are several sets of semantics exposed in a confusing way here: operator "~" vs "~*", the behavior of the two overloads (with and without .NET RegexOptions), and how both line up with PostgreSQL semantics.

According to postgresql docs:

An ARE can begin with embedded options: a sequence (?xyz) (where xyz is one or more alphabetic characters) specifies options affecting the rest of the RE. These options override any previously determined options — in particular, they can override the case-sensitivity behavior implied by a regex operator, or the flags parameter to a regex function.

I think it is redundant to use both operators and in the case of one of them override the meaning of the operator with embedded options. To match the .NET semantics (which is case-sensitive by default unless overridden by RegexOptions), both functions should emit the case-sensitive version of the PostgreSQL operator.

In the case of the second one, the meaning of the operator will be overridden by the RegexOptions, and in case of RegexOptions.None (or calling the overload that doesn't have a RegexOptions parameter), it should behave as .NET would behave (and both methods should behave the same).

According to MS docs, RegexOptions.None gives the following behavior:

  • The pattern is interpreted as a canonical rather than an ECMAScript regular expression.
  • The regular expression pattern is matched in the input string from left to right.
  • Comparisons are case-sensitive.
  • The ^ and $ language elements match the beginning and end of the input string.
  • The . language element matches every character except \n.
  • Any white space in a regular expression pattern is interpreted as a literal space character.
  • The conventions of the current culture are used when comparing the pattern to the input string.
  • Capturing groups in the regular expression pattern are implicit as well as explicit.

There should also be test cases that confirm this behavior.
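
The RegexOptions.None semantics listed above can be checked directly against .NET. The following is a minimal sketch only: the class and method names are hypothetical and not part of this PR, and it exercises System.Text.RegularExpressions rather than the provider. The inputs mirror the kind of cases the provider tests would need to reproduce on the PostgreSQL side.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the PR): pins down the RegexOptions.None
// behavior that the SQL generated by the provider is expected to reproduce.
public static class DotNetRegexDefaults
{
    // Comparisons are case-sensitive.
    public static bool IsCaseSensitive() =>
        Regex.IsMatch("blog", "blog") && !Regex.IsMatch("BLOG", "blog");

    // ^ and $ anchor to the input string, not to individual lines inside it.
    public static bool AnchorsApplyToWholeInput() =>
        Regex.IsMatch("blog", "^blog$") &&
        !Regex.IsMatch("some \nblog\n name", "^blog$");

    // The . element matches every character except \n.
    public static bool DotExcludesNewline() =>
        Regex.IsMatch("some blog name", "some .* name") &&
        !Regex.IsMatch("some \n name", "some .* name");

    // White space in the pattern is treated as a literal character.
    public static bool PatternWhitespaceIsLiteral() =>
        Regex.IsMatch("some blog name", "some blog name") &&
        !Regex.IsMatch("someblogname", "some blog name");

    // The two-argument overload and RegexOptions.None give the same answer.
    public static bool OverloadsAgree(string input, string pattern) =>
        Regex.IsMatch(input, pattern) ==
        Regex.IsMatch(input, pattern, RegexOptions.None);
}
```

Each method returns true when .NET behaves as described in the list, so the same inputs can be reused as matching and mismatching rows in a database-backed test.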

@roji What do you think about this and my notes ?

}

[Test]
public void RegexIsMatchOptions()
Contributor

I think this could be clearer / more readable using [TestCase].

@PSeON
Contributor Author

PSeON commented Jul 6, 2016

@rwasef1830 thanks for the comments. I am very grateful to you for your help. I will fix these problems ASAP.

@PSeON
Contributor Author

PSeON commented Jul 7, 2016

Finished implementation of recommendations. Please review this version. Thanks.

args[1].Accept(this));
}

string flags = "(?";
Member

Please use a StringBuilder rather than string concatenation
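
As an illustration of the StringBuilder suggestion, here is a minimal sketch of how the embedded-option prefix could be assembled. It is not the PR's actual code: the class name, method name, and exact mapping are assumptions, based on the PostgreSQL embedded options `i` (case-insensitive), `n` (newline-sensitive), `p` (partial newline-sensitive), `w` (inverse partial newline-sensitive) and `x` (expanded syntax), and on the `(?p)` prefix that appears in the rendered SQL quoted later in this thread.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (an assumption, not the code under review): builds the
// PostgreSQL embedded-option prefix for a given set of .NET RegexOptions.
public static class PgRegexFlags
{
    public static string BuildPrefix(RegexOptions options)
    {
        var flags = new StringBuilder();

        // .NET is case-sensitive unless IgnoreCase is set.
        if (options.HasFlag(RegexOptions.IgnoreCase))
            flags.Append('i');

        bool multiline = options.HasFlag(RegexOptions.Multiline);
        bool singleline = options.HasFlag(RegexOptions.Singleline);

        if (multiline && singleline)
            flags.Append('w');   // ^/$ match at line breaks, . matches \n
        else if (multiline)
            flags.Append('n');   // ^/$ match at line breaks, . excludes \n
        else if (!singleline)
            flags.Append('p');   // .NET default: . excludes \n, ^/$ unchanged
        // Singleline alone corresponds to PostgreSQL's default, so no flag.

        // Assumed to be the closest counterpart of PostgreSQL expanded syntax.
        if (options.HasFlag(RegexOptions.IgnorePatternWhitespace))
            flags.Append('x');

        // An empty "(?)" group would be invalid, so emit nothing in that case.
        return flags.Length == 0
            ? string.Empty
            : flags.Insert(0, "(?").Append(')').ToString();
    }
}
```

With this mapping, `BuildPrefix(RegexOptions.None)` yields `(?p)`. Options with no obvious PostgreSQL counterpart (for example RightToLeft or ECMAScript) are ignored in this sketch and would need explicit handling in real code.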

@roji
Member

roji commented Jul 7, 2016

I've taken a quick look (sorry, don't have much spare time at the moment), it's after @PSeON's latest modifications so @rwasef1830, I'm not sure exactly what you were referring to before - although what you say sounds right. The code as it is now seems to correspond pretty well to what I wrote for EF Core. I remember I spent some time understanding exactly how PostgreSQL does regexp and how .NET does them, and I think everything should be OK.

@rwasef1830, if you think any discrepancy still remains please let us know (I'll fix the EF Core implementation too if necessary), otherwise I'm OK for merging (modulo the nit-picking StringBuilder comment). @rwasef1830 I'll wait for your confirmation before doing so.

 {"operator_tsquery_contains",Operator.QueryContains},
-{"operator_tsquery_is_contained",Operator.QueryIsContained}
+{"operator_tsquery_is_contained",Operator.QueryIsContained},
+{"operator_regex_match",Operator.RegexMatch},
Contributor

@rwasef1830 rwasef1830 Jul 7, 2016

Please rename operator_regex_match_case to operator_regex_match (and remove the other one). This implementation should be only using 1 operator and that's the case sensitive one (and the behavior is then overridden with postgresql modifiers according to RegexOptions).

@rwasef1830
Contributor

@PSeON Thanks for your work! In addition to my latest comments, I think you should add a test for negative matching (maybe a second query in the same tests that selects the mismatching input).

@roji Will negating this query execute and be treated the same way as using the "not regex match" operator in PostgreSQL or does this need to be separately implemented as well ?

@roji
Member

roji commented Jul 7, 2016

@rwasef1830, good question - it's worth testing. I'm guessing that without any extra handling, a negative regex match should render something like NOT (x ~ y), which is correct but maybe a tiny bit ugly. I'm not sure it's worth the extra effort to render x !~ y...

@roji
Member

roji commented Jul 7, 2016

@rwasef1830 and thanks for the very valuable and thorough reviewing!

@rwasef1830
Contributor

@PSeON I forgot to mention, please add a unit test that very explicitly makes sure that using MatchRegex(string, string) and MatchRegex(string, string, RegexOptions.None) produce exactly the same results.

@PSeON
Contributor Author

PSeON commented Jul 8, 2016

@rwasef1830 This code (line 75) checks that both methods produce same results. Am I wrong?

@PSeON
Contributor Author

PSeON commented Jul 8, 2016

@roji Checked query rendering.
NpgsqlTextFunctions.MatchRegex(b.Name, pattern) renders "Extent1"."Name" ~ (E'(?p)' || $1) = TRUE
and
!NpgsqlTextFunctions.MatchRegex(b.Name, pattern) renders "Extent1"."Name" ~ (E'(?p)' || $1) != TRUE
Of course it is possible to add the proposed optimization, but I think it is out of scope for this PR.

@rwasef1830
Contributor

rwasef1830 commented Jul 8, 2016

@PSeON You're right, I missed that. Maybe the variable names could be clarified: perhaps you could rename pgResult and pgResultOpt to pgMatchResult and pgMatchWithOptionsResult so that they line up with your existing netMatchResult and the intention is clearer.

(it is better to have the variables clearly named than to write clarifying comments).

If negating the existing operator and the negative operator behave exactly the same then no need to implement it.

@roji
Member

roji commented Jul 8, 2016

Agree on the negation, the only impact is slightly better SQL readability which really isn't that important here.

@roji
Member

roji commented Jul 8, 2016

@rwasef1830, whenever you're OK let me know and I'll merge.

@PSeON
Contributor Author

PSeON commented Jul 8, 2016

Added proposed improvements.

[TestCase("^blog$", "blog", "some \nblog\n name", TestName = "MatchRegex ^ and $ match beginning and end")]
[TestCase("some .* name", "some blog name", "some \n name", TestName = "MatchRegex . matches all except \\n")]
[TestCase("some blog name", "some blog name", "someblogname", TestName = "MatchRegex whitespace not ignored in pattern")]
public void MatchRegex(string pattern, string matchingInput, string mismatchingInput)
Contributor

@rwasef1830 rwasef1830 Jul 8, 2016

You can remove MatchRegex from TestName property. In the unit test runner the test case names will appear under the test's main method name in a tree structure so no need to mention it again in the value of the TestName property. (eg: Case-sensitive).

@rwasef1830
Contributor

@PSeON With the minor final changes I proposed to improve the test-case readability, if @roji has no further comments, I think this is OK for merge.

@PSeON
Contributor Author

PSeON commented Jul 8, 2016

Done.

@roji roji added the feature label Jul 8, 2016
@roji roji added this to the 3.2.0 milestone Jul 8, 2016
@roji roji merged commit 7c5a096 into npgsql:dev Jul 8, 2016
@roji
Member

roji commented Jul 8, 2016

Thanks to both of you for this!

@PSeON
Contributor Author

PSeON commented Jul 8, 2016

@roji, @rwasef1830 Thanks!

@danbopes

Any chance we can get a new version pushed with these changes?

@roji
Member

roji commented Sep 10, 2017

@danbopes sorry a new version hasn't been released in so long. @rwasef1830, what do you think? As I'm really unlikely to do any work on the EF6 provider anytime soon (have little time for Npgsql in general at the moment), can you take a look at other issues/PRs and tell me if you intend to do any work? If not I can do a release soon.

@rwasef1830
Contributor

rwasef1830 commented Sep 10, 2017

@roji I've been waiting to get a chance to work on them, but didn't get any time I'm afraid :-( I'll take a quick look at the outstanding pull requests today.

@roji
Member

roji commented Sep 10, 2017

@rwasef1830 so what do you think, should I release or do you want a bit more time?

@rwasef1830
Contributor

rwasef1830 commented Sep 10, 2017

@roji these 2 pull requests seem almost ready, I'll do a final pass on them today: #75 and #74.

I think I need a few days to merge both of these and then confirm / investigate #46, so yeah, I want a bit more time :-)

@roji
Member

roji commented Sep 10, 2017

Great, I'll hold off then. Will probably have a bit more time for a release next Sunday, hopefully you can get everything done by then.
