postgresql / sqlite / mysql / oracle regular expression operators #1390
Comments
Michael Trier (@empty) wrote: Except you'll have to specify something other than match_op since we're using that for the Full Text Searching. |
Anonymous wrote: Now I see. Indeed: http://www.postgresql.org/docs/8.3/static/textsearch-features.html You're right. It works better by using fts anyway. Thanks. ''cherry on top out of initial topic''
|
Michael Bayer (@zzzeek) wrote: demo (using attached patch):
|
Changes by Michael Bayer (@zzzeek):
|
Michael Bayer (@zzzeek) wrote: nobody is looking for these much, keep pushing them... |
Michael Bayer (@zzzeek) wrote: the current gerrit for this is at https://gerrit.sqlalchemy.org/#/c/101/ . this is largely ready-ish. |
Michael Bayer (@zzzeek) wrote: 1.3 for the moment, the gerrit needs some work and this is not critical |
Michael Bayer (@zzzeek) wrote: still not critical |
note from #5447 this is also possible on other backends:
|
I'm happy to look into this. Can you explain how the Gerrit patch relates to this? |
I would say you can look at that patch to see if there's anything relevant to use. it proposes new operators for PostgreSQL's version of CHAR, VARCHAR, and TEXT, but if these operators are generally available on many backends, it would instead be made as part of the base String class in sqlalchemy/sql/sqltypes.py . as far as contributing code we use the normal pull request process. I merge code through gerrit but outside contributors don't need to worry about it, we synchronize to your pull request. |
Also, I would note we've written various kinds of background on helping with development at https://www.sqlalchemy.org/develop.html. This is certainly an important feature that we want to do, so if you aren't able to work on it, no worries, it will get done eventually. |
Great! Thanks for the info, I'll have a look. I'd certainly like for the regex operators to be available generally, but I'm worried that tiny differences in the regex implementations will make this tricky. For instance, I also would like regex flags (e.g. case insensitivity) to be implemented, but some backends, like SQLite, don't implement flags at all. How do you think I should handle this? |
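To make the SQLite side of this concrete: SQLite parses the `REGEXP` operator but ships no implementation for it by default, so the application has to register one. The following is a minimal sketch using only the Python standard library; the table `t` and the helper `names_matching` are hypothetical names for illustration. Note that the inline `(?i)` flag works here only because the registered function delegates to Python's `re` module, which is an assumption of this sketch and not something SQLite itself provides.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Back SQLite's REGEXP operator with Python's re module."""
    if value is None:
        return None  # NULL input yields NULL
    return re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
# SQLite implements "X REGEXP Y" as a call to the user function regexp(Y, X).
conn.create_function("regexp", 2, regexp)

conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("Alpha",), ("beta",), ("GAMMA",)])


def names_matching(pattern):
    """Hypothetical helper: return names matching the pattern, in insert order."""
    rows = conn.execute(
        "SELECT name FROM t WHERE name REGEXP ? ORDER BY rowid", (pattern,)
    )
    return [name for (name,) in rows]
```

With this in place, any flag handling on SQLite is whatever the registered function chooses to support, which is one reason a portable flags API is hard to pin down.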
I think they're sufficiently similar that we should try to unify them. For example, |
Since it's really only those two/three settings that are commonly controlled by flags, we could just add some boolean arguments to the match function that abstract away the flags themselves, for example |
I don't think we've ever had a new feature with such a great roundtable discussion going on to make sure we get it right (and that I can get other things done while it happens! :) ) thanks all !!! |
I'm 👎 on the boolean arguments to specify the flags, since they are hard to change/scale. With regexps it is not unusual to specify the flags as a string, so I would propose to just use a string that is rendered in the SQL. |
In theory I would prefer that too, but as discussed above, the multiline behaviour is quite different between DBMS. If you called I can't think of a better way to provide abstraction without using boolean arguments. Python's |
I believe that the same argument could be made for Of the two I would prefer something like Also regarding flags, if That said, I'm not sure we need to make it backend independent, at least as a first implementation. |
what if the regex operator just accepts the core arguments, and there is an ancillary operator/factory for customizing the operations via a dict or callable... so the experience is more inline with the custom-compiles? for example...
and
or
|
Not exactly, there is a common subset of all these regex implementations, which I think would cover 95% of cases. The same isn't true for flags, since the multiline flags are quite different Both of the |
since we have a lot of flags and things here, the interface might best be done in the style of "variant", a little tricky to get it right, but it looks like:
I haven't looked closely at the discussion here so don't fret if I'm missing the point entirely |
Is there a helper/mixin for the |
it's very specific to TypeEngine. the approach can potentially be emulated. for the moment we could move forward with this without doing all that, however. most people using regex are probably targeting just one database. |
As I've mentioned elsewhere, I think it's quite important that we do come up with a portable solution. The whole reason I've investigated this issue is actually because I wanted to do a regex match using the same codebase on Postgres and SQLite. If I wanted to only support Postgres I'd use the I don't mind the |
@zzzeek's @TMiguelT writing a regex that works across multiple database backends is really outside the scope of what SQLAlchemy does or promises to do. I usually use SQLite for unit-tests and Postgres for integrated tests and production – my projects are filled with custom |
OK, aren't the differences for the typical regexp between a database like PG and SQLite mostly going to be the case insensitive part? Or are basic syntaxes within the regexp strings different between PG / SQLite, for example? |
Because we definitely can't be parsing and tokenizing regex strings. If the regex syntaxes are truly different and you wanted to make it so the user didn't have to know that, that's what it would entail, and that's out of scope for SQLAlchemy (a third party extension could certainly do it, however). |
Flags- The multiline and newline modes are slightly different (see my chart) Actual regex- TLDR; they don't really support the same syntax. they can get close, but that involves changing the pattern. it's possible there is a session option. |
I've used regular expressions for literally 25 years and I couldn't give you one difference between PCRE and POSIX, so I found this: https://gist.github.com/CMCDragonkai/6c933f4a7d713ef712145c5eb94a1816 |
Federico Caselli has proposed a fix for this issue in the master branch: Implement regexp operator https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2092 |
Actually that gerrit is more the logic of with_variant |
In the end it's easier to implement the with_variant construct for each boolean expression than making it a regexp only thing |
It seems that MariaDB does not have a clear way of setting the flags that I could find; they are embedded in the pattern. https://mariadb.com/kb/en/pcre/#option-setting This does not seem to be the case with MySQL, which has https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-like and different flags |
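As a rough illustration of how differently a single flag ends up being spelled, here is a sketch of rendering a case-insensitive match per backend. The function `render_case_insensitive` and the `:pattern` bind placeholder are hypothetical and are not part of SQLAlchemy or of the proposed changeset. The SQL shapes rely on the `~*` operator in PostgreSQL, the optional match-type argument of `REGEXP_LIKE` in MySQL 8.0 and Oracle, and the inline `(?i)` option of MariaDB's PCRE patterns.

```python
def render_case_insensitive(dialect, column, pattern_param=":pattern"):
    """Hypothetical helper: render a case-insensitive regex match as SQL text."""
    if dialect == "postgresql":
        # PostgreSQL has a dedicated case-insensitive match operator.
        return f"{column} ~* {pattern_param}"
    if dialect in ("mysql", "oracle"):
        # Both accept the flags as a separate string argument.
        return f"REGEXP_LIKE({column}, {pattern_param}, 'i')"
    if dialect == "mariadb":
        # MariaDB embeds the option in the pattern itself.
        return f"{column} REGEXP CONCAT('(?i)', {pattern_param})"
    raise NotImplementedError(f"no case-insensitive regex rendering for {dialect!r}")
```

Only PostgreSQL gets away with an operator swap; the other backends need either an extra argument or a rewritten pattern, which is the crux of the flags discussion above.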
Federico Caselli has proposed a fix for this issue in the master branch: Implement regexp operator https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2096 |
I've added a first implementation here https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2096 to figure out:
|
I've updated the changeset https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2096 to use Feedback is welcome |
Migrated issue, originally created by Anonymous
match operator not implemented
in sqlalchemy/databases/postgres.py 727
just add
sql_operators.match_op: lambda x, y, escape=None: '%s ~ %s' % (x, y) + (escape and " ESCAPE '%s'" % escape or ''),
and it works for me
CAVEAT sql might crash if improper regexp syntax entered.
(these matching functions are still used with same syntax)
http://www.postgresql.org/docs/7.4/interactive/functions-matching.html
Attachments: 1390.patch
Copying this recap from #5447 (comment)
The call
column.regex_match('[a-z]*')
would then evaluate to:
PostgreSQL: column ~ "[a-z]*"
SQLite: column REGEXP "[a-z]*"
MySQL: column REGEXP "[a-z]*"
Oracle: REGEXP_LIKE(column, "[a-z]*")
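The recap above amounts to a small per-dialect lookup. A minimal sketch of that mapping in plain Python follows; `REGEX_TEMPLATES`, `render_regex_match`, and the `:pattern` bind placeholder are hypothetical names used for illustration only and do not reflect how SQLAlchemy's compilers are actually structured.

```python
# Hypothetical lookup table mirroring the recap: one SQL shape per backend.
REGEX_TEMPLATES = {
    "postgresql": "{column} ~ {pattern}",
    "sqlite": "{column} REGEXP {pattern}",
    "mysql": "{column} REGEXP {pattern}",
    "oracle": "REGEXP_LIKE({column}, {pattern})",
}


def render_regex_match(dialect, column, pattern_param=":pattern"):
    """Render a regex match for the given dialect using a bound parameter."""
    try:
        template = REGEX_TEMPLATES[dialect]
    except KeyError:
        raise NotImplementedError(
            f"no regex operator known for dialect {dialect!r}"
        ) from None
    return template.format(column=column, pattern=pattern_param)
```

Passing the pattern as a bound parameter rather than inlining the literal sidesteps quoting differences between backends.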