Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String comparison functions #1857

Closed
etcd opened this issue Feb 19, 2024 · 8 comments
Closed

String comparison functions #1857

etcd opened this issue Feb 19, 2024 · 8 comments
Assignees
Labels

Comments

@etcd
Copy link
Contributor

etcd commented Feb 19, 2024

Is your feature request related to a problem? Please describe.
String comparison functions do not exist in Manticore (other than equality). It is not possible to perform the following query:

CREATE TABLE mytable(mytitle string);
SELECT * FROM `mytable` WHERE `mytitle` < 'foo';

Describe the solution you'd like
The implementation of the following string comparison operators: <, <=, >, >=.

Additional context
This functionality is essential to implement relatively mainstream functionalities, such as cursor-based pagination on string columns.

@sanikolaev
Copy link
Collaborator

Thanks for the feature request, @etcd

@tomatolog As discussed, please prepare instructions for @etcd detailing the areas that require modifications, as they would like to implement these changes themselves.

@tomatolog
Copy link
Contributor

In general you need create a new filter instance at sphCreateFilter similar to FilterString_c but one that can compare strings with lesser and greater operators.

You also should add new members to CommonFilterSettings_t for parser to fill. The parser located at bison and flex files I'd start filter_item

And after all works well locally you need to add send and receive of the new filter structure into distributed index SearchRequestBuilder_c::SendQuery and SearchReplyParser_c::ParseReply

Feel free to ask for any help

@etcd
Copy link
Contributor Author

etcd commented Feb 28, 2024

Hey @tomatolog @sanikolaev, I've spent a few hours familiarizing myself with the codebase. I looked into modeling these string filters after Filter_Range or Filter_FloatRange (both in sphinxfilter.cpp). However there are existing architectural choices that make this route of implementation complicated.

For example, in several places, ranges are assumed to be numerical (either ints or floats - so adding strings to the concept of "ranges" would be challenging). This is a non-exhaustive list:

So implementing these string filters the same way seems like a somewhat error-prone route, given the number of places that might make this assumption. SPH_FILTER_RANGE appears 81 times in 22 files, and SPH_FILTER_FLOATRANGE appears 54 times in 19 files.

Not to mention there are more subtle cases where ranges are assumed to be numerical, like in EvalRange (sphinxfilter.h), where numerical comparison operators are used - but if we want to compare strings, we need to genericize the comparators so we can pass in strcmp.

There might be simpler ways of adding these string filters that don't involve expanding the existing "range" abstraction and having to ensure that all semantics continue to hold. I'll defer to your judgements.

@tomatolog
Copy link
Contributor

I'd add a new FILTER enum similar to SPH_FILTER_STRING \ SPH_FILTER_STRING_LIST these do not match to SPH_FILTER_VALUES and make separate code that handles only this case - no need to add string into existed SPH_FILTER_RANGE. It could be better to create complete new code without touching most of existed code.

@etcd
Copy link
Contributor Author

etcd commented Feb 29, 2024

I'd find it helpful if you could also outline which files are deprecated/unused and should not be touched (if there's an easy way to do that).

For example, is mysqlse/ha_sphinx.cc just dead code? It contains a enum ESphFilter, but it seems to me that the enum of the same name in src/sphinxdefs.h is the more complete enum that's actually used.

Also, how about api/ruby/? According to component-licenses/README.md, the Ruby client is old. It is most likely very outdated. Am I supposed to update it? It is supposed to track known filter types, but if this client is unmaintained, then it's probably not worth updating.

In general, it might be a good idea to go through an anti-entropy effort to remove dead code and make development easier (esp for outside contributors).

@etcd
Copy link
Contributor Author

etcd commented Mar 1, 2024

Hi @tomatolog , please check out the draft PR. I've made updates to all places mentioned in your outline. I've left it as a draft PR because I think there may be additional changes necessary that I would like to double check first.

@tomatolog
Copy link
Contributor

you do not need to implement any API or the mysqlse as it is another kind of the API as we already has features work only via SphinxQL or HTTP JSON

you need just to implement code:

  • in the daemon for the local index
  • in the daemon for distributed index (pass query into remote agent from master) and increase protocol version

It also worth to add test case as a new test/test_4XX for testing with the uberstest suite

@tomatolog
Copy link
Contributor

implemented at PR#1900

@sanikolaev sanikolaev added the rel::upcoming Upcoming release label Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants