Extract filter processors #20

zluiten · 2021-02-19T14:09:23Z

Currently it is not so easy to implement custom filters because the filter processing logic is embedded in the main PostgresDocumentStore class.

Idea: extract the logic of the (private) PostgresDocumentStore::filterToWhereClause function to a filter processor class which gets injected in the PostgresDocumentStore.

The text was updated successfully, but these errors were encountered:

codeliner · 2021-02-19T16:38:55Z

What custom filters do you have in mind?

I'm not against the change, as long as we keep BC: the filter processor should be an optional parameter and if not provided a default one should be initialized.
However, using a custom filter processor binds your code to the specific document store implementation. When filter processing cannot be modified, we can better control that different document store implementations behave the same. That's especially important for tests, where you can use the in-memory document store implementation instead of PostgresDocumentStore.

So maybe the better option is to add a new filter type and support it in the in-memory document store as well?

martin-schilling · 2021-02-19T21:33:51Z

I‘ve had this problem in the past too. In one project using Postgres LIKE/ILIKE was required and I ended up going around the DocumentStore by using PDO directly to implement this. Having a way of doing this through the DocumentStore, even if it isn‘t optimal, seems superior to me.

I generally agree with @codeliner that this is problematic regarding portability. The general goal is that by using the DocumentStore, the specific database is abstracted away and may be changed at any point. Different DBMS are, however, very different and offer lots of powerful features that could be useful for certain circumstances. Therefore, I think that offering an escape hatch to access these platform specific features is an improvement over going around the DocumentStore entirely. By using said escape hatch, the user also gets the responsibility of either keeping platform independence intact manually or explicitly discarding it.

Adding a new filter would be the superior option if possible but I think that it is not a general solution for this problem.

zluiten · 2021-02-22T13:57:57Z

I see why it implicates portability negatively and I agree that introducing a new filter type, the first thing to look at is whether it can be a generic filter type.

Reason for raising this issue was that I was looking for a full text search filter. Postgres has a @@ operator for example. It might not be easy to create such a generic filter type, but it would be very nice though.

codeliner · 2021-02-22T19:28:45Z

full text search 🤔
I always had MongoDB in mind as an alternative document store implementation. I think it also provides full text search, but maybe behavior is different from Postgres. I'm not an expert ...
@martin-schilling made some good points, too. So @Netiul if you want to provide a PR for your suggested changes, I'm happy to review and merge. Would be nice if you can check the in-memory document store implementation as well.

martin-schilling · 2021-02-22T20:09:32Z

I did think about it some more and I think that even supporting custom filters doesn't solve this problem entirely because there are likely still situations where there is no alternative to going around the DocumentStore entirely. But supporting these kinds of features shouldn't be a responsibility of the DocumentStore anyway I think. So I guess implementing the changes @Netiul suggested would provide some additional freedom without major drawbacks.

MongoDB requires a specific index for full text search if I recall correclty. That would be the first of many (probably) problems 😃

zluiten · 2021-02-23T09:48:53Z

I'm neither an expert on the topic of full text search and the support of it in the known document stores. It was just that I need it in some form and going on what I read about it, Postgresql happens to offer some decent capabilities. So I wanted to try something but quickly adding a filter wasn't really possible, hence the issue.

Like @martin-schilling points out for MongoDB, it's probably hard to properly support this in a way the full potential of the used document store can be utilized. In case of Postgres, you don't need an index to use it, but for speedy results you would need one.
There are several approaches possible. One would be to create an extra column of the tsvector data type which contains a vector of lexeme tokens of the fields to search.

Basically a TextSearch filter would need a column identifier (or something to match against) and a query string. How it would match is then up to the document store implementation.

To give a global idea of how I would expect this to work, here is some not tested pseude code using a stored generated column to keep its contents up to date with the source data:

$collectionName = 'collection_of_docs';
$docId = Uuid::uuid4()->toString();

$doc = [
    'title' => 'Text searching with Postgres',
    'body' => 'Full text searching with Postgres using tsvectors',
];

$index = new MetadataColumnIndex(
    new RawSqlIndexCmd("CREATE INDEX textsearch_idx ON $collectionName USING gin(textsearchable_index_col)"),
    new Column("textsearchable_index_col TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', coalesce(doc->>'title', ''), coalesce(doc->>'body', ''))) STORED")
);

$this->documentStore->addCollection($collectionName, $index);

$documentStore->addDoc(
    $collectionName,
    $docId,
    $doc,
);

$docs = $documentStore->filterDocs(
    $collectionName, 
    new TextSearch('metadata.textsearchable_index_col', 'text search')
);

The Postgres implementation would convert this then to the query

SELECT * FROM collection_of_docs WHERE textsearchable_index_col @@ to_tsquery('text search')

codeliner · 2021-02-23T10:23:52Z

@Netiul Let's use your original suggestion and add a way for users to add custom filters so that we don't need to worry about database specific behavior.

zluiten · 2021-02-24T10:23:33Z

Popping another question... what do you guys think about adding support for expression based (non prop based) order by clauses?

martin-schilling · 2021-02-24T20:53:16Z

What exactly do you have in mind? Supporting arbitrary expressions would lead to the same problem regarding DBMS abstraction. Supporting specific expressions like sorting by length (ORDER BY LENGTH(attribute)) would probably be possible to implement for any given DBMS but seems like a lot of work and complexity to me.

zluiten · 2021-02-25T10:14:11Z

Currently we can sort by existing props in the document or a metadata column. Most of the time that is perfectly fine because with projections we can even store derived properties (like the length of an attribute) in the document and just sort by that prop.
That is not possible when the sorting depends on user provided values, f.e. the relevance of a document based on a search term.

I would not propose an abstraction of every possible valid expression, that would be indeed way to complex. But perhaps a RawOrderBy class can be introduced allowing a raw OrderBy SQL string. (A bit like RawSqlIndexCmd in this library regarding indices.)

codeliner · 2022-04-03T16:54:35Z

Closed by #21

@Netiul is your RawOrderBy use case still valid? If so, you could create a separate issue for it and maybe also have an idea for a PR?

zluiten mentioned this issue Feb 23, 2021

Extract filter processing logic to a seperate class #21

Merged

codeliner closed this as completed Apr 3, 2022

zluiten mentioned this issue Apr 4, 2022

Extract order by processing logic #24

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extract filter processors #20

Extract filter processors #20

zluiten commented Feb 19, 2021 •

edited

Loading

codeliner commented Feb 19, 2021 •

edited

Loading

martin-schilling commented Feb 19, 2021 •

edited

Loading

zluiten commented Feb 22, 2021 •

edited

Loading

codeliner commented Feb 22, 2021

martin-schilling commented Feb 22, 2021

zluiten commented Feb 23, 2021 •

edited

Loading

codeliner commented Feb 23, 2021

zluiten commented Feb 24, 2021 •

edited

Loading

martin-schilling commented Feb 24, 2021

zluiten commented Feb 25, 2021

codeliner commented Apr 3, 2022

Extract filter processors #20

Extract filter processors #20

Comments

zluiten commented Feb 19, 2021 • edited Loading

codeliner commented Feb 19, 2021 • edited Loading

martin-schilling commented Feb 19, 2021 • edited Loading

zluiten commented Feb 22, 2021 • edited Loading

codeliner commented Feb 22, 2021

martin-schilling commented Feb 22, 2021

zluiten commented Feb 23, 2021 • edited Loading

codeliner commented Feb 23, 2021

zluiten commented Feb 24, 2021 • edited Loading

martin-schilling commented Feb 24, 2021

zluiten commented Feb 25, 2021

codeliner commented Apr 3, 2022

zluiten commented Feb 19, 2021 •

edited

Loading

codeliner commented Feb 19, 2021 •

edited

Loading

martin-schilling commented Feb 19, 2021 •

edited

Loading

zluiten commented Feb 22, 2021 •

edited

Loading

zluiten commented Feb 23, 2021 •

edited

Loading

zluiten commented Feb 24, 2021 •

edited

Loading