Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract filter processors #20

Closed
zluiten opened this issue Feb 19, 2021 · 11 comments
Closed

Extract filter processors #20

zluiten opened this issue Feb 19, 2021 · 11 comments

Comments

@zluiten
Copy link
Contributor

zluiten commented Feb 19, 2021

Currently it is not so easy to implement custom filters because the filter processing logic is embedded in the main PostgresDocumentStore class.

Idea: extract the logic of the (private) PostgresDocumentStore::filterToWhereClause function to a filter processor class which gets injected in the PostgresDocumentStore.

@codeliner
Copy link
Contributor

codeliner commented Feb 19, 2021

What custom filters do you have in mind?

I'm not against the change, as long as we keep BC: the filter processor should be an optional parameter and if not provided a default one should be initialized.
However, using a custom filter processor binds your code to the specific document store implementation. When filter processing cannot be modified, we can better control that different document store implementations behave the same. That's especially important for tests, where you can use the in-memory document store implementation instead of PostgresDocumentStore.

So maybe the better option is to add a new filter type and support it in the in-memory document store as well?

@martin-schilling
Copy link
Contributor

martin-schilling commented Feb 19, 2021

I‘ve had this problem in the past too. In one project using Postgres LIKE/ILIKE was required and I ended up going around the DocumentStore by using PDO directly to implement this. Having a way of doing this through the DocumentStore, even if it isn‘t optimal, seems superior to me.

I generally agree with @codeliner that this is problematic regarding portability. The general goal is that by using the DocumentStore, the specific database is abstracted away and may be changed at any point. Different DBMS are, however, very different and offer lots of powerful features that could be useful for certain circumstances. Therefore, I think that offering an escape hatch to access these platform specific features is an improvement over going around the DocumentStore entirely. By using said escape hatch, the user also gets the responsibility of either keeping platform independence intact manually or explicitly discarding it.

Adding a new filter would be the superior option if possible but I think that it is not a general solution for this problem.

@zluiten
Copy link
Contributor Author

zluiten commented Feb 22, 2021

I see why it implicates portability negatively and I agree that introducing a new filter type, the first thing to look at is whether it can be a generic filter type.

Reason for raising this issue was that I was looking for a full text search filter. Postgres has a @@ operator for example. It might not be easy to create such a generic filter type, but it would be very nice though.

@codeliner
Copy link
Contributor

full text search 🤔
I always had MongoDB in mind as an alternative document store implementation. I think it also provides full text search, but maybe behavior is different from Postgres. I'm not an expert ...
@martin-schilling made some good points, too. So @Netiul if you want to provide a PR for your suggested changes, I'm happy to review and merge. Would be nice if you can check the in-memory document store implementation as well.

@martin-schilling
Copy link
Contributor

I did think about it some more and I think that even supporting custom filters doesn't solve this problem entirely because there are likely still situations where there is no alternative to going around the DocumentStore entirely. But supporting these kinds of features shouldn't be a responsibility of the DocumentStore anyway I think. So I guess implementing the changes @Netiul suggested would provide some additional freedom without major drawbacks.

MongoDB requires a specific index for full text search if I recall correclty. That would be the first of many (probably) problems 😃

@zluiten
Copy link
Contributor Author

zluiten commented Feb 23, 2021

I'm neither an expert on the topic of full text search and the support of it in the known document stores. It was just that I need it in some form and going on what I read about it, Postgresql happens to offer some decent capabilities. So I wanted to try something but quickly adding a filter wasn't really possible, hence the issue.

Like @martin-schilling points out for MongoDB, it's probably hard to properly support this in a way the full potential of the used document store can be utilized. In case of Postgres, you don't need an index to use it, but for speedy results you would need one.
There are several approaches possible. One would be to create an extra column of the tsvector data type which contains a vector of lexeme tokens of the fields to search.


Basically a TextSearch filter would need a column identifier (or something to match against) and a query string. How it would match is then up to the document store implementation.

To give a global idea of how I would expect this to work, here is some not tested pseude code using a stored generated column to keep its contents up to date with the source data:

$collectionName = 'collection_of_docs';
$docId = Uuid::uuid4()->toString();

$doc = [
    'title' => 'Text searching with Postgres',
    'body' => 'Full text searching with Postgres using tsvectors',
];

$index = new MetadataColumnIndex(
    new RawSqlIndexCmd("CREATE INDEX textsearch_idx ON $collectionName USING gin(textsearchable_index_col)"),
    new Column("textsearchable_index_col TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', coalesce(doc->>'title', ''), coalesce(doc->>'body', ''))) STORED")
);

$this->documentStore->addCollection($collectionName, $index);

$documentStore->addDoc(
    $collectionName,
    $docId,
    $doc,
);

$docs = $documentStore->filterDocs(
    $collectionName, 
    new TextSearch('metadata.textsearchable_index_col', 'text search')
);

The Postgres implementation would convert this then to the query

SELECT * FROM collection_of_docs WHERE textsearchable_index_col @@ to_tsquery('text search')

@codeliner
Copy link
Contributor

@Netiul Let's use your original suggestion and add a way for users to add custom filters so that we don't need to worry about database specific behavior.

@zluiten
Copy link
Contributor Author

zluiten commented Feb 24, 2021

Popping another question... what do you guys think about adding support for expression based (non prop based) order by clauses?

@martin-schilling
Copy link
Contributor

What exactly do you have in mind? Supporting arbitrary expressions would lead to the same problem regarding DBMS abstraction. Supporting specific expressions like sorting by length (ORDER BY LENGTH(attribute)) would probably be possible to implement for any given DBMS but seems like a lot of work and complexity to me.

@zluiten
Copy link
Contributor Author

zluiten commented Feb 25, 2021

Currently we can sort by existing props in the document or a metadata column. Most of the time that is perfectly fine because with projections we can even store derived properties (like the length of an attribute) in the document and just sort by that prop.
That is not possible when the sorting depends on user provided values, f.e. the relevance of a document based on a search term.

I would not propose an abstraction of every possible valid expression, that would be indeed way to complex. But perhaps a RawOrderBy class can be introduced allowing a raw OrderBy SQL string. (A bit like RawSqlIndexCmd in this library regarding indices.)

@codeliner
Copy link
Contributor

Closed by #21

@Netiul is your RawOrderBy use case still valid? If so, you could create a separate issue for it and maybe also have an idea for a PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants