Add and and or to Query #287

PSU3D0 · 2024-05-18T01:20:55Z

Very new to Tantivy and Rust, so if I missed something apologies!

It would be excellent to combine queries using syntax like &, |.

I maintain a library, Docprompt, that uses Tantivy internally, and having these operators available would make my usage easier, as well as other developers that are trying to compose queries declarativly using python syntax.

I understand that are a lot of query types in Tantivy. If there are other areas/directions that need to be implemented before we can expose the python API to developers, let me know and I'm happy to implement. Cheers!

PS:

I think a natural followup would be a python-space QueryParser that allows Django/SqlAlchemy like ORM usage for constructing queries. To cite the unit tests:

Old:

query = index.parse_query("title:men AND body:winter", ["title", "body"])

New:

query = index.create_query(title="men") & index.create_query(body="winter")

This PR would be the very beginnings of a python API in that direction!

adamreichold · 2024-05-18T08:19:51Z

src/query.rs

@@ -65,6 +65,26 @@ impl Query {
        Ok(format!("Query({:?})", self.get()))
    }

+    pub(crate) fn __and__(&self, other: Query) -> Query {
+        let inner = tv::query::BooleanQuery::from(vec![
+            (tv::query::Occur::Must, self.inner.box_clone()),


Could we check if self or other already is a BooleanQuery of all-Must and apend ourselves to that instead of building trees of BooleanQuery?

Is there any reason why would you want this feature? Complicated syntaxes like (query1 | query2) & query3 can be very troublesome to implement. Its also common to nest queries in Lucene.

I fear for the reduced efficiency compared to having a flat list of subqueries. AFAIU, Tantivy will not optimize that query tree and its execution will directly mirror the structure of the input query.

If we need to distinguish the presence of a boolean request, I afraid we need to setup dedicated Query struct but it will lead to big change in the API.

I was rather thinking about downcasting here. I also think we can keep this internal as an optimization instead of changing the API, i.e. downcast to see if we already have a boolean query with matching modifiers and append a clause instead of nesting.

While I think the idea is fine, the optimization cannot be too deep due to the Occur implementation.

Currently we have 3 variants of Occur: Should, Must, MustNot. The optimization can be done when we have a tuple of (Occur, BooleanQuery) where the boolean query might contain one or more groups of Should, Must, MustNot.

Their mathematics goes like the symmetric matrix (Take the header as the index 0 Occur of the (Occur, BooleanQuery) tuple, while the rows Occur as the groups inside the BooleanQuery):

Should Must MustNot

Should Should (Must, [Should, ...]) (MustNot, [Should, ...])

Must (Should, [Must, ...]) Must MustNot

MustNot (Should, [MustNot, ...]) MustNot Must

This means that we can only flatten 1 layer those 5 types of (Occur, InnerOccurGroup), and for the flattening we need to extract the boolean query's inner occur group, am I right?

P.S. Why I remain (Must, [Should, ...]) and (MustNot, [Should, ...]) though they can be deduced by DeMorgan's rule is that when transforming them, the resultant query still have the same degree of nesting and there is no significant benefits so I just remain them as it is.

My comment was only aimed at keeping the status quo, i.e. chaining clauses using & and | should result in a single boolean query as if the boolean_query method is used instead of a unbalanced tree.

I also think we can keep this internal as an optimization instead of changing the API.

So do you mean you expect whenever __and__ and __or__ are used, the above optimization evaluation rule applies while the original Query.boolean_query remains the status quo?

So do you mean you expect whenever and and or are used, the above optimization evaluation rule applies while the original Query.boolean_query remains the status quo?

No, I just want a & b & c and boolean_query([(Must, a), (Must, b), (Must, c)]) to produce the same internal representation using a single boolean query instead of the first form using two boolean queries.

But I think the & and | are binary operations that they must evaluate to something first, no matter the order? Say for a & b & c then we must either evaluate to d & c where d = a & b or vice versa, and there must be nesting.

In case we optimistically flatten the queries with the & or | syntax, say a & b & c, while any of the queries a, b, c are boolean, the resultant query will be different from a boolean_query([(Must, a), (Must, b), (Must, c)]).

If we really want to adapt this syntax while producing the same internal representation, the Query.boolean_query method should also be flattened.

Add AND and OR support

3dd5108

adamreichold reviewed May 18, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add and and or to Query #287

Add and and or to Query #287

PSU3D0 commented May 18, 2024

adamreichold May 18, 2024

alex-au-922 May 20, 2024

adamreichold May 20, 2024

adamreichold May 20, 2024

alex-au-922 May 20, 2024 •

edited

Loading

adamreichold May 20, 2024

alex-au-922 May 20, 2024

adamreichold May 20, 2024

alex-au-922 May 20, 2024 •

edited

Loading

	`Should`	`Must`	`MustNot`
`Should`	`Should`	`(Must, [Should, ...])`	`(MustNot, [Should, ...])`
`Must`	`(Should, [Must, ...])`	`Must`	`MustNot`
`MustNot`	`(Should, [MustNot, ...])`	`MustNot`	`Must`

Add __and__ and __or__ to Query #287

Are you sure you want to change the base?

Add __and__ and __or__ to Query #287

Conversation

PSU3D0 commented May 18, 2024

adamreichold May 18, 2024

Choose a reason for hiding this comment

alex-au-922 May 20, 2024

Choose a reason for hiding this comment

adamreichold May 20, 2024

Choose a reason for hiding this comment

adamreichold May 20, 2024

Choose a reason for hiding this comment

alex-au-922 May 20, 2024 • edited Loading

Choose a reason for hiding this comment

adamreichold May 20, 2024

Choose a reason for hiding this comment

alex-au-922 May 20, 2024

Choose a reason for hiding this comment

adamreichold May 20, 2024

Choose a reason for hiding this comment

alex-au-922 May 20, 2024 • edited Loading

Choose a reason for hiding this comment

Add and and or to Query #287

Add and and or to Query #287

alex-au-922 May 20, 2024 •

edited

Loading

alex-au-922 May 20, 2024 •

edited

Loading