Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow the disjunction of multiple values to match a single search parameter (i.e. name is joe or john) #292

Closed
Tracked by #524
jingtang10 opened this issue Mar 9, 2021 · 11 comments · Fixed by #827
Assignees
Labels
P1 High priority issue type:enhancement New feature or request type:research Research or prototype something new

Comments

@jingtang10
Copy link
Collaborator

#281 (comment)

https://www.hl7.org/fhir/search.html#composite

@jingtang10 jingtang10 added this to To do in FHIR engine library via automation Mar 9, 2021
@epicadk
Copy link
Contributor

epicadk commented Mar 21, 2021

I'd like to work on this.

@jingtang10 jingtang10 added type:enhancement New feature or request type:research Research or prototype something new labels Mar 24, 2021
@rkodev
Copy link
Contributor

rkodev commented Apr 13, 2021

@jingtang10 My thoughts

Referencing the FHIR Spec

The composite search parameter needs are able to generate AND and OR parameters in any possible number of combinations essentially an NxM matrix of clauses

/Patient?language=FR,NL can either be represented as

A) union on 2 distinct queries i.e

SELECT resourceId FROM StringIndexEntity WHERE resourceType = ? AND index_name = ? AND index_value = 'FR' COLLATE NOCASE UNION SELECT resourceId FROM StringIndexEntity WHERE resourceType = ? AND index_name = ? AND index_value = 'NL' COLLATE NOCASE

or
B) generating an or statement

SELECT resourceId FROM StringIndexEntity WHERE ( resourceType = ? AND index_name = ? AND index_value = 'FR' COLLATE NOCASE) OR ( resourceType = ? AND index_name = ? AND index_value = 'NL' COLLATE NOCASE)

Approach A
Combining multiple search queries using the union operator. This allows for more flexibility in writing complex nested search queries
The downside is the performance impact. PS union ALL allows for duplicates but gives a marginal performance improvement

Approach B
Concatenate the different conditions required for a search as part of the where clause and use a single optimized search on the index.
Downside: Having searches on different tables i.e StringIndex and ReferenceIndex e.t.c will result in

The proposal is to generate search queries by generating a tree holding multiple conditions
To represent a complex search that has N conditions with each conditions having the option to be a collection of nested conditions, we can represent the data structure as a SearchClause Tree

interface SearchClause {
      fun generateQuery();
      fun and(clause: SearchClause);
      fun or(clause: SearchClause);
}

// Tree of Conditions
public class Tree<T> {
    private Node<T> root;

    public Tree(T rootData) {
        root = new Node<T>();
        root.data = rootData;
        root.children = new ArrayList<Node<T>>();
    }

    public static class Node<T> {
        private T data;
        private Node<T> parent;
        private List<Node<T>> children;
    }
}

Each of the nodes represents a pair of brackets ( ) that essentially generates the search as a combination of the children as we generate the query using PostOrder Traversal.
AND and OR parameters can be generated using UNION and INTERSECT in any possible combinations

The benefit of having a tree holding interfaces is that we can generate queries that take advantage of different Index tables using approach A above albeit taking the performance hit

NB
This might force a number of changes to the search API, i.e at least introducing helper methods to compose the SearchClauses from different parameter types

@rkodev
Copy link
Contributor

rkodev commented Apr 13, 2021

We can generate this statement from the Tree of Conditions

resourceId IN (
  (
    SELECT resourceId FROM StringIndexEntity
    WHERE resourceType = ? AND index_name = ? AND index_value = ? COLLATE NOCASE
    UNION
    (
      SELECT resourceId FROM StringIndexEntity
      WHERE resourceType = ? AND index_name = ? AND index_value = ? COLLATE NOCASE
      INTERSECT 
      SELECT resourceId FROM ReferenceIndexEntity
      WHERE resourceType = ? AND index_name = ? AND index_value = ?
    )
  )
  INTERSECT
  (
    SELECT resourceId FROM TokenIndexEntity
    WHERE resourceType = ? AND index_name = ? AND index_value = ? COLLATE NOCASE
  )
)

@jingtang10
Copy link
Collaborator Author

jingtang10 commented Apr 14, 2021

Thanks @rkodev for the udpate. As you rightly pointed out, approach B is not going to work for filters on different index tables. So I think we should focus on approach A.

However, I'm not entirely convinced we need to design something totally generic which allows many levels of filters combined by and or or. In fact, I think we can maintain expressiveness by restricting the form to be an conjunction of disjunctions (or in fact disjunction of conjuctions). In other words, the form:

A and (B or (C and (D or E))) can always be rewritten in the form (.. or ..) and (.. or ..) and ... or (.. and ..) or (.. and ..) or ...

In other words we only need to introduce another level.

EDIT: https://en.wikipedia.org/wiki/Conjunctive_normal_form is what I was alluding to. In particular: Every propositional formula can be converted into an equivalent formula that is in CNF. We're not dealing with propositionoal formula. But the principle is the same. For example, language=A and (B or (C and D)) can just be rewritten to language= (A and B) or (A and C and D). So we only need to allow one level or "OR" on top level.

@jingtang10 jingtang10 changed the title Handle composite search parameters in the indexing code Handle "OR" in search filters Apr 14, 2021
@Tarun-Bhardwaj
Copy link

@rkodev @ekigamba , while the PR #438 for this issue has been closed, is there a reason why the issue is still open?

@Tarun-Bhardwaj Tarun-Bhardwaj added this to the Q3 2021 milestone Jul 12, 2021
@ekigamba
Copy link
Contributor

@Tarun-Bhardwaj reviewing this

@ekigamba
Copy link
Contributor

ekigamba commented Jul 27, 2021

@jingtang10 @epicadk https://www.hl7.org/fhir/search.html#composite and https://hapifhir.io/hapi-fhir/docs/server_plain/rest_operations_search.html#multi-valued-andor-parameters clearly states that the FHIR spec only supports AND combining multiple OR relationships and not the other way around eg. ("name" = ("joe" or "john")) AND ("age" = (11 or 12)) is supported and not ("language" = ("en" AND "fr") OR ("address" = ("Canada" AND "Quebec")).

The OR represented with , separates multiple values for a single parameter to symbolise an OR relationship. For example:

  • /Patient?language=FR,NL
  • /Patient?language=FR,NL&language=EN

Therefore, we only need to support OR for a single parameter.

Proposed solution

  1. Create an abstract class BaseFilter that will provide an extension function or and the property values that is a mutable list of T
  2. Inherit BaseFilter for all search filter data classes
  3. Convert the single value in all Search filters to a mutable list of values. This will hold the multiple values in the OR relationship
  4. Implement the or extension function for BaseFilter
  5. Update all <T>Filter.query functions to box the possible values and join with an OR in case the values are more than 1

ekigamba added a commit that referenced this issue Jul 28, 2021
See #292
- Change all Filters to allow list of possible values
- Add an or(Type) extension for all Filters
- Update all Filter.query functions to handle the multiple values
ekigamba added a commit that referenced this issue Jul 28, 2021
See #292
- Change all Filters to allow list of possible values
- Add an or(Type) extension for all Filters
- Update all Filter.query functions to handle the multiple values
ekigamba added a commit that referenced this issue Jul 28, 2021
See #292
- Change all Filters to allow list of possible values
- Add an or(Type) extension for all Filters
- Update all Filter.query functions to handle the multiple values
@jingtang10
Copy link
Collaborator Author

Discussed with @ekigamba @f-odhiambo @aditya-07 and @fredhersch yesterday. We identified 2 different use cases

  1. while searching using a search parameter (i.e. given name), provide multiple possible matches that are joined together as a disjunction/OR. This is the example @ekigamba provided ("name" = ("joe" or "john")) AND ("age" = (11 or 12)) and this is what the FHIR RESTful APIs already support (not that we must follow the RESTful spec in our offline search API)
  2. having multiple search parameters in a disjunction (OR) rather than conjunction (AND). At the moment when you list multiple search parameters in the search API, we add them on top of each other. In other words we apply them as filters and each filter further narrows down the resources. @ekigamba gave the example - we might want to search for patients whose name is XXX OR whose patient id is XXX. This is useful when in the UI we use 1 search box that searches for patients using multiple fields. In our discussion, we identified a way to get around this which is simply issue multiple search queries. However, I think in more complex cases the need for the search API to handle this properly is going to be more important (so the onus of deduplicating is not on the developer).

OK, so in this issue - let's focus on number 1. I will create another issue for number 2.

@jingtang10 jingtang10 changed the title Handle "OR" in search filters Allow the disjunction of multiple values to match a single search parameter (i.e. name is joe or john) Aug 17, 2021
@jingtang10
Copy link
Collaborator Author

This is also related to #609. We will need this for searching for observations with different codes.

@Tarun-Bhardwaj
Copy link

@ekigamba , @jingtang10 , assigning this issue to @aditya-07 .

@ekigamba
Copy link
Contributor

Cool, @aditya-07 see if the POC here is sound and usable #678

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment