Question: treating untagged words or phrases as "full text search" across multiple (or all) fields #39

Open
seandavi opened this issue Mar 3, 2019 · 6 comments

Comments

@seandavi

seandavi commented Mar 3, 2019

Luqum is working great for me and my test users, but one thing the test users miss is the query_string behavior of doing a full-text search across all fields when no field is specified (e.g., "London"). I see the ability to specify a default field, but this results in a simple match query. I guess I am looking to convert these to multi-match with all available text fields? Any suggestions?

@alexgarel
Member

Hi @seandavi

You're right, this is not a supported scenario, but it is an interesting one.

Two solutions:

  • you modify your luqum search tree (using a TreeTransformer) before giving it to the elasticsearch query builder, so that each bare term is duplicated into one SearchField node per field and the copies are joined with an OR.
  • you take courage and you modify the query builder so that if you pass a list as default_field, it builds a multi-match query! In which case a pull request is welcome :-)
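
To make the second option concrete, here is a minimal sketch in plain Python of the dispatch it describes: a single default field yields a match query, a list of fields yields a multi_match query. The function name `bare_term_query` is hypothetical and this is not luqum's builder code, only an illustration of the Elasticsearch query bodies involved.

```python
def bare_term_query(text, default_field):
    """Build an Elasticsearch query body for a term that has no field.

    Hypothetical helper, not part of luqum. `default_field` may be a
    single field name or a list of field names.
    """
    if isinstance(default_field, str):
        # One default field: a plain match query.
        return {"match": {default_field: {"query": text}}}
    # Several default fields: search them all with one multi_match.
    return {"multi_match": {"query": text, "fields": list(default_field)}}


print(bare_term_query("London", "text"))
print(bare_term_query("London", ["title", "abstract"]))
```

A real change to the query builder would also have to respect the other options it already handles, which this sketch ignores.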

If you need help in some way, just ask!

@seandavi
Author

seandavi commented Mar 4, 2019

For the time being, I'm going the cheap route and specifying _all as the default field for the match query. Users seem happy with the basic query_string behavior, which appears to pretty much use the _all approach.

If I have a little time, I may play with the multi-match approach. If I get into trouble, I'll let you know.

As usual, thanks for taking the time to answer and clarify.

@seandavi seandavi closed this as completed Mar 4, 2019
@seandavi
Author

seandavi commented Nov 9, 2019

I know it has been a while on this one. I noticed a per-field version of multi_match was recently implemented. I'd like to revisit the idea of multi_match on a set of default fields for bare words. I like your idea of converting to multi_match when default_field is a list. Could you give me some hints on where to focus if I want to implement it? No urgency, but I thought I would ask.

@seandavi seandavi reopened this Nov 9, 2019
@seandavi seandavi changed the title Question: treating untagged words or phrases as "full text search" across all fields Question: treating untagged words or phrases as "full text search" across multiple (or all) fields Nov 9, 2019
@seandavi
Author

seandavi commented Nov 9, 2019

Just leaving a note here that doing this right would involve handling both bare Word() and Phrase() nodes, the latter requiring a different multi_match type.
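
As a rough illustration of that distinction, the sketch below builds a multi_match body for a raw token and switches to Elasticsearch's `"phrase"` type when the token is double-quoted. The helper name is hypothetical and it works on plain strings rather than on luqum nodes; it only shows the two query shapes.

```python
def multi_match_for_bare_text(value, fields):
    """Return a multi_match query for a bare word or a quoted phrase.

    Hypothetical helper, not part of luqum. `value` is the raw token
    from the query string, e.g. 'London' or '"New York"'.
    """
    is_phrase = len(value) >= 2 and value.startswith('"') and value.endswith('"')
    body = {"query": value[1:-1] if is_phrase else value, "fields": list(fields)}
    if is_phrase:
        # The "phrase" type runs a match_phrase query on each field.
        body["type"] = "phrase"
    return {"multi_match": body}


print(multi_match_for_bare_text("London", ["title", "abstract"]))
print(multi_match_for_bare_text('"New York"', ["title", "abstract"]))
```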

@seandavi
Author

seandavi commented Nov 9, 2019

After a little playing with luqum.utils.LuceneTreeTransformer, this seems to do what I need. Note that multi_match is roughly translated to a bunch of OR queries across single-field match. The same is true of multi_match with phrases, except that match_phrase

import luqum.utils
from luqum.tree import Group, OrOperation, Range, SearchField


class BareTextTransformer(luqum.utils.LuceneTreeTransformer):
    """Convert bare Words or Phrases to full text search

    In cases where a query string has bare text (no field
    association), we want to construct a DSL query that includes
    all fields in an OR configuration to perform the full
    text search against all fields.
    This class walks the tree and converts bare Word and Phrase
    nodes into the required set of SearchField objects. Note
    that this is roughly equivalent to `multi_match` in terms
    of performance, etc.
    """
    def __init__(self, fields=('title', 'abstract')):
        """Create a new BareTextTransformer

        Parameters
        ----------
        fields: list of str
            This is the list of fields that will be used to
            create the composite SearchField objects that
            will be OR'ed together to simulate full text
            search.

        Returns
        -------
        None. The tree is modified in place.
        """
        super().__init__()
        # Copy so that no mutable default is shared between instances.
        self.fields = list(fields)

    def _expand_if_bare(self, node, parents):
        # Text already under a SearchField or a Range belongs to a field,
        # even when nested as in title:(foo bar), so leave it alone.
        if any(isinstance(p, (SearchField, Range)) for p in parents):
            return node
        # Bare text: one SearchField per configured field, OR'ed together.
        search_list = [SearchField(f, node) for f in self.fields]
        return Group(OrOperation(*search_list))

    def visit_word(self, node, parents):
        return self._expand_if_bare(node, parents)

    def visit_phrase(self, node, parents):
        return self._expand_if_bare(node, parents)

And, to use:

from luqum.parser import parser

# q is the user's query string
tree = parser.parse(q)
transformer = BareTextTransformer()
# tree below now has expanded Group(OrOperation(...)) for each
# field in the BareTextTransformer `fields`
tree = transformer.visit(tree)
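
For reference, the OR expansion above corresponds roughly to a bool/should of single-field queries on the Elasticsearch side. The sketch below builds that shape by hand in plain Python; the function name is hypothetical, and the query luqum's builder actually emits may carry extra options, so treat this as an approximation of the structure only.

```python
def or_of_matches(text, fields, phrase=False):
    """Build a bool/should query with one clause per field.

    Hypothetical helper, not part of luqum. Uses match_phrase clauses
    when `phrase` is true and match clauses otherwise.
    """
    kind = "match_phrase" if phrase else "match"
    clauses = [{kind: {field: {"query": text}}} for field in fields]
    return {"bool": {"should": clauses}}


print(or_of_matches("London", ["title", "abstract"]))
print(or_of_matches("New York", ["title", "abstract"], phrase=True))
```

One difference to keep in mind: bool/should adds up the scores of the matching clauses, while multi_match with its default best_fields type takes the best-scoring field, so ranking can differ even when the same documents match.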

@thpica

thpica commented Jul 13, 2021

Using a multi_match for the * field seems to work for me.

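from luqum.elasticsearch import ElasticsearchQueryBuilder

# schema_analyzer is assumed to be a luqum.elasticsearch.SchemaAnalyzer
# built from the index schema
es_query_builder = ElasticsearchQueryBuilder(
    **schema_analyzer.query_builder_options(),
    field_options={"*": {"match_type": "multi_match"}},
)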
