Add AND operator for joining terms #264

Closed
mike1808 opened this issue May 14, 2017 · 20 comments

Comments

@mike1808
Contributor

mike1808 commented May 14, 2017

As was mentioned in #261, lunr.js currently doesn't support joining search terms with AND. It could have an interface similar to Elasticsearch's: you specify the default operator (OR or AND), and the terms are joined with that operator. Alternatively, it could look like this:

// Proposed API (not implemented): group terms under an explicit operator.
index.query(function (q) {
    q.terms('AND')
       .term('hello', { fields: ['field1'], boost: 10 })
       .term('world', { fields: ['field2'], boost: 1 });

    q.terms('OR')
       .term('term1')
       .term('term2');
});

Or with .search() it can look like this:

index.search('term1 term2', { operator: 'AND'} )

Or, more sophisticated, with a modified query syntax:

index.search('term1 AND term2 OR term3')

What do you think?

@meltuhamy

meltuhamy commented May 31, 2017

I'm also looking for similar functionality. I like the q.terms approach. @olivernn, what are your thoughts?

@danjarvis

I was able to achieve a hacky AND implementation by inspecting the metadata of the results before returning them back to the user:

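function search(query) {
    // Split on whitespace, ignoring empty tokens from repeated spaces.
    var terms = query.split(/\s+/).filter(Boolean);
    return index.search(query).filter(function (result) {
        // Keep a result only if it matched as many distinct terms as were searched for.
        return Object.keys(result.matchData.metadata).length === terms.length;
    });
}

search('hello world');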

This is a trimmed-down version of the actual implementation I am using. You can always take into account whether or not one of the terms is contained within the invertedIndex and/or verify that the matched metadata terms are the same as the original search terms (i.e. were not matched from stemming).

@schmaluk

It would be really great to get this feature. I tried out elasticlunr, but it has no wildcards.

@nikolas

nikolas commented Jul 28, 2017

Replying to @olivernn's comment here: #261 (comment)

AND is not implemented yet, though it's definitely something I want to add. The implementation isn't so much the problem as the query interface; if you have any suggestions on what the API would look like, I'd be interested to hear them.

I can imagine the query interface working something like this:

var results = index.query(
    function(q) {
        q.term(mainTerm);
        searchParams.forEach(function(param) {
            var k = param[0];
            var v = param[1];
            q.term(v, { fields: [k] });
        });
    }, {
        bool: 'AND'
    });

This is similar to how elasticlunr.js defines boolean behavior between fields: http://elasticlunr.com/example/index.html

So, now I'm digging into the index#query() method to see how this can be implemented.

@drzraf

drzraf commented Oct 14, 2017

See also the API for search-index:

q.query = [                   // Each array element is an OR condition
  {
    AND: {             
      'title': ['reagan'],    // 'reagan' AND 'ussr'   
      'body':  ['ussr']
    },
    NOT: {
      'body':  ['usa']        // but NOT 'usa' in the body field
    }
  },
  {                           // OR this condition
    AND: {                  
      'title': ['gorbachev'], // 'gorbachev' AND 'ussr'
      'body':  ['ussr']
    },
    NOT: {
      'body':  ['usa']        // NOT 'usa' in the body field
    }
  }
]

Another example from the same documentation:

query: {
  AND: {
    'description': ['swiss', 'watch'],
    'price': [{
       gte: '1000',
       lte: '8'
    }]
  }
}

@myalgo

myalgo commented Oct 18, 2017

@olivernn is boolean search part of the roadmap?

@olivernn
Owner

This is still something that I want to add, and will be the next large feature that I work on in Lunr, probably landing in 2.2, at some point.

My current thinking is to implement this at a slightly lower level to begin with, and rather than implement AND specifically, provide some lower level primitives that can be used to achieve the same thing:

I want to add a single property to a search term; let's call it "presence" for now. The default value of this property is optional, which is effectively what all terms have now. I will then add two other values, required and prohibited. Setting a term's presence attribute to required would mean that any documents returned must contain this term; similarly, setting it to prohibited would mean the term must not exist in a document.

Fitting this into the current API is simpler. For the search string, I think it would be a prefix, similar to Lucene, e.g.:

+foo bar -baz

Here "foo" is required, "bar" is optional, and "baz" is prohibited. As for searches constructed with the query method, I think it is just an option with three values, e.g.

idx.query(function (q) {
  q.term("foo", { presence: lunr.Query.presence.REQUIRED })
  q.term("bar", { presence: lunr.Query.presence.OPTIONAL })
  q.term("baz", { presence: lunr.Query.presence.PROHIBITED })
})

I think taking this approach means that we can punt on grouping of query terms for now, without restricting their implementation in the future.
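
A minimal sketch of the presence semantics described above, written in plain JavaScript rather than taken from lunr's implementation. The names PRESENCE, parseClauses and matches are hypothetical, a document is modelled as a Set of its terms, and the handling of queries without any required term is an assumption.

```javascript
// Hypothetical constants mirroring the three proposed presence values.
const PRESENCE = { OPTIONAL: "optional", REQUIRED: "required", PROHIBITED: "prohibited" };

// Parse a string such as "+foo bar -baz" into [{ term, presence }] clauses.
function parseClauses(queryString) {
  return queryString
    .split(/\s+/)
    .filter(Boolean)
    .map(function (token) {
      if (token[0] === "+") return { term: token.slice(1), presence: PRESENCE.REQUIRED };
      if (token[0] === "-") return { term: token.slice(1), presence: PRESENCE.PROHIBITED };
      return { term: token, presence: PRESENCE.OPTIONAL };
    })
    .filter(function (clause) { return clause.term.length > 0; });
}

// Decide whether a document (a Set of terms) satisfies the clauses.
function matches(docTerms, clauses) {
  const required = clauses.filter(function (c) { return c.presence === PRESENCE.REQUIRED; });
  const prohibited = clauses.filter(function (c) { return c.presence === PRESENCE.PROHIBITED; });
  const optional = clauses.filter(function (c) { return c.presence === PRESENCE.OPTIONAL; });

  // Any prohibited term disqualifies the document.
  if (prohibited.some(function (c) { return docTerms.has(c.term); })) return false;
  // Every required term must be present.
  if (!required.every(function (c) { return docTerms.has(c.term); })) return false;
  // Assumption: once required terms are satisfied, optional terms only affect ranking.
  if (required.length > 0) return true;
  // Assumption: a purely negated query matches every remaining document.
  if (optional.length === 0) return true;
  // With only optional terms, behave like today's OR search.
  return optional.some(function (c) { return docTerms.has(c.term); });
}
```

For example, matches(new Set(["foo", "bar"]), parseClauses("+foo bar -baz")) is true, while adding "baz" to the set makes it false.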

There is some interesting discussion on the Lucene wiki about some of the confusion that can arise when using boolean operators - https://wiki.apache.org/lucene-java/BooleanQuerySyntax#Changing_Your_Mindset

Thoughts?

@drzraf

drzraf commented Oct 30, 2017

The workaround of @danjarvis is interesting, but as soon as one of the terms appears in multiple fields it "artificially" increases matchData.metadata.length.

I fear there is no other workaround than issuing one query per term, intersecting the resulting sets, and finally averaging the scores.
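
A rough sketch of that intersect-and-average idea, assuming each per-term result set is an array of { ref, score } objects (the shape returned by index.search) and that a ref appears at most once per set. The function name intersectResults is hypothetical.

```javascript
// resultSets: one array per search term, each containing { ref, score } objects.
// Returns only the refs present in every set, scored by the mean of their per-term scores.
function intersectResults(resultSets) {
  if (resultSets.length === 0) return [];

  const totals = new Map(); // ref -> { count, scoreSum }
  resultSets.forEach(function (results) {
    results.forEach(function (result) {
      const entry = totals.get(result.ref) || { count: 0, scoreSum: 0 };
      entry.count += 1;
      entry.scoreSum += result.score;
      totals.set(result.ref, entry);
    });
  });

  return Array.from(totals.entries())
    // A ref counted once per set was matched by every term.
    .filter(function (pair) { return pair[1].count === resultSets.length; })
    .map(function (pair) {
      return { ref: pair[0], score: pair[1].scoreSum / resultSets.length };
    })
    // Highest average score first.
    .sort(function (a, b) { return b.score - a.score; });
}
```

With a built index it could be called as intersectResults(terms.map(function (t) { return index.search(t); })), at the cost of one query per term.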

@olivernn
Owner

I have a working implementation of term presence queries. I haven't implemented the search string parser yet, but it is supported in the programmatic query interface. I'll try to put up a PR with the changes in the next couple of days if anyone is interested in doing some testing.

@drzraf

drzraf commented Nov 22, 2017

For what it's worth, here is my AND search workaround:

    _searchAnd(terms) {
        var result_per_term = [];
        terms.forEach((t) => {
            result_per_term.push(lunr.query((q) => {
                if (t.indexOf(':') > 0) {
                    var [key, val] = t.split(':');
                    q.term(val, { boost: 100, fields: ["fields" /*key*/] });
                }
                else if (t.indexOf('=') == 0) {
                    q.term(t.replace(/^=/, ''), { boost: 100, wildcard: 0 });
                }
                else if (stopwords.has(t)) {
                    return;
                }
                else {
                    q.term(t, { boost: 100 });
                    q.term(t, { boost: 10, usePipeline: true, wildcard: lunr.Query.wildcard.LEADING | lunr.Query.wildcard.TRAILING });
                    // q.term(t, { boost: 1, usePipeline: false, editDistance: 1 });
                }
            }));
        });
        // map doc id
        var ids_per_term = result_per_term.map((e) => { return e.map(f => f.ref); });
        // keep trace of terms not found
        this.terms_not_found.splice(0);
        for (const k in ids_per_term) if (ids_per_term[k].length == 0) this.terms_not_found.push(terms[k]);
        // if a term is not found don't account for it (ignored from search query)
        ids_per_term = ids_per_term.filter(n => n.length > 0);
        var common_ids = new Set( _.intersection(...ids_per_term) );
        var last_search = new Map();
        for (const a of result_per_term) {
            for (const result of a) {
                if (! common_ids.has(result.ref)) continue;
                if (last_search.has(result.ref)) {
                    var res = last_search.get(result.ref);
                    Object.assign(res.matchData.metadata, result.matchData.metadata);
                    res.score += result.score;
                    last_search.set(result.ref, res);
                } else {
                    last_search.set(result.ref, result);
                }
            }
        }
        return Array.from(last_search.values());
    }

@pmccloghrylaing

pmccloghrylaing commented Dec 18, 2017

The +|-, REQUIRED|OPTIONAL|PROHIBITED format looks good to me. It's a better format for querying than AND/OR:

  • 'A AND B' = '+A +B'
  • 'A OR B' = 'A B'

Seems like this would work with brackets for grouping as well:

  • '+(subquery) ...' all results MUST match subquery
  • '-(subquery) ...' all results MUST NOT match subquery
  • '(subquery) ...' results include any matches for subquery that also match all REQUIRED or PROHIBITED terms / sub-queries.
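
The flat AND/OR mapping above is mechanical enough to sketch as a small helper. The name joinTerms is hypothetical and not part of lunr; it only builds a query string in the prefix format and does not handle bracketed sub-queries.

```javascript
// Build a prefix-style query string from plain terms and a default operator.
// operator "AND" marks every term as required; anything else leaves terms optional (OR).
function joinTerms(terms, operator) {
  const prefix = operator === "AND" ? "+" : "";
  return terms
    .map(function (term) { return prefix + term; })
    .join(" ");
}
```

The resulting string could then be passed to a search call that understands the +/- prefixes.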

I also like the suggestion in #310 for ONLYONE but I'm curious how that would work - it seems like this could be done by supplying a cap for the number of matches?

@hafffe

hafffe commented Jan 23, 2018

Any updates on this? Would love to see this feature!

@olivernn
Owner

Sorry for the lack of updates on this (and other) issues, my son was born at the end of last year so I haven't had a lot of free time recently 😅

I did push a branch with the changes to support this; it's all there apart from the additions to the search string parser. So you can programmatically construct queries with lunr.Index#query to test it out.

I'll put together a WIP PR with a bit more detail to help people try out what is there currently, getting some feedback would be good. Thanks for the patience on this feature too!

@olivernn
Owner

olivernn commented Mar 5, 2018

I've just opened a PR with the changes to support term presence queries. There is an alpha release available on npm too (lunr@2.2.0-alpha.1). Please take a look and let me know any feedback or comments you have. If all goes well 2.2.0 stable will be released within the week.

@olivernn
Owner

2.2.0 is now released. Try it out and let me know if there are any issues! I'll update the guides shortly with some more examples, including term presence.

@tuanluu-agilityio

Term presence is great, but do we have any approach for this case: ('A AND B') OR ('A AND C') ?

@unknown2019

Term presence is great, but do we have any approach for this case: ('A AND B') OR ('A AND C') ?

@olivernn, Is it possible to use '(A AND B) OR C' search case with current version?

@luisenaguero

Term presence is great, but do we have any approach for this case: ('A AND B') OR ('A AND C') ?

@olivernn, Is it possible to use '(A AND B) OR C' search case with current version?

Has an approach like this been added ?

@jasdeep-compro

jasdeep-compro commented May 19, 2021

Term presence is great, but do we have any approach for this case: ('A AND B') OR ('A AND C') ?

@olivernn, Is it possible to use '(A AND B) OR C' search case with current version?

Any workaround available for this ?

@emmaindal

Term presence is great, but do we have any approach for this case: ('A AND B') OR ('A AND C') ?

@olivernn, Is it possible to use '(A AND B) OR C' search case with current version?

Any workaround available for this ?

Also interested if there are any workarounds for this!
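
One possible workaround for ('A AND B') OR ('A AND C'), sketched under assumptions rather than confirmed by the maintainer: run one all-required query per AND group and take the union of the results. The name searchAnyGroup is hypothetical, and searchFn stands for any function that maps a query string to an array of { ref, score } objects, for example a wrapper around index.search in a version that supports the + prefix.

```javascript
// groups: array of term arrays, e.g. [["A", "B"], ["A", "C"]] for (A AND B) OR (A AND C).
// searchFn: function (queryString) -> [{ ref, score }]; injected so the sketch stays library-agnostic.
function searchAnyGroup(searchFn, groups) {
  const best = new Map(); // ref -> best-scoring result seen so far

  groups.forEach(function (terms) {
    // Mark every term in the group as required, e.g. "+A +B".
    const query = terms.map(function (t) { return "+" + t; }).join(" ");
    searchFn(query).forEach(function (result) {
      const seen = best.get(result.ref);
      // Keep the highest score when a document matches several groups.
      if (!seen || result.score > seen.score) best.set(result.ref, result);
    });
  });

  return Array.from(best.values()).sort(function (a, b) { return b.score - a.score; });
}
```

Keeping the maximum score per document is a design choice made for this sketch; summing or averaging across groups would work equally well, and scores from separate queries may not be strictly comparable.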
