Suggestion: get basic weight with CALL PQ() results #267

Open
barryhunter opened this issue Sep 26, 2019 · 1 comment

@barryhunter barryhunter commented Sep 26, 2019

It would be nice if running a percolate CALL PQ() query returned a basic weight for each match.
Manticore already runs each of the queries in the percolate index against the supplied document(s), so it could compute a weight.

This would be useful to 'rank' or sort the returned queries (being able to sort/limit the results in weight order would be nice, but is not necessary). Ultimately the percolate index could contain many fuzzy queries (quorum, or with MAYBE), and you would then want only the 'best' matching queries, not necessarily all of them.

Even a simple 'word count' weight would probably be good enough. (It wouldn't be possible to use full IDF or similar ranking, because there are no term frequencies for a document corpus: the percolate query supplies only the few document(s).)

But lccs (longest common contiguous subsequence) would be useful too.

A somewhat contrived example:

INSERT INTO pqp(id,query) VALUES(1, 'fresh apple');
INSERT INTO pqp(id,query) VALUES(2, 'orange tree');

CALL PQ('pqp','dead apple trees near fresh pear and orange trees', 0 AS docs_json, 1 AS query, 'sum(word_count+lccs)' as weight);
+---------+--------------+------+---------+--------+
| id      | query        | tags | filters | weight |
+---------+--------------+------+---------+--------+
|       1 | fresh apple  |      |         |      2 |
|       2 | orange tree  |      |         |      4 |
+---------+--------------+------+---------+--------+

In theory 'orange tree' is the better match, because it shares a phrase with the document. 'fresh apple' still matches, but not as a phrase.
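
To make the arithmetic behind that table concrete, here is a rough Python sketch of a word_count + lccs weight. It is only an illustration under several assumptions, not how Manticore computes anything: the `tokenize` helper strips a trailing "s" as a crude stand-in for morphology (so 'tree' matches 'trees'), and `lccs` here counts a run of a single word as 0, which is what reproduces the 2 and 4 shown above. Manticore's own lccs factor may count single-word matches differently.

```python
import re


def tokenize(text):
    """Lowercase, split into words, and strip a trailing 's' (crude stemming, an assumption)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]


def word_count(query_terms, doc_terms):
    """Number of distinct query terms that occur anywhere in the document."""
    doc_set = set(doc_terms)
    return sum(1 for term in set(query_terms) if term in doc_set)


def lccs(query_terms, doc_terms):
    """Length of the longest run of adjacent terms shared by query and document.

    Assumption for this sketch: a run of one term is not a phrase and counts as 0.
    """
    best = 0
    prev = [0] * (len(doc_terms) + 1)
    for q in query_terms:
        cur = [0] * (len(doc_terms) + 1)
        for j, d in enumerate(doc_terms, start=1):
            if q == d:
                # Extend the run that ended at the previous query/document terms.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best if best >= 2 else 0


def weight(query, doc):
    """Hypothetical 'word_count + lccs' weight for one stored query and one document."""
    q, d = tokenize(query), tokenize(doc)
    return word_count(q, d) + lccs(q, d)


doc = "dead apple trees near fresh pear and orange trees"
for stored_query in ("fresh apple", "orange tree"):
    print(stored_query, weight(stored_query, doc))
```

Both stored queries match two words, so word_count alone cannot separate them; the phrase term is what pushes 'orange tree' ahead.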

If multiple documents are supplied, I guess the weight could augment the 'documents' column:

1[34], 2[16]

That sort of thing: a weight per matching document.
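
As a small illustration of that 'id[weight]' notation, the Python sketch below renders and parses such a cell. The format is only the suggestion made above, not an existing Manticore output, and both function names are made up for this example.

```python
import re


def format_doc_weights(weights):
    """Render {doc_id: weight} as 'id[weight], id[weight]', ordered by document id."""
    return ", ".join(f"{doc_id}[{w}]" for doc_id, w in sorted(weights.items()))


def parse_doc_weights(cell):
    """Parse an 'id[weight], id[weight]' cell back into {doc_id: weight}."""
    return {int(doc_id): int(w) for doc_id, w in re.findall(r"(\d+)\[(\d+)\]", cell)}


print(format_doc_weights({1: 34, 2: 16}))
print(parse_doc_weights("1[34], 2[16]"))
```

Keeping the weight next to each document id means a client can still tell which document matched a stored query best.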

@tomatolog tomatolog commented Sep 27, 2019

I think it needs an option of the form 'column_name direction' AS order_by, like

CALL PQ ( ... , 'weight asc' as order_by );
CALL PQ ( ... , 'id desc' as order_by );
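
Until something like that exists, the same ordering can be applied on the client side. The Python sketch below is only an illustration of the 'column_name direction' idea: `apply_order_by` is a hypothetical helper, and the rows reuse the ids and weights from the example table earlier in this issue.

```python
def apply_order_by(rows, order_by):
    """Sort result rows (a list of dicts) by a 'column_name direction' string."""
    parts = order_by.split()
    if len(parts) != 2:
        raise ValueError("expected 'column_name direction', got %r" % order_by)
    column, direction = parts[0], parts[1].lower()
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc', got %r" % parts[1])
    return sorted(rows, key=lambda row: row[column], reverse=(direction == "desc"))


# Rows as they might come back from CALL PQ if it exposed a weight column (assumption).
rows = [
    {"id": 1, "query": "fresh apple", "weight": 2},
    {"id": 2, "query": "orange tree", "weight": 4},
]

for row in apply_order_by(rows, "weight desc"):
    print(row["id"], row["query"], row["weight"])
```

Slicing the sorted list (for example `apply_order_by(rows, "weight desc")[:10]`) would give the 'best N' behaviour asked for in the original suggestion.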

Ranker mode, ranker expression, or maybe a select-list expression that could be used for sorting could be implemented later in another ticket,

as it is not quite clear whether these (mode, expressions) should be part of the stored query or of the document passed to CALL PQ.
