Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible BUG: HTTP (JSON) offset and limit affects facet results #652

Closed
milanmadar opened this issue Oct 20, 2021 · 6 comments
Closed

Possible BUG: HTTP (JSON) offset and limit affects facet results #652

milanmadar opened this issue Oct 20, 2021 · 6 comments
Labels

Comments

@milanmadar
Copy link

milanmadar commented Oct 20, 2021

Describe the bug

Using the HTTP (JSON) interface, the request limit and offset (doc) affects the 'aggregations' (facet) results.
SQL interface behaves as expected.

To Reproduce

Create and fill the index

CREATE TABLE la8_test (  
    title text indexed,  
    description text indexed,  
    str_val string,  
    int_val int,  
    bigint_val bigint,  
    float_val float,  
    bool_val bool,  
    time_val timestamp,  
    json_val json,  
    mva_int_val multi  
);  
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (1, 'A great title 1', 'A long description 1', 'tagone', 11, 10000001, 1001.5, 1, 1634549851, '{\"a\":1,\"b\":2}', (3,5,7,9))  
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (2, 'A great title 2', 'A long description 2', 'tagone', 12, 10000002, 1002.5, 1, 1634549850, '{\"a\":2,\"b\":3}', (4,6,8,10))  
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (3, 'A great title 3', 'A long description 3', 'tagone', 13, 10000003, 1003.5, 1, 1634549849, '{\"a\":3,\"b\":4}', (5,7,9,11))  
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (4, 'A great title 4', 'A long description 4', 'tagtwo', 14, 10000004, 1004.5, 1, 1634549848, '{\"a\":4,\"b\":5}', (6,8,10,12))  
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (5, 'A great title 5', 'A long description 5', 'tagtwo', 15, 10000005, 1005.5, 1, 1634549848, '{\"a\":5,\"b\":6}', (7,9,11,13))  
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (6, 'A great title 6', 'A long description 6', 'tagtwo', 16, 10000006, 1006.5, 1, 1634549847, '{\"a\":6,\"b\":7}', (8,10,12,14))  
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (7, 'A great title 7', 'A long description 7', 'tagtwo', 17, 10000007, 1007.5, 1, 1634549846, '{\"a\":7,\"b\":8}', (9,11,13,15))  
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (8, 'A great title 8', 'A long description 8', 'tagtwo', 18, 10000008, 1008.5, 1, 1634549845, '{\"a\":8,\"b\":9}', (10,12,14,16))  
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (9, 'A great title 9', 'A long description 9', 'tagtwo', 19, 10000009, 1009.5, 1, 1634549844, '{\"a\":9,\"b\":10}', (11,13,15,17))  
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (10, 'A great title 10', 'A long description 10', 'tagtwo', 20, 10000010, 1010.5, 1, 1634549843, '{\"a\":10,\"b\":11}', (12,14,16,18))  

HTTP

request with offset and limit:

{  
	"index": "la8_test",  
	"offset": 3,  
	"limit": 3,  
	"aggs": {  
		"str_val": {  
			"terms": {  
				"field": "str_val"  
			}  
		}  
	},  
	"query": {  
		"bool": {  
			"must": [{  
				"query_string": "great long description"  
			}]  
		}  
	}  
}  

response (hits are removed for clarity) aggregations.str_val.buckets is empty, but there should be 2 elements:

{  
	"took": 0,  
	"timed_out": false,  
	"hits": {  
		"total": 10,  
		"hits": [ ... removed from this post ... ]  
	},  
	"aggregations": {  
		"str_val": {  
			"buckets": []  
		}  
	}  
}  

SQL

The same query in SQL returns the expcted 2 rows in the facet

$ mysql -h 0 -P 9306  
mysql > SELECT * FROM la8_test WHERE MATCH('great long description') LIMIT 3,3 FACET str_val;  
 ------ --------- ------------ ------------ ---------- -------------------- ------------- --------- ----------------- -------------   
| id   | int_val | bigint_val | time_val   | bool_val | count_optimization | float_val   | str_val | json_val        | mva_int_val |  
 ------ --------- ------------ ------------ ---------- -------------------- ------------- --------- ----------------- -------------   
|    5 |      15 |   10000005 | 1634734135 |        1 |                  1 | 1005.500000 | tagtwo  | {"a":5,"b":6}   | 7,9,11,13   |  
|   10 |      20 |   10000010 | 1634734130 |        1 |                  1 | 1010.500000 | tagtwo  | {"a":10,"b":11} | 12,14,16,18 |  
|    9 |      19 |   10000009 | 1634734131 |        1 |                  1 | 1009.500000 | tagtwo  | {"a":9,"b":10}  | 11,13,15,17 |  
 ------ --------- ------------ ------------ ---------- -------------------- ------------- --------- ----------------- -------------   
3 rows in set (0,00 sec)  
  
 --------- ----------   
| str_val | count(*) |  
 --------- ----------   
| tagtwo  |        7 |  
| tagone  |        3 |  
 --------- ----------   
2 rows in set (0,00 sec)  

PHP

The official php api also produces the wrong results: when ->offset() and ->limit() are used, the ->facet('str_val') resultset is empty (but should have 2 elements).

You can examine the facet result as you change the ->offset() and ->limit() values.

$client = new \Manticoresearch\Client(['host'=>'127.0.0.1','port'=>9308]);  
$search = new \Manticoresearch\Search($client);  
  
$res = $search  
    ->setIndex('la8_test')  
    ->search('great long description')  
    ->offset(3)->limit(3) // change these to see the facet results change  
    ->facet('str_val')  
    ->get();  
  
print_r($res->getResponse()->getResponse()['hits']['hits']);  
print_r($res->getFacets()['str_val']['buckets']);  

Expected behavior

HTTP response aggregations.str_val.buckets should have 2 elements, just like the SQL version.

Describe the environment:

 - Manticore 4.0.2 af497f245@210921 release (columnar 1.11.2 69d3780@210921)  
 - Kubuntu 20: Linux PCNAME 5.11.0-37-generic #41-Ubuntu SMP Mon Sep 20 16:39:20 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux  
@jlawrence-yellostudio
Copy link

I'm seeing the same thing on 3.5.2 so it doesn't appear to be a new issue

@jlawrence-yellostudio
Copy link

jlawrence-yellostudio commented Jul 27, 2022

@sanikolaev Do you know if there is any movement on this?

Not many of our projects have search pagination (with an emphasis on relevance) but in the rare circumstances we do I've had to write an SQL only version because of this facet issue and it would be nice to know if i should ditch the HTTP JSON API entirely.

@sanikolaev
Copy link
Collaborator

Hi @jlawrence-yellostudio I'll check with the team on how difficult it's to change the behaviour and what we can do in general.

@tomatolog
Copy link
Contributor

tomatolog commented Jul 28, 2022

seems like a bug as each aggs query copy offset, limit from the main query when rewrite limit from the size property of the aggs part that is wrong and aggs should not copy offset from the main

@sanikolaev sanikolaev added the bug label Jul 28, 2022
@RevZer0
Copy link

RevZer0 commented Aug 18, 2022

I have the same issue with pagination using HTTP API.
Also this bug is not related the PHP lib, or Python lib (which one Im using).
I made cURL requests directly to the Manticore API and can confirm that offset affects aggregation buckets

@sanikolaev
Copy link
Collaborator

Fixed in 60401eb

Note though that when size is not set for a facet it gets inherited from the main query's limit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants