Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant JSON overhead compared to SQL #1794

Closed
sanikolaev opened this issue Feb 5, 2024 · 7 comments
Closed

Significant JSON overhead compared to SQL #1794

sanikolaev opened this issue Feb 5, 2024 · 7 comments
Assignees
Labels
bug rel::upcoming Upcoming release

Comments

@sanikolaev
Copy link
Collaborator

sanikolaev commented Feb 5, 2024

In some case the same query which takes about 80ms via SQL can take 250ms via JSON.

MRE (dev2):

wget https://raw.githubusercontent.com/zeraye/names-surnames-list/master/male-names-list.txt -O names.txt
wget https://raw.githubusercontent.com/zeraye/names-surnames-list/master/female-names-list.txt -O fnames.txt
cat fnames.txt >> names.txt
wget https://raw.githubusercontent.com/zeraye/names-surnames-list/master/surnames-list.txt -O surnames.txt
wget https://gist.githubusercontent.com/sanikolaev/5f4cea361b6869ee9ca7eb815be4fb76/raw/15655adc2dcb3a0c7680d07127fa50a3528a8d73/load_us_names.php -O load_us_names.php
php load_us_names.php 10000 10 10000000 1

snikolaev@dev2:~$ time curl -sX POST http://localhost:9308/search -d '{"index" : "name", "query" : {"match" : {"_all": "on"} }, "limit": 0}'
{"took":245,"timed_out":false,"hits":{"total":1719457,"total_relation":"eq","hits":[]}}
real	0m0.250s
user	0m0.003s
sys	0m0.003s

/* Mon Feb 05 10:48:48.253 2024 conn 2087997 real 0.244 wall 0.245 found 1719457 */  /*{"index" : "name", "query" : {"match" : {"_all": "on"} }, "limit": 0} */

snikolaev@dev2:~$ time mysql -P9306 -h0 -e "select * from name where match('on'); show meta;"
+---------+-------------------+
| id      | username          |
+---------+-------------------+
| 9750002 | LAVONIA HAIRSTON  |
| 9160015 | CLAYTON STOCKTON  |
| 9930016 | WILSON EVERSON    |
| 9930026 | WINSTON CINTRON   |
| 9880033 | DESMOND HARDISON  |
| 9660043 | DIONE JOHNSTON    |
| 9080055 | EDISON MCKINNON   |
| 9810061 | SON NIELSON       |
| 9750062 | SIMONNE LONG      |
| 9880068 | SONA ALSTON       |
| 9840085 | LEONOR DRAYTON    |
| 9260098 | LONA LONGORIA     |
| 9420101 | MARION CLIFTON    |
| 9580103 | RONDA DRUMMOND    |
| 9390104 | ASUNCION XIONG    |
| 9420113 | CLINTON PATTERSON |
| 9340117 | LAVONA DESIMONE   |
| 9530126 | ALISON SON        |
| 9390130 | JUDSON CONNELLY   |
| 9080134 | QUINTON ARAGON    |
+---------+-------------------+
+----------------+---------+
| Variable_name  | Value   |
+----------------+---------+
| total          | 20      |
| total_found    | 1719457 |
| total_relation | eq      |
| time           | 0.075   |
| keyword[0]     | *on*    |
| docs[0]        | 1794910 |
| hits[0]        | 1794951 |
| keyword[1]     | on      |
| docs[1]        | 0       |
| hits[1]        | 0       |
+----------------+---------+

real	0m0.082s
user	0m0.000s
sys	0m0.005s

/* Mon Feb 05 10:48:05.657 2024 conn 2087618 real 0.074 wall 0.075 found 1719457 */ SELECT * FROM name WHERE MATCH('on');

When there's only one disk chunk the difference is even larger (the likely related issue is #1767):

snikolaev@dev2:~$ mysql -P9306 -h0 -e "optimize table name option sync=1, cutoff=1;"

snikolaev@dev2:~$ time curl -sX POST http://localhost:9308/search -d '{"index" : "name", "query" : {"match" : {"_all": "on"} }, "limit": 0}'
{"took":583,"timed_out":false,"hits":{"total":1717748,"total_relation":"eq","hits":[]}}
real	0m0.589s
user	0m0.000s
sys	0m0.006s

snikolaev@dev2:~$ time mysql -P9306 -h0 -e "select * from name where match('on') limit 0; show meta;"
+----------------+---------+
| Variable_name  | Value   |
+----------------+---------+
| total          | 0       |
| total_found    | 1717748 |
| total_relation | eq      |
| time           | 0.088   |
| keyword[0]     | *on*    |
| docs[0]        | 1793344 |
| hits[0]        | 1793383 |
| keyword[1]     | on      |
| docs[1]        | 0       |
| hits[1]        | 0       |
+----------------+---------+

real	0m0.094s
user	0m0.000s
sys	0m0.006s

Version:

snikolaev@dev2:~$ searchd -v
Manticore 6.2.13 4950f3d02@24020213 dev (columnar 2.2.5 214ce90@240115) (secondary 2.2.5 214ce90@240115) (knn 2.2.5 214ce90@240115)
@sanikolaev
Copy link
Collaborator Author

Same in manticoresearch/manticore:4.0.2, i.e. not a new regression. Didn't check earlier versions.

@sanikolaev
Copy link
Collaborator Author

Interesting that if I disable auto-expansion, then in case of the on* query, the response time is the same:

snikolaev@dev2:~$ time curl -sX POST http://localhost:9308/search -d '{"index" : "name", "query" : {"match" : {"_all": "on*"} }}'
{"took":2,"timed_out":false,"hits":{"total":17482,"total_relation":"eq","hits":[{"_id":"1559568","_score":1670,"_source":{"username":"ONIE ONEAL"}},{"_id":"8901204","_score":1660,"_source":{"username":"ONEIDA ONG"}},{"_id":"9558471","_score":1660,"_source":{"username":"ONA ONEAL"}},{"_id":"860282","_score":1624,"_source":{"username":"DENAE ONEIL"}},{"_id":"860385","_score":1624,"_source":{"username":"ONITA TIERNEY"}},{"_id":"60418","_score":1624,"_source":{"username":"ONIE GUFFEY"}},{"_id":"860591","_score":1624,"_source":{"username":"MICHA ONEIL"}},{"_id":"860642","_score":1624,"_source":{"username":"ONITA CLEGG"}},{"_id":"860969","_score":1624,"_source":{"username":"KAROLE ONEILL"}},{"_id":"861828","_score":1624,"_source":{"username":"PHILLIP ONEAL"}},{"_id":"62159","_score":1624,"_source":{"username":"ONEIDA BRILL"}},{"_id":"862854","_score":1624,"_source":{"username":"ONITA OLMSTEAD"}},{"_id":"63206","_score":1624,"_source":{"username":"EULAH ONTIVEROS"}},{"_id":"63252","_score":1624,"_source":{"username":"RAUL ONEILL"}},{"_id":"863311","_score":1624,"_source":{"username":"ONA COBB"}},{"_id":"63396","_score":1624,"_source":{"username":"JESUS ONG"}},{"_id":"63751","_score":1624,"_source":{"username":"LUCIANA ONEAL"}},{"_id":"863775","_score":1624,"_source":{"username":"ONEIDA ROARK"}},{"_id":"864149","_score":1624,"_source":{"username":"KEELEY ONEAL"}},{"_id":"864572","_score":1624,"_source":{"username":"ONIE PUCKETT"}}]}}
real	0m0.008s
user	0m0.000s
sys	0m0.006s

snikolaev@dev2:~$ time mysql -P9306 -h0 -e "select * from name where match('on*'); show meta;"
+---------+-----------------+
| id      | username        |
+---------+-----------------+
| 1559568 | ONIE ONEAL      |
| 8901204 | ONEIDA ONG      |
| 9558471 | ONA ONEAL       |
|  860282 | DENAE ONEIL     |
|  860385 | ONITA TIERNEY   |
|   60418 | ONIE GUFFEY     |
|  860591 | MICHA ONEIL     |
|  860642 | ONITA CLEGG     |
|  860969 | KAROLE ONEILL   |
|  861828 | PHILLIP ONEAL   |
|   62159 | ONEIDA BRILL    |
|  862854 | ONITA OLMSTEAD  |
|   63206 | EULAH ONTIVEROS |
|   63252 | RAUL ONEILL     |
|  863311 | ONA COBB        |
|   63396 | JESUS ONG       |
|   63751 | LUCIANA ONEAL   |
|  863775 | ONEIDA ROARK    |
|  864149 | KEELEY ONEAL    |
|  864572 | ONIE PUCKETT    |
+---------+-----------------+
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| total          | 20    |
| total_found    | 17482 |
| total_relation | eq    |
| time           | 0.002 |
| keyword[0]     | on*   |
| docs[0]        | 17493 |
| hits[0]        | 17493 |
+----------------+-------+

real	0m0.008s
user	0m0.006s
sys	0m0.000s

but *on* is still slow.

So it looks like the issue has smth do with infix search.

@tomatolog
Copy link
Contributor

could be different default values for query via SQL and JSON endpoints

@sanikolaev
Copy link
Collaborator Author

Another example with a different keyword which makes the difference much bigger: 49 sec vs 671 ms

wget https://raw.githubusercontent.com/Raven-SL/ru-pnames-list/master/lists/male_names_rus.txt
wget https://raw.githubusercontent.com/Raven-SL/ru-pnames-list/master/lists/male_surnames_rus.txt
wget https://gist.githubusercontent.com/sanikolaev/5f4cea361b6869ee9ca7eb815be4fb76/raw/c203ad550ede2a0e1b5dab19a780e833b2367827/load_names.php -O load_names.php
php load_names.php 10000 10 10000000 1
mysql -P9306 -h0 -e "optimize table name option sync=1, cutoff=1;"
echo 'eyJpbmRleCIgOiAibmFtZSIsICJxdWVyeSIgOiB7Im1hdGNoIiA6IHsiX2FsbCI6ICLQvtCyIn19
fQo='|base64 -d > /tmp/query

snikolaev@dev2:~$ curl -vsX POST http://localhost:9308/search -d @/tmp/query
*   Trying 127.0.0.1:9308...
* Connected to localhost (127.0.0.1) port 9308 (#0)
> POST /search HTTP/1.1
> Host: localhost:9308
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Length: 58
> Content-Type: application/x-www-form-urlencoded
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: 6.2.13 4950f3d02@24020213 dev (columnar 2.2.5 214ce90@240115) (secondary 2.2.5 214ce90@240115) (knn 2.2.5 214ce90@240115) (buddy v2.1.3)
< Content-Type: application/json; charset=UTF-8
< Content-Length: 1844
<
{"took":49781,"timed_out":false,"hits":{"total":4054337,"total_relation":"eq","hits":[{"_id":"9740003","_score":1539,"_source":{"username":"Джованни Галаганов"}},{"_id":"9960072","_score":1539,"_source":{"username":"Овадья Циолковский"}},{"_id":"9610093","_score":1539,"_source":{"username":"Овадья Недобитов"}},{"_id":"9700096","_score":1539,"_source":{"username":"Джованни Вислоухов"}},{"_id":"9900125","_score":1539,"_source":{"username":"Джованни Басунов"}},{"_id":"9880131","_score":1539,"_source":{"username":"Иов Хухряков"}},{"_id":"9980148","_score":1539,"_source":{"username":"Иов Сафьянов"}},{"_id":"9900166","_score":1539,"_source":{"username":"Людовик Калиновский"}},{"_id":"9610185","_score":1539,"_source":{"username":"Яков Сабашников"}},{"_id":"9930195","_score":1539,"_source":{"username":"Овадья Осонов"}},{"_id":"9660208","_score":1539,"_source":{"username":"* Connection #0 to host localhost left intact
Иов Безруков"}},{"_id":"9580212","_score":1539,"_source":{"username":"Джованни Аббакумов"}},{"_id":"9750238","_score":1539,"_source":{"username":"Людовик Пешников"}},{"_id":"9780246","_score":1539,"_source":{"username":"Яков Флегентов"}},{"_id":"9930279","_score":1539,"_source":{"username":"Джованни Филипович"}},{"_id":"9900294","_score":1539,"_source":{"username":"Яков Грабаров"}},{"_id":"9930296","_score":1539,"_source":{"username":"Яков Кованько"}},{"_id":"9750298","_score":1539,"_source":{"username":"Иов Годовалов"}},{"_id":"9600301","_score":1539,"_source":{"username":"Овадья Пестряков"}},{"_id":"9880310","_score":1539,"_source":{"username":"Джованни Анцифиров"}}]}}

snikolaev@dev2:~$ time mysql -P9306 -h0 -e "select * from name where match('ов'); show meta;"
+---------+---------------------------------------+
| id      | username                              |
+---------+---------------------------------------+
| 9740003 | Джованни Галаганов                    |
| 9960072 | Овадья Циолковский                    |
| 9610093 | Овадья Недобитов                      |
| 9700096 | Джованни Вислоухов                    |
| 9900125 | Джованни Басунов                      |
| 9880131 | Иов Хухряков                          |
| 9980148 | Иов Сафьянов                          |
| 9900166 | Людовик Калиновский                   |
| 9610185 | Яков Сабашников                       |
| 9930195 | Овадья Осонов                         |
| 9660208 | Иов Безруков                          |
| 9580212 | Джованни Аббакумов                    |
| 9750238 | Людовик Пешников                      |
| 9780246 | Яков Флегентов                        |
| 9930279 | Джованни Филипович                    |
| 9900294 | Яков Грабаров                         |
| 9930296 | Яков Кованько                         |
| 9750298 | Иов Годовалов                         |
| 9600301 | Овадья Пестряков                      |
| 9880310 | Джованни Анцифиров                    |
+---------+---------------------------------------+
+----------------+---------+
| Variable_name  | Value   |
+----------------+---------+
| total          | 20      |
| total_found    | 4054337 |
| total_relation | eq      |
| time           | 0.671   |
| keyword[0]     | *ов*    |
| docs[0]        | 4081690 |
| hits[0]        | 4081690 |
| keyword[1]     | ов      |
| docs[1]        | 0       |
| hits[1]        | 0       |
+----------------+---------+

real    0m0.689s
user    0m0.000s
sys 0m0.009s

@klirichek
Copy link
Contributor

That is dupe of https://sphinxsearch.com/bugs/view.php?id=1253
(appliable to http queries)

@sanikolaev
Copy link
Collaborator Author

I confirm it looks ok now:

snikolaev@dev2:~$ time (for n in `seq 1 100`; do mysql -P9306 -h0 -e "select * from name where match('on'); show meta;"; done > /dev/null)

real	0m8.881s
user	0m0.261s
sys	0m0.289s
snikolaev@dev2:~$ time (for n in `seq 1 100`; do curl -sX POST http://localhost:9308/search -d '{"index" : "name", "query" : {"match" : {"_all": "on"} }, "limit": 0}'; done > /dev/null)

real	0m8.713s
user	0m0.184s
sys	0m0.379s

@PavelShilin89 let's try to cover this with a small CLT test and see how stable it is in the github runners environment. Notes:

  1. You can use the issue's body as the way to populate Manticore with data for the test
  2. It's better to copy the script and the names/surnames from my gist directly to the test dir
  3. Call it github-manticoresearch-1794-curl-vs-mysql-performance, so we know what issue it relates to and what it is about.
  4. To check the performance metrics you can do this:
# Capture the current time in nanoseconds before starting the curl operations
start=$(date +%s%N)

# Execute a curl POST request 100 times in a loop, discard the output
for n in `seq 1 100`; do curl -sX POST http://localhost:9308/search -d '{"index" : "name", "query" : {"match" : {"_all": "on"} }, "limit": 0}'; done > /dev/null

# Capture the current time in nanoseconds after completing the curl operations
end=$(date +%s%N)

# Calculate the duration of the curl operations in milliseconds
curl=$(( (end - start) / 1000000 ))

# Capture the current time in nanoseconds before starting the MySQL operations
start=$(date +%s%N)

# Execute a MySQL query 100 times in a loop, discard the output
for n in `seq 1 100`; do mysql -P9306 -h0 -e "select * from name where match('on'); show meta;"; done > /dev/null

# Capture the current time in nanoseconds after completing the MySQL operations
end=$(date +%s%N)

# Calculate the duration of the MySQL operations in milliseconds
mysql=$(( (end - start) / 1000000 ))

# Print the durations of MySQL and curl operations in milliseconds
echo "Duration: ${mysql} ${curl} ms"

# Use PHP to calculate the percentage difference between the curl and MySQL durations and check if it's >= 1%
# Exits with 0 if the condition is true, otherwise exits with 1
php -r "exit(round(abs($curl - $mysql) / ($curl + $mysql)/2 * 100, 0) >= 1);"

# Print the exit status of the PHP script (0 if percentage difference >= 1%, 1 otherwise)
echo $?

Result on dev2:

Duration: 7847 7782 ms
0

Please create a separate task for yourself about it and put a link here.

@PavelShilin89
Copy link
Contributor

PavelShilin89 commented Mar 11, 2024

donhardman pushed a commit that referenced this issue Mar 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug rel::upcoming Upcoming release
Projects
None yet
Development

No branches or pull requests

4 participants