After restarting a single-node Jylis server with disk persistence, `GET`ting a large `TLOG` key (>100k items) causes resource starvation: CPU goes to 100%, Jylis can't process other queries, and the executing query doesn't seem to return. The same query took about 1000ms before the restart. A `GET` limited to 100 items succeeds, as does adding new items with `INS`.
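For reference, the queries in question look like this over the wire (assuming the documented `TLOG` syntax; the `INS` value and timestamp below are just made up):

```
TLOG GET temperature                    # full read of the large key -- the query that hangs
TLOG GET temperature 100                # a GET limited to 100 items -- returns fine
TLOG INS temperature 68.8 1528902060    # inserting a new item also works
```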
Jylis at rest after starting up and ingesting the `TLOG` from disk: *(screenshot)*
Jylis after sending the `TLOG GET`: *(screenshot)*
I tried backing the item count down and ramping it up gradually, with interesting results. My goal was to keep a batch of items "hot" in memory to see whether it would speed up the next query. 10k items worked fine, as did 20k. I worked up to 80k items this way before Jylis became excessively unresponsive.
Then I restarted Jylis and the web app and tried a 45k-item batch. It took 57 seconds to return from the database, close to the app's 60-second timeout. I pulled the same 45k batch again and Jylis responded in 431ms. If I wait several minutes with the processes still running, the long request happens again, followed by shorter requests after that. I'm not sure whether this is intentional caching in Jylis or a side effect of the garbage collector, but I thought I should point out the behavior.
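For what it's worth, the warm/cold pattern is easy to see with a minimal script along these lines (hypothetical endpoint; this is not my actual app code):

```ruby
require "hiredis"
require "benchmark"

conn = Hiredis::Connection.new
conn.connect("localhost", 6379)   # placeholder host/port for the Jylis server

# Time the same limited read twice in a row. In my test the first read after a
# restart took ~57s, the immediate repeat ~431ms, and the slow read comes back
# after letting the processes sit idle for several minutes.
2.times do |i|
  elapsed = Benchmark.realtime do
    conn.write(["TLOG", "GET", "temperature", "45000"])
    conn.read
  end
  puts "read #{i + 1}: #{(elapsed * 1000).round}ms"
end
```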
Additional Notes
I am using the hiredis Ruby gem. I store the current connection in a `Connection` object, but other than that I'm calling the hiredis `read` and `write` methods directly.
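In rough terms, the client side amounts to this (a simplified sketch rather than the real code; `tlog_get` and the endpoint are placeholders):

```ruby
require "hiredis"

# Simplified sketch of how the app talks to Jylis; the real app keeps the
# connection in a Connection object.
conn = Hiredis::Connection.new
conn.connect("localhost", 6379)   # placeholder host/port

# Hypothetical helper: issue a TLOG GET with an optional item count.
def tlog_get(conn, key, count = nil)
  cmd = ["TLOG", "GET", key]
  cmd << count.to_s if count
  conn.write(cmd)
  conn.read
end

tlog_get(conn, "temperature", 100)  # limited read: returns quickly
tlog_get(conn, "temperature")       # full read: the call that never returns
```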
- Using `redis-cli` to execute `TLOG GET temperature` doesn't experience the problem.
- If the app server is stopped, reloaded, or times out, the query continues to run in Jylis.
- No other Jylis queries can be made while the problem query is running.
- No other clients can connect to Jylis while the problem query is running (a rough way I checked the last two points is sketched below).
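The check was done from a second connection along these lines (a rough sketch; `Timeout` may not be able to interrupt a blocking hiredis read in every case, but even the connect attempt alone stalls):

```ruby
require "hiredis"
require "timeout"

# While the big TLOG GET is running on the main connection, try to open a
# second connection and issue a trivial query, giving up after 5 seconds.
begin
  Timeout.timeout(5) do
    probe = Hiredis::Connection.new
    probe.connect("localhost", 6379)               # placeholder endpoint
    probe.write(["TLOG", "GET", "temperature", "1"])
    puts probe.read.inspect
  end
rescue Timeout::Error
  puts "no response within 5s -- Jylis is tied up by the running query"
end
```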
I realize this use of Jylis (>100k items in a `TLOG`) may be abusive to its ideal use case, pushing it beyond its intended limits; the intent of this test was to find Jylis' breaking point. If that's the case, I have no problem trimming the `TLOG` at this point (or setting a limit until the cursor is implemented). However, if you would like to make Jylis performant in this scenario, I can certainly share the `TLOG` data so the issue can be reproduced.
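If I go the trimming route, I'd expect to do something like the following (going from memory of the `TLOG` function list, so the exact names and arguments may be off):

```
TLOG SIZE temperature          # check how many items are currently in the log
TLOG TRIM temperature 10000    # keep only roughly the most recent 10k items
```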
I'm a little concerned that the extreme delays started after a deploy (a new Jylis image may have been pulled in addition to the service restarting). I'm not sure whether this is a regression or whether I just got lucky from running Jylis for so long without a restart.
This also exposes a couple of issues that concern me and are not related to the large data set: a single running query seems to tie up Jylis entirely. No other queries can be made and no clients can connect while a query is running.