TLOG experiences performance issues with large recordset #9

Open

amclain opened this issue Jun 4, 2018 · 0 comments

amclain (Contributor) commented Jun 4, 2018

After restarting a single-node Jylis server with disk persistence, GETting a large TLOG key (>100k items) starves Jylis of resources: CPU goes to 100%, Jylis can't process other queries, and the executing query doesn't seem to return. Before the restart, this query took about 1000ms. A GET limited to 100 items succeeds, as does adding new items with INS.
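
For reference, the commands involved look roughly like this (the key name comes from my test below; the optional count argument to GET and the INS value/timestamp are illustrative, based on my reading of the Jylis docs):

    TLOG GET temperature                    (full recordset; hangs after the restart)
    TLOG GET temperature 100                (limited to 100 items; returns fine)
    TLOG INS temperature 72.0 1528156800    (inserting new items also works)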

Jylis at rest after starting up and ingesting the TLOG from disk:

[screenshot]

Jylis after sending the TLOG GET:

[screenshot]

I tried stepping up the item count gradually, with interesting results. The idea was to keep a batch of items "hot" in memory and see if that would speed up the next query. 10k items worked fine. Then I bumped the count to 20k, and that was fine too. I worked up to 80k items this way before Jylis became excessively unresponsive.

Then I restarted Jylis and the web app and tried a 45k item batch. It took 57 seconds to return from the database, which was close to the 60 second app timeout limit. I pulled the same 45k batch again and Jylis responded in 431ms. If I wait for several minutes with the processes still running, the long request happens again, followed by shorter requests after that. I'm not sure if this is intentionally being cached in Jylis or if it's a side effect of the garbage collector, but I thought I should point out the behavior.
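
In case it helps, the timing loop looked roughly like the following (assuming the hiredis-rb connection API, the default port, and that TLOG GET takes an optional count; a simplified sketch, not verbatim from the app):

    require "hiredis"
    require "benchmark"

    conn = Hiredis::Connection.new
    conn.connect("localhost", 6379)

    # Step up the batch size and time each limited GET.
    [10_000, 20_000, 45_000, 80_000].each do |count|
      elapsed = Benchmark.realtime do
        conn.write ["TLOG", "GET", "temperature", count.to_s]
        conn.read
      end
      puts "#{count} items: #{(elapsed * 1000).round}ms"
    end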

Additional Notes

  • I am using the Hiredis Ruby gem. I store the current connection in Connection, but other than that I'm calling the Hiredis read and write methods:

        Connection.current.write ["TLOG", "GET", "temperature"]
        result = Connection.current.read

  • Executing TLOG GET temperature with redis-cli doesn't exhibit the problem.

  • If the app server is stopped/reloaded/times out, the query continues to run in Jylis.

  • No other Jylis queries can be made while the problem query is running.

  • No other clients can connect to Jylis while the problem query is running.


I realize this use of Jylis (>100k items in a TLOG) may be outside its ideal use case and pushing it beyond its intended limits; the intent of this test was to find Jylis' breaking point. If that's the case, I have no problem trimming the TLOG at this point (or setting a limit until the cursor is implemented). However, if you would like to make Jylis performant in this scenario, I can certainly share the TLOG data so that the issue can be reproduced.

I'm a little concerned that the extreme delays started happening after a deploy (a new Jylis image may have been pulled in addition to the service restarting). I'm not sure if this is a regression or if I just got lucky by running Jylis for so long without a restart.

This also exposes a couple of issues I'm concerned about that aren't related to a large data set: one running query seems to tie up the Jylis connection, so no other queries can be made and no clients can connect while a query is running.
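
Here is a rough sketch of how that shows up from the client side (again assuming the hiredis-rb API and the default port; a hypothetical illustration, not a confirmed repro script):

    require "hiredis"

    def jylis_connection
      conn = Hiredis::Connection.new
      conn.connect("localhost", 6379)
      conn
    end

    # Connection 1: kick off the problematic full GET in the background.
    slow = Thread.new do
      c = jylis_connection
      c.write ["TLOG", "GET", "temperature"]
      c.read
    end

    sleep 1  # give the slow query time to start

    # Connection 2: while the slow query is running, even a small limited GET
    # on a fresh connection appears to hang instead of returning quickly.
    fast = jylis_connection
    fast.write ["TLOG", "GET", "temperature", "100"]
    puts fast.read.inspect

    slow.join(60)  # the full GET may never return, so cap the wait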
