Redis decode_responses rq incompatibility #6424
Thanks, that looks like an important bug to investigate / fix. 😄
Our latest development code is using rq (lines 47 to 48 in c2e7df0).

Latest versions are currently:

We should be able to bump them up to much newer versions, and hopefully also adjust the connection options without too much hassle.
This seems to be where the option is set: redash/redash/settings/helpers.py, lines 47 to 66 in c2e7df0
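That helper appears to append the option to the Redis connection URL (redis-py's `from_url` parses `decode_responses` from the query string). A minimal sketch of what such a helper might look like — the function name and exact behaviour here are assumptions for illustration, not the actual Redash code:

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

def add_decode_responses(url: str) -> str:
    """Hypothetical reconstruction: append decode_responses=True to a Redis URL."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["decode_responses"] = ["True"]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

print(add_decode_responses("redis://localhost:6379/0"))
# redis://localhost:6379/0?decode_responses=True
```

Removing the helper would then just mean passing the original URL straight through to the Redis client.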
Looks like that was originally added during the migration of Redash from Python 2 to Python 3: 246eca1
Not setting that option might be as simple as removing the helper function and removing the calls to it:

redash/redash/settings/__init__.py, line 9 in c2e7df0
redash/redash/settings/__init__.py, line 22 in c2e7df0
Am not finding any info about why the option was added in the first place. It was likely done for a reason, though what that might be I have no idea. We can probably try removing that option and then see if our CI tests pass. That will at least let us know that Redash might work without it. We'd ideally want to see it operate without causing errors in practice, though.
Ahhh. Looks like the rq docs specifically call out that option as unsupported these days too: https://github.com/rq/rq/pull/1833/files

Yeah, let's try removing it. 😄
Looks like just removing that function completely isn't going to work as-is. When doing so just now in a test on my local computer, the backend unit tests start throwing errors. The errors look like:

@guidopetri Any interest in this one? 😄
Thinking about it more, the existing code base already has the connection to

So

Whereas

So, my reading of things is that we should definitely try upgrading the rq and rq-scheduler libraries to see if it helps your problem.

The

Am testing the upgraded libraries on my local computer at the moment. If they pass our CI tests on my local computer, then I'll make a PR out of it. 😄
I've just created PR #6426, which upgrades the rq-related dependencies as far as they can go without needing changes to our code base or tests.

@SchrosCat2013 Would you be ok to try adjusting your

And yeah, I'm aware you've been trying a slightly newer version of
Thanks, I'll give that a go and let you know how it turns out.
This does indeed probably stem from Redis, but I agree that it's probably unrelated to the original error. AFAIK Redis by default returns bytes, and you have to decode with utf-8 to get
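To illustrate why a blanket UTF-8 decode can't work here: rq serializes job payloads with pickle, and pickled data is arbitrary bytes, not valid UTF-8 text. A minimal sketch (the payload contents are made up):

```python
import pickle

# Pickled job payloads are binary; pickle protocol 2+ output starts with
# byte 0x80, which is not a valid UTF-8 start byte.
payload = pickle.dumps({"query_id": 42, "sql": "SELECT 1"})

try:
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)
```

So any client-side setting that force-decodes every reply will blow up the moment it touches an rq job hash.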
Unfortunately, swapping to the suggested versions seems to have exacerbated the problem, if anything. Part of this might be our environment - we're currently running an older version of Redis (v3.2.10). The suggested upgrade of

In the meantime I'm going to see if I can stand up and reproduce the issues in a Redash development environment.
Interesting. Hopefully we can all get this solved sooner rather than later. 😄
For the dev environment setup, please use the wiki page documentation rather than the knowledge base pages: https://github.com/getredash/redash/wiki/Local-development-setup We need to update the knowledge base pages to include the wiki stuff, but haven't gotten around to it yet.
I've had a chance to look into this. I found one place where the wrong connection was being passed to an RQ function, leading to the UnicodeDecodeError: #6539

While looking for the problem, I noticed that the enqueue_query method is creating a Redis pipeline to watch a value, but the pipeline isn't being reset. I don't think this is related to the issue at hand, but it seems like something that should be fixed. I've created a PR #6540 that calls reset inside a finally block. The other option is to change line 39 -

I believe the actual source of the problem is the HardLimitingWorker. This is performing a heartbeat on the Worker, but not on the Job. Once a query is picked off the queue for execution, RQ is giving the job a default timeout of 60s. Normally this would be updated every time the job heartbeat is called. Nothing happens immediately after this timeout expires. The next time the RQ

The
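The fix in #6540 is the standard try/finally cleanup pattern. A minimal illustration of the idea using a stub class (a real redis-py pipeline behaves analogously, but the stub avoids needing a live Redis server; the class and method bodies here are illustrative stand-ins, not Redash code):

```python
class StubPipeline:
    """Stand-in for a redis-py pipeline, just enough to show the pattern."""

    def __init__(self):
        self.watching = False

    def watch(self, key):
        self.watching = True   # a real pipeline issues WATCH and holds a connection

    def reset(self):
        self.watching = False  # a real pipeline issues UNWATCH and releases the connection


def check_before_enqueue(pipe):
    try:
        pipe.watch("query_hash_key")
        # ... inspect the watched value, decide whether to enqueue ...
        raise RuntimeError("simulated failure mid-check")
    finally:
        pipe.reset()  # runs on success *and* on error, so the WATCH is always released


pipe = StubPipeline()
try:
    check_before_enqueue(pipe)
except RuntimeError:
    pass
print(pipe.watching)  # False even though the body raised
```

Without the finally, an exception between watch() and the end of the function would leave the connection in a watching state when it's returned to the pool.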
Rather than adding a call to the job heartbeat from within the HardLimitingWorker, I believe the best fix in this case is to remove the HardLimitingWorker entirely. It looks like this was cloned from an early version of RQ, and its purpose is to hard-kill a job that runs for too long without respecting its time limit. This functionality has since been baked directly into RQ: it was added in version 1.2.1 and has undergone a number of iterations of bug fixes.

I have managed to reproduce this locally for ad-hoc queries. I'm not certain if this is also the cause of the scheduled queries running multiple times yet, but will give the fixes a go and get back to you with the results.
Sounds like really well done investigation. Awesome. 😄
Want to throw together a PR for this? Also, any ideas on how we can definitively test the broken behaviour beforehand (smoking gun style), and test that it's fixed afterwards?
Happy to throw a PR together for this. I'd have to give some thought to how you would go about testing this. As the broken behaviour is only exhibited once the RQ initial timeout of 60s has passed, you'd either have to have a test that waited 60s, messed with the internals of RQ, or directly modified the timeout for the job's underlying entry in Redis. None of those sound particularly appealing, though I could be persuaded on the third option.
Of the three, that one sounds most reasonable for our purposes here. But let's ask @guidopetri, as he's a lot better with Python than me. 😄
IMO waiting for 60s for a test isn't that big of a deal; I'd be fine with that. Messing with the internals of Redis sounds more error-prone and in my opinion doesn't give enough lift that it's worth doing instead.
Redash is setting the flag decode_responses for its Redis connection. This is incompatible with the rq library - rq/rq#1188. We are having some issues, and I suspect this might be the underlying cause. Primarily:

The second issue eventually consumes the entire query pool and prevents any query from running. When this happens, we have to manually flush the Redis cache.
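What decode_responses=True does, in effect, is run every raw Redis reply through bytes.decode("utf-8"). That is fine for ordinary string values but breaks on rq's pickled job payloads. A rough stand-alone simulation (client_reply is a made-up stand-in for the client's response handling, not real redis-py code):

```python
import pickle

def client_reply(raw: bytes, decode_responses: bool):
    # Simulates the client's response handling: with decode_responses=True,
    # every reply is UTF-8 decoded before being handed back to the caller.
    return raw.decode("utf-8") if decode_responses else raw

# Ordinary string values decode cleanly.
print(client_reply(b"queries:count=7", decode_responses=True))

# An rq job payload is raw pickle bytes, which are not valid UTF-8.
job_payload = pickle.dumps(("execute_query", 1234))
try:
    client_reply(job_payload, decode_responses=True)
except UnicodeDecodeError as exc:
    print(exc)
```

Since the flag applies connection-wide, a single shared connection can't serve both Redash's decoded strings and rq's binary payloads.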
We are running Redash v10.1.0 from the redash/redash:10.1.0.b50633 docker image.
We found #5801, which looks like the same issue we are having. We have upgraded rq to 1.10.1 and rq-scheduler to 0.10.0 in our environment. This does seem to have alleviated the stuck-queries issue. It doesn't seem to have changed the scheduled queries' multiple executions. However, we are now seeing the error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 1: invalid start byte

repeated throughout the logs. The stack trace here matches the linked issue in the rq library. I am not sure if upgrading rq caused these errors - I suspect instead that upgrading rq has improved the error logging and is surfacing an existing issue.
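For reference, the upgrade described above amounts to pinning the two libraries in the Python requirements; an illustrative fragment (the actual file name and surrounding pins in any given deployment may differ):

```
rq==1.10.1
rq-scheduler==0.10.0
```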