SpiffWorkflow node not sending messages #5195
The error reported from the UI is:
And if we look at the time when this was generated: nothing in the logs. Utterly useless.
This is the only thing I can find that matches:
The error appears to be coming from here: `status-go/protocol/messenger_peersyncing.go`, lines 359 to 364, at commit `6f1b829`.
And more specifically from `status-go/protocol/common/message_segmentation.go`, lines 35 to 37, at commit `6f1b829`.
That's assuming it is indeed the correct error I'm researching, considering that most of the logs are errors.
The logs are full of all kinds of other errors:
No idea how relevant they are, or if it's just noise.
There's also shit like this:
But I sent a test message to someone using this command:
And it worked. So message delivery works fine.
But I did manage to get it to timeout a few times:
Which suggests an issue does indeed exist. But the logs show me nothing useful.
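To quantify how often it happens, a small probe along these lines could measure the timings. This is only a sketch: the endpoint and payload are hypothetical stand-ins, not the actual failing call.

```python
import time

import requests

URL = "http://localhost:8545/"   # hypothetical stand-in for the failing endpoint
PAYLOAD = '{"method": "ping"}'   # hypothetical request body
LIMIT = 10                       # seconds; tune to match the observed timeout

timeouts = 0
for attempt in range(50):
    start = time.monotonic()
    try:
        requests.post(URL, data=PAYLOAD, timeout=LIMIT)
    except requests.exceptions.Timeout:
        timeouts += 1
    except requests.exceptions.RequestException:
        pass  # other failures (e.g. connection refused) are not timeouts
    elapsed = time.monotonic() - start
    print(f"attempt {attempt}: {elapsed:.2f}s, timeouts so far: {timeouts}")
```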
What's interesting is that I stopped the
Which suggests both
There are lots of different types of errors:
No idea which are relevant and which are not. The fact that message delivery sometimes works suggests that most of them mean nothing.
Here are the same stats, but for the source code file the error originates from:
I can reproduce the timeout even with an 8-second limit:
But less frequently. It happens more often with a 7-second limit.
If we look at the request call itself, `raw_response = requests.post(url, json.dumps(request_body), headers=headers)`, no `timeout` argument is passed. According to the docs, that should mean an indefinite timeout:
https://requests.readthedocs.io/en/latest/user/advanced/#timeouts
Which would contradict my hypothesis. But there are other timeouts involved, like the frontend one.
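For reference, `requests` has no default timeout, so a call without the `timeout` argument can block for as long as the server keeps the connection open. A minimal sketch of both behaviors, with a placeholder URL rather than the real node endpoint:

```python
import requests

url = "http://localhost:8545/"  # placeholder, not the real status-go endpoint
body = "{}"
headers = {"Content-Type": "application/json"}

# No `timeout` argument: the client may block indefinitely waiting for a reply.
resp = requests.post(url, body, headers=headers)

# Explicit (connect, read) timeouts: a slow server raises
# requests.exceptions.ConnectTimeout or ReadTimeout instead of hanging.
resp = requests.post(url, body, headers=headers, timeout=(3.05, 27))
```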
It appears the backend defines `CONNECTOR_PROXY_COMMAND_TIMEOUT = 45`, which is then used by `proxied_response = requests.post(call_url, json=params, timeout=CONNECTOR_PROXY_COMMAND_TIMEOUT)`.
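To make the interaction concrete, here is a minimal sketch of that proxied call and where the timeout would surface; `call_url` and `params` are illustrative stand-ins for what the backend actually builds:

```python
import requests

CONNECTOR_PROXY_COMMAND_TIMEOUT = 45  # seconds, matching the backend constant

# Illustrative stand-ins; the real backend derives these from the service task.
call_url = "http://connector-proxy:8000/v1/do/waku/SendMessage"
params = {"message": "test"}

try:
    proxied_response = requests.post(
        call_url, json=params, timeout=CONNECTOR_PROXY_COMMAND_TIMEOUT
    )
except requests.exceptions.Timeout:
    # The connector's own requests.post to the node has no timeout, so if
    # the node is slow, this 45-second proxy timeout is the first to fire,
    # and the backend reports the task as failed.
    raise
```

If that is what happens here, the error would originate in the backend rather than the connector, which could explain why the connector-side logs show nothing useful.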
So far, it looks like the problem is CPU-related.
Andrea made a PR to reduce CPU usage by disabling fetching of messages. But even that is not enough to prevent the
I have added extra logging to the connector's `sendMessage` command:

```diff
diff --git a/connectors/connector-waku/connector_waku/commands/sendMessage.py b/connectors/connector-waku/connector_waku/commands/sendMessage.py
index 09a81bf..4d59e32 100644
--- a/connectors/connector-waku/connector_waku/commands/sendMessage.py
+++ b/connectors/connector-waku/connector_waku/commands/sendMessage.py
@@ -51,7 +51,10 @@ class SendMessage:
         status_code = None
         successful = False
         try:
+            print('POST', url, request_body)
             raw_response = requests.post(url, json.dumps(request_body), headers=headers)
+            print('RESPONSE:', raw_response.status_code)
+            print('RESPONSE TEXT:', raw_response.text)
             raw_response.raise_for_status()
             status_code = raw_response.status_code
             parsed_response = json.loads(raw_response.text)
@@ -59,9 +62,11 @@ class SendMessage:
             if not self.response_has_error(response) and status_code == 200:
                 successful = True
         except HTTPError as ex:
+            print('HTTPError:', ex)
             status_code = ex.response.status_code
             response['error'] = str(ex)
         except Exception as ex:
+            print('Exception:', ex)
             response['error'] = str(ex)
             status_code = 500
         return (response, status_code, successful)
```

It's not pretty, but it works. I want to see how it actually fails and what the timings are. So far I was unable to reproduce a failure.
I was unable to reproduce the issue using this process: So I'm going to leave reproducing it with the new logging to the Spiff testers.
Some failed process instances were found on |
What's fascinating is that if we look at https://test.app.spiff.status.im/i/637, which started at
We can see 3 new interesting error messages:
Most interestingly, the task says it failed at
The
But the
And indeed, if I search for instances of
Crucial part:
Another instance at
We can see
Interestingly, neither
Here's an example of failure: https://test.app.spiff.status.im/process-instances/misc:qa:send-waku-message/803
There are some other examples of this. There is no delay, so why would it cause a timeout?
Possibly relevant issue: |
After migrating, the problems have subsided, for now at least. So I'm closing this issue for now.
Problem
The SpiffWorkflow node appears to have issues sending messages. Spiff shows errors like:
But debugging this is difficult because, as far as I can tell, 98% of the node logs are errors: