Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event trigger webhooks are not runned concurrently #5189

Closed
MichalKalita opened this issue Jun 23, 2020 · 5 comments · Fixed by #5352
Closed

Event trigger webhooks are not runned concurrently #5189

MichalKalita opened this issue Jun 23, 2020 · 5 comments · Fixed by #5352
Labels
c/server Related to server k/bug Something isn't working

Comments

@MichalKalita
Copy link

MichalKalita commented Jun 23, 2020

I have insert event in table. It call webhook and response is in 10s. Actually webhooks are called sequential. But I want to call it concurrently. How to do it?

I am not sure if this behavior is bug in Hasura/heroku or some Hasura feature and how to change it.

Steps to reproduce:

  1. Start webhook https://github.com/MichalKalita/test-delayed-webhook/ or use deployed version on heroku https://protected-castle-19380.herokuapp.com/ - this webhook just wait 10 seconds and response with status ok
  2. Setup insert event trigger to webhook
  3. Insert multiple rows to trigger webhook.
  4. Looks to pending events in hasura console

Actual behavior:
After 10s 1 pending event is done. It is not runningconcurrently.

Expected behavior:
Run all webhooks in same time. All should be done in 10 seconds

Can be reproduced on v1.3.0-beta.1, v1.3.0-beta.2

It cannot be reproduced on v1.2.2

@MichalKalita MichalKalita changed the title run events concurrently trigger events concurrently Jun 23, 2020
@MichalKalita MichalKalita changed the title trigger events concurrently Event trigger webhooks are not runned concurrently Jun 24, 2020
@tirumaraiselvan tirumaraiselvan added the support/needs-triage Needs to be triaged so that we have enough information to add this to our backlog label Jun 24, 2020
@mousetraps
Copy link

mousetraps commented Jul 9, 2020

We're running into this issue as well on the latest builds (including v1.3.0-beta.4). It's crushing performance and it's a blocking issue for us, as our application counts on events to be processed in a timely manner. We weren't running into it before, so it seems like a regression.

Hasura is running in a docker container, so it would be surprising for this to be a platform issue, but let me know if you need more specifics about our environment.

@codingkarthik codingkarthik added k/bug Something isn't working c/server Related to server and removed support/needs-triage Needs to be triaged so that we have enough information to add this to our backlog labels Jul 10, 2020
@codingkarthik
Copy link
Contributor

@MichalKalita @mousetraps Thank you so much for reporting this regression.
This bug was introduced from v1.3.0-beta.1 as you folks point out due to a faulty refactor of the code.
I have raised a PR which will fix this issue.

@cusspvz
Copy link
Contributor

cusspvz commented Nov 17, 2020

@codingkarthik I've seen the PR (#5352) where you solve this regression. We're currently experiencing a bunch of webhook executions on production (on version 1.3.0), and I'm wondering if it is related to this.

We're seeing a lot errors like this one and I'm wondering what could be causing it:

2020-11-14 22:00:49 UTC:xxxxx.compute-1.amazonaws.com(34640):xxx@xxx:[14246]:ERROR: could not serialize access due to concurrent update
2020-11-14 22:00:49 UTC:xxxxx.compute-1.amazonaws.com(34640):xxx@xxx:[14246]:STATEMENT:
UPDATE hdb_catalog.event_log
SET locked = 't'
WHERE id IN ( SELECT l.id
FROM hdb_catalog.event_log l
WHERE l.delivered = 'f' and l.error = 'f' and l.locked = 'f'
and (l.next_retry_at is NULL or l.next_retry_at <= now())
and l.archived = 'f'
ORDER BY created_at
LIMIT $1
FOR UPDATE SKIP LOCKED )
RETURNING id, schema_name, table_name, trigger_name, payload::json, tries, created_at
  1. Does Hasura has a limit of number of webhook executions it can take concurrently?
  2. As it seems to be throwing out an issue while locking a trigger, does it mean there is more than one Hasura instance trying to lock to the same webhook at the same time?

Note: Our Hasura setup is running multiple instances at the same time to provide HA.

@codingkarthik
Copy link
Contributor

@cusspvz The issue you're facing is a different issue, which happens when you're running event triggers on multiple instances of hasura. We have fixed this in v1.3.3 which has been released yesterday. Do let us know if you continue facing the same issue with the newer version also.

@cusspvz
Copy link
Contributor

cusspvz commented Nov 18, 2020

@codingkarthik I will update it and let you know. Thanks for that quick answer @codingkarthik !

hasura-bot pushed a commit that referenced this issue Apr 29, 2021
…1237)

This essentially restores the original code from c425b55
(#4013). Prior to this
commit we would slurp messages as fast as possible from the database
(one thing c425b55 fixed).

Another thing broken as a consequence of the same logic was the
removeEventFromLockedEvents logic which unlocks in-flight events
(breaking at-least-once delivery)

Some archeology, post-c425b55:

- cc8e2cc erroneously attempted to refactor using `bracket`, resulting
  in the same slurp-all-events behavior (since we don't ever wait for
  processEvent to complete)
- at some point event processing within a batch is made serial, this
  reported as a bug. See: #5189
- in 0ef5229 (which I approved...) an `async` is added, again
  causing the same issue...

GitOrigin-RevId: d8cbaab
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/server Related to server k/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants