Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(core): Move event and telemetry handling into workers in queue mode #7138

Merged

Conversation

flipswitchingmonkey
Copy link
Contributor

@flipswitchingmonkey flipswitchingmonkey commented Sep 8, 2023

Motivation

In Queue mode, finished executions would cause the main instance to always pull all execution data from the database, unflatten it and then use it to send out event log events and telemetry events, as well as required returns to Respond to Webhook nodes etc.

This could cause OOM errors when the data was large, since it had to be fully unpacked and transformed on the main instance’s side, using up a lot of memory (and time).

This PR attempts to limit this behaviour to only happen in those required cases where the data has to be forwarded to some waiting webhook, for example.

Changes

Execution data is only required in cases, where the active execution has a postExecutePromise attached to it. These usually forward the data to some other endpoint (e.g. a listening webhook connection).

By adding a helper getPostExecutePromiseCount(), we can decide that in cases where there is nothing listening at all, there is no reason to pull the data on the main instance.

Previously, there would always be postExecutePromises because the telemetry events were called. Now, these have been moved into the workers, which have been given the various InternalHooks calls to their hook function arrays, so they themselves issue these telemetry and event calls.

This results in all event log messages to now be logged on the worker’s event log, as well as the worker’s eventbus being the one to send out the events to destinations. The main event log does…pretty much nothing.

We are not logging executions on the main event log any more, because this would require all events to be replicated 1:1 from the workers to the main instance(s) (this IS possible and implemented, see the worker’s replicateToRedisEventLogFunction - but it is not enabled to reduce the amount of traffic over redis).

Partial events in the main log could confuse the recovery process and would result in, ironically, the recovery corrupting the execution data by considering them crashed.

Refactor

I have also used the opportunity to reduce duplicate code and move some of the hook functionality into packages/cli/src/executionLifecycleHooks/shared/sharedHookFunctions.ts in preparation for a future full refactor of the hooks

…-with-current-state

# Conflicts:
#	packages/cli/src/commands/worker.ts
…-with-current-state

# Conflicts:
#	packages/cli/src/AbstractServer.ts
…-with-current-state

# Conflicts:
#	packages/cli/src/eventbus/eventBus.controller.ts
…rker

# Conflicts:
#	packages/cli/src/commands/worker.ts
#	packages/cli/src/eventbus/MessageEventBus/MessageEventBus.ts
@github-actions
Copy link
Contributor

github-actions bot commented Sep 8, 2023

Great PR! Please pay attention to the following items before merging:

Files matching packages/**:

  • If fixing bug, added test to cover scenario.
  • If addressing forum or Github issue, added link to description.

Files matching packages/**/*.ts:

  • Added unit tests to cover new or updated functionality.

Make sure to check off this list before asking for review.

@n8n-assistant n8n-assistant bot added core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team labels Sep 8, 2023
@flipswitchingmonkey flipswitchingmonkey marked this pull request as ready for review September 8, 2023 14:43
@codecov
Copy link

codecov bot commented Sep 8, 2023

Codecov Report

Patch coverage: 8.67% and no project coverage change.

Comparison is base (34ebffe) 32.40% compared to head (42b9153) 32.40%.
Report is 3 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7138   +/-   ##
=======================================
  Coverage   32.40%   32.40%           
=======================================
  Files        3276     3278    +2     
  Lines      198005   198019   +14     
  Branches    21652    21646    -6     
=======================================
+ Hits        64162    64167    +5     
- Misses     132782   132791    +9     
  Partials     1061     1061           
Files Changed Coverage Δ
packages/cli/src/ActiveExecutions.ts 59.45% <0.00%> (-1.66%) ⬇️
packages/cli/src/Queue.ts 16.66% <ø> (ø)
packages/cli/src/WebhookHelpers.ts 31.51% <0.00%> (-0.13%) ⬇️
packages/cli/src/WorkflowRunner.ts 9.21% <0.00%> (-0.21%) ⬇️
...entbus/EventMessageClasses/EventMessageWorkflow.ts 58.82% <ø> (ø)
packages/cli/src/InternalHooks.ts 5.82% <4.76%> (+0.47%) ⬆️
packages/cli/src/WorkflowExecuteAdditionalData.ts 12.88% <4.76%> (+0.77%) ⬆️
packages/cli/src/worker/workerCommandHandler.ts 20.68% <20.68%> (ø)
...cutionLifecycleHooks/shared/sharedHookFunctions.ts 25.71% <25.71%> (ø)
packages/cli/src/commands/worker.ts 24.87% <50.00%> (+0.74%) ⬆️

... and 3 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

packages/cli/src/InternalHooks.ts Show resolved Hide resolved
packages/cli/src/InternalHooks.ts Show resolved Hide resolved
packages/cli/src/Queue.ts Outdated Show resolved Hide resolved
packages/cli/src/WorkflowRunner.ts Outdated Show resolved Hide resolved
packages/cli/src/WorkflowRunner.ts Show resolved Hide resolved
packages/cli/src/WorkflowRunner.ts Outdated Show resolved Hide resolved
packages/cli/src/WorkflowRunner.ts Show resolved Hide resolved
packages/cli/src/commands/worker.ts Outdated Show resolved Hide resolved
@flipswitchingmonkey flipswitchingmonkey changed the title fix(core): Move event and telemetry handling into workers in queue mode refactor(core): Move event and telemetry handling into workers in queue mode Sep 13, 2023
…rker

# Conflicts:
#	packages/cli/src/commands/worker.ts
#	packages/cli/src/worker/workerCommandHandler.ts
krynble
krynble previously approved these changes Sep 13, 2023
Copy link
Contributor

@krynble krynble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, just suggesting some leftover code that was still in this PR.

packages/cli/src/Queue.ts Outdated Show resolved Hide resolved
@@ -193,7 +199,7 @@ export class Worker extends BaseCommand {
);
await additionalData.hooks.executeHookFunctions('workflowExecuteAfter', [failedExecution]);
}
return { success: true };
return { success: true, error: error as ExecutionError };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also remove this property since this is not in use at the moment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that one was actually being used but i just noticed that merging master in removed it...

@@ -23,6 +23,7 @@ export interface JobData {

export interface JobResponse {
success: boolean;
error?: ExecutionError;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe remove also this one?

@github-actions
Copy link
Contributor

⚠️ Some Cypress E2E specs are failing, please fix them before merging

@cypress
Copy link

cypress bot commented Sep 13, 2023

1 flaky test on run #2165 ↗︎

0 239 0 0 Flakiness 1

Details:

🌳 pay-780-move-onworkflowpostexecute-into-worker 🖥️ browsers:node18.12.0-chrom...
Project: n8n Commit: 42b915375e
Status: Passed Duration: 58:28 💡
Started: Sep 13, 2023 9:17 PM Ended: Sep 13, 2023 10:16 PM
Flakiness  cypress/e2e/24-ndv-paired-item.cy.ts • 1 flaky test

View Output Video

Test Artifacts
NDV > resolves expression with default item when input node is not parent, while still pairing items Output Screenshots Video

This comment has been generated by cypress-bot as a result of this project's GitHub integration settings.

@github-actions
Copy link
Contributor

⚠️ Some Cypress E2E specs are failing, please fix them before merging

2 similar comments
@github-actions
Copy link
Contributor

⚠️ Some Cypress E2E specs are failing, please fix them before merging

@github-actions
Copy link
Contributor

⚠️ Some Cypress E2E specs are failing, please fix them before merging

@github-actions
Copy link
Contributor

✅ All Cypress E2E specs passed

@flipswitchingmonkey flipswitchingmonkey merged commit 0c6169e into master Sep 14, 2023
52 of 53 checks passed
@flipswitchingmonkey flipswitchingmonkey deleted the pay-780-move-onworkflowpostexecute-into-worker branch September 14, 2023 05:58
@janober
Copy link
Member

janober commented Sep 21, 2023

Got released with n8n@1.8.1

ivov added a commit that referenced this pull request Nov 20, 2023
This PR adds `status` to run data so that
`determineFinalExecutionStatus` resolves correctly on execution failure
and removes the cleanup that is being duplicated in a worker hook.

Followup to #7138

Should fix:
- #7705
-
https://linear.app/n8n/issue/PAY-964/no-execution-found-after-execution-fails
-
https://linear.app/n8n/issue/PAY-1010/execution-deletion-in-queue-mode-not-complying-with-settings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team Released
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants