Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): Coordinate manual workflow activation and deactivation in multi-main scenario #7643

Merged
merged 65 commits into from
Nov 17, 2023

Conversation

ivov
Copy link
Contributor

@ivov ivov commented Nov 7, 2023

Followup to #7566 | Story: https://linear.app/n8n/issue/PAY-926

Manual workflow activation and deactivation

In a multi-main scenario, if the user manually activates or deactivates a workflow, the process (whether leader or follower) that handles the PATCH request and updates its internal state should send a message into the command channel, so that all other main processes update their internal state accordingly:

  • Add to ActiveWorkflows if activating
  • Remove from ActiveWorkflows if deactivating
  • Remove and re-add to ActiveWorkflows if the update did not change activation status.

After updating their internal state, if activating or deactivating, the recipient main processes should push a message to all connected frontends so that these can update their stores and so reflect the value in the UI.

Workflow activation errors

On failure to activate a workflow, the main instance should record the error in Redis - main instances should always pull activation errors from Redis in a multi-main scenario.

Leadership change

On leadership change...

  • The old leader should stop pruning and the new leader should start pruning.
  • The old leader should remove trigger- and poller-based workflows and the new leader should add them.

@n8n-assistant n8n-assistant bot added core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team labels Nov 7, 2023
Copy link
Contributor

@krynble krynble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job overall! Left a few comments. Looking good!

Still haven't checked the tests as you said some are broken.

packages/cli/src/commands/start.ts Show resolved Hide resolved
packages/cli/src/commands/start.ts Outdated Show resolved Hide resolved
Comment on lines 110 to 127
async broadcastWorkflowWasActivated(workflowId: string, targets: string[], pushSessionId = '') {
if (!this.sanityCheck()) return;

await this.redisPublisher.publishToCommandChannel({
command: 'workflowWasActivated',
targets,
payload: { workflowId, pushSessionId },
});
}

async broadcastWorkflowWasDeactivated(workflowId: string, pushSessionId = '') {
if (!this.sanityCheck()) return;

await this.redisPublisher.publishToCommandChannel({
command: 'workflowWasDeactivated',
payload: { workflowId, pushSessionId },
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a comment: We've got to make sure that these 2 calls are only used for basic "Activation" and "Deactivation" actions.

Saying this because the update is comprised of a deactivation + database update + reactivation, and we'd like to prevent flooding with messages and also risking having messages being received in the wrong order, leading to an inconsistent state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could extend the sanityCheck() with more checks, though I'm not sure what we could add. We are currently testing for queue mode and whether it is main, but only the Leader should be sending these out since otherwise we can start an endless loop when the receivers start reactivating theirs...?

packages/cli/src/services/pruning.service.ts Outdated Show resolved Hide resolved
packages/editor-ui/src/mixins/pushConnection.ts Outdated Show resolved Hide resolved
Copy link
Contributor

@flipswitchingmonkey flipswitchingmonkey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's discuss the communication protocol and what should happen when messages are received (and who should send what when)

packages/cli/src/ActiveWorkflowRunner.ts Outdated Show resolved Hide resolved
@ivov
Copy link
Contributor Author

ivov commented Nov 13, 2023

@krynble @flipswitchingmonkey Submitting a bit faster than I'd like to as I need to go back to cloud support, but this PR should be 95% there and ready for review.

@ivov ivov removed their assignment Nov 13, 2023
packages/cli/src/ActivationErrors.service.ts Outdated Show resolved Hide resolved
packages/cli/src/ActiveWorkflowRunner.ts Show resolved Hide resolved
@@ -272,15 +255,19 @@ export class ActiveWorkflowRunner implements IWebhookManager {
async allActiveInStorage(user?: User) {
const isFullAccess = !user || user.globalRole.name === 'owner';

const activatedSuccessfully = async (workflowId: string) => {
return (await this.activationErrorsService.get(workflowId)) === undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could easily lead to a lot of back-and-forth; maybe we can fetch all at once and then compare this list only once? Perhaps a separate call that fetches all activation errors and compares this to the list of active workflows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed here: 401c02e

packages/cli/src/commands/BaseCommand.ts Outdated Show resolved Hide resolved
if (!this.sanityCheck()) return;

await this.redisPublisher.publishToCommandChannel({
command: 'workflowActiveStateChanged',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
command: 'workflowActiveStateChanged',
command: 'reloadWorkflowActiveState',

Based on the conversation yesterday, commands should indicate imperative actions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I recall, what we decided yesterday was that these were not commands but rather events, so it'd be the others (and methods, etc.) that need renaming.

In any case, I tend to think we should avoid a single one being inconsistent with the others - ideally we could rename all of this implementation in a standalone PR.

packages/cli/src/commands/start.ts Show resolved Hide resolved
Comment on lines 423 to 434
| PushDataActiveWorkflowAdded
| PushDataActiveWorkflowRemoved;

type PushDataActiveWorkflowAdded = {
data: IActiveWorkflowAdded;
type: 'workflowActivated';
};

type PushDataActiveWorkflowRemoved = {
data: IActiveWorkflowRemoved;
type: 'workflowDeactivated';
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might not need these anymore right? This was about "multiplayer real time update" - I guess we removed this, didn't we?

Copy link
Contributor Author

@ivov ivov Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yesterday I mentioned that multiplayer is out of scope, but that some work was done here. If a user activates or deactivates a workflow on the leader, the follower will see the change in real time if looking at the same workflow on canvas, and vice versa. This is not a full multiplayer implementation but it might be nice to keep since it's simple and already working.

Copy link
Contributor

@krynble krynble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment from me.

packages/cli/src/ActivationErrors.service.ts Show resolved Hide resolved
…tance (#7735)

This PR accounts for the edge case where the user on a FE connected to a
follower `main` instance leads to an activation failure. The follower
performs the DB update, skips activation in `ActiveWorkflows`, instructs
leader to perform activation in `ActiveWorkflows`, leader fails to
activate and reverts the DB update and communicates activation error to
follower, which broadcasts the event to all connected FEs, which update
their stores and display the activation error.

To reproduce, create a workflow with a trigger with an invalid
credential and attempt to activate on both leader and follower to
trigger the activation error - behavior should be identical on both.
@ivov
Copy link
Contributor Author

ivov commented Nov 17, 2023

@krynble The edge case you found yesterday should be covered by c23f7e8

Copy link
Contributor

@krynble krynble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great, two very tiny changes

this.showError(
new Error(receivedData.data.errorMessage),
this.$locale.baseText('workflowActivator.showError.title', {
interpolate: { newStateName: 'deactivated' },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
interpolate: { newStateName: 'deactivated' },
interpolate: { newStateName: 'activated' },

The modal title is currently shown as Workflow could not be deactivated: which seems a bit misleading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see a3147a9

this.workflowsStore.setWorkflowInactive(receivedData.data.workflowId);
this.workflowsStore.setActive(false);

this.showError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add an if to check whether the currently visible workflow ID matches the one we just received. The way this is currently working, I tested with 3 main instances and if I click the toggle in one of the followers, the other displays the error even if another totally unrelated workflow is open.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see a3147a9

Copy link
Contributor

@krynble krynble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work, loved it!

Copy link

cypress bot commented Nov 17, 2023

5 flaky tests on run #2952 ↗︎

0 276 3 0 Flakiness 5

Details:

🌳 🖥️ browsers:node18.12.0-chrome107 🤖 ivov 🗃️ e2e/*
Project: n8n Commit: a3147a926a
Status: Passed Duration: 05:42 💡
Started: Nov 17, 2023 2:43 PM Ended: Nov 17, 2023 2:49 PM
Flakiness  12-canvas-actions.cy.ts • 1 flaky test

View Output Video

Test Artifacts
Canvas Actions > should execute node Screenshots Video
Flakiness  6-code-node.cy.ts • 1 flaky test

View Output Video

Test Artifacts
... > generate code button should have correct state & tooltips Screenshots Video
Flakiness  28-resource-mapper.cy.ts • 2 flaky tests

View Output Video

Test Artifacts
Resource Mapper > should correctly delete single field Screenshots Video
Resource Mapper > should correctly delete all fields Screenshots Video
Flakiness  24-ndv-paired-item.cy.ts • 1 flaky test

View Output Video

Test Artifacts
NDV > resolves expression with default item when input node is not parent, while still pairing items Screenshots Video

Review all test suite changes for PR #7643 ↗︎

Copy link
Contributor

✅ All Cypress E2E specs passed

Copy link
Contributor

✅ All Cypress E2E specs passed

@ivov ivov merged commit 4c40825 into master Nov 17, 2023
73 of 74 checks passed
@ivov ivov deleted the pay-926-part-3 branch November 17, 2023 14:58
MiloradFilipovic added a commit that referenced this pull request Nov 20, 2023
* master: (27 commits)
  fix: Include cypress TypeScript types in /cypress folder (no-changelog) (#7746)
  refactor(core): Stop reporting to Sentry `NodeApiError` outside 500 range (no-changelog) (#7753)
  fix(core): Guard against node not found on cancelling test webhook (#7750)
  fix(JotForm Trigger Node): Fix iteration on form loader (#7751)
  refactor(core): Stop reporting to Sentry unknown cred on mapping (no-changelog) (#7752)
  feat(core): Coordinate manual workflow activation and deactivation in multi-main scenario (#7643)
  ci: Fix "Release: Create Pull Request" workflow
  fix(editor): Fix Admin panel icon in the sidebar for cloud deployments (no-changelog) (#7738)
  fix(editor): Remove `n8nHooksNext` flag (no-changelog) (#7733)
  fix(editor): Show v1 banner dismiss button if owner (#7722)
  fix(GitHub Node): Fix issue preventing file edits on branches (#7734)
  fix(core): Fix all dependency versions for backend packages (no-changelog) (#7745)
  refactor(core): Convert dynamic node-parameter routes to a decorated controller (no-changelog) (#7284)
  refactor: Stop reporting to Sentry Facebook multi-webhook error (no-changelog) (#7743)
  refactor(core): Stop reporting to Sentry unrecognized node errors (no-changelog) (#7728)
  fix(core): Account for non-ASCII chars in filename on binary data download (#7742)
  ci: Fix DB tests and Workflow tests (no-changelog) (#7741)
  refactor: Extract Invitation routes to InvitationController (no-changelog) (#7726)
  fix(editor): Handle permission edge cases (empty scopes) (#7723)
  ci: Skip the regularly failing tests in 2-credentials.cy.ts (no-changelog) (#7736)
  ...
MiloradFilipovic added a commit that referenced this pull request Nov 22, 2023
* master: (89 commits)
  feat(core): Make postgres pool-size configurable (no-changelog) (#7772)
  refactor(core): Include execution progress in save settings (no-changelog) (#7769)
  refactor: Upgrade to TypeScript 5.3 (no-changelog) (#7768)
  fix(editor): Only show push to git menu item to owners (#7766)
  fix(editor): Use project diagram icon for worker view (#7764)
  fix(Item Lists Node): Don't check same type in remove duplicates operation (#7678)
  fix(editor): Move workerview entry into settings menu (#7761)
  fix(core): Ensure failed executions are saved in queue mode (#7744)
  docs: Update docs links for data transformation functions and ifEmpty (#7758)
  feat(editor): Add node context menu (#7620)
  feat: Add Creator hub link to Templates page (#7721)
  fix: Include cypress TypeScript types in /cypress folder (no-changelog) (#7746)
  refactor(core): Stop reporting to Sentry `NodeApiError` outside 500 range (no-changelog) (#7753)
  fix(core): Guard against node not found on cancelling test webhook (#7750)
  fix(JotForm Trigger Node): Fix iteration on form loader (#7751)
  refactor(core): Stop reporting to Sentry unknown cred on mapping (no-changelog) (#7752)
  feat(core): Coordinate manual workflow activation and deactivation in multi-main scenario (#7643)
  ci: Fix "Release: Create Pull Request" workflow
  fix(editor): Fix Admin panel icon in the sidebar for cloud deployments (no-changelog) (#7738)
  fix(editor): Remove `n8nHooksNext` flag (no-changelog) (#7733)
  ...
@github-actions github-actions bot mentioned this pull request Nov 22, 2023
ivov added a commit that referenced this pull request Nov 22, 2023
#
[1.18.0](https://github.com/n8n-io/n8n/compare/n8n@1.17.0...n8n@1.18.0)
(2023-11-22)


### Bug Fixes

* **core:** Account for non-ASCII chars in filename on binary data
download ([#7742](#7742))
([b4ebb1a](b4ebb1a))
* **core:** Correct permissions for getstatus
([#7724](#7724))
([f96c1d2](f96c1d2))
* **core:** Ensure failed executions are saved in queue mode
([#7744](#7744))
([b7c5c74](b7c5c74))
* **core:** Guard against node not found on cancelling test webhook
([#7750](#7750))
([6be453b](6be453b))
* **editor:** Handle permission edge cases (empty scopes)
([#7723](#7723))
([e2ffd39](e2ffd39))
* **editor:** Make sure LineController is registered with chart.js
([#7730](#7730))
([ebee1a5](ebee1a5))
* **editor:** Move workerview entry into settings menu
([#7761](#7761))
([366cd67](366cd67))
* **editor:** Only show push to git menu item to owners
([#7766](#7766))
([0d3d33d](0d3d33d))
* **editor:** Show v1 banner dismiss button if owner
([#7722](#7722))
([44d3b3e](44d3b3e))
* **editor:** Use project diagram icon for worker view
([#7764](#7764))
([ff0b651](ff0b651))
* **editor:** Validate user info before submiting
([#7608](#7608))
([2064f7f](2064f7f))
* **GitHub Node:** Fix issue preventing file edits on branches
([#7734](#7734))
([ce002a6](ce002a6))
* **Google Sheets Node:** Check for `null` before destructuring
([#7729](#7729))
([5d4a52d](5d4a52d))
* **Item Lists Node:** Don't check same type in remove duplicates
operation ([#7678](#7678))
([4f30764](4f30764))
* **JotForm Trigger Node:** Fix iteration on form loader
([#7751](#7751))
([82f3202](82f3202))


### Features

* Add Creator hub link to Templates page
([#7721](#7721))
([4dbae0e](4dbae0e))
* **core:** Coordinate manual workflow activation and deactivation in
multi-main scenario ([#7643](#7643))
([4c40825](4c40825))
* **editor:** Add node context menu
([#7620](#7620))
([8d12c1a](8d12c1a))
* **editor:** Node IO filter
([#7503](#7503))
([1881765](1881765))

Co-authored-by: ivov <ivov@users.noreply.github.com>
@janober
Copy link
Member

janober commented Nov 22, 2023

Got released with n8n@1.18.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team Released
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants