
Error Detail Missing in Execution View #8793

Open · dkindlund opened this issue Mar 2, 2024 · 6 comments
dkindlund commented Mar 2, 2024

Bug Description

I've been trying to troubleshoot random n8n workflow errors for multiple days now, and I'm getting frustrated by the lack of detail in the execution view that's offered by default. Let me explain -- take a look at this example error:

[screenshot: failed workflow execution view]

My questions are simply:
In this view, how can I figure out what the underlying error was? Which node do I click on? There's no individual warning icon indicating which node I should focus on.

If I zoom in to just the subset of nodes that are "green"...
[screenshot: zoomed-in view of the "green" nodes]

If I click on each of those node details, I can't find the original error at all.

In fact, the only way for me to figure out the underlying error is by setting up an "Error Workflow" and then reviewing the contents of that workflow's output -- but here's the thing... there's no forward link from the original workflow execution pointing to the corresponding Error Workflow execution that maps to the underlying error!

Instead, right now, I'm left having to piece this puzzle together manually based on manual Slack notifications I've set up -- joined by the workflow execution ID:
[screenshot: Slack notification referencing the execution ID]
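
For reference, the relevant Code node inside that Error Workflow looks roughly like this. This is only a sketch: the Error Trigger field names below are from memory and may not match exactly what your version emits.

```js
// Sketch of the Code node in my Error Workflow.
// NOTE: the exact shape of the Error Trigger output (execution.id,
// execution.error.message, etc.) is my recollection and may differ.
const err = $json;

return [{
  json: {
    // The execution ID is the only key I have for manually joining this
    // error back to the original failed execution.
    executionId: err.execution?.id,
    workflowName: err.workflow?.name,
    lastNode: err.execution?.lastNodeExecuted,
    errorMessage: err.execution?.error?.message,
  },
}];
```

That output then feeds the Slack notification, which is how I end up matching the two sides by execution ID.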

In short, I believe that this feature is misleading:
[screenshot: workflow settings for saving executions]
^ I assume that when it's enabled, the full error details of failed executions should also be saved, but it looks like that's not happening here.

To Reproduce

Generate any sort of workflow error and then try to figure out where the error is located.

Expected behavior

I should see all types of errors in failed executions -- including out of memory errors.

Operating System

Google Cloud Run

n8n Version

1.30.1

Node.js Version

18.10

Database

PostgreSQL

Execution mode

main (default)

Joffcom (Member) commented Mar 2, 2024

Hey @dkindlund,

Looking at the workflow, I would say the error occurred on the Airtable node, but more information would be needed.

Looking at the output you collected from the error trigger (which may also have been in the n8n log), it would suggest the issue occurred because of a memory problem, which means the node never really got a chance to start.

You are not wrong, we really should put this information in the UI somewhere, but as it is a workflow-level error it wouldn't be right to put it under the node output, so we would need to think about how best to display it.

I suspect that when the workflow process runs out of memory, it doesn't have the memory left to attach that error to the node, which is why it isn't there. We should probably also make it clearer that the settings are for the workflow itself and not the system in general.

This isn't really a bug, but I will keep this open and get a dev ticket created on Monday to look into how we can improve this.

dkindlund (Author) commented

Thanks for the analysis, @Joffcom -- I agree it's a hard problem. Just trying to offer a user's perspective about it for now. Thanks!

dkindlund (Author) commented

One other point: When I checked Google Cloud Run's memory usage for the single container around the time this out-of-memory error was reported, I see that only ~15% of the container's memory was actually in use:
[screenshot: Cloud Run container memory utilization (~15%)]

Then, when I checked the logs, I see this sort of activity:
[screenshot: container logs around the restart]

So the timeline of events appears to be:

  • The container crashed earlier (not sure why it got restarted by Google Cloud Run)
  • Upon recovery, the new container attempted to recover the crashed job:
    Attempting to recover execution 3346
  • During the recovery of this job, it somehow ran out of memory

We're left with a bunch of questions/insights, such as:

  1. Why did the container crash to begin with? Looking through the older logs, there were no entries that provide any clues as to why the container crashed.

  2. When attempting to recover a crashed workflow, that recovery logic appears to trigger out-of-memory issues even though the container had more than enough memory allocated at the time. (I suspect there might be some sort of out-of-memory bug in n8n's workflow recovery logic that can be hard to pinpoint.)

dkindlund (Author) commented

A couple of other data points about this n8n deployment:

  • It's a single container deployed in Google Cloud Run
  • Running n8n@1.30.1
  • Allocated 4 vCPUs and 2GB RAM
  • EXECUTIONS_MODE=regular
  • NODE_OPTIONS=--max-old-space-size=1536
    (Based on documented recommendations found here; a quick heap-limit sanity check is sketched below.)
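
As that sanity check, here's a rough sketch of a Code node I can run to confirm the heap cap V8 is actually running with versus current usage (assuming built-in modules are allowed in the Code node, e.g. via NODE_FUNCTION_ALLOW_BUILTIN):

```js
// Rough heap sanity check (sketch): reports the heap limit the Node.js
// process is actually running with (should reflect --max-old-space-size=1536)
// versus the current heap usage.
// Assumes built-in modules are permitted in the Code node.
const v8 = require('v8');

const heapLimitMb = Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024);
const heapUsedMb = Math.round(process.memoryUsage().heapUsed / 1024 / 1024);

return [{ json: { heapLimitMb, heapUsedMb } }];
```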

dkindlund (Author) commented

Oh, this might be a factor:
[screenshot: Cloud Run CPU allocation setting]

So essentially, Google Cloud Run can kill/restart the container at any time to run it at a cheaper rate -- so not necessarily because of any sort of n8n error.

I guess the main issue is: n8n's workflow recovery logic doesn't quite work correctly upon container restart -- hence the spurious out of memory errors we're seeing.

Joffcom (Member) commented Mar 2, 2024

Ah yeah, CPU will cause a similar message. We don't have container restart logic to restart workflows though; that is something that needs to be done manually.
