
Conversation

@psychedelicious
Contributor

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update
  • Community Node Submission

Have you discussed this change with the InvokeAI team?

  • Yes
  • No, because:

Have you updated all relevant documentation?

  • Yes
  • No, n/a

Description

When a queue item is popped for processing, we need to retrieve its session from the DB. Pydantic parses and validates the stored graph at this stage.

It's possible for a graph to have been made invalid during the graph preparation stage (e.g. an ancestor node executes, and its output is not valid for its successor node's input field).

When this occurs, the session in the DB will fail validation, but we don't have a chance to find out until it is retrieved and parsed by pydantic.
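As a minimal illustration of why the failure surfaces only at retrieval time (the model here is a hypothetical stand-in, not InvokeAI's actual graph classes):

```python
# Hypothetical stand-in model, not InvokeAI's actual graph classes.
# The invalid value sits quietly in the DB as JSON; pydantic only raises
# when the stored JSON is parsed back into the model on retrieval.
from pydantic import BaseModel, ValidationError

class NoiseNode(BaseModel):
    width: int  # the real field carries additional constraints

stored_json = '{"width": "not a number"}'  # what an invalid stored session might hold

try:
    NoiseNode.parse_raw(stored_json)  # pydantic v1-style parse, as in this PR's era
except ValidationError as e:
    print(f"retrieval failed: {type(e).__name__}")
```

Writing the bad value succeeds; only the read back into a model raises.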

This logic was previously not wrapped in any exception handling.

Just after retrieving a session, we retrieve the specific invocation to execute from the session. This could also fail for some reason, though it should be impossible for it to be a pydantic validation error (that would have been caught during session validation). There was no exception handling here either.

When either of these steps fails, the processor gets soft-locked because the processor's cleanup logic is never run. (I didn't dig deeper into exactly what cleanup is not happening, because the fix is to just handle the exceptions.)

This PR adds exception handling around both session retrieval and invocation retrieval, with an event for each failure mode: `session_retrieval_error` and `invocation_retrieval_error`.

These events are caught and displayed in the UI as toasts, along with the type of the Python exception (e.g. `Validation Error`). The events are also logged to the browser console.
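The shape of the fix can be sketched like this (a hypothetical, simplified version — the function and class names are illustrative, not InvokeAI's exact API):

```python
class EventBus:
    """Stand-in for the event service; records emitted events in order."""
    def __init__(self):
        self.events = []

    def emit(self, name, **payload):
        self.events.append((name, payload))


def process_queue_item(get_session, get_invocation, events):
    """One iteration of the processor loop, with the added exception handling."""
    try:
        session = get_session()
    except Exception as e:
        # e.g. a pydantic ValidationError while parsing the stored session
        events.emit("session_retrieval_error",
                    error_type=type(e).__name__, error=str(e))
        return None

    try:
        return get_invocation(session)
    except Exception as e:
        events.emit("invocation_retrieval_error",
                    error_type=type(e).__name__, error=str(e))
        return None


# Usage: a session whose stored graph fails validation on retrieval.
bus = EventBus()

def failing_get_session():
    raise ValueError("width must be a multiple of 8")

result = process_queue_item(failing_get_session, lambda s: s, bus)
print(bus.events[0][0])  # session_retrieval_error
```

The key point is that both failure modes return control to the loop after emitting an event, rather than letting the exception escape and skip the cleanup logic.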

Related Tickets & Documents

Closes #3860, #3412

QA Instructions, Screenshots, Recordings

Create a valid graph that will become invalid during execution. Here's an example:
(screenshot of an example graph)

This is valid before execution, but the width field of the Noise node will end up with an invalid value (0). Previously, this would soft-lock the app and you'd have to restart it.

Now, with this graph, you will get an error toast, and the app will not get locked up.

Added/updated tests?

  • Yes (ish)
  • No

@Kyle0654 @brandonrising
It seems that because the processor runs in its own thread, pytest cannot catch exceptions raised in the processor.

I added a test that does work, insofar as it does recreate the issue. But, because the exception occurs in a separate thread, the test doesn't see it. The result is that the test passes even without the fix.

So when running the test, we see the exception:

Exception in thread invoker_processor:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/services/processor.py", line 50, in __process
    self.__invoker.services.graph_execution_manager.get(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/services/sqlite.py", line 79, in get
    return self._parse_item(result[0])
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/services/sqlite.py", line 52, in _parse_item
    return parse_raw_as(item_type, item)
  File "pydantic/tools.py", line 82, in pydantic.tools.parse_raw_as
  File "pydantic/tools.py", line 38, in pydantic.tools.parse_obj_as
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__

But pytest doesn't actually see it as an exception. Not sure how to fix this, it's a bit beyond me.
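One possible approach (an assumption on my part, not something this PR implements): since Python 3.8, `threading.excepthook` can be replaced to capture exceptions escaping worker threads, which a test could then assert on:

```python
# Sketch: capture an exception raised in a worker thread so a test can see it.
# This relies only on the stdlib threading.excepthook hook (Python 3.8+).
import threading

def run_and_capture(target):
    """Run `target` in a thread and return the exception it raised, if any."""
    captured = []
    original_hook = threading.excepthook

    def hook(args):
        # args.exc_value is the exception instance that escaped the thread
        captured.append(args.exc_value)

    threading.excepthook = hook
    try:
        t = threading.Thread(target=target, name="invoker_processor")
        t.start()
        t.join()
    finally:
        threading.excepthook = original_hook  # always restore the default
    return captured[0] if captured else None


def boom():
    raise RuntimeError("validation failed in processor thread")

exc = run_and_capture(boom)
print(type(exc).__name__)  # RuntimeError
```

A test could then `assert isinstance(exc, SomeExpectedError)` instead of passing silently while the traceback goes to stderr.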

[optional] Are there any post deployment tasks we need to perform?

nope don't think so

@Kyle0654 Kyle0654 left a comment

LGTM.

Only thing I'd have concern about is exposing full stack traces to the client. Probably fine for now, but you may want to make that optional via configuration.

@psychedelicious
Contributor Author

I've just removed the test because the issue is lack of exception handling at a high level in the processor loop, as opposed to any deeper internal logic. Don't think we really need to test this.

@blessedcoolant blessedcoolant enabled auto-merge July 24, 2023 08:12
@blessedcoolant blessedcoolant merged commit d42c394 into main Jul 24, 2023
@blessedcoolant blessedcoolant deleted the feat/fix-soft-locks branch July 24, 2023 08:17


Development

Successfully merging this pull request may close these issues.

[bug]: soft lock on crash related to number being not a multiple of 8
