Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ui): reduce reconnect requests #5451

Merged
merged 7 commits into from
Jan 11, 2024

Conversation

psychedelicious
Copy link
Collaborator

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update
  • Community Node Submission

Have you discussed this change with the InvokeAI team?

  • Yes
  • No, because:

Have you updated all relevant documentation?

  • Yes
  • No

Description

See the changes in invokeai/frontend/web/src/app/store/middleware/listenerMiddleware/listeners/socketio/socketConnected.ts for well-commented recovery logic.

  • Add checks to the "recovery" logic for socket connect events to reduce the number of network requests.
  • Remove the isInitialized state from systemSlice and make it a nanostore local to the socketConnected listener. It didn't need to be global state. It's also now more clearly named isFirstConnection.
  • Export the queue status selector (minor improvement, memoizes it correctly).

Note: Currently re-fetch everything during a socket connect event. We don't need to re-fetch everything - we only need to re-fetch things related to the queue and gallery. At least, I think that's all we need to re-fetch. However, those are exactly the things that are resource-intensive. The other requests are much lighter.

Instead of attempting to figure out exactly the minimal number of requests to re-fetch, and possibly miss something or do it wrong (I'd need to include the default args for all the requests, took, which is not trivial), and need to maintain that list as the app changes, I have left it as a full API reset.

Related Tickets & Documents

Vite's dev server forcibly reloads the page if its own internal socket connection is closed, making working on reconnect behaviour impossible. I had to patch vite directly to get past this. There's an issue upstream, and I've included my patch there for posterity: vitejs/vite#5675

QA Instructions, Screenshots, Recordings

This is kinda tricky to fully test. You'll need simulate a network blip and either 1) patch vite per the above linked issue or 2) test this with a production build. Suggest just building and testing the build.

To simulate a network blip you an create a local SSH tunnel and kill it.

Keep the network tab open as you run these tests so you can see what requests are being made.

Test Case 1

First, test the case we have a queue running, we disconnect, a generation finishes, and then we reconnect.

In this scenario, there will be at least one image that is missing from the gallery, so we need to re-fetch to sync the UI w/ server.

  • Create a local SSH tunnel from a free port to 9090: ssh -L 1337:localhost:9090 localhost
  • Run the server
  • Access the app via localhost:1337
  • Add to the queue - say 5 iterations - and then kill the SSH tunnel
  • Wait until a generation finishes
  • Recreate the SSH tunnel

Once it automatically reconnects, you should see a handful of network requests as everything is re-fetched, then the image(s) that completed while disconnected should be added to the gallery per usual.

Test Case 2

Next, test the scenario where the reconnect happens but we are not generating anything.

In this scenario, we shouldn't re-fetch anything because the server state hasn't changed.

Picking up from where you left off a moment ago in the previous test case:

  • Cancel all generations (or let them finish)
  • Kill the SSH tunnel
  • Recreate the SSH tunnel
  • Check network tab

You should have no additional requests when it reconnects

Test Case 3

Finally, test the scenario where the reconnect happens during generation, but it all happens between the start and finish of a single generation.

In this scenario, the only thing that has changed between the disconnect and reconnect is denoising progress, which is a socket event - not HTTP request. So, we do not want to re-fetch everything.

You should see one extra request to get the queue status. This extra request is used to check if a last known queue items completed count is different from the new completed count. In this scenario, these counts are the same, so there are no further network requests.

Picking up from where we left off above:

  • Set steps pretty high if you are on a fast machine - you'll need time to reset your SSH tunnel
  • Queue some generations
  • Immediately kill the SSH tunnel
  • Immediately recreate it
  • If you managed to do this before that first generation completed, you should see a single network request to get the queue status, then the normal network activity that occurs during generation.

Merge Plan

This PR needs actual review and testing. It is not eligible for rubber-stamping. Once it is carefully reviewed and tested, it can be merged.

Added/updated tests?

  • Yes
  • No : n/a

@psychedelicious
Copy link
Collaborator Author

psychedelicious commented Jan 10, 2024

I've thoroughly tested this now.

I further optimized this by tagging all of the queries that might need to be re-fetched if we generated something while disconnected with 'FetchOnReconnect'. Only mutable objects - like lists of images in the gallery, image totals, etc - are tagged with it. Immutable objects, like image DTOs or image metadata, are not tagged.

(technically, ImageDTOs are mutable - their is_intermediate or board_id can be changed, but those changes occur only as a result of direct user action, which is not possible when you are disconnected).

The listener now invalidates that tag if it sees a change in the queue status. This is far more granular.

- Add checks to the "recovery" logic for socket connect events to reduce the number of network requests.
- Remove the `isInitialized` state from `systemSlice` and make it a nanostore local to the socketConnected listener. It didn't need to be global state. It's also now more clearly named `isFirstConnection`.
- Export the queue status selector (minor improvement, memoizes it correctly).
Add `FetchOnReconnect` tag, tagging relevant queries with it. This tag is invalidated in the socketConnected listener, when it is determined that the queue changed.
This retains the current good developer experience when working on the server - the UI should fully reset when you restart the server.
@psychedelicious psychedelicious merged commit ff09fd3 into main Jan 11, 2024
7 checks passed
@psychedelicious psychedelicious deleted the fix/ui/reduce-reconnect-requests branch January 11, 2024 01:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants