Feat(Model Manager): Add improved download manager with pause/resume partial download. by DustyShoe · Pull Request #8864 · invoke-ai/InvokeAI

DustyShoe · 2026-02-08T22:51:26Z

Summary

This PR adds a few cool things:

Adds pause/resume for model downloads with proper “restart required” handling when servers refuse Range.
Persists install state so downloads can resume across restarts using the same temp folder.
Multi‑file installs now run sequentially per job (one file at a time) for more stable resume behavior.
Adds restart actions (job‑level and file‑level) plus clearer inline status in the UI.
Progress bar now aggregates job‑level bytes so it reflects total download progress during multi‑file installs.
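
The Range handling described above can be sketched as follows. This is a minimal illustration, not the PR's code: `ResumeOutcome`, `range_header` and `classify_response` are hypothetical names. It relies only on standard HTTP behavior, where a server that honors `Range` answers 206 Partial Content, while one that ignores it answers 200 and sends the whole file again.

```python
from enum import Enum


class ResumeOutcome(Enum):
    RESUMED = "resumed"                    # server honored the Range header
    RESTART_REQUIRED = "restart_required"  # server ignored or rejected it
    FRESH = "fresh"                        # nothing on disk to resume from


def range_header(partial_bytes: int) -> dict[str, str]:
    """Build the request header asking for the remainder of the file."""
    return {"Range": f"bytes={partial_bytes}-"} if partial_bytes > 0 else {}


def classify_response(partial_bytes: int, status_code: int) -> ResumeOutcome:
    """Decide what to do with the response to a (possibly ranged) request."""
    if partial_bytes == 0:
        return ResumeOutcome.FRESH
    if status_code == 206:
        return ResumeOutcome.RESUMED
    # Anything else (typically 200) means appending to the partial file
    # would corrupt it, so the file has to start over.
    return ResumeOutcome.RESTART_REQUIRED
```

Under this sketch, a 200 response to a ranged request maps to the "Restart required" state rather than being appended silently to the partial file.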

Parts of this code were written with assistance, so I’d appreciate any fixes or improvements.

QA Instructions

It might be better to run this with debug logging enabled.

Start a multi‑file model install (e.g., Tongyi-MAI/Z-Image-Turbo).
Verify progress bar increases and tooltip shows X / Y for the full job.
Pause the download mid‑file, then resume.
Confirm the download continues (not restarted) and bytes increase from the same point.
Kill/restart backend while a download is paused; verify it resumes on start.
Force a resume‑refused case (server returns 200 to Range) and confirm:

file shows “Restart required”
job status becomes Paused
“Restart file” works and only that file restarts

Cancel a job and re‑install the same model; ensure a new temp folder is created and no old partial blocks it.

Merge Plan

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
❗Changes to a redux slice have a corresponding migration
Documentation added / updated (if applicable)
Updated What's New copy (if doing a release after this PR)

JPPhoto · 2026-02-09T00:37:34Z

@DustyShoe Does this handle the case where the user pauses and either:

Quits Invoke and deletes the temporary file?
Keeps Invoke running and deletes the temporary file?

DustyShoe · 2026-02-09T00:41:35Z

@JPPhoto Of course you had to find the worst-case scenario...

Why would a user even do that?
User pauses, quits Invoke, deletes temp file:
On resume, we look for the .downloading file. If it’s gone, resume_from=0 and we start a fresh download (no resume).
User pauses, keeps Invoke running, deletes temp file:
The file is already closed on pause, so deleting it is fine. When the user resumes, same as above: no .downloading file, so it restarts from scratch.
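
A minimal sketch of the fallback described here, assuming the resume offset is simply the size of the partial file on disk. The helper name `resume_offset` is hypothetical and the PR's actual logic may differ.

```python
from pathlib import Path


def resume_offset(partial_path: Path) -> int:
    """Return the byte offset to resume from, or 0 if the partial file is gone."""
    try:
        return partial_path.stat().st_size
    except FileNotFoundError:
        # The user (or something else) deleted the partial file:
        # fall back to a fresh download instead of failing.
        return 0
```

Asking the filesystem at resume time, rather than trusting a remembered byte count, covers both cases above in the same way.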

JPPhoto · 2026-02-09T00:49:34Z

@DustyShoe We can't predict what users or their systems will do, so coding defensively and being resilient (to a point) is always good.

DustyShoe · 2026-02-09T01:00:46Z

@JPPhoto I have to admit, that was actually a good point. I went back and added an explicit toast message for when the temp file has been removed and the user tries to restart the download. There was also a bug in the status bar update: it never reset to 0 in that case.

Copilot

Pull request overview

This PR adds pause/resume functionality for model downloads with persistence across backend restarts. The implementation includes proper handling of servers that refuse byte-range requests, sequential multi-file downloads, and comprehensive UI controls for managing download state.

Changes:

Adds pause/resume API endpoints and UI controls for model downloads
Implements persistent install state using marker files to survive backend restarts
Changes multi-file downloads to run sequentially (one file at a time) instead of in parallel
Adds restart functionality for failed or non-resumable downloads with per-file granularity
Updates progress calculation to aggregate bytes across all files in multi-file installs
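
To illustrate the marker-file idea, here is a minimal sketch. The file name, the JSON layout and both function names are assumptions made for illustration, not the format the PR actually uses. The marker is written to a temporary file and then moved into place with `os.replace`, so a crash mid-write cannot leave a half-written marker behind.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

MARKER_NAME = ".install_marker.json"  # hypothetical file name


def write_marker(tmpdir: Path, source: str, status: str) -> None:
    """Atomically record the job state inside its temp folder."""
    payload = json.dumps({"source": source, "status": status})
    fd, tmp_name = tempfile.mkstemp(dir=tmpdir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(payload)
    os.replace(tmp_name, tmpdir / MARKER_NAME)


def read_marker(tmpdir: Path) -> Optional[dict]:
    """Return the recorded state, or None if the marker is missing or unreadable."""
    try:
        return json.loads((tmpdir / MARKER_NAME).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

On startup, a service could scan its temp folders and restore any job for which `read_marker` returns a state.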

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

File	Description
invokeai/frontend/web/src/services/api/schema.ts	Added pause/resume API types and download metadata fields for resume support
invokeai/frontend/web/src/services/api/endpoints/models.ts	Added pause, resume, restart failed, and restart file mutations
invokeai/frontend/web/src/features/modelManagerV2/subpanels/AddModelPanel/ModelInstallQueue/ModelInstallQueueItem.tsx	Added UI controls for pause/resume and restart, plus progress aggregation logic
invokeai/frontend/web/src/features/modelManagerV2/subpanels/AddModelPanel/ModelInstallQueue/ModelInstallQueueBadge.tsx	Added "paused" status badge
invokeai/frontend/web/public/locales/en.json	Added translations for pause/resume/restart messages
invokeai/frontend/web/openapi.json	Regenerated schema with new endpoints and types (includes unrelated changes)
invokeai/app/services/model_install/model_install_default.py	Implemented marker file persistence, pause/resume/restart methods, and incomplete install restoration
invokeai/app/services/model_install/model_install_common.py	Added PAUSED status and paused property to ModelInstallJob
invokeai/app/services/model_install/model_install_base.py	Added abstract methods for pause/resume/restart operations
invokeai/app/services/events/events_common.py	Added DownloadPausedEvent
invokeai/app/services/events/events_base.py	Added emit_download_paused method
invokeai/app/services/download/download_default.py	Implemented resume logic with byte-range requests and sequential multi-file downloads
invokeai/app/services/download/download_base.py	Added PAUSED status and pause-related fields to DownloadJob
invokeai/app/api/routers/model_manager.py	Added pause/resume/restart API endpoints

Comments suppressed due to low confidence (6)

invokeai/frontend/web/src/features/modelManagerV2/subpanels/AddModelPanel/ModelInstallQueue/ModelInstallQueueItem.tsx:229

The code checks installJob.status === 'completed' in the progressValue calculation, but the TypeScript type system should enforce the correct status values. However, for consistency with the rest of the codebase, verify that the status type from the API matches the expected InstallStatus enum values in all locations.

  const progressValue = useMemo(() => {
    if (installJob.status === 'completed' || installJob.status === 'error' || installJob.status === 'cancelled') {
      return 100;
    }

    const parts = installJob.download_parts;
    if (parts && parts.length > 0) {
      const totalBytesFromParts = parts.reduce((sum, part) => sum + (part.total_bytes ?? 0), 0);
      const currentBytesFromParts = parts.reduce((sum, part) => sum + (part.bytes ?? 0), 0);
      const totalBytes = Math.max(totalBytesFromParts, installJob.total_bytes ?? 0);
      const currentBytes = Math.max(currentBytesFromParts, installJob.bytes ?? 0);
      if (totalBytes > 0) {
        return (currentBytes / totalBytes) * 100;
      }
      return 0;
    }

    if (!isNil(installJob.bytes) && !isNil(installJob.total_bytes) && installJob.total_bytes > 0) {
      return (installJob.bytes / installJob.total_bytes) * 100;
    }

    return null;
  }, [installJob.bytes, installJob.download_parts, installJob.status, installJob.total_bytes]);

invokeai/frontend/web/src/features/modelManagerV2/subpanels/AddModelPanel/ModelInstallQueue/ModelInstallQueueItem.tsx:83

The error handlers access error.data.detail without checking if error.data exists first. If the error object doesn't have a data property, this will throw an uncaught exception. Add a null check: error?.data?.detail or provide a fallback error message. This pattern is repeated in all error handlers (pause, resume, restart failed, restart file).

      .catch((error) => {
        if (error) {
          toast({
            id: 'MODEL_INSTALL_PAUSE_FAILED',
            title: `${error.data.detail} `,
            status: 'error',
          });
        }
      });

invokeai/frontend/web/src/features/modelManagerV2/subpanels/AddModelPanel/ModelInstallQueue/ModelInstallQueueItem.tsx:165

Same issue with error handling - accessing error.data.detail without null checks. This occurs in the resume, restart failed, and restart file handlers as well.

      .catch((error) => {
        if (error) {
          toast({
            id: 'MODEL_INSTALL_RESUME_FAILED',
            title: `${error.data.detail} `,
            status: 'error',
          });
        }
      });
  }, [hasRestartedFromScratch, installJob, resumeModelInstall]);

  const handleRestartFailed = useCallback(() => {
    restartFailedModelInstall(installJob.id)
      .unwrap()
      .then(() => {
        toast({
          id: 'MODEL_INSTALL_RESTART_FAILED',
          title: t('toast.modelDownloadRestartFailed'),
          status: 'success',
        });
      })
      .catch((error) => {
        if (error) {
          toast({
            id: 'MODEL_INSTALL_RESTART_FAILED_ERROR',
            title: `${error.data.detail} `,
            status: 'error',
          });
        }
      });
  }, [installJob.id, restartFailedModelInstall]);

  const handleRestartFile = useCallback(
    (fileSource: string) => {
      restartModelInstallFile({ id: installJob.id, file_source: fileSource })
        .unwrap()
        .then(() => {
          toast({
            id: 'MODEL_INSTALL_RESTART_FILE',
            title: t('toast.modelDownloadRestartFile'),
            status: 'success',
          });
        })
        .catch((error) => {
          if (error) {
            toast({
              id: 'MODEL_INSTALL_RESTART_FILE_ERROR',
              title: `${error.data.detail} `,
              status: 'error',
            });
          }

invokeai/app/services/model_install/model_install_default.py:562

The pause_job, resume_job, restart_failed, and restart_file methods access and modify shared state (job status, multifile_job) without holding self._lock. This could lead to race conditions if these methods are called concurrently with download callbacks or other operations. Consider adding lock protection similar to what's used in cancel_job and other methods that modify job state.

    def pause_job(self, job: ModelInstallJob) -> None:
        """Pause the indicated job, preserving partial downloads."""
        if job.in_terminal_state:
            return
        job.status = InstallStatus.PAUSED
        self._logger.warning(f"Pausing {job.source}")
        if dj := job._multifile_job:
            for part in dj.download_parts:
                self._download_queue.pause_job(part)
        self._write_install_marker(job, status=InstallStatus.PAUSED)

    def resume_job(self, job: ModelInstallJob) -> None:
        """Resume a previously paused job."""
        if not job.paused:
            return
        self._logger.info(f"Resuming {job.source}")
        self._resume_remote_download(job)

    def restart_failed(self, job: ModelInstallJob) -> None:
        """Restart failed or non-resumable downloads for a job."""
        if not isinstance(job.source, (HFModelSource, URLModelSource)):
            return
        if not job.download_parts:
            return
        if not any(part.resume_required or part.errored for part in job.download_parts):
            return
        sources_to_restart = {str(part.source) for part in job.download_parts if not part.complete}
        if not sources_to_restart:
            return
        job.status = InstallStatus.WAITING
        remote_files, metadata = self._remote_files_from_source(job.source)
        remote_files = [rf for rf in remote_files if str(rf.url) in sources_to_restart]
        subfolders = job.source.subfolders if isinstance(job.source, HFModelSource) else []
        self._enqueue_remote_download(
            job=job,
            source=job.source,
            remote_files=remote_files,
            metadata=metadata,
            destdir=job._install_tmpdir or job.local_path,
            subfolder=job.source.subfolder if isinstance(job.source, HFModelSource) and len(subfolders) <= 1 else None,
            subfolders=subfolders if len(subfolders) > 1 else None,
            clear_partials=True,
        )

    def restart_file(self, job: ModelInstallJob, file_source: str) -> None:
        """Restart a specific file download for a job."""
        if not isinstance(job.source, (HFModelSource, URLModelSource)):
            return
        job.status = InstallStatus.WAITING
        remote_files, metadata = self._remote_files_from_source(job.source)
        remote_files = [rf for rf in remote_files if str(rf.url) == file_source]
        if not remote_files:
            return
        subfolders = job.source.subfolders if isinstance(job.source, HFModelSource) else []
        self._enqueue_remote_download(
            job=job,
            source=job.source,
            remote_files=remote_files,
            metadata=metadata,
            destdir=job._install_tmpdir or job.local_path,
            subfolder=job.source.subfolder if isinstance(job.source, HFModelSource) and len(subfolders) <= 1 else None,
            subfolders=subfolders if len(subfolders) > 1 else None,
            clear_partials=True,
        )
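
The lock protection suggested in this comment could look roughly like the sketch below. `Job`, `Status` and `InstallService` are simplified stand-ins rather than the real InvokeAI classes, and `mark_complete` is a hypothetical download callback. The point is only that every method that reads or writes job state does so while holding the same lock.

```python
import threading
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    COMPLETED = "completed"


@dataclass
class Job:
    source: str
    status: Status = Status.RUNNING
    parts: list[str] = field(default_factory=list)


class InstallService:
    def __init__(self) -> None:
        # Reentrant, so a locked method may call another locked method.
        self._lock = threading.RLock()
        self.paused_parts: list[str] = []

    def pause_job(self, job: Job) -> None:
        with self._lock:
            if job.status is Status.COMPLETED:
                return
            job.status = Status.PAUSED
            self.paused_parts.extend(job.parts)

    def mark_complete(self, job: Job) -> None:
        """Stand-in for a download callback running on a worker thread."""
        with self._lock:
            if job.status is Status.PAUSED:
                return  # a pause that won the race is not overwritten
            job.status = Status.COMPLETED
```

Because both paths take the same lock, a pause and a completion callback cannot interleave halfway through a status change.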

invokeai/app/services/download/download_default.py:231

The _submit_next_mfd_part method accesses and modifies self._mfd_pending[job.id] and self._mfd_active[job.id] without lock protection. Since this method is called from multiple callbacks (_mfd_complete) which run in worker threads, there's potential for race conditions. Consider adding lock protection around the manipulation of these shared data structures.

    def _submit_next_mfd_part(self, job: MultiFileDownloadJob) -> None:
        pending = self._mfd_pending.get(job.id, [])
        if not pending:
            return
        if self._mfd_active.get(job.id) is not None:
            return
        download_job = pending.pop(0)
        self._mfd_active[job.id] = download_job
        self.submit_download_job(
            download_job,
            on_start=self._mfd_started,
            on_progress=self._mfd_progress,
            on_complete=self._mfd_complete,
            on_cancelled=self._mfd_cancelled,
            on_error=self._mfd_error,
        )
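
One way to address this comment is to claim the next part atomically, as in the sketch below. `SequentialScheduler` and its methods are hypothetical names, and parts are plain strings here instead of download jobs. The check for an active part and the pop from the pending queue happen under one lock, so two worker threads cannot both start a part for the same job.

```python
import threading
from collections import deque
from typing import Optional


class SequentialScheduler:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: dict[int, deque[str]] = {}
        self._active: dict[int, str] = {}

    def add(self, job_id: int, parts: list[str]) -> None:
        with self._lock:
            self._pending.setdefault(job_id, deque()).extend(parts)

    def submit_next(self, job_id: int) -> Optional[str]:
        """Atomically claim the next part, or None if one is active or none remain."""
        with self._lock:
            if job_id in self._active:
                return None
            pending = self._pending.get(job_id)
            if not pending:
                return None
            part = pending.popleft()
            self._active[job_id] = part
        # The real download would be submitted here, outside the lock.
        return part

    def part_finished(self, job_id: int) -> Optional[str]:
        """Completion callback: release the active slot and start the next part."""
        with self._lock:
            self._active.pop(job_id, None)
        return self.submit_next(job_id)
```

Submitting the download outside the lock keeps the critical section short, which matters if the submission itself can block.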

invokeai/frontend/web/openapi.json:175

This PR includes unrelated changes to openapi.json that appear to be from other features (orphaned models detection/deletion API endpoints, FLUX model loader changes, DyPE preset modifications). These changes are not mentioned in the PR description and may have been inadvertently included from a schema regeneration. Consider whether these should be in a separate PR or if the PR description should be updated to reflect all changes.

invokeai/app/services/download/download_default.py

lstein

Tested and it works well!

Suggestions:

Would it be possible to add "pause all/resume all" and "cancel all" buttons to the install queue title bar, just to the left of "Prune"? Particularly after resuming from a crash, it would be great to be able to resume all the partial downloads with one click.
I found that if I killed and restarted the backend while a file download was occurring, the backend would put the downloads into a "pause" state, but the frontend didn't update to show the new status. I had to pause and then resume each file, or else refresh the whole page. Would it be possible for the frontend to update its download queue display after a backend restart, or even automatically get the download going again?

DustyShoe · 2026-02-21T16:11:27Z

@lstein

Thanks for the suggestion — that makes sense. I’ll add those buttons.
This behavior already exists in other parts of the frontend. For example, if there are staged generations on the canvas and the backend is restarted, the frontend does not automatically refresh its state. The generation may complete, but the final image is not displayed until the page is refreshed. That’s also why using the launcher is recommended: it closes the frontend when the backend stops.

Given that this is an existing pattern, I treated the download state the same way. That said, I’ll see whether the download state can be updated after a backend restart without introducing major changes.

DustyShoe · 2026-02-23T00:01:15Z

@lstein

I think the cleanest UX would be to make “Pause All / Resume All” a single toggle button that switches based on the current queue state.

I’ve done some additional testing and found that the behavior occurs only when the backend is stopped via Ctrl-C.

In that case, the backend explicitly pauses active download jobs during shutdown. However, the frontend does not receive the final status update because the socket connection is already closed at that moment.

If the backend is terminated via window close (X) or Task Manager, the jobs are not explicitly paused on shutdown. After restart, the backend restores and resumes the in-progress downloads automatically, which is why they continue without user interaction.

I propose to keep the current behavior and add an explicit re-sync on reconnect when jobs were paused during a graceful shutdown. This way, the UI will correctly reflect the paused state, and the user can resume them using the “Resume All” action.

Additionally, I added a “Backend disconnected” indicator to the downloader title bar.
It is shown when the backend crashes or is closed, and disappears once the connection is restored.

lstein · 2026-02-23T04:28:11Z

Looking good. I'll do just a little more testing tomorrow before approving.

lstein

Working as advertised. Great enhancement!

DustyShoe and others added 7 commits February 7, 2026 05:46

Refine messaging and pause behavior

e00bed8

Improved resume download behavior

4d213e7

Merge branch 'invoke-ai:main' into Feat(Backend)/improved-download-ma…

40a3cf2

…nager

Syntax fix

fda822f

Formatting

806ee9d

Improved partial download recovering

e6f3e88

fix(downloads): resume integrity, serialized parts, and UI feedback

5ae4bbd

DustyShoe requested review from JPPhoto, Pfannkuchensack, blessedcoolant, dunkeroni and lstein as code owners February 8, 2026 22:51

github-actions bot added the api, python (PRs that change python files), services (PRs that change app services), and frontend (PRs that change frontend files) labels Feb 8, 2026

DustyShoe added 4 commits February 9, 2026 01:53

Fix download test expectations and multifile totals

a076f99

Ruff appease

0c6da97

schema updates

c3d2d35

schema fix

0b33567

DustyShoe added 3 commits February 9, 2026 03:34

Added toast msg if partial file was deleted.

2343ad8

Formatting

c1d7eda

Fixed "missing temp file" message pop up

eddec8f

DustyShoe changed the title ~~Feat(backend): Add improved download manager with pause/resume partial download.~~ Feat(Model Manager): Add improved download manager with pause/resume partial download. Feb 12, 2026

lstein self-assigned this Feb 16, 2026

lstein added the v6.13.x label Feb 20, 2026

lstein requested a review from Copilot February 21, 2026 14:45

Copilot started reviewing on behalf of lstein February 21, 2026 14:45

Copilot AI reviewed Feb 21, 2026

invokeai/app/services/download/download_default.py (outdated)

lstein requested changes Feb 21, 2026

Update invokeai/app/services/download/download_default.py

e0546fe

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

DustyShoe and others added 2 commits February 22, 2026 19:12

Merge branch 'main' into Feat(Backend)/improved-download-manager

21bc2c2

Fix: Add bulk action buttons and force resync on backend reconnect.

1f0a33d

lstein approved these changes Feb 24, 2026

Merge branch 'main' into Feat(Backend)/improved-download-manager

97784e2

lstein enabled auto-merge (squash) February 24, 2026 02:26

lstein merged commit b9f9015 into invoke-ai:main Feb 24, 2026
13 checks passed

lstein merged 18 commits into invoke-ai:main from DustyShoe:Feat(Backend)/improved-download-manager

3 participants
