Skip to content

Optimize deploy upload with delta file transfer#670

Merged
evanphx merged 2 commits intomainfrom
mir-668-optimize-deploy-upload
Mar 13, 2026
Merged

Optimize deploy upload with delta file transfer#670
evanphx merged 2 commits intomainfrom
mir-668-optimize-deploy-upload

Conversation

@evanphx
Copy link
Copy Markdown
Contributor

@evanphx evanphx commented Mar 12, 2026

Summary

  • Adds manifest-first deploy flow: client sends file hashes, server compares against cached source code image, client uploads only changed files
  • New RPC methods prepareUpload and buildFromPrepared on the Builder service
  • Source code cache stored per-app at {DataPath}/source-code/{app}/ with atomic writes to prevent corruption
  • Refactored MakeTar/MakeFilteredTar to share a common makeTarWithFilter implementation with lazy directory emission
  • Deploy UI shows cache reuse stats on the "Upload artifacts" phase line
  • Falls back gracefully to full upload if server doesn't support the new methods or if manifest computation fails

Test plan

  • Unit tests for ComputeManifest and MakeFilteredTar in pkg/tarx/tarx_test.go
  • Unit tests for source cache roundtrip, staging, overwrite in servers/build/source_cache_test.go
  • Test for directory emission behavior (only dirs containing accepted files are included)
  • make test passes (3579 tests)
  • make lint clean
  • Manual test: deploy app twice, verify "Reused N/M files" on second deploy

@evanphx evanphx requested a review from a team as a code owner March 12, 2026 20:12
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 12, 2026

📝 Walkthrough

Walkthrough

Adds a manifest-based delta-upload workflow and per-app source caching to the builder. New RPC types FileManifestEntry and PrepareUploadResult and RPC methods PrepareUpload and BuildFromPrepared were introduced. Server-side builder now tracks upload sessions, stages prepared uploads, and can build from prepared sessions; NewBuilder gains a dataPath parameter and Builder contains a DataPath field. pkg/tarx gained manifest computation and filtered-tar utilities. CLI deploy was updated to use prepared uploads when available. Tests added for tarx and the source cache.

📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
pkg/tarx/tarx_test.go (1)

374-389: Determinism test is valuable but could be strengthened.

The test verifies hash consistency across runs, which is essential for delta comparison. Consider adding a brief delay or file modification between runs to ensure the test isn't trivially passing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tarx/tarx_test.go` around lines 374 - 389, In
TestComputeManifestDeterministic, strengthen the determinism check by changing
the file system state between ComputeManifest calls (e.g., sleep briefly and
update the test file's modification time or rewrite the same content) so the
second ComputeManifest(tmpDir, nil) runs after a visible change; keep assertions
the same to ensure ComputeManifest still returns identical hashes. Use the
existing TestComputeManifestDeterministic test and the ComputeManifest function
as anchors when making the change.
pkg/tarx/tarx.go (1)

326-354: Lazy directory emission logic is correct but silently ignores errors.

Lines 343-349 silently continue on os.Stat or tar.FileInfoHeader errors for parent directories. While this prevents one bad directory from blocking the entire tar, consider logging these errors for debugging.

Consider logging directory emission errors
 				dirInfo, dirErr := os.Stat(filepath.Join(dir, dirPath))
 				if dirErr != nil {
+					// Log but continue - parent dir might be missing in filtered scenarios
 					continue
 				}
 				dirHdr, hdrErr := tar.FileInfoHeader(dirInfo, "")
 				if hdrErr != nil {
+					// Log but continue
 					continue
 				}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tarx/tarx.go` around lines 326 - 354, The parent-directory emission loop
silently swallows errors from os.Stat and tar.FileInfoHeader (inside the loop
that uses parts, dirPath, emittedDirs and writes with tw.WriteHeader), making
debugging hard; change the two continue branches to log their errors (including
dirPath and the returned error) before continuing — use the project's logger or
log.Printf if none exists — so os.Stat(filepath.Join(dir, dirPath)) and
tar.FileInfoHeader(dirInfo, "") failures are recorded with clear context while
still continuing to the next parent directory.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@servers/build/build.go`:
- Around line 919-933: BuildFromPrepared currently consumes a prepared session
without re-checking per-app authorization; before calling
b.sessions.LoadAndDelete (or immediately after retrieving sess but before using
sess.appName), perform the same app authorization check used in PrepareUpload
(e.g., call rpc.AllowApp or the equivalent auth helper) against sess.appName and
the request context, and return an authorization error if it fails; ensure the
check runs before any session state is consumed or sess.cancelFunc/dir cleanup
is invoked so unauthorized callers cannot use a valid sessionID to build for
another app.
- Around line 870-884: The session stored after PrepareUpload only saves dir and
appName so BuildFromPrepared can't detect missing uploads; fix by adding the
negotiated delta set (the "needed" slice/paths returned by
stageMatchingFiles/PrepareUpload) to buildSession (e.g., expectedPaths or
neededPaths) when creating the session, and then in BuildFromPrepared validate
that each expected path exists in the extracted tempDir before calling
buildFromDir; if any are missing, clean up and return an error. Ensure you
update the buildSession struct and references in PrepareUpload and
BuildFromPrepared to use the new field.

In `@servers/build/source_cache.go`:
- Around line 216-235: The Walk callback opens each regular file with
os.Open(path) and defers f.Close(), which delays closing until after Walk
returns and can exhaust file descriptors; replace defer f.Close() with an
explicit f.Close() immediately after io.Copy(w, f) (ensure it still runs on copy
error by closing in a short defer-like pattern or via a named cleanup call),
keeping the existing error handling and manifest assignment (references: the
local variables f/fErr, os.Open, io.Copy(w, f), f.Close(), manifest and
cachedFileInfo).

---

Nitpick comments:
In `@pkg/tarx/tarx_test.go`:
- Around line 374-389: In TestComputeManifestDeterministic, strengthen the
determinism check by changing the file system state between ComputeManifest
calls (e.g., sleep briefly and update the test file's modification time or
rewrite the same content) so the second ComputeManifest(tmpDir, nil) runs after
a visible change; keep assertions the same to ensure ComputeManifest still
returns identical hashes. Use the existing TestComputeManifestDeterministic test
and the ComputeManifest function as anchors when making the change.

In `@pkg/tarx/tarx.go`:
- Around line 326-354: The parent-directory emission loop silently swallows
errors from os.Stat and tar.FileInfoHeader (inside the loop that uses parts,
dirPath, emittedDirs and writes with tw.WriteHeader), making debugging hard;
change the two continue branches to log their errors (including dirPath and the
returned error) before continuing — use the project's logger or log.Printf if
none exists — so os.Stat(filepath.Join(dir, dirPath)) and
tar.FileInfoHeader(dirInfo, "") failures are recorded with clear context while
still continuing to the next parent directory.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 532942f7-9214-4db0-9642-7112b8fa0b2d

📥 Commits

Reviewing files that changed from the base of the PR and between 915ad02 and 115eea8.

📒 Files selected for processing (10)
  • api/build/build_v1alpha/rpc.gen.go
  • api/build/rpc.yml
  • cli/commands/deploy.go
  • cli/commands/deploy_ui.go
  • components/coordinate/coordinate.go
  • pkg/tarx/tarx.go
  • pkg/tarx/tarx_test.go
  • servers/build/build.go
  • servers/build/source_cache.go
  • servers/build/source_cache_test.go

Copy link
Copy Markdown
Contributor

@phinze phinze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude and I dug into this one together — traced the full flow from manifest computation through cache staging and the new build paths. I've already dubbed this "mirsync" in my head.

Really like the design. The fallback-to-full-upload approach means this is pure upside with no risk to existing deploys, and the buildFromDir extraction is a clean refactor.

CodeRabbit flagged a few things (auth check on BuildFromPrepared, session path tracking, deferred close in Walk) — skipping those here since you've seen them.

Two small optional nits below. Also, food for thought: now that we have per-file sizes in the manifest and the cached layer.tar.gz on disk, we've got everything needed to estimate compression ratio and show real upload progress bars with ETAs + rsync-style speedup stats. Going to file a follow-up for that.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
servers/build/source_cache.go (2)

193-198: Consider checking Chmod error.

The Chmod call on line 197 silently ignores any error. While permission failures are rare and the file is still usable, logging or returning the error would help diagnose issues on restrictive filesystems.

🔧 Proposed fix
 		if _, err := io.Copy(outFile, tr); err != nil {
 			outFile.Close()
 			return err
 		}
-		outFile.Chmod(os.FileMode(hdr.Mode) & os.ModePerm)
+		if err := outFile.Chmod(os.FileMode(hdr.Mode) & os.ModePerm); err != nil {
+			outFile.Close()
+			return fmt.Errorf("chmod %s: %w", outPath, err)
+		}
 		outFile.Close()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@servers/build/source_cache.go` around lines 193 - 198, The outFile.Chmod call
currently ignores any returned error; modify the block in the function that
performs io.Copy (the code using io.Copy(outFile, tr) then
outFile.Chmod(os.FileMode(hdr.Mode) & os.ModePerm) and outFile.Close) to capture
the Chmod error, ensure outFile is closed on all paths, and either return the
Chmod error (or log it and return a wrapped error) so permission failures are
not silently ignored; preserve the existing behavior of closing outFile (use
defer or explicit closes) and return the captured error if non-nil.

48-53: Potential panic if unlock called for non-existent app.

If unlock is called for an app that was never locked (e.g., due to a coding error), the map lookup will return nil, and l.Unlock() will panic.

While currently the code always pairs lock/unlock correctly, adding a defensive check prevents future bugs.

🛡️ Proposed defensive check
 func (al *appLocks) unlock(app string) {
 	al.mu.Lock()
-	l := al.locks[app]
+	l, ok := al.locks[app]
 	al.mu.Unlock()
+	if !ok {
+		return // or log a warning
+	}
 	l.Unlock()
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@servers/build/source_cache.go` around lines 48 - 53, The unlock method may
panic if the map lookup returns nil; update appLocks.unlock to defensively
handle a missing lock by checking the retrieved pointer (l) from the locks map
before calling l.Unlock(). Specifically, inside appLocks.unlock (which uses
al.mu and al.locks), hold al.mu while reading al.locks[app], if l == nil then
release al.mu and return (or log a warning) instead of calling l.Unlock();
otherwise release al.mu and call l.Unlock() as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@servers/build/source_cache.go`:
- Around line 193-198: The outFile.Chmod call currently ignores any returned
error; modify the block in the function that performs io.Copy (the code using
io.Copy(outFile, tr) then outFile.Chmod(os.FileMode(hdr.Mode) & os.ModePerm) and
outFile.Close) to capture the Chmod error, ensure outFile is closed on all
paths, and either return the Chmod error (or log it and return a wrapped error)
so permission failures are not silently ignored; preserve the existing behavior
of closing outFile (use defer or explicit closes) and return the captured error
if non-nil.
- Around line 48-53: The unlock method may panic if the map lookup returns nil;
update appLocks.unlock to defensively handle a missing lock by checking the
retrieved pointer (l) from the locks map before calling l.Unlock().
Specifically, inside appLocks.unlock (which uses al.mu and al.locks), hold al.mu
while reading al.locks[app], if l == nil then release al.mu and return (or log a
warning) instead of calling l.Unlock(); otherwise release al.mu and call
l.Unlock() as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a53a8a54-e036-406e-a43d-6910efb12d6c

📥 Commits

Reviewing files that changed from the base of the PR and between 115eea8 and febd363.

📒 Files selected for processing (3)
  • servers/build/build.go
  • servers/build/source_cache.go
  • servers/build/source_cache_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • servers/build/source_cache_test.go

evanphx added 2 commits March 12, 2026 19:06
Instead of uploading a complete tar of all source files on every deploy,
the client now sends a file manifest first, lets the server compare against
a cached source code image, and only uploads changed files. This significantly
reduces upload time for iterative development where typically only a few files
change between deploys.

New RPC methods: prepareUpload and buildFromPrepared. The original buildFromTar
remains unchanged for backward compatibility and fallback.
- Add app authorization check in BuildFromPrepared
- Fix deferred file close leak inside Walk callback in saveSourceImage
- Add debug log for corrupt manifest.json in source cache
- Reorder rename calls for safer crash recovery (manifest before layer)
- Add per-app mutex to serialize concurrent cache reads and writes
@evanphx evanphx force-pushed the mir-668-optimize-deploy-upload branch from febd363 to 8740097 Compare March 13, 2026 02:09
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/tarx/tarx.go (1)

265-378: ⚠️ Potential issue | 🟠 Major

Error from filepath.Walk is silently discarded in the goroutine.

The goroutine at line 272 calls filepath.Walk but ignores its return value. If the walk fails (e.g., permission denied on a subdirectory), the error is lost, and the caller receives a valid-looking but incomplete tar stream.

Additionally, using defer f.Close() inside the loop (line 371) accumulates deferred closures until the goroutine exits, which can exhaust file descriptors on large directory trees.

🛠️ Suggested direction
 go func() {
+    var walkErr error
     defer w.Close()
     defer tw.Close()
     defer gzw.Close()

     emittedDirs := make(map[string]bool)

-    filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
+    walkErr = filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
         // ... existing code ...

         if info.Mode().IsRegular() {
             f, fErr := os.Open(path)
             if fErr != nil {
                 return fErr
             }
-            defer f.Close()
             if _, err := io.Copy(tw, f); err != nil {
+                f.Close()
                 return err
             }
+            f.Close()
         }

         return nil
     })
+    if walkErr != nil {
+        // Consider closing the pipe with an error so the reader sees it
+        w.CloseWithError(walkErr)
+    }
 }()

Using *os.File with an explicit Close() call after io.Copy avoids accumulating deferred closures. To propagate walkErr to the reader, use io.Pipe() which supports CloseWithError.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tarx/tarx.go` around lines 265 - 378, The goroutine ignores the error
returned by filepath.Walk and defers file closes inside the loop, which can leak
descriptors; change the code so you capture the error returned by
filepath.Walk(...) and if non-nil call the pipe writer's CloseWithError(err)
(w.CloseWithError) instead of silently returning, and replace the deferred
f.Close() inside the file-handling branch with an explicit f.Close() immediately
after io.Copy completes (use a short-lived close and handle its error), ensuring
emittedDirs/tw/gzw are still closed in the goroutine's defer but the walk error
is propagated to the reader.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/tarx/tarx.go`:
- Around line 342-349: The code silently skips errors when calling os.Stat
(dirErr) and tar.FileInfoHeader (hdrErr) for parent directories (variables
dirInfo/dirErr and dirHdr/hdrErr), which can produce malformed tars; change the
behavior to surface failures instead of silently continuing by returning or
wrapping the error (with context including dirPath and the operation) from the
surrounding tar creation function, or at minimum log the error and fail the tar
write; ensure you handle the error paths for both os.Stat and tar.FileInfoHeader
consistently so parent directory entries are created or the caller is informed
of the failure.

---

Outside diff comments:
In `@pkg/tarx/tarx.go`:
- Around line 265-378: The goroutine ignores the error returned by filepath.Walk
and defers file closes inside the loop, which can leak descriptors; change the
code so you capture the error returned by filepath.Walk(...) and if non-nil call
the pipe writer's CloseWithError(err) (w.CloseWithError) instead of silently
returning, and replace the deferred f.Close() inside the file-handling branch
with an explicit f.Close() immediately after io.Copy completes (use a
short-lived close and handle its error), ensuring emittedDirs/tw/gzw are still
closed in the goroutine's defer but the walk error is propagated to the reader.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8aa21a85-875d-4e98-851d-8f8ab3e1b1ab

📥 Commits

Reviewing files that changed from the base of the PR and between febd363 and 8740097.

📒 Files selected for processing (10)
  • api/build/build_v1alpha/rpc.gen.go
  • api/build/rpc.yml
  • cli/commands/deploy.go
  • cli/commands/deploy_ui.go
  • components/coordinate/coordinate.go
  • pkg/tarx/tarx.go
  • pkg/tarx/tarx_test.go
  • servers/build/build.go
  • servers/build/source_cache.go
  • servers/build/source_cache_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • servers/build/source_cache.go

@evanphx evanphx merged commit 9cec31f into main Mar 13, 2026
13 checks passed
@evanphx evanphx deleted the mir-668-optimize-deploy-upload branch March 13, 2026 02:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants