
Conversation

@zerox80 (Contributor) commented Nov 24, 2025

Graceful Upload Cancellation

Summary

This PR introduces a robust "Graceful Stop" mechanism for file uploads. It ensures that when an upload is cancelled (by the user or the system), the process terminates cleanly without leaving the database or file system in an inconsistent state.

Important

Dependency Note: This branch is built on top of PR #36 (TUS Implementation).
You will see TUS-related commits in the history (e.g., b7bae7d), but they are NOT part of this review.
This PR requires the TUS implementation as a foundation. Once PR #36 is merged/rebased, this branch will be updated.
Please review ONLY the cancellation-specific commits listed below.


🎯 Review Scope

Please focus your review on these groups of changes:

Group 1: Cancellation Logic (Core Feature)

| Commit | Type | Description |
| --- | --- | --- |
| df1562c | feat | 🛑 Core Feature: Cancellation support for TUS & standard uploads |
| a276f60 | fix | 🔧 Cleanup: Remove undefined foregroundJob.cancel() calls |
| 021f314 | fix | 🔧 Cleanup: Remove stale foreground cancellation logic |

Group 2: Misc Fixes (Included in this branch)

| Commit | Type | Description |
| --- | --- | --- |
| 07e0c0c | fix | 🐛 FAB Crash: Fix vector tint crash on API 34/35 |
| e8c2971 | chore | 🎨 Detekt: Fix trailing whitespace in workers |
| c5f67dc | chore | 🎨 Detekt: Collapse nested if in LoginActivity |

🛠️ Technical Deep Dive

1. The isStopped() Checkpoint

The Problem:
Standard WorkManager cancellation only raises a stop flag; if the code is busy in a chunk-upload loop, the flag is not observed until the loop finishes.

The Solution:
We injected isStopped() checks inside the critical upload loops.

// TusUploadHelper.kt
while (offset < fileSize) {
    // 🛑 CHECKPOINT: Did user press cancel?
    if (cancelled || isStopped()) {
        Timber.i("Upload cancelled at offset %d", offset)
        throw CancellationException("User requested cancel")
    }

    // Upload next chunk...
    uploadChunk(offset)
}
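Throwing CancellationException here, rather than returning silently, lets the exception handling in section 2 below record the correct terminal state in the database.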

2. Handling CancellationException

The Problem:
Previously, a cancellation was often caught by a generic catch (e: Exception) block, causing the app to mark the upload as FAILED (red error icon) instead of CANCELLED (paused state).

The Solution:
We now explicitly catch CancellationException in the Worker.

// UploadFileFromContentUriWorker.kt
return try {
    performUpload()
    Result.success()
} catch (e: CancellationException) {
    // ✅ Correctly mark as cancelled in DB
    transferRepository.updateStatus(id, TransferStatus.TRANSFER_CANCELLED)
    Result.failure()
} catch (e: IOException) {
    // ❌ Network error -> Retry
    Result.retry()
}

3. Resource Cleanup

The Problem:
Cancelling an upload abruptly could leave file streams open, locking the file on disk.

The Solution:
We implemented strict try-finally blocks to ensure streams are closed immediately upon cancellation.
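
A minimal sketch of the pattern (assuming a ContentResolver source; pushChunks is a hypothetical helper, not the app's actual code). Kotlin's use {} gives the same guarantee as an explicit try-finally:

// Illustrative sketch: pushChunks is a hypothetical helper, not the app's code.
fun uploadFromUri(resolver: ContentResolver, uri: Uri) {
    resolver.openInputStream(uri)?.use { input ->
        // May throw CancellationException at a checkpoint (see section 1)
        pushChunks(input)
    } // use {} closes the stream on success, failure, and cancellation alike
}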


🧪 Verification

  • Manual Cancel: Started a 500 MB upload, pressed "Cancel" at 50%. Result: Upload stopped immediately, UI showed "Cancelled".
  • Resume: Resumed the cancelled upload. Result: TUS resumed from 50% (server offset) instead of restarting (see the protocol sketch after this list).
  • System Kill: Force-stopped the app during upload. Result: Database state remained consistent (not stuck in "In Progress").
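
For reference, the resume step in the TUS protocol works roughly like this (a protocol-level sketch, not the app's actual GetTusUploadOffsetRemoteOperation): the client asks the server for the current offset with a HEAD request and continues PATCHing from there.

// Protocol-level sketch (TUS 1.0.0), not the app's code.
val conn = (URL(uploadUrl).openConnection() as HttpURLConnection).apply {
    requestMethod = "HEAD"
    setRequestProperty("Tus-Resumable", "1.0.0")
}
val serverOffset = conn.getHeaderField("Upload-Offset")?.toLongOrNull() ?: 0L
// The next PATCH resumes from serverOffset instead of byte 0.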

@zerox80 zerox80 force-pushed the upload-cancellation-clean branch 5 times, most recently from 9c880b0 to 07e0c0c Compare November 25, 2025 10:21
@zerox80 zerox80 force-pushed the upload-cancellation-clean branch 2 times, most recently from 2a2b373 to 28cf181 Compare November 26, 2025 16:41
.setResponseCode(201)
.addHeader("Tus-Resumable", "1.0.0")
.addHeader("Location", locationPath)
.addHeader("Upload-Offset", firstChunkSize.toString())
Contributor commented on the test snippet above:

By the way, note that you're not actually testing here what the mock web server received from CreateTusUploadRemoteOperation; you're just telling it to return a fixed integer.
A better way might be:

server.setDispatcher(new Dispatcher() {
    @Override
    public MockResponse dispatch(RecordedRequest request) {
        // Read the body bytes the client actually sent
        byte[] bodyBytes = request.getBody().readByteArray();
        int bodySize = bodyBytes.length;

        return new MockResponse()
                .setResponseCode(201)
                .addHeader("Tus-Resumable", "1.0.0")
                .addHeader("Location", locationPath)
                .addHeader("Upload-Offset", String.valueOf(bodySize));
    }
});
// (after this test, re-set the dispatcher again to default)

(did not test this, AI written pseudocode ^)

That way, we also test that CreateTusUploadRemoteOperation is actually sending the chunk-size amount/length to the (mock) server, so you also catch cases where accidentally too little or too much is sent.
(Only do it in this test, not for all tests, just to get a bit more coverage.)
(You could even extend it by comparing in the dispatch handler whether bodyBytes equals the first chunk's contents.)

@zerox80 (Contributor Author) replied:

Fixed. Please look at the "real" PR and give your opinion on whether it's what you expected.

@zerox80 zerox80 force-pushed the upload-cancellation-clean branch 2 times, most recently from 92fe248 to 81b8be0 Compare December 1, 2025 12:27
@zerox80 (Contributor Author) commented Dec 1, 2025

@guruz GitHub trolled me a bit, but the duplicates are gone :)

@guruz (Contributor) commented Dec 3, 2025

Thanks, I will test it.

By the way, the commit subject " feat: Add initial implementation and TusUploadHelper " is wrong. Shouldn't it be "TUS: Fix #71"?

@zerox80 (Contributor Author) commented Dec 3, 2025

It's from VS Code; I let it auto-generate the message.

@guruz (Contributor) commented Dec 3, 2025

OK, but please let's have nice commit subjects. It's pointless to even have them if they're wrong or confusing :)

@zerox80 (Contributor Author) commented Dec 3, 2025

Yes, sorry.

@zerox80 (Contributor Author) commented Dec 3, 2025

I'll correct it later.

@zerox80 zerox80 force-pushed the upload-cancellation-clean branch from 21d2f1d to 0fd5654 Compare December 3, 2025 20:15
@zerox80 zerox80 force-pushed the upload-cancellation-clean branch from 0fd5654 to 0318e33 Compare December 3, 2025 20:47
@guruz (Contributor) commented Dec 4, 2025

Two things I've noticed so far:

  1. While an upload is in a timed-out state with the host Wi-Fi off, and I switch to the "Personal" tab and then back to the "Uploads" tab, the progress shows 0 until you switch the Wi-Fi back on and wait for the upload to resume. I was only able to reproduce this once, so it doesn't matter if you can't reproduce it.

  2. I turned the Wi-Fi off for a longer time (half an hour?), and then the whole upload broke with a 404 (chunks expired? i.e. the TUS upload can't continue).
    @TheOneRing @kulmann What is the default lifetime a TUS upload stays alive on the server? Is this a configuration thing with demo.opencloud.eu, or is it really so low?
    @zerox80 The app only tries to resume the TUS upload after 5 minutes. Not sure how easy it is to fix this and fit it into the back-off logic, or whatever else triggers the re-uploads in this case?

I also see this falling back to a single PUT; now I'm not sure anymore whether this is good for a big file. @TheOneRing, opinion?

12-04 15:21:02.408 25230 25277 D (TusUploadHelper.kt:209): W: TUS: PATCH failed at offset 0 (retry 1/5)
12-04 15:21:02.409 25230 25277 D (OpenCloudClient.java:129): D: Executing in request with id 2c9960e5-197d-40f4-83db-06993d354998
12-04 15:21:02.456 25230 25277 D (GetTusUploadOffsetRemoteOperation.kt:28): D: Get TUS upload offset - 404(FAIL)
12-04 15:21:02.457 25230 25277 D (TusUploadHelper.kt:336): W: TUS: upload not found on server (404), clearing state to restart

12-04 15:21:02.460 25230 25277 D (UploadFileFromContentUriWorker.kt:318): W: TUS upload failed, falling back to single PUT

@zerox80 (Contributor Author) commented Dec 5, 2025

> (quoting @guruz's comment above)

@guruz Thanks for testing! Here are my answers:

Regarding point 1 (Progress 0 when switching tabs):
This is a UI refresh issue when loading the transfer list. Since it only happened once and is not an upload error, I would treat it as a minor follow-up improvement. If it becomes reproducible more often, we can investigate further.

Regarding point 2 (TUS upload expired/404):
The logs show exactly the expected behavior for an expired TUS session, but the log message "falling back to single PUT" is slightly misleading in this specific context. Here is what actually happens:

  1. PATCH failed (retry 1/5) -> Upload attempt after reconnection.
  2. Get TUS upload offset - 404(FAIL) -> Server no longer recognizes the session.
  3. upload not found (404), clearing state to restart -> We correctly detect the expiry and clear the stored TUS URL/offset.
  4. falling back to single PUT -> The code prepares to fall back, BUT:
  5. The Worker incorrectly catches this 404/IOException as a "retryable network error" before the PUT fallback actually starts uploading.

Result: The upload correctly fails with a Retry signal to WorkManager. It does not actually switch to a single PUT (which is good for large files). Instead, after the WorkManager backoff delay, it restarts the job. Since we cleared the state in step 3, it begins a fresh TUS upload from scratch.

Regarding the wait time:
The delay you see is indeed controlled by WorkManager's automatic retry logic (exponential backoff), which kicks in between these attempts. The internal retry delay (max 2 seconds) only applies to retries within an active session.
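
For context, this is roughly how exponential backoff is configured on a WorkManager request (an illustrative sketch; the actual request setup in the app may differ):

// Illustrative: exponential backoff on a one-time upload work request.
val request = OneTimeWorkRequestBuilder<UploadFileFromContentUriWorker>()
    .setBackoffCriteria(
        BackoffPolicy.EXPONENTIAL,
        WorkRequest.MIN_BACKOFF_MILLIS, // 10 s initial delay, grows per retry
        TimeUnit.MILLISECONDS
    )
    .build()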

Possible improvements:
Since the client already handles this by starting a fresh TUS upload (after the backoff delay), the main "fix" here would be server-side:

  1. Increase the TUS session lifetime on the server to prevent the 404s in the first place during reasonable pauses.

(Technically we could also try to bypass the WorkManager backoff if we detect a 404, but that might be risky if the server is actually down).

@zerox80 (Contributor Author) commented Dec 5, 2025

No single PUT is made.
If you do that again with the 30 minutes and then turn the Wi-Fi off for a few seconds, you will see whether it really is a single PUT, because in that case it would start from the beginning without TUS :)
This is extremely confusing programming, but the log is basically lying here: it announces something that doesn't actually happen.

@guruz (Contributor) left a review:

Let's merge this, it works. We can create follow up issues in the issue tracker.

@guruz guruz merged commit 80764e2 into opencloud-eu:main Dec 5, 2025
3 checks passed
