Fix paused sync file move issue #5949 by mrow4a · Pull Request #5953 · owncloud/client

mrow4a · 2017-08-10T22:06:54Z

@SamuAlfageme can you verify that this fixes the issue for ChunkingNG for you? I checked on my local machine and it fixed it there.

@guruz I will also add a similar thing for V1; there, however, I will check whether the file bytes have been sent or not.

Issue #5949

mrow4a · 2017-08-10T22:09:39Z

+            if (_runningJobs.size() == 0) {
+                break;
+            }
+            Utility::sleep(1);


This is required because if the composite job gets aborted but one of its jobs isn't, it will not finish adding the file to the journal.

But that's not acceptable. You can't let the main thread sleep like this.
You need to find a better way to wait until the job has actually finished (by listening to the finished signal or so).

Ohh, true, I just tested and it blocks the main thread. Yep, I will do that with signals.
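As an illustration of the "listen for the finished signal instead of sleeping" suggestion, here is a minimal sketch in plain C++ without Qt. `SketchJob`, `SketchOwner`, `onFinished` and `finalizeWhenDone` are hypothetical names made up for this example; a `std::function` callback stands in for a Qt signal/slot connection, and none of this is the client's actual code.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a propagator job; not the client's real class.
class SketchJob {
public:
    // Register a callback to run when the job finishes. This plays the role
    // of connecting a slot to a "finished" signal.
    void onFinished(std::function<void()> callback) {
        _callbacks.push_back(std::move(callback));
    }

    // Called by the (imaginary) network layer once the request completes.
    void finish() {
        assert(!_finished); // a job is expected to finish only once
        _finished = true;
        for (auto &callback : _callbacks) {
            callback();
        }
    }

private:
    bool _finished = false;
    std::vector<std::function<void()>> _callbacks;
};

// Hypothetical owner of the job. Instead of polling the job list in a loop
// with sleep(), it registers the follow-up work and returns immediately.
class SketchOwner {
public:
    void finalizeWhenDone(SketchJob &job) {
        job.onFinished([this] { _finalized = true; });
    }
    bool finalized() const { return _finalized; }

private:
    bool _finalized = false;
};
```

The calling thread never blocks here: `finalizeWhenDone` returns right away, and the follow-up work runs only when `finish()` is eventually called.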

mrow4a · 2017-08-10T22:11:06Z

+    foreach (AbstractNetworkJob *job, _jobs) {
+        // Abort only PUT and MKDIR jobs, since abording MoveJob
+        // might result in conflict.
+        if (job->reply() && !job->inherits("OCC::MoveJob")) {


If it is not a MoveJob, just exit. (Won't this leave the new temporary chunking directory undeleted? Does the server periodically delete these tmp directories? @DeepDiver1975)

nitpick: use qobject_cast (more efficient and also compile-time checked)

Also, we should probably decrease the timeout. For example, if the abort happens because the connection was detected to be dropped, we should not wait a full HTTP timeout again, IMHO.

And to answer your question about stale directory, the next sync will resume the upload, so it won't be stale.

Changed to qobject_cast.
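To illustrate the difference between a name-based check and a typed cast, the sketch below uses standard `dynamic_cast` as a stand-in for `qobject_cast`, since the block is meant to compile without Qt. The `Sketch*` types and `abortAllButMove` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical job types; only the class hierarchy matters for the sketch.
struct SketchNetworkJob {
    virtual ~SketchNetworkJob() = default;
    bool aborted = false;
};
struct SketchPutJob : SketchNetworkJob {};
struct SketchMoveJob : SketchNetworkJob {};

// Abort every job except MOVE jobs and return how many were aborted.
int abortAllButMove(const std::vector<std::unique_ptr<SketchNetworkJob>> &jobs) {
    int abortedCount = 0;
    for (const auto &job : jobs) {
        assert(job != nullptr);
        // Typed cast: a misspelled class name fails to compile, whereas a
        // misspelled string in a name-based check would only fail at runtime.
        if (dynamic_cast<SketchMoveJob *>(job.get()) != nullptr) {
            continue; // let the MOVE run to completion
        }
        job->aborted = true;
        ++abortedCount;
    }
    return abortedCount;
}
```

With a PUT, a MOVE and another PUT in the list, only the two PUT jobs are marked as aborted.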

ogoffart

That goes in the right direction.

Also, the chunkingv1 code needs to be changed so that it does not abort the last job. This is also the path used when the file is too small to be split into chunks.

ogoffart · 2017-08-11T08:12:25Z

+            if (_runningJobs.size() == 0) {
+                break;
+            }
+            Utility::sleep(1);


But that's not acceptable. You can't let the main thread sleep like this.
You need to find a better way to wait until the job has actually finished (by listening to the finished signal or so).

ogoffart · 2017-08-11T08:16:13Z

+    foreach (AbstractNetworkJob *job, _jobs) {
+        // Abort only PUT and MKDIR jobs, since abording MoveJob
+        // might result in conflict.
+        if (job->reply() && !job->inherits("OCC::MoveJob")) {


nitpick: use qobject_cast (more efficient and also compile-time checked)

Also, we should probably decrease the timeout. For example, if the abort happens because the connection was detected to be dropped, we should not wait a full HTTP timeout again, IMHO.

guruz · 2017-08-11T10:25:45Z

> Also, the chunkingv1 code needs to be changed so that it does not abort the last job. This is also the path used when the file is too small to be split into chunks.

I think it's better to hook into how UploadDevice is used and from there control an abort. Then you for sure know you didn't send all bytes and can safely abort.

mrow4a · 2017-08-14T15:20:15Z

@ogoffart can you have a look at this approach? One disadvantage is that now, after pausing, the MOVE job continues (1st stage):

after it is finished, view switches to (2nd stage):

One thing is that the "resume sync" button is still available in the "1st stage" too; I'm not sure what happens if it is still processing the MOVE and someone clicks resume in the meantime.

@SamuAlfageme I did manual testing and the problem is solved for ChunkingNG.

ogoffart · 2017-08-14T15:37:13Z

That's not needed, abort was already sent

So you don't need to put abortJobs in a different function

The idea here was to force the abort. If the jobs have not finished after 5 seconds, you kill them just by sending abort to the jobs again; please have a look at the ChunkingNG implementation.

Hmm, maybe I don't need it after all... It will fall back to the behavior before this PR for the jobs that did not terminate within 5 seconds.

ogoffart · 2017-08-14T15:38:20Z

You need to count the jobs and only emit the signal when ALL subjobs have finished their abort.

Ah no. I got confused with CompositeJob.

Well, you don't need a slot for this single line, just connect to SIGNAL(abortFinished())

The abortFinished signal is sent after a 5-second timeout regardless of whether the jobs terminated in that time or not. I give jobs at most 5 seconds to finish what they started after receiving the first abort signal.

Maybe I should change the naming, since I see your confusion.

@ogoffart so I can connect signal to signal?

ogoffart · 2017-08-14T15:44:46Z

Looks mostly good. But don't forget the PUT in V1; this one is even more important as it happens more often.

mrow4a · 2017-08-14T15:46:01Z

Yep, I will do V1 now. I just wanted to verify that the abort implementation works for NG (if it works there, it will probably work everywhere just by implementing abort() in the job).

mrow4a · 2017-08-15T07:44:25Z

Guys, I also tried the version where CompositeJob waits for aborts from its children, but for some reason it did not work properly with a "not aborted MOVE" (it looks like a mix of finished and abortFinished signals, or even their absence, messes things up). It took me quite long to debug everything.

Should I try the timeout in the job itself once again, or keep the version with the timeout in the composite job (which will finish the sync run no matter what, since it does not care whether a subjob finished its abort or not)?

mrow4a · 2017-08-15T12:20:06Z

@ogoffart @guruz @SamuAlfageme PR ready, fixes both V1 and NG

guruz · 2017-08-15T13:21:43Z

+void PropagatorCompositeJob::abort()
+{
+    foreach (PropagatorJob *j, _runningJobs)
+        j->abort();


please use { and }

guruz · 2017-08-15T13:26:18Z

-        foreach (PropagatorJob *j, _runningJobs)
-            j->abort();
-    }
+    virtual void abort() Q_DECL_OVERRIDE;


Please also document here that the abort() might not be synchronous/immediate

Actually, the PropagatorJob::abort documentation should state that this function must emit abortFinished

guruz

I'm missing a bit of info on what happens in a situation where an immediate abort does not work. What will happen after those 5000 msec? It will go to emitFinished but what happens with the running stuff?

Also added some other comments:)

mrow4a · 2017-08-15T13:43:12Z

> I'm missing a bit of info what happens in a situation where an immediate abort does not work. What will happen after those 5000msec? It will go to emitFinished but what happens with the running stuff?

@guruz Good question. From what I observed, by the time the job finishes the sync run is already gone, so whatever finishes after that time is ignored.

I think I will try to implement "forced abort".

guruz · 2017-08-15T13:45:22Z

I'm not sure, maybe the QNetworkReplys are parented to the Jobs which get deleted when SyncEngine is deleted.
(Which is the force abort, basically)

mrow4a · 2017-08-15T13:54:27Z

@guruz how to check it? @jturcotte

jturcotte · 2017-08-15T14:00:08Z

Maybe just call abort() in the code where you think the issue could appear.

For example, in tests we have FakeFolder::execUntilItemCompleted, which allows you to do something right after an itemCompleted signal. But you could also just test this manually in the code with a QTimer or something like that.

ogoffart

For the composite job, you need to increment a counter for every subjob you call abort() on. Connect to the abortFinished signal of each job. Only when all jobs have emitted abortFinished can you safely emit the CompositeJob's abortFinished.

Note that all other jobs need to emit abortFinished, so you need to review every job's abort to make sure they emit that signal.
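The counting scheme described above can be sketched in plain C++ as follows. This is only an illustration under assumptions: `SketchSubJob`, `SketchCompositeJob` and `completeAbort` are hypothetical names, and `std::function` callbacks stand in for the abortFinished signal connections.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sub job whose abort completes some time after it is requested.
class SketchSubJob {
public:
    void abort(std::function<void()> onAbortFinished) {
        _onAbortFinished = std::move(onAbortFinished);
    }

    // Called later, once the underlying request has really stopped.
    void completeAbort() {
        if (_onAbortFinished) {
            auto callback = std::move(_onAbortFinished);
            _onAbortFinished = nullptr;
            callback();
        }
    }

private:
    std::function<void()> _onAbortFinished;
};

// Hypothetical composite job that reports abortFinished only after every
// sub job has reported its own.
class SketchCompositeJob {
public:
    explicit SketchCompositeJob(std::vector<SketchSubJob *> jobs)
        : _runningJobs(std::move(jobs)) {}

    void abort(std::function<void()> onAbortFinished) {
        _onAbortFinished = std::move(onAbortFinished);
        _pendingAborts = _runningJobs.size(); // one pending abort per sub job
        if (_pendingAborts == 0) {
            notifyAbortFinished(); // nothing to wait for
            return;
        }
        for (SketchSubJob *job : _runningJobs) {
            job->abort([this] { subJobAbortFinished(); });
        }
    }

private:
    void subJobAbortFinished() {
        assert(_pendingAborts > 0);
        if (--_pendingAborts == 0) {
            notifyAbortFinished();
        }
    }

    void notifyAbortFinished() {
        if (_onAbortFinished) {
            _onAbortFinished();
        }
    }

    std::vector<SketchSubJob *> _runningJobs;
    std::function<void()> _onAbortFinished;
    std::size_t _pendingAborts = 0;
};
```

The counter is set before any sub job is asked to abort, so the composite's callback cannot fire early even if a sub job reports back immediately.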

ogoffart · 2017-08-15T14:29:09Z

    {
+        // Abort first job and sub jobs
+        // Finished abort will be announced via abortFinished()
+        // (look contructor implementation)


(Maybe you can do the connection here.)

Here as well you need to have a counter to make sure that both jobs sent their abortFinished.

mrow4a · 2017-08-15T14:35:37Z

@ogoffart I did that, and for some reason it was not working properly, e.g. the finished signal never came back after aborting, plus other issues... I can give that approach another try.

I just discovered that I also have to be careful not to break the abort mechanism on network errors with MOVE, since it would end up in an infinite abort loop (it would retry all the time).

ckamm · 2017-09-22T09:11:52Z

I've rebased to master and will review the current state to help move this forward.

ckamm · 2017-09-25T10:36:18Z

@mrow4a Looks good. My remaining worry is about what happens when abort(Async) is called twice. Do we guard against this somewhere?

I've also added my test case here.

mrow4a · 2017-09-25T11:03:08Z

I mean, if you call it again, it will basically let the unfinished job keep running (as with the previous call of abort(Async)); the other jobs were synchronously aborted by the previous call. The only difference is that you might now have two timeout timers running in parallel, and you add another set of signal connections. @ogoffart is that a problem?

ckamm · 2017-09-28T07:53:11Z

@mrow4a I added a minor guard against it that seems sufficient for me.
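A guard against a repeated asynchronous abort could look roughly like the sketch below. The thread does not show the guard that was actually added, so `SketchPropagator`, `abortAsync` and the timer counter are assumptions made purely for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical propagator that ignores every abort request after the first.
class SketchPropagator {
public:
    // Returns true only for the call that actually starts the abort.
    bool abortAsync() {
        // exchange() sets the flag and returns its previous value, so a
        // second call sees true and returns without doing any work.
        if (_abortRequested.exchange(true)) {
            return false;
        }
        assert(_timeoutTimersStarted == 0); // the guard lets us get here once
        ++_timeoutTimersStarted; // stands in for starting the timeout timer
        return true;
    }

    int timeoutTimersStarted() const { return _timeoutTimersStarted; }

private:
    std::atomic<bool> _abortRequested{false};
    int _timeoutTimersStarted = 0;
};
```

With such a flag, a second call neither starts a second timeout timer nor adds further signal connections.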

ckamm · 2017-09-28T07:54:55Z

@mrow4a @guruz @ogoffart Are there any remaining concerns (also about the code I added)? Can we merge this?

ogoffart

👍

ogoffart · 2017-09-28T10:11:38Z

    PropagateUploadFileV1(OwncloudPropagator *propagator, const SyncFileItemPtr &item)
        : PropagateUploadFileCommon(propagator, item)
+        , _startChunk(0)
+        , _currentChunk(0)


Note: this could be initialized in the header file.

Should I step in and fix it, or @ckamm ?

Sorry, I missed the notifications for this. Looking now.

@mrow4a (in general, feel free to just fix issues if I'm unresponsive!)
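For reference, initializing in the header means using default member initializers instead of the constructor's initializer list. A minimal sketch with a hypothetical `SketchUploadJob` class (the member names mirror the diff, the class itself is made up):

```cpp
#include <cassert>

// Hypothetical upload job; the members get their defaults at declaration,
// so every constructor starts from the same well-defined state.
class SketchUploadJob {
public:
    SketchUploadJob() = default;

    explicit SketchUploadJob(int chunkCount)
        : _chunkCount(chunkCount) {
        assert(chunkCount >= 0);
    }

    int startChunk() const { return _startChunk; }
    int currentChunk() const { return _currentChunk; }
    int chunkCount() const { return _chunkCount; }

private:
    int _startChunk = 0;
    int _currentChunk = 0;
    int _chunkCount = 0;
};
```

A constructor that sets only `_chunkCount` still leaves `_startChunk` and `_currentChunk` at 0 rather than uninitialized.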

ogoffart · 2017-09-28T10:12:28Z

-                if (abortType == AbortType::Asynchronous && (((_currentChunk + _startChunk) % _chunkCount) == 0)
-                        && putJob->device()->atEnd()) {
+                if (abortType == AbortType::Asynchronous
+                    && _chunkCount > 0


I wonder how chunk count can be 0

chunkCount gets set by doStartUpload and the job's abort can be called before we get there.
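The reason the `_chunkCount > 0` test has to come first is that integer modulo by zero is undefined behavior in C++. A minimal sketch of the guarded check, with a hypothetical free function `lastChunkCheck` standing in for the condition in the diff (reading the modulo test as "last chunk" detection is an interpretation of the thread, not something the diff states):

```cpp
#include <cassert>

// Hypothetical helper mirroring the guarded condition from the diff.
// chunkCount stays 0 until the upload has started, so it is checked before
// the modulo; && short-circuits and the division never sees a zero divisor.
bool lastChunkCheck(int currentChunk, int startChunk, int chunkCount) {
    assert(currentChunk >= 0 && startChunk >= 0);
    return chunkCount > 0
        && ((currentChunk + startChunk) % chunkCount) == 0;
}
```

An abort that arrives before the upload has started simply gets `false` back instead of triggering a division by zero.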

mrow4a · 2017-09-28T11:30:19Z

+            // since this might result in conflicts
+            if (PUTFileJob *putJob = qobject_cast<PUTFileJob *>(job)){
+                if (abortType == AbortType::Asynchronous
+                    && _chunkCount > 0


Can you elaborate more here?

I guess this is safety check right?

See above. I was getting crashes because of zero chunkCount when I aborted jobs quickly.

Ohh, that makes sense. I indeed did not try this corner case.

mrow4a · 2017-10-05T19:49:35Z

Any progress here?

ckamm · 2017-10-06T10:54:39Z

@mrow4a I think you can merge this now

mrow4a · 2017-10-06T10:58:22Z

What is happening with the Jenkins build? Did you run the tests locally?

ckamm · 2017-10-06T11:02:20Z

No, I hadn't. The test that fails to compile on jenkins works locally.

The problem here is that jenkins merges on top of master, and this patch needs some adjustments to work there. I'll take care of the rebase and fixups.

Dont abort final chunk immedietally Use sync and async aborts

_chunkCount could be 0, leading to a floating point exception I also added initializers for several uninitialized integers in the upload jobs.

ckamm · 2017-10-06T11:11:40Z

Jenkins should pass now, let's wait and see.

ckamm · 2017-10-09T12:21:41Z

@DeepDiver1975 I think we might want this in 2.4, but jenkins failed in something after running the tests. Is this a transient issue where retriggering jenkins would help? Could you do that?

ogoffart · 2017-10-13T09:36:06Z

We need to merge this to the 2.4 branch anyway if we want this in 2.4, so it will need to be retargeted.

(others reviewed)

ckamm · 2017-10-17T07:45:06Z

Targeted to 2.4 and merged!

mrow4a requested review from SamuAlfageme, guruz and ogoffart August 10, 2017 22:06

ogoffart suggested changes Aug 11, 2017

mrow4a force-pushed the fix_5949 branch from 514eff5 to e02a683 Compare August 14, 2017 15:14

mrow4a force-pushed the fix_5949 branch from e02a683 to d2a8427 Compare August 14, 2017 15:28

ogoffart reviewed Aug 14, 2017

mrow4a force-pushed the fix_5949 branch 2 times, most recently from 89b7324 to 6d533eb Compare August 14, 2017 20:27

mrow4a changed the title ~~[WIP] Fix paused sync file move issue #5949~~ Fix paused sync file move issue #5949 Aug 15, 2017

mrow4a mentioned this pull request Aug 15, 2017

Resuming a paused sync that was uploading a bunch of files and were moved in the client is duplicating one file in both locations #5949

Closed

guruz reviewed Aug 15, 2017

guruz previously requested changes Aug 15, 2017

ogoffart suggested changes Aug 15, 2017

ogoffart reviewed Aug 15, 2017

ckamm force-pushed the fix_5949 branch from 27cb94b to d725aaf Compare September 25, 2017 10:32

ogoffart approved these changes Sep 28, 2017

ckamm force-pushed the fix_5949 branch from fc60bef to ac75b6d Compare October 6, 2017 10:53

mrow4a and others added 6 commits October 6, 2017 13:04

Fix paused sync file move issue #5949

1e02877

Dont abort final chunk immedietally Use sync and async aborts

UploadNG: Avoid div-by-zero for super fast uploads

c082287

TestUtils: Invalidate etags on PUT or chunk-MOVE

a00970b

Test case for #5949

a40fe26

Propagator: Avoid duplicate async abort

3826e22

Abort: Fix crash with early aborts

8a3b144

_chunkCount could be 0, leading to a floating point exception I also added initializers for several uninitialized integers in the upload jobs.

ckamm force-pushed the fix_5949 branch from ac75b6d to 8a3b144 Compare October 6, 2017 11:11

ckamm added this to the 2.4.0 milestone Oct 9, 2017

ckamm changed the base branch from master to 2.4 October 17, 2017 07:44

ckamm merged commit b2a8ffc into 2.4 Oct 17, 2017

ckamm deleted the fix_5949 branch October 17, 2017 07:44

ogoffart mentioned this pull request Apr 9, 2018

Moving folder causes conflict with identical files if 503 is encountered during sync #6435

Closed
