Data loss on rename of a 49 GB folder #13391

MorrisJobke · 2015-01-15T14:17:59Z

I accidentally renamed a folder on my production instance:

The folder contained nearly 20k files with a total size of 48.7 GB.
I renamed the folder in the web UI.
About 2-5 seconds later I noticed this and shut down the client to avoid bigger trouble (this folder was set up as a synced folder in the client).
The spinner spun forever.

Notes

I have a database dump here
I have all apache logs here
I have the owncloud.log here
I have a filesystem snapshot

Anyone who wants to help me dig through the debris is welcome.

Access log:

The rename:

127.0.0.1 - - [15/Jan/2015:13:32:39 +0100] "GET /index.php/apps/files/ajax/rename.php?dir=%2F&newname=Bildersd&file=Bilder HTTP/1.1" 503 1216

The access log filtered for the folder Bilder:

...
127.0.0.1 - - [15/Jan/2015:13:24:36 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 411
127.0.0.1 - - [15/Jan/2015:13:25:04 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 411
127.0.0.1 - - [15/Jan/2015:13:25:34 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 411
127.0.0.1 - - [15/Jan/2015:13:26:06 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 411
127.0.0.1 - - [15/Jan/2015:13:26:39 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 16662
127.0.0.1 - - [15/Jan/2015:13:27:36 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 411
127.0.0.1 - - [15/Jan/2015:13:28:06 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 411
127.0.0.1 - - [15/Jan/2015:13:28:36 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 411
127.0.0.1 - - [15/Jan/2015:13:29:06 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 411
127.0.0.1 - - [15/Jan/2015:13:29:36 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 411
127.0.0.1 - - [15/Jan/2015:13:30:05 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 411
127.0.0.1 - - [15/Jan/2015:13:30:34 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 411
127.0.0.1 - - [15/Jan/2015:13:31:04 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 411
127.0.0.1 - - [15/Jan/2015:13:31:36 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 411
127.0.0.1 - - [15/Jan/2015:13:32:09 +0100] "PROPFIND /remote.php/webdav/Bilder HTTP/1.1" 207 16662

Nothing special in php-fpm.log or apache error log.

The folder was successfully renamed (in the database and in the filesystem), but all database entries for its contents are gone (the files are still there in the filesystem). Only a forced rescan with the occ command line tool was able to get them back into the database. Browsing the folder in the web UI didn't trigger an update of the file cache.
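The thread doesn't show how the rescan works internally. As a rough illustration of the idea (not the actual `occ` implementation), the sketch below walks the data directory and re-adds every path the cache no longer knows about. The set-based `cache` and the function names are hypothetical.

```python
import os
import tempfile


def scan_missing(root, cache):
    """Return paths that exist under `root` on disk but have no cache entry.

    `cache` is a hypothetical in-memory stand-in for the file cache: a set of
    paths relative to `root`, using "/" as separator (e.g. "Bildersd/img0.jpg").
    """
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            if rel not in cache:
                missing.append(rel)
    return sorted(missing)


def rescan(root, cache):
    """Re-add every on-disk entry the cache has lost; return how many were added."""
    missing = scan_missing(root, cache)
    cache.update(missing)
    return len(missing)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "Bildersd"))
        for i in range(3):
            open(os.path.join(root, "Bildersd", f"img{i}.jpg"), "w").close()
        cache = {"Bildersd"}  # the folder row survived, its children did not
        print(rescan(root, cache))  # 3
```

This mirrors the symptom above: the files are intact on disk, so the cache can be rebuilt from the filesystem, but a regular user has no way to trigger that.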

Important: a user without admin rights has no way to get the data back from the server. It's simply not shown.

I will try to investigate further and try to reproduce.

cc @karlitschek @DeepDiver1975 FYI, this could become a showstopper soon. I opened this ticket to document my process.


MorrisJobke · 2015-01-15T14:58:55Z

To clarify: the rename happened through the web UI.

PVince81 · 2015-01-15T15:12:40Z

127.0.0.1 - - [15/Jan/2015:13:32:39 +0100] "GET /index.php/apps/files/ajax/rename.php?dir=%2F&newname=Bildersd&file=Bilder HTTP/1.1" 503

503? That's "service unavailable", which shouldn't even trigger an actual rename.

PVince81 · 2015-01-15T15:13:31Z

You said you didn't have any other storages than home:: and the root, so this excludes the case of an unavailable external storage.

MorrisJobke · 2015-01-15T15:18:10Z

@PVince81 Yes. There is no external storage.

Regarding the 503: it's the only rename request, and the folder definitely got renamed. Could that be caused by the timeout?

PVince81 · 2015-01-15T15:20:58Z

Not sure. Do you think php-fpm would decide to send a 503 by itself when a timeout occurs?

If the server was not available or in maintenance mode, our Sabre plugin should kick in very early and prevent any file operations.

PVince81 · 2015-01-15T15:21:47Z

You could check the owncloud.log from around the time it happened (mind the timezone/utc differences)

MorrisJobke · 2015-01-15T19:28:21Z

I noticed that a rename causes a lot of database queries (600 for a folder with 507 files in it), because the path of every element needs to be updated. I guess this caused the timeout, and PHP-FPM killed the process once the timeout was hit.

https://blackfire.io/profiles/a33715d9-0191-4f9a-ad4a-2f3166d71584/graph
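To show why the query count grows with the number of children, here is a minimal sketch using SQLite and a deliberately simplified, hypothetical `filecache` table with only an `id` and a full `path` column (not ownCloud's real schema or query pattern). Renaming a folder issues one UPDATE per affected row, so a folder with 507 files costs hundreds of statements.

```python
import sqlite3


def make_cache(paths):
    """Build a hypothetical file cache that stores the full path in every row."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE filecache (id INTEGER PRIMARY KEY, path TEXT UNIQUE)")
    db.executemany("INSERT INTO filecache (path) VALUES (?)", [(p,) for p in paths])
    return db


def rename_per_row(db, old, new):
    """Rename folder `old` to `new` with one UPDATE per affected row.

    Returns the number of statements executed: 1 SELECT + 1 UPDATE per row.
    The prefix test uses substr() instead of LIKE so '%' or '_' in names
    cannot widen the match.
    """
    rows = db.execute(
        "SELECT id, path FROM filecache WHERE path = ? OR substr(path, 1, ?) = ?",
        (old, len(old) + 1, old + "/"),
    ).fetchall()
    queries = 1
    for row_id, path in rows:
        db.execute(
            "UPDATE filecache SET path = ? WHERE id = ?",
            (new + path[len(old):], row_id),
        )
        queries += 1
    db.commit()
    return queries


if __name__ == "__main__":
    db = make_cache(["Bilder"] + [f"Bilder/img{i}.jpg" for i in range(507)])
    print(rename_per_row(db, "Bilder", "Bildersd"))  # 509
```

With nearly 20k files the same loop would run roughly 20k statements inside a single request, which fits the timeout theory.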

MorrisJobke · 2015-01-15T19:29:32Z

@icewind1991 What is the reason to store the full path? Isn't knowing the parent enough to generate the full path?

MorrisJobke · 2015-01-16T09:40:52Z

@DeepDiver1975 @karlitschek I would rate this a bit higher. I talked to @icewind1991 and he would like to come up with a change that partially improves this, but reducing the load (especially the SQL queries) the way it was done for the delete operation isn't possible for 8.0 (#13394).

Fixing this properly would require bigger changes, but without a fix, renaming folders with many children will cause critical problems (and even data loss).

Is this rated a showstopper or not?

icewind1991 · 2015-01-16T13:01:45Z

@icewind1991 What is the reason to store the full path? Isn't knowing the parent enough to generate the full path?

Yes, but most queries we do are by path

MorrisJobke · 2015-01-16T13:23:21Z

@icewind1991 I guess it scales better if you simply traverse the file tree. And you can cache this too. The current approach doesn't scale in any direction. :(
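A minimal sketch of the parent-pointer model discussed here, assuming a hypothetical in-memory tree rather than ownCloud's database: each entry stores only its name and parent, so a rename modifies a single entry, while resolving a path has to walk the tree one component at a time (the cost behind the point that most queries are by path). The caching idea is left out to keep the trade-off visible.

```python
class Node:
    """One cache entry: a name and a parent pointer, but no stored full path."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}  # child name -> Node


class Tree:
    def __init__(self):
        self.root = Node("")

    def add(self, path):
        node = self.root
        for part in path.split("/"):
            if part not in node.children:
                node.children[part] = Node(part, node)
            node = node.children[part]
        return node

    def lookup(self, path):
        """Resolve a path by walking one level per path component."""
        node = self.root
        for part in path.split("/"):
            node = node.children.get(part)
            if node is None:
                return None
        return node

    def full_path(self, node):
        """Rebuild the full path from the parent chain."""
        parts = []
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

    def rename(self, path, new_name):
        """Rename in place; returns the number of entries modified (always 1).

        Name collisions in the parent folder are not handled in this sketch.
        """
        node = self.lookup(path)
        if node is None:
            raise FileNotFoundError(path)
        parent = node.parent
        del parent.children[node.name]
        node.name = new_name
        parent.children[new_name] = node
        return 1


if __name__ == "__main__":
    tree = Tree()
    for i in range(507):
        tree.add(f"Bilder/2014/img{i}.jpg")
    print(tree.rename("Bilder", "Bildersd"))  # 1
    print(tree.full_path(tree.lookup("Bildersd/2014/img5.jpg")))  # Bildersd/2014/img5.jpg
```

The design choice is a trade: renames become O(1) in the number of descendants, while every path lookup pays for a walk proportional to the folder depth unless resolved paths are cached.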

PVince81 · 2015-01-19T10:51:09Z

I still don't understand why renaming a folder in place (not even moving it to another location) could run into a timeout.
Was that folder shared with many people?

MorrisJobke · 2015-01-19T12:19:52Z

@PVince81 No. Have a look at the path column. It contains the full path, and this needs to be updated for every child element. With many children this can mean a huge amount of processing.
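Even collapsing the update into a single SQL statement doesn't avoid the cost described here: the statement still has to rewrite every child row. A minimal sketch in SQLite, again with a hypothetical single-column `filecache` table rather than ownCloud's real schema, where `rowcount` reports how many rows one UPDATE touched.

```python
import sqlite3


def make_demo_cache(paths):
    """Hypothetical cache table that stores the full path in every row."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE filecache (id INTEGER PRIMARY KEY, path TEXT UNIQUE)")
    db.executemany("INSERT INTO filecache (path) VALUES (?)", [(p,) for p in paths])
    return db


def rename_bulk(db, old, new):
    """Rewrite the path prefix of `old` and all its descendants in one UPDATE.

    substr(path, len(old) + 1) keeps everything after the old prefix
    (SQLite's substr is 1-indexed). Returns the number of rows rewritten.
    """
    cur = db.execute(
        "UPDATE filecache SET path = ? || substr(path, ?) "
        "WHERE path = ? OR substr(path, 1, ?) = ?",
        (new, len(old) + 1, old, len(old) + 1, old + "/"),
    )
    db.commit()
    return cur.rowcount


if __name__ == "__main__":
    db = make_demo_cache(["Bilder"] + [f"Bilder/img{i}.jpg" for i in range(507)])
    print(rename_bulk(db, "Bilder", "Bildersd"))  # 508
```

One statement instead of hundreds reduces round trips, but the database still writes (and re-indexes) one row per descendant, so the work remains proportional to the folder's size.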

PVince81 · 2015-01-19T12:21:39Z

Ah right... the DB update :-/

PVince81 · 2015-02-06T13:13:08Z

A ticket should be either technical debt or a bug.
I'd rather call this a bug. It might be caused by legacy code, but it is still a bug.

PVince81 · 2015-02-06T13:13:58Z

@MorrisJobke have you been able to find any more clues?

MorrisJobke · 2015-02-06T13:34:26Z

@PVince81 It's simply the massive number of DB updates, and the executing process got killed before it could finish. Nothing we can change for now :(

christianrj · 2015-02-06T13:39:39Z

This is really a showstopper bug (as you can see in #10711). Our manager is considering dropping ownCloud because of all these rename and sync problems that never get fixed. I hope you can fix them, because right now oC can't be used in production for us. Thanks!

PVince81 · 2015-02-06T13:48:05Z

@DeepDiver1975 I've set this to 8.1, this should definitely be looked into.

It might take some time to debug because this bug is difficult to reproduce consistently.

I suspect that the part that handles renames will need to be rewritten to use a different approach, either by using part folders #13756 or updating the cache for each file one by one, as proposed here #13775 instead of doing a bulk update at the end.
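#13775 proposes moving files and updating the cache one by one. A minimal sketch of that idea, assuming a flat folder (no subfolders) and a hypothetical dict-based cache keyed by absolute path: because each cache entry is updated right after its file moves, a process killed midway (simulated here with the hypothetical `stop_after` parameter) leaves the cache matching the disk for every file, except at worst the one being moved at that instant, and a second run can resume.

```python
import os
import tempfile


def move_one_by_one(src, dst, cache, stop_after=None):
    """Move the files of flat folder `src` into `dst`, updating `cache` per file.

    `cache` maps absolute path -> metadata. `stop_after` simulates the PHP
    process being killed after that many files. Returns the number moved.
    """
    os.makedirs(dst, exist_ok=True)
    moved = 0
    for name in sorted(os.listdir(src)):
        if stop_after is not None and moved == stop_after:
            raise RuntimeError("process killed")  # simulated timeout
        old, new = os.path.join(src, name), os.path.join(dst, name)
        os.rename(old, new)
        cache[new] = cache.pop(old)  # the cache entry follows the file immediately
        moved += 1
    os.rmdir(src)  # only reached once every file has been moved
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src, dst = os.path.join(root, "Bilder"), os.path.join(root, "Bildersd")
        os.makedirs(src)
        cache = {}
        for i in range(5):
            path = os.path.join(src, f"img{i}.jpg")
            open(path, "w").close()
            cache[path] = {"size": 0}
        try:
            move_one_by_one(src, dst, cache, stop_after=2)
        except RuntimeError:
            pass
        print(move_one_by_one(src, dst, cache))  # 3 (resumes where it stopped)
```

Compared with a bulk update at the end, this trades speed for the guarantee that an interrupted rename never leaves a large batch of files unknown to the cache.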

MorrisJobke · 2015-02-06T13:53:54Z

It might take some time to debug because this bug is difficult to reproduce consistently.

? You can't reproduce this? But the rename takes ages for you too, doesn't it?

PVince81 · 2015-02-06T14:14:29Z

The few times I tried I couldn't reproduce the issue. At least in my case there was no data loss / deletion from the sync client.

PVince81 · 2015-02-06T14:19:56Z

Either the rename operation takes longer than one sync cycle, so that the sync client accesses the DB while it is in an inconsistent state, or the rename runs into a PHP timeout and the PHP process gets killed (the php-fpm case).

Maybe case 1 can be simulated by adding a few sleep() operations in the code to slow down renaming.
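Case 1 can also be illustrated without real timing. In the sketch below, a hypothetical dict-based cache is renamed one row at a time by a generator; each `yield` marks a spot where a sleep() would go. A sync cycle that reads the cache at that moment sees the old folder with only some of its files, which a client could interpret as deletions. The names here are illustrative, not ownCloud code.

```python
def slow_rename(cache, old, new):
    """Rewrite cache paths one row at a time, yielding after each row.

    `cache` is a hypothetical dict {file id: path}. Each yield stands for a
    point where the real rename could be slowed down with sleep().
    """
    for file_id, path in sorted(cache.items()):
        if path == old or path.startswith(old + "/"):
            cache[file_id] = new + path[len(old):]
            yield


def sync_client_view(cache, folder):
    """What a PROPFIND-style listing of `folder` would return from the cache."""
    return sorted(p for p in cache.values() if p.startswith(folder + "/"))


if __name__ == "__main__":
    cache = {0: "Bilder"}
    cache.update({i: f"Bilder/img{i}.jpg" for i in range(1, 6)})
    steps = slow_rename(cache, "Bilder", "Bildersd")
    for _ in range(3):
        next(steps)  # a sync cycle fires while the rename is half done
    print(len(sync_client_view(cache, "Bilder")))    # 3 of the 5 files remain
    print(len(sync_client_view(cache, "Bildersd")))  # 2
```

Running the generator to completion restores a consistent view; the danger lies only in reads that land between two steps.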

PVince81 · 2015-06-16T15:24:15Z

@icewind1991 it didn't work, still happening.

How about the hasUpdated approach you suggested ?

PVince81 · 2015-06-18T09:15:56Z

Work in progress here #16963, searching for alternative approaches to lock the cache/scanner

PVince81 · 2015-06-18T13:02:43Z

These two PRs together #17017 and #16963 make the problem disappear (with locking enabled)

MorrisJobke · 2015-06-18T16:36:19Z

@PVince81 @icewind1991 Thanks for this! You all rock :)

beejee · 2015-06-22T13:44:27Z

Hi,

Could it be that #15702 is related to this?
I can test any changes/fixes you require me to execute to get you any feedback on this.

Thanks!

PVince81 · 2015-06-22T13:47:09Z

Not necessarily. This ticket here is about files randomly disappearing, it is not consistent.
The ticket you linked against is about SWIFT.

PVince81 · 2015-06-22T13:49:14Z

If you have a test instance where you can test 8.1, you could enable file locking, see https://doc.owncloud.org/server/8.1/admin_manual/configuration_files/files_locking_experimental.html

beejee · 2015-06-22T14:12:30Z

Ah indeed, I responded too fast and only noticed the difference afterwards. I will build a test setup with 8.1 to experiment with file locking and post some feedback on the other thread.

Thanks.

iGadget · 2015-08-03T10:58:27Z

So will the file locking solution also work when you rename a large folder and undo that rename within a few seconds? How would that work out?

PVince81 · 2015-08-03T11:00:47Z

If you undo the rename while the operation is still in progress you will get a message like "folder is currently busy" and will need to try again later. If done through the sync client, the sync client will automatically retry later.
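The behaviour described here can be sketched with a single non-blocking exclusive lock per folder. This is a simplified model under stated assumptions: `LockedFolder`, `FolderBusyError` and `rename_with_retry` are hypothetical names, and ownCloud's actual locking (including shared locks for readers) is more involved. An undo attempted while the first rename holds the lock fails fast with a "busy" error, and a client-style retry succeeds once the lock is free.

```python
import threading
import time


class FolderBusyError(Exception):
    """Hypothetical stand-in for the "folder is currently busy" response."""


class LockedFolder:
    """One exclusive, non-blocking lock per folder (simplified model)."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()

    def rename(self, new_path, work=lambda: None):
        # Fail fast instead of waiting, like the web UI's "busy" message.
        if not self._lock.acquire(blocking=False):
            raise FolderBusyError(self.path)
        try:
            work()  # the long-running cache update runs while the lock is held
            self.path = new_path
        finally:
            self._lock.release()


def rename_with_retry(folder, new_path, attempts=3, delay=0.1):
    """Retry a busy folder later, as the sync client does; returns attempts used."""
    for attempt in range(1, attempts + 1):
        try:
            folder.rename(new_path)
            return attempt
        except FolderBusyError:
            time.sleep(delay)
    raise FolderBusyError(folder.path)


if __name__ == "__main__":
    folder = LockedFolder("Bilder")

    def undo_during_rename():
        try:
            folder.rename("Bilder")
        except FolderBusyError:
            print("folder is currently busy")

    folder.rename("Bildersd", work=undo_during_rename)
    print(rename_with_retry(folder, "Bilder"))  # 1
```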

PVince81 · 2015-08-03T11:01:56Z

On another note, @icewind1991 had a POC fix that should accelerate renaming of database entries: #13956

simopal6 · 2015-08-24T16:00:13Z

Excuse me for intruding, but it is not clear to me if the problem still happens or not.

alantygel · 2016-12-09T10:51:06Z

It just happened on my server.

We are still using ownCloud 8.0. Will upgrading to 9 solve the problem?

PVince81 · 2016-12-09T10:53:47Z

@alantygel yes, because OC 9 has a locking mechanism to avoid this kind of race condition.

lock · 2019-08-03T05:00:29Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

MorrisJobke added the Type:Bug label Jan 15, 2015

MorrisJobke self-assigned this Jan 15, 2015

MorrisJobke removed their assignment Jan 15, 2015

MorrisJobke mentioned this issue Jan 16, 2015

Renaming a folder deletes its content #13409

Closed

MorrisJobke added the technical debt label Jan 23, 2015

PVince81 mentioned this issue Feb 6, 2015

oC [7.0.2] Folder deleted after rename #10711

Closed

PVince81 added this to the 8.1-next milestone Feb 6, 2015

PVince81 mentioned this issue Feb 6, 2015

Move files + update cache one by one instead of bulk update #13775

Closed

PVince81 added the feature:locking label Jun 16, 2015

icewind1991 mentioned this issue Jun 18, 2015

update the file cache within the write lock #17017

Merged

PVince81 closed this as completed in #17017 Jun 18, 2015

LukasReschke mentioned this issue Jul 2, 2015

Folder which was full and just renamed is now empty #17358

Closed

PVince81 mentioned this issue Aug 21, 2015

Undefined index: extension at /var/www/owncloud/apps/files_versions/lib/storage.php#320 #14408

Closed

ghost mentioned this issue Sep 21, 2015

Owncloud synced encrypted folder can't handle directory structure changes / folder rename #19201

Closed

PVince81 mentioned this issue Sep 22, 2015

Moving files via WebUI sometimes causes integrity constraint violation #19241

Closed

This was referenced Oct 9, 2015

Directory and files within are missing after rename of folder #16925

Closed

Renaming folder caused deletion of files within owncloud/client#3954

Closed

PVince81 mentioned this issue Oct 26, 2015

Don't lock if we're only reading cache metadata #20053

Merged

ownclouders mentioned this issue Sep 14, 2018

Renaming folders on a system with a lot of data takes a long time #32714

Closed

lock bot locked as resolved and limited conversation to collaborators Aug 3, 2019
