VFS deleted 60+GiB of files; left dir structure untouched - Win10 client - VFS enabled *BEFORE* first sync cycle #8610

Closed
TheWebMachine opened this issue May 6, 2021 · 25 comments
@TheWebMachine

Actual behavior

  • Have Owncloud syncing across both Win10 and Linux devices
    • 3x Win10 devices (if we include the problem device); 2 of which have Virtual Files enabled, one does not
    • 1x Linux device; Virtual Files is not enabled
  • A couple of Android devices in the mix, but I doubt that is relevant here
  • On all devices, all online folders are selected for sync without exception

Just did a fresh Win10 install on a system, installed the OC client, and enabled Virtual Files before the first sync. That initial sync (or a near-term subsequent sync) deleted all files across all dirs, leaving only the directory structure behind online. This then propagated across all of my other devices, resulting in much panic and sadness. I had to go to the Deleted Files section to restore my files, which I'm still working on...

(This Deleted Files section needs some serious UI improvements, btw...restoring 43,000+ files is NO picnic, I'll tell ya! I'm lucky if I can get a few thousand at a time to restore, and the webUI just freezes/times out if I go nuclear and try to restore them all at once. Perhaps an admin-side mass-restore page is needed to recover from an "oh $h!t" like this: select a user and a date range to restore and click a button.)
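A minimal sketch of the selection logic such a mass-restore page might use: pick one user's trash entries within a date range, then restore them in small batches so no single request times out. The record layout (`user`, `path`, `deleted_at`) and both function names are hypothetical and do not reflect ownCloud's actual trash-bin storage or API.

```python
from datetime import datetime
from itertools import islice

def select_for_restore(entries, user, start, end):
    """Pick the trash entries of one user deleted within [start, end]."""
    return [e for e in entries
            if e["user"] == user and start <= e["deleted_at"] <= end]

def batches(items, size):
    """Yield consecutive chunks so no single request handles everything."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

# Hypothetical trash records, for illustration only.
trash = [
    {"user": "alice", "path": "a.txt", "deleted_at": datetime(2021, 5, 5, 23, 10)},
    {"user": "alice", "path": "b.txt", "deleted_at": datetime(2021, 5, 5, 23, 11)},
    {"user": "alice", "path": "c.txt", "deleted_at": datetime(2021, 5, 5, 23, 12)},
    {"user": "alice", "path": "old.txt", "deleted_at": datetime(2021, 1, 1, 8, 0)},
    {"user": "bob", "path": "d.txt", "deleted_at": datetime(2021, 5, 5, 23, 10)},
]

selected = select_for_restore(
    trash, "alice", datetime(2021, 5, 5), datetime(2021, 5, 6)
)
for chunk in batches(selected, 2):
    # A real tool would issue one restore request per chunk here.
    print([e["path"] for e in chunk])
```

Batching keeps each request small, which is the part the web UI reportedly struggles with when everything is selected at once.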

Steps to reproduce

  1. Have an existing sync account set up with files stored in OC and locally on other devices
  2. Install and set up the OC client on Win10 for the first time, enabling Virtual Files before the first sync starts
  3. Realize a day later that your files have vanished from all of your devices

Server configuration

Operating system: Ubuntu 16.04
Web server: Apache 2.4.46
Database: MariaDB 1.3.5
PHP version: 7.2.34
ownCloud version: 10.6.0 stable
Storage backend (external storage): NFS mount

Client configuration

Client version: v2.7.6 b3261 c751767
Operating system: Win10 Pro x64
OS language: US Eng
Installation path of client: C:\Program Files\ownCloud

@TheWebMachine
Author

Note: I haven't included any logs yet because:

  1. The mass amount of filenames involved would reveal too much PII
  2. I'm a little busy restoring my files right now but wanted to get the ball rolling on this in case others may be experiencing it without having realized it yet

Perhaps you can tell me which targeted data would be relevant, so that I can provide it without revealing PII through complete logs (given the massive impact of the issue, the log is >10MB anyway). I could then get to work on that for you. I'd rather not set up an entire cloned test environment to generate sanitized logs if I don't have to.

@TheOneRing
Member

I'm sorry for the experience.
Could you tell us the root file path of your sync on your Windows machine?
Are you using a folder sync pair (not syncing the whole tree but a subfolder)?
[image]

@TheOneRing
Member

TheOneRing commented May 7, 2021

Also without sharing the full log, could you look into your log and share some selected lines starting like
05-07 11:09:46:810 [ warning
or
05-07 11:09:46:810 [ error
(Please leave out the date when searching)
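A minimal sketch of such a targeted search, so that only the log-level field is matched and not file names that happen to contain "warning" or "error". The timestamp and `[ level` layout come from the two sample prefixes above; everything after the level in the sample lines is invented for illustration and is not real client output.

```python
import re

# Matches the level field right after the timestamp, e.g.
# "05-07 11:09:46:810 [ warning". Anchoring at the start of the line keeps
# file names such as blahblah_error.h from producing false hits.
LEVEL_RE = re.compile(
    r"^\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3} \[ (warning|error)\b"
)

def select_problem_lines(lines):
    """Return only the lines whose log level is warning or error."""
    return [line for line in lines if LEVEL_RE.match(line)]

# Hypothetical log lines; only the prefix format is taken from the thread.
sample = [
    "05-07 11:09:46:810 [ warning sync ]: something went wrong",
    "05-07 11:09:46:811 [ info sync ]: discovered blahblah_error.h",
    "05-07 11:09:46:812 [ error sync ]: another problem",
]

for line in select_problem_lines(sample):
    print(line)
```

Because the date is matched by pattern rather than literally, the same filter works for any day in the log.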

@TheWebMachine
Author

TheWebMachine commented May 7, 2021

> I'm sorry for the experience.
> Could you tell us the root file path of your sync on your Windows machine?

My bad. I thought I mentioned that. On Windows: C:\OwnCloud ; On Linux: /home//ownCloud

> Are you using a folder sync pair (not syncing the whole tree but a subfolder)?

Prior to this incident, I was syncing the full tree on all systems. Since restoring my files, I have unchecked a few large folders on one of my other systems (a laptop) to preserve space, but I am still syncing from the root of the full tree on all systems.

> Also without sharing the full log, could you look into your log and share some selected lines starting like
> 05-07 11:09:46:810 [ warning
> or
> 05-07 11:09:46:810 [ error
> (Please leave out the date when searching)

I got all my files restored and took a deep dive into the log I kept from the new machine. While the format of the file lines makes my eyes cross (lol), I found not a single warning, not a single error. The only things I find when cleanly regex searching for warning or error are files with those words in the name - I'm a fellow dev, so I have a lot of files like blahblah_error.h and such. The entire sync looks perfectly normal in the logs, save for the fact that it deleted all the files on the cloud instead of creating the placeholder files locally on the new system, since this was an initial sync with VFS turned on from the start.

I suspect the client thinks what it is doing is perfectly normal and it just isn't accounting for a scenario where VFS would be enabled before the first sync has been performed on a clean install. Instead, it is treating it like an existing sync setup where the files appear to have been deleted locally, as no placeholder or real files exist there when the sync starts.

Perhaps some check that is performed on a new sync setup without VFS enabled isn't being performed when VFS is enabled, leaving VFS entirely unaware of, and unprepared for, a new sync setup condition in the first place. What is the current outcome if you have VFS enabled and working fine, but you delete the placeholder files on a local system? Does this result in deletion from the server? Perhaps this is the condition being triggered because VFS is unaware of a new sync condition.

I haven't reviewed any client code yet, so I'm just flinging ideas off the top of my head in hopes that it might inspire someone else to a solution. If I'm way off base, I apologize.

@TheOneRing
Member

With "enabled before", do you mean this?
[image]

@TheWebMachine
Author

> With "enabled before", do you mean this?

To be perfectly honest, I can't be certain if I had...

  1. selected the "Use virtual files instead of..." initially
    -or-
  2. if I initially selected "Synchronize everything..." and then decided, as the first sync was about to start, to switch VFS on, which would have canceled the initial sync and started it over

It was rather late when I finished setting up the new system, so I can't be 100% sure either way. Blah...I may have to set up a test sync account with some data and test both ways to be sure. I probably won't have time for that for a few days yet.

That being said, I'd argue that if (1) were the case, you'd have a riot on your hands as this would be the default path to take for most folks and many others would be having this problem. So, it's more likely (2) is the case because it is the path less taken and sounds like something I would totally do...and it makes sense that this scenario wouldn't have been tested in advance by the team. Leave it to me to find the lesser known bugs! haha

@TheOneRing
Member

If breaking things is one of your skills you could apply with the qa team ;)

If it was case 2 I guess it can also be a timing thing as I'm sure we tested that scenario before, so maybe it happens only when the switch is done at a very specific point in the sync.

@gabi18 @jnweiger could you try to reproduce next week?

@jnweiger
Contributor

@TheWebMachine welcome to the team!

When you speak about empty folders on the client, one idea comes to my mind:

How does our sync algorithm decide the direction of the sync? Should it
a) sync the missing files from server to client, or
b) assume a user removed files at the client and propagate that "delete" to the server?

If b) happened to you, then it is plausible that you see no errors in the logs. It was a perfectly fine sync. Just in the wrong direction 🙈

@TheWebMachine
Author

Hi there! Yeah, it seems the direction gets flipped in this scenario somehow. Instead of realizing it needed to create placeholders from server to client, it presumed they had existed and were deleted, thereby choosing to delete them from the server, instead.

Perhaps this should be flagged as an error that needs to be resolved by the user before the sync is allowed to proceed. Similar to asking for confirmation before syncing a large folder to the client, it could instead ask about a large delete affecting more than a certain number of files. If it was properly flagging and checking for a new sync connection setup (the very first on this new Windows install), it should have known to sync placeholders down to the client. I suppose, without errors in the log (since it thinks it did the right thing), it's not going to be easy to track down what didn't happen where in order to locate the breakdown in my case.

If I can reliably reproduce this with a test account against my server this week, I will be able to provide my exact steps. However, is there perhaps a special debug flag or debugging version I can use to capture more than what the Release version's logs will show? I dev in the *nix universe and am not set up to build from source on/for Windows at the present time.
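The large-delete confirmation suggested above could be sketched roughly as follows. This is only an illustration of the idea; the function name and both thresholds are assumptions, not values or code from the ownCloud client.

```python
def needs_delete_confirmation(planned_deletes: int, total_files: int,
                              max_count: int = 100,
                              max_fraction: float = 0.5) -> bool:
    """Return True if a sync run should pause and ask the user first.

    The thresholds are illustrative defaults, not taken from any client.
    """
    if planned_deletes == 0:
        return False
    if planned_deletes > max_count:
        # An absolute cap catches mass deletions in large accounts.
        return True
    # A relative cap catches small accounts losing most of their files.
    return total_files > 0 and planned_deletes / total_files > max_fraction

# A run that would remove every one of 43,000 files must ask first.
print(needs_delete_confirmation(43000, 43000))
# Removing three files out of 43,000 proceeds silently.
print(needs_delete_confirmation(3, 43000))
```

Combining an absolute and a relative threshold means both a huge account and a tiny one get a prompt before most of their content is removed.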

@TheOneRing
Member

TheOneRing commented May 10, 2021

> Hi there! Yeah, it seems the direction gets flipped in this scenario somehow. Instead of realizing it needed to create placeholders from server to client, it presumed they had existed and were deleted, thereby choosing to delete them from the server, instead.

> Perhaps this should be flagged as an error that needs to be resolved by the user before the sync is allowed to proceed. Similar to asking for confirmation before syncing a large folder to the client, it could instead ask about a large delete affecting more than a certain number of files. If it was properly flagging and checking for a new sync connection setup (the very first on this new Windows install), it should have known to sync placeholders down to the client. I suppose, without errors in the log (since it thinks it did the right thing), it's not going to be easy to track down what didn't happen where in order to locate the breakdown in my case.

> If I can reliably reproduce this with a test account against my server this week, I will be able to provide my exact steps. However, is there perhaps a special debug flag or debugging version I can use to capture more than what the Release version's logs will show? I dev in the *nix universe and am not set up to build from source on/for Windows at the present time.

If you enable debugging in the ui you will gather more information.
[image]

Please check all boxes.

@TheOneRing
Member

> @TheWebMachine welcome to the team!

> When you speak about empty folders on the client, one idea comes to my mind:

> How does our sync algorithm decide the direction of the sync? Should it
> a) sync the missing files from server to client, or
> b) assume a user removed files at the client and propagate that "delete" to the server?

> If b) happened to you, then it is plausible that you see no errors in the logs. It was a perfectly fine sync. Just in the wrong direction 🙈

The algorithm discovers files on the server; if a file is in the local db but does not exist locally, it's a delete.
Files are added to the db only after they are created locally (except when we have a bug and, for example, don't report issues with the file creation, #8294).
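As a toy model of the rule just described (not the client's actual code), the decision for a file that exists on the server but is missing on disk depends only on whether the local db already has a record of it. The names `Action` and `decide` are made up for this sketch.

```python
from enum import Enum

class Action(Enum):
    DOWNLOAD = "create locally (file or placeholder)"
    DELETE_REMOTE = "delete on server"
    NONE = "nothing to do"

def decide(on_server: bool, in_local_db: bool, on_disk: bool) -> Action:
    """Toy model of the rule above; covers only the missing-on-disk case."""
    if on_server and not on_disk:
        # A db record means the file was synced before, so its absence on
        # disk is read as a deliberate local deletion by the user.
        return Action.DELETE_REMOTE if in_local_db else Action.DOWNLOAD
    return Action.NONE

# Fresh install: empty db, nothing on disk, so the file is fetched.
print(decide(on_server=True, in_local_db=False, on_disk=False))
# Db record exists but the local file is gone: the delete is propagated.
print(decide(on_server=True, in_local_db=True, on_disk=False))
```

Under this model, a mass deletion on a fresh install is only possible if db entries exist for files that were never created on disk, which is why the db-population bug referenced above is relevant.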

@gabi18
Contributor

gabi18 commented May 10, 2021

> If breaking things is one of your skills you could apply with the qa team ;)

> If it was case 2 I guess it can also be a timing thing as I'm sure we tested that scenario before, so maybe it happens only when the switch is done at a very specific point in the sync.

> @gabi18 @jnweiger could you try to reproduce next week?

So far I couldn't reproduce the problem.

I have tested with a freshly installed Win10 Pro (VirtualBox) and a newly installed client 2.7.6.
In scenario 1, with "Use virtual files instead of..." selected initially, all folders and files are synced correctly (virtual).

Starting with "Synchronize everything..." and then switching to VFS has also worked without problems so far.
I will concentrate on scenario 2 and try to find out where the critical point of switching to VFS could be.

@TheWebMachine
Author

TheWebMachine commented May 10, 2021

> If you enable debugging in the ui you will gather more information.
> Please check all boxes.

Yeah, my sign to go get some sleep. So used to the Linux client I always forget about the extra logging button on the Windows client.

> The algorithm discovers files on the server; if a file is in the local db but does not exist locally, it's a delete.
> Files are added to the db only after they are created locally (except when we have a bug and, for example, don't report issues with the file creation, #8294).

Reading that issue, it sounds similar to what might have happened here. Perhaps they are related? You may recall me mentioning that I may have switched on VFS after the initial sync started, which canceled the first sync...but I have no idea exactly which stage of that initial sync it was in when it was aborted to make the config change. (As an aside, following those breadcrumbs just now led me to JanAkermann's owncloud-restore-trash code, which might have made my life easier last week haha)

> So far I couldn't reproduce the problem.

> I have tested with a freshly installed Win10 Pro (VirtualBox) and a newly installed client 2.7.6.
> In scenario 1, with "Use virtual files instead of..." selected initially, all folders and files are synced correctly (virtual).

> Starting with "Synchronize everything..." and then switching to VFS has also worked without problems so far.
> I will concentrate on scenario 2 and try to find out where the critical point of switching to VFS could be.

Given my total store under that account is over 100GiB, it definitely would have never been done with an initial sync when switching to VFS under Scenario 2. In fact, it was almost certainly not done with the initial parsing of the file structure...it likely never downloaded a single file before being switched over to VFS. The more I think about it, Scenario 1 is probably a dead end. Pretty sure I initially set it up and then fairly quickly went "oh crap, I don't need ALL these files on a single-purpose box" and toggled the VFS option on, it mentioned aborting the current sync, I agreed, then went on with my night thinking all was well.

@TheOneRing
Member

> > If you enable debugging in the ui you will gather more information.
> > Please check all boxes.

> Yeah, my sign to go get some sleep. So used to the Linux client I always forget about the extra logging button on the Windows client.

> > The algorithm discovers files on the server; if a file is in the local db but does not exist locally, it's a delete.
> > Files are added to the db only after they are created locally (except when we have a bug and, for example, don't report issues with the file creation, #8294).

> Reading that issue, it sounds similar to what might have happened here. Perhaps they are related? You may recall me mentioning that I may have switched on VFS after the initial sync started, which canceled the first sync...but I have no idea exactly which stage of that initial sync it was in when it was aborted to make the config change. (As an aside, following those breadcrumbs just now led me to JanAkermann's owncloud-restore-trash code, which might have made my life easier last week haha)

Hm, I hope not, because that bug is supposed to be long gone.

> > So far I couldn't reproduce the problem.
> > I have tested with a freshly installed Win10 Pro (VirtualBox) and a newly installed client 2.7.6.
> > In scenario 1, with "Use virtual files instead of..." selected initially, all folders and files are synced correctly (virtual).
> > Starting with "Synchronize everything..." and then switching to VFS has also worked without problems so far.
> > I will concentrate on scenario 2 and try to find out where the critical point of switching to VFS could be.

> Given my total store under that account is over 100GiB, it definitely would have never been done with an initial sync when switching to VFS under Scenario 2. In fact, it was almost certainly not done with the initial parsing of the file structure...it likely never downloaded a single file before being switched over to VFS. The more I think about it, Scenario 1 is probably a dead end. Pretty sure I initially set it up and then fairly quickly went "oh crap, I don't need ALL these files on a single-purpose box" and toggled the VFS option on, it mentioned aborting the current sync, I agreed, then went on with my night thinking all was well.

Switching on VFS should actually have no effect; as I mentioned, only files created on the hard disk are added to the database.

@gabi18
Contributor

gabi18 commented May 10, 2021

The issue is still not reproducible on my system (Win10 20H2 Build 19042.508).
When switching to 'VFS on' while a sync is already running, an error message about stopping the sync appears (and disappears automatically). After that, virtual syncing is successful.

@TheWebMachine
Author

I'll begin trying to reproduce on Wednesday and report back.

Which server version were you working against? I am still on 10.6.0, as I usually hold back on new server versions for a few months unless a major sec patch is needed.

@gabi18
Contributor

gabi18 commented May 11, 2021

> I'll begin trying to reproduce on Wednesday and report back.

Thanks a lot for your feedback and testing!

> Which server version were you working against? I am still on 10.6.0, as I usually hold back on new server versions for a few months unless a major sec patch is needed.

For yesterday's tests I connected to a 10.7.0 server.
I have just retested against server version 10.6.0. Both scenarios worked without problems.

@manuell1986

Yesterday I was struggling with the same problem:
I upgraded two existing ownCloud clients (on Windows machines) from 2.7.6 to 2.8.1, no problems.
I then installed a NEW third ownCloud client, also on a Windows machine, and used it with the same user account, and lo and behold, the client secretly deleted 16,000 files in the background. In my case too, the folder structure remained!

Fortunately, with a lot of patience and a mouse bot, I was able to restore the data (each file individually!) from the recycle bin. The restore feature should definitely be improved.

We use the ownCloud client in conjunction with a Nextcloud server.

I hope they find the bug soon, it's anything but funny to restore so many files by hand :(

@TheOneRing
Member

@manuell1986 while we don't provide QA or official support for Nextcloud, the behaviour is quite unfortunate.
Can you provide information about the machine on which the sync failed?
What exact version of Windows?
Are virtual files enabled?
Were you syncing the root folder or a sub folder?

@jnweiger
Contributor

jnweiger commented Jun 1, 2021

@manuell1986 do you also use NFS storage on the server side?

@manuell1986

@TheOneRing, I see. The machine's Windows version is 20H2 (Build 19042.985), virtual files were enabled and, as far as I can remember, I synced a subfolder. Hope this helps a bit to find the bug :)

@jnweiger I can't say for sure, because the server is maintained by the university, but I think not.

@github-actions

github-actions bot commented Jul 2, 2021

This issue was marked stale because it has been open for 30 days with no activity. Remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jul 2, 2021
@TheWebMachine
Author

Anything new on this? I haven't had time to fire up another server to test this, as I've been swamped with work. I might have time next week.

@github-actions

github-actions bot commented Aug 7, 2021

This issue was marked stale because it has been open for 30 days with no activity. Remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 7, 2021
@github-actions

The issue was marked as stale for 7 days and closed automatically.
