Nextcloud-Client creating conflicts when it should not #2467

Closed
Bockeman opened this issue Sep 23, 2020 · 190 comments

Comments

@Bockeman

Expected behaviour

Conflicts should only be created when more than one client uploads the same file to the server with different contents.

Actual behaviour

I have several Nextcloud clients (version 3.0.1), all running on Win10 machines.
One of the sync folders is /AppData/Roaming/Mozilla; this is an "active" folder tree with several files changing continuously. But I have only one Mozilla Firefox browser open.

I am getting several conflicts generated every hour or so.

Steps to reproduce

  1. Add a sync folder pointing at a directory where the contents are changing continuously.
  2. Wait for conflicts!

Client configuration

Client version: 3.0.1
Operating system: Win10
OS language: English

Server configuration

Nextcloud version: 19.0.3
Storage backend (external storage): No external storage. Files on server are NFS mounted (actually gluster).

Logs

The following are generated from scripts running on a linux server with the Win10 machine mounted under
/mnt/veriulan/d/Data/BobW
and show 4 separate occasions (with dates shown) when conflicts were generated:

Compare sizes and dates

-rwxr-xr-x 1 bobw warren 254622 2020-09-23 16:51 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery (conflicted copy 2020-09-23 165141).jsonlz4
-rwxr-xr-x 1 bobw warren 254616 2020-09-23 16:51 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery.jsonlz4

2020-09-23 17:01:43
2020-09-23 17:01:55

Compare sizes and dates

-rwxr-xr-x 1 bobw warren 2514944 2020-09-23 17:20 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/storage/default/https+++web.whatsapp.com/idb/3166453069wcaw (conflicted copy 2020-09-23 172056).sqlite
-rwxr-xr-x 1 bobw warren 2514944 2020-09-23 17:29 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/storage/default/https+++web.whatsapp.com/idb/3166453069wcaw.sqlite

2020-09-23 17:29:59
2020-09-23 17:30:04

Compare sizes and dates

-rwxr-xr-x 1 bobw warren 2514944 2020-09-23 17:36 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/storage/default/https+++web.whatsapp.com/idb/3166453069wcaw (conflicted copy 2020-09-23 173605).sqlite
-rwxr-xr-x 1 bobw warren 114688 2020-09-23 17:56 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/storage/default/https+++web.whatsapp.com/idb/3166453069wcaw.sqlite

-rwxr-xr-x 1 bobw warren 2514944 2020-09-23 17:58 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/storage/default/https+++web.whatsapp.com/idb/3166453069wcaw (conflicted copy 2020-09-23 175858).sqlite
-rwxr-xr-x 1 bobw warren 114688 2020-09-23 17:56 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/storage/default/https+++web.whatsapp.com/idb/3166453069wcaw.sqlite

2020-09-23 19:39:10
2020-09-23 19:39:16

Compare sizes and dates

-rwxr-xr-x 1 bobw warren 261050 2020-09-23 20:10 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery (conflicted copy 2020-09-23 201018).baklz4
-rwxr-xr-x 1 bobw warren 264483 2020-09-23 20:45 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery.baklz4

-rwxr-xr-x 1 bobw warren 261112 2020-09-23 20:10 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery (conflicted copy 2020-09-23 201033).jsonlz4
-rwxr-xr-x 1 bobw warren 264488 2020-09-23 20:45 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery.jsonlz4

-rwxr-xr-x 1 bobw warren 262513 2020-09-23 20:18 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery (conflicted copy 2020-09-23 201853).jsonlz4
-rwxr-xr-x 1 bobw warren 264488 2020-09-23 20:45 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery.jsonlz4

-rwxr-xr-x 1 bobw warren 262061 2020-09-23 20:22 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery (conflicted copy 2020-09-23 202248).jsonlz4
-rwxr-xr-x 1 bobw warren 264488 2020-09-23 20:45 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery.jsonlz4

-rwxr-xr-x 1 bobw warren 264463 2020-09-23 20:38 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery (conflicted copy 2020-09-23 203812).baklz4
-rwxr-xr-x 1 bobw warren 264483 2020-09-23 20:45 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery.baklz4

-rwxr-xr-x 1 bobw warren 264409 2020-09-23 20:38 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery (conflicted copy 2020-09-23 203834).jsonlz4
-rwxr-xr-x 1 bobw warren 264488 2020-09-23 20:45 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/sessionstore-backups/recovery.jsonlz4

-rwxr-xr-x 1 bobw warren 2523136 2020-09-23 20:08 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/storage/default/https+++web.whatsapp.com/idb/3166453069wcaw (conflicted copy 2020-09-23 200812).sqlite
-rwxr-xr-x 1 bobw warren 2531328 2020-09-23 20:45 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/storage/default/https+++web.whatsapp.com/idb/3166453069wcaw.sqlite

2020-09-23 20:46:07
2020-09-23 20:46:13

@edmael

edmael commented Sep 24, 2020

Same is happening here with Win10, Nextcloud Desktop 3.0.1 and Nextcloud Server 13.0.5.

@codewichtel

Same is happening on Win10, Nextcloud Desktop 3.0.1 and NC Server 16, 17, 18 and 19.

@Bockeman
Author

Another example conflict. This conflict should not happen. The files changing on the client are due to activity. The only files changing on the server are the nextcloud-client uploads to the server.

image

As a separate issue, "Click for details" has no effect.
#1195

A debug log is here
nextcloud_200929_1637.log

@rasos

rasos commented Dec 29, 2020

After several conflicted-copy incidents we observed seemingly random timestamp changes on some files on the storage backend. By setting incrond to watch some frequently affected files (including files that are of no interest to Nextcloud syncing, such as .htaccess) and report any changes to syslog, we could rule out that the server itself was touching any files.

Our storage backend (Hetzner Storage Box) is mounted via sshfs, which turned out to be faster and more reliable than Samba/CIFS (which had problems with directory names ending in spaces). We also mount it read-only (!) on a second server to run backups every second day, which was roughly the frequency of the changed file timestamps. Since pausing the backup process we have not seen any more conflicted files. Once, when reading a file with the less command on the Nextcloud server, we observed that this also changed the timestamp, but we have not yet been able to reproduce this.

Since @Bockeman's problem also occurs with files that are NFS-mounted from gluster, we should look more closely at the reliability of storage backends.

@pandiloko

I'm having this problem with the Android client and a Docker server installation (20.0.4). Sorry, I don't understand the nature of the problem, but is it not possible to compute an MD5 sum in case of conflict, to let the user know that the changes relate to a date (access, creation, whatever) and not to content? Even an option to ignore date changes would be fine.
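Until something like this exists in the client, the content comparison can be done by hand. The following is a minimal sketch, not part of any Nextcloud client: the function names are hypothetical, and it assumes the "name (conflicted copy YYYY-MM-DD HHMMSS).ext" naming pattern visible in the listings earlier in this thread. It uses cmp for a byte-for-byte comparison rather than an MD5 sum, which answers the same question.

```shell
# Hypothetical helpers for illustration only (not part of Nextcloud).

# Map "name (conflicted copy 2020-09-23 165141).ext" back to "name.ext".
original_name() {
  printf '%s\n' "$1" |
    sed -E 's/ \(conflicted copy [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{6}\)//'
}

# Report whether a conflicted copy is byte-for-byte identical to its original.
compare_conflict() {
  local conflict="$1" original
  original="$(original_name "$conflict")"
  if [ "$original" = "$conflict" ]; then
    echo "not-a-conflict"        # name does not match the assumed pattern
  elif [ ! -f "$original" ]; then
    echo "missing-original"
  elif cmp -s "$conflict" "$original"; then
    echo "identical"             # only metadata can differ
  else
    echo "different"
  fi
}
```

Calling `compare_conflict` on each conflicted copy separates pairs that differ only in metadata from pairs with genuinely different content. Several pairs in the listings earlier in this thread have the same size, which is where such a check would be most informative.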

@rojaro

rojaro commented Jan 17, 2021

I am having this issue with my wife's Windows 10 laptop quite often (at least a dozen times within the last 12 months):

  • Client and server machines use the same NTP server.
  • Windows client version is 3.1.1
  • Server version is 20.0.5
  • There is always a newer and an older file.
  • Usually the newer file is on the client, rarely on the server.
  • The difference between the timestamps is usually just a few minutes, but often also a day or two.
  • More often than not, the changes have been made a few months in the past.
  • All files have been uploaded to the server by the very same client and user (no other clients/users have access).
  • Storage on the server resides on vanilla Ext4 filesystem which is exclusive to Nextcloud (No other services are using the storage space)

Conflict resolution is also really annoying as I can only resolve one file at a time:

  • If I choose to keep a local or remote file the client starts synchronizing, which takes some time (~ 10 seconds).
  • If I try to resolve another file while the synchronization is still running, the client will reproducibly crash and I have to restart it manually.

Once I've finished manually resolving the conflicts everything seems to be fine for some time, until my wife calls me again because the conflicts are back.

No problems with our Linux and Android clients ...

@rasos

rasos commented Jan 17, 2021

We were able to pin the problem down to an external storage that was changing the wrong timestamp. Instead of atime (when a backup was reading files) it always changed mtime and ctime. Run the stat command on any file and check whether the timestamps change correctly, even if you just read the file. We have now built our own storage system and are migrating the data.

@er-vin
Member

er-vin commented Jan 18, 2021

> We could nail down the problem with an external storage that was changing the wrong timestamp. Instead of atime (when a backup was reading files) it always changed mtime and ctime. Check it with the stat command on any file and see if that is changed correctly, even if you just read. We've built now our own storage system and are just migrating data.

I advise everyone here to check this kind of thing. Indeed, with an external storage behaving that way there's nothing else the client can do but report conflicts.
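The check described above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name is hypothetical, and it relies on GNU coreutils stat being available on the machine that mounts the storage. In the case reported above the change was triggered by reads from a second host, so a local read may not reproduce it.

```shell
# Hypothetical helper: print "changed" if merely reading FILE alters its
# modification time or status-change time, else "unchanged".
# Uses GNU stat: %y is the modification time, %z the status-change time.
read_changes_timestamps() {
  local file="$1" before after
  before="$(stat -c '%y|%z' "$file")"
  cat "$file" > /dev/null            # a pure read; should only touch atime
  after="$(stat -c '%y|%z' "$file")"
  if [ "$before" = "$after" ]; then
    echo "unchanged"
  else
    echo "changed"
  fi
}
```

Run it against a file inside the Nextcloud data directory on the storage in question. A result of "changed" would point at the storage backend rather than at the sync client.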

@github-actions

This bug report did not receive an update in the last 4 weeks. Please take a look again and update the issue with new details, otherwise the issue will be automatically closed in 2 weeks. Thank you!

@github-actions github-actions bot added the stale label Feb 25, 2021
@Bockeman
Author

I'm now on Nextcloud Client version 3.1.3 and I am still getting conflicts.

The general case appears to be when client files change during a Nextcloud sync, with one client and (obviously) one server. In my opinion, if there is only one client, there should never be any conflicts under any circumstances, whether files change during the sync process or there are network drop-outs. This should all be managed by the local and server database files, with appropriate checksums to cover potential network issues.

My experience continues to be that Nextcloud Client is creating conflicts when it should not.
(Though, in fairness, such conflicts have been occurring less frequently).

@ikogan

ikogan commented Feb 25, 2021

I'm even seeing this issue when using the built in markdown editor in Nextcloud. I did stat on a file:

  File: CC Transition Plan.md
  Size: 6906            Blocks: 29         IO Block: 7168   regular file
Device: 4ah/74d Inode: 329327      Links: 1
Access: (0700/-rwx------)  Uid: (1234/   kogan)   Gid: (1234/   kogan)
Access: 2021-02-25 13:53:08.275518704 -0500
Modify: 2021-02-25 14:39:43.303202466 -0500
Change: 2021-02-25 14:39:43.303202466 -0500
 Birth: -

Then I save the file in the nextcloud gui and run stat again:

  File: CC Transition Plan.md
  Size: 6905            Blocks: 29         IO Block: 7168   regular file
Device: 4ah/74d Inode: 329327      Links: 1
Access: (0700/-rwx------)  Uid: (1234/   kogan)   Gid: (1234/   kogan)
Access: 2021-02-25 13:53:08.275518704 -0500
Modify: 2021-02-25 14:40:00.383787486 -0500
Change: 2021-02-25 14:40:00.383787486 -0500
 Birth: -

A few seconds later, the GUI will tell me the file was edited "outside of the editor". I then stat the file again:

  File: CC Transition Plan.md
  Size: 6905            Blocks: 29         IO Block: 7168   regular file
Device: 4ah/74d Inode: 329327      Links: 1
Access: (0700/-rwx------)  Uid: (1234/   kogan)   Gid: (1234/   kogan)
Access: 2021-02-25 13:53:08.275518704 -0500
Modify: 2021-02-25 14:40:00.383787486 -0500
Change: 2021-02-25 14:40:00.383787486 -0500
 Birth: -

When I looked down at my local clock, it read 14:40 as well. I don't know the exact number of milliseconds but this does not seem to be working as intended. These files are indeed on an "External SSH" share.

@sunjam

sunjam commented Feb 25, 2021 via email

@github-actions github-actions bot removed the stale label Feb 26, 2021
@pandiloko

pandiloko commented Feb 28, 2021

I always had problems with Autoupload. It just doesn't work reliably. I really don't understand why this is happening. The only thing that should work, the basis and main focus of Nextcloud, should be a secure and reliable sync, in my opinion. Apparently it isn't. I don't care about all the fancy apps and bells and whistles. Or maybe I do, but first and foremost sync MUST work. Always.

If I disable Autoupload and enable it again, it actually re-uploads the whole library. The files don't retain the original creation date, and files_versions saves a copy of each file which is byte for byte the same as the re-uploaded one. I have the feeling the solution might be an md5sum (or blake2 or whatever) away, because it is almost working but just not yet.

Sorry for this heated feedback. I honestly greatly appreciate this project and someday it will be great, but today a 644MB video would not upload, so I did the disable/enable trick again, and while I was watching it re-upload my 6000+ photos and videos I had enough. I pulled the plug on my Nextcloud instance and went back to seafile, which doesn't have all the fancy features but does the sync right (creation date is also not preserved, though).

@github-actions

This bug report did not receive an update in the last 4 weeks. Please take a look again and update the issue with new details, otherwise the issue will be automatically closed in 2 weeks. Thank you!

@Bockeman
Author

With respect, I still think the synchronization algorithm needs further refinement.

The mode of operation is one client which is actively and regularly updating files, and one server with no activity on the directories being synchronised. Other clients are being updated, but this is strictly read-only. Thus NC is effectively working as a live backup from the client to the server with a distributed backup to other clients. Any file on the client may change or disappear during synchronization.

In this scenario there should never be any conflicts. Conflicts that do arise are therefore because of a problem with the synchronization algorithm or implementation. As with any distributed system, other independent network or disk activity may mean that response times are sometimes longer than usual (with possible timeouts), but the synchronization algorithm should handle this gracefully with retries, or at worst an "unable to sync" error. This scenario should never cause conflicts, yet NC continues to do so.
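To make the argument concrete, here is a deliberately simplified model of the decision a two-way sync has to make for one file. It is an assumption for illustration only and not the actual Nextcloud client code: the function name is hypothetical, and each argument stands for an opaque version marker (for example an mtime or an etag) recorded at the last successful sync or observed now.

```shell
# Hypothetical model, not Nextcloud code.
# classify_sync LOCAL_LAST LOCAL_NOW REMOTE_LAST REMOTE_NOW
classify_sync() {
  local local_changed=no remote_changed=no
  if [ "$1" != "$2" ]; then local_changed=yes; fi
  if [ "$3" != "$4" ]; then remote_changed=yes; fi
  case "$local_changed/$remote_changed" in
    no/no)   echo "in-sync" ;;
    yes/no)  echo "upload" ;;
    no/yes)  echo "download" ;;
    yes/yes) echo "conflict" ;;   # both sides moved since the last sync
  esac
}
```

In this model a single active client should only ever reach `upload`. The `conflict` branch can still be taken if the server-side marker moves without a real edit, as with the storage backend reported earlier in this thread that changed mtime on read, or if the recorded markers are stale.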

Below are some recent examples. Notice the dates and frequency; more than one conflict per week.

-rwxr-xr-x 1 bobw warren 50151 2021-03-26 11:02 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Thunderbird/Profiles/rdvpq1tu.default/prefs (conflicted copy 2021-03-26 110259).js
-rwxr-xr-x 1 bobw warren 50151 2021-03-26 11:05 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Thunderbird/Profiles/rdvpq1tu.default/prefs.js

-rwxr-xr-x 1 bobw warren 53 2021-03-26 11:03 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Thunderbird/Profiles/rdvpq1tu.default/sessionCheckpoints (conflicted copy 2021-03-26 110301).json
-rwxr-xr-x 1 bobw warren 204 2021-03-26 11:01 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Thunderbird/Profiles/rdvpq1tu.default/sessionCheckpoints.json

-rwxr-xr-x 1 bobw warren 524288 2021-03-26 11:09 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Thunderbird/Profiles/rdvpq1tu.default/cookies (conflicted copy 2021-03-26 110923).sqlite
-rwxr-xr-x 1 bobw warren 524288 2021-03-26 11:01 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Thunderbird/Profiles/rdvpq1tu.default/cookies.sqlite

-rwxr-xr-x 1 bobw warren 8826 2021-03-26 11:09 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Thunderbird/Profiles/rdvpq1tu.default/Mail/pop3.blueyonder.co.uk/Inbox (conflicted copy 2021-03-26 110923).msf
-rwxr-xr-x 1 bobw warren 7255 2021-03-26 11:01 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Thunderbird/Profiles/rdvpq1tu.default/Mail/pop3.blueyonder.co.uk/Inbox.msf

-rwxr-xr-x 1 bobw warren 1173870 2021-03-26 11:09 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Thunderbird/Profiles/rdvpq1tu.default/panacea (conflicted copy 2021-03-26 110923).dat
-rwxr-xr-x 1 bobw warren 1173870 2021-03-26 11:01 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Thunderbird/Profiles/rdvpq1tu.default/panacea.dat

-rwxr-xr-x 1 bobw warren 5242880 2021-03-26 11:09 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Thunderbird/Profiles/rdvpq1tu.default/places (conflicted copy 2021-03-26 110923).sqlite
-rwxr-xr-x 1 bobw warren 5242880 2021-03-26 11:01 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Thunderbird/Profiles/rdvpq1tu.default/places.sqlite

-rwxr-xr-x 1 bobw warren 1048576 2021-04-04 01:13 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/cookies (conflicted copy 2021-04-04 011354).sqlite
-rwxr-xr-x 1 bobw warren 1048576 2021-04-04 01:18 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/cookies.sqlite

-rwxr-xr-x 1 bobw warren 11206656 2021-04-06 00:19 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/webappsstore (conflicted copy 2021-04-06 001936).sqlite
-rwxr-xr-x 1 bobw warren 11206656 2021-04-05 00:21 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/webappsstore.sqlite

-rwxr-xr-x 1 bobw warren 10485760 2021-04-08 20:40 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/places (conflicted copy 2021-04-08 204009).sqlite
-rwxr-xr-x 1 bobw warren 5242880 2021-04-08 20:40 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/places.sqlite

-rwxr-xr-x 1 bobw warren 11206656 2021-04-08 20:40 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/webappsstore (conflicted copy 2021-04-08 204010).sqlite
-rwxr-xr-x 1 bobw warren 11206656 2021-04-08 15:05 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/webappsstore.sqlite

-rwxr-xr-x 1 bobw warren 1048576 2021-04-10 01:23 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/cookies (conflicted copy 2021-04-10 012324).sqlite
-rwxr-xr-x 1 bobw warren 1048576 2021-04-09 08:04 /mnt/veriulan/d/Data/BobW/AppData/Roaming/Mozilla/Firefox/Profiles/6qw6a8eb.default/cookies.sqlite

@github-actions

This bug report did not receive an update in the last 4 weeks. Please take a look again and update the issue with new details, otherwise the issue will be automatically closed in 2 weeks. Thank you!

@github-actions github-actions bot added the stale label May 10, 2021
@Bockeman
Author

I upgraded to client 3.2.1.
Same issue is still present.

Can I provide more information or other assistance in order for this bug to at least get some attention?

@github-actions github-actions bot removed the stale label May 11, 2021
@FlexW FlexW added the approved bug approved by the team label May 12, 2021
@tweiner

tweiner commented Apr 7, 2022

Just to pile on with everyone else.

I have a Nextcloud server running in a jail on a TrueNAS (TrueNAS-12.0-U8) server. My machines are Windows 10. I use Nextcloud to sync between my desktop and notebook. I never have the same files open on both machines at the same time.

The client and server are up to date. I use Excel files a lot. When I have one open I start getting the "Conflict" message many, many times. This is new with the last major server update. I do not know if it is a coincidence or happenstance.

But it is really frustrating.

Thanks.

@Bockeman
Author

I have been following up on the comments that @wonx has contributed, particularly that he has one configuration for which he has seen no false conflicts to date.

It seems that there are several configuration settings that make a difference, and what seems to have the most effect is the opcache settings suggested by @vkahl above:
#2467 (comment)

Indeed, when I replicate these settings on my test system (where data is stored on local memory of the server), my rate of false conflicts drops significantly.

opcache.enable=1
opcache.memory_consumption=512
opcache.interned_strings_buffer=64
opcache.max_accelerated_files=100000
opcache.max_wasted_percentage=15
opcache.validate_timestamps=0
opcache.revalidate_freq=0
opcache.save_comments=1
opcache.fast_shutdown=1
opcache.mmap_base=0x20000000
opcache.file_cache_fallback=1

After 15 hours I got 8 false conflicts with a sleep 5 in my script

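# Each pass of the j loop performs 30 transactions (10 files x copy, move, rewrite),
# so one pass of the i loop is roughly 10,000 transactions.
for i in {0..1000}
do
  echo -n "  transaction ${i}0,000"
  date +\ \ %F\ %T.%N
  for j in {0..10000..30}
  do
    for k in {0..9}
    do
      cp -a listing_{,2}${k}.txt    # copy listing_K.txt to listing_2K.txt
      mv listing_{2,1}${k}.txt      # move listing_2K.txt to listing_1K.txt
      # replace listing_K.txt with fresh content
      (echo "listing ${i},${j}${k}"; ls -a --full-time; date +%F\ %T.%N\%n;) > listing_${k}.txt
    done
    sleep 5                         # pause before the next batch of changes
  done
  # count the conflicted copies and show the most recently modified one
  ls -ltr | awk '/conflicted/{c++;l=$0}END{if(c){print " ",c,"false conflicts, latest =",l}}'
done
date +\ \ %F\ %T.%N\%n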

If the sleep is less than 5, then the server does not have time to "catch up" and the rate of false conflicts drops. For systems where the access times to the server storage is slow, you may have to increase this sleep time.

To quantify the rate of false conflicts, I use two measures:

  1. ppm or parts per million: the estimated number of false conflicts that would occur in one million transactions. This is why the step in the j loop above is 30: 3 transactions (a copy, a move, a replace) for each of the 10 k loop iterations.
  2. MTBF or mean time between failures. This is, on average, how long you might have to wait before/between seeing any false conflicts. This is biased by the sleep time, so only makes sense for comparisons when using the above script with the same sleep time. It does not give any indication of how frequently one might expect false conflicts to occur in a production system.
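The two measures above amount to simple arithmetic, sketched here as a hypothetical helper (the function name and argument order are assumptions, not part of the test script): ppm is conflicts divided by transactions, scaled to one million, and MTBF is elapsed hours divided by conflicts.

```shell
# Hypothetical helper for illustration only.
# conflict_rates CONFLICTS TRANSACTIONS HOURS
conflict_rates() {
  awk -v c="$1" -v t="$2" -v h="$3" 'BEGIN {
    if (c == 0) { print "0ppm MTBF=n/a"; exit }   # no failures observed yet
    printf "%.0fppm MTBF=%.1fhours\n", c / t * 1000000, h / c
  }'
}
```

For the run above, 8 false conflicts in 15 hours give 15 / 8, or about 1.9 hours between failures. The ppm figure additionally needs the number of transactions actually completed, which has to be taken from the script's progress output.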

Using the above configuration I get 240ppm MTBF=2hours.
Whereas previously I have been getting results like 650ppm, 790ppm or more.
I conclude, getting the correct opcache and other configuration settings has a significant impact, but does not solve the fundamental problem that in this scenario (single client active) Nextcloud should not generate any conflicts.

However, this configuration does not make sense to me.
https://www.php.net/manual/en/opcache.configuration.php#ini.opcache.validate-timestamps

opcache.validate_timestamps=0
opcache.revalidate_freq=0            This configuration directive is ignored if opcache.validate_timestamps is disabled. 
opcache.fast_shutdown=1              Removed in PHP 7.2.0
opcache.file_cache_fallback=1        Windows only.

In other words, if a recommended configuration is not self-consistent, how can anyone rely upon it?

So I adjusted my configuration as follows:

  opcache.validate_timestamps=1
  opcache.revalidate_freq=100

I ran for 138 hours and got 16 false conflicts, that is 13ppm MTBF=8.6 hours.

@Bockeman
Author

Bockeman commented Apr 12, 2022

In an attempt to identify which of the opcache configuration settings has the most effect, I ran a control test:

opcache.enable=0
opcache.enable_cli=0

To my complete surprise, I have not, to date, seen any false conflicts. This test is still running. Even if I find no false conflicts after a week or so, that does not mean the problem has been solved (because, as demonstrated, changes in configuration affect the rate at which false conflicts occur, they do not address the fundamental problem).

But this does suggest a viable temporary workaround for those that are suffering from false conflicts.

Please could anyone who finds a false conflict with opcache disabled report here. Conversely, if this proves effective for you by dramatically reducing false conflicts, please upvote this comment (see the smiley on the strapline for this comment).

@Bockeman
Author

Bockeman commented Apr 12, 2022

Does disabling opcache have any noticeable effect on performance?

https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/caching_configuration.html

"You can significantly improve your Nextcloud server performance with memory caching, where frequently-requested objects are stored in memory for faster retrieval."

"A PHP opcache stores compiled PHP scripts so they don't need to be re-compiled every time they are called."

"Data caching is supplied by the user (APCu), Memcached or Redis."

So the opcache affects performance where a significant amount of time is spent processing PHP scripts and where that same PHP script is used repeatedly.

Experience on my production system suggests that page render times are dominated by the volume of data being fetched. Even on pages that I imagine might involve significant PHP processing (such as using the picoCMS App), I cannot discern any difference in page render times with opcache either enabled or disabled.

The very first time on a given machine, after a reboot, or similar, page render times can be longer, and I put this down to fetching data that can be held in the local browser cache. The following are a sample of results that I repeated several times and after switching opcache on and off several times. Times are very approximate because of human deviations in recording button press time and observing page fully rendered.

login                              5s
logout                             5s
visit a different folder           1s
visit a different picoCMS page    10s

EDIT: My mistake, I had changed /etc/php.ini instead of /etc/php.d/10-opcache.ini. Now there is a difference when opcache is disabled: visiting different folders is slower, perhaps 2s or even 3s. But I'd rather suffer that than false conflicts. I've retained the rest of this comment for historical completeness, though it is now obviously no longer quite so pertinent.

Subjectively, it feels like performance is slightly faster when opcache is disabled. That would make sense because no time is wasted either searching for entries in the opcache or filling the opcache with newly compiled scripts/objects.

Can anyone justify enabling opcache?
Please upvote this comment if you agree that disabling opcache has no discernible negative effect on performance.

@sunjam

sunjam commented Apr 12, 2022

opcache is a recommended part of setting up nextcloud. See the documentation here.

@Bockeman
Author

Bockeman commented Apr 12, 2022

@sunjam I'm glad you've joined the discussion.

> opcache is a recommended part of setting up nextcloud. See the documentation here.

Is that really true? Is opcache actually recommended? On the page you referenced, there is a link to the page to which I referred which goes into more detail about opcache for scripts/objects and how data is handled independently. It states:

> A memcache is not required and you may safely ignore the warning if you prefer.

Now it is not that I want to disagree with you, but rather I really want to understand why something might be recommended (and in this case, the documentation is, let's say "unclear", if not contradictory and confusing).

  1. I could not observe any difference in performance whether opcache was enabled or disabled
  2. An unwanted artefact (false conflicts) appears to have a viable workaround when opcache is disabled

So from my perspective, I would say that the recommendation should be that opcache is disabled.

Please could someone provide reasoned arguments as to why opcache might be recommended or not.

@wonx

wonx commented Apr 12, 2022

@Bockeman
I tried disabling opcache to see if anything changed. (And I checked that it was indeed disabled using phpinfo())

opcache.enable=0
opcache.enable_cli=0

In my case, I don't see any difference:

  • On SFTP external folders I get consistent conflicts the second time a file is modified, no exceptions.
  • On local storage and on SMB external folders I don't get conflicts, unless a file was already present as a virtual file (.nextcloud), then downloaded (sync'd) and then modified.

So, just like before.

@Bockeman
Author

I may have to eat an awful lot of words!

How do you ensure php.ini is honoured? How do you check if php has those settings?

On my production system, I tried

systemctl restart redis mariadb httpd php-fpm
egrep '^opcache\.enable' /etc/php.ini
  opcache.enable=0
  opcache.enable_cli=0
php -i | egrep opcache\.enable
  opcache.enable => On => On
  opcache.enable_cli => On => On
  opcache.enable_file_override => Off => Off
php -r "phpinfo();" | egrep opcache\.enable
  opcache.enable => On => On
  opcache.enable_cli => On => On
  opcache.enable_file_override => Off => Off

but, as you see, opcache still appears to be enabled.

What am I doing wrong?
(on my test system, I do get

opcache.enable => Off => Off
opcache.enable_cli => Off => Off
opcache.enable_file_override => Off => Off

)

@wonx

wonx commented Apr 12, 2022

Ok, in my installation, setting opcache.enable=0 and opcache.enable_cli=0 in php.ini did nothing. In my case, I had to add these two lines in a file in the conf.d subfolder:
/etc/php7/conf.d/00_opcache.ini
(and restart the server, of course)
But that might be specific to my docker installation. Just make sure there isn't any other configuration file that overrides php.ini

I checked whether it was enabled by creating a file in Nextcloud's folder (e.g. info.php) with the following inside:

<?php
  phpinfo();
?>

and then browsing to https://yourserver.com/info.php and checking the Zend OPcache section, where you will see something like opcache.enable | On. But I guess it's equivalent to checking the PHP info from the command line as you did.
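For the command-line route, a small filter makes the effective value easier to read. This is a sketch under assumptions: the function name is hypothetical, and it expects the "directive => local value => master value" line format shown in the `php -i` output earlier in this thread.

```shell
# Hypothetical helper: read `php -i` style text on stdin and print the
# effective (local) value of one directive.
ini_local_value() {
  awk -F ' => ' -v key="$1" '$1 == key { print $2; exit }'
}

# Typical use (requires the php CLI, so it is shown here as comments):
#   php --ini                                  # which ini files were loaded
#   php -i | ini_local_value opcache.enable    # effective value for the CLI
```

`php --ini` lists the main configuration file and any additional .ini files that were parsed, which is a quick way to spot a file such as 10-opcache.ini overriding php.ini. The CLI and the web server's PHP can be set up to read different configuration, so the phpinfo() page remains the more direct check for what Nextcloud itself sees.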

@Bockeman
Author

@wonx thanks, the file I should have changed was /etc/php.d/10-opcache.ini
(A little concerned that you have php7 in your path as php has been version 8 for several Nextcloud releases.)
Disabling opcache does appear to have an effect on performance, though it is barely noticeable.
I edited my findings above.

@Bockeman
Author

I ran the test script, with sleep 50 (because of the slow memory) on local storage via FUSE with 250ms delay on lstat, with opcache.enable=0.

Fedora                                                   35
nextcloud                                                24.0.0 beta 3                                     CLI update
httpd.x86_64                                             2.4.53-1.fc35                                     @updates
mariadb.x86_64                                           3:10.5.13-1.fc35                                  @updates
nextcloud-client.x86_64                                  3.3.6-1.fc35                                      @updates
php-fpm.x86_64                                           8.0.17-1.fc35                                     @updates
redis.x86_64                                             6.2.6-1.fc35                                      @updates
Storage: Local HDD: 7200 RPM, 32MB Cache, SATA 6.0Gb/s via FUSE with 250ms delay on lstat

After running for 36.7 hours, 15 false conflicts were generated: 194ppm, MTBF=2.4 hours

This means that opcache.enable=0 is NOT a viable workaround, although it does reduce the frequency of false conflicts considerably. It also means that of all the configuration tweaks applied, the only effect is a change in the frequency of false conflicts. Furthermore, there appears to be no single component, like opcache, that can be identified as doing something unexpected (such as inducing an exception which might not be handled correctly in the client code).

There is a bug in the client code (most likely in the exception handling) which erroneously causes conflicts to be generated.

Please @FlexW @er-vin or @claell @mgallien @allexzander @camilasan can you help get some developer attention on this issue?

@wonx

wonx commented Jun 23, 2022

I can confirm this issue is still present in Nextcloud 24.0.2, with the desktop client 3.5.1.

In my case, I can replicate it only with SFTP external folders, not with SMB.

@stuartjordan

My issue has gone away now that I've moved my data from external SMB storage to local (NVMe) disk.

@wonx

wonx commented Oct 11, 2022

In my case, with client 3.6.0 and an SFTP external folder, the issue is still present.

@wonx

wonx commented Nov 14, 2022

I have been running the same tests with Nextcloud 25.0.1, and the issue with sftp external folders seems to have gone away.

@ruedigerkupper

ruedigerkupper commented Nov 15, 2022

Sounds almost too good to be true ;-). It will be a while until we can upgrade our production system to v25, though.
Can anyone confirm?

@wonx

wonx commented Nov 15, 2022

Nah, I spoke too soon. Today I tried with the script that was posted a few comments above, and I started getting conflicts almost immediately. It has the same problems as before.

@mgallien
Collaborator

#5188 and #5182 should help a lot with fake conflicts.
Could you please test the build from the 5188 PR once it is ready, or wait for it to be merged and test a nightly build from https://download.nextcloud.com/desktop/daily/ ?
Feedback would be very much appreciated.
Keep in mind that I am syncing a very busy instance to detect such bugs and hope to squeeze them all.

@Bockeman
Author

@mgallien thanks for all your work on nextcloud, it is much appreciated.

Executive summary

Your 5188 PR does not solve the false conflicts problem.

Executive question

If 5188 PR addresses "update local file mtime on changes from server", but not "update server mtime on changes from client", then surely this will cause false conflicts?

Details

I upgraded all I could before using 5188 PR and confirmed that I still get false conflicts:
1133ppm, MTBF=3mins, trans=30k, after=107mins, conflicts=34, sleep=104+1+0.1+0.01, 100%_del=500ms, k_max=2
In case you have not trawled through the 2 years of messages above, I implement a fuse storage with random delays, run a script which creates, deletes, renames, copies and appends to a number of files in a loop with delays. I then report the error rate as ppm (parts per million), and so on, for a large number (e.g. 30,000) of these transactions.
The occurrence of false conflicts is random: I cannot guarantee to generate a false conflict, and conversely, if I do not see a false conflict, I assume that I have simply not run for long enough to spot the random event.
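For anyone wanting to try something similar, the kind of loop described above could be sketched roughly as follows. This is NOT the script I actually run: `churn` and `count_conflicts` are made-up names, the five operations per round are an assumption, and the conflict count assumes the client marks conflict files with "(conflicted copy ...)" in the file name.

```shell
#!/usr/bin/env bash
# Hypothetical stress loop, NOT the script used in this thread.
# churn DIR ROUNDS DELAY performs five file operations per round inside DIR,
# sleeping DELAY seconds after each, and prints the number of operations done.
churn() {
    local dir=$1 rounds=$2 delay=$3 ops=0 i f
    mkdir -p "$dir"
    for ((i = 0; i < rounds; i++)); do
        f="$dir/file_$((i % 10)).txt"
        echo "round $i" > "$f";      ops=$((ops + 1)); sleep "$delay"  # create or overwrite
        echo "more data" >> "$f";    ops=$((ops + 1)); sleep "$delay"  # append
        cp "$f" "$f.copy";           ops=$((ops + 1)); sleep "$delay"  # copy
        mv "$f.copy" "$f.renamed";   ops=$((ops + 1)); sleep "$delay"  # rename
        rm "$f.renamed";             ops=$((ops + 1)); sleep "$delay"  # delete
    done
    echo "$ops"
}

# Count files in DIR whose name marks them as a conflict copy.
# Assumption: such names contain the text "conflicted copy".
count_conflicts() {
    find "$1" -type f -name '*conflicted copy*' | wc -l | tr -d ' '
}
```

Usage would be something like `churn ~/Nextcloud/stress 1000 1` followed by `count_conflicts ~/Nextcloud/stress`, where the directory is a placeholder for a folder inside a synced tree. The delay almost certainly needs tuning per machine.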

I installed and ran your 5188 PR AppImage without the fuse random delays to check against regression. Surprisingly, I got a false conflict:
250ppm, MTBF=20mins, trans=4k, after=20mins, conflicts=1, sleep=190.7+1+0.1+0.01, 100%_del=500ms, k_max=2
I was not trying to induce false conflicts, but since the false conflict phenomenon is random, I should not have been surprised.

I repeated with fuse random delays and got many false conflicts.
2466ppm, MTBF=2mins, trans=30k, after=152mins, conflicts=74, sleep=190.7+1+0.1+0.01, 100%_del=500ms, k_max=2
This confirms that your 5188 PR does not solve the false conflicts problem.
You might conclude that 5188 PR makes things worse, but I would rather consider this as more evidence of the random nature of the problem of false conflicts. However, given the significant increase in the error rate, I would suggest that 5188 PR does change the dynamics and might therefore be a relevant clue into what is causing the false conflicts.
I'm going to run some longer tests with/without 5188 PR to establish whether 5188 PR does have a significant impact (or is just a fluke of randomness).
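For reference, the ppm and MTBF figures in the result lines above are simple ratios: ppm is conflicts per million transactions, and MTBF is elapsed time per conflict. A minimal sketch of that arithmetic (the function names `ppm` and `mtbf` are only for illustration, and the ppm value is truncated to a whole number, which matches the figures quoted):

```shell
#!/usr/bin/env bash
# ppm CONFLICTS TRANSACTIONS: conflicts per million transactions, truncated.
ppm() {
    awk -v c="$1" -v t="$2" 'BEGIN { printf "%d\n", c * 1000000 / t }'
}

# mtbf ELAPSED CONFLICTS: mean time between conflicts, in the unit of ELAPSED.
mtbf() {
    awk -v e="$1" -v c="$2" 'BEGIN { printf "%.1f\n", e / c }'
}

ppm 34 30000    # before the 5188 PR, with fuse delays: prints 1133
ppm 1 4000      # 5188 PR, no fuse delays: prints 250
ppm 74 30000    # 5188 PR, with fuse delays: prints 2466
mtbf 107 34     # minutes between conflicts in the first run: prints 3.1
```

The result lines round MTBF to whole minutes, so 3.1 appears there as 3mins.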

Environment

The randomisation and delays within the fuse storage, plus the delays within the script loop, are extremely sensitive. I am trying to catch random events, and tuning the parameters to yield the most false conflicts is tricky. The fuse program and loop script are not as posted above, but I can supply them if requested. I suspect such scripts need to be tuned to the specific machine and environment to maximise the occurrence of false conflicts.

Forum

Given no developer attention on this issue for two years, until now, might I ask, which would be the best forum and/or topic title to raise this issue to pique the interest of the relevant experts?

@ruedigerkupper

ruedigerkupper commented Nov 26, 2022

@Bockeman Thank you for your work and analysis. Regarding your last question, I'd suggest that someone with a paid support contract from Nextcloud should raise this issue with the official Nextcloud support.

@mgallien
Collaborator

> @Bockeman Thank you for your work and analysis. Regarding your last question, I'd suggest that someone with a paid support contract from Nextcloud should raise this issue with the official Nextcloud support.

You should not worry; we are actively working on it. Most cases can be reproduced, so we are busy with the bug fixes.

@margual56

Hello @mgallien! Is there any update?

I'm having the same issue, so I cannot use Nextcloud until this is fixed :(

@flyinghuman

Same here with 24.0.12 Enterprise version. Any progress on this issue?

@meonkeys

meonkeys commented Aug 26, 2023

I can get a conflict every time with these repro steps:

  1. have Temporary files lock installed and enabled
  2. lock a file (say, test.md or test.ods)
  3. edit that file locally (I'm certain I'm the only one editing when I do)
  4. wait until the desktop client (tries to) sync the file

I'm using Nextcloud server v27.0.2 (latest stable, run via Docker on an Ubuntu server), Desktop client v3.9.3 (latest stable, installed from nextcloud-devs PPA). Desktop client is running on a 64-bit Ubuntu 22.04 LTS desktop.

@ruedigerkupper

Found an issue that might be related:
nextcloud/server#34176
It describes problems with etags on CIFS storage. Since it appears that NC uses etags for versioning, there might be a general problem with etags on external storage?

@flyinghuman

Good news on this issue: Enterprise Support fixed it: nextcloud/server#40487

At least for us everything is fine now. Thanks!

@mgallien
Collaborator

mgallien commented Oct 3, 2023

I would prefer to close this one and have people open a new one for current findings.
There are a number of different issues being covered in this issue, making it difficult to get an idea of whether work still needs to be done or not.

@mgallien mgallien closed this as completed Oct 3, 2023