Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Renamed files are not deteced by the sync client #3615

Closed
christianlupus opened this issue Jul 30, 2021 · 21 comments
Closed

Renamed files are not deteced by the sync client #3615

christianlupus opened this issue Jul 30, 2021 · 21 comments
Assignees
Labels

Comments

@christianlupus
Copy link

How to use GitHub

  • Please use the 👍 reaction to show that you are affected by the same issue.
  • Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
  • Subscribe to receive notifications on status change and new comments.

Expected behaviour

When disabling the synchronization for a certain folder and reenabling it, any moved files should be moved on NC as well.

Actual behaviour

The renamed files are not detected as renamed. Instead they are deleted and recreated which causes multiple issues:

  • Loosing of sharing information
  • Loosing of comments
  • History of file
  • Single point of failure if the client data gets corrupted
  • Large files need to be retransmitted to the server and also to any other synced client (server and network load spikes)

Steps to reproduce

  1. Have a folder synchronized and all well, you will need some files within that folder to play with
  2. Disable the synchronization in the desktop app
  3. Do some renaming of files within the folder
  4. Reenable the sync process in the desktop app

Client configuration

Client version: 3.2.3

Operating system: Archlinux

OS language: DE/EN

Qt version used by client package (Linux only, see also Settings dialog): ???

Client package (From Nextcloud or distro) (Linux only): From distro

Installation path of client: /usr/bin

Server configuration

Nextcloud version: 21.0.3.1

Storage backend (external storage): yes, ext4 local filesystem

Logs

Please use Gist (https://gist.github.com/) or a similar code paster for longer
logs.

  1. Client logfile: Link to file

I created an anonymized log by replacing (using sed) the server names with example[123].com and replacing the oc_sessionPassphrase as this seemed sensible to hide these.

The log was taken while the upload process of a few GB was just running as I renamed a few images/videos.

  1. Web server error log: Nothing special is logged here, just a series of 200, 201, 304, and 204 responses.

  2. Server logfile: nextcloud log (data/nextcloud.log):
    Excerpt from the log file. I generated it using cat nextcloud.log | grep '"user":"christian-nextcloud"' | grep '"level":[^0]' | grep -v webdav | grep -v IPreview:PreviewRequested | tail -n 40 to get rid of some cluttering information.

There seem to be some issues related to this problem but all I found were closed in the meantime: #3240, #2132, possible #3493

@allexzander
Copy link
Contributor

@christianlupus We had a fix related to this in 3.2.4. Can you reproduce it on that version too?

@christianlupus
Copy link
Author

I will have to setup a test case. Will do so next week as I am not at the main machine at the moment. Will see what is the result.

@joholm2
Copy link

joholm2 commented Aug 1, 2021

The log file above includes the same "checksum" text error as I reported in #3381

See lines like...
2021-07-30 12:31:02:985 [ info sync.discovery ]: checking checksum of potential rename "bilder-main/Bilder Jasmin/Katzen/zuhause/2021/07/2021-07-04_16-08-37_VID_20210704_160837.mp4" "SHA1:374a0f0468cda18f8a63cc9a352c0e3903645003" "SHA1:374a0f0468cda18f8a63cc9a352c0e3903645003</checksum>"
2021-07-30 12:31:02:985 [ info sync.discovery ]: Not a move, checksums differ

...where the spurious text at the end of the second checksum spoils the comparison.

@joholm2
Copy link

joholm2 commented Aug 1, 2021

I have also just verified that the fault is still present in Nextcloud client 3.2.4

Extract from the log where a simple move was handled as a new file (create) and old file remove (delete)

2021-08-01 15:15:07:353 [ info sync.discovery ]: checking checksum of potential rename "Data/7 - Scans/A999_Bg850-4201_Insättningsuppgifter_Detaljer_20210714.pdf" "SHA1:957b348c0f54aa4b7d28cd356f0f467956d568a8" "SHA1:957b348c0f54aa4b7d28cd356f0f467956d568a8</checksum>"
2021-08-01 15:15:07:353 [ info sync.discovery ]: Not a move, checksums differ

@allexzander
Copy link
Contributor

allexzander commented Aug 3, 2021

@christianlupus Could you provide a bit more details? Which file/folder that you've renamed was later deleted and re-uploaded?
I've been trying to reproduce it and couldn't succeed even on 3.2.3. Tried connecting the root remote folder and a subfolder. Or, maybe you are using the VFS and the problem only happens with VFS mode enabled?

@joholm2 I will check that other issue you've mentioned. FYI, the fix for the case of reuploading when renaming when having a non-root remote folder connected was released in 3.2.4. So, it can indeed be something else and maybe related to incorrectly closed <checksum> tag.

@allexzander
Copy link
Contributor

Alright, I was overwhelmed and haven't noticed the whole picture. Thanks, @joholm2 I will check that issue of inconsistent <checksum> tag more carefully now.

@allexzander
Copy link
Contributor

@christianlupus @joholm2 So, I've been trying to find where does this trailing </checksum> comes from. We don't have anything like that neither in the server nor in the client-side code. There is a chance we had it somewhere in past, probably in the server code which I am not familiar with. On the client's end, we are formatting the checksum like the following: CHECKSUM_TYPE:CHECKSUM, without any <checksum> tags.

Since I can see from logs (and also confirmed by @joholm2 in another issue), this checksum value with redundant </checksum> part is coming from the local sync database, could you then try the following: add a new local sync folder via the Settings dialog. Choose only specific folder/folders that you've been having issues with. Try to reproduce the issue using that folder as described in this ticket. Will the issue be happening with the new sync folder?

I could add a hot-fix to always ignore <checksum> and </checksum> from the stored checksum value, but, then, my colleagues won't be happy with that change, also, I am not fun of this approach too. I might think of some other, better way to migrate from checksums with </checksum> at the end and without it, but, for now, can we just try the test scenario I've described and confirm that with the latest desktop client (and latest or as recent as possible server version), the issue is not happening.

Sorry for this inconvenience.

@joholm2
Copy link

joholm2 commented Aug 3, 2021

Hi @allexzander,

I'm now running Window 10, Nextcloud client 3.2.4 and
Ubuntu server 20.04.2, Nexcloud server 21.0.3

As you suggested I created a new local sync folder "Test@NC" and added a new sync connection to a new corresponding folder on the server. I added a test folder and single test file which then sync'd up as expected.

I tried a rename of the test file and the problem appeared straightaway, with the client onscreen log showing that I had done a delete and a create.

I checked the Nextcloud logs and found the line I am very used to - showing the spurious </checksum> text which causes a failure of the checksum comparison for what are otherwise identical checksum values.

2021-08-03 15:33:11:410 [ info sync.discovery ]: checking checksum of potential rename "Testfolder/A999_Testfile.pdf" "SHA1:252bce66b05be348fffa48192589ba4fe47a626f" "SHA1:252bce66b05be348fffa48192589ba4fe47a626f</checksum>"
2021-08-03 15:33:11:410 [ info sync.discovery ]: Not a move, checksums differ

I have also searched all the code I can and find but see no trace of that text being written anywhere. All I can guess so far is that somewhere the checksum is being stored in something like XML format and that the routine extracting the checksum is missing to strip off the closing XML tag.

@allexzander
Copy link
Contributor

@joholm2 I will have another try and if I fail to find the source of that tag, I will probably add more logs starting from the place where this checksum is created and sent, all the way to when it is getting stored to the database and received from the server. I couldn't find it in the source code even when searching in different versions of the server and the client.

@joholm2
Copy link

joholm2 commented Aug 3, 2021

Looking in the root of my sync folders on Windows I see that each contains two files with names of the form ".sync_nnnnnnnnnnnn.db" and ".sync_nnnnnnnnnnnn.db_wal"

I realise they are not pure text files, but they do contain enough text to be able to see some of the contents, and most of them contain numerous lines which include the text </checksum> at the end of what looks like a normal checksum sequence.

Interestingly there is no sign of an opening tag like <checksum> which I'd expect to be present to match the closing tag.

So presumably the closing tags are not meant to be there in those database files.

If you have any idea which code is responsible for creating and maintaining those files then maybe we can find out more.

@joholm2
Copy link

joholm2 commented Aug 3, 2021

I've found one place in the code so far where end tags are written....

In file .../desktop-master/src/libsync/networkjobs.cpp

Lines 189 to 212...

/*********************************************************************************************/
// supposed to read <D:collection> when pointing to <D:resourcetype><D:collection></D:resourcetype>..
static QString readContentsAsString(QXmlStreamReader &reader)
{
    QString result;
    int level = 0;
    do {
        QXmlStreamReader::TokenType type = reader.readNext();
        if (type == QXmlStreamReader::StartElement) {
            level++;
            result += "<" + reader.name().toString() + ">";
        } else if (type == QXmlStreamReader::Characters) {
            result += reader.text();
        } else if (type == QXmlStreamReader::EndElement) {
            level--;
            if (level < 0) {
                break;
            }
            result += "</" + reader.name().toString() + ">";
        }

    } while (!reader.atEnd());
    return result;
}

I find it strange that the comment at the start indicates the aim is to read the text out from between XML tags but that in the loop we see it includes the possibility to actually write the tags into the result string.

Another thing I find strange is that it starts with "level=0" but only breaks when "level<0"
So if it steps up to level=1 (as it seems intended to do within the loop) then it has to step down twice before it will ever breaks; and I can't see that happening unless it finds two end tags - which would be very odd.

It's hard to say whether this is relevant, but it's the best lead I've found so far.

@joholm2
Copy link

joholm2 commented Aug 4, 2021

Just to note - Today Windows Nextcloud client 3.3.0 was pushed to me so I retested and the same issue with trailing text </checksum> is still present and still causing moves to be handled as delete and creates.

@allexzander
Copy link
Contributor

@joholm2 BTW, which server version you are using?

@joholm2
Copy link

joholm2 commented Aug 4, 2021

Ubuntu server 20.04.2, Nexcloud server 21.0.3

@allexzander
Copy link
Contributor

allexzander commented Aug 6, 2021

@joholm2 Good catch with that function you've found. I have sent a PR for review. Here is a build if you want to test if the issue is gone for you on Linux https://cloud.nextcloud.com/s/WCcWtiQjCyxmos9 (you will need to remove - at the beginning of this file so you can run it). Windows build is here https://cloud.nextcloud.com/s/XTsjER5HCio6Dgw

FYI: You will have to do clear sync (similar to how you did that when testing it last time). This is because the incorrect checksums with trailing <checksum/> have been stored to the local database in your local sync folder :( If the fix works, you will need to remove SQLite database files from the sync folder (hidden files), or, just connect a new empty local sync folder :(

@christianlupus Feel free to test the AppImage above. The same recommendation applies to you. You may want to test it on a clean local sync folder.

@joholm2
Copy link

joholm2 commented Aug 7, 2021

Hi @allexzander, many thanks for the Windows test build.
I installed it (it shows as version 3.3.50) and tested it by creating a new sync folder (Test@NC) containing just a couple of sub folders and a few files.

After renaming one of the files I checked the log and found the corresponding "checking checksum" lines where the problem occurred before. Now there is no sign of the spurious </checksum> text and the rename is detected as it should be.

2021-08-07 20:52:18:003 [ info sync.discovery ]: checking checksum of potential rename "SubfolderA/A999_Testfile.pdf" "SHA1:252bce66b05be348fffa48192589ba4fe47a626f" "SHA1:252bce66b05be348fffa48192589ba4fe47a626f"
2021-08-07 20:52:18:056 [ info sync.discovery ]: Rename detected (up) "SubfolderA/A999_Testfile_renamed.pdf" -> "SubfolderA/A999_Testfile.pdf"

Following that success I tried more kinds of renames and moves by renaming the folders, moving the files from one folder to another and moving folders in and out of each other.

Nextcloud carried all of these out correctly and didn't make any unwanted uploads and deletes.

@allexzander - would you be able to connect the solution to my original post about this at #3381 and close that one in due course?

Thanks again!

@allexzander
Copy link
Contributor

@joholm2 Thanks for contributing your time in helping to discover and solve the root cause for this issue. I have linked my PR to your original issue. It will get closed once the PR is merged.

@allexzander
Copy link
Contributor

This will get fixed with cherry-pick from here 9ffa7bc

@mgallien
Copy link
Collaborator

@christianlupus Can you please test again with 3.3.2 once released at end of month ?
If the bug is indeed fixed for you please close this issue.

@christianlupus
Copy link
Author

Yes, I will give it a try as soon as I have 3.3.2 installed.

@christianlupus
Copy link
Author

I just tried (Arch version if client is 3.3.5) it with a bunch of files (same script as in the original post to rename my images) and the result was positive. The renaming is comparably slow (I suspect due to individual API requests per file and checksum comparisons) but not delete/uploading happened where not needed.

Closing here as it seems solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants