Restoring MS office files with .*x extensions to an older version in the WebUI results in file corruption #23928

Closed
gnohp opened this Issue Apr 11, 2016 · 55 comments

Projects

None yet

7 participants

@gnohp
gnohp commented Apr 11, 2016

Steps to reproduce

I was able to reproduce the file corruption consistently when the file has existed in ownCloud for a few months but inconsistently when a few days old. This only seems to occur on .*x file types. Older format and other types (png, txt, etc.) have no issues.

  1. On a Windows system, open Chrome
  2. Locate a .docx or .xlsx file which exist in ownCloud for 8 months
  3. Download the file by clicking the file name
  4. Open the file and verify file opens properly
  5. Make a change
  6. Save the file
  7. Drag and drop the updated file into ownCloud
  8. Select "New File" checkbox when prompted "Which files do you want to keep?"
  9. Click Continue
  10. Select the file row to bring the sideview bar (Activities, Sharing, Versions)
  11. Click Version in the sideview bar
  12. Click the revert button
  13. Wait 2 - 5 minutes
  14. Click the file name to download
  15. Go to file location in the File Explorer and double click the file

Expected behaviour

File opens to the reverted version

Actual behaviour

Error dialog:

Excel:
We Found a problem with some content in '...xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.

Word:
We're sorry. We can't open ... because we found a problem with its contents

Workaround

  1. Click download previous version instead of revert
  2. Upload the downloaded version

Server configuration

Operating system:
CentOS 6.6

Web server:
Apache

Database:
Galera

PHP version:
5.6

ownCloud version: (see ownCloud admin page)
8.2.2 and 9.0

Updated from an older ownCloud or fresh install:
Yes
8.2.2 was from 8.2.1
9.0 was from 8.2.3

Where did you install ownCloud from:
repo

Are you using external storage, if yes which one: local/smb/sftp/...
no

Are you using encryption: yes/no
yes

Are you using an external user-backend, if yes which one: LDAP/ActiveDirectory/Webdav/...
yes

Client configuration

Browser:
Chrome

Operating system:
Windows 10

Logs

Web server error log

No errors

ownCloud log (data/owncloud.log)

Log is empty with log level set to info, warnings, errors and fatal issues
@nickvergessen
Contributor

I think that is the stuff we fixed lately @schiesbn, right?

@schiessle
Member

@gnohp can you re-test this with 9.0.1, we fixed some stuff in this version regarding encryption and versioning.

@gnohp
gnohp commented Apr 12, 2016

@schiesbn just finished testing on 9.0.1 and confirmed the issue still exist. In 9.0.1, after reverting to an older version and discovering the file has been corrupted, when I revert back to the latest document version and attempt to download the document, the previous file version is now corrupted as well. No exceptions are thrown in the log.

@gnohp
gnohp commented Apr 12, 2016

In 8.2.2, the version becomes corrupted once a revert has taken place too. This is not new to 9.0.1 and I'd assume it exist in 9.0.0.

When a shared user updates a file in 8.2.2 (user has change, edit, and share permissions on the file), the previous version is an empty file. Attempting to revert, an exception is thrown Failed to revert [File Name] to revision [Date] and the spinner remains on the file and version sidebar. This is a nonissue in 9.0.1.

@schiessle
Member

just tried it again with 9.0.1 and can't reproduce it. Can you reproduce this behavior reliable? Does it happen with multiple files or only with some specific files? If it happens only with some files could you prepare a test files and send it to me? (I don't know any reason why it should be file specific but you never know)

@gnohp
gnohp commented Apr 20, 2016

Sorry for the delayed response. Just got back from vacation.

Yup the issue is reproducible on Office files with .*x extensions as long as it's been on the server for a few months. I've made new Office files (12 days old now) and those are currently reverting fine. I've also tested with a duplicate copy of a docx file and that works as well (12 days old too).

I was hoping to have a test file when I first reported the issue but I'm not sure if the issue appears after a certain time has passed or if the issue is due to the file being a part of a few upgrades...from late OC7 to OC8-9. I've verified the version was not corrupted since the version downloads and opens fine. It only happens after a user attempts to revert to an older version.

@PVince81 PVince81 added the sev2-high label Apr 21, 2016
@PVince81 PVince81 added this to the 9.0.3-next-maintenance milestone Apr 21, 2016
@schiessle
Member

I just tested it again going from 7.0 to 9.0. Always created new versions in between and tried to download them. It always worked. I could never reproduced the described behavior...

@gnohp
gnohp commented Apr 21, 2016

@schiesbn did you try reverting to the old version or downloading? Maybe it's time related? I'll see if I can gather more information.

@schiessle
Member
schiessle commented Apr 26, 2016 edited

@gnohp I tried both downloading and reverting the version. Also Versions don't have their own encryption keys. All versions and the original file always use the same key.

@gnohp
gnohp commented Apr 27, 2016

On 9.0.1, I performed an octal dump once a docx becomes corrupted.

test1.txt is when I downloaded the docx in the version window (verifying the content is not corrupted) and test2.txt is after I click restore and downloaded the file in the content window.

test1.txt
test2.txt

@PVince81
Collaborator

And here is the diff:

1744,1764c1744,1745
< 0072040  \0  \0 332  \0  \0  \0 023  \0  \0  \0  \0  \0  \0  \0  \0  \0
< 0072060  \0  \0  \0  \0   q   b  \0  \0   c   u   s   t   o   m   X   m
< 0072100   l   /   i   t   e   m   1   .   x   m   l   P   K 001 002   -
< 0072120  \0 024  \0 006  \0  \b  \0  \0  \0   !  \0   4   -   r 271   C
< 0072140 001  \0  \0   F 002  \0  \0 021  \0  \0  \0  \0  \0  \0  \0  \0
< 0072160  \0  \0  \0  \0  \0   V   c  \0  \0   d   o   c   P   r   o   p
< 0072200   s   /   c   o   r   e   .   x   m   l   P   K 001 002   -  \0
< 0072220 024  \0 006  \0  \b  \0  \0  \0   !  \0   H 264  \f  \n 201 003
< 0072240  \0  \0   . 021  \0  \0 022  \0  \0  \0  \0  \0  \0  \0  \0  \0
< 0072260  \0  \0  \0  \0 371   e  \0  \0   w   o   r   d   /   f   o   n
< 0072300   t   T   a   b   l   e   .   x   m   l   P   K 001 002   -  \0
< 0072320 024  \0 006  \0  \b  \0  \0  \0   !  \0 375   Y 025   8 024 003
< 0072340  \0  \0 035   +  \0  \0 024  \0  \0  \0  \0  \0  \0  \0  \0  \0
< 0072360  \0  \0  \0  \0 252   i  \0  \0   w   o   r   d   /   w   e   b
< 0072400   S   e   t   t   i   n   g   s   .   x   m   l   P   K 001 002
< 0072420   -  \0 024  \0 006  \0  \b  \0  \0  \0   !  \0   \ 326 302   Y
< 0072440 271 001  \0  \0 223 003  \0  \0 020  \0  \0  \0  \0  \0  \0  \0
< 0072460  \0  \0  \0  \0  \0  \0 360   l  \0  \0   d   o   c   P   r   o
< 0072500   p   s   /   a   p   p   .   x   m   l   P   K 005 006  \0  \0
< 0072520  \0  \0 025  \0 025  \0   K 005  \0  \0 377   o  \0  \0  \0  \0
< 0072540
---
> 0072040  \0  \0 332  \0  \0  \0
> 0072046

Looks like the files are different.

@gnohp
gnohp commented Jun 14, 2016

@PVince81 It's the same file. test2.txt is missing the tail of the file...line 1744 (0072040) onward.

@PVince81
Collaborator

Hmmm, indeed. Can you tell us the exact byte-size of the original and broken file ?

Maybe it's missing the last block for some reason. There are some loops in the encryption code and maybe it "missed a beat".

@PVince81
Collaborator

Ok, the size can be inferred from the hex dump:
test1.txt: 72540 bytes
test2.txt: 72046 bytes

Ok, so the second file is missing 494 bytes

@PVince81
Collaborator

On v9.0.1 I tried generating a file with this exact size:
dd if=/dev/urandom of=randombytes.dat bs=72540 count=1

Then I followed the steps above. When you said "modify the file" I generated a new file of the same size.

After revert+download, the md5/contents of the file is still correct.

Any reason why you added "Wait 2 - 5 minutes" ? Does the restoration take that long ? W
If it did, maybe it internally ran into a PHP timeout and the file got truncated.
Was the file from the original report shared with lots of people ? (which might explain the slowness)

@PVince81
Collaborator

Just tired doing the same steps but with a share recipient, still works fine.

@PVince81
Collaborator

Also reported by @kenyang001 here #25211

Strangely it's also about office files + encryption + versions + restore.

@gnohp @kenyang001 are you able to repeat the issue by uploading the original file, then the changed file (with the same changes) ?

It could be due to a PHP timeout, but considering that the files in question aren't big, it is less likely.
Or the encryption code is having trouble with specific files with specific contents.

@PVince81 PVince81 added sev1-critical and removed sev2-high labels Jun 22, 2016
@PVince81
Collaborator
@jknockaert
Contributor

@PVince81 No clue about what is happening here. The size of the truncated file does not correspond to an integer multiple of blocksize (encrypted or unencrypted), so I wouldn't suspect the encryption wrapper in the first place.
@gnohp Have these files been produced through an interface (web browser/webdav client), or have they been read from primary storage (i.e. the data folder of the owncloud server)?
In case the primary storage is not corrupted, I would suspect something happening at the level of the file metadata. If the (unencrypted) size is not set correctly in the metadata when manipulating file versions, file corruption will happen upon unencryption.

@PVince81
Collaborator

@jknockaert according to the original steps the file was overwritten using drag and drop in the web UI.

Good point about the size in the filecache. However I think that even if the size there is wrong, it is likely that either it's the encrypted or unencrypted size. Looking at the corrupted hex dumps above, a piece of the end of the file is missing. If the size was wrong in the database, it is unlikely that this size is smaller than the actual file. Usually the unencrypted size would be stored in the DB and in some buggy cases we had the encrypted size there instead, which itself is bigger (not smaller).

@jknockaert
Contributor

@PVince81 Ja for the file creation and manipulation it was done through the UI. But for the files posted here (test1.txt and test2.txt) it has not been specified how they were produced. It may well be that they are consistent in the data store, but that upon reproduction through the UI they fail.
If the unencrypted filesize in the filecache is wrong, corruption happens upon decryption. The size could be smaller e.g. when a more recent version was smaller, and upon reverting to the older version the filecache was not updated correctly.
But I'm seeing this very much from an encryption-wrapper point-of-view. I'm far less familiar with the filecache mechanisms.

@PVince81
Collaborator

@gnohp do you still have the corrupted file somewhere in ownCloud, in the list of versions ? If yes, can you do select * from oc_filecache where name like '%nameofcorruptfile%' ? There should be entries from the "files" folder but also from "files_versions". Goal is to find out whether the size in the "size" column is correct.

@kenyang001 can you do the same for your file ?

@jknockaert
Contributor

Well, thinking of it, it may even be that when the (wrong) unencrypted size is too big in the filecache corruption will happen. It probably would result in an error after a timeout. The user doesn't see this error; he/she just gets a corrupt file (with the error message going to the owncloud log). It would also explain the indications for a timeout that we had earlier in this discussion.

@kenyang001

@PVince81 Hi PVince81, the size is as below
the size in files folder: -rw-r--r-- 1 www-data www-data 27K Jun 20 14:56 bug-test2.xlsx
the size in files_versions folder : -rw-r--r-- 1 www-data www-data 499K Jun 22 17:02 bug-test2.xlsx.v1466405851

if the file is sharing+ encryption+restore+versions, then the corrupted file will happen.
actually, we also get file corrupt wen when not doing restore, but not sure other conditions that causes the same symptoms.
thanks.

@PVince81
Collaborator

@kenyang001 can you also check the sizes of these files in the "oc_filecache" table ? The sizes you see on the filesystem is the encrypted size. In the database you should see smaller sizes.

@PVince81
Collaborator
PVince81 commented Jun 22, 2016 edited

I think I'm onto something. Did some tests on my 9.1.0beta2 server with ODS files and versions.

My steps:

  1. Create an ODS file of a size around 32kb
  2. Upload the ODS file into a folder "test/test.ods" with the web UI
  3. Change the ODS file locally and duplicate some sheets, goal is to increase its size to 40kb
  4. Upload the ODS file in the web UI
  5. Go to "versions" tab
  6. Restore the old file
  7. Try downloading the file

At this stage I get download failures in Firefox or "truncated" messages in cadaver.

Surprisingly the size of the restored file in the file cache is zero!
Edit: I misread, the zero was from the keys folder

+--------+------------------------------------------------------------------------------+-------+----------------------------------+
| fileid | path                                                                         | size  | etag                             |
+--------+------------------------------------------------------------------------------+-------+----------------------------------+
|  92567 | files/test/test.ods                                                          | 40073 | e14742a1b21d631c2a364349f21093f7 |
|  92563 | files_encryption/keys/files/test/test.ods                                    |     0 | 576a54e8445ce                    |
|  92564 | files_encryption/keys/files/test/test.ods/OC_DEFAULT_MODULE                  |     0 | 576a54e845298                    |
|  92565 | files_encryption/keys/files/test/test.ods/OC_DEFAULT_MODULE/fileKey          |    32 | ddc73e64b590fdb9f806cb701c531c3f |
|  92566 | files_encryption/keys/files/test/test.ods/OC_DEFAULT_MODULE/vincent.shareKey |   512 | bc5e2e71e4f3c8e52a7cae798acce0fd |
|  92569 | files_versions/test/test.ods.v1466586344                                     | 32687 | 69aaaa8cff01db19fbc9b7a9472bfa29 |
|  92570 | files_versions/test/test.ods.v1466586439                                     | 40073 | 91112bb539db19aed29a80464af376f1 |
+--------+------------------------------------------------------------------------------+-------+----------------------------------+

However on disk the encrypted size is 52352, so the file is indeed there but the filecache has the wrong size.

Not sure if this is at all related to this ticket or a different issue.
I'll try and see if I can redo this with test files I could share publicly.

@PVince81
Collaborator

Okay I was wrong with the zero size.
Still, my OC doesn't seem to want to deliver this file for download.

@PVince81
Collaborator

Aha, got it: after restoring the 32kb file, the oc_filecache entry should have been set to 32687 but it still is 40073.

I suspect that if the file's size had reduced, then the size would be smaller and a truncated file would be delivered instead.

So the bug is that when restoring a file, the "size" value of the restored file isn't properly copied.

Will do more tests to confirm this.

@PVince81 PVince81 self-assigned this Jun 22, 2016
@PVince81
Collaborator

Confirmed on master / 9.1.0beta2:

  1. Enable encryption
  2. Create a text file "test.txt" and type "abc" with the text editor, save/close
  3. Edit the text file "test.txt" and replace the text with "defghjkl", save/close
  4. Go to versions tab
  5. Restore the previous version
  6. Check the database
MariaDB [owncloud]> select fileid, path, size, etag from oc_filecache where path like '%test.txt%';
+--------+-----------------------------------------------------------------------+------+----------------------------------+
| fileid | path                                                                  | size | etag                             |
+--------+-----------------------------------------------------------------------+------+----------------------------------+
|     47 | files_encryption/keys/files/test.txt                                  |   -1 | 576a58e038660                    |
|     48 | files_encryption/keys/files/test.txt/OC_DEFAULT_MODULE                |   -1 | 576a58e039763                    |
|     49 | files/test.txt                                                        |    8 | dc5e0fd4e261dbd51f6b86175aa0c0eb |
|     50 | files_encryption/keys/files/test.txt/OC_DEFAULT_MODULE/fileKey        |   32 | d5626c25b28c5a50cee50b50da03114e |
|     51 | files_encryption/keys/files/test.txt/OC_DEFAULT_MODULE/admin.shareKey |  512 | 38bfdbd8b4528ce4d33fbaf29dbaad69 |
|     55 | files_versions/test.txt.v1466587363                                   |    3 | e73985d4b8546ecc58a4d8214304cbf8 |
|     63 | files_versions/test.txt.v1466587394                                   |    8 | 68bb09fb7e723aeef67a5afcbddfca83 |
+--------+-----------------------------------------------------------------------+------+----------------------------------+

Expected: file size of "test.txt" is also restored to 3

Actual result: file size of "test.txt" is still 8 and this will mess with file downloads

The size copy issue only happens when encryption is enabled.

Next up: check which other versions are affected, 9.0 and maybe 8.2.

I'll have a look to fix this.

@jknockaert
Contributor

For the encryption wrapper to function reliably it's absolutely crucial that the (unencrypted) size is set right. There's some mechanisms to catch improperly set sizes and correct them, but in my opinion that's band aid. Fixing the filecache handling will solve issues like this as well as any other algorithm that depends on it.

@PVince81 PVince81 removed the needs info label Jun 22, 2016
@PVince81
Collaborator

Just did one pass with the debugger and I saw no code that would update the size of the original file after restoration. The contents of the file is copied with "$storage->moveFromStorage" which doesn't update any cache data.

Now I'm wondering how this could work before, will check older versions.
Else the quickfix is to add an additional cache update there.

CC @icewind1991 in case you have an idea

@PVince81
Collaborator

Just checked stable8.2 and it is affected too.
I haven't checked stable9 and it's very likely that it's affected too.

@PVince81
Collaborator

And stable8.1 is affected too... that's the version that brought the new encryption modules code.

Will now check if there was a previous version in the 8.1.x series that worked before (hoping for a bisect).

@jknockaert
Contributor

@PVince81 What are you looking for? The corruption seems to be in the filecache handling rather than anywhere else.

@PVince81
Collaborator

@jknockaert I'm trying to find whether this is a regression, whether some commit has broken it.

It turns out that it has never worked before, v8.1.0 is also affected.

Now to write a fix for this. I think we need to add code to update the filecache entry after the copy/move of the old version was done.

@jknockaert
Contributor
jknockaert commented Jun 22, 2016 edited

@PVince81 Agree, I don't expect this to be a regression.
The band aid I was referring to is here: https://github.com/owncloud/core/blob/master/lib/private/Files/Storage/Wrapper/Encryption.php#L484
So it would suffice to set unencrypted size to -1 upon rollback; then the encryption wrapper will fix it upon subsequent access.

@PVince81
Collaborator

Here's a fix in the versions code: #25225

However I'm not sure whether it's the correct fix. I observed that without encryption the size was already copied properly, so this means there is another code path that works. From what I remember the encryption code does magic with the size field so maybe that magic bypasses the regular code.

@jknockaert I just saw your comment after submitting the fix, maybe you're onto something to provide a better fix.

@PVince81
Collaborator

@jknockaert the band-aid might not be enought though, I think 8.1 doesn't have the size repairing fix

@PVince81
Collaborator

I could try and analyze which code path is taken for the non-encrypted restore vs encrypted restore, but I'm not sure it's worth investing the time if my PR here #25225 already works.

@jknockaert
Contributor
jknockaert commented Jun 22, 2016 edited

@PVince81 Your approach is better if it works, as it prevents inefficient recalculating the unencrypted size.
The band-aid was backported to 8.1 and everything since; so it could be used if your solution doesn't work.

@PVince81
Collaborator

Ok, I found the reason why it works non-encrypted: at the end of the restore, we do a touch() on the restored file, which somehow updates its size, possibly with a scanFile call.

However in the case of encryption, the wrapper prevents relying on the on-disk size of the file so it doesn't update it. That looks like it.

So this confirms that #25225 is the best fix here.

@jknockaert
Contributor

Ok, sorted out and efficient fix implemented. Great!

@PVince81
Collaborator

@gnohp @kenyang001 to manually repair the "corrupt" files, set their size to -1 do this:

  1. select * from oc_filecache where path like 'files/%test.txt%' (replace test.txt with the file name)
  2. Write down the fileid of that entry after verifying that it's the correct file
  3. update oc_filecache set size=-1 where fileid=$fileid, replace $fileid with the file id from above

This will tell ownCloud to ignore the file size on download and it will automatically redetect the correct size after one download attempt.

The files themselves are not really corrupt, it is only the download that doesn't deliver the correct amount of bytes.

@PVince81 PVince81 closed this in #25225 Jun 22, 2016
@PVince81
Collaborator

@felixboehm FYI in case file corruption has been observed after restoring a version in OC >= 8.1

@kenyang001

@PVince81 Hi Pvince81, thanks for your information. I'd like to know if next owncloud verion will fix this problem? and there are a few corrupted files for our users, do we need manually fix all of them by your method, or can they be fixed automatically when we upgrade to the next owncloud version. please advise. thanks a lot.

@PVince81
Collaborator
PVince81 commented Jul 1, 2016

@kenyang001 9.0.3 was released and contains a fix that prevents this issue to happen again.

However the existing affected files will stay broken, you can need to fix them manually with #23928 (comment)

@jknockaert
Contributor

@PVince81 Should there be a procedure somewhere to reset file sizes? Either by the user for specific files, or by the admin for specific users? The latter could be added to the occ script.

@PVince81
Collaborator
PVince81 commented Jul 1, 2016

@jknockaert the problem is how to identify the broken files ? One would need to decrypt all of them to find out whether the decrypted size mismatches the stored size. This cannot be done over the web UI as it would run into timeouts.
As for an occ command, we cannot decrypt anything without the user's password (or recovery key).

@jknockaert
Contributor

@PVince81 I meant setting the size to -1 in the database. After that the existing mechanisms will recalculate the size. But it may well be that when a folder size is calculated (triggering recalculation of all file sizes) a timeout is hit.

@PVince81
Collaborator
PVince81 commented Jul 1, 2016

@jknockaert in this case, since we can't identify which exact file is broken, maybe would need to set the size of ALL files of that specific user to -1 and let the size auto-adjust during the next download.

@jknockaert
Contributor

@PVince81 That's what I was thinking of. But I'm not sure in what contexts the recalculation of the file sizes is triggered. If that happens en-masse a timeout is probable (especially with large files on slow external storage).

@PVince81
Collaborator
PVince81 commented Jul 1, 2016

@jknockaert from what I remember, the recalculation is done on the fly at download time and shouldn't increase download time. It's just that the first download will not have a Content-Length. The next one will.

@jknockaert
Contributor

@PVince81 No, it's done upon fopen.

@kenyang001

@PVince81 Hi Vincent, thanks for this. so if's safe to set all files' size to -1 under oc_filecache table by myself?

@PVince81
Collaborator
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment