
Conversation

@schiessle
Contributor

Recalculate the unencrypted size if we assume that the size stored in the DB is not correct

An alternative approach to #22567.

This approach re-uses the file-size calculation by @jknockaert in #11659.

I was under the impression that this was no longer possible with encryption 2.0, but in fact it works quite nicely if we place it directly in the storage wrapper.

@jknockaert @PVince81 please have a look. What do you think?

  • unit tests

@mention-bot

By analyzing the blame information on this pull request, we identified @DeepDiver1975, @LukasReschke and @nickvergessen to be potential reviewers

* @throws DecryptionFailedException
*/
public function decrypt($data, $position = 0) {
error_log ("position: " .$position);
Contributor

Hmm?

@schiessle schiessle force-pushed the fix_broken_unencrypted_size branch 4 times, most recently from cea7067 to e72c9c8 Compare February 22, 2016 16:53
@schiessle schiessle added this to the 9.0-current milestone Feb 22, 2016
@PVince81
Contributor

Too bad this still requires two downloads, because the Content-Length header is already set.

Maybe the repair could be done inside getMetaData() instead? We tried this locally and ran into infinite recursion. It might be possible to work around that.

Another idea is to have the Sabre node re-read the metadata after doing a fopen, so it has the correct size. The drawback is that this would run another SQL query to retrieve the same data, especially when encryption is disabled. And it wouldn't cover web UI downloads yet.
Note that this approach works only because Sabre first does a fopen() (Node->get) before it actually writes the headers.

Contributor

Do we really want to count it? Why not check it like this:

if (isset($data[$blockSize - 1])) {
    $count = $blockSize;
} else {
    $count = strlen($data);
}

This will then only count the last block.

Contributor

You could do that if it is faster. Note that there is no guarantee that a read of $blockSize bytes from an encrypted stream actually returns $blockSize bytes, even in the middle of an (unseekable) stream. But your code suggestion would still work in the middle of a file.

Contributor Author

@MorrisJobke I also thought about something like this, but I decided against it. Remember, this is a repair function: it will not be called regularly, but only for broken files, which shouldn't happen all the time; otherwise we would have to fix something else. 😉 Therefore I prefer the simpler, less error-prone code. Such optimizations make sense for code that gets called regularly, but not for a repair step.
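
For illustration, here is a minimal sketch of the simpler counting he prefers, assuming an already-opened encrypted stream and a decryption callable; the function name and parameters are placeholders, not the actual PR code:

function countUnencryptedSize($encryptedStream, $encryptedBlockSize, $decryptBlock) {
    $size = 0;
    while (!feof($encryptedStream)) {
        $block = fread($encryptedStream, $encryptedBlockSize);
        if ($block === false || $block === '') {
            break;
        }
        // every block contributes exactly its decrypted length, no special-casing of full blocks
        $size += strlen($decryptBlock($block));
    }
    return $size;
}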

@jknockaert
Contributor

@schiesbn This is definitely an implementation of what I had in mind. In my opinion it is consistent with the encryption wrapper design, and it should be easy to maintain. I added some minor comments to the code. I'm just wondering how the edge case of a file opened in 'w+' mode would be handled (a file with a bogus unencrypted file size in the cache, of course). Can you open the encrypted stream in 'r' mode in parallel to calculate the file size?

@jknockaert
Contributor

@PVince81 Only for non-seekable streams is it necessary to download the full file for the file-size calculation, whereas the right (unencrypted) file size is only needed for the correct handling of seekable streams. The original encryption implementation worked around this dilemma by handling all encrypted files as non-seekable. But that was a serious limitation; many WebDAV clients make use of HTTP range requests, and such requests are served many times faster if seekability is supported (this was my initial motivation for redesigning the encryption wrapper in OC6).

When redesigning, I considered making a stricter distinction between seekable and non-seekable streams in order to optimise their handling in that respect. The problem is that the concept of seekability is not very strictly defined in PHP. Streams commonly have a seekability attribute, but it is often set incorrectly or not set at all, and I couldn't find any proper reference to it in the official PHP documentation. It is not even certain that seekability is a strict property of a stream: in my understanding, a stream may refuse one fseek request (return -1) and subsequently serve another fseek request successfully.

Keeping that in mind, I decided that the most robust implementation of the encryption wrapper was one that transparently handles all fseek requests and simply passes through the behaviour of the underlying encrypted stream (even if that may make PHP go bonkers). And finally, keep in mind that this additional download for an unseekable stream only happens once, in the edge case where the file size was set wrongly before.
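
As a side note, here is a tiny generic PHP sketch of the "just try it" attitude described here; the helper is hypothetical and not code from the wrapper:

function trySeek($stream, $offset) {
    // stream_get_meta_data() exposes a 'seekable' flag, but as explained above it is
    // not always reliable, so it is only logged as a hint here
    $meta = stream_get_meta_data($stream);
    error_log('stream claims to be ' . (empty($meta['seekable']) ? 'non-seekable' : 'seekable'));

    // the authoritative answer comes from fseek() itself: 0 on success, -1 on failure,
    // and a stream may refuse one seek yet serve a later one
    return fseek($stream, $offset, SEEK_SET) === 0;
}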

@schiessle
Contributor Author

I'm just wondering how the edge case of a file opened in 'w+' mode would be handled (a file with a bogus unencrypted file size in the cache, of course). Can you open the encrypted stream in 'r' mode in parallel to calculate the file size?

The repair step happens before we actually open the file for the original operation, so it doesn't matter how the file will be opened in the end. If we think the file size is wrong, we recalculate it with the code above and afterwards open the file with whatever mode was given for the original operation.
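
A rough sketch of that ordering, with the check still hooked into fopen() as it is at this point in the thread (it is moved later); the class and helper names are illustrative placeholders, not the actual wrapper code:

class EncryptionWrapperSketch {
    /** @var object underlying (encrypted) storage, placeholder */
    private $parentStorage;

    public function fopen($path, $mode) {
        // 1. repair first, independent of the mode requested by the caller
        if ($this->unencryptedSizeLooksWrong($path)) {
            // the recalculation reads the encrypted stream in 'r' mode,
            // so a later 'w+' open of the same file is unaffected
            $this->recalculateUnencryptedSize($path);
        }
        // 2. only then open the file with whatever mode was asked for
        return $this->parentStorage->fopen($path, $mode);
    }

    private function unencryptedSizeLooksWrong($path) { /* placeholder */ return false; }
    private function recalculateUnencryptedSize($path) { /* placeholder */ }
}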

@jknockaert
Contributor

@schiesbn OK, that's even better than I expected. Sounds like a really robust implementation.

@schiessle schiessle force-pushed the fix_broken_unencrypted_size branch 2 times, most recently from d1c5102 to 4184247 Compare February 23, 2016 10:15
@schiessle
Contributor Author

I have now moved the verifyUnencryptedSize() call from fopen() to filesize() and getMetaData(). It makes much more sense to fix the size at the point where we try to read it. This way the Content-Length should always be correct, and already the first download should work.
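
A compact sketch of the new hook points; the method names filesize(), getMetaData() and verifyUnencryptedSize() come from the discussion, while everything else, including the plausibility check, is an assumption:

class SizeRepairSketch {
    private $cachedUnencryptedSizes = array();  // stand-in for the file cache
    private $encryptedSizes = array();          // stand-in for the size on disk

    public function filesize($path) {
        return $this->verifyUnencryptedSize($path, $this->storedSize($path));
    }

    public function getMetaData($path) {
        return array(
            'path' => $path,
            'size' => $this->verifyUnencryptedSize($path, $this->storedSize($path)),
        );
    }

    private function verifyUnencryptedSize($path, $storedSize) {
        $encryptedSize = isset($this->encryptedSizes[$path]) ? $this->encryptedSizes[$path] : 0;
        // one plausible sanity check: a negative unencrypted size, or one that is not
        // smaller than the encrypted size, cannot be correct
        if ($storedSize < 0 || $storedSize >= $encryptedSize) {
            $storedSize = $this->fixUnencryptedSize($path);  // recalculation as discussed above
        }
        return $storedSize;
    }

    private function storedSize($path) {
        return isset($this->cachedUnencryptedSizes[$path]) ? $this->cachedUnencryptedSizes[$path] : -1;
    }

    private function fixUnencryptedSize($path) { /* placeholder */ return 0; }
}

Funnelling both read paths through one verification keeps the repair in a single place, which is the point of the move.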

@PVince81 please give it a try.

@schiessle schiessle force-pushed the fix_broken_unencrypted_size branch 4 times, most recently from ad57245 to 94bd327 Compare February 23, 2016 10:30
@jknockaert
Contributor

@schiesbn The move sounds logical to me, but I'm not very familiar with the mechanisms in OC that call fopen(), filesize() and getMetaData(). If filesize() is called after fopen(), then my question about the handling of 'w+' mode comes back. But other than that I don't see objections. As for the rest of the PR, it gets my assent 👍 Don't forget to port it forward in #22567.

@PVince81
Contributor

  • TEST: webdav download repairs size
  • TEST: web UI download repairs size
  • TEST: webdav range request repairs size too
  • TEST: webdav download from seekable ext storage (SFTP)
  • TEST: web UI download from seekable ext storage (SFTP)
  • TEST: webdav download from non-seekable ext storage (S3)
  • TEST: web UI download from non-seekable ext storage (S3)
  • TEST: overwrite file that has wrong size, creates version files with the correct size
  • TEST: sync client

@PVince81
Contributor

We'll likely want to backport this like the original PR. @karlitschek @MorrisJobke

Fixes #21751

@karlitschek
Contributor

please backport! Thanks a lot!

@PVince81
Contributor

Please review, we need this for 9.0 and others (critical).
Thanks.

@jknockaert
Contributor

I will take up the review (also for the backports) by tomorrow morning. It's a bit frustrating that I cannot see what exactly changed since my previous review...

) {
// check if we already calculated the unencrypted size for the
// given path, to avoid recursion
if (isset($this->fixUnencryptedSizeOf[$this->getFullPath($path)]) === false) {
Contributor

Recursion shouldn't happen in the first place...

Contributor Author

We can't avoid it. Once we are in fixUnencryptedSizeOf() we need to read and decrypt the file. At some point the encryption module asks for the metadata of the file, which is valid. But this would send us back to this place and start an endless loop.

That's the price for moving the check from fopen() to getMetaData() and filesize(). But the benefits outweigh the drawback of the additional recursion check.
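
For readers following along, here is a sketch of the guard pattern being discussed; the names fixUnencryptedSizeOf and getFullPath are taken from the diff hunk quoted above, the rest is an assumed shape rather than the exact PR code:

class RecursionGuardSketch {
    /** @var array paths whose size is currently being recalculated */
    private $fixUnencryptedSizeOf = array();

    public function verifyUnencryptedSize($path, $unencryptedSize) {
        $fullPath = $this->getFullPath($path);
        if (isset($this->fixUnencryptedSizeOf[$fullPath])) {
            // re-entered while recalculating this very file (the encryption module
            // asked for metadata again): return the current value instead of looping
            return $unencryptedSize;
        }
        $this->fixUnencryptedSizeOf[$fullPath] = true;
        try {
            return $this->fixUnencryptedSize($path, $unencryptedSize);
        } finally {
            unset($this->fixUnencryptedSizeOf[$fullPath]);
        }
    }

    private function getFullPath($path) { return '/' . ltrim($path, '/'); }
    private function fixUnencryptedSize($path, $size) { /* placeholder */ return $size; }
}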

@schiessle
Contributor Author

Thanks for taking the time to review it once more!

It's a bit frustrating that I cannot see what exactly changed since my previous review...

Sorry for that.

Let me point you to the changes:

@jknockaert
Contributor

@schiesbn I'm probably a bit late in the process with the observation that there's a fundamental difference between the implementation here and the one in #11659. The original implementation in #11659 operates on the encrypted stream, so as to avoid any recursion. Moreover, it's far more efficient because it only decrypts the last block to check its unencrypted size; the other blocks are 6126 bytes long unencrypted by definition. That makes its implementation robust and efficient. Here I now see that we are operating on the unencrypted stream; hence the recursion, and for an unseekable stream we are decrypting it entirely (by just freading through it), even though we don't check the length of the intermediate blocks. Any thoughts on this? Sorry for bringing this up so late; I didn't notice it before.

@schiessle
Contributor Author

@jknockaert I think you are wrong here.

The original implementation in #11659 operates on the encrypted stream, such to avoid any recursion.

This implementation also works on the encrypted stream. You can see it here: https://github.com/owncloud/core/pull/22579/files#diff-a36b9c0307e0126dd50baa38fb4bb74aR498. $this->storage->fopen() bypasses the encryption wrapper, calls fopen() on the parent wrapper and gives us the encrypted stream. If we called $this->fopen() we would get the decrypted stream.

Moreover it's far more efficient because it only decrypts the last block to check it's unencrypted size;

We do the same here.

Compare:
https://github.com/owncloud/core/pull/22579/files#diff-a36b9c0307e0126dd50baa38fb4bb74aR531
with:
https://github.com/owncloud/core/pull/11659/files#diff-751cc04b6d1a85972264f81add5e7fa9R424

It is exactly the same: we fseek() forward to the last block. Only if fseek() fails do we have to read the whole file, which again is the same in both implementations.

Here I now see that we are operating on the unencrypted stream; hence the recursion

The recursion happens as soon as we start to decrypt (parts of) the file, and this will almost always happen (whenever $size mod $blockSize !== 0), because we always have to decrypt and read the last chunk.
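
Here is a sketch of the last-block technique both PRs rely on, as described above; the helper names, the header handling and the padding assumption are illustrative and not taken from either PR:

function calculateUnencryptedSizeSketch($encryptedStream, $encryptedSize, $headerSize,
                                        $encryptedBlockSize, $unencryptedBlockSize, $decryptBlock) {
    $payload = $encryptedSize - $headerSize;
    $fullBlocks = (int) floor($payload / $encryptedBlockSize);
    if ($payload % $encryptedBlockSize === 0 && $fullBlocks > 0) {
        // always decrypt the final block, even a full one, since padding can make
        // its unencrypted length shorter than $unencryptedBlockSize
        $fullBlocks--;
    }
    $lastBlockOffset = $headerSize + $fullBlocks * $encryptedBlockSize;

    if (fseek($encryptedStream, $lastBlockOffset, SEEK_SET) === 0) {
        // seekable: decrypt only the last block and add its length to the full blocks
        $lastBlock = fread($encryptedStream, $encryptedBlockSize);
        $remainder = ($lastBlock === false || $lastBlock === '') ? 0 : strlen($decryptBlock($lastBlock));
        return $fullBlocks * $unencryptedBlockSize + $remainder;
    }

    // fseek() failed: fall back to freading (and decrypting) the whole stream
    $size = 0;
    while (!feof($encryptedStream)) {
        $block = fread($encryptedStream, $encryptedBlockSize);
        if ($block === false || $block === '') {
            break;
        }
        $size += strlen($decryptBlock($block));
    }
    return $size;
}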

@jknockaert
Contributor

@schiesbn Thanks for clarifying; sorry for the noise I caused. My second guess would then be that the recursion is initiated here:
https://github.com/owncloud/core/pull/22579/files#diff-a36b9c0307e0126dd50baa38fb4bb74aR551

@schiessle
Contributor Author

@jknockaert Yes, that's right, and that's the last block, which we can't avoid.

@jknockaert
Contributor

@schiesbn OK. I've been digging in the relevant encryption code, but couldn't really trace how the recursion happens.

@jknockaert
Contributor

Provided it's on the critical path for 9.0, I would judge that this PR is good 👍
I would appreciate it if a stack trace of the recursion could be posted in this discussion (I currently don't have a test server around), so as to improve my understanding of the dynamics. Perhaps you could just generate an error for this purpose where the recursion is tested for here: https://github.com/owncloud/core/blob/fix_broken_unencrypted_size/lib/private/files/storage/wrapper/encryption.php#R468
As for the backports I have my doubts; see #22626 (comment)
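
One simple way to capture such a trace (a generic PHP debugging snippet, not something that is part of the PR) would be to log a backtrace right where the recursion check fires:

// drop-in debugging aid at the recursion check inside the wrapper (sketch only)
if (isset($this->fixUnencryptedSizeOf[$this->getFullPath($path)])) {
    error_log('unencrypted-size recursion detected for ' . $this->getFullPath($path));
    // the exception is only created to capture the current call stack
    error_log((new \Exception())->getTraceAsString());
}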

@schiessle
Contributor Author

@jknockaert For your information, the recursion happens for example here: https://github.com/owncloud/core/blob/master/apps/encryption/lib/crypto/encryption.php#L193, which asks the file cache, because the file cache contains the version: https://github.com/owncloud/core/blob/master/apps/encryption/lib/keymanager.php#L425. This brings us back to getMetaData().

Anyway, we can't forbid the encryption module from performing certain operations; we need to be able to handle them. That's why the code is needed.

Thanks a lot for your review!

@jknockaert
Contributor

@schiesbn Thanks for clarifying. The current implementation will do for now, but I will see if I can fix it without recursion after OC9 is released; I will open a separate PR for that. The recursion protection may well obfuscate other bugs if they are introduced at a later stage.

DeepDiver1975 added a commit that referenced this pull request Feb 25, 2016
Heal unencrypted file sizes at download time (second approach)
@DeepDiver1975 DeepDiver1975 merged commit efc9666 into master Feb 25, 2016
@DeepDiver1975 DeepDiver1975 deleted the fix_broken_unencrypted_size branch February 25, 2016 13:34
@PVince81
Contributor

stable8.2: #22626
stable8.1: #22627
