-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Heal unencrypted file sizes at download time (second approach) #22579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
By analyzing the blame information on this pull request, we identified @DeepDiver1975, @LukasReschke and @nickvergessen to be potential reviewers |
| * @throws DecryptionFailedException | ||
| */ | ||
| public function decrypt($data, $position = 0) { | ||
| error_log ("position: " .$position); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm?
cea7067 to
e72c9c8
Compare
|
Too bad this still requires two downloads because content-length is already set. Maybe if the repair is done inside of Another idea is to have the Sabre Node re-read the metadata after doing a fopen to have the correct size. Drawback is that this would do another SQL query to retrieve the same data, especially when encryption is disabled. And this wouldn't cover web UI download yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to count it? Why not check it like this:
if ($data[$blockSize - 1] ) {
$count = $blockSize;
} else {
$count = strlen($data);
}This will then only count the last block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could do that if it is faster. Note that there is no guarantee that when asking for $blocksize an encrypted stream will return $blockSize even in the middle of an (unseekable) stream. But your code suggestion would work regardless in the middle of a file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MorrisJobke I also thought about something like this. But I decided against it. Remember, this is a repair function. This will not be called regularly but only for broken files which doesn't happen all the time, otherwise we will have to fix something else. 😉 Therefore I prefer the easier, less error-prone code. Such optimizations make sense for code which gets called regularly but not for a repair step.
|
@schiesbn This is definitely an implementation of what I had in mind. It is in my opinion consistent with the encryption wrapper design and it should be easy to maintain. I added some minor comments to the code. I'm just wondering about how the edge case of a file opened in 'w+' mode would be handled (of course a file with a bogus unencrypted filesize in the cache). Can you open the encrypted stream in 'r' mode in parallel to calculate the filesize? |
|
@PVince81 Only for nonseekable streams is it necessary to download the full file for the filesize calculation, whereas the right (unencrypted) filesize is only needed for the correct handling of seekable streams. The original encryption implementation worked around this dilemma by handling all encrypted files as nonseekable. But that was a serious limitation; many webdav clients make use of http-range requests and such requests are served many times faster if seekability is supported (this was my initial motivation for redesigning the encryption wrapper in OC6). When redesigning I have been considering to make a stricter distinction between seekable and nonseekable streams in order to optimise their handling in that respect. The problem is that the concept of seekability is not very strictly defined in php. Streams commonly have a seekability attribute, but it is often set incorrectly or even not set at all and I couldn't find any proper reference to it in the official php documentation. It seems even not sure if seekability is a strict property of a stream: in my understanding a stream may refuse one fseek request (return -1) and subsequently succesfully serve another fseek request. Keeping that in mind I decided that the most robust implementation of the encryption wrapper was one that transparently handles all fseek requests and simply return the behaviour of the underlying encrypted stream (even if that may make php go bonkers). And finally keep in mind that this additional download for an unseekable stream only happens once in the edge case where the filesize was set wrongly before. |
The repair steps happens before we actually open the file for the original operation. So it doesn't matter how the file will be opened at the end. If we think the file size is wrong we re-calculate it with the code above and afterwards read it with whatever mode given for the original operation. |
|
@schiesbn OK, that's even better than I expected. Sounds like a really robust implementation. |
d1c5102 to
4184247
Compare
|
I now moved the @PVince81 please give it a try. |
ad57245 to
94bd327
Compare
|
@schiesbn The move sounds logical to me, but I'm not very familiar with the mechanisms in OC calling fopen(), filesize() and getMetaData(). If filesize() is called after fopen() then my question of the handling of 'w+' mode comes back. But other than that I don't see objections. As for the rest of the PR it gets my ascent 👍 Don't forget to port it forward in #22567. |
|
|
We'll likely want to backport this like the original PR. @karlitschek @MorrisJobke Fixes #21751 |
|
please backport! Thanks a lot! |
|
Please review, we need this for 9.0 and others (critical). |
|
I will take up the review (also for the backports) by tomorrow morning. It's a bit frustrating that I cannot see what exactly changed since my previous review... |
| ) { | ||
| // check if we already calculate the unencrypted size for the | ||
| // given path to avoid recursions | ||
| if (isset($this->fixUnencryptedSizeOf[$this->getFullPath($path)]) === false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recursion shouldn't happen in the first place...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't avoid it. Once we are in fixUnencryptedSizeOf() we need to read and decrypt the file. At some point the encryption modul ask for the metaData of the file, which is valid. But this will send us back to this place and start a endless loop.
That's the price for moving the check from fopen() to getMetaData() and filesize(). But the benefits outweigh the drawback of the additional recursion check.
|
Thanks for taking the time to review it once more!
Sorry for that. Let me point you to the changes:
|
|
@schiesbn I'm probably a bit late in the process with the observation that there's a fundamental difference between the implementation here and the one in #11659. The original implementation in #11659 operates on the encrypted stream, such to avoid any recursion. Moreover it's far more efficient because it only decrypts the last block to check it's unencrypted size; the other blocks are unencrypted 6126 in length by definition. That makes its implementation robust and efficient. Here I now see that we are operating on the unencrypted stream; hence the recursion and for an unseekable stream we are decrypting it entirely (by just freading through it) even if we don't check the length of the intermediate blocks. Any thoughts on this? Sorry for bringing this up so late; I didn't notice before. |
|
@jknockaert I think you are wrong here.
This implementation also works on the encrypted stream. You can see it here: https://github.com/owncloud/core/pull/22579/files#diff-a36b9c0307e0126dd50baa38fb4bb74aR498.
We do the same here. Compare: It is exactly the same, we fseek forward to the last block. Only if fseek fails we have to read the whole file, again same in both implementations
The recursion happens as soon as we start to decrypt (parts) of the file and this will happen almost always (if $size mod $blockSize !== 0). Because we always have to decrypt and read the last chunk. |
|
@schiesbn Thanks for clarifying; sorry for the noise I caused. My second guess would then be that the recursion is initiated here: |
|
@jknockaert yes, that's right and that's the last block which we can't avoid |
|
@schiesbn OK. I've been digging in the relevant encryption code, but couldn't really trace how the recursion happens. |
|
Provided it's on the critical path for 9.0 I would judge that this is PR is good 👍 |
|
@jknockaert for your information. The recursion happens for example here: https://github.com/owncloud/core/blob/master/apps/encryption/lib/crypto/encryption.php#L193 Which ask for the file cache, because the file cache contain the version: https://github.com/owncloud/core/blob/master/apps/encryption/lib/keymanager.php#L425 This brings us back to getMetaData(). Anyway, wen can't forbid the encryption module to perform certain operations. We need to be able to handle it. That's why the code is needed. Thanks a lot for your review! |
|
@schiesbn Thanks for clarifying. The current implementation will do for now, but I will see if I can fix it without recursion after OC9 is released. I will open a separate PR for that. The recursion protection may well obfuscate other bugs when they are introduced at a later stage. |
Heal unencrypted file sizes at download time (second approach)
|
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
recalculate unencrypted size if we assume that the size stored in the db is not correct
a alternative approach to #22567
This approach re-uses the file size calculation by @jknockaert: #11659
I was under the impression that this is no longer possible with encryption 2.0. But in fact it works quite nicely if we place it directly in the storage wrapper.
@jknockaert @PVince81 please have a look. What do you think?