Wrap cache entries removal in a transaction #16576
Conversation
If a rename transaction is in progress, the scanner will not be able to delete the affected entries when it detects the discrepancy on disk. Also fixed the scanner to fail gracefully if an error occurs during the removal of a cache entry.
@@ -464,12 +464,14 @@ public function inCache($file) {
	 * @param string $file
	 */
	public function remove($file) {
		\OC_DB::beginTransaction();
		$entry = $this->get($file);
I'm not sure if it's ok to have the "get" inside.
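For context, the wrapped method would look roughly like this. This is a sketch of the PR's approach only: the `commit()` placement and the deletion step are assumed, since the hunk above is truncated.

```php
public function remove($file) {
	\OC_DB::beginTransaction();
	// The get() now runs inside the transaction (the point questioned
	// above): the entry that is read is guaranteed to be the one that
	// gets deleted, at the cost of holding the transaction open longer.
	$entry = $this->get($file);
	// ... delete $entry and, for folders, its children ... (elided)
	\OC_DB::commit();
}
```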
	if ($sourceData['mimetype'] === 'httpd/unix-directory') {
		//find all child entries
		$sql = 'SELECT `path`, `fileid` FROM `*PREFIX*filecache` WHERE `storage` = ? AND `path` LIKE ?';
		$result = \OC_DB::executeAudited($sql, [$sourceStorageId, $sourcePath . '/%']);
		$childEntries = $result->fetchAll();
		$sourceLength = strlen($sourcePath);
		\OC_DB::beginTransaction();
		$query = \OC_DB::prepare('UPDATE `*PREFIX*filecache` SET `storage` = ?, `path` = ?, `path_hash` = ? WHERE `fileid` = ?');

		foreach ($childEntries as $child) {
			$newTargetPath = $targetPath . substr($child['path'], $sourceLength);
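The child-path rewrite in that loop is plain string arithmetic and can be checked in isolation. A minimal standalone sketch with made-up paths (not the ownCloud code itself):

```php
<?php
// Rebase a child entry's path onto the rename target, mirroring the
// substr() logic in the hunk above.
$sourcePath = 'files/photos';
$targetPath = 'files/pictures';
$sourceLength = strlen($sourcePath);

$childPath = 'files/photos/2015/trip.jpg';
$newTargetPath = $targetPath . substr($childPath, $sourceLength);
echo $newTargetPath . "\n"; // files/pictures/2015/trip.jpg
```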
When testing, add sleep(1) here.
"Playing" around with transactions is always kind of dangerous - can we analyse the behavior in case of failing commits please? Furthermore we should think about nested transactions and how the behavior will/can change in case a dbms doesn't support nested transactions (I have honestly no idea about the current situations with our supported dbs - just restoring pre-historic knowledge 🙊 ). |
If we had used transactions in the first place, that's probably the way we'd have done it: group the updates that match together within a transaction. Yes, curious to see how this works with SQLite and co. I did my tests on MariaDB and it worked fine there. |
In the case where the DELETE fails, the scanner will skip the current entry and continue scanning the other children. (Before I added the "catch" block, it would throw a 500 at the client, which isn't nice.) The question would rather be what happens if the rename fails: we might want to rename the file back to its original name if the DB transaction failed.
Note: this is not really unit-testable because we cannot make both the delete and the rename happen in parallel to simulate the transaction clash.
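The graceful-failure behavior described here could be sketched as follows; the surrounding scanner code and the logging call are assumptions for illustration, not the exact patch:

```php
try {
	// remove the stale cache entry detected during the scan
	$this->cache->remove($file);
} catch (\Exception $e) {
	// A concurrent rename transaction can make the DELETE fail; log
	// and skip this entry instead of letting a 500 reach the client,
	// then keep scanning the remaining children.
	\OCP\Util::writeLog('core', 'Could not remove cache entry for ' . $file . ': ' . $e->getMessage(), \OCP\Util::WARN);
}
```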
Looks good 👍
@@ -540,24 +542,24 @@ public function moveFromCache(Cache $sourceCache, $sourcePath, $targetPath) {
	list($sourceStorageId, $sourcePath) = $sourceCache->getMoveInfo($sourcePath);
	list($targetStorageId, $targetPath) = $this->getMoveInfo($targetPath);

	\OC_DB::beginTransaction();
	if ($sourceData['mimetype'] === 'httpd/unix-directory') {
		//find all child entries
		$sql = 'SELECT `path`, `fileid` FROM `*PREFIX*filecache` WHERE `storage` = ? AND `path` LIKE ?';
		$result = \OC_DB::executeAudited($sql, [$sourceStorageId, $sourcePath . '/%']);
Completely unrelated, but this goes 💥, as does a lot of other stuff here…
See #16580
We have a unit test that tests removing a file from cache when it disappeared on disk, which will trigger the nested transaction: https://github.com/owncloud/core/blob/master/tests/lib/files/cache/scanner.php#L199
I realize that this still won't eliminate all possibilities of breakage, only reduce the window further: see #13391 (comment)
Moving to 8.2. Besides, file locking fixed issue #13391
Is this then really needed? Or should we rather move away from DB transactions?
In general I think DB transactions are always a good idea, but this is based on my limited knowledge of regular (non-distributed) databases. If transactions are indeed an issue with systems like glusterfs and postgresql in distributed environments, then we might want to move away from them. (Best would be to ask a DB expert, maybe @dragotin @DeepDiver1975?) Feel free to close this PR as obsolete then.
or @karlitschek
Well, some customers had problems because the database does not have time to clean up between operations when a transaction is used. Therefore the DB gets slower, until actions slow down during the night and it can clean up.
And there is the problem that conflicting transactions are more likely in a distributed DB if we have a high load on one table with common entries (parent folders up to the root). Propagating the etag up the tree will very likely cause a problem when a file is updated in a share and a file is updated in my own folder in parallel.
Well, the only thing we need is that it changes. This will never be blocked. The only question is which of the etags you have at the end. Doesn't really matter, it just must not be the same as at the beginning.
And as said before, files will be locked in the future, so you will not be able to change them while someone else is changing subitems.
Ok, then let's close this.
Steps to reproduce here: #13391 (comment)
Fixes #13391
This helps prevent the data loss, but might not be the optimal solution if the race condition happens outside of the DB code. However, the DB code seems to be the most likely place to cause the cache entries to disappear.
Note: we'll still need to invest some time to handle the failure cases #16445.
Please review @icewind1991 @MorrisJobke @karlitschek @DeepDiver1975 @nickvergessen @LukasReschke