
Wrap cache entries removal in a transaction #16576

Closed
wants to merge 1 commit

Conversation

PVince81
Contributor

If a rename transaction is in progress, the scanner will not be able to delete the affected entries when it detects a discrepancy on disk.

Also fixed the scanner to fail gracefully if an error occurs while removing a cache entry.

Steps to reproduce here: #13391 (comment)
Fixes #13391

This helps prevent the data loss, but it might not be the optimal solution if the race condition happens outside of the DB code. However, the DB code seems to be the most likely place for the cache entries to disappear.

Note: we'll still need to invest some time to handle the failure cases #16445.
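
For illustration, here is a minimal sketch of the shape of the change, not the actual diff; it assumes \OC_DB::commit() exists as the counterpart of \OC_DB::beginTransaction(), and the get() placeholder stands in for the real cache lookup:

<?php
// Minimal sketch only, not the ownCloud patch. Assumes \OC_DB::commit()
// exists alongside \OC_DB::beginTransaction(); error handling is discussed
// separately below (the scanner catches failures and keeps going).
class CacheRemovalSketch {
	/**
	 * Remove a file's cache entry as one unit of work, so a concurrent
	 * rename transaction cannot interleave with the read and the delete.
	 */
	public function remove($file) {
		\OC_DB::beginTransaction();
		$entry = $this->get($file); // read inside the transaction
		if ($entry !== false) {
			$sql = 'DELETE FROM `*PREFIX*filecache` WHERE `fileid` = ?';
			\OC_DB::executeAudited($sql, [$entry['fileid']]);
		}
		\OC_DB::commit(); // locks taken by this unit of work are released here
	}

	/** Placeholder for the real cache lookup. */
	private function get($file) {
		return false;
	}
}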

Please review @icewind1991 @MorrisJobke @karlitschek @DeepDiver1975 @nickvergessen @LukasReschke

@scrutinizer-notifier

A new inspection was created.

@@ -464,12 +464,14 @@ public function inCache($file) {
* @param string $file
*/
public function remove($file) {
\OC_DB::beginTransaction();
$entry = $this->get($file);
Contributor Author

I'm not sure if it's okay to have the get() call inside the transaction.

@PVince81 PVince81 added this to the 8.1-current milestone May 27, 2015
if ($sourceData['mimetype'] === 'httpd/unix-directory') {
//find all child entries
$sql = 'SELECT `path`, `fileid` FROM `*PREFIX*filecache` WHERE `storage` = ? AND `path` LIKE ?';
$result = \OC_DB::executeAudited($sql, [$sourceStorageId, $sourcePath . '/%']);
$childEntries = $result->fetchAll();
$sourceLength = strlen($sourcePath);
\OC_DB::beginTransaction();
$query = \OC_DB::prepare('UPDATE `*PREFIX*filecache` SET `storage` = ?, `path` = ?, `path_hash` = ? WHERE `fileid` = ?');

foreach ($childEntries as $child) {
$newTargetPath = $targetPath . substr($child['path'], $sourceLength);
Contributor Author

When testing, add sleep(1) here
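
In context, the reproduction trick looks roughly like this (a sketch of the loop from the diff above; the actual UPDATE call is elided, and triggering a scan via occ files:scan in a second session is just one way to hit the window):

foreach ($childEntries as $child) {
	$newTargetPath = $targetPath . substr($child['path'], $sourceLength);
	sleep(1); // artificial delay: widens the race window inside the rename transaction
	          // so a concurrent scanner run can try to delete these same rows
	// ... execute the prepared UPDATE for this child here ...
}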

@DeepDiver1975
Member

"Playing" around with transactions is always kind of dangerous - can we analyse the behavior in case of failing commits please?

Furthermore, we should think about nested transactions and how the behavior will/can change if a DBMS doesn't support nested transactions (I honestly have no idea about the current situation with our supported DBs - just restoring pre-historic knowledge 🙊).
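
To make the nesting concern concrete, this is the usual emulation pattern when a driver has no native support for nested transactions (illustration with plain PDO, not ownCloud code; Doctrine DBAL, which \OC_DB wraps, keeps a similar nesting counter and can optionally use savepoints):

<?php
// Illustration only: emulate nested begin/commit/rollback on top of a single
// real transaction. Only the outermost level talks to the database directly;
// inner levels use savepoints (note: Oracle has no RELEASE SAVEPOINT, so a
// driver has to skip that statement there).
class NestedTransactions {
	private $pdo;
	private $depth = 0;

	public function __construct(PDO $pdo) {
		$this->pdo = $pdo;
	}

	public function begin() {
		if ($this->depth === 0) {
			$this->pdo->beginTransaction();                     // real transaction
		} else {
			$this->pdo->exec('SAVEPOINT LEVEL' . $this->depth); // nested level
		}
		$this->depth++;
	}

	public function commit() {
		$this->depth--;
		if ($this->depth === 0) {
			$this->pdo->commit();                               // only the outermost commit is real
		} else {
			$this->pdo->exec('RELEASE SAVEPOINT LEVEL' . $this->depth);
		}
	}

	public function rollback() {
		$this->depth--;
		if ($this->depth === 0) {
			$this->pdo->rollBack();
		} else {
			$this->pdo->exec('ROLLBACK TO SAVEPOINT LEVEL' . $this->depth);
		}
	}
}

The risk raised above is exactly the inner commit/rollback path: without savepoint (or counter) support, an inner commit can end the outer transaction early.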

@PVince81
Contributor Author

If we had used transactions in the first place, that's probably how we'd have done it: group the updates that belong together within a transaction.

Yes, curious to see how this works with SQLite and co.
This will need extensive testing.

I did my tests on MariaDB and it worked fine there.

@PVince81
Contributor Author

If the DELETE fails, the scanner will skip the current entry and continue scanning the other children. (Before I added the "catch" block, it would throw a 500 at the client, which isn't nice.)

The question is rather what happens if the rename fails: we might want to rename the file back to its original name if the DB transaction failed.
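
A rough sketch of the skip-and-continue behaviour described above (hypothetical helper, not the actual scanner code): when removing a stale entry fails, log it and move on instead of letting the exception bubble up as a 500.

<?php
// Sketch only: scanChildren() and the $cache object are stand-ins for the
// real scanner/cache classes; $child paths are assumed to carry both the
// cache path and the full on-disk path.
function scanChildren(array $children, $cache) {
	foreach ($children as $child) {
		if (file_exists($child['fullPath'])) {
			continue; // still on disk, nothing to clean up
		}
		try {
			$cache->remove($child['path']); // may fail while a rename transaction holds the rows
		} catch (\Exception $e) {
			// fail gracefully: skip this entry and keep scanning the other children
			\OCP\Util::writeLog('core', 'Could not remove cache entry for ' . $child['path'] . ': ' . $e->getMessage(), \OCP\Util::WARN);
		}
	}
}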

@PVince81
Contributor Author

  • investigate nested transaction behavior with different DBs
  • test with SQLite
  • test with MariaDB
  • test with PostgreSQL
  • test with (gasp) Oracle

Note: this is not really unit-testable because we cannot make both delete and rename happen in parallel to simulate the transaction clash.

@icewind1991
Contributor

Looks good 👍

@ghost

ghost commented May 27, 2015

Refer to this link for build results (access rights to CI server needed):
https://ci.owncloud.org//job/pull-request-analyser-ng-simple/12545/
🚀 Test PASSed.🚀
chuck

@@ -540,24 +542,24 @@ public function moveFromCache(Cache $sourceCache, $sourcePath, $targetPath) {
list($sourceStorageId, $sourcePath) = $sourceCache->getMoveInfo($sourcePath);
list($targetStorageId, $targetPath) = $this->getMoveInfo($targetPath);

\OC_DB::beginTransaction();
if ($sourceData['mimetype'] === 'httpd/unix-directory') {
//find all child entries
$sql = 'SELECT `path`, `fileid` FROM `*PREFIX*filecache` WHERE `storage` = ? AND `path` LIKE ?';
$result = \OC_DB::executeAudited($sql, [$sourceStorageId, $sourcePath . '/%']);
Member

Completely unrelated, but this goes 💥, like a lot of other stuff here…

See #16580

@PVince81
Contributor Author

We have a unit test that covers removing a file from the cache when it has disappeared on disk, which will trigger the nested transaction: https://github.com/owncloud/core/blob/master/tests/lib/files/cache/scanner.php#L199
So we'll see...

@PVince81
Contributor Author

I realize that this still won't eliminate all possibilities of breakage, only reduce the window further: see #13391 (comment)

@PVince81
Contributor Author

Moving to 8.2.

Besides, file locking fixed issue #13391.

@PVince81 PVince81 modified the milestones: 8.2-next, 8.1-current Jun 23, 2015
@MorrisJobke
Contributor

> Besides, file locking fixed issue #13391.

Is this really needed then? Or should we rather move away from DB transactions?

@PVince81
Contributor Author

PVince81 commented Jul 3, 2015

In general I think DB transactions are always a good idea, but this is based on my limited knowledge of regular (non-distributed) databases. If transactions are indeed an issue with systems like GlusterFS and PostgreSQL in distributed environments, then we might want to move away from them. (It would be best to ask a DB expert, maybe @dragotin or @DeepDiver1975?)

Feel free to close this PR as obsolete then.

@PVince81
Contributor Author

PVince81 commented Jul 3, 2015

or @karlitschek

@nickvergessen
Contributor

Well, some customers had problems because the database does not have time to clean up between operations when a transaction is used. Therefore the DB gets slower until activity slows down at night, when it can clean up.

@MorrisJobke
Contributor

And there is the problem that conflicting transactions are more likely in a distributed DB if we have high load on one table with common entries (parent folders up to the root). Propagating the etag up the tree will very likely cause a problem when a file is updated in a share and a file is updated in my own folder in parallel.

@nickvergessen
Contributor

> Propagating the etag up the tree will very likely cause a problem when a file is updated in a share and a file is updated in my own folder in parallel.

Well, the only thing we need is that the etag changes, and that will never be blocked. The only question is which of the etags you end up with, and that doesn't really matter; it just must not be the same as at the beginning.
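
For context, "propagating the etag up the tree" amounts to something like the following (hypothetical sketch; getId() and update() mirror the cache API used elsewhere in this diff, and the real propagation code lives in the cache/updater classes):

<?php
// Sketch only: bump the etag (and mtime) of every ancestor folder so sync
// clients see a change. Two parallel writers only race on which etag value
// wins, which is harmless as long as the stored value differs from the
// one clients already have.
function propagateChange($cache, $internalPath, $time) {
	// walk from the changed entry's parent up to the root ('' in the filecache)
	do {
		$parent = dirname($internalPath);
		$internalPath = ($parent === '.' || $parent === '/') ? '' : $parent;
		$folderId = $cache->getId($internalPath); // hypothetical lookup of the folder's fileid
		if ($folderId !== -1) {
			$cache->update($folderId, ['etag' => uniqid(), 'mtime' => $time]);
		}
	} while ($internalPath !== '');
}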

@nickvergessen
Contributor

And as said before, the files will be locked in the future, so you will not be able to change them while someone else is changing sub-items.

@PVince81
Contributor Author

PVince81 commented Jul 3, 2015

Ok, then let's close this.

@PVince81 PVince81 closed this Jul 3, 2015
@PVince81 PVince81 deleted the fix-datalossonrename branch July 3, 2015 09:29
@lock lock bot locked as resolved and limited conversation to collaborators Aug 11, 2019
Successfully merging this pull request may close these issues.

Data loss on rename of a 49 GB folder
7 participants