Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse known fileids and cache data in the scanner #10993

Merged
merged 4 commits into from Feb 12, 2015

Conversation

icewind1991
Copy link
Contributor

Saves having to do multiple getId calls for every file and folder being scanned.

Saves about 50% 80% 90% of the database class made during a filescan

cc @PVince81 @DeepDiver1975 @th3fallen

@icewind1991 icewind1991 changed the title Reuse known fileids in the scanner Reuse known fileids and cache data in the scanner Sep 10, 2014
@icewind1991
Copy link
Contributor Author

Found an even bigger improvement, now the cache data itself is being reused, bringing the optimal case (cache is up to date) to 3 queries per folder, down from 5 per folder and 3 per file.

In my local environment this brings scanning the ~44k files down to ~32k queries and 190sec compared to the original 160k queries and 374s

@icewind1991
Copy link
Contributor Author

Another round of optimization later and we're down to 1 query per folder in the best case, resulting in a total of 14k in my test cases vs 160k before optimazation

@PVince81
Copy link
Contributor

It sounds like a cool improvement, but I'm worried that we will run into situations like #10954 (comment) more often if we cache too much.

Or we need a way to auto-clean older cache entries while the scanning is in progress.

@PVince81
Copy link
Contributor

If you have the folder structure:

/a
/a/subdir
/b

As soon as scanning of /a i finished, I guess it's an opportunity to delete everything in the cache related to /a ?

@icewind1991
Copy link
Contributor Author

This actually uses (slightly) less memory as before, this re-uses objects which were already kept in memory instead of requesting another copy of the object from the database

@icewind1991
Copy link
Contributor Author

@DeepDiver1975 @PVince81 @th3fallen can this be merged?

@@ -104,9 +104,11 @@ public function getData($path) {
*
* @param string $file
* @param int $reuseExisting
* @param int $parentId
* @param array |null $cacheData
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cacheData of the parent or the file to be scanned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added documentation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhm - I don't see this clarified yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, added it now

@PVince81
Copy link
Contributor

Maybe after the release, when there will be more time to review OC 8 stuff...

@DeepDiver1975
Copy link
Member

fixes #8548

@icewind1991 icewind1991 force-pushed the scanner-reuse-fileid branch 3 times, most recently from 75be54a to 268c5cb Compare October 17, 2014 11:39
$folderId = $this->cache->getId($path);
}
$existingChildren = $this->getExistingChildren($folderId);
$newChildren = $this->getNewChildren($path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this addition ? Does it mean we previously rescanned all children, not only the new ones ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$newChildren here means the children as reported by the storage, as opposed to the children reported by the cache ($existingChildren)

@PVince81
Copy link
Contributor

The code looks good but I don't understand it 100%.

@@ -194,6 +194,9 @@ public function getFolderContentsById($fileId) {
$file['size'] = $file['unencrypted_size'];
}
$file['permissions'] = (int)$file['permissions'];
$file['mtime'] = (int)$file['mtime'];
$file['storage_mtime'] = (int)$file['storage_mtime'];
$file['size'] = (int)$file['size'];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this lead to problems on 32bit platforms?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@icewind1991 what about this comment ?

@MorrisJobke
Copy link
Contributor

@icewind1991 Want to rebase this and made it ready for review or is this something for after 8.0?

@icewind1991
Copy link
Contributor Author

comparison

@DeepDiver1975
Copy link
Member

Test Name   Duration    Age
 Test_Share.testShareWithUserAndUserIsExcludedFromResharing
 Stack Trace

Test_Share::testShareWithUserAndUserIsExcludedFromResharing
Failed asserting that user 4 is excluded from re-sharing
Failed asserting that 31 is identical to 15.

/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/tests/lib/share/share.php:451

    1.440918    1
 Test_Util::testIsSharingDisabledForUser.testIsSharingDisabledForUser with data set #5
 Stack Trace

Test_Util::testIsSharingDisabledForUser with data set #5 (array('g1', 'g2', 'g3'), array('g1', 'g2'), array('g1', 'g2'), true)
Failed asserting that false matches expected true.

/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/tests/lib/util.php:305

    0.056838    1
 Test_Util::testIsSharingDisabledForUser.testIsSharingDisabledForUser with data set #6
 Stack Trace

Test_Util::testIsSharingDisabledForUser with data set #6 (array('g1', 'g2', 'g3'), array('g1', 'g2'), array('g1', 'g2', 'g3'), true)
Exception: The username is already being used

/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/lib/private/user/manager.php:256
/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/lib/private/user.php:186
/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/tests/lib/util.php:289

    0.012676    1
 OCA\Files_Encryption\Tests\Webdav.testWebdavPUT
 Stack Trace

OCA\Files_Encryption\Tests\Webdav::testWebdavPUT
Undefined index: fileid

/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/lib/private/files/cache/scanner.php:146
/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/lib/private/files/cache/scanner.php:235
/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/lib/private/files/cache/updater.php:54
/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/lib/private/files/view.php:536
/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/lib/private/connector/sabre/file.php:137
/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/lib/private/connector/sabre/directory.php:84
/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/3rdparty/sabre/dav/lib/Sabre/DAV/Server.php:1647
/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/3rdparty/sabre/dav/lib/Sabre/DAV/Server.php:900
/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/3rdparty/sabre/dav/lib/Sabre/DAV/Server.php:474
/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/3rdparty/sabre/dav/lib/Sabre/DAV/Server.php:214
/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/apps/files_encryption/tests/webdav.php:257
/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/apps/files_encryption/tests/webdav.php:117

    2.032317    1
 Test_Files_Sharing_Api.testSharePermissions
 Stack Trace

Test_Files_Sharing_Api::testSharePermissions
Failed asserting that true is false.

/var/jenkins/workspace/pull-request-analyser-ng-simple@3/label/SLAVE/apps/files_sharing/tests/api.php:243

@PVince81 PVince81 added this to the 8.1-next milestone Feb 2, 2015
@PVince81
Copy link
Contributor

PVince81 commented Feb 9, 2015

Please rebase + squash and fix the unit tests, if applicable 😄

@icewind1991 icewind1991 force-pushed the scanner-reuse-fileid branch 2 times, most recently from 3c1b6e6 to ddcd9ce Compare February 10, 2015 15:06
@icewind1991
Copy link
Contributor Author

@DeepDiver1975 @PVince81 @MorrisJobke merge?

@PVince81
Copy link
Contributor

yunosquash

@icewind1991
Copy link
Contributor Author

squashed and fixed the 32bit int issue

@scrutinizer-notifier
Copy link

The inspection completed: 5 new issues, 2 updated code elements

@ghost
Copy link

ghost commented Feb 11, 2015

Refer to this link for build results (access rights to CI server needed):
https://ci.owncloud.org//job/pull-request-analyser-ng-simple/9196/
Test PASSed.

@PVince81
Copy link
Contributor

Tested against SFTP ext storage with remote changes, scanning still works correctly 👍

@MorrisJobke
Copy link
Contributor

Tested 👍

MorrisJobke added a commit that referenced this pull request Feb 12, 2015
Reuse known fileids and cache data in the scanner
@MorrisJobke MorrisJobke merged commit f4182d2 into master Feb 12, 2015
@MorrisJobke MorrisJobke deleted the scanner-reuse-fileid branch February 12, 2015 09:53
@PVince81
Copy link
Contributor

Regression found: #14169

@lock lock bot locked as resolved and limited conversation to collaborators Aug 14, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants