New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce pagination in files-filter report #26507

Merged
merged 3 commits into from Mar 22, 2017

Conversation

Projects
None yet
4 participants
@DeepDiver1975
Member

DeepDiver1975 commented Oct 28, 2016

Description

Allow pagination when listing files

ToDos

  • integration tests
  • add cache for file list
  • integrate on client side

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@DeepDiver1975 DeepDiver1975 added this to the 9.2 milestone Oct 28, 2016

@mention-bot

This comment has been minimized.

Show comment
Hide comment
@mention-bot

mention-bot Oct 28, 2016

@DeepDiver1975, thanks for your PR! By analyzing the history of the files in this pull request, we identified @PVince81 and @nickvergessen to be potential reviewers.

mention-bot commented Oct 28, 2016

@DeepDiver1975, thanks for your PR! By analyzing the history of the files in this pull request, we identified @PVince81 and @nickvergessen to be potential reviewers.

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81
Member

PVince81 commented Nov 9, 2016

for #13915

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 6, 2016

Member

add cache for file list

memcache ? 😦

Member

PVince81 commented Dec 6, 2016

add cache for file list

memcache ? 😦

$requestedProps[] = $propVal['name'];
}
$requestedProps = $report->properties;
$filterRules = $report->filters;

This comment has been minimized.

@PVince81

PVince81 Dec 6, 2016

Member

really ? did the format change ?

@PVince81

PVince81 Dec 6, 2016

Member

really ? did the format change ?

use Sabre\Xml\Reader;
use Sabre\Xml\XmlDeserializable;
class FilterRequest implements XmlDeserializable {

This comment has been minimized.

@PVince81

PVince81 Dec 6, 2016

Member

oohhhh, clever

@PVince81

PVince81 Dec 6, 2016

Member

oohhhh, clever

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 6, 2016

Member

Code looks good so far

Member

PVince81 commented Dec 6, 2016

Code looks good so far

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 19, 2016

Member

Test report report.xml:

<?xml version="1.0" encoding="utf-8" ?>
<oc:filter-files xmlns:a="DAV:" xmlns:oc="http://owncloud.org/ns" >
        <a:prop>
                <a:getetag/>
        </a:prop>
        <oc:limit page="0" size="10" />
</oc:filter-files>
% curl -u admin:admin -X REPORT -H "Content-Type: text/xml" --data-binary "@report.xml" 'http://localhost/owncloud/remote.php/dav/files/admin/'
Member

PVince81 commented Dec 19, 2016

Test report report.xml:

<?xml version="1.0" encoding="utf-8" ?>
<oc:filter-files xmlns:a="DAV:" xmlns:oc="http://owncloud.org/ns" >
        <a:prop>
                <a:getetag/>
        </a:prop>
        <oc:limit page="0" size="10" />
</oc:filter-files>
% curl -u admin:admin -X REPORT -H "Content-Type: text/xml" --data-binary "@report.xml" 'http://localhost/owncloud/remote.php/dav/files/admin/'
@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 19, 2016

Member

Some concerns: the regular filter report returns a flat list of files from all folders. This flat list can be paginated in this report, this is fine.

However we will also need a paginated call for folder-specific contents, without recursing. Should we make a separate report time for this maybe ? Or add a property <oc:recursive /> to make it a flat list, and by default only return for the current folder ?

Member

PVince81 commented Dec 19, 2016

Some concerns: the regular filter report returns a flat list of files from all folders. This flat list can be paginated in this report, this is fine.

However we will also need a paginated call for folder-specific contents, without recursing. Should we make a separate report time for this maybe ? Or add a property <oc:recursive /> to make it a flat list, and by default only return for the current folder ?

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 19, 2016

Member

@DeepDiver1975 I've rebased this.

Also I discovered today that Sabre has a shorter way of compiling results from multiple nodes using getPropertiesForMultiplePaths. Since we're touching this report anyway, I added it here f4aa5d2

There will still be a performance issue here because we need to resolve every node individually.
Ideally for pagination we might already know when we're fetching from a single folder, so this fetching should be optimizable. Might be doable using the IMultiget interface which getPropertiesForMultiplePaths uses internally: https://github.com/fruux/sabre-dav/blob/master/lib/DAV/IMultiGet.php#L34. When implementing this, we could sort the results by folder and optimize fetching props of specific grouped folders.

Side note: if we ever implement Webdav sync it also uses IMultiGet 😉

Member

PVince81 commented Dec 19, 2016

@DeepDiver1975 I've rebased this.

Also I discovered today that Sabre has a shorter way of compiling results from multiple nodes using getPropertiesForMultiplePaths. Since we're touching this report anyway, I added it here f4aa5d2

There will still be a performance issue here because we need to resolve every node individually.
Ideally for pagination we might already know when we're fetching from a single folder, so this fetching should be optimizable. Might be doable using the IMultiget interface which getPropertiesForMultiplePaths uses internally: https://github.com/fruux/sabre-dav/blob/master/lib/DAV/IMultiGet.php#L34. When implementing this, we could sort the results by folder and optimize fetching props of specific grouped folders.

Side note: if we ever implement Webdav sync it also uses IMultiGet 😉

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 19, 2016

Member
Member

PVince81 commented Dec 19, 2016

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 19, 2016

Member

Arghh, no, not good. getPropertiesForMultiplePaths would redo a getChild() on the parent nodes because we haven't created any Sabre node yet for all the results.

I suspect that the old implementation also suffered from this. Since we have the files API node already, we should already be able to wrap it (it's a FileInfo after all) into a Sabre node and treat it directly as a result. No need for DB refetching...

I'll revert my commit because it will degrade the performance. From what I see we already had created the matching Sabre File and Directory in findNodesByFileIds.

More about this in a minute after the revert...

Member

PVince81 commented Dec 19, 2016

Arghh, no, not good. getPropertiesForMultiplePaths would redo a getChild() on the parent nodes because we haven't created any Sabre node yet for all the results.

I suspect that the old implementation also suffered from this. Since we have the files API node already, we should already be able to wrap it (it's a FileInfo after all) into a Sabre node and treat it directly as a result. No need for DB refetching...

I'll revert my commit because it will degrade the performance. From what I see we already had created the matching Sabre File and Directory in findNodesByFileIds.

More about this in a minute after the revert...

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 19, 2016

Member

So, I've reverted the commit instead of deleting it, in case we still want it. (Squashing will kill it later).

Now looking at findNodesByFileIds(): this has BAD performance because it will resolve every fileid using Folder::getById(). Now for arbitrary results like "/test/x1", "/test/sub/x2", "/abc/def" it makes sense as we're not in the same folder. This was valid for the "filter by tags" and "filter by favorite" feature in which results are usually not inside the same folder.

However if we want REPORT to be used for pagination on simple folders, then we need a different approach there. Usually calling getChildren() on the parent is more efficient as it does a single DB query for getDirectoryContents().

So to solve the above question we really need to discuss "recursive / flattened / arbitrary results" vs "only show results from the current folder".

One idea would be to provide one generic REPORT type that only does a PROPFIND on the files endpoints (or even any part of the tree!) and paginates whatever the PROPFIND / getChildren() returns. All results are children of the current node.

Then we provide another REPORT, which is the one from this PR, to really do search-like queries. In such queries the results are not limited to the current folder, so we have to use getById() for every result.

Or a hybrid variant is to have a single REPORT type but with a oc:recursive property. When unset, we'd resort to a simple getChildren() + post-filter on the results. When set, we'd do the other approach that gets the file ids of the results first, then resolve then. Note that both implementations are completely different.

@DeepDiver1975 thoughts ?

Member

PVince81 commented Dec 19, 2016

So, I've reverted the commit instead of deleting it, in case we still want it. (Squashing will kill it later).

Now looking at findNodesByFileIds(): this has BAD performance because it will resolve every fileid using Folder::getById(). Now for arbitrary results like "/test/x1", "/test/sub/x2", "/abc/def" it makes sense as we're not in the same folder. This was valid for the "filter by tags" and "filter by favorite" feature in which results are usually not inside the same folder.

However if we want REPORT to be used for pagination on simple folders, then we need a different approach there. Usually calling getChildren() on the parent is more efficient as it does a single DB query for getDirectoryContents().

So to solve the above question we really need to discuss "recursive / flattened / arbitrary results" vs "only show results from the current folder".

One idea would be to provide one generic REPORT type that only does a PROPFIND on the files endpoints (or even any part of the tree!) and paginates whatever the PROPFIND / getChildren() returns. All results are children of the current node.

Then we provide another REPORT, which is the one from this PR, to really do search-like queries. In such queries the results are not limited to the current folder, so we have to use getById() for every result.

Or a hybrid variant is to have a single REPORT type but with a oc:recursive property. When unset, we'd resort to a simple getChildren() + post-filter on the results. When set, we'd do the other approach that gets the file ids of the results first, then resolve then. Note that both implementations are completely different.

@DeepDiver1975 thoughts ?

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 19, 2016

Member

I wrote down some ideas about implementing a generic pagination report that would work for every PROPFIND-enabled endpoint: #26840

Member

PVince81 commented Dec 19, 2016

I wrote down some ideas about implementing a generic pagination report that would work for every PROPFIND-enabled endpoint: #26840

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 19, 2016

Member

Regarding recursive vs children, I think we could even call them differently:

  • recursive: is more a SEARCH operation because you're searching for all nodes that match a specific criteria in all subcollections. So the approach is more like asking an index (if we had one) to give results in many folders and then compile these results in one result set.
  • children: is more of a filter where you basically just get the contents of the current collection (children) and then apply a filter on their properties. All data is already there within the current collection, so no need for an index, just need to remove the nodes from the result that do not match the filter criteria.
Member

PVince81 commented Dec 19, 2016

Regarding recursive vs children, I think we could even call them differently:

  • recursive: is more a SEARCH operation because you're searching for all nodes that match a specific criteria in all subcollections. So the approach is more like asking an index (if we had one) to give results in many folders and then compile these results in one result set.
  • children: is more of a filter where you basically just get the contents of the current collection (children) and then apply a filter on their properties. All data is already there within the current collection, so no need for an index, just need to remove the nodes from the result that do not match the filter criteria.
@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 19, 2016

Member

So for "recursive" vs "flat" the answer from @DeepDiver1975 was using the "Depth" header.
Which means that to search on a single collection one specifies "Depth: 1" and in all subcollections "Depth: infinite".

Member

PVince81 commented Dec 19, 2016

So for "recursive" vs "flat" the answer from @DeepDiver1975 was using the "Depth" header.
Which means that to search on a single collection one specifies "Depth: 1" and in all subcollections "Depth: infinite".

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Feb 28, 2017

Member

Rebased.

Also I found a way to simplify the response code to use a more Sabr-y way to do it: c355695

All discovered while working on the customgroups REPORT: owncloud/customgroups#27.

Note: in the customgroups pagination I used a different format for the search pattern + pagination:

<?xml version="1.0" encoding="utf-8" ?>
<oc:search-query xmlns:a="DAV:" xmlns:oc="http://owncloud.org/ns">
        <a:prop>
                <oc:role></oc:role>
                <oc:display-name></oc:display-name>
        </a:prop>
        <oc:search>
                <oc:pattern>us</oc:pattern>
                <oc:limit>5</oc:limit>
                <oc:offset>0</oc:offset>
        </oc:search>
</oc:search-query>
Member

PVince81 commented Feb 28, 2017

Rebased.

Also I found a way to simplify the response code to use a more Sabr-y way to do it: c355695

All discovered while working on the customgroups REPORT: owncloud/customgroups#27.

Note: in the customgroups pagination I used a different format for the search pattern + pagination:

<?xml version="1.0" encoding="utf-8" ?>
<oc:search-query xmlns:a="DAV:" xmlns:oc="http://owncloud.org/ns">
        <a:prop>
                <oc:role></oc:role>
                <oc:display-name></oc:display-name>
        </a:prop>
        <oc:search>
                <oc:pattern>us</oc:pattern>
                <oc:limit>5</oc:limit>
                <oc:offset>0</oc:offset>
        </oc:search>
</oc:search-query>
@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Mar 15, 2017

Member

How about we finish this without the cache ? It will be crude but at least it will make it possible for clients to at least request the first page with limit.

Member

PVince81 commented Mar 15, 2017

How about we finish this without the cache ? It will be crude but at least it will make it possible for clients to at least request the first page with limit.

DeepDiver1975 and others added some commits Oct 28, 2016

Use a more API-style way to generate REPORT PROPFIND
This will also properly return 404 properties
@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Mar 17, 2017

Member

Rebased and pushed some changes:

  • pagination (offset+limit) now works also for tags/favorite
  • disabled case where no filter is specified because getChildren() was not enough and would only search one level deep. Actually we need to use Node::search() but when I tried I noticed that it was missing attributes in its FileInfo properties (like owner) which makes it currently unusable. Something to dig into in the future (<= @DeepDiver1975)
  • added unit tests

Next up: integration tests for pagination when filtering.

Member

PVince81 commented Mar 17, 2017

Rebased and pushed some changes:

  • pagination (offset+limit) now works also for tags/favorite
  • disabled case where no filter is specified because getChildren() was not enough and would only search one level deep. Actually we need to use Node::search() but when I tried I noticed that it was missing attributes in its FileInfo properties (like owner) which makes it currently unusable. Something to dig into in the future (<= @DeepDiver1975)
  • added unit tests

Next up: integration tests for pagination when filtering.

@PVince81 PVince81 changed the title from [WIP] Introduce pagination in files-filter report to Introduce pagination in files-filter report Mar 20, 2017

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Mar 20, 2017

Member
21:23:41     /var/lib/jenkins/workspace/owncloud-core_core_PR-26507-XBSZ3PUQKGNP3NP62UQHR4G7JKGONNKXFRY3NXEAITQYFZVK5JDQ/tests/integration/features/favorites.feature:81
21:23:41     /var/lib/jenkins/workspace/owncloud-core_core_PR-26507-XBSZ3PUQKGNP3NP62UQHR4G7JKGONNKXFRY3NXEAITQYFZVK5JDQ/tests/integration/features/favorites.feature:93
Member

PVince81 commented Mar 20, 2017

21:23:41     /var/lib/jenkins/workspace/owncloud-core_core_PR-26507-XBSZ3PUQKGNP3NP62UQHR4G7JKGONNKXFRY3NXEAITQYFZVK5JDQ/tests/integration/features/favorites.feature:81
21:23:41     /var/lib/jenkins/workspace/owncloud-core_core_PR-26507-XBSZ3PUQKGNP3NP62UQHR4G7JKGONNKXFRY3NXEAITQYFZVK5JDQ/tests/integration/features/favorites.feature:93

@PVince81 PVince81 assigned PVince81 and unassigned DeepDiver1975 Mar 20, 2017

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Mar 20, 2017

Member

Fixed integration tests and also added a new integration test for pagination.

@DeepDiver1975 @jvillafanez please review

Member

PVince81 commented Mar 20, 2017

Fixed integration tests and also added a new integration test for pagination.

@DeepDiver1975 @jvillafanez please review

Show outdated Hide outdated apps/dav/lib/Connector/Sabre/FilesReportPlugin.php Outdated
Show outdated Hide outdated apps/dav/lib/Connector/Sabre/FilesReportPlugin.php Outdated
}
}
$obj = new self();

This comment has been minimized.

@jvillafanez

jvillafanez Mar 21, 2017

Member

Any reason to use self instead of explicitly calling the current class?

@jvillafanez

jvillafanez Mar 21, 2017

Member

Any reason to use self instead of explicitly calling the current class?

This comment has been minimized.

@PVince81

PVince81 Mar 21, 2017

Member

Written by @DeepDiver1975 and likely copied from other Sabre deserializer classes where self was used. More convenient than rewriting the current class name I guess.

@PVince81

PVince81 Mar 21, 2017

Member

Written by @DeepDiver1975 and likely copied from other Sabre deserializer classes where self was used. More convenient than rewriting the current class name I guess.

This comment has been minimized.

@jvillafanez

jvillafanez Mar 22, 2017

Member

If it was copied from Sabre, i guess it's fine. Don't like it, but ok.

@jvillafanez

jvillafanez Mar 22, 2017

Member

If it was copied from Sabre, i guess it's fine. Don't like it, but ok.

' . $filterRules . '
</oc:filter-rules>
</oc:filter-files>';
<oc:filter-files xmlns:a="DAV:" xmlns:oc="http://owncloud.org/ns" >

This comment has been minimized.

@jvillafanez

jvillafanez Mar 21, 2017

Member

We have to find a better way to generate this.

@jvillafanez

jvillafanez Mar 21, 2017

Member

We have to find a better way to generate this.

This comment has been minimized.

@PVince81

PVince81 Mar 21, 2017

Member

Agreed. I'll let @owncloud/qa go wild to improve this separately.

@PVince81

PVince81 Mar 21, 2017

Member

Agreed. I'll let @owncloud/qa go wild to improve this separately.

return new File($this->fileView, $filesNode);
} else if ($filesNode instanceof \OCP\Files\Folder) {
return new Directory($this->fileView, $filesNode);
}

This comment has been minimized.

@jvillafanez

jvillafanez Mar 21, 2017

Member

We should log something here unless this null is properly detected and reported. I wouldn't like to miss this case if something goes horribly wrong.

@jvillafanez

jvillafanez Mar 21, 2017

Member

We should log something here unless this null is properly detected and reported. I wouldn't like to miss this case if something goes horribly wrong.

This comment has been minimized.

@PVince81

PVince81 Mar 21, 2017

Member

I'll throw a "WTF this can't be" style exception

@PVince81

PVince81 Mar 21, 2017

Member

I'll throw a "WTF this can't be" style exception

This comment has been minimized.

@jvillafanez

jvillafanez Mar 22, 2017

Member

Any chance that this exception is catched in the middle and doesn't show up? just making sure this will be visible.

@jvillafanez

jvillafanez Mar 22, 2017

Member

Any chance that this exception is catched in the middle and doesn't show up? just making sure this will be visible.

This comment has been minimized.

@PVince81

PVince81 Mar 22, 2017

Member

It's wrapped in a SabreException later upstream:

<?xml version="1.0" encoding="utf-8"?>
<d:error xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns">
  <s:exception>Exception</s:exception>
  <s:message>Unrecognized Files API node returned, aborting</s:message>
</d:error>
@PVince81

PVince81 Mar 22, 2017

Member

It's wrapped in a SabreException later upstream:

<?xml version="1.0" encoding="utf-8"?>
<d:error xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns">
  <s:exception>Exception</s:exception>
  <s:message>Unrecognized Files API node returned, aborting</s:message>
</d:error>
@jvillafanez

This comment has been minimized.

Show comment
Hide comment
@jvillafanez

jvillafanez Mar 21, 2017

Member

I haven't checked deeply the FilesReportPlugin.php file, in case someone wants to recheck that file

Member

jvillafanez commented Mar 21, 2017

I haven't checked deeply the FilesReportPlugin.php file, in case someone wants to recheck that file

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Mar 21, 2017

Member

I haven't checked deeply the FilesReportPlugin.php file, in case someone wants to recheck that file

There aren't many changes there so I think it's ok.

Basically the response stays the same except that the array we pass there is sliced.
Most of the added logic is to slice at the right moment.

@jvillafanez adjusted, see last two commits

Member

PVince81 commented Mar 21, 2017

I haven't checked deeply the FilesReportPlugin.php file, in case someone wants to recheck that file

There aren't many changes there so I think it's ok.

Basically the response stays the same except that the array we pass there is sliced.
Most of the added logic is to slice at the right moment.

@jvillafanez adjusted, see last two commits

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Mar 22, 2017

Member

Final review ? @jvillafanez @DeepDiver1975
Thanks.

Member

PVince81 commented Mar 22, 2017

Final review ? @jvillafanez @DeepDiver1975
Thanks.

@jvillafanez

This comment has been minimized.

Show comment
Hide comment
@jvillafanez

jvillafanez Mar 22, 2017

Member

#26507 (comment) and #26507 (comment) . Other than that 👍

Member

jvillafanez commented Mar 22, 2017

#26507 (comment) and #26507 (comment) . Other than that 👍

Adjust REPORT search query, perf and unit tests
- disable REPORT without filter

Because we need to use Node::search($pattern) to find all matching nodes
in all the subfolder recursively, but the result nodes contain
incomplete information like owner.

- remove unneeded trailing slash when buildling response href

This got obsoleted when switching to generateMultiStatus()

- add integration pagination test for favorites REPORT

- throw exception for unknown node types in FilesReportPlugin
@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Mar 22, 2017

Member

Added comment, squashed my commits into a smaller set.

Since there are no actual code changes -> instamerge

Member

PVince81 commented Mar 22, 2017

Added comment, squashed my commits into a smaller set.

Since there are no actual code changes -> instamerge

@PVince81 PVince81 merged commit b31294a into master Mar 22, 2017

0 of 2 checks passed

Scrutinizer Installing Code
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

@PVince81 PVince81 deleted the filelist-pagination-webdav-report branch Mar 22, 2017

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Mar 22, 2017

Member

Raised doc ticket with some instructions: owncloud/documentation#2950

Member

PVince81 commented Mar 22, 2017

Raised doc ticket with some instructions: owncloud/documentation#2950

@michaelstingl michaelstingl referenced this pull request Sep 5, 2017

Open

Add a search field above the TableView - text search #17

0 of 5 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment