Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add search support on the REPORT method #25373

Closed
2 tasks done
PVince81 opened this issue Jul 6, 2016 · 39 comments
Closed
2 tasks done

Add search support on the REPORT method #25373

PVince81 opened this issue Jul 6, 2016 · 39 comments
Assignees
Milestone

Comments

@PVince81
Copy link
Contributor

PVince81 commented Jul 6, 2016

Currently the REPORT method on the files DAV (from #22112) provides a way to filter by tags.

It should be extended to also allow specifying search attributes.
Eventually it should be possible for the search app to use this method instead of a private endpoint.

Backend changes

  • TODO: improve REPORT to work with oc:search and oc:depth: 2-4 md
  • TODO: improve REPORT to work without any search pattern (for PROPFIND pagination): 0.5-1md
  • new REPORT type for searching files, plugged into the existing PHP search API (which will deliver full text search when search_lucene or search_elastic is enabled): Search API based on webDAV report #31873
  • Add capability

Frontend changes

Follow up

@PVince81
Copy link
Contributor Author

PVince81 commented Jul 6, 2016

Additionally we should add pagination on the REPORT method that can be used regardless of the search.

Ref: #13915

@DeepDiver1975
Copy link
Member

Taken from a draft spec which was planned for 9.1:

Requirements

Any client using ownCloud APIs wants to search in files based on meta information as well as content. Clients are our mobile clients as well as the web ui – even the desktop client can gain some benefits from it (hint: pagination).

Since everything which is related to files is handled via WebDAV – search will be just the same.
This fits quite will into the overall architecture: we do tagging and comments via webdav as well and other dav based protocols support search mechanisms as well (see https://tools.ietf.org/html/rfc6352#section-8.6 and https://tools.ietf.org/html/rfc4791#section-7.8)

There is even a search specification on its own: https://tools.ietf.org/html/rfc5323
Requirements for the search API:

  • pagination
  • search in content (full text search)
  • search in meta data like name, mtime, mime type, size
  • ? integrate tags and comments ?
  • ? exif information ?
  • limit search to a sub folder
  • ? limit to share related information like shared with and shared by ?

Feature description

We should follow the calendar and addressbook query approach.
First of all because these concepts have proven to work quite well in the area of caldav and carddav.

The WebDAV search spec introduces a completely new verb – this will pretty sure result in deployment issues. Furthermore the search spec is supported by not one single webdav client.
Last but not least the spec is pretty rich and its implementation can be rather time consuming.

Use-cases

Gallery wants to search in a sub folder Pictures for all jpeg and a special camera model as set in the exif information
An Android music player wants to search for all mp3s with an id3 tag set to a specific artist
The desktop client wants to list all files in a subfolder but paginated with 1000 files per page/request.

Next Steps

  1. Have a look at calendar and addressbook query specs and define the query xml for files
  2. Decide on the search scope

@DeepDiver1975 DeepDiver1975 modified the milestones: 9.2, backlog Jul 7, 2016
@DeepDiver1975
Copy link
Member

setting this to 9.2 - we need this for paginated file list in files.

@DeepDiver1975
Copy link
Member

@georgehrke you already did come up with an proposal for the report object - please add here - THX

@georgehrke
Copy link
Contributor

@DeepDiver1975
Copy link
Member

@georgehrke feel free to submit a wip-pr to see how the implementation can look like - THX

@PVince81
Copy link
Contributor Author

Make it a separate report called "search-files" ?

Because if we mix it together with the existing "filter-files" report, it will become messy, especially if we want to combine "search by X but also filter by favorites", etc

@DeepDiver1975
Copy link
Member

In addition let's try the json way .... see https://youtu.be/sqgvjidj7iQ

@supportreq
Copy link

@PVince81 are we ready with this?

@PVince81
Copy link
Contributor Author

PVince81 commented Dec 7, 2017

Search query:

<?xml version="1.0" encoding="utf-8" ?>
<oc:filter-files xmlns:a="DAV:" xmlns:oc="http://owncloud.org/ns" >
        <a:prop>
                <oc:id/>
                <oc:fileid/>
                <oc:permissions/>
                <a:getlastmodified/>
                <a:getetag/>
                <a:getcontenttype/>
                <a:resourcetype/>
                <oc:downloadURL/>
                <oc:ddC/>
                <oc:size/>
                <oc:owner-id/>
                <oc:owner-display-name/>
                <oc:size/>
                <oc:checksum />
                <oc:tags />
                <a:quota-used-bytes/>
                <a:quota-available-bytes/>
        </a:prop>
        <oc:search>
                <oc:pattern>pattern-goes-here</oc:pattern>
                <oc:offset>0</oc:offset>
                <oc:limit>5</oc:limit>
        </oc:search>
</oc:filter-files>

Properties are PROPFIND-properties.

Response is PROPFIND response with a list of result nodes.

@PVince81
Copy link
Contributor Author

PVince81 commented Dec 7, 2017

We could use the Depth header or have oc:depth to specify:

  • depth: 0: only search or return results from current folder
  • depth: inf: search / results from all folders.

To do a pagination PROPFIND without search: use REPORT with Depth: 0/1 + oc:offset, oc:limit
To do a search in all folders: use REPORT with Depth: Inf + oc:offset, oc:limit

@PVince81
Copy link
Contributor Author

PVince81 commented Dec 7, 2017

  • move files app search field to use REPORT
  • instead of using PROPFIND for file list in web UI, use REPORT + pagination

=> moved to top post

@PVince81 PVince81 modified the milestones: backlog, triage Dec 7, 2017
@PVince81
Copy link
Contributor Author

PVince81 commented Dec 7, 2017

  • TODO: improve REPORT to work with oc:search and oc:depth: 2-4 md
  • TODO: improve REPORT to work without any search pattern (for PROPFIND pagination): 0.5-1md

Beware that it will not be possible to combine oc:search with oc:filter-rules.

=> moved to top post

@PVince81
Copy link
Contributor Author

PVince81 commented Dec 7, 2017

  • decide whether to use oc:depth or the header. I do like to see everything related to the query inside of the body.

@DeepDiver1975
Copy link
Member

decide whether to use oc:depth or the header. I do like to see everything related to the query inside of the body.

make it part of the xml body

@michaelstingl
Copy link

@PVince81 Go for the simplest approach for v1, and we'll test-drive it in the new iOS app (owncloud/ios-app#53) and provide feedback…

@jvillafanez
Copy link
Member

Regarding the expression stuff: it is likely that the search_lucene or search_elastic already have a language you can use. All that matters here is that we pass the string as is to the search providers then who then decide how to parse and use the expression.

Yes. The interpretation of the search string will depend on each particular search provider. Our search service will just send the string to the provider without any manipulation on our part.
This implies no changes on our plans.

In general I'd expect people to only have a single search app enabled. I'm not sure whether we really support having multiple search providers enabled at the same time. @butonic

Having multiple search apps is possible. The results will be merged. If we only want to support a search provider we have to limit it somehow in the search service.

Note that this limitation isn't planned, so we'll need to check possiblities.

Is there logic for the existing search field that disables filename-based search whenever another search provider is enabled ?

No. The search_elastic app removes the core's provider and inserts its own, while the search_lucene just inserts the provider.

The current REPORT method used for searching favorites can have results with the same file names but different href URLs. In case of duplicates we might need to deduplicate based on the href field. I'm not sure if we should concern ourselves with such duplicates at this stage.

I don't think we should worry about duplicates as long as the clients are aware. If there are duplicates within a provider, blame the provider. We might want to separate the results somehow based on the provider: my provider returns these files, your provider these ones, etc, this would clarify what results come from what provider.

Does the web UI search have pagination ?

Yes, but it isn't properly implemented: it fetches all the results and then returns the slice we want. I don't think the provider will scale for large datasets, and fetching all the data (because the provider doesn't support pagination) won't help.

The question here is whether we want to support pagination or not taking into account that there might be several search providers, at least until we decide to have only one provider.

Instead of pagination we can have a limit, kind of "give me the first X results". This can be easier to handle on our side. The drawback is that we won't handle pages, so the <oc:offset>0</oc:offset> doesn't make sense in this scenario.

As far as I remember the web UI has two different search routines:

The backend call is the same. The UI present the results differently.

@PVince81
Copy link
Contributor Author

No. The search_elastic app removes the core's provider and inserts its own, while the search_lucene just inserts the provider.

Is there already an API that aggregates the results ? I'd assume that the web UI search field calls said API. We could just use the same and not worry about single or multiple providers.

Regarding pagination:

Yes, let's only add a limit and no offset. If no limit given, set a default one (or fail with 400, whatever we already do now). Don't let the server return infinite results.

Did you check what format the results are returned in ? Are these easy to pack into PROPFIND-like response format ? Is additional lookup required ?

@jvillafanez
Copy link
Member

Is there already an API that aggregates the results ? I'd assume that the web UI search field calls said API. We could just use the same and not worry about single or multiple providers.

Not really. The search service appends all the results from all the providers. It's part of the service.

Did you check what format the results are returned in ? Are these easy to pack into PROPFIND-like response format ? Is additional lookup required ?

Feels BAD. https://github.com/owncloud/core/blob/master/lib/public/Search/Result.php is what we should be handling at webdav level, and https://github.com/owncloud/core/blob/master/lib/private/Search/Result/File.php is what we'll likely be handling, with the risk of being private API.

Looks like the intention was to use json_encode and run away with it. I don't know if we want to add a layer to be able to change what's below at some point in the future...

@jvillafanez
Copy link
Member

There is already a report for tags, so I'll create another one for the search. It doesn't seem a good idea to mix things.

@PVince81
Copy link
Contributor Author

Should we add Result->getObject() on the public API that returns the wrapped object for search results ? Since we're only processing File results we could directly get the FileInfo then.

@jvillafanez
Copy link
Member

PR ready in #31873

I think most of the "public" details are commented in the PR.

Regarding performance, without changing the search service and likely replace it with a new one, I don't think we can optimize more than that. Maybe fiddling with the webDAV layer, but I'd prefer to touch as less as possible and let webDAV handle it own stuff.

There are several cases that haven't been tested yet, mostly with large data (thousands of files) and some error handling.

@PVince81
Copy link
Contributor Author

@jvillafanez master PR merged, please backport.

Also please add a capability so the clients know this feature is available.

In general it should already be possible to do an OPTIONS call on the files endpoint to find out that the new report type is available. However considering that most capabilities are listed in our capabilities endpoint, better add it there as well. I wonder if we should also expose a value telling what search API type is available (ex: regular search, full text search, etc) so the clients can display a text accordingly.

@jvillafanez
Copy link
Member

Capabilities added in #31943

For the record, it's possible to check what reports are available through webDAV:

  1. Create a "supported.xml" file (for example), with the following content (it might contain more things, but this is the minimum to get what we want):
    <?xml version="1.0" encoding="UTF-8"?>
    <A:propfind xmlns:A="DAV:">
     <A:prop>
       <A:supported-report-set/>
     </A:prop>
    </A:propfind>
    
  2. Send a propfind request to the target node:
    curl -H "Depth: 0" -X PROPFIND --data "@supported.xml" -u admin:Password 'http://10.0.2.8:7080/remote.php/dav/files/admin' | xmllint --format -
    

The expected response would be something like:

<?xml version="1.0"?>
<d:multistatus xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns" xmlns:oc="http://owncloud.org/ns">
  <d:response>
    <d:href>/remote.php/dav/files/admin/</d:href>
    <d:propstat>
      <d:prop>
        <d:supported-report-set>
          <d:supported-report>
            <d:report>
              <oc:filter-files/>
            </d:report>
          </d:supported-report>
          <d:supported-report>
            <d:report>
              <oc:search-files/>
            </d:report>
          </d:supported-report>
          <d:supported-report>
            <d:report>
              <oc:filter-comments/>
            </d:report>
          </d:supported-report>
        </d:supported-report-set>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
  </d:response>
</d:multistatus>

Note that the <oc:search-files/> is only expected to appear in the root folder of the user (either "/remote.php/webdav/" or "/remote.php/dav/files/user/") and not in other places. (#31943 fixes this to make it consistent with the current implementation)

I'll backport both PRs at the same time.

@PVince81
Copy link
Contributor Author

@jvillafanez you can also use the OPTIONS method and/or check response headers, likely easier

@jvillafanez
Copy link
Member

It won't be enough. OPTIONS might check only if the REPORT method is available in the target endpoint, but it doesn't say anything about what reports are available

curl -v -X OPTIONS -u admin:Password 'http://10.0.2.8:7080/remote.php/dav/files/admin/foo'
*   Trying 10.0.2.8...
* TCP_NODELAY set
* Connected to 10.0.2.8 (10.0.2.8) port 7080 (#0)
* Server auth using Basic with user 'admin'
> OPTIONS /remote.php/dav/files/admin/foo HTTP/1.1
> Host: 10.0.2.8:7080
> Authorization: Basic YWRtaW46UGFzc3dvcmQ=
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Fri, 29 Jun 2018 06:45:38 GMT
< Server: Apache
< Strict-Transport-Security: max-age=15768000; preload
< Set-Cookie: oc8geiifsytk=o600f4hla6dp6ov56tdioqgrfp; path=/; HttpOnly
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate
< Pragma: no-cache
< Set-Cookie: oc_sessionPassphrase=DMKgcg6HAkeZs5ngDj3tdz0Tj5DmsXuYrCkib9IWBfItxyVNq322WKTwCgLppGZ5mNo5KMUxpjFt%2FsY%2F0rjb24SIxHs9VeNJv9rBi9kQQILeVWkV%2BtsJS%2FZlWYE7wECG; path=/; HttpOnly
< Content-Security-Policy: default-src 'none';
< Set-Cookie: oc8geiifsytk=nbotj9i0irkbktajmld062ths2; path=/; HttpOnly
< Set-Cookie: cookie_test=test; expires=Fri, 29-Jun-2018 07:45:39 GMT; Max-Age=3600
< Allow: OPTIONS, GET, HEAD, DELETE, PROPFIND, PUT, PROPPATCH, COPY, MOVE, REPORT
< DAV: 1, 3, extended-mkcol
< MS-Author-Via: DAV
< Accept-Ranges: bytes
< Content-Length: 0
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< X-Robots-Tag: none
< X-Frame-Options: SAMEORIGIN
< X-Download-Options: noopen
< X-Permitted-Cross-Domain-Policies: none
< Content-Type: text/html; charset=UTF-8
< 
* Connection #0 to host 10.0.2.8 left intact


curl -v -H "Depth: 0" -X PROPFIND --data "@supported.xml" -u admin:Password 'http://10.0.2.8:7080/remote.php/dav/files/admin/foo' | xmllint --format -
*   Trying 10.0.2.8...
* TCP_NODELAY set
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to 10.0.2.8 (10.0.2.8) port 7080 (#0)
* Server auth using Basic with user 'admin'
> PROPFIND /remote.php/dav/files/admin/foo HTTP/1.1
> Host: 10.0.2.8:7080
> Authorization: Basic YWRtaW46UGFzc3dvcmQ=
> User-Agent: curl/7.58.0
> Accept: */*
> Depth: 0
> Content-Length: 125
> Content-Type: application/x-www-form-urlencoded
> 
} [125 bytes data]
* upload completely sent off: 125 out of 125 bytes
< HTTP/1.1 207 Multi-Status
< Date: Fri, 29 Jun 2018 06:49:10 GMT
< Server: Apache
< Strict-Transport-Security: max-age=15768000; preload
< Set-Cookie: oc8geiifsytk=2ja5ntq35becat6cigf6e388dq; path=/; HttpOnly
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate
< Pragma: no-cache
< Set-Cookie: oc_sessionPassphrase=86vjXSjlSFsqVwl7GdppfNhAR91rpQ4dKb%2FHJkfPzrxNLBNmd%2FdjqMZwIwVF93cRkf2gOmQeXSBdH1cq74LYiuuRsqW6mU6KpuxEPRANZ5TEGCaHKIMjmt5f9B9XW1%2F3; path=/; HttpOnly
< Content-Security-Policy: default-src 'none';
< Set-Cookie: oc8geiifsytk=9r9cgopcefjtcsjvagoq3nqjur; path=/; HttpOnly
< Set-Cookie: cookie_test=test; expires=Fri, 29-Jun-2018 07:49:10 GMT; Max-Age=3600
< Vary: Brief,Prefer
< DAV: 1, 3, extended-mkcol
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< X-Robots-Tag: none
< X-Frame-Options: SAMEORIGIN
< X-Download-Options: noopen
< X-Permitted-Cross-Domain-Policies: none
< Content-Length: 499
< Content-Type: application/xml; charset=utf-8
< 
{ [499 bytes data]
100   624  100   499  100   125   2952    739 --:--:-- --:--:-- --:--:--  3692
* Connection #0 to host 10.0.2.8 left intact
<?xml version="1.0"?>
<d:multistatus xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns" xmlns:oc="http://owncloud.org/ns">
  <d:response>
    <d:href>/remote.php/dav/files/admin/foo/</d:href>
    <d:propstat>
      <d:prop>
        <d:supported-report-set>
          <d:supported-report>
            <d:report>
              <oc:filter-files/>
            </d:report>
          </d:supported-report>
          <d:supported-report>
            <d:report>
              <oc:filter-comments/>
            </d:report>
          </d:supported-report>
        </d:supported-report-set>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
  </d:response>
</d:multistatus>

@PVince81
Copy link
Contributor Author

@jvillafanez thanks for confirming. I might have mixed up with the "DAV" header when remembering.

@PVince81
Copy link
Contributor Author

PVince81 commented Jul 4, 2018

@jvillafanez please raise documentation ticket and copy your usage examples there. This will be useful for client devs to start programming against this.

@jvillafanez
Copy link
Member

documentation ticket in https://github.com/owncloud/documentation/issues/4241

@PVince81
Copy link
Contributor Author

PVince81 commented Jul 5, 2018

Considering that Phoenix is providing the future files app, let's not invest time in the old files app regarding integration with this API.

I've raised owncloud/web#172 for Phoenix integration.

The remaining advanced search topics will be adressed in this separate ticket: #31993

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants