
[POC] PROPFIND with Depth:infinity and streaming XML #950

Closed
michaelstingl opened this issue Apr 13, 2021 · 15 comments · Fixed by #1002

@michaelstingl
Contributor

michaelstingl commented Apr 13, 2021

@DeepDiver1975 did the server side arrive in the master branch? Link to pr/issue? Is there a capability for the clients, so they can check for such magic?

@felix-schwarz what else do you need to start a prototype implementation?

@DeepDiver1975
Member

did the server side arrive in the master branch?

no

Link to pr/issue?

owncloud/core#38583

Is there a capability for the clients, so they can check for such magic?

infinity depth has been in the code base since day one .... it was simply breaking apart until today ;-)

but we can add a cap to tell the client when to use this .....
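
A capability like that would surface through the regular capabilities endpoint, so a client could probe for it before sending a Depth:infinity request. A rough sketch against a local test instance (the URL and admin:admin credentials match the Docker setup further down in this issue; the grep pattern is only illustrative, the actual capability key is the one linked at the end of this issue):

curl -u admin:admin "http://localhost:8080/ocs/v1.php/cloud/capabilities?format=json" | grep -i propfind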

@felix-schwarz
Contributor

@DeepDiver1975 Does it make sense to develop against owncloud/core#38583 yet?

@michaelstingl I only need a server to develop and test against (or a way to set one up locally in a Ubuntu VM or Docker).

@DeepDiver1975
Member

@DeepDiver1975 Does it make sense to develop against owncloud/core#38583 yet?

yes

@michaelstingl
Contributor Author

@michaelstingl I only need a server to develop and test against (or a way to set one up locally in a Ubuntu VM or Docker).

I'll take care of it.

@felix-schwarz
Contributor

@DeepDiver1975 Great!
@michaelstingl Thanks! Please ping me when something is ready.

@michaelstingl
Contributor Author

Okay, I performed the following steps in my docker-compose setup:

Replace stable 10.7 with daily *.tar

docker-compose exec owncloud bash

cd /var/www
mv owncloud owncloud.docker
wget https://download.owncloud.org/community/daily/owncloud-daily-master.tar.bz2
tar xjf owncloud-daily-master.tar.bz2 
rm -rf owncloud/config/
mv owncloud.docker/config owncloud/
mv owncloud.docker/custom owncloud/
occ upgrade

Apply patch

wget https://patch-diff.githubusercontent.com/raw/owncloud/core/pull/38583.patch
cd owncloud
patch -p1 < ../38583.patch

Add 3906 directories and 19530 files to user bob

cd /var/www
wget https://raw.githubusercontent.com/LLNL/fdtree/master/fdtree.bash
chmod +x fdtree.bash
bash fdtree.bash -f 5 -d 5 -l 5 -C
mv LEVEL0* /mnt/data/files/bob/files/
occ files:scan bob
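
The item counts follow from the fdtree parameters (a sanity-check sketch, assuming fdtree creates 5 subdirectories per level across 5 levels and 5 files in every directory, including the top-level LEVEL0 directory):

# directories: 1 (top) + 5 + 5^2 + 5^3 + 5^4 + 5^5 = 3906
# files:       5 per directory x 3906 directories  = 19530
echo $(( 1 + 5 + 25 + 125 + 625 + 3125 ))   # 3906
echo $(( 5 * 3906 ))                        # 19530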

@michaelstingl
Contributor Author

michaelstingl commented Apr 14, 2021

I saw data streaming from the first second:

% curl -XPROPFIND --user "bob" https://prop-infinity.shniq.cloud/remote.php/webdav -H "Depth:infinity" | xmllint --format - | grep href | wc -l
Enter host password for user 'bob':
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.0M    0 13.0M    0     0   760k      0 --:--:--  0:00:17 --:--:--  765k
   23444

23444 items in 17 seconds (≈1380 items/s) doesn't sound too bad, and this was a tiny CPX11 cloud instance from Hetzner (2 vCPU, 2 GB RAM, 40 GB disk space on local NVMe SSD)

@michaelstingl
Contributor Author

michaelstingl commented Jul 8, 2021

Detailed/full instructions for local Docker setup:

Start local 10.7 Docker

docker run --rm -d \
  --name owncloud \
  -p 18080:8080 \
  -e OWNCLOUD_APPS_ENABLE=oauth2 \
  -e ADMIN_USERNAME=admin \
  -e ADMIN_PASSWORD=admin \
  owncloud/server:10.7

Replace stable 10.7 with daily *.tar

docker exec -ti owncloud bash

cd /var/www
mv owncloud owncloud.docker
wget https://download.owncloud.org/community/daily/owncloud-daily-master.tar.bz2
tar xjf owncloud-daily-master.tar.bz2 
rm -rf owncloud/config/
mv owncloud.docker/config owncloud/
mv owncloud.docker/custom owncloud/
occ upgrade

Apply patch

wget https://patch-diff.githubusercontent.com/raw/owncloud/core/pull/38583.patch
cd owncloud
patch -p1 < ../38583.patch

Create user bob & connect to create data dir

curl -X POST http://admin:admin@localhost:8080/ocs/v1.php/cloud/users -d userid="bob" -d password="password"
curl -XPROPFIND --user "bob:password" "http://localhost:8080/remote.php/webdav"
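
To double-check that the provisioning call worked, the new user can be queried back via the same OCS API (a sketch; admin:admin are the credentials from the docker run above):

curl -u admin:admin "http://localhost:8080/ocs/v1.php/cloud/users/bob"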

Add 3906 directories and 19530 files to user bob

cd /var/www
wget https://raw.githubusercontent.com/LLNL/fdtree/master/fdtree.bash
chmod +x fdtree.bash
bash fdtree.bash -f 5 -d 5 -l 5 -C
mv LEVEL0* /mnt/data/files/bob/files/
occ files:scan bob

Try PROPFIND with Depth:infinity

curl -XPROPFIND --user "bob:password" "http://localhost:8080/remote.php/webdav" -H "Depth:infinity" -o "propfind.xml"

Result:

root@00340d875dfe: /var/www # curl -XPROPFIND --user "bob:password" "http://localhost:8080/remote.php/webdav" -H "Depth:infinity" -o "propfind.xml"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.9M    0 12.9M    0     0  2910k      0 --:--:--  0:00:04 --:--:-- 2980k

Result with new WebDAV endpoint:

root@00340d875dfe: /var/www # curl -XPROPFIND --user "bob:password" "http://localhost:8080/remote.php/dav/files/bob" -H "Depth:infinity" -o "propfind2.xml"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.1M    0 13.1M    0     0  2893k      0 --:--:--  0:00:04 --:--:-- 2973k
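
To compare the two responses beyond raw size, the returned items can be counted the same way as in the earlier benchmark (the file names match the -o arguments above):

xmllint --format propfind.xml  | grep href | wc -l
xmllint --format propfind2.xml | grep href | wc -l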

@michaelstingl
Contributor Author

michaelstingl commented Jul 8, 2021

I thought about the following error cases:

  • Auth token expires before the XML download completes
  • Invalid XML somewhere in the middle of the response
    (ignore everything after the XML error for now; maybe log the "good XML before error" size and the full response size; a broken response can be detected client-side, see the xmllint sketch below)
  • Plenty of valid XML, but the response ends with an HTTP error
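
For the invalid-XML case, a quick client-side check of a downloaded response is possible with xmllint (a sketch; propfind.xml is the file from the curl examples above, --noout only validates, and --recover additionally emits the part that was parsable before the error):

xmllint --noout propfind.xml || echo "response contains broken XML"
xmllint --recover --format propfind.xml > propfind-recovered.xml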

@DeepDiver1975
Member

With respect to error handling, this needs to be worked on more deeply.

In the non-streaming case:

  • we walk the file tree
  • if errors occur -> an exception is thrown
  • the exception is converted to an HTTP error status
  • the response body holds the error XML

Current streaming case:

  • HTTP response status code 207 is sent
  • we walk the file tree and start writing the XML response
  • if an error occurs -> an exception is thrown
  • we stop processing the file tree -> the XML response is unfinished and at least partially invalid
  • no error XML is sent to the client

Possible solution I:

  • do a pre-processing step to catch potential errors
  • in case of errors -> an HTTP error status and error XML are sent to the client
  • if successful -> stream the XML
  • PROBLEM: the pre-processing step might require walking the full file tree; this is time-consuming and I fear we lose all the performance gain

Possible solution II:

  • just like the current implementation
  • in case of errors, no exception is thrown that terminates the tree walk; instead, an XML entity with a 40x status is written to the XML response body (see the sketch below)
  • this requires in-depth analysis and testing of the code base to make sure not a single exception accidentally slips through
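
A minimal sketch of what solution II could look like on the wire; the failing path is invented for illustration, and RFC 4918 already allows a per-response <d:status> inside a 207 multistatus body, so the running stream could simply carry an entry like:

<d:response>
  <d:href>/remote.php/dav/files/bob/some-unreadable-folder/</d:href>
  <d:status>HTTP/1.1 403 Forbidden</d:status>
</d:response>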

@felix-schwarz
Contributor

First results testing the iOS implementation:

Test set prop-infinity.shniq.cloud

  • Folders: 3908
  • Files: 19535
  • Total: 23443 items

Results

Method     | Traffic in | Traffic out | CPU           | Network time | Processing time | Total time (mm:ss)
-----------|------------|-------------|---------------|--------------|-----------------|-------------------
Individual | 20 MB      | 2 MB        | 200% (app+FP) | n/a          | n/a             | 7:09
Infinite   | 17 MB      | 17 KB       | 99% (app)     | 0:15         | 0:18            | 0:33

Error handling

  • the implementation keeps track of which folders have been fully received
  • if parsing is cancelled by the user or an XML error is encountered:
    • already parsed items are kept
    • individual scans are scheduled for folders whose contents have not yet been fully received (so item metadata will eventually be retrieved in full)

Summary

  • Infinite PROPFIND is 13× faster in total compared to individual PROPFINDs (in the "split" implementation)
  • a "parallel" implementation, where incoming PROPFIND XML data is parsed immediately, could give a total speedup of up to 23×

User interface

The infinite PROPFIND is automatically performed after account setup, while keeping the user informed and providing an option to cancel it (through a button labeled Skip).

[Screenshots: HTTP phase and parsing phase (iPhone 12 Pro simulator, 2021-07-12)]

@felix-schwarz
Contributor

The PR for these changes: #1002

@felix-schwarz
Contributor

#1002 adds support for streaming parsing of the infinite PROPFIND response. Preliminary result: the parser keeps up well with the stream it receives, so the overall time shrinks to roughly the time it takes to receive the infinite PROPFIND response. New numbers are under "Results 2" in #1002.

@felix-schwarz
Contributor

Results from bigger instances

Results from building and benchmarking different local Docker test environments:

Files | Folders | Total  | occ files:scan time | Receive/Parse (total) time | Streamed parsing time | Stream performance
------|---------|--------|---------------------|----------------------------|-----------------------|-------------------
58595 | 11720   | 70315  | 02:16               | not tested                 | 0:40                  | 1757 items / sec
78125 | 15626   | 93751  | 02:35               | not tested                 | 0:52                  | 1802 items / sec
97655 | 19532   | 117187 | 02:54               | 1:07 / 0:50 (1:57)         | 1:09                  | 1698 items / sec

Observations

  • parsing performance here is considerably better because I turned off all memory debug build options for these tests
  • there's only a negligible time difference between "downloading" the metadata and parsing it while receiving it
  • the number of items received and parsed per second stays at around 1700+ even when increasing the number of items by 66%
  • Docker CPU usage while responding to the infinite PROPFIND request was just around 4%
  • I attempted to build an extremely large test instance with 1.8 million items, but eventually stopped pursuing it:
    • the occ files:scan did not complete, even after hours
    • cancelling the scan with Ctrl+C then resulted in a very slow server instance, with the infinite PROPFIND only returning a very low number of items / second
    • chances are high that my system was the limiting factor here

Visual changes

With streaming parsing, statistics show almost immediately, so I added a title line to provide permanent context of what is going on:
[Screenshot: statistics view with the added title line]

Configuration changes

The infinite PROPFIND is now configurable via MDM with bookmark.prepopulation. Possible parameters:

  • doNot (default): do not perform prepopulation via an infinite PROPFIND
  • streaming: perform prepopulation with an infinite PROPFIND and streaming parsing
  • split: perform prepopulation in two steps: fully receive the infinite PROPFIND, then parse it

I also prepared support for enabling bookmark prepopulation via the capabilities. Since streaming has proven superior to split in every way, the stub for that future capability is a boolean, switching between doNot and streaming.
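
For illustration, the corresponding managed app configuration entry could look like this (a sketch only: the key name and value come from the list above, while the plist framing is simply the standard iOS managed-configuration key/value format):

<key>bookmark.prepopulation</key>
<string>streaming</string>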

@michaelstingl michaelstingl linked a pull request Aug 9, 2021 that will close this issue
@michaelstingl
Contributor Author

michaelstingl commented Oct 11, 2021

@felix-schwarz you can find the final capability here: owncloud/core#38583 (comment)
