
WebDAV directory listing is too slow #561

Open
archekb opened this issue Apr 11, 2024 · 5 comments

archekb commented Apr 11, 2024

📝 Describe the bug

Preparing the WebDAV directory file listing is far too slow.
When I request the file listing with a PROPFIND on /dav/e2-store/backup, the listing for 7500 files takes more than 20 minutes to prepare (the result is a 4 MB file), while in the web interface I can browse all files (39 pages x 200 items per page) without any delay.

The first 2 MB are ready after 5 minutes, the third MB after 5 more minutes, and the last one takes about 10 minutes.

Test over the internet through the NginX proxy:

curl -X PROPFIND --user 'user:pass' 'https://example.com/dav/e2-store/backup/' --basic -o backup_listing
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4533k    0 4533k    0     0   3758      0 --:--:--  0:20:35 --:--:--  4959

Test locally on the Cells host:

curl -X PROPFIND --user 'user:pass' 'https://localhost:10443/dav/e2-store/backup/' --basic --insecure -o backup_listing
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4533k    0 4533k    0     0   3070      0 --:--:--  0:25:11 --:--:--  6437

I also added a datasource from the local FS and created 7500 files with this script:

for i in {0001..7500}
do
  echo "test" > "file_${i}"
done

and tried to get the listing:

curl -X PROPFIND --user 'user:pass' 'https://localhost:10443/dav/test' --basic --insecure -o test
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4307k    0 4307k    0     0  27532      0 --:--:--  0:02:40 --:--:-- 29116

It is much faster than the s3-compatible storage, but 3 minutes is still too slow, I believe, when the web interface shows the same listing without any delay (one page of 200 items takes 400 ms over the internet, so the full dataset would be about 16 seconds across 38 requests; as a single request, I believe it would be even faster).

⚙️ How-to Reproduce

Steps to reproduce the behavior:

  1. Create 7500 files in a directory on an s3 datasource
  2. Try to get the WebDAV listing via PROPFIND

🩺 Environment / Setup

Complete the following information:

Server Versions:

  • Cells Version: 4.4.0
  • Server OS: Ubuntu 22.04 (docker)
  • Other dependencies: Nginx 1.25 (no proxy buffering)

Client used for testing:

  • Browser: Chrome, Safari
  • Client OS: iOS 17.4 / Ubuntu 24.04 (curl)

Additional context:

  • Datasource type: iDrive e2 (s3 compatible) structured

archekb commented Apr 14, 2024

I tried the same operation with s3fs (FUSE) mounted as a local folder; the result was even worse, about 57 minutes to get the listing.

I found the project sftpgo, which supports s3 and WebDAV (written in Go), and for the same folder the result is about 1 second:

curl -X PROPFIND --user 'user:pass' 'https://example.com/dav/e2-store/backup' --basic -o backup_listing
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4564k    0 4564k    0     0  2798k      0 --:--:--  0:00:01 --:--:-- 2797k

I believe you could compare their WebDAV implementation with yours and improve your version.

PS: The environment is the same.


cdujeu commented Apr 15, 2024

Hi @archekb

Thanks for your reports.
As a disclaimer, DAV is a crappy API that heavily depends on the DAV client performing many requests. The same goes for fuse-based connectors, which may trigger multiple "stat" requests on each file to build their listing info. Furthermore, listing 200 files vs. 7500 is a very different operation, even page by page. And I am not even pointing at your "sftpgo" example, which is probably listing a local folder, whereas Cells is listing an index, loading metadata from various services, and checking permissions (ACL) in the same flow. So you may be comparing apples and oranges here.

That said, the DAV endpoint may need the same kind of performance optimisation we applied to the main API used by the frontend (heavy use of caching).

In the meantime we would recommend using the cells-client if you want to interact programmatically with the server (it uses the same APIs as the frontend).

-c


archekb commented Apr 15, 2024

Hi @cdujeu, thanks for your reply.

> heavily depends on the DAV client

In all cases I used curl to get the listing; the conditions were equal: an s3 store and a listing of a folder with 7500 files, and we are talking about one operation, i.e. one PROPFIND request. I believe we can skip the discussion about clients. (I tested a few clients and the behavior was the same.)

> Furthermore, listing 200 files vs. 7500 is a very different operation, even page by page

I believe you can optimize the current WebDAV mechanism: make the 38 requests to the function that prepares the listing for the HTML frontend (I believe it does the same loading of metadata from various services and checking of permissions (ACL)), merge the pages together, and convert the result into a WebDAV XML response, roughly as in the sketch below. And I believe it would be much faster than 25 minutes.
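
A rough Go sketch of that idea, purely for illustration: listPage, Node, and PropfindAll are hypothetical placeholders, not Cells APIs, standing in for the paged, cached listing call the frontend already uses.

package davmerge

import (
	"bytes"
	"context"
	"encoding/xml"
	"fmt"
	"net/http"
	"time"
)

// Node stands in for one entry returned by the internal listing service.
type Node struct {
	Path  string
	Size  int64
	MTime time.Time
}

// listPage is a hypothetical wrapper around the paged, cached listing
// call that already does the metadata and ACL work for the HTML frontend.
func listPage(ctx context.Context, dir string, offset, limit int) ([]Node, error) {
	return nil, fmt.Errorf("stand-in: wire this to the internal listing API")
}

// PropfindAll fetches every page once, merges them, and renders a single
// multistatus body, so the metadata/ACL work runs per page, not per file.
func PropfindAll(ctx context.Context, dir string) ([]byte, error) {
	const pageSize = 200
	var all []Node
	for offset := 0; ; offset += pageSize {
		page, err := listPage(ctx, dir, offset, pageSize)
		if err != nil {
			return nil, err
		}
		all = append(all, page...)
		if len(page) < pageSize {
			break // a short page means we reached the end
		}
	}

	var buf bytes.Buffer
	buf.WriteString(xml.Header)
	buf.WriteString("<D:multistatus xmlns:D=\"DAV:\">\n")
	for _, n := range all {
		// NOTE: a real implementation must XML-escape the href.
		fmt.Fprintf(&buf,
			"<D:response><D:href>%s</D:href><D:propstat><D:prop>"+
				"<D:getcontentlength>%d</D:getcontentlength>"+
				"<D:getlastmodified>%s</D:getlastmodified>"+
				"</D:prop><D:status>HTTP/1.1 200 OK</D:status></D:propstat></D:response>\n",
			n.Path, n.Size, n.MTime.UTC().Format(http.TimeFormat))
	}
	buf.WriteString("</D:multistatus>\n")
	return buf.Bytes(), nil
}

The point is only the shape: one pass over already-cached pages, one XML document at the end.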

And this is a real problem: curl can wait 25 minutes for a listing, but other WebDAV clients can't; they usually time out after 600 seconds, maybe earlier.

"sftpgo" example, that is probably listing a local folder

No, my bad, I didn't describe it clearly, but it's still the S3-compatible iDrive E2 storage. sftpgo prepares the listing of a folder with 7500 files from s3 storage in less than 1 second. Even if it were a local folder, Cells takes 2 min 40 s for the same, which is 160x slower. But it actually was s3 storage, where Cells was about 1500x slower.

> So you may be comparing apples and oranges here.

I am comparing two applications that give me access to my s3 storage via WebDAV. I believe that's a reasonable comparison.

> we would recommend using the cells-client

Thanks, but my task is to back up a phone. Customers have asked you many times to implement this simple feature, which would automatically sync files from the phone to the cloud on a schedule: here, here, here and here. But who cares?

That's why I have to use third-party software and WebDAV to sync my files with the cloud.


archekb commented Apr 21, 2024

A correct NginX proxy config for WebDAV:

location /dav/ {
    proxy_buffering off;
    proxy_request_buffering off;
    proxy_pass_request_headers on;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Host $host;
    proxy_set_header X-Forwarded-Port $server_port;

    # These timeouts apply between successive read/write operations, not to the whole request.
    proxy_connect_timeout 10m;
    proxy_send_timeout    10m;
    proxy_read_timeout    10m;

    proxy_pass https://localhost:8443;
}

These directives are mandatory if you don't want random file rejects and errors in the NginX logs:

    proxy_buffering off;
    proxy_request_buffering off;
    proxy_set_header Host $host;

@cdujeu, you could add this to the NginX proxy doc section. I spent a few days tracking down proxy_request_buffering, because the only error from NginX was [error] 98#98: *15 SSL_write() failed (32: Broken pipe) while sending request to upstream, request: "PUT /dav/... HTTP/2.0", upstream:...


cdujeu commented Apr 29, 2024

Hi @archekb, we'll dig deeper into the performance. It's probable that the current implementation (based on the standard Golang DAV library, which wraps our internal datasources as a virtual filesystem) performs unnecessary operations (like OpenFile when it's only about getting stats).
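
For reference, the standard library in question (golang.org/x/net/webdav) exposes optional ContentTyper and ETager interfaces on the os.FileInfo values a FileSystem returns; when the file extension doesn't resolve a MIME type, the library's fallback during PROPFIND is exactly an OpenFile plus content sniffing per file. A hypothetical sketch of short-circuiting that (davFileInfo is an illustration, not Cells code):

package davfs

import (
	"context"
	"fmt"
	"mime"
	"os"
	"path/filepath"
)

// davFileInfo wraps a plain os.FileInfo with metadata the datasource
// index already knows, so PROPFIND never needs to open the file.
type davFileInfo struct {
	os.FileInfo
	mimeType string // e.g. stored in the datasource index
	etag     string // e.g. the S3 object's ETag
}

// ContentType satisfies webdav.ContentTyper: return the stored MIME type,
// falling back to a guess from the file extension, so the library skips
// its OpenFile + content-sniffing fallback.
func (fi davFileInfo) ContentType(ctx context.Context) (string, error) {
	if fi.mimeType != "" {
		return fi.mimeType, nil
	}
	if t := mime.TypeByExtension(filepath.Ext(fi.Name())); t != "" {
		return t, nil
	}
	return "application/octet-stream", nil
}

// ETag satisfies webdav.ETager: reuse the backend's ETag (quoted, per the
// interface contract) instead of the library's modtime/size heuristic.
func (fi davFileInfo) ETag(ctx context.Context) (string, error) {
	if fi.etag == "" {
		return "", fmt.Errorf("no etag recorded for %s", fi.Name())
	}
	return fmt.Sprintf("%q", fi.etag), nil
}

Whether this is where Cells actually loses the time would need profiling; it only removes the per-file OpenFile path mentioned above.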
