
Reduce server requirements by redirecting webdav-transfer directly to object storage #1486

Closed
fds2610 opened this issue Sep 22, 2016 · 4 comments

Comments


fds2610 commented Sep 22, 2016

Hi folks,

I have been playing around with a couple of different storage systems over the last few months, and an idea materialized for reducing server requirements when scaling up to more than a thousand users:

If I am right,

  • (some) Intel E7 / i7 processors support Hyper-Threading, which allows one CPU core to handle two threads simultaneously
  • every user connection produces one thread in Apache, which reserves half of one processor core on such Intel-based machines
  • every file being downloaded must be handled by such a thread and blocks CPU resources for transferring data
  • to handle more than 1000 concurrent users you currently need more than 500 cores, i.e. about 22 CPUs (Xeon E7-8880), which means at least 6-12 very large enterprise servers with 2-4 sockets
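As a quick sanity check, the arithmetic behind this estimate can be sketched in Python. Note that the one-connection-per-hyperthread premise is an assumption (and it is disputed later in this thread), as is the 22-cores-per-socket figure:

```python
import math

# Assumptions taken from the issue text: one Apache thread per connection,
# each thread pinning one hyperthread, two hyperthreads per core, and
# 22 cores per Xeon E7-8880 socket (an assumption; core counts vary by model).
concurrent_users = 1000
threads_per_core = 2
cores_per_socket = 22

cores_needed = math.ceil(concurrent_users / threads_per_core)
sockets_needed = math.ceil(cores_needed / cores_per_socket)

print(cores_needed, sockets_needed)  # 500 cores, 23 sockets
```

The rounding gives 23 sockets, close to the "22 CPUs" quoted above; the weak point of the whole estimate is whether one Apache thread really reserves a hyperthread at all.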

Idea:
To handle more data transfer simultaneously, we can redirect pure file transfer to be served directly by a capable storage system that speaks the WebDAV protocol. The result would be that for WebDAV requests, only directory-related tasks would run on the server CPU; the file transfer itself would be handled externally, freeing up resources for another thread.

(Possible) solution:

  • EMC provides such a capable storage server, called ECS (Elastic Cloud Storage)
  • The WebDAV module could get a configurable option to activate file-transfer redirection to the ECS
  • ECS acts as an S3 object storage replacing the /data directory for all users' files, or ECS is mounted via NFS to /data
  • Via the RESTful API, the WebDAV module must "convince" the ECS to send the file directly over the client connection back to the client. The connection details must be handed over, and the ECS will have to send its data directly.
  • Both systems will have to be configured in the same network segment so the firewall will accept the file stream from the ECS as the answer to the client's request to Apache
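A minimal sketch of the redirection idea in Python, assuming a hypothetical shared secret between the application server and the ECS front end (the hostname, parameter names, and verification scheme are all illustrative; the real ECS API may differ):

```python
import hashlib
import hmac
import time
from urllib.parse import quote

# Hypothetical shared secret between the application server and the
# storage front end; not a real ECS configuration value.
SECRET = b"shared-secret-between-app-and-storage"

def signed_storage_url(path: str, user: str, ttl: int = 60) -> str:
    """Build a short-lived URL that the storage node can verify on its own,
    so the file bytes never pass through the application server."""
    expires = int(time.time()) + ttl
    payload = f"{path}|{user}|{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return (f"https://ecs.example.net{quote(path)}"
            f"?user={quote(user)}&expires={expires}&sig={sig}")

# In the WebDAV GET handler the flow would roughly be:
#   1. authenticate the user and check read permission as today
#   2. answer with a 302 redirect to signed_storage_url(path, user)
#   3. the storage node recomputes the HMAC and streams the file itself
```

The design point is that only the small directory/permission work stays on the application server, while the storage node can verify the signature without calling back.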

Contribution (resources) from my side:

  • I could provide a complete, remotely operable test system for developing and testing against an EMC ECS, as this is not a common storage system for most programmers
  • I could also provide a testing VM connected to the ECS storage, including daily snapshots to allow rollbacks on request
  • I could provide contacts to EMC experts for discussing programming issues

What else does a developer need?

Please comment on this idea, regards,
Felix


@nickvergessen
Member

I don't think this is easily possible, due to authentication and privilege testing.
How would the underlying WebDAV storage know whether a user is authenticated, which groups they belong to, and so on?

@MorrisJobke
Member

I don't think this is easily possible, due to authentication and privilege testing.

I guess you are right. Pinging @karlitschek @icewind1991 for some technical details.

Otherwise I would close this.

@icewind1991
Member

every user connection produces one thread in Apache, which reserves half of one processor core on such Intel-based machines

An OS has no problem scheduling multiple software threads per "hyperthread".

every file being downloaded must be handled by such a thread and blocks CPU resources for transferring data

While downloading, we're mostly waiting for IO (either downloading from S3 or whatever backend, or waiting for the transfer to the client); especially for larger files, the CPU will not be the bottleneck.

An average server should have no problem running at least a dozen downloads per core/hyperthread (database load will probably become the bottleneck first).

That said, there are some gains to be had from bypassing Nextcloud in the download process.

The best way to achieve something like this is either to provide a "direct download" URL to the client in the directory listing, which would be used instead, never making a request to Nc in the first place, or to have the Nc server redirect to the "direct download" URL.

The first approach is far from trivial though, since it needs to ensure that file permissions and access are properly handled; most ways to solve this would put just as much load on the Nc server, or more, to do the permission verification.

The 2nd approach again has the permissions issue: we can't just give the client the full S3 credentials that we use on the server, so a one-time key would need to be generated. By the time you've created the one-time key (involving a round trip to S3) and handled the redirect, you'll have used more resources than it takes to just proxy the download, unless you're working with very large files.
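The one-time key idea can be sketched as follows (all names hypothetical; the in-memory store stands in for the shared state and round trips that make the approach costly in practice):

```python
import secrets
import time

# In-memory token store for illustration only; a real deployment would need
# shared state (e.g. a key-value store) visible to both the Nc server and
# whatever serves the "direct download".
_tokens: dict = {}

def mint_one_time_key(path: str, ttl: float = 60.0) -> str:
    """Issued by the Nc server after the normal permission check passes."""
    token = secrets.token_urlsafe(32)
    _tokens[token] = (path, time.time() + ttl)
    return token

def redeem(token: str):
    """Called by the storage side; pop() makes the key strictly single-use.
    Returns the file path if the token is valid, None otherwise."""
    entry = _tokens.pop(token, None)
    if entry is None:
        return None
    path, deadline = entry
    return path if time.time() <= deadline else None
```

Minting and redeeming are cheap here, but in a real setup the mint step plus the redirect round trip is exactly the overhead being weighed against simply proxying the download.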

Doing "direct download" is only really advantageous in a very small percentage of use cases and involves a significant amount of work, so I don't see this ending up high in our priorities anytime soon. (That said, anyone is free to start hacking, ofc. 😄)

@MorrisJobke
Member

Doing "direct download" is only really advantageous in a very small percentage of use cases and involves a significant amount of work, so I don't see this ending up high in our priorities anytime soon. (That said, anyone is free to start hacking, ofc. 😄)

That said, I would like to close this. Nevertheless, we are happy about contributions in that regard. If you want to share some working code doing the one-time key generation, we are happy to review it and check it out.
