
Feature request: Allow upload_chunk_size to change in an existing installation #1876

Open
jmdeboer-surfsara opened this issue Apr 29, 2024 · 2 comments
Labels
feature new feature request

Comments

jmdeboer-surfsara commented Apr 29, 2024

The situation as I understand it:

In the current implementation of filesender, the upload chunk size is determined at the time of deploying the service by setting the 'upload_chunk_size' option in config.php. If this option is not specified, it defaults to 5 MB.
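
For a concrete picture (sketch only; the property name below is the one the client-side crypto code reads, and the 5 MB fallback just mirrors the documented default rather than the actual implementation):

    // upload_chunk_size is chosen in config.php at deploy time and exposed to the client;
    // assume the documented 5 MB default when nothing is configured (illustrative fallback)
    var chunkSize = window.filesender.config.upload_chunk_size || 5 * 1024 * 1024;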

Once this choice has been made, it cannot be changed without making already stored data impossible to retrieve, since the chunks will no longer reassemble correctly and decryption will fail.

Why can this be seen as a problem?

We are running a filesender instance, using our own S3 service as a storage backend. When the service was deployed, the default was not altered, so data is stored in 5 MB chunks (the documentation suggests using 40 MB for greatly increased performance, but this was overlooked).

Of course there are good reasons to use chunks, and one of the limits outlined in the documentation is PHP memory usage. But with memory being cheap, larger chunks (like 100 MiB) are probably preferable.

However, not being able to change the chunk size once it has been chosen is sub-optimal imho. This leads to unnecessary load on our storage cluster resources, something we now cannot rectify.

How might this be solved?

I suggest making upload_chunk_size an attribute of the stored file, with the value for any new uploads being the config value as it is used now.

First of all, an int field named upload_chunk_size could be added to the (file?) db table. The migration for this should also set the value of this field to the current upload_chunk_size for existing records.

Next, the code that retrieves the data should look at the value of this field for the file being retrieved, and use that for fetching/decrypting the chunks and reassembling the file.

Lastly, the code that does the chunking and encryption for uploads can remain largely the same; it just looks at the global upload_chunk_size, uses that, and stores this value with each file.

This will make it possible to change the chunk size on an existing installation during a maintenance window. Any data will be read with the chunk size that was in effect at the time of storage.
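
As a rough sketch of the read path described above (names are hypothetical, not existing FileSender code; the point is preferring the per-file value and falling back to the global config for records stored before the migration):

    // hypothetical sketch: prefer the chunk size recorded with the file at upload time,
    // fall back to the global config for files stored before the migration backfill
    function chunkSizeFor(file) {
        return file.upload_chunk_size
            ? file.upload_chunk_size
            : window.filesender.config.upload_chunk_size;
    }

    // download/decryption/reassembly would then use chunkSizeFor(file)
    // instead of reading the global setting directly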

A bonus:

Using a larger chunk size (or being able to change it afterwards) comes with an additional benefit if you are not using your own S3 storage but AWS instead; after all, one of the metrics they bill you on is the number of requests. I am sure Jeff would be happy if we all used smaller chunks, as that just means more requests Amazon can bill you for...

These are just some initial thoughts and I am probably overlooking things, but I am wondering what your thoughts on the matter are?

@monkeyiq
Contributor

I think this is a good idea. I would probably target the Transfer table for storing the upload_chunk_size used, as I think it can just apply to all the files in the transfer.

There has been discussion in the past about making the chunk size dynamic, so it can ramp within a range, say 1 MB to 200 MB, depending on your config.php. This way something starting at 4 MB might move upwards if the network speed allows it, or drop back to 1 MB if the network transfer is not reliable. But dynamic chunk size per file can be seen as another extension.
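
Just to illustrate that dynamic idea (numbers and names are hypothetical, and this would of course only be safe once the chunk size actually used is recorded per transfer):

    // hypothetical adaptive ramping within bounds that would come from config.php
    var min_chunk = 1 * 1024 * 1024;    // e.g. 1 MB lower bound
    var max_chunk = 200 * 1024 * 1024;  // e.g. 200 MB upper bound

    function nextChunkSize(current, lastChunkOk) {
        // grow while chunks keep succeeding, back off after a failed/retried chunk
        var next = lastChunkOk ? current * 2 : Math.floor(current / 2);
        return Math.min(max_chunk, Math.max(min_chunk, next));
    }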

With that said, the crypto code currently assumes the global value, for example in www/js/crypter/crypto_app.js:

crypto_chunk_size:   window.filesender.config.upload_chunk_size,

So it would need to be extended to pass the chunk size around instead of assuming this static default.
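
Something along these lines, for example (a sketch only; the upload_chunk_size property on the transfer is the hypothetical new field, not an existing one):

    // sketch: let the caller supply the per-transfer value,
    // keeping the global config as the default for new uploads
    function cryptoChunkSize(transfer) {
        return (transfer && transfer.upload_chunk_size)
            || window.filesender.config.upload_chunk_size;
    }

    // and in crypto_app.js:
    // crypto_chunk_size: cryptoChunkSize(transfer),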

monkeyiq added the feature new feature request label on May 21, 2024
@WebSpider
Collaborator

Yeah, we've been experimenting with the chunk sizes combined with various S3 storage vendors, and it makes a lot of difference, but it is also a double-edged sword.

On the one hand, storage performance, and thus upload and download speeds for end users, really ramps up if you increase the chunk sizes. On the other hand, since a retry is per chunk, the impact of losing a chunk is also a lot bigger.

For now, in countries with high bandwidths, it seems acceptable to go up to 50 MB per chunk, which is ten times the default.
