
Support storing multiple files in the same backend object ("fragments") #52

Closed
Nikratio opened this issue Dec 28, 2018 · 14 comments

@Nikratio
Collaborator

[migrated from BitBucket]

Storing lots of small files is very inefficient, since every file requires its own block.

We should add support for fragments, so that multiple files can be stored in the same block.

With the new bucket interface, we should be able to implement this relatively easily:

  • Upload workers get a list of cache entries; new blocks may be coalesced into a single object
  • CommitThread() and expire() only hand work to worker threads once they have a reasonably big chunk of data ready
  • We keep an object until the reference count of every block it contains is zero
  • Therefore, blocks may continue to exist with refcount=0 and can possibly be reused
  • s3qladm may need a "cleanup" function to get rid of these blocks
  • When downloading an object, the db can be used to determine which blocks in the object belong to files (and should be added to the cache) and which can be discarded
  • The minimum size of cache entries passed to workers could be adjusted dynamically based on upload bandwidth, latency, and compression ratio of previous uploads
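The coalescing step described above could look roughly like this. This is a minimal sketch only: `CacheEntry`, `backend.store()`, and the `s3ql_data_N` key naming are illustrative stand-ins, not the actual S3QL classes or API.

```python
import io

# Hypothetical stand-in for an S3QL cache entry (not the real class).
class CacheEntry:
    def __init__(self, block_id, data):
        self.block_id = block_id
        self.data = data

def coalesce_and_upload(entries, backend, min_obj_size=1024 * 1024):
    """Pack cache entries into backend objects of >= min_obj_size bytes.

    Returns {block_id: (object_key, offset, length)} so the database can
    record which byte range of which object holds each block.
    """
    layout = {}
    buf = io.BytesIO()
    members = []
    obj_no = 0

    def flush():
        nonlocal obj_no, buf, members
        if not members:
            return
        key = 's3ql_data_%d' % obj_no
        backend.store(key, buf.getvalue())
        off = 0
        for entry in members:
            layout[entry.block_id] = (key, off, len(entry.data))
            off += len(entry.data)
        obj_no += 1
        buf = io.BytesIO()
        members = []

    for entry in entries:
        buf.write(entry.data)
        members.append(entry)
        if buf.tell() >= min_obj_size:
            flush()
    flush()  # upload the remainder, even if below the size threshold
    return layout
```

The returned layout is what lets a later download decide which byte ranges of the object correspond to live blocks and which can be discarded.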
@szepeviktor
Collaborator

Please add an option to enable/disable fragments.

@Nikratio
Collaborator Author

Another option would be to use range downloads to download only the fragment that is needed at the time.
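As a sketch of the range-download idea: the `RangeBackend` class below is a hypothetical in-memory stand-in, but the semantics mirror how S3-style backends do it, via an HTTP `Range: bytes=<first>-<last>` request header where both ends are inclusive.

```python
class RangeBackend:
    """Toy in-memory backend illustrating HTTP-style range reads.

    Real S3-style backends express this as the request header
    'Range: bytes=<first>-<last>', with both ends inclusive.
    """

    def __init__(self, objects):
        self.objects = objects  # key -> bytes

    def get_range(self, key, first, last):
        # Inclusive slice, mirroring HTTP range semantics.
        return self.objects[key][first:last + 1]

def read_block(backend, key, offset, length):
    """Fetch only the fragment holding one block, not the whole object."""
    return backend.get_range(key, offset, offset + length - 1)
```

With the (key, offset, length) of each block recorded in the database, only the needed fragment ever crosses the network.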

@Nikratio
Collaborator Author

Nikratio commented Jan 4, 2019

Google Storage supports batched object uploads and downloads. This would give us all the advantages of fragments without any of the drawbacks. Need to check if S3 has something similar and, if not, if this is reason enough to stick with the old plan...

@Nikratio
Collaborator Author

Nikratio commented Jan 4, 2019

Note that if we drop the plan to implement fragments we'd also be able to simplify the metadata schema and drop one table completely.

@Nikratio
Collaborator Author

Nikratio commented Jan 4, 2019

S3 doesn't support batched operations.

But maybe the latency issue can be addressed by decoupling the number of parallel uploads from the number of upload threads (which can't be very high, because it determines the number of concurrent compression and encryption operations).
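A minimal sketch of that decoupling, assuming two separate thread pools (the pool sizes and the `backend.store()` call are illustrative assumptions, not the real S3QL worker code):

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

# The small pool bounds the number of concurrent (CPU-bound) compressions;
# the large pool keeps many (latency-bound) uploads in flight at once.
compress_pool = ThreadPoolExecutor(max_workers=2)
upload_pool = ThreadPoolExecutor(max_workers=20)

def upload_block(backend, key, data):
    def job():
        # At most 2 compressions run at any moment...
        compressed = compress_pool.submit(zlib.compress, data).result()
        # ...but up to 20 uploads can be waiting on the network.
        backend.store(key, compressed)
    return upload_pool.submit(job)
```

This way per-request latency is hidden by many in-flight connections without multiplying the CPU cost of compression and encryption.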

@Nikratio
Collaborator Author

Nikratio commented Jan 4, 2019

I've decided not to implement this. Revisiting the pros and cons, it is not worth it.

Upload speed is better increased by using batched uploads or more concurrent connections (i.e., in the backend layer). For download of many small files, we should get much better results by implementing some sort of readahead than by hoping that files happen to be in the same fragment.

Thus I opened #63 and #62 instead (I won't create a bug for read-ahead unless someone actually plans to work on it).

@Nikratio Nikratio closed this as completed Jan 4, 2019
@Nikratio Nikratio changed the title Support fragments Support storing multiple files in the same backend object ("fragments") Jan 4, 2019
@segator

segator commented Jan 4, 2019

Some providers ban you based on the number of requests. If we have millions of little files, this is a problem because putting or getting the files takes a lot of time.
I think it is a good idea to have fragments; other filesystems have implemented them and work pretty fast!

@szepeviktor
Collaborator

E.g. when backing up servers, I put /etc into a tar archive so that it is only one file.

@Nikratio
Collaborator Author

Nikratio commented Jan 4, 2019

Could you provide some examples of providers where this is a problem, examples of some "other FS" that do this, and elaborate on what "pretty fast" means in this context? Otherwise I remain unconvinced :-).

@segator

segator commented Jan 4, 2019

For example, GDrive and Backblaze ban you when you make too many requests in a fixed time window. So if you upload 100,000 1 kB files, it takes a long time because of the banning; of course the S3QL retry logic can handle it by waiting and retrying, but it costs a lot of time and you get a soft ban.

ProxyFS uses the fragment concept: https://github.com/swiftstack/ProxyFS.

Sorry, "pretty fast" is not very descriptive 🥇
I mean that for 100K little files it is better to upload 50 fragments than 100K objects. Also, after an S3QL crash you need to run fsck, and fsck is slow depending on the provider. For example, GDrive only lets you list 1000 objects per request, and it takes time to return all of them; in my case I have a filesystem with 900K objects and fsck takes more than 1 h.
With fragments this could easily be reduced to half or less.

Too many requests are a problem because of the latency and the providers' soft bans.

Anyway @Nikratio, thank you for your amazing work!! I love s3ql.

@szepeviktor
Collaborator

"in case of s3ql crash you need to pass fsck, fsck is also slow"

I second that.

@Nikratio
Collaborator Author

Nikratio commented Jan 4, 2019

Neither GDrive nor Backblaze are currently supported by S3QL, so I don't think that rate limits on their end should influence this decision.

The fsck time is an interesting point - but I do not fully understand the problem. With 900k objects you'd need 900 separate requests. Even when assuming a (pretty high) 0.5 seconds round-trip latency, that's only 7.5 minutes. To need 60 minutes, a single request would have to take 4 seconds - that seems too high. Could you file a separate issue about this? Please include the backend that you are using.
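The arithmetic in the paragraph above, written out as a back-of-the-envelope check (numbers taken directly from this comment):

```python
# 900K objects, listed 1000 objects per request.
objects = 900_000
objects_per_listing = 1_000

requests = objects // objects_per_listing          # 900 listing requests
minutes_at_half_second_rtt = requests * 0.5 / 60   # total time at 0.5 s RTT
secs_per_request_for_one_hour = 3600 / requests    # RTT needed to reach 1 h

print(requests, minutes_at_half_second_rtt, secs_per_request_for_one_hour)
```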

If the time needed for bucket listings really is a problem, I can also think of some other solutions (e.g. issuing multiple listing requests in parallel); we don't need to introduce fragments just for that.

@Nikratio
Collaborator Author

Nikratio commented Jan 4, 2019

Is there documentation for ProxyFS somewhere? Looking at the README file, it sounds to me as if ProxyFS is actually mapping files to objects 1:1.

@segator

segator commented Jan 4, 2019

"Neither GDrive nor Backblaze are currently supported by S3QL, so I don't think that rate limits on their end should influence this decision"

I hope GDrive will be supported soon.

GDrive object listing is slow; it can take 2-10 s per request.
Parallelizing is not possible, at least with GDrive: you get a pagination token to go to the next page, so I cannot jump around in the list, I have to iterate in order.
About ProxyFS: I talked with one of the developers on their Slack to learn more about how it works, but I eventually saw that it has exactly the same problem as S3QL: only a single mount at a time is possible.
