chunkservers: SPARSE chunk files [feature] #370

Closed
onlyjob opened this issue Jan 10, 2016 · 19 comments

Comments

@onlyjob
Member

onlyjob commented Jan 10, 2016

While native chunk compression #366 is a worthwhile feature, I'd like to request SPARSE file support for chunk files. Ideally enabled by a chunkserver option (or even by default), it can dramatically improve space efficiency and performance for small files by utilising the sparse file feature of the underlying file system. On ext4 and xfs file systems, sparsification reduces some chunk files from 68K to 4K, which is 17 times smaller.
Implementation is trivial so I hope this can be implemented soon-ish. ;-) Thanks.
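
For readers unfamiliar with the mechanism, here is a minimal Python sketch of what a sparse chunk file looks like from user space. The function names, the 4 KiB payload and the 68 KiB total are illustrative assumptions that mirror the figures above; this is not LizardFS code. Whether the tail is really stored as a hole depends on the underlying file system.

```python
import os


def make_sparse_chunk(path, payload, total_size):
    """Write payload at offset 0, then extend the file to total_size
    without writing any zero bytes (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(payload)
        # truncate() past the end extends the file; on file systems with
        # sparse support the extension is a hole and occupies no blocks.
        f.truncate(total_size)


def sizes(path):
    """Return (apparent size, allocated size) in bytes.
    st_blocks is counted in 512-byte units on Linux."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

On ext4 or xfs, `sizes()` on such a file would typically report an allocated size well below the apparent size; on a file system without sparse support the two would be roughly equal.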

@Zorlin

Zorlin commented Jan 10, 2016

+1, sparse file support is an easy big win.

@onlyjob
Member Author

onlyjob commented Jan 10, 2016

Thanks Zorlin. :-) By the way this is my 100th bug report to LizardFS so I reckon I should celebrate. ;-)

@Zorlin

Zorlin commented Jan 10, 2016

@onlyjob I'll open a beer in your honour. 💯

@onlyjob
Member Author

onlyjob commented Jan 11, 2016

Here is an example of how to write a sparse file: https://sources.debian.net/src/util-linux/2.27.1-3/sys-utils/fallocate.c/#L176
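
The linked C code punches holes in an existing file in place. As a simpler, portable illustration of the same idea, the sketch below uses a different technique: it copies a file and seeks over all-zero blocks instead of writing them. The function name and the 4096-byte block size are assumptions for illustration only.

```python
import os

BLOCK = 4096  # assumed file system block size


def write_sparse(src_path, dst_path, block=BLOCK):
    """Copy src to dst, leaving holes where src contains all-zero blocks."""
    zero = bytes(block)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            buf = src.read(block)
            if not buf:
                break
            if buf == zero[:len(buf)]:
                # Skip ahead instead of writing zeros; the gap becomes a hole
                # on file systems that support sparse files.
                dst.seek(len(buf), os.SEEK_CUR)
            else:
                dst.write(buf)
        # Seeking alone does not extend the file, so set the final length
        # explicitly in case the source ends with a run of zeros.
        dst.truncate(src.tell())
```

The copy reads back byte-for-byte identical to the source; only the on-disk allocation may differ.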

@onlyjob
Member Author

onlyjob commented Feb 24, 2016

Please prioritise implementation of this feature. It is straightforward to implement but it may have a profound effect on performance: chunks are about 60k in size, and for small 10 KiB files ~50k is wasted in the chunk file. That is 500% extra space and a 500% penalty for reading and writing. Most file systems have 4k clusters, so saving sparse chunks increases performance for small files and optimises use of storage capacity. The storage savings are not negligible: I manually sparsified all chunks on a 2 TB HDD (80% utilisation) with `fallocate -v --dig-holes`, which saved about 100 GiB!
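
The arithmetic above can be checked with a short sketch. It assumes "60k" means 60 KiB, uses a 4 KiB cluster, and ignores any chunk header or checksum data; `chunk_overhead` is a hypothetical helper, not part of LizardFS.

```python
def chunk_overhead(payload_bytes, chunk_file_bytes, cluster=4096):
    """Return (wasted bytes, wasted/payload ratio, sparse allocation in bytes)."""
    wasted = chunk_file_bytes - payload_bytes
    ratio = wasted / payload_bytes
    # A sparse file still occupies whole clusters, so round the payload up.
    sparse_alloc = -(-payload_bytes // cluster) * cluster
    return wasted, ratio, sparse_alloc


wasted, ratio, sparse_alloc = chunk_overhead(10 * 1024, 60 * 1024)
print(wasted // 1024, "KiB wasted,", f"{ratio:.0%} extra,",
      sparse_alloc // 1024, "KiB if stored sparse")
```

Under these assumptions, a 10 KiB payload would need three 4 KiB clusters (12 KiB) when stored sparse, rather than the full 60 KiB.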

@DarkHaze
Contributor

Done in patch http://cr.skytechnology.pl:8081/#/c/2430/. Should be included in the next release.

@onlyjob
Member Author

onlyjob commented Mar 10, 2016

Awesome, but why not enable it by default??

@Zorlin

Zorlin commented Mar 10, 2016

Not an LFS dev, but I suspect it needs more testing and tire kicking before it graduates to a default feature :)

@onlyjob
Member Author

onlyjob commented Mar 10, 2016

Fair enough, but what kind of concerns are there? Bugs? A new feature should be tested at least a little before commit, so I suppose it will be OK to enable it by default...

@DarkHaze
Contributor

We don't want to enable it by default (for now) because we are not sure about the performance impact of this feature. Remember that punching holes depends on the underlying file system (ext4, xfs and btrfs support it), and it might not be very efficient on some of them.

We just don't want to break anyone's LizardFS installation.

@onlyjob
Member Author

onlyjob commented Mar 17, 2016

The "performance impact" is reduced I/O on the last chunks of almost all files (except the very few that are perfectly aligned to chunk size and have no holes).

@DarkHaze, I appreciate your conservative approach but if this feature does not qualify to be enabled by default then I don't know what else would... :-)

I don't know where this feature won't be efficient...
It appears to be useful in all scenarios I could think of...

@blink69
Contributor

blink69 commented Mar 17, 2016

Someone should check this option on unsupported filesystems :) and then we can enable this by default :)

@onlyjob
Member Author

onlyjob commented Mar 17, 2016

You are right of course. I can only think of layered FUSE file systems, and I'm wondering whether there should be some sort of safeguard that disables this feature conditionally, depending on file system type or feature detection...
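
One possible shape for such feature detection, as a rough sketch only: try to punch a hole in a scratch file in the chunk directory and see whether the kernel accepts the request. This assumes 64-bit Linux with glibc (where `off_t` is 64 bits) and calls `fallocate(2)` through `ctypes`; `supports_punch_hole` is a hypothetical name, and this is not how the LizardFS patch decides anything.

```python
import ctypes
import ctypes.util
import tempfile

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def supports_punch_hole(directory):
    """Return True if a scratch file in `directory` accepts a hole-punch request."""
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        fallocate = libc.fallocate
    except (OSError, AttributeError, TypeError):
        return False  # no usable libc fallocate, e.g. not Linux
    # Assumption: off_t is 64 bits wide (true on 64-bit Linux).
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_longlong, ctypes.c_longlong]
    fallocate.restype = ctypes.c_int
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        f.write(b"\xff" * 8192)
        f.flush()
        rc = fallocate(f.fileno(),
                       FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, 4096)
        # A non-zero result (typically with errno EOPNOTSUPP) means the
        # file system rejected the request.
        return rc == 0
```

A probe like this only shows that the call is accepted, not that it is efficient or free of kernel bugs, so it would complement rather than replace testing on each file system.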

@4Dolio

4Dolio commented Mar 20, 2016

This is a note to myself to test this out on strange filesystems.

@FlorianHeigl

Should probably not be enabled on ext4 without additional testing of the ext4 module version:
https://bugzilla.redhat.com/show_bug.cgi?id=1323577

@onlyjob
Member Author

onlyjob commented Apr 24, 2016

With Linux 4.3 and 4.4 I've seen no problems on ext4. I've been using sparse chunks ever since this feature was implemented. It looks like this problem has no (practical) impact on LizardFS...

@onlyjob
Member Author

onlyjob commented Apr 25, 2016

Only the most recent kernels seem to be affected. Since it is also a security vulnerability in the kernel, it will be fixed fairly quickly. I doubt we need to avoid using a great feature due to a temporary regression in the kernel.

@FlorianHeigl

I just said to test for affected versions, not to avoid using it. :-)
I'm definitely not going to be affected (XFS), and I'm just as eager for the functionality as you are, but:
if someone has that kernel, that someone may lose data, so it has to be taken into account.

@onlyjob
Member Author

onlyjob commented Apr 26, 2016

I understand your concerns. :)

Apparently no data loss is happening on 4.4, or maybe the Linux kernel in Debian is already patched. LizardFS is quite resilient to errors and there were no errors detected on ext4 chunkservers... I think we're all good.
