Log files moderate fragmentation #22940
We do use fallocate. Whenever we append something, we allocate more space via fallocate(), in 8M steps.
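For illustration only (this is not journald's actual code, just a minimal sketch of the allocation strategy described above): growing a file's allocation in fixed 8 MiB steps with fallocate() looks roughly like this.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>

/* The 8 MiB allocation step mentioned above. */
#define ALLOC_STEP (8ULL * 1024 * 1024)

/* Ensure the file's allocation covers at least `size` bytes, rounding the
 * request up to the next 8 MiB boundary so repeated appends only rarely
 * have to allocate again. */
static int preallocate(int fd, off_t size) {
        off_t step = (off_t) ALLOC_STEP;
        off_t rounded = (size + step - 1) / step * step;

        if (fallocate(fd, 0, 0, rounded) < 0)
                return -errno;

        return 0;
}
```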
"Thanks" for immediately closing the issue despite the evidence that this is not working as intended. |
|
All the nine files that I have are in "8M steps" and all of them are fragmented as hell. |
|
Not even trying to confirm or deny the bug report just closing it. Amazing attitude. |
|
Reopening this to give it a closer look (and maybe /cc @DaanDeMeyer). |
|
With the hole-punching of archives change it's not exactly unexpected for archived journals to be fragmented. |
|
I'm using ext4 with a ton of free space. Mount options: |
|
Fragmentation is indeed expected, because we grow the file in 8 MB increments using ftruncate() and punch holes in it when archiving. When BTRFS is used, we rewrite the entire file so we can enable COW; the side effect is that we get rid of fragmentation as well. We could rewrite the file unconditionally to always get rid of fragmentation, but that would effectively double our write rate, since we'd write every journal file twice. Without gathering some actual data to compare the tradeoffs, I have no idea whether this would be a good idea or not. (Of course, when rewriting there are a few extra things we could do, like coalescing entry arrays, that might make rewriting worth it.)
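For readers unfamiliar with the "punch holes in it when archiving" part: the usual Linux mechanism is fallocate() with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, which deallocates a byte range while leaving the file size unchanged. A minimal sketch (again, not journald's actual code):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>  /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */

/* Deallocate [offset, offset + length) while keeping the file size as-is;
 * subsequent reads of that range return zeroes without touching the disk. */
static int punch_hole(int fd, off_t offset, off_t length) {
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      offset, length) < 0)
                return -errno;

        return 0;
}
```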
Fragmentation is expected for files written piecemeal. I see no problem with that. 40 fragments isn't terrible; 40000 would be terrible. Quite frankly, I doubt it's worth the fuss. We do what we can to minimize fragments, and the results are not terrible, so unless people can show it's worth generating additional IO to remove the fragments on archival, I am not sure we should really bother. btrfs with COW is a different story, since writing to the middle of files will cause heavy fragmentation, way beyond what is seen on ext4; that's why we write files with COW disabled and rewrite them to re-enable it on archival. Anyway, I'd just close this. Files with complex write patterns cause fragmentation, there's no news in that.
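The COW-disabled handling mentioned above corresponds to the per-file NOCOW attribute (what chattr +C sets). Setting journald's exact code path aside, toggling that attribute through the FS_IOC_GETFLAGS/FS_IOC_SETFLAGS ioctls looks roughly like the sketch below; note the flag only reliably takes effect while the file is still empty, and only matters on btrfs.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */

/* Set or clear the NOCOW attribute on an (ideally still empty) file. */
static int set_nocow(int fd, int enable) {
        unsigned int flags;

        if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
                return -errno;

        if (enable)
                flags |= FS_NOCOW_FL;
        else
                flags &= ~FS_NOCOW_FL;

        if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
                return -errno;

        return 0;
}
```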
That's fair, but in this particular issue it's 8MiB journals and we allocate them in 8MiB chunks, so that's not really relevant. My assumption is that this affects only the archived journals and is due to the hole punching. WRT @DaanDeMeyer's suggestions on rewriting archives to defragment them, it really feels like the kernel should just be giving us an ioctl to say "hey, could you optimize this file when you get a chance, since I'm done writing to it, kthx". That way the underlying filesystem could do clever things to make in-place reorganization and/or compression succeed in low-space scenarios. It feels like we're working around shortcomings by doing it in userspace, and depriving the filesystems of the opportunity to do it better.

There's one of those for btrfs (
There's a defrag ioctl, and we actually used to issue it (but that was dropped in #21598, though I think mostly by accident). I'm not sure it's supported outside of btrfs, though. Defragging is not an obvious choice either way: the IO you generate this way doesn't come for free, so the benefit of removing fragments must heavily outweigh the benefit of minimal IO. I don't see that here. My educated guess is that 40 frags don't matter; 40000 would. Unless anyone actually shows that generating a lot of defrag IO for maybe reducing 40 frags to a bit less is worth it, I think we should close this.
It arguably belongs in the fs layer; usually ioctls start out in one fs and then get renamed when implemented in others.
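On btrfs that defrag ioctl is exposed as BTRFS_IOC_DEFRAG. A minimal sketch of issuing it on a single file (not necessarily the exact call systemd used to make):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>   /* BTRFS_IOC_DEFRAG */

int main(int argc, char **argv) {
        if (argc != 2) {
                fprintf(stderr, "usage: %s FILE\n", argv[0]);
                return 1;
        }

        int fd = open(argv[1], O_RDWR | O_CLOEXEC);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* Ask btrfs to defragment this one file; on other filesystems the
         * ioctl typically fails with ENOTTY/EOPNOTSUPP. */
        if (ioctl(fd, BTRFS_IOC_DEFRAG, NULL) < 0)
                perror("BTRFS_IOC_DEFRAG");

        close(fd);
        return 0;
}
```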
8MiB archives are kind of an edge-case scenario; this GitHub issue arguably exists because of that, IMO. Nobody aware of journald's preallocation in 8MiB increments would expect to see 40 fragments backing such a file on an empty filesystem, so it's totally understandable that @birdie-github filed this under the impression that something is misbehaving. Through the lens of their minimal size, one might extrapolate that you'd indeed see problematic numbers of fragments for a larger journal. It's this very same edge case that drove me to uncover the disproportionate wasted space in preallocated tiny journals. Now, instead of a complaint that the space is wasted, we have a complaint that the file is fragmented, thanks to hole-punching. But I think this particular fragmentation is probably harmless. Closing is fine with me. @birdie-github, does what you're seeing make more sense now?

Not really, or maybe I'm too stupid. What I've picked up is that instead of preallocating space you punch a hole, and it's bound to be fragmented because holes are not guaranteed to be contiguous. It would be great if there were a config option for that. I don't use BTRFS and I don't really care about other CoW filesystems.
It's not instead of preallocating. It's that when a journal gets archived, in the interest of reclaiming wasted space, a hole-punching pass is performed to deallocate substantial unused (zeroed) regions. It's basically sparsifying the archive. It's really not a big deal: accesses within the holes are now fulfilled with generated zeroes, without hitting the backing store at all. The fragments straddling the holes should be left in their prior layout from the contiguous preallocation; the holes have just been made available for reuse, and it's up to the filesystem to pack appropriately sized objects into those holes without ill effect for accessing those objects. Can you demonstrate an actual, measurable performance problem resulting from this? If not, it's mostly just cosmetic, and please close the issue.
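To see the sparsification described above for yourself (a generic check, nothing specific to journald): compare a file's apparent size with its allocated blocks, and walk its data segments with lseek(); everything in between is a punched hole.

```c
#define _GNU_SOURCE        /* for SEEK_DATA / SEEK_HOLE */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
        if (argc != 2)
                return 1;

        int fd = open(argv[1], O_RDONLY | O_CLOEXEC);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        struct stat st;
        if (fstat(fd, &st) < 0) {
                perror("fstat");
                return 1;
        }

        /* st_blocks is in 512-byte units; a sparsified archive allocates
         * fewer bytes than its apparent size. */
        printf("apparent: %lld bytes, allocated: %lld bytes\n",
               (long long) st.st_size, (long long) st.st_blocks * 512LL);

        /* Walk the data segments; the gaps between them are holes. */
        off_t off = 0;
        for (;;) {
                off_t data = lseek(fd, off, SEEK_DATA);
                if (data < 0)
                        break;                    /* ENXIO: no more data */
                off_t hole = lseek(fd, data, SEEK_HOLE);
                if (hole < 0)
                        break;
                printf("data segment: [%lld, %lld)\n",
                       (long long) data, (long long) hole);
                off = hole;
        }

        close(fd);
        return 0;
}
```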
As far as I understand this functionality, the "hole-punching pass" is only needed for CoW filesystems. If I'm correct, it would be nice to get an option to disable it altogether.

That's not the case. We added hole-punching when archiving to reclaim space that journald prepared for use but then never wrote anything other than zeroes into before the file was archived. This doesn't only apply to CoW filesystems; what gives you that impression?
Anyway, closing for now. We can certainly reopen this if anybody can show this is a real performance bottleneck and that defragging or similar would bring real benefits. But without numbers, 40 frags don't make me nervous... Hope it's OK if I hence close this.

Is it possible to disable this hole-punching/rotation/whatever? I see nothing about it in the configuration options. It'd be great if old files were simply deleted and new binary log files got created.
I'm using systemd-249.9-1.fc35.x86_64 in Fedora 35 with pretty much everything left at its defaults.

I'm quite appalled by how horribly fragmented systemd log files are. The other log files' extent counts are 39, 28, etc. All files are exactly 8388608 bytes.

I was under the impression that systemd is capable of preallocating file space using fallocate(2), so why doesn't it do it? Please do it by default. This must not be happening.
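(For reference, the extent counts quoted above are the kind of numbers tools like filefrag report; they come from the FIEMAP ioctl. A minimal sketch of querying just the extent count of a file, under the assumption that only the count is wanted:)

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_* */
#include <linux/fs.h>      /* FS_IOC_FIEMAP */

int main(int argc, char **argv) {
        if (argc != 2) {
                fprintf(stderr, "usage: %s FILE\n", argv[0]);
                return 1;
        }

        int fd = open(argv[1], O_RDONLY | O_CLOEXEC);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        struct fiemap fm;
        memset(&fm, 0, sizeof(fm));
        fm.fm_start = 0;
        fm.fm_length = FIEMAP_MAX_OFFSET;  /* map the whole file */
        fm.fm_flags = FIEMAP_FLAG_SYNC;    /* flush so the mapping is current */
        fm.fm_extent_count = 0;            /* 0 = only report the extent count */

        if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) {
                perror("FS_IOC_FIEMAP");
                close(fd);
                return 1;
        }

        printf("%s: %u extents\n", argv[1], fm.fm_mapped_extents);
        close(fd);
        return 0;
}
```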