
Don't suggest running btrfs balance regularly #3203

Closed
tom-- opened this issue Jan 2, 2018 · 20 comments

@tom--
Contributor

tom-- commented Jan 2, 2018

The text (screenshot below) explaining the btrfs allocation graph is very good but I think it's not wise to suggest

You can keep your volume healthy by running the btrfs balance command on it regularly.

The btrfs faq says

Q: Do I need to run a balance regularly?

A: In general usage, no. A full unfiltered balance typically takes a long time, and will rewrite huge amounts of data unnecessarily. You may wish to run a balance on metadata only (see Balance_Filters) if you find you have very large amounts of metadata space allocated but unused, but this should be a last resort.

btrfs allocation caption

@ktsaou
Member

ktsaou commented Jan 2, 2018

thanks!

@Ferroin have a look please.

I have to admit that I run a metadata balance daily and a full balance weekly on my SSDs, but this is probably a left-over from around 1.5 years ago, when btrfs had serious issues reclaiming free space. So it didn't sound strange to me.

@Ferroin
Member

Ferroin commented Jan 8, 2018

I would personally argue that the BTRFS FAQ needs to be updated. While technically accurate, the wording there is somewhat misleading. You don't need to run a balance regularly in most cases, but it's still a good idea to do so (although what @ktsaou mentions is probably overkill on an SSD unless the filesystem is very active).

I'll look at getting a discussion started on the official BTRFS mailing list about this and see if the developers and power users there can come to a consensus on this, and then update both the FAQ and the message on the netdata dashboard to match.

@tom--
Contributor Author

tom-- commented Jan 8, 2018

To explain how I came to add this comment...

I've used ZFS for years but started with BTRFS only a few months ago. I tried to educate myself. I learned that routine maintenance includes monthly btrfs scrub. But the first I learned about routine btrfs balance was this caption in a netdata dashboard.

So I read about it and found that balance is drastic, has complicated options and confusing documentation, and it's hard to know exactly how and when to use it. If BTRFS filesystems indeed require regular balance, then either a) this requirement is not documented, or b) BTRFS is deficient software. It has to be one of the two. (The documentation's remark that balance should in future be an automatic background process suggests the latter.)

So I hope you can start that conversation @Ferroin because I'd value better guidance on this.

@Ferroin
Member

Ferroin commented Jan 8, 2018

A full balance is drastic. It functionally rewrites the entire filesystem, with near zero means of limiting its impact on system performance (because balance operations run from kernel context, and no sane person should be messing around with kthread priorities from userspace). Filtered balances (which are what the comment in netdata was intended to suggest) are much less so, rewriting only parts of the filesystem with usually much less impact on system performance.

As far as options, from a practical perspective you only really need to worry about balance filters, and there are only four filters for balance that matter for normal users:

  • convert: Does exactly what it sounds like, converting from one profile to another.
  • soft: Only useful with convert, this limits processing of chunks to only those that aren't already the profile specified for convert (so you can use this to do incremental conversions if you want).
  • usage: Selects chunks based on the percentage of space used in the chunk. It matches all chunks of the given type whose usage is at most X% (so -dusage=0 matches only empty data chunks, while -musage=20 matches only metadata chunks that are at most 20% full, i.e. at least 80% empty).
  • limit: A simple limit on the number of chunks processed.

All the other filters are used to do really complicated stuff for testing and can be safely ignored by regular people who aren't doing development work or trying to fix very specific issues with the filesystem by hand.
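For concreteness, here's a sketch of how those filters are typically combined on the command line. The mount point, target profile, and percentages are made-up examples, and the DRY_RUN guard is my own addition so the commands print instead of run; this is illustrative, not the thread's exact advice.

```shell
#!/bin/sh
# Sketch of the filter combinations described above. Values and the mount
# point are examples. DRY_RUN defaults to 1 so commands are printed rather
# than executed; set DRY_RUN= to run them for real.
MNT=${MNT:-/mnt/data}
DRY_RUN=${DRY_RUN:-1}
run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

# usage filter: reclaim mostly-empty chunks only
run btrfs balance start -dusage=20 -musage=20 "$MNT"

# convert + soft: incrementally convert data chunks to raid1, skipping
# chunks that are already raid1
run btrfs balance start -dconvert=raid1,soft "$MNT"

# limit filter (comma-combined with usage): cap the work done in one pass
run btrfs balance start -dusage=50,limit=2 "$MNT"
```

Comma-joined filters (e.g. `usage=50,limit=2`) apply to the same chunk type; per the descriptions above, `convert` plus `soft` skips chunks already in the target profile.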

As far as being automatic, certain things ideally should be (and at least one such common case now is: BTRFS didn't used to deallocate empty chunks automatically), but not everything makes sense to automate. ZFS gets away with its background scrubs and resilvering because they have a very low, easily controlled impact on performance, but BTRFS isn't really as good about that. We could probably stand to have the kernel automatically and slowly repack blocks from mostly empty chunks into free space in other chunks, but even that isn't going to be cheap.

As far as the 'balances keep your FS healthy' thing, for a couple of years now I've recommended that people run a balance similar to the following on a daily basis on most BTRFS volumes, to help keep empty space in partially full chunks from building up:

btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4

That particular command should complete in at most a few minutes even on multi-terabyte volumes on cheap SATA disks that are mostly full, and will significantly help in avoiding one of the biggest cases of issues (filling up all of one chunk type and not being able to allocate another chunk of that type).

@tom--
Contributor Author

tom-- commented Jan 8, 2018

@Ferroin Thanks for the detailed information. I'll experiment with your recommended daily invocation.

I think the lesson of this experience is that I don't understand BTRFS well enough to be using it.

@Ferroin
Member

Ferroin commented Jan 8, 2018

I think the lesson of this experience is that I don't understand BTRFS well enough to be using it.

While you may feel that way, I don't think it's anywhere near that bad. The very fact that you brought up this issue in the first place shows you did more research than most users ever will, which is a good thing, and the fact that you're coming to this conclusion before dealing with catastrophic failures is a good sign too. Together, those two facts make you better qualified than a pretty significant percentage of the people out there who try BTRFS: most end up having a problem and then just give up and go elsewhere.

FWIW, subscribing to the official mailing list on http://vger.kernel.org/ is a reasonably good first step if you want to learn more about BTRFS. It's not a particularly high-volume list most of the time (the average is probably about 30-50 messages a day), but paying reasonable attention to the discussions there can help significantly in learning how to work with BTRFS (it's both a development and users list).

@ktsaou
Member

ktsaou commented Jan 13, 2018

@Ferroin you rock!

I removed my btrfs balance cron jobs and now I am running only this daily:

btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4 /

I also have a weekly scrub:

btrfs scrub start -B -d -c 2 -n 4 /

@tom--
Contributor Author

tom-- commented Jan 13, 2018

@ktsaou how do you ensure they cannot be underway at the same time?

@ktsaou
Member

ktsaou commented Jan 13, 2018

I don't.
It is simple to do, though: just prefix each btrfs command with this:

flock -x /tmp/btrfs.lck btrfs ... BTRFS PARAMS ...

and they will run one after another.

To test it, open 2 consoles and run:

# on first shell
flock -x /tmp/btrfs.lck sleep 10

# on second shell
flock -x /tmp/btrfs.lck echo ok

The second shell will print ok when the first shell completes.
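Putting flock together with the cron jobs mentioned earlier, a system crontab could serialize the daily balance and the weekly scrub like this (a sketch: the schedule, lock path, and /etc/cron.d filename are examples; the btrfs commands are the ones quoted above):

```
# /etc/cron.d/btrfs-maintenance  (example schedule and lock path)
# daily filtered balance at 00:30
30 0 * * *  root  flock -x /var/lock/btrfs.lck btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4 /
# weekly scrub on Sunday at 01:00; waits on the lock if a balance is still running
0  1 * * 0  root  flock -x /var/lock/btrfs.lck btrfs scrub start -B -d -c 2 -n 4 /
```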

@Ferroin
Member

Ferroin commented Jan 15, 2018

I would suggest adding that, based on the discussion on the ML, it seems that there have been some reports of corruption caused by concurrent balance/scrub/defrag operations on the same volume.

Also, just a general update, I think we're reasonably close to figuring out exact recommendations for the FAQ, and once that's done, I'll probably just open a PR to switch things here to point at the FAQ (because the recommendations are liable to get really long).

@joanventura

I have questions; I'm a newbie at this. Today my system was performing poorly, and I found this process running, started automatically by the OS:

btrfs balance start -v -dusage 1 /

I decided to let the process finish, because I don't know what I'm dealing with. But when the process ended, it started again with -dusage 5.

I know I need to cron a daily process to prevent this again. But my questions are:

How many -dusage passes will be executed? -dusage 1 on balance status had 134 chunks; -dusage 5 on balance status had 68 chunks.

Or, how many chunks per -dusage are there, so I can calculate how long it will take?

@tom--
Contributor Author

tom-- commented Oct 21, 2018

@joanventura Those are BTRFS questions, not netdata questions. And I think there's an XY Problem in them too. All the same...

It seems you have two problems: a) if balance -dusage 1 is taking a long time, you've got badly fragmented usage across the block groups; b) for routine maintenance, balance -dusage 1 on its own seems dubious.

  • You could attempt to figure out how your file system got in this state. Might not be easy and you should look for help elsewhere.

  • You probably should ensure mounts have the nossd option.

  • You probably should replace the existing routine balance job with @Ferroin's magic formula

      btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4
    

    which might also be enough to fix your existing allocation troubles after some days, weeks, ... idk

  • If unallocated space looks perilously low, you could take the system offline, make backups, run balance without filters (or something aggressive like that) to ensure a tidy BTRFS, in which case the magic is very likely to suffice.
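On the "unallocated space looks perilously low" point, a small script can watch for that condition. A sketch, assuming the output format of `btrfs filesystem usage -b` from btrfs-progs (the `Device unallocated:` line); the `check_unallocated` helper and the threshold are made up for illustration:

```shell
#!/bin/sh
# Sketch: warn when btrfs unallocated space drops below a threshold, so a
# fuller balance can be scheduled before the filesystem gets stuck.
# check_unallocated is a hypothetical helper; it parses the output of
# `btrfs filesystem usage -b <mount>` (byte values), whose exact layout
# depends on your btrfs-progs version.
check_unallocated() {
  # $1 = minimum acceptable unallocated bytes; reads usage output on stdin
  awk -v min="$1" '/unallocated:/ {
    gsub(/[^0-9]/, "", $3)                       # keep only the byte count
    if ($3 + 0 < min) print "LOW"; else print "OK"
    exit
  }'
}
```

For example, `btrfs filesystem usage -b / | check_unallocated $((10 * 1024 * 1024 * 1024))` prints LOW when less than 10GiB is unallocated.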

@joanventura


Thank you for the reply.

It's a brand new installation, like 3 weeks old. The previous OS was really old, and I don't think it had this automatic routine.

It took like 3 hours. And yes, I added the magic formula to a daily cron job, and it only takes like 2 or 3 minutes.

My problem is that our database system is being used 24/7 by more than 500 people: high demand, read, write, delete, update. When this process started everybody was complaining. I added the daily cron job at midnight when there are only 10+ people working on the database, so they don't notice it.

I have checked the unallocated space and it has a lot of space. The system has 1.2TB RAID 10 and only about 300GB is in use.

I noticed something good: before the process started, 32% of the drive was in use; when it finished it went to 22%. So it released a lot of unused space.

@Ferroin
Member

Ferroin commented Oct 24, 2018

It's a brand new installation, like 3 weeks old. The previous OS was really old, and I don't think it had this automatic routine.

Given the age, it sounds like you've got a workload that's pathologically bad on BTRFS (and a highly active RDBMS is one of the big ones). It's still possible to run such workloads on BTRFS; it just requires extremely proactive maintenance to keep things from getting really bad.

It took like 3 hours. And yes, I added the magic formula to a daily cron job, and it only takes like 2 or 3 minutes.

Both of those sound about normal given what you say below about space usage. The 'magic' formula processes up to two half-full data chunks and up to four half-full metadata chunks, which in most cases translates to at most about 1GB of data being processed.
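That "at most about 1GB" figure can be sanity-checked with quick arithmetic, assuming the common default chunk sizes (data chunks around 1GiB, metadata chunks around 256MiB; actual sizes vary with filesystem size and profile):

```shell
#!/bin/sh
# Worst case for: btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4
# Chunk sizes below are assumed typical defaults, not guaranteed.
data_chunk_mib=1024
meta_chunk_mib=256
data_mib=$((2 * data_chunk_mib / 2))   # -dlimit=2 chunks, each <= 50% full
meta_mib=$((4 * meta_chunk_mib / 2))   # -mlimit=4 chunks, each <= 50% full
echo "data rewritten:     at most ${data_mib} MiB"
echo "metadata rewritten: at most ${meta_mib} MiB"
```

So the data side tops out around 1GiB moved per run, plus roughly half a GiB of metadata, which is consistent with a runtime of a few minutes even on slow disks.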

My problem is that our database system is being used 24/7 by more than 500 people: high demand, read, write, delete, update. When this process started everybody was complaining. I added the daily cron job at midnight when there are only 10+ people working on the database, so they don't notice it.

Yeah, unfortunately balance operations run in kernel context, so there's no easy way to limit what impact they have on system performance. On the note of the databases themselves, if you can get things set up so that you're using database transactions efficiently, and the database engine writes out any given transaction in a single pass, you can probably significantly improve how BTRFS behaves. Most of the issue it has with databases is the sheer volume of small writes used to update them. Compacting the database files to clean up unused space (if the backend supports that) on a regular basis may also help significantly.

I have checked the unallocated space and it has a lot of space. The system has 1.2TB RAID 10 and only about 300GB is in use.

That's actually a really good balance if you've got such an active system. I would suggest planning to enlarge that array when you get up to about 800-900GB of space usage, otherwise the daily balances may not be enough to keep things from getting stuck.

I noticed something good: before the process started, 32% of the drive was in use; when it finished it went to 22%. So it released a lot of unused space.

Yeah, this is actually pretty typical for usage like yours. What happens is that when BTRFS goes to allocate a new block for a write, it tries to find some minimum amount of free space first, which depending on how the system is configured may be unreasonably large, so it ends up scattering writes. You can mitigate this to some degree, if you're using devices that BTRFS thinks are SSDs, by passing nossd in the mount options.
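If you want to try nossd, it goes in the mount options; for example in /etc/fstab (the UUID and mount point below are placeholders):

```
# /etc/fstab  (UUID and mount point are placeholders)
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /data  btrfs  defaults,nossd  0 0
```

It should also be possible to test without editing fstab via `mount -o remount,nossd /data`, though the allocator behavior only changes for new allocations.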

@tom--
Contributor Author

tom-- commented Oct 24, 2018

To reiterate @Ferroin's last point: learn about nossd.

I'm pretty sure that my first production use of BTRFS ran itself out of unallocated data space real fast (nearly, but not quite, thanks to netdata) because nossd wasn't set. It could be a similar story for you.

@cakrit
Contributor

cakrit commented Dec 16, 2018

I am closing this since there haven't been any more comments in over a month. Feel free to reopen if there are more questions.

@camoz

camoz commented Aug 29, 2021

In case someone is interested, here's a link to the mentioned discussion on the btrfs mailing list: Recommendations for balancing as part of regular maintenance?

@endolith

Also, just a general update, I think we're reasonably close to figuring out exact recommendations for the FAQ, and once that's done, I'll probably just open a PR to switch things here to point at the FAQ (because the recommendations are liable to get really long).

The FAQ continues to say the same thing as in the initial comment, 6 years later.

Should this issue be reopened until it's updated?

@camoz

camoz commented Apr 28, 2023

@endolith Hm, the wiki has a warning at the top:

OBSOLETE CONTENT
This wiki has been archived and the content is no longer updated

So I think it's fine I guess?

The now-obsolete main wiki page also links to the new docs, which currently do not seem to have a "maintenance" section; from a quick search, the advice to run regular balances does not seem to be present in the new docs.
