
make system more responsive by using the fadvise DONTNEED #252

Open
chrysn opened this issue Mar 24, 2015 · 39 comments

@chrysn commented Mar 24, 2015

while attic is running, file system access is slowed down for the rest of the system. that is to be expected, but its effects could be mitigated if attic used posix_fadvise(POSIX_FADV_DONTNEED) on the files it is backing up. this tells the operating system that the "data will not be accessed in the near future" (man 2 posix_fadvise).

this should minimize the amount of disk cache content dropped from RAM to accommodate attic's reads, while not slowing attic down. (it won't read over the same file itself (will it?), but the kernel can't know that without being told, and might keep the files around just in case attic wants to look at them again.)
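
for illustration, a minimal sketch of that advice call (hypothetical helper name; `os.posix_fadvise` is available on Linux with Python >= 3.3):

```python
import os

def read_for_backup(path, chunk_size=1 << 20):
    """Read a file once, then advise the kernel that its pages
    won't be needed again (POSIX_FADV_DONTNEED)."""
    chunks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, chunk_size)
            if not data:
                break
            chunks.append(data)
        # offset=0, length=0 means "the whole file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return b"".join(chunks)
```

(a real backup tool would process each chunk instead of accumulating them; this is just the shape of the call.)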

@aurel32 commented Apr 4, 2015

fadvise DONTNEED basically tells the kernel that the data in the cache is not needed anymore, but the data is already in the cache at that point. If the data is actually read only once, I think the solution would be to bypass the kernel cache by opening the file with O_DIRECT.

@ThomasWaldmann (Contributor)

https://github.com/ThomasWaldmann/attic/commits/o_direct I did some O_DIRECT changes there (read the commit comments). Somehow I still see the cache growing rather quickly - I suspect it is due to writes (I only changed input file reads to use O_DIRECT).

Note: I gave up the O_DIRECT route. It is just a pain to use due to the alignment limitations imposed by O_DIRECT and python not supporting that.

@ThomasWaldmann (Contributor)

See PR #279 for posix_fadvise based solution, it works (on linux, py >= 3.3). \o/

Note:
With py 3.2, the repo writes will still spoil the cache as 3.2 does not have os.posix_fadvise.
The input data reads won't spoil the cache though as that is implemented in C and independent of Python version.
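
A guarded call covers both cases (a sketch; the helper name is mine, and os.posix_fadvise only exists on Linux with Python >= 3.3):

```python
import os

def advise_dontneed(fd, offset=0, length=0):
    """Drop the given byte range (whole file by default) from the
    page cache where the platform supports it; silently do nothing
    elsewhere (e.g. Python 3.2, which lacks os.posix_fadvise)."""
    fadvise = getattr(os, "posix_fadvise", None)
    if fadvise is not None:
        fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
```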

@jborg (Owner) commented Apr 13, 2015

Is POSIX_FADV_DONTNEED really what we want? Just because we know that we will not need a specific piece of data again it is not our business to tell the kernel to remove it from the cache. We have no way of knowing if the data was originally loaded by us or by another process and how actively used it is.

Has anyone checked what (if any) posix_fadvise settings are used by other backup solutions (In the default configuration)?

@ThomasWaldmann (Contributor)

from http://linux.die.net/man/2/posix_fadvise :
"""
Programs can use posix_fadvise() to announce an intention to access file data in a specific pattern in the future, thus allowing the kernel to perform appropriate optimizations.

The advice applies to a (not necessarily existent) region starting at offset and extending for len bytes (or until the end of the file if len is 0) within the file referred to by fd. The advice is not binding; it merely constitutes an expectation on behalf of the application.
"""

The phrasing "programs can ... announce an intention" and "constitutes an expectation on behalf of the application" rather clearly means to me that the scope of this is "application", not "system-wide". So the advice of the application "dontneed" is correct in our case.

I didn't do specific performance measurements, but I watched how the cache behaved:

  • without fadvise, attic blows up the cache to occupy almost all memory, and I'd bet it kills a lot of cached content of other applications all the time, for as long as it is running.
  • with fadvise, the cache doesn't grow, the impact is minimal, and speed is about the same. I'd expect that if another application running in parallel needed some cache, it would get and keep it most of the time - even if the non-local behaviour you suspect happened for a little while (== after we called fadvise for that application's files).

@dnnr commented Apr 14, 2015

As far as the specification goes, I'd agree with your interpretation. The actual implementation is, however, what ultimately counts. From what I could tell, Linux currently responds to a DONTNEED fadvise by immediately invalidating the pages, regardless of their use by any other process. One could argue that this isn't in the spirit of the specification of posix_fadvise(), but that doesn't change the fact that such a behavior is undesirable for any backup program.

The issue with your observation of cache usage is that you can't infer anything from it. Assume for a moment that DONTNEED does in fact evict cache pages: would anything change in your observation?

@adept (Contributor) commented Apr 14, 2015

Rsync does use fadvise.

This page seems relevant: http://insights.oetiker.ch/linux/fadvise.html


@jborg (Owner) commented Apr 15, 2015

Rsync does use fadvise. This page seems relevant: http://insights.oetiker.ch/linux/fadvise.html

This page seems to talk about a patch for rsync. Has this been accepted upstream?

@adept (Contributor) commented Apr 15, 2015

I believe that in the middle of that page it says "the patch has been accepted upstream".


@adept (Contributor) commented Apr 15, 2015

Hmm. Maybe I spoke too soon. Looking at https://tobi.oetiker.ch/patches/, there are patches for several rsync versions there, so maybe it remained a patch...


@wscott commented Apr 15, 2015

It is pretty trivial to test. Download rsync and check.
(apt-get source rsync; grep -R fadvice rsync-3.1.0)

Nope, not in rsync here.


@ThomasWaldmann (Contributor)

you need to grep again for fadvise (with "s").

@wscott commented Apr 16, 2015

Heh, well that is embarrassing. However, the typo was in the email; I had searched for the right string. Still not there.


@anarcat commented Jun 1, 2015

take a look at the bup side before jumping onto that ship. i heard they discovered performance problems where such policies would actually remove good content from the cache that was unrelated to backups, seriously impacting performance on production servers.

it is quite possible that POSIX_FADV_DONTNEED actually removes good pages from the cache! this could be a serious problem for database servers for example.

@ThomasWaldmann (Contributor)

@anarcat do you have some more specific info? fadvise acts on an open filehandle (one that belongs to the specific file opened by the backup process for reading).

I could imagine that a simplistic fadvise kernel implementation kills the cached blocks of THAT file for all processes, but even that would be better than not using fadvise, because of the lower cache-flooding pressure from all the files that are not used by any other process and that do not end up in (or remain in) the cache when using fadvise.

@anarcat commented Jun 1, 2015

the #bup people were kind enough to send me a few refs:

https://www.percona.com/blog/2010/04/02/fadvise-may-be-not-what-you-expect/
https://groups.google.com/forum/#!topic/bup-list/7D9b2at3MMc
https://groups.google.com/forum/#!topic/bup-list/nQ24WCT1g4E

it's still subject to discussion on the bup mailing list, but please do be careful about this - i don't believe it is process-specific...

it's nice to optimise attic, but if it's done at the expense of the rest of the system that is being backed up, that doesn't sound like a good tradeoff. :)

@anarcat commented Jun 1, 2015

apparently, the issue came up in this thread:

https://groups.google.com/d/msg/bup-list/TXfSAgD9-ZM/saofDu1CdxcJ

where bup would trash the sqlite cache of a big file, which had to be reloaded in memory, which was basically breaking the site...

@ThomasWaldmann (Contributor)

@anarcat I looked through the first 3 links. Lots of guessing and gut feelings (I can do that too, and I even posted reasons why I think it is good, while they didn't really explain why they think it's worse than without).

I didn't find anything about fadvise in the google groups link of the previous post; did you post the wrong URL?

@anarcat commented Jun 1, 2015

the last link was where they discovered the issue apparently.

for me it makes sense that trashing the cache will have a performance impact. when you load a page into the kernel VM and tell the kernel to drop it when you close the FD, it will drop the page - that seems logical to me. the fact that another process was using it at the same time probably doesn't change anything.

but that's just me.

@ThomasWaldmann (Contributor)

See my April 14 comment.

@jdchristensen (Contributor)

It shouldn't be too hard to test this experimentally. E.g., on a system with 8GB RAM, create eight 1GB files named file1 through file8 as well as a 4GB file named big. Then do:

sync; echo 3 > /proc/sys/vm/drop_caches; sync
time dd if=big of=/dev/null  [should be slow]
time dd if=big of=/dev/null  [should be fast]
use attic to backup file1 file2 ... file8
time dd if=big of=/dev/null  [might be slow]

Then do exactly the same thing, but using a version of attic patched to use fadvise. Hopefully the last line would remain fast.

To address the concerns about losing data that you want in the cache, run the test a third time, this time including the file big in the attic backup (listed first?). Hopefully the last line would remain fast.

For the record, my instinct is with TW here. Why would the kernel provide this feature if it could trash the cache used by other processes? But the only way to know is to test. And from what I've read, it may depend on kernel version, so multiple testers is probably a good idea. Maybe someone can provide a program that does the analog of cat > /dev/null with a flag indicating whether to use fadvise, to make it easier to test this cleanly?
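
Such a program is just a few lines of Python (a sketch, assuming Linux and Python >= 3.3 for os.posix_fadvise; names are mine):

```python
import os
import sys

def drain(path, use_fadvise):
    """Read a file and discard the data (the analog of
    cat file > /dev/null), optionally advising DONTNEED at the end.
    Returns the number of bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, 1 << 20)
            if not chunk:
                break
            total += len(chunk)
        if use_fadvise:
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total

if __name__ == "__main__" and len(sys.argv) > 1:
    print(drain(sys.argv[1], "--fadvise" in sys.argv))
```

Running it with and without `--fadvise` before each `dd` step would isolate the effect of the advice call.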

@jbms (Contributor) commented Jun 2, 2015

This article provides a very good explanation of the complexity of using fadvise on Linux:
http://insights.oetiker.ch/linux/fadvise.html

It is indeed the case that FADV_DONTNEED will purge the file from the cache immediately if it is not dirty (and will do nothing if it is dirty).

I agree that this behavior isn't very helpful, but that is how it is.

It seems to me that the mincore hack in the article is not worth using.


@ThomasWaldmann (Contributor)

@jdchristensen that test just shows whether fadvise DONTNEED removes the file from the cache (or not). While that is somewhat interesting, it is more interesting to compare the effect of permanently flooding the cache with a lot of data needed only once (attic without fadvise) vs. avoiding flooding the cache (attic with fadvise).

@jbms I read that article back then (but didn't want to add a lot of C code, like that shown there, for a possibly negligible effect).

@jdchristensen (Contributor)

@ThomasWaldmann I proposed running attic three times. The difference between run 1 and run 2 would exactly show that not flooding the cache gives an improvement for other applications (dd, in this case). The difference between run 1 and run 3 would show whether fadvise removes a file from the cache that was already there. Both bits of information seem important for this discussion.

If the information at
http://insights.oetiker.ch/linux/fadvise.html
is correct, it does seem like the mincore hack would be good to use.

@sourcejedi

👎 We shouldn't DONTNEED the user's files. bup reverted this to fix a reported bug. We still haven't gathered any positive proof, and Linux was fixed in 2011:

It's possible Linux can still be improved - as suggested by the un-merged patch which implements NOREUSE as a gentler alternative. (Or that it's regressed :).

Everyone has this problem. If we don't have the resources to test this properly (or implement the mincore hack), that means we just don't have the resources to do this.

If we had any reason to be worried in the first place, we could keep the DONTNEED on attic files only. It shouldn't hurt anyone else; it'll hurt us but probably not where we care. It could account for about half the cache buildup (when we're not working with virtual machine image files or similar).

@ThomasWaldmann (Contributor)

Sorry, but I can't follow what you wanted to say.

But I am doing the fadvise DONTNEED thing in borg (after practically seeing beneficial effects), so comparing attic vs. borg (or borg with/without that call) should be easy.

I'll change things / accept pull requests for borg as soon as there is practical proof that change is needed / beneficial.

@jdchristensen (Contributor)

@ThomasWaldmann Since you have both attic and borg with fadvise handy, you could run the three tests I proposed and see if there is a problem.

@ThomasWaldmann (Contributor)

@jdchristensen here are the results, completely as expected for me.

http://paste.thinkmo.de/rNXe3Mmm#fadv_test.txt

The problem is just that they are not that helpful in deciding whether fadvise DONTNEED is helpful or not.

With fadvise DONTNEED, they show that the cache isn't killed by the backup process (as expected).
Without fadvise DONTNEED, they show that backing up some stuff kills the cache for everything else.

With fadvise DONTNEED, it kills files from the cache which are backed up (as expected, due to the simple implementation in linux).
Without fadvise DONTNEED, it does not kill files from the cache which are backed up (also as expected).

So one might think one is as good as the other, but I still think fadvise DONTNEED is way better: it avoids flooding the cache with useless data (potentially for hours), and that benefit outweighs the harm of evicting the currently backed-up file from the cache (only at that moment; it can be cached again a second later) in the case where that file is in use.

The tests you proposed can't show that, though (and any simple test might be a bit unrealistic compared to real system behaviour).

@anarcat commented Aug 17, 2015

hmm... well, the problem described earlier is specifically with stuff like mysql databases that get totally flushed out of the cache, having a major performance impact on the whole system. from what i understand, that performance problem is confirmed by those tests?

since this is a corner case (and linux doesn't deal well with this (yet)), maybe we should make DONTNEED optional somehow?

@GreenReaper

For what it's worth, I've found rsync also causes Linux to fragment memory like no tomorrow, creating cache entries for tiny files and then releasing some (but not all) of them over the next few hours. If you're not going to use the data again right away, it makes sense to avoid that.

YMMV, but you might check system memory usage: a significant free proportion, plus lots of small blocks in /proc/buddyinfo that can't be compacted with compact_memory but disappear when you drop_caches, is a sign of fragmentation wastage. Note, however, that fragmentation can take time to build up - days in some cases, depending on how actively the system is used.

@sourcejedi

fadvise SEQUENTIAL is supposed to help as well, since about 2009. Unfortunately it doesn't - at least not as well as DONTNEED. Negative results published for any other project looking at this :).

I guess the DB case was a worst-case problem because there's suddenly a lot that needs reading back, and the reads will be very random (lots of disk seeks). Btw with DONTNEED we're purging the entire file after reading each chunk - that "live database" case is really going to hate us. (Not that I think it was a good case; for a database he should really have been using LVM snapshots).

The internet says you can hack O_DIRECT reads from pure python (using mmap and readinto). I think the real annoyance would be needing manual buffering and a readahead thread. If it wasn't for the threading I'd be pretty eager to code it.
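
That hack can be sketched roughly like this (assumptions: Linux, a filesystem that accepts O_DIRECT - tmpfs, for example, does not, hence the fallback path I added - and none of the manual buffering or readahead a real implementation would need):

```python
import mmap
import os

def read_direct(path, size=4096):
    """Read up to `size` bytes, bypassing the page cache via O_DIRECT.
    O_DIRECT requires block-aligned buffers; an anonymous mmap is
    page-aligned, which satisfies that, and readinto() avoids any
    intermediate (unaligned) buffer."""
    buf = mmap.mmap(-1, size)  # page-aligned anonymous buffer
    try:
        try:
            fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        except OSError:
            # not every filesystem supports O_DIRECT (e.g. tmpfs)
            fd = os.open(path, os.O_RDONLY)
        with os.fdopen(fd, "rb", buffering=0) as f:
            n = f.readinto(buf)
        return buf[:n]
    finally:
        buf.close()
```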

mincore() looks pretty ugly especially with the mmap(). Maybe the performance side isn't too bad, if you avoid actually touching any of the pages (and you can batch the calls up a bit).
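
For reference, the mincore() probe itself is reachable from pure Python via ctypes (a rough sketch; the helper name is mine, and this is Linux-specific):

```python
import ctypes
import mmap
import os

libc = ctypes.CDLL(None, use_errno=True)

def resident_pages(path):
    """Ask mincore() which pages of a file are already in the page
    cache; a DONTNEED caller could then skip the ranges that were
    cached before it started reading."""
    size = os.path.getsize(path)
    npages = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    if npages == 0:
        return []
    with open(path, "rb") as f:
        # ACCESS_COPY gives a writable view, which from_buffer requires
        mm = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_COPY)
    c_buf = (ctypes.c_ubyte * size).from_buffer(mm)
    vec = (ctypes.c_ubyte * npages)()
    rc = libc.mincore(ctypes.c_void_p(ctypes.addressof(c_buf)),
                      ctypes.c_size_t(size), vec)
    err = ctypes.get_errno()
    del c_buf            # release the exported pointer before unmapping
    mm.close()
    if rc != 0:
        raise OSError(err, os.strerror(err))
    return [bool(b & 1) for b in vec]
```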

@jdchristensen (Contributor)

@ThomasWaldmann Thanks for doing the experiment! Now we know for sure how linux handles DONTNEED (at least for your kernel version). It's really unfortunate that DONTNEED kills things that were already in the cache, but that's life.

I suspect that for most uses, using DONTNEED will be much better. But I suspect that some use cases might suffer, so providing a command line option to disable it seems reasonable (as @anarcat suggested).

@ThomasWaldmann (Contributor)

@sourcejedi i also tried using O_DIRECT, but that was a total pain.

@perguth commented Aug 20, 2015

@ThomasWaldmann, @jdchristensen: My results, running it (Borg, 8cf0ead693) on a HDD via USB3, Intel Core M 5Y10 using Fedora 22: https://gist.github.com/pguth/481980bd67993984eda4

@perguth commented Aug 20, 2015

Testing this way the newest code ("call fadvise DONTNEED for the byterange we actually have read") I got these results (before/after): https://gist.github.com/pguth/4b436cf15c58549cbc4d/revisions
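
As I understand it, that per-byterange variant looks roughly like this (a sketch; helper name and chunk size are mine, not borg's actual code):

```python
import os

def stream_and_drop(path, process, chunk_size=1 << 20):
    """Stream a file through `process`, dropping each byte range from
    the page cache right after it has been read, instead of purging
    the whole file at the end. Returns the number of bytes read."""
    fd = os.open(path, os.O_RDONLY)
    offset = 0
    try:
        while True:
            data = os.read(fd, chunk_size)
            if not data:
                break
            process(data)
            # advise only for the range we actually read
            os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)
            offset += len(data)
    finally:
        os.close(fd)
    return offset
```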

@sourcejedi

Right, so my concern in #158 wasn't an issue (<1%), and it was tested on a HDD over USB (so low-speed io). Another entry for the journal of negative results, good work everyone :).

I'm surprised. I guess this HDD might actually do enough read-ahead internally, or the kernel code we're using doesn't work how I thought. (I still like not hammering DONTNEED multiple times if someone else is reading the file too :).

@ThomasWaldmann (Contributor)

@sourcejedi yep, just had the same idea. :)

@neutrinus

I think the "DONTNEED cleans the cache of DB files" problem arises because people are doing things the wrong way. Why back up a working database? If you need a consistent backup, either:

  • stop the DB and perform the backup
    OR
  • perform a DUMP and back that up

You don't want to back up a live, working DB because of possible inconsistencies. If you do it the right way (dump or DB halt), there will be no problems with DONTNEED.

@ThomasWaldmann (Contributor)

@neutrinus very true! I already thought the same, but did not document it yet.
