chunkserver: bypass OS cache (posix_fadvise/POSIX_FADV_DONTNEED) #212
Comments
+1. I originally read this and went "no, that's crazy, it'd wreck performance" until I realized you were talking about only caching directory and file listings instead of whole files. |
Thanks. Cache displacement may be a primary reason for significant performance degradation unless you are running the chunkserver on a dedicated machine. Chunkservers may be quite active -- I observe over 10000 cached chunk files, creating enough pressure to notice a slowdown in everything else running on the same server... |
I'm particularly interested in this patch as we have some machines serving as much as 60-80TB from a single box (via Supermicro JBOD with consumer drives). What would be the (ballpark) performance impact on that sort of dedicated machine? |
I'm not qualified to prepare this patch -- I'm simply incompetent in C/C++ these days as the last time I did C coding was back in 1995... I'm starting chunkservers using
and although I've been doing it only for a limited time, subjectively it feels like everything runs smoother, the cache no longer seems over-utilised, etc. I think we're not talking about any performance "impact" whatsoever, even on dedicated machines. Indeed, the cache would be better used for directories, executables and whatnot rather than wasted on chunks, because if chunks are cached everything else will eventually be displaced from the cache. |
I meant to say that on dedicated machines you will not see a performance improvement (it will just run as usual or slightly better), while it will be most beneficial on shared servers where other services are running as well. |
Hi @onlyjob, I ask because we have something like 100 million chunks per server and I suspect at that scale this could have more impact than you think. |
I would also be interested in objective measurements. I would suggest pulling data from the CGI or probe under each of the following conditions:
|
Yes, if we're talking about the impact of overusing the cache when the cache hit ratio is extremely low on a large data set such as yours... :) It's easy enough; you should be able to get some data yourself, although please remember that Also I'd suggest running for at least several hours before comparing stats. |
Dramatically reduce cache pressure by using POSIX_FADV_DONTNEED to advise the OS to dismiss a cached chunk when it is closed. Signed-off-by: Dmitry Smirnov <onlyjob@member.fsf.org>
On 2.5.4 I verified that thousands of files in the chunkserver's directories are completely or partially cached. It would be best to avoid caching the chunkserver's data because the probability of a cache hit is low. Therefore I recommend implementing posix_fadvise/POSIX_FADV_DONTNEED to exclude chunks from the operating system cache, which will improve co-existence of the chunkserver with other applications. Currently the chunkserver's activity displaces the OS cache, which negatively affects the performance of other services with very little hope of a cache hit.
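For anyone who wants to repeat the "completely or partially cached" check on their own chunk files, here is a minimal sketch of one way to do it, assuming Linux and the non-POSIX `mincore()` call. The helper name `cached_pages` is hypothetical and is not part of the chunkserver.

```c
#define _DEFAULT_SOURCE /* exposes mincore() on glibc; it is not part of POSIX */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count how many pages of a file are currently resident
 * in the page cache. Stores the file's total page count in *total_pages.
 * Returns the number of resident pages, or -1 on error. */
long cached_pages(const char *path, long *total_pages)
{
    assert(path != NULL && total_pages != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + page - 1) / page;
    *total_pages = (long)npages;
    if (npages == 0) {          /* empty file: nothing can be cached */
        close(fd);
        return 0;
    }

    /* Mapping the file does not read it in; it only lets mincore() inspect it. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    long resident = -1;
    unsigned char *vec = malloc(npages);
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* low bit set means the page is resident */
    }

    free(vec);
    munmap(map, len);
    return resident;
}
```

Calling `cached_pages(path, &total)` on each chunk file and summing the results gives a rough picture of how much page cache the chunks occupy. One caveat: recent Linux kernels only report real residency to the file's owner or to a user with write access, so this should be run as the chunkserver user or as root.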
Not caching chunkserver's data will reduce cache pressure and will help to improve overall system performance by using cache more effectively.
Please note that it will affect only data (file's contents) but not the cache of directory entries etc.
Please implement posix_fadvise/POSIX_FADV_DONTNEED to prevent chunkserver's data caching.
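To illustrate the request, here is a minimal sketch of what dropping a chunk from the cache on close could look like. This is not taken from the chunkserver source: the helper name `chunk_close_nocache` is hypothetical, and the `fdatasync()` before the advice is my assumption, added because dirty pages cannot be discarded until they are written back.

```c
#define _POSIX_C_SOURCE 200809L /* for posix_fadvise() and fdatasync() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: close a chunk file and advise the kernel that its
 * cached pages are no longer needed.
 * Returns 0 on success, or the first error number encountered. */
int chunk_close_nocache(int fd)
{
    assert(fd >= 0);
    int err = 0;

    /* Assumption: flush dirty pages first, since DONTNEED skips pages
     * that still have to be written back. */
    if (fdatasync(fd) != 0)
        err = errno;

    /* offset 0 with len 0 means "the whole file".
     * posix_fadvise() returns the error number directly, not via errno. */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (rc != 0 && err == 0)
        err = rc;

    /* The descriptor is closed even if the steps above failed. */
    if (close(fd) != 0 && err == 0)
        err = errno;

    return err;
}
```

The advice is only a hint, so the kernel may keep some pages anyway, and a failed `posix_fadvise()` is harmless for correctness. Syncing on every close has a cost of its own, so a real patch might skip it for chunks that were only read.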
P.S. There is an interesting related project: nocache