groot: question about mmap use / large RSS #1019
Description
Forgive me if this is a stupid question, but what is the motivation for using mmap in groot/riofs to read files, instead of just a plain os.Open? A side effect of using mmap is that the resident set size (RSS) can keep growing if the system isn't under enough memory pressure to drop unused pages. AFAIK this is normally a harmless accounting error, but it may become important if RSS is monitored and used for anything.
Context:
I was just running some microbenchmarks comparing ROOT I/O to uproot to groot, and I noticed higher than expected RSS on the groot side. This reminded me that in #885 there was a test commit (sbinet-hep@839e08f) where os.Open was used. Testing it out again, this reduces the RSS as I would naively expect, and it doesn't seem to cause any performance regressions, but Linux's disk I/O cache makes this annoyingly hard to test properly. I thought I should just ask before I get too carried away trying to test it, in case I'm overlooking some need to use mmap here.
My theoretical concern is that high RSS may cause monitoring tools (like the ones used in batch systems at HEP computing sites) to incorrectly flag the program as being over its memory budget and preemptively kill it, before the system is under enough pressure for the mmaped pages to be dropped. Alternatively, they may reschedule it to a slot with a needlessly large memory allocation, leading to a less efficient use of computing resources (in a way which may be hard to detect). Using os.Open for regular file access, and letting the OS disk I/O cache speed up subsequent reads, may avoid triggering false positives in such cases. That being said, most batch systems probably rely on network I/O, which ideally wouldn't be mmaped (unless files are being accessed through a FUSE mount or something), so this example may be a moot point.