-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add read locking through rrd_open() #800
Conversation
loking good to me ... and the testsuit seems happy too ! ... thanks! |
have you done any performance measurements ? |
No... It would be interesting to check
Linux does not implement the drop-behind that I'm worrying about with the RRD header, so my test 3) should not show any improvement. Regarding lock contention: I don't have a big setup, I just installed cacti and smokeping on a home server. I'm not sure how to model a real-world heavily-loaded server. In the first plan I was concerned about holding locks while drawing graphs, but then I learnt the grapher uses rrd_fetch, and so the lock only covers reading the data. I notice something interesting about rrdcached. Since readers first request a flush, it seems like readers will always be blocked by writes (which I think is fine), but writes will not tend to be blocked by readers. There will be some small cpu overhead from additional syscalls (lock, and my extra madvise). On Windows contended locks will cost extra cpu, from rrdtool waking up 100 times per second to check the lock. Because I was too lazy to deal with this abomination and acquire a Windows build environment to compile-test the result. |
FIXME: _locking() on Windows says to make sure to unlock before close, because "If a process terminates with a portion of a file locked or closes a file that has outstanding locks, the locks are unlocked by the operating system. However, the time it takes for the operating system to unlock these locks depends upon available system resources. Therefore, it is recommended that your process explicitly unlock all files it has locked when it terminates. If this is not done, access to these files may be denied if the operating system has not yet unlocked them." |
@sourcejedi about the fixme ... are you going to look into this ? |
Sorry, yes. I meant I need to fix it, so it shouldn't be merged yet. |
Do you mind if I rebase this? The "Fix my Windows locking code" commits being separate would be a pain for anyone who has to bisect this. |
sure ... rebase and push --force |
I would actually sqash the whole thing ... github lets me do this when merging :) |
although at the moment CI is failing so there is another problem with the PR |
haha this part isn't even new, but I'm probably going to look at it first :). |
src/rrd_open.c
Outdated
int close_and_unlock( | ||
int fd) | ||
{ | ||
int ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed this ret needs initializing. Not to spread the blame, but I'm surprised the compiler doesn't warn about it. It warned me when I had the same problem in rrd_close().
* rados * don't leak rrd_file->rados * don't mysteriously close STDIN * !HAVE_MMAP * don't leak rrd->stat_head I have no explanation for why we were chosing to leak rrd->stat_head (and only rrd->stat_head). If it _was_ required for some reason, there was a bug anyway, as we did not leak it if we fail to allocate live_hdr. If it's required by legacy ABI, then we should remove rrd_open() from the public API (and bump the soname). It was originally scheduled for removal in v1.4.
When a rados RRD is closed, don't `munmap(0, rrd_file->file_len)` ! Note munmap() does not return an error when there were no pages mapped, so our callers wouldn't notice even if they check the result of rrd_close(). Also fix the code for munmap errors - so it doesn't format one message and then immediately replace it with a less specific one. (If both munmap() and close() fails though, we will describe the later error only). Also, there's no point rrd_close() setting rrd_file to NULL at the end, it won't affect the variable used by the caller.
Read lock or write lock is taken according to whether RRD_READONLY or RRD_READWRITE is passed. For librrd compat, the lock is only taken when explicitly requested by passing RRD_LOCK. Similarly rrd_lock() is still available, but it should not be needed by new code. So `RRD_LOCK|RRD_READONLY` is intended to take a read lock, however I must admit to limitations of this new code: * On rados, our write lock expires after 2 seconds. This write-lock is probably useful as a way to detect conflicts. However I would not want to rely on a lock that works like this to guarantee consistent reads for a backup. I did not add read locking on rados. If rados users need to e.g. obtain a consistent backup, they should create a rados snapshot. (This is able to work because we use a single atomic rados operation when when we *write* to RRDs). * On the Windows _locking() function, read locks are treated the same as write locks. https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/locking The Windows code required several changes, and has not been compiled as of this moment. With Windows _locking(), we should have been unlocking before close(). Apparently the automatic unlock on close() is not guaranteed to occur synchronously. This isn't specific to _locking(), it's a limitation of winapi LockFileEx(). https://docs.microsoft.com/cpp/c-runtime-library/reference/locking https://msdn.microsoft.com/en-us/library/windows/desktop/aa365203(v=vs.85).aspx To manually unlock, we need to know what we actually locked. But on Windows, rrd_lock() was locking the bytes starting at the current rrd_file->pos, instead of always starting from 0. Also, I thought RRD_LOCK should lock the file immediately, to guarantee we don't have conflicting writes. But on Windows we only tried to lock the bytes that already existed in the file... which would be 0 bytes for RRD_CREATE. Fortunately, even Windows recognizes this problem and allows locking bytes that don't exist yet. So we can just lock the maximum number of bytes allowed. Locking 1 byte would probably work equally well for us, but using LONG_MAX seems cleaner to me.
Yup, that was the test failure. And I noticed yet another oversight in my Windows locking. I've squashed this down, and stripped out the non-interfering fixes for later. My next task is a PR for the READAHEAD/madvise() fix. I have a performance test that clearly shows READAHEAD was broken. However, I need to look into this more. Because It also showed that my commit which targeted READAHEAD actually regressed a bit - the best performance was caused by a different commit. The times are slightly non-deterministic, but i can also see an exact difference in the number of "major" page faults. |
are you happy with this one too ? |
Yep, I'd be happy for this to merge now. |
thanks! |
The implementation is shaped slightly differently to the first plan in #799. Please disregard the first plan, you can just read the messages for the commits with
RRD_LOCK
in the title.I have not addressed the 2 already existing issues at the end of the #799 description.
This series is affected by numerous #ifdefs. I have not tested it beyond
make
on my machine. If you want to look at the preparatory commits first, that's cool (it should be trivial to rebase though, so I'm presenting this together for initial review).Highlighted excerpt:
So
RRD_LOCK|RRD_READONLY
is intended to take a read lock, however I mustadmit to limitations of this new code:
On the Windows _locking() function, read locks are treated the same as
write locks.
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/locking
On rados, our write lock expires after 2 seconds. This write-lock is
probably useful as a way to detect conflicts. However I would not want
to rely on a lock that works like this to guarantee consistent reads for
a backup. Fortunately rados implements snapshots, and when we write to
RRDs, we use a single atomic operation.