Lock the datafile at the start, only release at the end #94
Conversation
(Hmm, looks like that went out as “requested changes”… I meant it as “r+ from me once the below is handled”, sorry!) |
Honestly. We should really return to sqlite for the config file. |
Do you think it'd be a good idea to explicitly have a lockfile that we lock? Then all of the file writing stuff won't have to even care about locking since we can just assume we're the only one running - I think it'll be a lot clearer. |
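The dedicated-lockfile idea could be sketched roughly like this (a minimal sketch assuming a POSIX system with `fcntl.flock`; the lockfile path here is made up, and the real r2e code may differ):

```python
import fcntl
import os
import tempfile

# Hypothetical per-user lockfile path; the real location used by r2e is
# not specified in this thread.
LOCKFILE = os.path.join(tempfile.gettempdir(), "r2e-%d.lock" % os.getuid())

def acquire_lock(path=LOCKFILE):
    """Block until this process holds an exclusive lock on the lockfile.

    Returns the open file object; closing it (or process exit) releases
    the lock, so the rest of the program can ignore locking entirely.
    """
    f = open(path, "w")
    fcntl.flock(f, fcntl.LOCK_EX)  # blocks while another instance runs
    return f
```

With this, all the file-writing code can simply assume it is the only instance running, as suggested above.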
That feels right to me! |
Great, I'll implement that this weekend. |
Since literally no-one uses SunOS outside of toy setups (I am one of those), we should not care about it; it just adds noise. I actually do have several SPARC machines running Solaris, so I am fully qualified to say that anyone who does this and expects support is insane.
I think this is a pretty simple solution - we use a lock file to ensure only one instance runs per user. If you run a lot of instances in parallel, you see (in the debug logs) that the next instance only acquires the lock after the previous instance finishes. I also removed all of the "lock" options, since we always lock. So this solution is simpler, better, and doesn't break anything. |
Can we have a test that tests this somehow? |
Yeah, I'll write one. I think maybe the best way to test it would be to run some processes in parallel, make sure each "lock acquired" log message only appears after another process has written the data file, and also make sure all processes terminate successfully (before this change, some fail due to files not existing). |
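That test plan could be sketched roughly like this (a sketch only, not the actual test: it uses threads to stand in for separate processes, which works because `flock` serializes independent file descriptors even within one process, and all file names are invented):

```python
import fcntl
import os
import tempfile
import threading

def worker(lock_path, log_path, ident):
    # Stand-in for one r2e instance: take the exclusive lock, then do the
    # "work" (two writes that must not interleave with another instance's).
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        with open(log_path, "a") as log:
            log.write("start %d\n" % ident)
            log.write("end %d\n" % ident)
        # The lock is released when `lock` is closed.

def run_parallel(n=4):
    tmp = tempfile.mkdtemp()
    lock_path = os.path.join(tmp, "lock")
    log_path = os.path.join(tmp, "log")
    threads = [threading.Thread(target=worker, args=(lock_path, log_path, i))
               for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with open(log_path) as f:
        lines = f.read().splitlines()
    # Every "start" must be immediately followed by its own "end":
    # the critical sections never overlapped.
    assert len(lines) == 2 * n
    for i in range(0, len(lines), 2):
        assert lines[i].startswith("start ")
        assert lines[i + 1] == lines[i].replace("start", "end")
    return len(lines)
```

The real test additionally checks that every process terminates successfully, which is where the pre-fix behaviour failed.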
Previously, we were depending on the chdir done in an earlier test.
While writing the test for this, I eliminated the need to specify PYTHONPATH (the test script adds .. to sys.path itself) and PATH (the test script always uses the r2e script located in the directory above the directory with the test script). I was trying to diagnose and fix the test failures due to this error:
But this file clearly should exist, right? I don't understand what's happening; how is Travis running the tests such that |
What’s the status of this? Can we rebase on master? I think we changed the logic for how the tests add PYTHONPATH |
I'll try to get these tests working this weekend, but I have no idea what's wrong; surely the file
There, I fixed the tests! Things to note:
I'm pretty sure this is good to be merged now. One comment (that doesn't block this PR from being merged): why do we run the CI unit tests in Nix? This means that if there is a Nix-related break in the tests, potential contributors will have to learn something extra to debug it, causing frustration (e.g. in #97). I think we should just use the Travis way of running tests for multiple Python versions and keep Nix for local development. That way contributors would only have to know what pip and setup.py do, which they hopefully already do. |
I want to merge this soon, the test improvements in particular. Any objections? @Ekleog @Profpatsch? |
Looks good to me! There's just the removal of |
Thanks! test_send was split up, not removed exactly, but it's not obvious from the diff - all the same tests are run. I'll merge this later, after making sure the merge will work correctly. |
This PR corrects how the locking works, which caused issues in e.g. #83. The problem is outlined in a comment I wrote there, but basically: the desired behaviour is that if there are multiple instances of r2e about, they should take it in turns to run rather than all running at the same time and possibly mangling the data file or the config file.
I implement this by acquiring an exclusive lock on the data file right at the start and releasing it right at the end. As far as I can tell, this eliminates the error @ezbik was experiencing, but it's always possible I broke something else subtle.
Also, there is a branch that only runs if we're not on Solaris... I certainly hope no-one is.
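The "acquire at the start, release at the end" pattern described above can be sketched as a context manager (a sketch under the assumption that plain `fcntl.flock` on the data file is used; the actual r2e implementation may structure this differently):

```python
import fcntl
from contextlib import contextmanager

@contextmanager
def locked_datafile(path):
    """Hold an exclusive lock on the data file for the whole run."""
    f = open(path, "a")                # "a" so opening never truncates the data
    fcntl.flock(f, fcntl.LOCK_EX)      # start: wait for other instances to finish
    try:
        yield f
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)  # end: let the next instance proceed
        f.close()
```

Everything the program does then happens inside one `with locked_datafile(path):` block, so concurrent instances queue up instead of racing on the data file.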