Locking problems on MACOS Ventura 13.1 (22C65) #2599
Comments
Encountered this same locking issue when using jTraverser to open a tree, with an experimental build of MDSplus for MacOS Ventura 13.4.1 on Apple Silicon. Note that Josh's bug was reported for MacOS Ventura on Intel. |
Another easy way to trigger the bug is to add a node. The following output was generated by an experimental build of MDSplus for MacOS Ventura 13.4.1 on Apple Silicon.
|
The "Open File Description" (OFD) locks are failing on MacOS Ventura. Using the LLDB debugger, traced the problem to The problem is on the
This investigation triggered the bug by adding a node to a tree. Experiment done with MacOS Ventura running on Apple Silicon, using an experimental build of MDSplus. |
Disabling the OFD locks (and instead using "process" locks) allows both Although "process" locks are OK for a single program that is multi-threaded, the "open file description" locks are better if multiple programs are simultaneously accessing the same tree. This difference in the locking merits more investigation and discussion before a decision is made regarding a fix.
The above examples were with an experimental build of MDSplus for MacOS Ventura running on Apple Silicon. (Also need to run these examples on MacOS Ventura for Intel.) |
I would recommend putting in this fix with an #ifdef for Apple while we investigate. |
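A minimal sketch of what such a compile-time guard could look like, assuming all byte-range locks go through a single helper. The names `TREE_SETLK` and `lock_range` are hypothetical and are not taken from the MDSplus source; the sketch falls back to process-bound `F_SETLK` locks on Apple platforms, or wherever the headers do not define `F_OFD_SETLK`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes the F_OFD_* commands only with this set */
#endif
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical macro: choose the fcntl() lock command at compile time. */
#if defined(F_OFD_SETLK) && !defined(__APPLE__)
#define TREE_SETLK F_OFD_SETLK /* lock owned by the open file description */
#else
#define TREE_SETLK F_SETLK /* lock owned by the process */
#endif

/* Hypothetical helper: lock or unlock a byte range without blocking.
 * type is F_RDLCK, F_WRLCK or F_UNLCK; len == 0 means "to end of file".
 * Returns 0 on success, -1 with errno set on failure. */
int lock_range(int fd, short type, off_t start, off_t len)
{
  struct flock fl;
  memset(&fl, 0, sizeof(fl)); /* OFD locks require l_pid == 0 */
  fl.l_type = type;
  fl.l_whence = SEEK_SET;
  fl.l_start = start;
  fl.l_len = len;
  return fcntl(fd, TREE_SETLK, &fl);
}
```

Keeping the choice in one macro means the guard can later be removed, or switched to another command, without touching the callers.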
What is that file descriptor pointing to? The actual tree file, or some temporary lock file? |
Because this bug is blocking issue #2597, it inherits the "U.S. Priority" label from that issue. |
The LLDB debugger confirms that the file descriptors are pointing to the tree file, and not to a temporary lock file. |
Using "process" locks instead of "open file description" locks also eliminates the error message associated with the However, now the |
Using "process" locks instead of "OFD" locks works on Intel MacOS Ventura 13.5. There are no error messages about locking, nor segfaults. For this experiment, used an experimental "CMake" build of MDSplus that was done on an Intel Mac. Additional testing with Aarch64 (aka Arm64) Ubuntu22 also didn't trigger a segfault with the |
The previous post is not quite correct. The If "OFD" locks are used on Intel MacOS Ventura, then three tests fail: As for Apple Silicon MacOS Ventura, there are ~20 tests that segfault. (Perhaps a compiler flag is missing.) The conjecture is that when the segfaults are fixed, the "process vs. OFD" locking issue triggered by |
Correction to my post above (the one with the jTraverser screenshot) -- which got the facts backwards. The OFD locks are needed for multi-threaded programs; the standard "record" locks (bound to a process) are for synchronizing file access by cooperating processes. |
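The difference between the two lock types can be observed from a single process that opens the same file twice. The following is a sketch only; `try_write_lock` and `second_descriptor_conflicts` are hypothetical names, not MDSplus functions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes the F_OFD_* commands only with this set */
#endif
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try a non-blocking write lock on the whole file. */
static int try_write_lock(int fd, int cmd)
{
  struct flock fl;
  memset(&fl, 0, sizeof(fl)); /* l_pid must be 0 for OFD locks */
  fl.l_type = F_WRLCK;
  fl.l_whence = SEEK_SET;
  return fcntl(fd, cmd, &fl);
}

/* Opens `path` twice in this process, locks it through the first descriptor,
 * then tries the same lock through the second one.
 * Returns 1 if the second attempt was refused (the locks conflict),
 * 0 if it was granted, and -1 on any other error. */
int second_descriptor_conflicts(const char *path, int cmd)
{
  int result = -1;
  int fd_a = open(path, O_RDWR);
  int fd_b = open(path, O_RDWR);
  if (fd_a >= 0 && fd_b >= 0 && try_write_lock(fd_a, cmd) == 0)
  {
    if (try_write_lock(fd_b, cmd) == 0)
      result = 0;
    else if (errno == EAGAIN || errno == EACCES)
      result = 1;
  }
  if (fd_a >= 0)
    close(fd_a);
  if (fd_b >= 0)
    close(fd_b);
  return result;
}
```

With `F_SETLK` the function is expected to return 0, because both descriptors belong to the same process and so never conflict with each other. With `F_OFD_SETLK`, on a system that supports it, the function is expected to return 1, which is the behavior that lets threads sharing a process exclude one another.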
The OFD locks do not appear in the On MacOS Ventura, the file is found at this location:
And although the PureDarwin project added OFD locks a few years ago, it is unclear if those changes are in MacOS Ventura. (Both MacOS and PureDarwin are derived from Apple's open source Darwin project, but the forks have likely diverged.) |
The OFD locks fail on both the APFS and HFS+ file systems. (HFS+ is also known as "Mac OS Extended".) |
The source code for the Ventura kernel is here: https://github.com/apple-oss-distributions/xnu. The kernel code for OFD locks is there, in the rel/xnu-8792 branch, which is the kernel for Ventura. What I don't understand is how we even got OFD locks to compile (see below!) if they aren't defined in fcntl.h. In the kernel they are defined in xnu/bsd/sys/fcntl.h:
These definitions seem to have been added in xnu-3247 (and did not exist in the kernel before that). Internally this is implemented using the VFS calls. So it looks like a private API, which probably matches your not finding it in the library. However, it currently compiles; there is some code that puts THIS in:
So that was basically undocumented behavior at that point. Who knows what it is supposed to do? I assume it was a no-op. I wonder what would happen if you add:
This is definitely evil, but... |
Thanks for digging into the MacOS Ventura kernel. I will conduct that "evil" test later today. |
Hacked up the Thus if the undocumented OFD locks in MacOS Ventura don't work, the fallback option for MacOS users of MDSplus will be to only use single-threaded programs for I/O to trees. The test was performed with an experimental build of MDSplus using CMake on Intel MacOS Ventura. |
Looking back at the history of this code, it seems to me that this change maybe should just be reversed. This is the merge request that effectively was evil, basically assuming that if it isn't Windows, it's Linux. The threaded behavior was there before, so you can see where we could put it back. Oh, and obviously, if we want MacOS as a first class citizen, we need to run unit tests on it as well. |
I found another project on GitHub that uses my "evil" solution. Who knows. Also, here's a blog post that says that the Apple-shipped version of sqlite3, which is custom and built into MacOS, also uses this private API. |
Issue #2163 changed many files, but has been in place for two or three years. My hunch (perhaps incorrect) is that it would be safer to keep #2163 and instead just make the minor changes needed to get MDSplus to work with MacOS Ventura. Issue #2198 is a related change that is incompatible with the existing multi-threaded
|
Using the undocumented OFD values for MacOS works!!! (Thanks to Darren for suggesting the "evil" experiment.) An experimental build with OFD locks on Intel MacOS Ventura was able to run We should probably use the "evil" solution to fix this MacOS Ventura locking issue. Nonetheless, before using an undocumented feature of the kernel, it would be good for the team to at least briefly discuss the pros and cons of doing so. |
Given that Apple is using it, and it's been built in for the past 5 MacOS versions, I think it's pretty safe. Watching MacOS evolve over the past 20 years, I think this is much more likely to move from private API to public than to be removed. It adds compatibility with other codes, is much better than the broken SysV and POSIX semantics, and I can't see a security issue with it. Of course, we should document the code and keep the test software so the fault shows up quickly with newer OS versions. |
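One way to keep such a check in the test suite is sketched below, under the assumption that a future OS dropping the undocumented commands would show up as `fcntl()` rejecting the command number with `EINVAL`. `lock_cmd_supported` is a hypothetical name, and the command value is passed in by the caller rather than hard-coded here, since the Darwin values are not part of the public headers.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical probe: take and release a whole-file read lock using `cmd`.
 * Returns 1 if the kernel accepted `cmd` as a set-lock command,
 * 0 if it rejected the request with EINVAL, and -1 on any other failure. */
int lock_cmd_supported(int fd, int cmd)
{
  struct flock fl;
  memset(&fl, 0, sizeof(fl)); /* l_pid == 0, as OFD locks require */
  fl.l_type = F_RDLCK;
  fl.l_whence = SEEK_SET;
  if (fcntl(fd, cmd, &fl) == -1)
    return (errno == EINVAL) ? 0 : -1;
  fl.l_type = F_UNLCK; /* release the lock that was just taken */
  return (fcntl(fd, cmd, &fl) == 0) ? 1 : -1;
}
```

A unit test could call this once at startup with whichever command the build selected and fail loudly on a 0 result, instead of letting the problem surface later as a lock error while reading a tree.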
Apple introduced the OFD locks in the kernel, So I concur that it is reasonable to use the undocumented OFD locks in MDSplus. |
Apple uses open source software in MacOS, and thus publishes the internals on GitHub. For more information about the OFD locks, refer to the following links.
https://opensource.apple.com/releases/
https://github.com/apple-oss-distributions/xnu/blob/rel/xnu-3247/bsd/sys/fcntl.h
|
Affiliation
MIT Plasma Science and Fusion Center
Version(s) Affected
MDSplus version: 7.139.28
Release: alpha_release_7.139.28
Platform
MACOS
At least on Ventura 13.1 (22C65)
Describe the bug
When trying to read data from a local tree, a lock error is detected and reported as:
To Reproduce
Steps to reproduce the behavior:
. /usr/local/mdsplus/setup.sh
(we should have better directions for installing / using)
export default_tree_path=/users/jas/trees
Expected behavior
contents of the node displayed
Additional context
I have not tested this on any other version of MACOS