Reworked allocation APIs, STW machinery, and replaced semispace with Immix #3616
GCToken was a well-intentioned idea but not practical or useful. The idea was to have the compiler help check where in call paths a garbage-collection cycle could run. Unfortunately, adding this in as an afterthought resulted in GCTokens being created from thin air deep in call paths all over the code. It didn't change the fact that GC could happen pretty much anywhere. In a managed runtime, either GC can happen everywhere or it should only happen at a very small number of extremely well-defined points. The middle ground of "it can happen at all these places" is an invitation for a low budget horror movie, dismembered objects strewn throughout the code. Together, the rework of the stop-the-world mechanism, the removal of GCToken, restricting the invocation of a GC cycle to a single well-defined method call in a few well-defined locations, and making all allocation paths GC-safe (ie GC will NOT run when allocating an object) give Rubinius much better defined GC behavior. The GC-safe allocation path is important for cases like the string_dup instruction, where a young GC cycle could run when allocating the dup while the original String (eg a literal String in a method) is in the young generation and is moved. Since the original String is referenced only from the C stack and not from a GC root object, the dup fails when copying the contents of the original String. It's better to make allocation GC-safe than to accept the performance cost of the GC root in these sorts of cases. Also, that case is only one well-defined instance of the issue. There are more complicated ones.
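The separation between GC-safe allocation and a single collection entry point can be sketched roughly as follows. This is an illustration only: `Heap`, `allocate`, `checkpoint` and `collect` are hypothetical names, not the Rubinius API, and the "collection" here just resets a bump pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical bump-pointer heap; none of these names come from Rubinius.
class Heap {
public:
  Heap(std::size_t capacity, std::size_t threshold)
      : storage_(capacity), threshold_(threshold) {
    assert(threshold <= capacity);
  }

  // GC-safe allocation: returns memory or throws, and never collects.
  void* allocate(std::size_t bytes) {
    const std::size_t align = alignof(std::max_align_t);
    const std::size_t rounded = (bytes + align - 1) / align * align;
    if (rounded > storage_.size() - used_) throw std::bad_alloc();
    void* mem = storage_.data() + used_;
    used_ += rounded;
    // Crossing the threshold only records a request; the GC is deferred.
    if (used_ > threshold_) collect_requested_ = true;
    return mem;
  }

  // The single place a collection may start. A runtime would call this at
  // its well-defined points (eg method entry, loop back branches).
  void checkpoint() {
    if (collect_requested_) collect();
  }

  bool collect_requested() const { return collect_requested_; }
  int collections() const { return collections_; }
  std::size_t used() const { return used_; }

private:
  // Placeholder for a real collector: pretend everything was garbage.
  void collect() {
    used_ = 0;
    collect_requested_ = false;
    ++collections_;
  }

  std::vector<unsigned char> storage_;
  std::size_t threshold_;
  std::size_t used_ = 0;
  bool collect_requested_ = false;
  int collections_ = 0;
};
```

With this shape, a pointer held only on the C stack stays valid from the allocation request until the next checkpoint, because nothing can move objects in between.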
These changes introduce a couple of things:

1. All allocation paths are GC-safe. What that means is that when requesting a new object be created, the request will be fulfilled (*unless* the system or process limits prevent it) *without* GC running. In other words, there are two possible results of allocating an object: 1) a new object, or 2) an exception because no more memory is available to the process. In either case, from the point the object is requested until that request returns (or the return is bypassed by the exception unwind), the GC will not run. There is a trade-off here between running the GC at the instant that some threshold is breached (eg the eden space is exhausted) and loosening some requirements that must be maintained for a generational, moving garbage collector (ie every object reference must be known to the GC at the time the GC runs). Since we run GC on method entry and loop back branches, there is no reasonable scenario in which deferring GC until allocation has completed will result in unwanted object graph thresholds being breached pathologically (eg an execution path where allocation can grow unbounded).

2. All objects are allocated from the various heaps *uninitialized* and a protocol is established to call an initialization routine on the objects. The initialization routine is `T::initialize(State* state, T* obj)`, where T is the type of object being allocated. The method is a static method of the class of the object. This breaks with the protocol that Ruby uses, where `new` is a module method and `initialize` is an instance method. The primary reason for choosing a static (ie C++ class) method is to avoid an instance method operating on an incompletely initialized object. One purpose of this initialization protocol is to eliminate or reduce the double initialization that we were doing (ie setting all fields to nil and then initializing them to other default values).
The main initialization method shown above may have an empty body, in which case the compiler will elide it and there's no overhead to the protocol. In that case, another initialization method should be called on the newly created object. Since the allocation method is templated, if the initialization method is visible (ie in the header file), the compiler should be able to elide the remaining double initialization in most contexts.
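A minimal sketch of the initialization protocol, assuming plain `malloc` in place of the managed heaps. Only the `T::initialize(State* state, T* obj)` signature comes from the description above; `allocate_object`, `Pair`, `Buffer` and `setup` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct State {};  // stand-in for the VM state object

// Templated allocation: fetch raw, uninitialized memory, then hand it to the
// static initializer of T. Because the call is resolved at compile time, an
// empty or trivial initialize() can be inlined away.
template <typename T>
T* allocate_object(State* state) {
  assert(state != nullptr);
  void* mem = std::malloc(sizeof(T));  // assumption: malloc, not a GC heap
  if (mem == nullptr) throw std::bad_alloc();
  T* obj = static_cast<T*>(mem);
  T::initialize(state, obj);
  return obj;
}

// A type whose fields are each set exactly once, in the static initializer.
struct Pair {
  long first;
  long second;
  static void initialize(State*, Pair* obj) {
    obj->first = 0;
    obj->second = 0;
  }
};

// A type with an empty initializer; the caller finishes setup explicitly.
struct Buffer {
  std::size_t size;
  unsigned char bytes[16];
  static void initialize(State*, Buffer*) {}
  void setup(std::size_t n) {
    assert(n <= sizeof(bytes));
    size = n;
    for (std::size_t i = 0; i < n; ++i) bytes[i] = 0;
  }
};
```

In this sketch `Buffer` shows the empty-body case: `allocate_object<Buffer>` returns untouched memory and the caller is expected to call `setup` before using the object.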
In the case of `Thread.new`, the OS thread will never run because a ThreadError exception is raised when no block is passed. If we track the VM object that would ultimately contain the reference to the OS thread, we either need a way to remove the VM object when eg `Thread.new` raises an exception or we will leak these objects. Instead of tracking and then untracking the VM object, we create the object untracked and track it if the OS thread starts executing.
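The "create untracked, track once running" approach might look roughly like this. `Registry`, `VM` and `spawn` are hypothetical stand-ins, and `std::invalid_argument` plays the role of the ThreadError.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <set>
#include <stdexcept>
#include <thread>

struct VM { int id; };  // stand-in for the per-thread VM object

// Hypothetical registry of VM objects that belong to running OS threads.
class Registry {
public:
  void track(VM* vm) {
    assert(vm != nullptr);
    std::lock_guard<std::mutex> guard(mutex_);
    tracked_.insert(vm);
  }
  std::size_t size() {
    std::lock_guard<std::mutex> guard(mutex_);
    return tracked_.size();
  }

private:
  std::mutex mutex_;
  std::set<VM*> tracked_;
};

// Mirrors Thread.new: with no body, an exception is raised before any OS
// thread exists, so there is nothing to untrack afterwards.
std::thread spawn(Registry& registry, VM* vm, std::function<void()> body) {
  if (!body) throw std::invalid_argument("must be called with a block");
  return std::thread([&registry, vm, body] {
    registry.track(vm);  // tracked only once the OS thread is executing
    body();
  });
}
```

The failing path never touches the registry, which is what removes the need for a matching untrack step.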
Since a SpinLock is a simple integer on which CAS operations are performed, there is no way to run afoul of 'ownership' during fork(). This appears to solve a sporadic issue where the child was not able to reset the fork_exec_lock_ inherited from the parent process.
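To illustrate why an integer-plus-CAS lock has no owner to get wrong, here is a minimal sketch built on `std::atomic`. It is not the Rubinius `SpinLock` class; the method names, including `reset`, are assumptions.

```cpp
#include <atomic>
#include <cassert>

// The whole lock is one integer: 0 means free, 1 means held. No thread id
// is stored, so any thread or process holding a copy can reset it.
class SpinLock {
public:
  void lock() {
    int expected = 0;
    while (!state_.compare_exchange_weak(expected, 1,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed)) {
      expected = 0;  // a failed CAS overwrote expected; try again from 0
    }
  }

  bool try_lock() {
    int expected = 0;
    return state_.compare_exchange_strong(expected, 1,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed);
  }

  void unlock() {
    assert(state_.load(std::memory_order_relaxed) == 1);
    state_.store(0, std::memory_order_release);
  }

  // Unconditionally mark the lock free, eg in a child process after fork()
  // where the thread that held it in the parent does not exist.
  void reset() { state_.store(0, std::memory_order_relaxed); }

private:
  std::atomic<int> state_{0};
};
```

A mutex that records its owning thread may refuse an unlock or re-initialization from a different thread in the child; the plain integer here has no such notion.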
Before returning from Thread.new (or any similar methods), we ensure that all the state for tracking Thread instances and any related state are completely initialized. This prevents a case where the process calls fork() immediately after creating a Thread and the Thread's state is only partially initialized before the fork() call completes.
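One way to get this guarantee is a handshake in which the creating thread blocks until the new thread's state is marked ready. The sketch below assumes part of the state is set up on the new thread; `ThreadState`, `start_thread` and the `slots_initialized` field are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical per-thread bookkeeping guarded by a mutex.
struct ThreadState {
  std::mutex mutex;
  std::condition_variable ready_cv;
  bool ready = false;         // true once all tracking state is set up
  int slots_initialized = 0;  // stand-in for the real tracking state
};

// Returns only after the new thread has finished initializing its state, so
// a fork() issued right after this call never sees a half-built ThreadState.
std::thread start_thread(ThreadState& ts) {
  std::thread t([&ts] {
    {
      std::lock_guard<std::mutex> guard(ts.mutex);
      ts.slots_initialized = 3;  // pretend to set up all tracking state
      ts.ready = true;
    }
    ts.ready_cv.notify_one();
    // ... the thread's actual body would run here ...
  });

  std::unique_lock<std::mutex> lock(ts.mutex);
  ts.ready_cv.wait(lock, [&ts] { return ts.ready; });
  assert(ts.slots_initialized == 3);
  return t;
}
```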
This is a mess. See bonzini/qemu@0f087e8
This reverts commit 17f6450.
Basically, almost everything that is in util/ should not be. These major components need to be well-integrated with the rest of Rubinius. As in the previous case of Immix and the present case of the logger here, that means things like taking STATE, VM, etc as parameters and interfacing with things like process locks and process phases around fork/exec, etc.