Permalink
Commits on Nov 6, 2017
  1. Next is 1.3.0.Beta2

    dmlloyd committed Nov 6, 2017
Commits on Nov 3, 2017
  1. [MSC-153] Fixing pom license.

    ropalka committed Nov 3, 2017
  2. Merge pull request #59 from dmlloyd/threadpool

    ropalka committed Nov 3, 2017
    [MSC-194] Introduce new thread pool implementation
Commits on Nov 1, 2017
  1. [MSC-153] Adding license file.

    ropalka committed Nov 1, 2017
Commits on Oct 31, 2017
  1. Merge pull request #58 from ropalka/master-fixes

    ropalka committed Oct 31, 2017
    Synchronizing MSC master with 1.2 branch
  2. Updating javadoc.

    ropalka committed Sep 21, 2017
  3. Refactoring - no functional change.

    ropalka committed Sep 21, 2017
    Changing internal interface method names and updating javadoc.
    This change doesn't affect MSC public API ABI.
  4. Optimizing transition START_INITIATING -> STARTING.

    ropalka committed Sep 21, 2017
    Controller shouldn't enter STARTING substate if shouldStart() returns false.
  5. Merge pull request #56 from ropalka/master-fixes

    ropalka committed Oct 31, 2017
    Synchronizing MSC master with 1.2 branch
  6. [MSC-193] Fixing broken ServiceContainer shutdown process.

    ropalka committed Sep 14, 2017
    There were two kind of problems with previous solution:
    [1] shutdownListener was executed in regular listeners chain
    [2] race condition between shutdown() and install() methods
    
    Both aforementioned problems sometimes caused unexpected
    premature executor shutdown while there were still
    pending tasks (either scheduled or not) to be executed.
    
    This caused many broken scenarios to happen.
    
    One example of broken scenario (assuming [1] was the problem):
    1) There is only one service A in registry in UP state.
       It has 1 registered transition listener.
    2) Thread {1} called Container.shutdown() and this method
       completed successfully. It associated shutdownListener
       with controller for A and scheduled it for removal.
       There are two listeners now in controller for A
       and A is transitioning to REMOVED state.
    3) A is not in REMOVED state yet
    4) User code executing in Thread {2} calls
       Container.awaitStability(). This method hangs
       because previous Container.shutdown() asked
       service A to stop, but A is still transitioning.
    5) A is entering REMOVED state and its RemoveTask is finished.
    6) Last step is to notify listeners that REMOVING -> REMOVED
       transition happened.
    7) shutdownListener is executed first and because it
       was registered just with one service it calls
       executor.shutdown(). Executor is terminated.
    8) Problem is there is pending last listener task not executed yet
       and because it didn't execute yet stability notification
       was not sent to service container that controller stability
       was achieved. User code in Thread {2} will hang forever.
    
    Similar example of broken scenario can be constructed
    assuming [2] was the initial problem.
    
    This fix:
     * removes terminateListener from regular controller
       listeners chain and creates separate and last controller phase
       intended just for container termination listeners.
       When controller enters REMOVED state and its stability
       is achieved (i.e. all its tasks have been executed)
       terminate listener is being called. This is the only
       appropriate time to call it.
     * container instrinsic lock is being held
       while container.install() is associating controller with
       primary and alias registrations. Only so it can be guaranteed
       shutdown() will not see stale registry data (no race condition)
  7. Refactoring - no functional change.

    ropalka committed Jun 9, 2017
    Changing transition() method signature.
  8. Refactoring - no functional change.

    ropalka committed Sep 5, 2017
    Changing ServiceControllerImpl.internalSetMode() method
    signature - eliminating unused taskList parameter.
  9. [MSC-192] Eliminating notion of 'transitive notifications'

    ropalka committed Mar 16, 2017
    that didn't work properly before (because of broken invariants)
    plus disabling majority of ServiceListener notification methods.
    This change does not affect MSC public API ABI.
    
    Concretely we are disabling support for the following methods:
     * serviceRemoveRequested()
     * serviceRemoveRequestCleared()
     * dependencyFailed()
     * dependencyFailureCleared()
     * immediateDependencyAvailable()
     * immediateDependencyUnavailable()
     * transitiveDependencyAvailable()
     * transitiveDependencyUnavailable()
    Aforementioned listener notification methods are never called since now on.
    The following ServiceListener notification methods remain functional:
     * listenerAdded()
     * transition()
    These notification methods must be still supported because they are
    heavily used in WildFly code base. There is no alternative available.
    
    Further more this change removes notion of 'transitive notifications'.
    Since now on controllers care only about their direct neighbourhood.
    This fix eliminates transitiveDependencyUn/Available notifications and
    replaces them with immediateDependencyUn/Available notifications.
    
    Reasons for eliminating support for above notification methods are:
     * immediateDependencyUn/Available and transitiveDependencyUn/Available
       notifications never worked properly in previous releases
       (it was quite common MSC ServiceControllerImpl had broken invariants)
     * they degrade MSC state machine performance (more tasks to execute)
     * ServiceListener have been deprecated (deprecation since EAP 7.0)
     * these notification methods were never used in WildFly/EAP code base
  10. [MSC-190] Controller can create new tasks as response for incoming

    ropalka committed Jun 20, 2017
    notifications if and only if it was successfully installed and was
    not removed yet.
  11. [MSC-189] Fixing controller async tasks count invariant breakage.

    ropalka committed Jun 19, 2017
    The problematic usecase was:
    1) User called ServiceContainerImpl.shutdown().
       This call iterates all services registered in
       registry and sets REMOVE mode on all of them
    2) There is also child service C and parent service P in registry
    3) C entered REMOVED state and scheduled RemoveTask
    4) P's either in START_FAILED or STOPPED state and
       P's RemoveChildrenTask was not executed yet
    5) C's RemoveTask is being executed and it calls parent.removeChild()
    6) There's asyncTasks decrement implemented there
       that should be called if and only if P's RemoveChildrenTask
       was executed (before this fix it was not true)
  12. [MSC-188] Fixing controller locked forever in START_REQUESTED state i…

    ropalka committed May 13, 2017
    …ssue.
    
    This happened when some ACTIVE service A depended on PASSIVE service P
    which was parked in WAITING state (e.g. because some dependency
    PASSIVE service depends on was not installed).
    In such case dependent service A remained in START_REQUESTED state
    (because stoppingDependencies > 0) without possibility to move
    to some other REST state.
    
    This issue was causing serious problems for stability monitoring.
    Stability monitors never reached stability because START_REQUESTED state
    is not REST state.
    
    The fix is to fire ServiceUn/Available events when entering/leaving
    WAITING state. Once these events are fired service A gets notification
    that PASSIVE service P is not available (immediateUnavailableDependencies > 0)
    and service A will transition from START_REQUESTED to PROBLEM state
    (where it should be).
  13. [MSC-187] Revisited and reimplemented ServiceRegistrationImpl locking…

    ropalka committed Jun 7, 2017
    … mechanism.
    
    This is internal change only not visible to MSC API users because
    ServiceRegistrationImpl references do not escape to user code.
    
    In order to address majority of MSC race conditions it was necessary to revisit
    locking scenarios:
     * when controllers are installed
     * when controller tasks are created
     * when controller tasks are executed
     * when controllers are removed
    Most of the races we were observing were because there was dependents snapshot
    created and cached when controller dependent task was created. And many times
    this dependents snapshot was out of date at the time dependent task was executed.
    
    To address these issues it was necessary to:
     * distinguish between read and write operations to service registry and
     * allow read lock to be acquired in one method and release it later in another one
    
    The solution addressing aforementioned problem (in this commit) consists of the following changes:
    [1] ServiceRegistrationImpl intrinsic lock is enhanced to behave like ReadWriteLock
    [2] Eliminated unnecessary usage of ServiceRegistrationImpl dependents lock.
        Since now on we're relying on ServiceRegistrationImpl intrinsic lock only
        to be held while obtaining registration dependents snapshot.
    [3] ServiceRegistrationImpl write operations (they are modifying internal states) such as:
        - setInstance()
        - clearInstance()
        - addDependent()
        - removeDependent()
        - dependentStarted()
        - dependentStopped()
        - addDemand()
        - removeDemand()
        are protected with write lock
    [4] We never acquire two different ServiceRegistrationImpl write locks at the same time
        in the same thread (this is necessary to avoid dead locks)
    [5] Revisited
        - ServiceControllerImpl$DependenciesControllerTask.execute() usage of locks
        - ServiceControllerImpl.startInstallation() usage of locks
        - ServiceControllerImpl$RemoveTask.run() usage of locks
        We are acquiring ServiceRegistrationImpl write locks there
        one after one in the loop (to prevent dead locks, see [4]).
    [6] We introduced NOOP beforeExecute() and afterExecute() template methods in ControllerTask
        - beforeExecute() is always executed before task will start its execution
        - afterExecute() is always executed after task finished its execution
        These operations remain NOOP for every DependenciesControllerTask.
        These operations are overriden for every DependentsControllerTask:
        - we are acquiring multiple ServiceRegistrationImpl read locks at the same time
          in beforeExecute() method. This will prevent modifications of ServiceControllerImpl dependents set.
          Later when these dependents will be iterated in execute() method they will be up to date
          (we're holding the read lock so there is no race)
        - we are releasing all acquired ServiceRegistrationImpl read locks at the same time
          in afterExecute() method
    [7] ServiceControllerImpl.newDependent() was revisited to take controller 'dependents task execution flags' into
        account. These flags were introduced in previous commit and they allow us to detect if:
        - dependent task was scheduled
        - dependent task finished its execution
        This was the main problem in previous implementation. ServiceControllerImpl.newDependent() method
        was unable to detect if scheduled dependent tasks were scheduled and if they finished execution
        and thus it was causing many race conditions in previous implementation.
  14. [MSC-187] Introducing 'dependent tasks execution completion detection…

    ropalka committed Jun 7, 2017
    …' mechanism. It is not active yet - will be activated in next commit.
  15. [MSC-186] Since now on listener transition notifications

    ropalka committed Feb 9, 2017
    are executed separately and last for every controller
    ongoing transition to another state.
    
    In other words we say service controller successfully
    completed its transition to another state if and only if
    all its scheduled tasks (except transition notification tasks)
    for ongoing transition have been completed.
    
    This solution is similar to how stability monitoring works.
    We say controller stability was achieved if and only if
    all scheduled tasks were completed and controller entered REST state.
    
    The purpose of this commit is to align both
    stability monitors vs. listener transition notifications approaches.
    Regardless of which one is used the outcome must be the same.
    
    Without this explicit separation the following
    broken usecase (one of many) would ocassionally happen:
    
     1) Controller is in UP state
     2) User code associated ServiceListener with controller
        (via addListener) and is waiting for transition notifications
     3) User code scheduled A for removal
     4) Controller finished the following transitions UP -> ... -> REMOVING
     5) Controller is executing final transition REMOVING -> REMOVED
     6) Controller while holding intrinsic lock schedules
        both RemoveTask & ServiceListener transition task.
        Once intrinsic lock is released both RemoveTask
        and ServiceListener transition tasks are scheduled
        and executed concurrently
     7) ServiceListener transition notification task finished its execution
        while RemoveTask didn't start its execution yet.
     8) User received notification from ServiceListener
        that controller have been removed and so it tries
        to install new one. But installation of new controller
        will fail because removal of previous controller
        didn't happen yet (there is still pending RemoveTask)
  16. [MSC-184][MSC-185] Implementing 'dependents started counting' mechani…

    ropalka committed Mar 24, 2017
    …sm in ServiceRegistrationImpl.
    
    The result of this computation must be propagated to ServiceControllerImpl
    once it is associated with ServiceRegistrationImpl instance field.
    
    This problem was related to optional dependencies.
    Without this counting mechanism in place the following
    broken usecase may ocassionally happen:
    
     1) DEPENDENCY service is not installed
     2) DEPENDENT service is being installed with optional
        dependency on missing DEPENDENCY service
     3) DEPENDENT service moves from DOWN to START_REQUESTED state
        (it can start because missing optional DEPENDENCY service is ignored)
     4) DEPENDENT service moves from START_REQUESTED to START_INITIATING state
        and fires DependentStartedTask
     5) DEPENDENT's DependentStartedTask calls OptionalDependency.dependentStarted()
        which in turn calls DEPENDENCY ServiceRegistrationImpl.dependentStarted().
        Now because DEPENDENCY's ServiceRegistrationImpl 'instance' field is null
        and because we didn't implement 'dependent started counting' mechanism
        this notification is lost
     6) DEPENDENT ends up in UP state
     7) DEPENDENCY service is being installed and enters DOWN state
        (it was associated with DEPENDENCY's ServiceRegistrationImpl).
        DEPENDENCY's DependencyAvailableTask is fired
     8) DEPENDENCY's DependencyAvailableTask calls DEPENDENT's optional dependency
        immediateDependencyAvailable() method which is translated into
        DEPENDENT.immediateDependencyDown() method call
     9) DEPENDENT detects it should stop (stoppingDependencies > 0)
    10) DEPENDENT transitions from UP to DOWN state and fires DependentStopped task
    11) DEPENDENT's DependentStopped task calls optional dependency dependentStopped() method
        which in turn calls DEPENDENCY's ServiceRegistrationImpl.dependentStopped() method
        which in turn calls DEPENDENCY's instance.dependentStopped() method.
        This last call results into runningDependents invariant violation because
        it is decremented to negative value and invariant says:
        'ServiceControllerImpl runningDependents is always non negative number'
  17. [MSC-184] Completely fixing buggy optional dependency implementation.

    ropalka committed Apr 26, 2017
    Since now on optional dependencies are guaranteed to work properly.
    If there is service A installed with N optional dependencies and these
    optional dependencies are installed (it doesn't matter on the order
    of installation process) then in worst case scenario such service A
    will be restarted N times to see all available optional dependencies.
    But it is guaranteed these optional dependencies will be visible to A.
    
    Before this fix there were scenarios possible where
    DEPENDENT service in UP state depending on optional dependency
    DEPENDENCY did not see installed DEPENDENCY service referenced
    by that optional dependency (if DEPENDENT service was caching
    references to its dependencies during start phase).
    
    In other words it ocassionally happened that DEPENDENT's stop phase
    didn't execute even if missing DEPENDENCY service
    (referenced via optional dependency) had been installed.
    
    Here's concrete example describing one issue with
    previous optional dependencies implementation:
    
     1) DEPENDENT service that depends on missing DEPENDENCY service
        (via optional dependency mechanism) is being installed
     2) DEPENDENT service ends up in UP state
        (it is OK optional dependency is missing)
     3) DEPENDENCY service is being installed and enters DOWN state
        (firing DependencyAvailable task)
     4) Optional dependency receives 'immediateServiceAvailable' notification
        from DEPENDENCY's DependencyAvailable task and translates it to
        'immediateDependencyDown' notification for DEPENDENT service
     5) DEPENDENCY's DependencyAvailable task finished its execution
     6) DEPENDENT realizes it has to stop (stoppingDependencies > 0)
        so it enters STOP_REQUESTED state
     7) DEPENDENCY service transitions from DOWN to UP state and fires
        DependencyStarted task. This task notifies optional dependency
        with 'immediateDependencyUp' event. This event is propagated
        to DEPENDENT service (no event transformation)
     8) Now when DEPENDENT's last task is executed (in STOP_REQUESTED state)
        it detects that all its dependencies are satisfied again
        (stoppingDependencies == 0).
     9) DEPENDENT service will transition back from STOP_REQUESTED to UP state.
    10) Since now on if DEPENDENT service implementation is using cached
        optional dependency value it will not see DEPENDENCY service that
        has just been succesfully installed.
  18. Removing tests on OPTIONAL dependencies.

    ropalka committed Jun 9, 2017
    These tests have been testing old optional dependency implementation details.
    In next commit we will introduce new optional dependency implementation.
  19. Refactoring - no functional change. Enforcing FAIL FAST best practices.

    ropalka committed Jun 3, 2017
    ServiceControllerImpl.startInstallation() can fail if and only if
    there is already some other controller associated either with
    primary registration or with some alias registration.
    In such problematic case DuplicateServiceException is being thrown.
    This source code reordering in startInstallation() method ensures this critical
    and problematic part of controller installation process is executed first.
  20. [MSC-179] partial revert of commit id a3d68c0.

    ropalka committed Jun 21, 2017
    Service controller install -> cancel usecase must be synchronous
    (if installation fails). Only this way installing thread
    is notified properly when all registry associations have been removed.
  21. Refactoring - no functional change. All ServiceControllerImpl inner c…

    ropalka committed Jun 6, 2017
    …lasses must be final.