New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce and use Subject to fix race-condition when sending last leadership transition #6345
Introduce and use Subject to fix race-condition when sending last leadership transition #6345
Conversation
…dership transition JIRA Issues: MARATHON-8305
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm building your change at jenkins-marathon-pipelines-PR-6345-1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm building your change at jenkins-marathon-pipelines-PR-6345-2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔ Build of #6345 completed successfully.
See details at jenkins-marathon-pipelines-PR-6345-1.
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://s3.amazonaws.com/downloads.mesosphere.io/marathon/builds/1.7.8-cb56a9435/marathon-1.7.8-cb56a9435.tgz",
"sha1": "c060161ef1893c24720d4007582bcbf0e30bc726"
You can run system integration test changes of this PR against Marathon
master by triggering this Jenkins job with the Pull_Request_id
6345
.
The job then reports back to this PR.
\\ ٩( ᐛ )و //
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✗ Build of #6345 failed.
See the logs and test results for details.
Error message:
Stage Compile and Test failed.
(๑′°︿°๑)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand where the race was or how you fixed it. Could you please elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm building your change at jenkins-marathon-pipelines-PR-6345-3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✗ Build of #6345 failed.
See the logs and test results for details.
Error message:
Stage Compile and Test failed.
(๑′°︿°๑)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm building your change at jenkins-marathon-pipelines-PR-6345-4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔ Build of #6345 completed successfully.
See details at jenkins-marathon-pipelines-PR-6345-4.
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://s3.amazonaws.com/downloads.mesosphere.io/marathon/builds/1.7.12-add9d7120/marathon-1.7.12-add9d7120.tgz",
"sha1": "ea2353e6e161e105ed46914e33bf7d1256ef8bba"
You can run system integration test changes of this PR against Marathon
master by triggering this Jenkins job with the Pull_Request_id
6345
.
The job then reports back to this PR.
\\ ٩( ᐛ )و //
I struggled with introducing the Subject abstraction. I ultimately decided it was the last hacky way to solve this. I explored versioning the leadership transition states and then having logic to drop out-of-order events, but this did not provided guarantees that we would completely miss an update in the following scenario:
In this case, we would completely lose event 2 and the subscriber would erroneously think that event 1 is the current value. |
Even though Subject seems to work under pretty heavy testing, I'm still unsure about introducing it. I tried an alternate approach involving versioning to resolve the race without introducing another component. I'm undecided if it's better. |
After consideration, I'm inclined to stick with the Subject implementation since it separates the concurrency complexity concern better from ElectionService. Also, it is much easier to test the implementation in isolation. |
def onElement(t: T): Unit = synchronized { | ||
elementPushed = true | ||
lastElement = t | ||
subscribers.foreach(_.offer(t)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of our usage of SourceQueue, logic handling downstream stages of the graph will happen in a different thread / outside of this call context. This mitigates the possibility of deadlock here.
There are other uses of Subject in our code base for RX. With the introduction of this component we can remove RX from our code base and move completely to Akka Streams, reducing the number of concurrency abstractions we are using. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I did another pass and I am closer to approving :) Let's trigger SI tests for this and then I'll approve :)
@@ -87,14 +91,18 @@ class HttpEventStreamActor( | |||
streamHandleActors += handle -> actor | |||
} | |||
|
|||
private[stream] var isActive = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, we have this only because of testing? ... 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah :/ lame, I know. Options were:
- Have the test send the messages directly (and bypass the stream). Lost coverage. Maybe okay.
- Find a way to synchronously materialize an akka stream in tests. Same-thread dispatcher?
- Find a way to pull out the current behavior context and compare the function reference to check the actor state
¯\_(ツ)_/¯ adding this was an easier option.
@timcharper yeah in general I am not very happy about dragging such a non-trivial piece of code inside the code base (as the Subject is). But I believe you revised other options and this was the best one :) |
I've reached out to the Akka development team about the process to propose the inclusion of Subject in Akka core. There isn't a lot of urgency to this PR, so we can hold the merge until we have a clear picture of where Subject will end up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge this, I think the akka communication can take time especially in summer
This commit fixes a race condition that existed in ElectionService.subscribe. The old logic would:
It was possible that between 1-2, a new value was available, and that this value was published to the subscriber before the last known election state value was updated. This would result in step 2 sending a stale election state value after a newer one was published.
To solve, we introduce a Subject component, which provides the following guarantees:
JIRA Issues: MARATHON-8305