
Use InMemoryPersistenceStorage for persisting pod records #80

Merged
merged 28 commits into master from tga/use-inmemory-persistence on May 8, 2019

Conversation

@takirala (Contributor) commented Mar 29, 2019

Add a persistence storage interface and create a flow from it for use in the Scheduler graph. This PR adds the following changes:

  • Create a generic RecordRepository trait (a rough sketch follows this list). Currently, only PodRecordRepository is implemented.
  • Handle the initial snapshot loading from the PodRecordRepository.
  • Create a scheduler graph component that uses the PodRecordRepository to persist the SchedulerEvents. This component itself performs a materialization for every SchedulerEvents element.
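For orientation, a minimal sketch of what such a repository trait could look like, inferred from the update/readAll signatures quoted later in this conversation (the type members and the delete method are assumptions, not the actual USI code):

import scala.concurrent.Future

// Hypothetical sketch of a generic record repository; names are inferred
// from the signatures discussed in this PR and may not match the real code.
trait RecordRepository {
  type Record
  type RecordId

  // Persist (create or overwrite) a record, yielding its id.
  def update(record: Record): Future[RecordId]
  // Remove a record by id.
  def delete(recordId: RecordId): Future[RecordId]
  // Load the full snapshot of all records, e.g. during bootstrap.
  def readAll(): Future[Map[RecordId, Record]]
}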

Testing

  • Add a RepositoryBehavior that can be used to verify all implementors of RecordRepository. Currently it is used only for the in-memory storage, but it can/will be extended to other implementations (such as ZooKeeper).
  • Add the akka-stream-testkit library dependency to the test-utils module so that it can be used in other modules.
  • Add a unit test for the persistence flow component.
  • Add an integration test to verify the at-most-once launch guarantee.

@jeschkies made the initial PR #51 that laid some groundwork for this.

JIRA : DCOS-49274

@takirala self-assigned this Mar 29, 2019
@takirala force-pushed the tga/use-inmemory-persistence branch from cdccb11 to eccd7ff on March 29, 2019 01:18
private def persistenceFlow(
    podRecordRepository: PodRecordRepository): Flow[SchedulerEvents, SchedulerEvents, NotUsed] =
  Flow[SchedulerEvents].mapAsync(1) { events =>
    import CallerThreadExecutionContext.context
Contributor:

I don't quite understand the context. Why was that introduced?

Contributor Author:

This was introduced to perform cheap operations such as .map(_ => events) without switching the execution context. If we don't do any future transformations in this scope, we don't need it.
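For readers unfamiliar with the pattern, a caller-thread execution context is typically a trivial ExecutionContext that runs continuations synchronously on the completing thread. A minimal sketch, assuming this is roughly what USI's CallerThreadExecutionContext does:

import scala.concurrent.ExecutionContext

// Sketch of a caller-thread execution context: continuations run
// synchronously on whatever thread completes the future. Only safe for
// cheap, non-blocking transformations such as `.map(_ => events)`.
object CallerThreadExecutionContext {
  implicit val context: ExecutionContext = new ExecutionContext {
    override def execute(runnable: Runnable): Unit = runnable.run()
    override def reportFailure(cause: Throwable): Unit = throw cause
  }
}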

Contributor:

It had to be called something; we can't make the object itself be the execution context.

@@ -130,4 +147,25 @@ object Scheduler {
}
}
}

private def persistenceFlow(
Contributor:

I would have assumed that the repository would maybe provide this interface.

Contributor Author:

I don't understand how the repository can provide this interface. The core module should control how the persistence storage is read from and written to, right?

Flow[SchedulerEvents].mapAsync(1) { events =>
  import CallerThreadExecutionContext.context
  Future
    .sequence(events.stateEvents.map {
Contributor:

Why not use a stream here as well?

Contributor Author:

In the current flow of things:

[SchedulerEvents] ------> [Persistence] ------> [StateOutput, MesosCalls]

The SchedulerEvents are emitted by the scheduler logic. The persistence stage processes ALL the StateEvents in a single SchedulerEvents object and only THEN emits that object downstream; it backpressures the upstream until ALL the StateEvents in the SchedulerEvents are processed. We can use a stream here, but I am not sure what advantage it would add: we are already using `.mapAsync` with parallelism 1 to process each SchedulerEvents object, one at a time, with backpressure.

Currently the persistence layer emits the exact SchedulerEvents object that it receives. Are you perhaps suggesting that we don't need to do this and could emit a "split-down" SchedulerEvents based on the stream processing speed?
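Concretely, the stage under discussion boils down to something like the following (persistAll is a hypothetical helper standing in for the Future.sequence over store operations quoted above):

// Illustrative sketch: mapAsync(1) admits one SchedulerEvents element at a
// time, so the next element is not pulled until the previous future completes.
Flow[SchedulerEvents].mapAsync(parallelism = 1) { events =>
  // persistAll (hypothetical) persists every StateEvent in `events`.
  persistAll(events.stateEvents)
    .map(_ => events)(CallerThreadExecutionContext.context)
}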

Contributor:

In some places in Marathon we used Akka Streams instead of Future.sequence to avoid too many futures. Also, there is no backpressure for loading.

Contributor:

So, I think @zen-dog can help. I thought we had something like a Flow[StoreOp, Try] so that one could map the updates.

val persistenceFlow: Flow[SchedulerEvents, SchedulerEvents, NotUsed] = {
  Flow[SchedulerEvents]
    .mapConcat(_.stateEvents)
    .map {
      case ... => UpdateOp | DeleteOp
    }
    .via(repository.flow)
    ...
}

This might be a little too complicated, though; it also omits the mapping back to SchedulerEvents.
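One possible concretization of that sketch, purely hypothetical (StoreOp, UpdateOp, DeleteOp, repository.flow, and the extractor helpers are assumptions here, not the actual USI API):

// Hypothetical store-operation ADT mirroring the pseudocode above.
sealed trait StoreOp
final case class UpdateOp(record: PodRecord) extends StoreOp
final case class DeleteOp(podId: PodId) extends StoreOp

// Fan each SchedulerEvents out into store ops and pipe them through a
// repository-provided flow. Mapping the results back into SchedulerEvents
// (the complication noted above) is still omitted here.
val opsFlow: Flow[SchedulerEvents, Try[Done], NotUsed] =
  Flow[SchedulerEvents]
    .mapConcat(_.stateEvents)
    .collect {
      case e if isUpdate(e) => UpdateOp(recordOf(e)) // isUpdate/recordOf: hypothetical extractors
      case e if isDelete(e) => DeleteOp(podIdOf(e))  // isDelete/podIdOf: hypothetical extractors
    }
    .via(repository.flow) // assumed to be a Flow[StoreOp, Try[Done], NotUsed]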

}

override def preStart(): Unit =
  initialPodRecords.onComplete(startGraph.invoke)(CallerThreadExecutionContext.context)
Contributor:

It is kind of odd. For users, we ask them to deal with Source[(Snapshot, Source[StateEvent])], but internally we are not using that. Why are we not passing the state snapshot via a stream? Are we not losing backpressure here? Recovering state was slowing down Marathon in the past, so I would assume that we leverage backpressure right here so that the USI scheduler flow is not overloaded.

Contributor Author:

At least in the scope of this PR, I am not trying to handle ad-hoc snapshots. The assumption in this PR is that we receive the snapshot only once during bootstrap (or crash recovery), and we intentionally block all inlets until we can load the snapshot and are ready for processing. I initially had an approach using FanIn3Shape but decided this is simpler and cleaner for the initial snapshot loading.

Contributor:

I think @timcharper has to explain the interface to me once more, then. We do not expect ad-hoc snapshots, but it is odd that our internal storage uses plain futures when we prescribe the nested sources to users. Why would we not use streams here as well?

Contributor:

In the future, we will not be sending a podSpec snapshot. This is going away.

Contributor:

Cool. Could we add a longer comment above?

Contributor Author:

Not sure if/how I can pass an extra param to override def createLogic. One thought I had was to set an attribute on the graph component, but that feels like overkill. Do you have any other thoughts?

On the other hand, I don't understand why loading the pod records should not be part of the logic: we are creating a graph component which should not start pulling elements from upstream until a certain precondition is satisfied. In our case, the precondition is that the given future completes with an expected (Success) result. Since this future's completion is non-deterministic (it can even fail), I think it is fair to register the callback in the preStart method. Ultimately, this callback decides whether (and when) to start pulling elements from upstream.
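For illustration, here is roughly how that precondition gating looks in a GraphStage (only the preStart line is quoted from the actual diff; the callback body and the imports of scala.util.{Try, Success, Failure} are assumptions):

// Sketch: gate the stage on the future of initial pod records.
override def preStart(): Unit = {
  val startGraph = getAsyncCallback[Try[Map[PodId, PodRecord]]] {
    case Success(records) => // seed internal state with `records`, then start pulling
    case Failure(ex)      => failStage(ex) // precondition failed: fail the stage
  }
  initialPodRecords.onComplete(startGraph.invoke)(CallerThreadExecutionContext.context)
}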

Contributor:

I missed that createLogic is predefined. However, initialPodRecords could simply be the loaded record set. Why should it be a future?

Contributor Author:

By making the pod records a future, we have the option to either succeed in loading the snapshot and then start the graph, or mark the graph stage as failed to start. If it's not a future, then we have to handle the failed future before creating this graph component (which raises the question: if we fail to make the persistence call, should we fail to create the blueprint of the graph?) cc @meln1k

Contributor Author:

Also, note that the SchedulerLogicHandler has a Map instead of a Future[Map] (https://github.com/mesosphere/usi/pull/80/files#diff-704496199082f7d74e9d62d315f0a7abR66), and that makes sense to me: if we have a "handler", it can be expected to have a valid snapshot. But in the case of SchedulerLogicGraph, where we are "building" the graph, we can choose to verify a precondition (in this case, successful completion of the given future) before we choose to start the graph.

I have made initialPodRecords an anonymous function to delay the pod record loading. If you still think that having a future in the graph precondition is not ideal, I can reformat the code to complete the future in Scheduler and fail fast before a graph blueprint is generated. wdyt?

Contributor:

I'm just wondering if loading the snapshot in the scheduler factory would be simpler. We can always handle errors there. Ideally the factory method would fail if the snapshot loading failed; the caller would then not even have a failed graph stage. How should the user handle a failed graph stage anyway?

*/
def update(record: Record): Future[RecordId]
def readAll(): Future[Map[RecordId, Record]]
Contributor:

Should we provide a stream instead to leverage backpressure?

Contributor Author:

The current structure assumes we are intentionally blocking inlets until the state can be loaded. Depending on how that structure works out, we can choose to change this to be a stream instead.

Contributor:

Where are we blocking? I must be missing something.

} else {
  // TODO: batch transactions
  Source(storeRecords)
    .map(_.newRecord.get)
Contributor:

Is newRecord an Option? If so, I'd use collect:

Suggested change
- .map(_.newRecord.get)
+ .map(_.newRecord).collect { case Some(newRecord) => newRecord }

@meln1k previously requested changes Apr 8, 2019

  .map(_.newRecord.get)
  .via(podRecordRepository.storeFlow)
  .runWith(Sink.foreach(_ => {
    if (eventCounter.decrementAndGet() == 0) maybePushToPull()
Contributor:

maybePushToPull happens on another thread, so you should not call it like this.

@jeschkies self-requested a review April 9, 2019 16:17
Flow[SchedulerEvents].mapAsync(1) { events =>
  import CallerThreadExecutionContext.context
  Future
    .sequence(events.stateEvents.map {
Contributor:

I would go for something like:

mapAsync(1) { events =>
  val (updated, deleted) = events.stateEvents.partition(....)

  val updatedF: Future[Done] = Source(updated)
    .via(podRepository.storeFlow)
    .runWith(Sink.ignore)

  val deletedF: Future[Done] = Source(deleted)
    .via(podRepository.deleteFlow)
    .runWith(Sink.ignore)

  updatedF.flatMap(_ => deletedF)
}

It has the downside of 2x materialization, but that might be OK for now.

# environment!
# To get the best results, try combining this setting with a throughput
# of 1 on the corresponding dispatchers.
fuzzing-mode = on
Contributor Author:

I found this useful when I was playing around with a custom GraphStage approach I had at some point. I think retaining this is useful and makes testing more aggressive; I can revert it if there is something I missed.
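For context, this is Akka Streams' materializer fuzzing switch; the full path of the setting in the test configuration looks like this (quoting the standard Akka setting, not USI-specific configuration):

# Akka Streams debug fuzzing: randomizes event ordering in the materializer
# to surface race conditions in stages. Test-only; it slows streams down.
akka.stream.materializer.debug.fuzzing-mode = on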

@takirala marked this pull request as ready for review April 12, 2019 06:40
@takirala requested review from zen-dog and meln1k April 12, 2019 06:40

"Persistence flow pipelines writes" in {
Given("a slow persistence storage and a flow using it")
val delayPerElement = 50.millis // Delay precision is 10ms
@timcharper (Contributor) commented Apr 16, 2019:

Let's please avoid using fixed delays in tests.

@@ -101,4 +105,55 @@ class SchedulerTest extends AkkaUnitTest with Inside {
    output.cancel()
    mesosCompleted.futureValue shouldBe Done
  }

"Persistence flow pipelines writes" in {
@timcharper (Contributor) commented Apr 16, 2019:

After reading this test several times, I'm inclined to suggest that we just delete it. I don't think it's testing anything meaningful; it appears to be testing Akka itself, which already has tremendous test coverage.

Perhaps a more meaningful test would be a unit test for the persistence flow itself which asserts that delete and create operations are applied in order, and that the order of elements being processed is preserved (perhaps some fuzzing can be applied by causing the storage operation Future to complete in a non-linear order).

@takirala (Contributor Author) commented Apr 16, 2019:

I agree that this test does not create much value. It was useful when I had a custom graph stage logic, but since we got rid of that, we don't need this exact test anymore. I have rewritten the test using Random.nextInt(100) to apply "fuzzing"; let me know if that is not ideal.
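A sketch of the kind of fuzzing described, under stated assumptions (a hypothetical test-only helper that completes each storage future after a random delay so completions arrive in non-linear order; `system` is the test kit's ActorSystem):

import akka.actor.ActorSystem
import akka.pattern.after
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.Random

// Hypothetical helper: completes the stored value after 0-99 ms, so a batch
// of store operations finishes out of order while the flow under test must
// still preserve element order downstream.
def fuzzedStore[T](value: T)(implicit system: ActorSystem, ec: ExecutionContext): Future[T] =
  after(Random.nextInt(100).millis, system.scheduler)(Future.successful(value))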

@takirala requested a review from zen-dog April 17, 2019 16:13
@takirala added the WIP label Apr 19, 2019
@takirala removed the WIP label Apr 19, 2019
}

override def preStart(): Unit =
  initialPodRecords.onComplete(startGraph.invoke)(CallerThreadExecutionContext.context)
Contributor:

Ah, because this loads the pod records, right? I still don't quite understand why it is a callback. What do you think of passing initialPodRecords to createLogic? This would simplify things a bit, and the loading of the pod records should not be part of the logic anyway, IMHO.

@takirala force-pushed the tga/use-inmemory-persistence branch from f22c19c to 265e7fd on April 23, 2019 22:46
private[core] def unconnectedGraph(
    mesosCallFactory: MesosCalls,
    podRecordRepository: PodRecordRepository): BidiFlow[SpecInput, StateOutput, MesosEvent, MesosCall, NotUsed] = {
  val schedulerLogicGraph = new SchedulerLogicGraph(mesosCallFactory, loadPodRecords(podRecordRepository))
@jeschkies (Contributor) commented Apr 26, 2019:

This is getting closer to it, but I don't understand why it cannot be async. The two main factory methods asFlow and asSinkAndSource return futures. So you could simply call:

  def asFlow(specsSnapshot: SpecsSnapshot, client: MesosClient, podRecordRepository: PodRecordRepository)(
      implicit materializer: Materializer): Future[(StateSnapshot, Flow[SpecUpdated, StateEvent, NotUsed])] = async {

    implicit val ec = scala.concurrent.ExecutionContext.Implicits.global // only for ultra-fast non-blocking onComplete

    val initialRecords = await(loadPodRecords(podRecordRepository))
    val (snap, source, sink) = asSourceAndSink(specsSnapshot, client, initialRecords)
    (await(snap), Flow.fromSinkAndSourceCoupled(sink, source))
  }

Contributor Author:

Discussed this offline. In an ideal world we should be using the async-await construct to do this in a non-blocking fashion, but the factory methods fromFlow and fromClient require this to be done in a blocking fashion. I guess we can find a cleaner solution once we settle on a limited number of factory methods.

Given the isolated fact that we can't make any progress until we load the records, I guess this is okay for now.
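A sketch of the blocking variant being accepted "for now" (the Await at the factory boundary is an assumption about the approach, not quoted code; the timeout and the PodId/PodRecord type binding are illustrative):

import scala.concurrent.Await
import scala.concurrent.duration._

// Block at the factory boundary: nothing can proceed before the records
// are loaded, so the blocking call is confined to scheduler construction.
val initialPodRecords: Map[PodId, PodRecord] =
  Await.result(podRecordRepository.readAll(), 10.seconds)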

Contributor:

fromClient and fromFlow could just be non-blocking. The signature of these methods is not set in stone.

@jeschkies dismissed their stale review April 26, 2019 09:03

Since I'll be out and do not want to block this PR.

@takirala (Contributor Author) commented:

@zen-dog @meln1k can you please re-review this :)

@zen-dog (Contributor) left a comment:

I left a few nits about naming and similar. My biggest question mark is the location of the InMemoryStore*-related files; see my comment. Once clarified, this PR is good with me!

import com.mesosphere.utils.UnitTest
import com.typesafe.scalalogging.StrictLogging

class InMemoryRepositoryTest extends UnitTest with RepositoryBehavior with StrictLogging {
Contributor:

I'm confused by the location of this file (and other in-memory-repository-related files). Why are those in the persistence-zookeeper sub-module? Why is InMemoryPodRecordRepository.scala in the persistence sub-module? Ideally:

usi
  |
  ...
   - persistence            # traits and interfaces, no implementation
   - persistence-zookeeper  # zookeeper implementation
   - test-utils             # in-memory implementation here (see DummyMetrics for an example)?
                            # It is not supposed to be used in production, so this seems appropriate.

Contributor Author:

I guess it was done so that we don't need to have

implementation project(':test-utils')

and can just do

implementation project(':persistence')
testCompile project(':test-utils')

in order to instantiate the persistence layer. I have moved InMemoryPodRecordRepository to test-utils and added a note to the build.gradle file to replace implementation project(':test-utils') with testCompile project(':test-utils') once we have a persistence-zookeeper implementation.

The only thing I am unsure about is the location of InMemoryRepositoryTest, which is currently in persistence-zookeeper. I cannot move it to test-utils as that would cause a cyclic dependency. I guess having it here is fine, as its intention is to verify the RepositoryBehavior, which will be used to verify a zk persistence layer in the future. wdyt?

@jeschkies (Contributor) left a comment:

Aside from the blocking call in the Scheduler factory methods I'm fine with this patch 🎉


@zen-dog (Contributor) left a comment:

🎉

@takirala merged commit dd4c04a into master May 8, 2019
@takirala deleted the tga/use-inmemory-persistence branch May 8, 2019 09:30
@takirala (Contributor Author) commented May 8, 2019

Thanks for the reviews @zen-dog @jeschkies @meln1k and @timcharper :)
