
1202: prioritized merge #1205

Merged
merged 14 commits into series/3.x on Aug 11, 2020

Conversation

ctoomey
Contributor

@ctoomey ctoomey commented Jun 19, 2020

Incomplete draft for #1202 for review/discussion. Tests to follow as appropriate.

priority.compareTo(o.priority)
}

val pq = new PriorityBlockingQueue[PQElem](sources.size)
Collaborator

Could we use a normal PriorityQueue instead of a blocking one? It seems like the access is synchronized anyway.

Contributor Author

@ctoomey ctoomey Jun 27, 2020

Sure, I'll look at replacing it and adding the necessary synchronization, then add tests.

Otherwise, does it look good?

Collaborator

Sorry, I haven't had a moment to take a close look, but it seems good.

I think it's not too far from a mergeMapPrioritized version, if you'd like to go further:

  • onNext of the source Observable (upstream) would add a new Observable to the composite, instead of doing it from the starting list
  • RefCountObservable is used to control the number of active children and their lifecycle
  • It would be awesome to support different overflow strategies, but that would require updating BufferedSubscriber to allow priority queues. The implementation for the back-pressure strategy is in AbstractBackPressuredBufferedSubscriber. Unfortunately I don't know of a good concurrent priority queue implementation out there. I imagine it's too much work, and I don't have capacity to work on it right now, so I suppose we can stay with the current behavior, which is back-pressure with a bufferSize of 0.

If you don't feel like doing it, that's okay: mergePrioritizedList is already better than the status quo.

It might also be nice to add a benchmark against normal merge, but that's just my curiosity and completely optional.

Contributor Author

Since updating BufferedSubscriber would be a lot of work, I kept my implementation as is. I swapped out the Java PriorityBlockingQueue for the Scala mutable.PriorityQueue and added tests.
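The swap described above can be sketched with plain Scala collections. A minimal, hypothetical sketch (PQElem and SyncPriorityQueue here are illustrative stand-ins, not the PR's actual types): mutable.PriorityQueue is a max-heap with respect to its Ordering, so higher priority values dequeue first, and all access is guarded by an explicit lock, as suggested in the review.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the PR's PQElem: an item paired with its priority.
final case class PQElem[A](priority: Int, value: A)

// Higher numbers mean higher priority; mutable.PriorityQueue is a max-heap
// with respect to this Ordering, so the highest-priority element dequeues first.
implicit def pqOrdering[A]: Ordering[PQElem[A]] = Ordering.by(_.priority)

// All access goes through an explicit lock, replacing the blocking queue's
// internal synchronization with the operator's own.
final class SyncPriorityQueue[A] {
  private[this] val lock = new AnyRef
  private[this] val pq = mutable.PriorityQueue.empty[PQElem[A]]

  def enqueue(e: PQElem[A]): Unit = lock.synchronized(pq.enqueue(e))

  def dequeue(): Option[PQElem[A]] =
    lock.synchronized(if (pq.isEmpty) None else Some(pq.dequeue()))
}
```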

@monix monix deleted a comment from Avasil Jul 27, 2020
@monix monix deleted a comment from ctoomey Jul 27, 2020
@monix monix deleted a comment from ctoomey Jul 27, 2020
@monix monix deleted a comment from AlecZorab Jul 27, 2020
@monix monix deleted a comment from ctoomey Jul 27, 2020
@monix monix deleted a comment from ctoomey Jul 27, 2020
@monix monix deleted a comment from ctoomey Jul 27, 2020
@ctoomey
Contributor Author

ctoomey commented Aug 4, 2020

@Avasil will you go ahead and merge this, or is there anything else needed? Thanks.

* sources have items available. If items are available from sources with the
* same priority, the order is undefined.
*/
def mergePrioritizedList[A](sources: Observable[A]*)(priorities: Seq[Int]): Observable[A] = {
Member

Very interesting implementation @ctoomey.

I do have one suggestion about this signature: it's not clear that the priorities have to match up with the sources, and there's no reason for that coupling.

How about ...

def mergePrioritizedList[A](sources: (Int, Observable[A])*): Observable[A]

Then you could do ...

Observable.mergePrioritizedList(
  1 -> observable1,
  2 -> observable2,
  //...
)

That way the signature is clearer, and you don't need that runtime assert in the implementation.

Contributor Author

Yeah, I'd prefer that signature too, and in fact had it that way originally, but decided to match the way Akka Streams did it. I'll change it.

@alexandru
Member

alexandru commented Aug 5, 2020

@ctoomey btw, thanks for the contribution, looking good; I just dropped a suggestion on the signature. Will try and take it for a test drive.

@Avasil don't hold the merge on my account 🙂

@ctoomey
Contributor Author

ctoomey commented Aug 6, 2020

CI failed due to an unrelated monix.catnap.ConcurrentChannelJVMParallelism8Suite test failure. Would somebody please re-trigger it?

@ctoomey
Contributor Author

ctoomey commented Aug 7, 2020

Another unrelated test failure. Please re-run the CI, thanks.

@Avasil
Collaborator

Avasil commented Aug 7, 2020

I'm not sure about ConcurrentChannelJVMParallelism8Suite, but I know about InputObservableSuite; I'll fix it over the weekend and review your PR. :)

@ctoomey
Contributor Author

ctoomey commented Aug 7, 2020

Thanks @Avasil, looking forward to having this merged and released. Any idea when you'll do the next release?

@Avasil
Collaborator

Avasil commented Aug 7, 2020

I wanted to do the next release with #1213, but that might take a while, so we could do something earlier.
We can also release a snapshot at any time.

@ctoomey
Contributor Author

ctoomey commented Aug 7, 2020

It'd be great to have a release soon after this PR is merged, if that's not too much trouble.

Collaborator

@Avasil Avasil left a comment

It's looking good @ctoomey! Thank you for the effort, and sorry for the slow responses; I've been away for the last two weeks.

I'm wondering how strict we should be about prioritizing sources, e.g.:

val source =
  Observable.mergePrioritizedList((2, Observable(1, 1, 1, 1, 1, 1)), (1, Observable(2, 2, 2, 2, 2, 2)))

val f = source.toListL.runToFuture

All elements are available immediately for both sources; should it always return List(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)?

BTW @AlecZorab Do you think the issue you mentioned is fixed now? It seems like the comment was deleted


// MUST BE synchronized by `lock`
def signalOnNext(): Future[Ack] =
lock.synchronized {
Collaborator

Redundant synchronized: the comment already says it must be synchronized, and that is done in onNext.

Contributor Author

Fixed

if (isDone) Stop else out.onNext(a)
}

def processNext(): Future[Ack] = {
Collaborator

needs a comment: // MUST BE synchronized by `lock`

Contributor Author

Fixed

@@ -6056,6 +6056,19 @@ object Observable extends ObservableDeprecatedBuilders {
}
}

/** Given a sequence of priority/observable pairs, combines them
Collaborator

Might be good to mention whether a higher value means higher priority.

Contributor Author

Fixed

@AlecZorab

I'm not sure if the issue I spotted is fixed or not.

It looks like signalOnComplete can still trigger before the final message gets processed, but maybe there's some subtlety to the execution model that I'm missing?

The example that springs to mind:

An observable calls onNext with its final value; there are higher-priority values waiting, so it gets put into the pqueue. The observable is now complete, so it calls onComplete, triggering signalOnComplete. This can't acquire the lock, so it blocks. In the async case of signalOnNext, there's now a race where the waiting signalOnComplete can enter the lock before the value processing happens.

But maybe the system invariants forbid that from happening?

@ctoomey
Contributor Author

ctoomey commented Aug 10, 2020

I'm wondering how strict we should be about prioritizing sources.

Good question. I thought more about this and how to document it, and came up with the following doc update, which will be in the next commit. One improvement this triggered: I changed the order of subscribing to the sources from the given order to prioritized order, so that if multiple sources have items immediately available, the highest-priority ones come through first.

  /** Given a sequence of priority/observable pairs, combines them into a new
    * observable that eagerly emits source items downstream as soon as demand is
    * signaled, choosing the item from the highest priority (greater numbers
    * mean higher priority) source when items from multiple sources are
    * available. If items are available from multiple sources with the same
    * highest priority, one of them is chosen arbitrarily.
    *
    * Source items are buffered only to the extent necessary to accommodate
    * backpressure from downstream, and thus if only a single item is available
    * when demand is signaled, it will be emitted regardless of priority.
    *
    * Backpressure is propagated from downstream to the source observables, so
    * that items from a given source will always be emitted downstream in the
    * same order as received from the source, and at most a single item from a
    * given source will be in flight at a time.
    */

In your example, though, the proposed ordering is not guaranteed: the sequencing of the sources' onNext calls is non-deterministic, and because items are emitted downstream eagerly, so is the item ordering.

@Avasil
Collaborator

Avasil commented Aug 10, 2020

Note that the normal Observable.merge uses an underlying back-pressured buffer. An implementation using bounded buffers for a prioritized merge could be more complicated, but I'd like us to give it a try.

So if your implementation isn't back-pressured, that is fine for now, but I'd like to know whether you'd be interested in exploring an alternative using bounded buffers afterwards, to make this safer. I can provide some ideas if you're interested.

Could we provide the current implementation with an API that would allow us to add customizable back-pressure in a later release?
If I remember correctly, @ctoomey's implementation supports OverflowStrategy.BackPressure(1), which ironically is not yet possible for other operators. :D Though even if we add a new parameter with that default, it won't be consistent with other operators, where it is os: OverflowStrategy[B] = OverflowStrategy.Default, and I need to double-check whether it's binary compatible.

case Continue =>
isDone = true
out.onComplete()
completePromises()
Collaborator

Regarding the issue:

The key is that so long as there are items the downstream hasn't yet processed, lastAck will remain a Future[Ack], and signalOnComplete won't call out.onComplete until that future completes, i.e., downstream has finished processing all pending items including the last one from the last source.

Take the simplest case of your example, two sources with one item each:

Source A calls onNext, item gets sent downstream: lastAck = Future[Ack for A's item]
Source A calls onComplete: completedCount = 1
Source B calls onNext, item gets queued: lastAck = Future[Ack for A's item].flatMap(Future[Ack for B's item])
Source B calls onComplete: completedCount = 2; lastAck.onComplete { ... out.onComplete }

Looking at the contract: https://monix.io/docs/3x/reactive/observers.html#contract

  1. Back-pressure for onComplete and onError is optional: when calling onComplete or onError you are not required to wait on the Future[Ack] of the previous onNext.

It implies that Source B could call:

out.onNext(lastElem)
out.onComplete()

In that case, we could send remaining elements from the queue to the downstream instead of completing each promise with Stop.

Contributor Author

In that case, we could send remaining elements from the queue to the downstream instead of completing each promise with Stop.

So long as the downstream returns Continue and none of the upstreams call onError, the downstream will always get all the items emitted by the upstreams. That's what the "should push all items downstream before calling onComplete" test is verifying. Again:

The key is that so long as there are items the downstream hasn't yet processed, lastAck will remain a Future[Ack], and signalOnComplete won't call out.onComplete (or completePromises) until that future completes, i.e., downstream has finished processing all pending items including the last one from the last source.
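The lastAck chaining described above can be modeled with plain scala.concurrent Futures. A toy sketch (ToyMerge, Ack, Continue, and Stop here are hypothetical stand-ins, not the Monix types or the PR's implementation): every signalOnNext appends to the lastAck chain via flatMap, so a completion that waits on lastAck cannot fire downstream before every pending item has been processed.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Toy stand-ins for Monix's Ack types.
sealed trait Ack
case object Continue extends Ack
case object Stop extends Ack

// Toy model of the lastAck chaining; not the Monix implementation.
final class ToyMerge[A](emit: A => Unit) {
  private[this] val lock = new AnyRef
  private[this] var lastAck: Future[Ack] = Future.successful(Continue)

  // Each upstream item is appended to the ack chain, so items are
  // processed strictly in sequence, one in flight at a time.
  def signalOnNext(item: A): Unit = lock.synchronized {
    lastAck = lastAck.flatMap {
      case Continue => emit(item); Future.successful(Continue)
      case Stop     => Future.successful(Stop)
    }
  }

  // Downstream completion only fires after the whole chain has resolved,
  // i.e. after the last pending item has been processed.
  def signalOnComplete(complete: () => Unit): Future[Unit] =
    lock.synchronized(lastAck).map(_ => complete())
}
```

Under this model, calling signalOnComplete right after the final signalOnNext cannot overtake the item: the completion callback is sequenced after the item's ack in the chain.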

@alexandru
Member

Could we provide the current implementation with an API that would allow us to add customizable back-pressure in a later release?

If I remember correctly, @ctoomey's implementation supports OverflowStrategy.BackPressure(1), which ironically is not yet possible for other operators. :D Though even if we add a new parameter with that default, it won't be consistent with other operators, where it is os: OverflowStrategy[B] = OverflowStrategy.Default, and I need to double-check whether it's binary compatible.

We can't add an extra parameter, if that's what you mean. But should we add a buffered alternative, we can always add an overload, with the same name or a different one.

@alexandru
Member

Note, I'm fine with the BackPressure(1) behavior; that's valuable in itself.

If some buffering is desired, however, it would probably be incorrect to add it after the merge, because you could end up buffering lower-priority items and keeping the high-priority items from being processed.

So I'd like us to open an issue for a buffered variant too. I'm thinking that each priority could have its own buffer, and a loop would then select between them, cycling from high priority to low priority.
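The per-priority-buffer idea can be sketched with stdlib queues. An illustrative toy only (the PriorityBuffers name is made up, and this is neither concurrent nor back-pressured, as a real operator would need to be): one queue per priority level, with the selection loop always draining the highest non-empty priority first.

```scala
import scala.collection.mutable

// Illustrative sketch: one buffer per priority level, and a selection
// loop that cycles from high priority to low. A real operator would need
// concurrent queues and back-pressure on each buffer.
final class PriorityBuffers[A](priorities: Seq[Int]) {
  private[this] val buffers: Map[Int, mutable.Queue[A]] =
    priorities.distinct.map(p => p -> mutable.Queue.empty[A]).toMap

  // Priorities ordered from highest to lowest for the selection loop.
  private[this] val ordered: Seq[Int] =
    priorities.distinct.sorted(Ordering[Int].reverse)

  def offer(priority: Int, a: A): Unit = buffers(priority).enqueue(a)

  // Pick the next item from the highest-priority non-empty buffer.
  def poll(): Option[A] =
    ordered.iterator.map(buffers).find(_.nonEmpty).map(_.dequeue())
}
```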

@alexandru
Member

... speaking of this, this is how RxJava built a kick-ass merge operation, last time I checked: they created multiple buffers (concurrent queues), assigned per merged observable, to reduce contention on the producer side. This isn't very relevant to this PR; I'm mentioning it to keep in mind for the future.

Note: I enabled GitHub Actions, so a "stable" "snapshot" version will be published as soon as this PR is merged.

throw e
}
}
}
Contributor Author

Added the above logic to ensure that we complete the upstream onNext promises when the downstream stops or errors early while we still have pending upstream items. Added tests to verify as well.

Collaborator

We could use our syncOnStopOrFailure syntax here; it should be more efficient.

Contributor Author

Much nicer, thanks.

@ctoomey
Contributor Author

ctoomey commented Aug 11, 2020

So I'd like us to open an issue for a buffered variant too. I'm thinking that each priority can have its own buffer and then a loop would go and select between them, cycling from high priority to low priority.

Created #1227

Note: I enabled GitHub Actions, so a "stable" "snapshot" version will be published as soon as this PR is merged.

Great, thanks @alexandru

@AlecZorab

So, having done some println debugging, I'm convinced that the issue I highlighted isn't an issue any more. The subtlety in the implementation that I missed is in the way the chaining of futures works in signalOnNext. For the early completion I was thinking about to occur, you'd have to have an item in the pqueue waiting to be sent while also having lastAck == Continue, which isn't a reachable state.

@ctoomey, my apologies for taking so long to understand what you were talking about


def unsafeSubscribeFn(subscriber: Subscriber[A]): Cancelable = {
val onNext = subscriber.onNext(elem)
onNext.onComplete(_ => subscriber.onComplete())


Since it's allowed by the contract, it might be a better test to call onComplete immediately rather than waiting for onNext to complete. Having tried it, the test still passes, so it's really just a matter of making it a little more stringent.

Contributor Author

Done

@alexandru alexandru changed the base branch from master to series/3.x August 11, 2020 18:14
@ctoomey
Contributor Author

ctoomey commented Aug 11, 2020

So, having done some println debugging, I'm convinced that the issue I highlighted isn't an issue any more. The subtlety in the implementation that I missed is in the way the chaining of futures works in signalOnNext. For the early completion I was thinking about to occur, you'd have to have an item in the pqueue waiting to be sent while also having lastAck == Continue, which isn't a reachable state.

@ctoomey, my apologies for taking so long to understand what you were talking about

No problem, good to be sure for this kind of thing.

@ctoomey
Contributor Author

ctoomey commented Aug 11, 2020

@Avasil will you please merge if there's nothing else? Thanks

@alexandru
Member

@Avasil at this point I think we should just merge, and let @ctoomey test with the published version. If there's anything wrong, I'm sure he'll be around to help.

@ctoomey
Contributor Author

ctoomey commented Aug 11, 2020

@Avasil at this point I think we should just merge, and let @ctoomey test with the published version. If there's anything wrong, I'm sure he'll be around to help.

For sure

@Avasil Avasil merged commit 1a5b9b1 into monix:series/3.x Aug 11, 2020
@Avasil
Collaborator

Avasil commented Aug 11, 2020

Awesome, I've merged into series/3.x, but I'm not sure whether there's an automatic release now?

@Avasil
Collaborator

Avasil commented Aug 11, 2020

And how do we keep consistency between series: should we open a PR from this branch targeting 4.x?

alexandru pushed a commit that referenced this pull request Aug 12, 2020
* 1202: prioritized merge

* 1202: prioritized merge

* 1202: prioritized merge

* Fix unrelated InputStreamObservableSuite bug that failed build

* 1202: prioritized merge

* 1202: prioritized merge

* 1202: prioritized merge

* 1202: prioritized merge

* 1202: prioritized merge

* 1202: prioritized merge

* 1202: prioritized merge

* 1202: prioritized merge

* 1202: prioritized merge
@alexandru
Member

alexandru commented Aug 12, 2020

Awesome, I've merged into series/3.x, but I'm not sure whether there's an automatic release now?

It didn't. I introduced a parameter in our secrets, called AUTO_PUBLISH_SERIES_3X, and it was set to false. I've set it to true and am re-running the build; I'll let you know when it's ready.

And how do we keep consistency between series: should we open a PR from this branch targeting 4.x?

Unfortunately we have to start making incompatible changes, and maintaining two branches is inevitable, because the 4.x release will not be stable enough to ship for a couple of months.

So yes, new developments that are compatible with Monix 3.x should target the series/3.x branch, which is now the main branch, and then we can back-port the changes. Fortunately we have been using "squash" when merging, keeping our history linear. So I just did this on the series/4.x branch:

git cherry-pick 1a5b9b1b41eaa2a1e955302e741a6daab1ff73f6

In this case it applied cleanly, so no PR was needed; I just committed it. The author could do that too, in which case a PR would be needed. A PR would also be needed in case of conflicts.

@alexandru
Member

We have version 3.2.2+37-1a5b9b1b published.

@ctoomey
Contributor Author

ctoomey commented Aug 12, 2020

We have version 3.2.2+37-1a5b9b1b published.

@alexandru the monix-internal-jctools artifact is missing:

[info] downloading https://repo1.maven.org/maven2/io/monix/monix_2.12/3.2.2+37-1a5b9b1b/monix_2.12-3.2.2+37-1a5b9b1b.jar ...
[info] downloading https://repo1.maven.org/maven2/io/monix/monix-catnap_2.12/3.2.2+37-1a5b9b1b/monix-catnap_2.12-3.2.2+37-1a5b9b1b.jar ...
[info] downloading https://repo1.maven.org/maven2/io/monix/monix-eval_2.12/3.2.2+37-1a5b9b1b/monix-eval_2.12-3.2.2+37-1a5b9b1b.jar ...
[info] downloading https://repo1.maven.org/maven2/io/monix/monix-reactive_2.12/3.2.2+37-1a5b9b1b/monix-reactive_2.12-3.2.2+37-1a5b9b1b.jar ...
[info] downloading https://repo1.maven.org/maven2/org/typelevel/cats-effect_2.12/2.1.4/cats-effect_2.12-2.1.4.jar ...
[info] downloading https://repo1.maven.org/maven2/io/monix/monix-tail_2.12/3.2.2+37-1a5b9b1b/monix-tail_2.12-3.2.2+37-1a5b9b1b.jar ...
[info] downloading https://repo1.maven.org/maven2/io/monix/monix-java_2.12/3.2.2+37-1a5b9b1b/monix-java_2.12-3.2.2+37-1a5b9b1b.jar ...
[info] downloading https://repo1.maven.org/maven2/io/monix/monix-execution_2.12/3.2.2+37-1a5b9b1b/monix-execution_2.12-3.2.2+37-1a5b9b1b.jar ...
[info] 	[SUCCESSFUL ] io.monix#monix_2.12;3.2.2+37-1a5b9b1b!monix_2.12.jar (393ms)
[info] 	[SUCCESSFUL ] io.monix#monix-java_2.12;3.2.2+37-1a5b9b1b!monix-java_2.12.jar (776ms)
[info] 	[SUCCESSFUL ] io.monix#monix-catnap_2.12;3.2.2+37-1a5b9b1b!monix-catnap_2.12.jar (1109ms)
[info] 	[SUCCESSFUL ] io.monix#monix-tail_2.12;3.2.2+37-1a5b9b1b!monix-tail_2.12.jar (1255ms)
[info] 	[SUCCESSFUL ] org.typelevel#cats-effect_2.12;2.1.4!cats-effect_2.12.jar (1354ms)
[info] 	[SUCCESSFUL ] io.monix#monix-eval_2.12;3.2.2+37-1a5b9b1b!monix-eval_2.12.jar (1358ms)
[info] 	[SUCCESSFUL ] io.monix#monix-execution_2.12;3.2.2+37-1a5b9b1b!monix-execution_2.12.jar (1365ms)
[info] 	[SUCCESSFUL ] io.monix#monix-reactive_2.12;3.2.2+37-1a5b9b1b!monix-reactive_2.12.jar (1728ms)
[warn] 	::::::::::::::::::::::::::::::::::::::::::::::
[warn] 	::          UNRESOLVED DEPENDENCIES         ::
[warn] 	::::::::::::::::::::::::::::::::::::::::::::::
[warn] 	:: io.monix#monix-internal-jctools_2.12;3.2.2+37-1a5b9b1b: not found
[warn] 	::::::::::::::::::::::::::::::::::::::::::::::
[warn] 
[warn] 	Note: Unresolved dependencies path:
[warn] 		io.monix:monix-internal-jctools_2.12:3.2.2+37-1a5b9b1b (/Users/christoomey/unix/liv/service-template/api/build.sbt#L40-75)
[warn] 		  +- io.monix:monix-execution_2.12:3.2.2+37-1a5b9b1b (/Users/christoomey/unix/liv/service-template/api/build.sbt#L40-75)
[warn] 		  +- io.monix:monix-catnap_2.12:3.2.2+37-1a5b9b1b (/Users/christoomey/unix/liv/service-template/api/build.sbt#L40-75)
[warn] 		  +- io.monix:monix-eval_2.12:3.2.2+37-1a5b9b1b (/Users/christoomey/unix/liv/service-template/api/build.sbt#L40-75)
[warn] 		  +- io.monix:monix-reactive_2.12:3.2.2+37-1a5b9b1b (/Users/christoomey/unix/liv/service-template/api/build.sbt#L40-75)
[warn] 		  +- io.monix:monix_2.12:3.2.2+37-1a5b9b1b (/Users/christoomey/unix/liv/service-template/api/build.sbt#L40-75)

@alexandru
Member

@ctoomey sorry, I updated JCTools, and in the process I ended up "shading" it but forgot to "aggregate" it, so it didn't get published.

We're fixing it.

@alexandru
Member

@ctoomey try 3.2.2+41-615286ba

@ctoomey
Contributor Author

ctoomey commented Aug 13, 2020

Thanks @alexandru, looks good now.

@Avasil Avasil added this to the 3.3.0 milestone Sep 13, 2020