Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISPN-9812 Streaming Cluster Publisher #7523

Merged
merged 1 commit into from Nov 15, 2019

Conversation

wburns
Copy link
Member

@wburns wburns commented Oct 31, 2019

@wburns wburns added the Preview label Oct 31, 2019
@wburns
Copy link
Member Author

wburns commented Oct 31, 2019

I still have to organize some of the classes and add more docs for the internal classes. This is still in Preview as only core tests should be pretty much passing.

@wburns
Copy link
Member Author

wburns commented Oct 31, 2019

There is a random failure that I can't seem to isolate or get a trace from that I have been still trying to track down. It seems to only happen when running without trace and always seems to fail two different tests. So I am not sure if it is related to running those in parallel somehow.

@wburns
Copy link
Member Author

wburns commented Oct 31, 2019

I have not deleted the old distributed stream classes, but that will be the next PR and it should remove needless to say a lot of lines of code (more than was added for this I believe).

@wburns wburns force-pushed the ISPN-9812_cluster_publisher branch from d8c281c to 9cd4444 Compare October 31, 2019 03:14
@wburns
Copy link
Member Author

wburns commented Oct 31, 2019

I was able to squash the remaining intermittent failure I could find. Was caused by the async closure of a publisher was too slow and a failover could request again with the same id. Now it properly overwrites an old state if it is present and complete (not completed still throws exception).

@wburns
Copy link
Member Author

wburns commented Oct 31, 2019

Looks like I need to tweak some query stuff to no longer return a null value :)

@wburns wburns force-pushed the ISPN-9812_cluster_publisher branch from a8dd157 to 5bdccf9 Compare October 31, 2019 13:55
@wburns
Copy link
Member Author

wburns commented Oct 31, 2019

Query should hopefully be fixed now too :)

@wburns
Copy link
Member Author

wburns commented Oct 31, 2019

CI is looking pretty good - it appears my segment completion isn't quite on par yet though

https://ci.infinispan.org/job/Infinispan/job/PR-7523/4/testReport/junit/org.infinispan.client.hotrod.impl.iteration/DistFailOverRemoteIteratorTest/testFailOver/

@wburns
Copy link
Member Author

wburns commented Nov 1, 2019

CI appears to be passing just fine now. Just have to add Javadoc and maybe reorg classes slightly.

@wburns wburns force-pushed the ISPN-9812_cluster_publisher branch from 011b1ac to 47750bc Compare November 1, 2019 13:02
@wburns
Copy link
Member Author

wburns commented Nov 1, 2019

Testing with https://github.com/infinispan/infinispan-benchmarks/tree/master/iteration I have anywhere from 13-33% increase in throughput when rehash is disabled. When rehash is not disabled the increase is anywhere from 38-141% in throughput.

@wburns
Copy link
Member Author

wburns commented Nov 1, 2019

Averaging the various runs non rehash had a performance increase of 20.7% and rehash had an increase of 76.4%.

https://docs.google.com/spreadsheets/d/1p3rArTx-lIZ66vALMcuXjxeCW7ivxfCqeYCfluwzQyw/edit?usp=sharing

@wburns wburns changed the title [PREVIEW] ISPN-9812 Streaming Cluster Publisher ISPN-9812 Streaming Cluster Publisher Nov 4, 2019
@wburns wburns removed the Preview label Nov 4, 2019
@wburns
Copy link
Member Author

wburns commented Nov 5, 2019

Some recent changes have introduced a failure, probably a typo. Fixing.

}
segments.forEach(completedSegmentConsumer);
})
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also handle doOnError?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What were you thinking would go in there? We don't want to complete the segments. The error would be propagated to the caller, so the entire Subscription would be cancelled anyways.

@wburns
Copy link
Member Author

wburns commented Nov 6, 2019

Okay flatMap should now be fixed. It can probably be optimized to not have to store the entries in a list by counting the entries and using onComplete but my first pass at that didn't work. So we can leave that for a future endeavor maybe.

@wburns
Copy link
Member Author

wburns commented Nov 6, 2019

Key tracking map/flatMap is better optimized now as it doesn't have to eagerly retrieve the values into a list, which is especially costly for map as it is always 1:1 or 1:0.

@wburns
Copy link
Member Author

wburns commented Nov 6, 2019

Addressed all the current comments as well as fixed another bug with flatMap if there was a large number of entries generated from it.

* can be useful for some optimizations that may need to track produced values from the original.
* @return if the values in the publisher are unchanged
*/
boolean isModified();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs updating to say that is should return true when the function does change the value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me make sure this correct again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now.

log.tracef("Completing last segments of: %s", completedSegments);
listener.accept(completedSegments::iterator);
completedSegments.clear();
}
}

private class RehashIterator<S> extends AbstractIterator<S> implements CloseableIterator<S> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we can now remove this and CompletionListenerRehashIterator

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup that will be handled in https://issues.jboss.org/browse/ISPN-10895. I didn't want to bloat this PR too much.

@wburns
Copy link
Member Author

wburns commented Nov 13, 2019

Rebased and squashed to 1 commit.

@ryanemerson
Copy link
Contributor

@wburns #7591 was merged before I came online, so this needs a rebase. Then I promise I'll merge!

@wburns
Copy link
Member Author

wburns commented Nov 15, 2019

CI found a small gap in partition handling between when a publisher is created and when it is subscribed to. Recheck the partition status after registering with listener to remove gap. I am not sure what those PooledConnectionOperation failures are, but I can't seem to reproduce and it seems like a setup issue more likely as everyone failed consistently.

@wburns
Copy link
Member Author

wburns commented Nov 15, 2019

It looks like I fixed flatMap when rehash is enabled, but non rehash flatMap is still broken :(

I need to actually fix this at the LocalPublisherManager level, which will in turn make the PublisherHandler code simpler.

@wburns
Copy link
Member Author

wburns commented Nov 15, 2019

Non rehash flatMap should work properly now. I just piggy back off the rehash version, but don't return the keys to keep the response size smaller. This can be optimized later though.

@wburns
Copy link
Member Author

wburns commented Nov 15, 2019

Got a chance to think about it and the flatMap with non rehash is now properly optimized. In fact the LocalPublisherManager now should always properly return the segment completed or lost after all values as it was originally envisioned. Unfortunately, KeyPublisherState can't utilize this as it has to track by the original key to a segment, not just the values - so that will have to stay the same.

@ryanemerson ryanemerson added this to the 10.1.0.Beta1 milestone Nov 15, 2019
@ryanemerson
Copy link
Contributor

@wburns The transformer changes have broken SimpleClusterPublisherManagerTest.

@wburns
Copy link
Member Author

wburns commented Nov 15, 2019 via email

@wburns
Copy link
Member Author

wburns commented Nov 15, 2019

Context should be only applied once now.

@ryanemerson ryanemerson merged commit 8684d79 into infinispan:master Nov 15, 2019
@ryanemerson
Copy link
Contributor

Great stuff @wburns, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants