ISPN-5329 Reduce number of allocations #3342

rvansa · 2015-03-26T08:33:53Z

I have analyzed allocations during put() (into local cache) and removed some of the low-hanging fruits. Some of those could be already handled by escape analysis, but most of them cannot, IMO. Though, the difference between performance before and after was within the margin of error of the benchmark.

You can find both the benchmark and Byteman tooling for analysis at https://github.com/rvansa/benchmarks/tree/master/ispntest

galderz · 2015-03-26T11:43:42Z

Looking into this... do you have some numbers/data to compare what allocation rates were like before the fix and after?

rvansa · 2015-03-26T11:47:39Z

@galderz I have attached those data to JIRA https://issues.jboss.org/browse/ISPN-5329 - I'll yet add the rates before the PR.

wburns · 2015-03-26T12:28:48Z

core/src/main/java/org/infinispan/util/concurrent/locks/OwnableReentrantLock.java

@@ -40,6 +42,7 @@ protected final Object currentRequestor() {
   }

   private void setCurrentRequestor(Object requestor) {
+      assert requestorOnStack.get() == null;


afaik We don't enable runtime assertions.

During tests, assertions are enabled. Of course, in production this would not be called (and that's the idea!)

I was unaware that we enabled them on tests. But this still seems like maybe we should be throwing an IllegalStateException if this occurs ?

I don't think this can really happen, and I don't want to read ThreadLocal when I don't have to. Assert does not hurt.

But assert does hurt, if it slows down the test suite :)

I'd remove the assert altogether, and maybe add a TL leak test for OwnableReentrantLock. The TL is only used internally, so there's no point to check it every single time.

I see some other potential optimizations here:

Only set/unset the TL in lock(Object) if the CAS operation fails.

unlock(Object) shouldn't need the TL at all.

And a possible problem: lockInterruptibly() tryLock() should throw an UnsupportedOperationException, because they can't set the owner.

rvansa · 2015-03-26T12:53:42Z

Updated JIRA with pre and post allocation data.

wburns · 2015-03-26T12:55:39Z

...main/java/org/infinispan/util/concurrent/locks/containers/AbstractPerEntryLockContainer.java

@@ -137,4 +114,36 @@ public String toString() {
            "locks=" + locks +
            '}';
   }
+
+   private class Locker implements EquivalentConcurrentHashMapV8.BiFun<Object, L, L> {


Does this really help with allocation rates ? I am more curious for myself.

I could not get results precise enough to be able to compare these two implementations - in my benchmarks, Infinispan was sometimes suddenly faster for a brief period of time without any obvious reason, spoiling the results.
However, by merging the function and value returned by reference, you allocate only one object instead of two. I can't tell whether this would be eliminated by compiler anyway.

rvansa · 2015-03-31T08:46:05Z

@wburns @pruivo Updated PR

pruivo · 2015-03-31T09:40:27Z

commons/src/main/java/org/infinispan/commons/collections/MiniSet.java

+            if (holder == null) throw new IllegalArgumentException();
+            type = Type.SINGLE;
+        } else if (entries.length < ARRAY_SIZE) {
+            Object[] array = new Object[ARRAY_SIZE];


why not using add()? like

holder = new Object[ARRAY_SIZE]; type = Type.ARRAY; for (T entry : entries) add(entry);

the code will be simplified and the performance would be the same. WDYT?

you could use the same thing for collection constructor and use addAll()

rvansa · 2015-03-31T10:28:02Z

@pruivo You're right, I've replaced the calls with add/addAll. And thanks for spotting the other bugs; I was considering adding guava testlib as test dependency so that I could test those collections properly, but I got some problems with that (and haven't spent much time with it, hoping that the general testsuite will catch bugs anyway - apparently it can't). I don't want to write standard collection test myself.

pruivo · 2015-03-31T13:13:58Z

@rvansa one last comment... the iterators should in MiniSet should check for modifications. If the type of the set moves (eg. from SINGLE -> ARRAY or from ARRAY -> MAP) you will get a bunch of weird exceptions (ClassCastException) or results

rvansa · 2015-04-01T06:55:18Z

@pruivo I didn't want to burden it with another field for modcount (MiniList gets that inherited from AbstractList, AbstractSet does not contain it), but if you think that it's beneficial, I can go for that.

pruivo · 2015-04-01T10:09:58Z

@rvansa well, if you don't want to add the modcout, at least check if the MiniSet type is the same as the Iterator type...

and it would be good to see what happen if the type of the MiniSet changes during the iteration:

Set<String> set = new MiniSet<>("single-value");
Iterator<String> iterator = set.iterator();
set.add("another-value");
assertTrue(iterator.hasNext())
assertEquals("single-value", iterator.next());

and a similar with the array => map change

danberindei · 2015-04-02T13:46:53Z

@rvansa The data in the JIRA only mentions the number of allocations, not the actual performance... And speaking of the number of allocations, it's pretty hard to read the numbers in the issue, maybe you change the format to something more diff-like? I used this:

diff -u <(sort -k1.20 before.txt| sed 's/(.*)//') <(sort -k1.20 after.txt | sed 's/(.*)//')

TBH I'm not convinced the additional branching in MiniList is worth the memory savings, as it makes the job of the CPU (and maybe the JIT) harder.

The MiniSet benefit seems clearer, because HashSet needs so many allocations, but here too I think a simpler implementation with only two storage options (Object[] and HashMap) may be better.

For both of them, I'm not sure the type field is needed. Since you can't really put an Object[] in the cache we can just say these collections don't work with Object[].

The synchronized versions look good, although they probably won't be as useful once we baseline on Java 9: http://openjdk.java.net/jeps/169 ;)

danberindei · 2015-04-02T13:55:58Z

I also agree with @pruivo that the collections need at least some minimal tests. The core testsuite may expose bugs by itself, but why make you life harder and rely on random test failures when this should be so much easier to test than almost everything else in Infinispan? :)

Sanne · 2015-04-03T21:48:23Z

Great initiative 👍

Dan, asking to measure results as direct performance impact is very hard. All we know is that in a full system (which does many other things and runs lots of user code too) you'll generally end up in trouble with memory because of Infinispan using it all/most up for himself.
You might not be able to measure this at all by just measuring Infinispan; the problem is crossing the threshold of how much is sensible to allocate in terms of temporary objects. As long as you don't cross the threshold, it might look like (almost) free to allocate some temporary helpers, so like making a defensive copy of a Map might look like "free". But when doing such things extensively, the system falls over.

At that point the memory bandwidth is saturated, and the CPU will be mostly sitting idle. Ironically, that means that making some code less efficient will not be measured by any benchmark either, as you'll be putting to better use some CPU cycles which are otherwise wasted in idle waits for main memory, or worse trigger additional context switching exacerbating the problem of unpredictable performance per operation.

It's measurable that there is such a problem of excess heap utilization as it's pretty easy to cross all these boundaries even when using Infinispan with a pretty well written client application. The solution is to keep working on these patches, however we'll need many of these before any difference can be measured.

That said, @rvansa make sure you don't focus just on instance numbers but prioritize based on size of instances? I'm not sure how you're measuring, but the JIRA mentions that among others you're ignoring arrays.
I've sent an email many months ago highlighting the urgency of getting rid of some Collections altogether (not just optimise them), as I had used several tools (among others flight recorder and java object layout), these can also help you to identify the biggest offenders.

Sanne · 2015-04-03T21:58:06Z

Well looks like that was 16+ months ago when I wrote about those tools...!
http://lists.jboss.org/pipermail/infinispan-dev/2013-December/014311.html

But I fixed it only recently:

https://issues.jboss.org/browse/ISPN-5032

Combined with this much older one, that saved approx. 130GB / second of pointless allocations on our benchmark:

https://issues.jboss.org/browse/ISPN-1678

Sanne · 2015-04-03T22:11:09Z

+1 to add thourough unit tests BTW

Couldn't MiniList be replaced by LinkedList? If you expect the lists to be small, that's not much overhead over an array. Certainly all the System.arraycopy you have to perform on add/remove will defeat the point of not polluting heap with small objects.
Also, in the smallest case I think LinkedList would perform not worse than this new implementation.

Generally speaking, you're replacing some concurrent sets with "mini" concurrent sets. Can you 100% guarantee that these use cases are small? Maybe at least checking initialCapacity is in order?

danberindei · 2015-04-08T11:24:03Z

@Sanne I have to disagree with you here...

When optimizing, there are clean-cut cases where we can just stop doing unnecessary stuff, but most of the time we also have to do something else instead. We can guess beforehand that that "something else" is going to be faster than the old stuff beforehand, but we also need to test that assumption after making our changes. And if we can't write a single test that shows an improvement, we have to accept that our initial assumption was wrong.

To use your LinkedList suggestion as an example... how can we possibly decide which is better, MiniList or LinkedList? Is it true for all tx sizes, or is there an "unreasonable" tx size threshold that we will ignore in our tests? My personal theory is that a specialized singly-linked list that doesn't support null values or removing elements should be the fastest option for small txs, but who knows...

Sanne · 2015-04-10T11:59:29Z

@danberindei yea you're right, I overreacted a bit to highlight the other point of view. Just making the point that it's not easy to measure at all.

Sometimes one has to simply trust it, but I agree only if the difference can be "read" in the source code. As in the LinkedList example, I'm not sure of that.

To find out though, one would need a micro benchmark comparing just the List implementation behaviour and focusing on memory allocation costs only, I don't see how one could measure the difference in a full-stack Infinispan benchmark.
One would also want to see the computational difference in such a benchmark, but I'd argue that we know enough of Infinispan to know that if the memory benefit is significant, a small cost (if it's small) should be worth it.

What concerns me the most here is that some collections are replaced with "mini" implementations, but I'm not seeing strong evidence that those collections will really be small in practice.

rvansa · 2015-04-10T14:06:12Z

@Sanne I can't tell you how many writes will the user usually do in one transaction. However, MiniSet set shouldn't be ever worse than HashSet (which delegates to internal HashMap anyway) and MiniList shouldn't be worse than ArrayList (for directly implemented operations), besides single switch.
I've seen similar optimizations elsewhere, therefore I've implemented this PR.

I am not objecting the fact that I should add the collections tests. And regarding performance - I've tried to do full-stack comparison of puts and gets, but for some reason I had much faster outliers sometimes so I don't trust the results :-/

danberindei · 2015-04-10T18:04:43Z

@rvansa You can always ignore the outliers :) What test were you using?

rvansa · 2015-04-11T18:08:50Z

@danberindei JMH, doing puts and gets against local cache. Code here https://github.com/rvansa/benchmarks/tree/master/ispntest

@Sanne And regarding LinkedList, the idea of MiniList was to replace allocating another instance (Node) with having that single object directly in the list. But I appreciate your note that size of the object is what matters rather than number of instances, I should bear that in mind, too.

galderz · 2015-04-16T15:13:13Z

Needs rebasing

* replaced HashSet, ArrayList, LinkedList and Collections.synchronizedXXX with memory-optimized collections

* AtomicFieldUpdater instead of AtomicInteger * replaced ByRef with field in invoked function

* default metadata don't create copy * EmbeddedMetadata$Builder is not used when not necessary

rvansa · 2015-05-05T17:55:19Z

Added the additional check as recommended by @pruivo and rebased. Beyond that, I ran JMH benchmark few days ago and I can see little improvement in writes (testPutImplicit inserts single entry, possibly using auto-commit transaction with transactional cache):
https://gist.github.com/rvansa/ba6195ff6ebecfdf6011

rvansa · 2015-05-11T08:54:50Z

Temporarily closing.

galderz added the Ready for Review label Mar 26, 2015

wburns reviewed Mar 26, 2015
View reviewed changes

rvansa force-pushed the t_5329 branch from 6c6b24f to 532b40c Compare March 31, 2015 08:44

pruivo reviewed Mar 31, 2015
View reviewed changes

rvansa force-pushed the t_5329 branch from 532b40c to b347666 Compare March 31, 2015 10:27

galderz added Needs Rebase and removed Ready for Review labels Apr 16, 2015

rvansa force-pushed the t_5329 branch from b347666 to 3a7d637 Compare May 5, 2015 17:20

rvansa added 3 commits May 5, 2015 13:21

ISPN-5329 Reduce number of allocations

1544eb3

* replaced HashSet, ArrayList, LinkedList and Collections.synchronizedXXX with memory-optimized collections

ISPN-5329 Reduce number of allocations

0a297d9

* AtomicFieldUpdater instead of AtomicInteger * replaced ByRef with field in invoked function

ISPN-5329 Reduce number of allocations

6406512

* default metadata don't create copy * EmbeddedMetadata$Builder is not used when not necessary

rvansa force-pushed the t_5329 branch from 3a7d637 to 6406512 Compare May 5, 2015 17:49

rvansa closed this May 11, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ISPN-5329 Reduce number of allocations #3342

ISPN-5329 Reduce number of allocations #3342

rvansa commented Mar 26, 2015

galderz commented Mar 26, 2015

rvansa commented Mar 26, 2015

wburns Mar 26, 2015

rvansa Mar 26, 2015

wburns Mar 26, 2015

rvansa Mar 26, 2015

danberindei Apr 2, 2015

danberindei Apr 2, 2015

rvansa commented Mar 26, 2015

wburns Mar 26, 2015

rvansa Mar 26, 2015

rvansa commented Mar 31, 2015

pruivo Mar 31, 2015

pruivo Mar 31, 2015

rvansa commented Mar 31, 2015

pruivo commented Mar 31, 2015

rvansa commented Apr 1, 2015

pruivo commented Apr 1, 2015

danberindei commented Apr 2, 2015

danberindei commented Apr 2, 2015

Sanne commented Apr 3, 2015

Sanne commented Apr 3, 2015

Sanne commented Apr 3, 2015

danberindei commented Apr 8, 2015

Sanne commented Apr 10, 2015

rvansa commented Apr 10, 2015

danberindei commented Apr 10, 2015

rvansa commented Apr 11, 2015

galderz commented Apr 16, 2015

rvansa commented May 5, 2015

rvansa commented May 11, 2015

ISPN-5329 Reduce number of allocations #3342

ISPN-5329 Reduce number of allocations #3342

Conversation

rvansa commented Mar 26, 2015

galderz commented Mar 26, 2015

rvansa commented Mar 26, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rvansa commented Mar 26, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rvansa commented Mar 31, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rvansa commented Mar 31, 2015

pruivo commented Mar 31, 2015

rvansa commented Apr 1, 2015

pruivo commented Apr 1, 2015

danberindei commented Apr 2, 2015

danberindei commented Apr 2, 2015

Sanne commented Apr 3, 2015

Sanne commented Apr 3, 2015

Sanne commented Apr 3, 2015

danberindei commented Apr 8, 2015

Sanne commented Apr 10, 2015

rvansa commented Apr 10, 2015

danberindei commented Apr 10, 2015

rvansa commented Apr 11, 2015

galderz commented Apr 16, 2015

rvansa commented May 5, 2015

rvansa commented May 11, 2015