ISPN-5329 Reduce number of allocations #3342
Conversation
Looking into this... do you have some numbers/data comparing allocation rates before and after the fix?

@galderz I have attached that data to the JIRA: https://issues.jboss.org/browse/ISPN-5329 - I still need to add the rates from before the PR.
```diff
@@ -40,6 +42,7 @@ protected final Object currentRequestor() {
   }

   private void setCurrentRequestor(Object requestor) {
      assert requestorOnStack.get() == null;
```
AFAIK we don't enable runtime assertions.
During tests, assertions are enabled. In production this check would not be evaluated (and that's the idea!).
I was unaware that we enable them in tests. But it still seems like we should throw an IllegalStateException if this occurs?
I don't think this can really happen, and I don't want to read the ThreadLocal when I don't have to. The assert doesn't hurt.
But the assert does hurt, if it slows down the test suite :)

I'd remove the assert altogether, and maybe add a ThreadLocal leak test for OwnableReentrantLock. The TL is only used internally, so there's no point in checking it every single time.
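For context, a minimal sketch of the pattern being debated (class and field names here are invented, not Infinispan's actual code): an `assert` on a ThreadLocal invariant is compiled in, but only evaluated when the JVM runs with `-ea`, as the test suite does.

```java
// A minimal sketch (names invented) of an invariant check on a ThreadLocal
// that is compiled in but only evaluated when the JVM runs with -ea.
public class RequestorHolder {
    private final ThreadLocal<Object> requestorOnStack = new ThreadLocal<Object>();

    void setCurrentRequestor(Object requestor) {
        // Skipped entirely without -ea: no ThreadLocal read in production.
        assert requestorOnStack.get() == null : "nested setCurrentRequestor";
        requestorOnStack.set(requestor);
    }

    void unsetCurrentRequestor() {
        requestorOnStack.remove();
    }

    public static void main(String[] args) {
        RequestorHolder h = new RequestorHolder();
        h.setCurrentRequestor("tx1");
        h.unsetCurrentRequestor();
        h.setCurrentRequestor("tx2"); // legal again after remove()
        System.out.println("ok");
    }
}
```

Without `-ea`, the assert costs nothing at runtime; with it, a forgotten `unsetCurrentRequestor()` fails fast in tests.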
I see some other potential optimizations here:

- Only set/unset the TL in `lock(Object)` if the CAS operation fails. `unlock(Object)` shouldn't need the TL at all.

And a possible problem: `lockInterruptibly()` and `tryLock()` should throw an `UnsupportedOperationException`, because they can't set the owner.
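To illustrate the point about owner-less methods (this is an invented sketch, not Infinispan's `OwnableReentrantLock`; reentrancy counting is omitted): when the lock owner is an explicit object rather than the calling thread, the zero-argument `java.util.concurrent.locks.Lock` methods cannot name an owner, hence the suggestion to throw.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch of a lock whose owner is an explicit object rather
// than the calling thread (reentrancy depth deliberately omitted).
public class OwnerLock {
    private final AtomicReference<Object> owner = new AtomicReference<Object>();

    public boolean tryLock(Object requestor) {
        // Uncontended fast path: a single CAS, no ThreadLocal involved.
        return owner.compareAndSet(null, requestor) || owner.get() == requestor;
    }

    public void unlock(Object requestor) {
        owner.compareAndSet(requestor, null);
    }

    public boolean tryLock() {
        // No way to know who the owner should be without an argument.
        throw new UnsupportedOperationException("an explicit owner is required");
    }

    public static void main(String[] args) {
        OwnerLock lock = new OwnerLock();
        System.out.println(lock.tryLock("tx1")); // true
        System.out.println(lock.tryLock("tx2")); // false: owned by tx1
        lock.unlock("tx1");
        System.out.println(lock.tryLock("tx2")); // true
    }
}
```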
Updated the JIRA with pre- and post-fix allocation data.
```diff
@@ -137,4 +114,36 @@ public String toString() {
            "locks=" + locks +
            '}';
   }

   private class Locker implements EquivalentConcurrentHashMapV8.BiFun<Object, L, L> {
```
Does this really help with allocation rates? I'm mostly curious for myself.
I could not get results precise enough to compare the two implementations: in my benchmarks, Infinispan was sometimes suddenly faster for a brief period of time without any obvious reason, which spoiled the results.

However, by merging the function and the value returned by reference into one object, you allocate one object instead of two. I can't tell whether the compiler would have eliminated that allocation anyway.
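A sketch of that single-allocation trick, using the JDK's `BiFunction` in place of the `EquivalentConcurrentHashMapV8.BiFun` backport (all names here are illustrative, not Infinispan's): one object serves as both the remapping function and the holder of the "returned by reference" result, replacing a function plus a separate `ByRef`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// One object is both the compute() function and the result holder:
// a single allocation per call instead of function + ByRef.
public class LockRegistry {
    private final ConcurrentHashMap<String, Object> locks =
            new ConcurrentHashMap<String, Object>();

    private static final class Locker implements BiFunction<String, Object, Object> {
        boolean created; // result carried in a field, not a separate ByRef

        @Override
        public Object apply(String key, Object existing) {
            if (existing != null) {
                return existing; // keep the lock already in the map
            }
            created = true;
            return new Object(); // install a fresh lock
        }
    }

    public boolean acquireCreated(String key) {
        Locker locker = new Locker();
        locks.compute(key, locker);
        return locker.created; // read the "by reference" result
    }

    public static void main(String[] args) {
        LockRegistry registry = new LockRegistry();
        System.out.println(registry.acquireCreated("k")); // true: created
        System.out.println(registry.acquireCreated("k")); // false: reused
    }
}
```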
```diff
         if (holder == null) throw new IllegalArgumentException();
         type = Type.SINGLE;
      } else if (entries.length < ARRAY_SIZE) {
         Object[] array = new Object[ARRAY_SIZE];
```
Why not use add()? Like:

```java
holder = new Object[ARRAY_SIZE];
type = Type.ARRAY;
for (T entry : entries) add(entry);
```

The code would be simpler and the performance should be the same. WDYT?
You could use the same approach for the collection constructor, with addAll().
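A condensed sketch of the suggested style (ARRAY_SIZE and the growth-to-map step are simplified away; this is not the actual MiniSet): initialize the backing storage once, then funnel every constructor entry through `add()`, so there is a single insertion code path to maintain.

```java
// Illustrative sketch: the varargs constructor reuses add() rather than
// duplicating the insertion logic inline.
public class MiniSet {
    private static final int ARRAY_SIZE = 8;
    private final Object[] holder = new Object[ARRAY_SIZE];
    private int size;

    public MiniSet(Object... entries) {
        for (Object entry : entries) {
            add(entry); // same path as post-construction inserts
        }
    }

    public boolean add(Object entry) {
        for (int i = 0; i < size; i++) {
            if (holder[i].equals(entry)) {
                return false; // already present: set semantics
            }
        }
        holder[size++] = entry; // overflow handling elided in this sketch
        return true;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        MiniSet set = new MiniSet("a", "b", "a");
        System.out.println(set.size()); // 2: the duplicate "a" was rejected
    }
}
```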
@pruivo You're right, I've replaced the calls with add/addAll. And thanks for spotting the other bugs! I was considering adding Guava testlib as a test dependency so that I could test these collections properly, but I ran into some problems with it (and haven't spent much time on it, hoping that the general testsuite would catch the bugs anyway - apparently it can't). I don't want to write standard collection tests myself.
@rvansa one last comment... the iterators should be fail-fast.
@pruivo I didn't want to burden it with another field for the modCount (MiniList inherits one from AbstractList; AbstractSet does not contain it), but if you think it's beneficial, I can go for that.
@rvansa well, if you don't want to add the modCount, at least check that the MiniSet's current type is the same as the iterator's type... and it would be good to see what happens if the type of the MiniSet changes during iteration (e.g. the single => array switch), and similarly for the array => map change.
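A sketch of the cheaper check being suggested (the fields and enum here are invented, and only the SINGLE case is iterated): instead of a dedicated modCount, the iterator snapshots the representation type at creation and fails fast if the set has switched forms since.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Fail-fast via a representation-type snapshot instead of a modCount.
public class MiniSetSketch implements Iterable<Object> {
    enum Type { SINGLE, ARRAY }

    Type type = Type.SINGLE;
    Object single;

    @Override
    public Iterator<Object> iterator() {
        final Type startType = type; // remember which form we iterate over
        return new Iterator<Object>() {
            boolean done;

            @Override
            public boolean hasNext() {
                if (type != startType) {
                    throw new ConcurrentModificationException();
                }
                return !done && single != null;
            }

            @Override
            public Object next() {
                if (!hasNext()) throw new NoSuchElementException();
                done = true;
                return single;
            }
        };
    }

    public static void main(String[] args) {
        MiniSetSketch set = new MiniSetSketch();
        set.single = "a";
        Iterator<Object> it = set.iterator();
        set.type = Type.ARRAY; // simulate the single => array switch
        try {
            it.hasNext();
            System.out.println("missed the switch");
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast"); // the iterator noticed
        }
    }
}
```

Note the trade-off: this catches representation switches but, unlike a modCount, not modifications that stay within the same representation.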
@rvansa The data in the JIRA only mentions the number of allocations, not the actual performance... And speaking of the number of allocations, it's pretty hard to read the numbers in the issue; maybe you could change the format to something more diff-like? I used this:
TBH I'm not convinced the additional branching in MiniList is worth the memory savings, as it makes the job of the CPU (and maybe the JIT) harder. The MiniSet benefit seems clearer.

The synchronized versions look good, although they probably won't be as useful once we baseline on Java 9: http://openjdk.java.net/jeps/169 ;)
I also agree with @pruivo that the collections need at least some minimal tests. The core testsuite may expose bugs by itself, but why make your life harder and rely on random test failures, when this should be so much easier to test than almost everything else in Infinispan? :)
Great initiative 👍

Dan, asking to measure the results as a direct performance impact is very hard. All we know is that in a full system (which does many other things and runs lots of user code too) you'll generally end up in trouble with memory, because Infinispan uses all/most of it for itself. At that point the memory bandwidth is saturated, and the CPU will be mostly sitting idle. Ironically, that means that making some code less efficient will not be measured by any benchmark either, as you'll be putting to better use some CPU cycles which are otherwise wasted in idle waits for main memory, or worse, triggering additional context switching, exacerbating the problem of unpredictable performance per operation.

It is measurable that there is such a problem of excess heap utilization, as it's pretty easy to cross all these boundaries even when using Infinispan with a pretty well-written client application. The solution is to keep working on these patches; however, we'll need many of them before any difference can be measured.

That said, @rvansa, make sure you don't focus just on instance counts, but prioritize based on the size of instances. I'm not sure how you're measuring, but the JIRA mentions that, among other things, you're ignoring arrays.
Well, looks like it was 16+ months ago when I wrote about those tools...! But I fixed it only recently: Combined with this much older one, that saved approx. 130 GB/second of pointless allocations on our benchmark:
+1 to adding thorough unit tests, BTW.

Generally speaking, you're replacing some concurrent sets with "mini" concurrent sets. Can you 100% guarantee that these use cases are small? Maybe at least add a check for that?
@Sanne I have to disagree with you here... When optimizing, there are clean-cut cases where we can just stop doing unnecessary stuff, but most of the time we also have to do something else instead. We can guess beforehand that the "something else" is going to be faster than the old stuff, but we also need to test that assumption after making our changes. And if we can't write a single test that shows an improvement, we have to accept that our initial assumption was wrong.

To use your LinkedList suggestion as an example... how can we possibly decide which is better, MiniList or LinkedList? Is it true for all tx sizes, or is there an "unreasonable" tx size threshold that we will ignore in our tests? My personal theory is that a specialized singly-linked list that doesn't support null values or removing elements should be the fastest option for small txs, but who knows...
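A rough sketch of the structure speculated about above (names invented, purely illustrative): a singly-linked, append-only list that rejects nulls and never removes elements, so each node carries just a value and a next pointer and iteration is a plain pointer chase.

```java
// Append-only singly-linked list: no nulls, no removal, minimal per-node state.
public class AppendOnlyList<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private Node<T> head, tail;
    private int size;

    public void add(T value) {
        if (value == null) throw new NullPointerException("nulls unsupported");
        Node<T> node = new Node<T>(value);
        if (tail == null) {
            head = node; // first element
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    public int size() { return size; }

    public T first() { return head == null ? null : head.value; }

    public static void main(String[] args) {
        AppendOnlyList<String> list = new AppendOnlyList<String>();
        list.add("a");
        list.add("b");
        System.out.println(list.size()); // 2
        System.out.println(list.first()); // a
    }
}
```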
@danberindei yeah, you're right, I overreacted a bit to highlight the other point of view. I'm just making the point that it's not easy to measure at all. Sometimes one has to simply trust it, but I agree, only if the difference can be "read" in the source code.

To find out, though, one would need a microbenchmark comparing just the List implementations' behaviour and focusing on memory allocation costs only; I don't see how one could measure the difference in a full-stack Infinispan benchmark. What concerns me the most here is that some collections are replaced with "mini" implementations, but I'm not seeing strong evidence that those collections will really be small in practice.
@Sanne I can't tell you how many writes the user will usually do in one transaction. However, MiniSet shouldn't ever be worse than HashSet (which delegates to an internal HashMap anyway), and MiniList shouldn't be worse than ArrayList (for directly implemented operations), apart from a single switch. I'm not disputing that I should add the collection tests. And regarding performance: I've tried to do a full-stack comparison of puts and gets, but for some reason I sometimes got much faster outliers, so I don't trust the results :-/
@rvansa You can always ignore the outliers :) What test were you using?
@danberindei JMH, doing puts and gets against a local cache. Code here: https://github.com/rvansa/benchmarks/tree/master/ispntest

@Sanne And regarding LinkedList: the idea of MiniList was to replace allocating another instance (the Node) with having that single object directly in the list. But I appreciate your note that the size of the objects matters more than the number of instances; I should bear that in mind, too.
Needs rebasing
* replaced HashSet, ArrayList, LinkedList and Collections.synchronizedXXX with memory-optimized collections
* AtomicFieldUpdater instead of AtomicInteger
* replaced ByRef with a field in the invoked function
* default metadata doesn't create a copy
* EmbeddedMetadata$Builder is not used when not necessary
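The AtomicFieldUpdater item above, sketched on an invented `Counter` class: the counter becomes a plain `volatile int` inside the owning object, so no separate `AtomicInteger` instance is allocated per object, while a single static `AtomicIntegerFieldUpdater` provides the atomic operations.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// One static updater shared by all instances; the counter lives inline.
public class Counter {
    private static final AtomicIntegerFieldUpdater<Counter> COUNT =
            AtomicIntegerFieldUpdater.newUpdater(Counter.class, "count");

    private volatile int count; // replaces a per-instance AtomicInteger

    public int increment() {
        return COUNT.incrementAndGet(this);
    }

    public int get() {
        return count;
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        c.increment();
        c.increment();
        System.out.println(c.get()); // 2
    }
}
```

The trade-off is slightly clunkier code and reflective setup cost at class load, in exchange for one fewer object and one less pointer hop per counter.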
Added the additional check as recommended by @pruivo and rebased. Beyond that, I ran the JMH benchmark a few days ago and I can see a small improvement in writes (testPutImplicit inserts a single entry, possibly using an auto-commit transaction with a transactional cache):

Temporarily closing.
I have analyzed allocations during put() (into a local cache) and removed some of the low-hanging fruit. Some of these allocations could already be handled by escape analysis, but most of them cannot, IMO. That said, the difference in performance before and after was within the benchmark's margin of error.

You can find both the benchmark and the Byteman tooling for the analysis at https://github.com/rvansa/benchmarks/tree/master/ispntest