Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISPN-5329 Reduce number of allocations #3342

Closed
wants to merge 3 commits into from
Closed

Conversation

rvansa
Copy link
Member

@rvansa rvansa commented Mar 26, 2015

I have analyzed allocations during put() (into local cache) and removed some of the low-hanging fruits. Some of those could be already handled by escape analysis, but most of them cannot, IMO. Though, the difference between performance before and after was within the margin of error of the benchmark.

You can find both the benchmark and Byteman tooling for analysis at https://github.com/rvansa/benchmarks/tree/master/ispntest

@galderz
Copy link
Member

galderz commented Mar 26, 2015

Looking into this... do you have some numbers/data to compare what allocation rates were like before the fix and after?

@rvansa
Copy link
Member Author

rvansa commented Mar 26, 2015

@galderz I have attached those data to JIRA https://issues.jboss.org/browse/ISPN-5329 - I'll yet add the rates before the PR.

@@ -40,6 +42,7 @@ protected final Object currentRequestor() {
}

private void setCurrentRequestor(Object requestor) {
assert requestorOnStack.get() == null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaik We don't enable runtime assertions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During tests, assertions are enabled. Of course, in production this would not be called (and that's the idea!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unaware that we enabled them on tests. But this still seems like maybe we should be throwing an IllegalStateException if this occurs ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this can really happen, and I don't want to read ThreadLocal when I don't have to. Assert does not hurt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But assert does hurt, if it slows down the test suite :)

I'd remove the assert altogether, and maybe add a TL leak test for OwnableReentrantLock. The TL is only used internally, so there's no point to check it every single time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see some other potential optimizations here:

  • Only set/unset the TL in lock(Object) if the CAS operation fails.
  • unlock(Object) shouldn't need the TL at all.

And a possible problem: lockInterruptibly() tryLock() should throw an UnsupportedOperationException, because they can't set the owner.

@rvansa
Copy link
Member Author

rvansa commented Mar 26, 2015

Updated JIRA with pre and post allocation data.

@@ -137,4 +114,36 @@ public String toString() {
"locks=" + locks +
'}';
}

private class Locker implements EquivalentConcurrentHashMapV8.BiFun<Object, L, L> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really help with allocation rates ? I am more curious for myself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not get results precise enough to be able to compare these two implementations - in my benchmarks, Infinispan was sometimes suddenly faster for a brief period of time without any obvious reason, spoiling the results.
However, by merging the function and value returned by reference, you allocate only one object instead of two. I can't tell whether this would be eliminated by compiler anyway.

@rvansa
Copy link
Member Author

rvansa commented Mar 31, 2015

@wburns @pruivo Updated PR

if (holder == null) throw new IllegalArgumentException();
type = Type.SINGLE;
} else if (entries.length < ARRAY_SIZE) {
Object[] array = new Object[ARRAY_SIZE];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not using add()? like

holder = new Object[ARRAY_SIZE];
type = Type.ARRAY;
for (T entry : entries) add(entry);

the code will be simplified and the performance would be the same. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could use the same thing for collection constructor and use addAll()

@rvansa
Copy link
Member Author

rvansa commented Mar 31, 2015

@pruivo You're right, I've replaced the calls with add/addAll. And thanks for spotting the other bugs; I was considering adding guava testlib as test dependency so that I could test those collections properly, but I got some problems with that (and haven't spent much time with it, hoping that the general testsuite will catch bugs anyway - apparently it can't). I don't want to write standard collection test myself.

@pruivo
Copy link
Member

pruivo commented Mar 31, 2015

@rvansa one last comment... the iterators should in MiniSet should check for modifications. If the type of the set moves (eg. from SINGLE -> ARRAY or from ARRAY -> MAP) you will get a bunch of weird exceptions (ClassCastException) or results

@rvansa
Copy link
Member Author

rvansa commented Apr 1, 2015

@pruivo I didn't want to burden it with another field for modcount (MiniList gets that inherited from AbstractList, AbstractSet does not contain it), but if you think that it's beneficial, I can go for that.

@pruivo
Copy link
Member

pruivo commented Apr 1, 2015

@rvansa well, if you don't want to add the modcout, at least check if the MiniSet type is the same as the Iterator type...

and it would be good to see what happen if the type of the MiniSet changes during the iteration:

Set<String> set = new MiniSet<>("single-value");
Iterator<String> iterator = set.iterator();
set.add("another-value");
assertTrue(iterator.hasNext())
assertEquals("single-value", iterator.next());

and a similar with the array => map change

@danberindei
Copy link
Member

@rvansa The data in the JIRA only mentions the number of allocations, not the actual performance... And speaking of the number of allocations, it's pretty hard to read the numbers in the issue, maybe you change the format to something more diff-like? I used this:

diff -u <(sort -k1.20 before.txt| sed 's/(.*)//') <(sort -k1.20 after.txt | sed 's/(.*)//')

TBH I'm not convinced the additional branching in MiniList is worth the memory savings, as it makes the job of the CPU (and maybe the JIT) harder.

The MiniSet benefit seems clearer, because HashSet needs so many allocations, but here too I think a simpler implementation with only two storage options (Object[] and HashMap) may be better.

For both of them, I'm not sure the type field is needed. Since you can't really put an Object[] in the cache we can just say these collections don't work with Object[].

The synchronized versions look good, although they probably won't be as useful once we baseline on Java 9: http://openjdk.java.net/jeps/169 ;)

@danberindei
Copy link
Member

I also agree with @pruivo that the collections need at least some minimal tests. The core testsuite may expose bugs by itself, but why make you life harder and rely on random test failures when this should be so much easier to test than almost everything else in Infinispan? :)

@Sanne
Copy link
Member

Sanne commented Apr 3, 2015

Great initiative 👍

Dan, asking to measure results as direct performance impact is very hard. All we know is that in a full system (which does many other things and runs lots of user code too) you'll generally end up in trouble with memory because of Infinispan using it all/most up for himself.
You might not be able to measure this at all by just measuring Infinispan; the problem is crossing the threshold of how much is sensible to allocate in terms of temporary objects. As long as you don't cross the threshold, it might look like (almost) free to allocate some temporary helpers, so like making a defensive copy of a Map might look like "free". But when doing such things extensively, the system falls over.

At that point the memory bandwidth is saturated, and the CPU will be mostly sitting idle. Ironically, that means that making some code less efficient will not be measured by any benchmark either, as you'll be putting to better use some CPU cycles which are otherwise wasted in idle waits for main memory, or worse trigger additional context switching exacerbating the problem of unpredictable performance per operation.

It's measurable that there is such a problem of excess heap utilization as it's pretty easy to cross all these boundaries even when using Infinispan with a pretty well written client application. The solution is to keep working on these patches, however we'll need many of these before any difference can be measured.

That said, @rvansa make sure you don't focus just on instance numbers but prioritize based on size of instances? I'm not sure how you're measuring, but the JIRA mentions that among others you're ignoring arrays.
I've sent an email many months ago highlighting the urgency of getting rid of some Collections altogether (not just optimise them), as I had used several tools (among others flight recorder and java object layout), these can also help you to identify the biggest offenders.

@Sanne
Copy link
Member

Sanne commented Apr 3, 2015

Well looks like that was 16+ months ago when I wrote about those tools...!
http://lists.jboss.org/pipermail/infinispan-dev/2013-December/014311.html

But I fixed it only recently:

Combined with this much older one, that saved approx. 130GB / second of pointless allocations on our benchmark:

@Sanne
Copy link
Member

Sanne commented Apr 3, 2015

+1 to add thourough unit tests BTW

Couldn't MiniList be replaced by LinkedList? If you expect the lists to be small, that's not much overhead over an array. Certainly all the System.arraycopy you have to perform on add/remove will defeat the point of not polluting heap with small objects.
Also, in the smallest case I think LinkedList would perform not worse than this new implementation.

Generally speaking, you're replacing some concurrent sets with "mini" concurrent sets. Can you 100% guarantee that these use cases are small? Maybe at least checking initialCapacity is in order?

@danberindei
Copy link
Member

@Sanne I have to disagree with you here...

When optimizing, there are clean-cut cases where we can just stop doing unnecessary stuff, but most of the time we also have to do something else instead. We can guess beforehand that that "something else" is going to be faster than the old stuff beforehand, but we also need to test that assumption after making our changes. And if we can't write a single test that shows an improvement, we have to accept that our initial assumption was wrong.

To use your LinkedList suggestion as an example... how can we possibly decide which is better, MiniList or LinkedList? Is it true for all tx sizes, or is there an "unreasonable" tx size threshold that we will ignore in our tests? My personal theory is that a specialized singly-linked list that doesn't support null values or removing elements should be the fastest option for small txs, but who knows...

@Sanne
Copy link
Member

Sanne commented Apr 10, 2015

@danberindei yea you're right, I overreacted a bit to highlight the other point of view. Just making the point that it's not easy to measure at all.

Sometimes one has to simply trust it, but I agree only if the difference can be "read" in the source code. As in the LinkedList example, I'm not sure of that.

To find out though, one would need a micro benchmark comparing just the List implementation behaviour and focusing on memory allocation costs only, I don't see how one could measure the difference in a full-stack Infinispan benchmark.
One would also want to see the computational difference in such a benchmark, but I'd argue that we know enough of Infinispan to know that if the memory benefit is significant, a small cost (if it's small) should be worth it.

What concerns me the most here is that some collections are replaced with "mini" implementations, but I'm not seeing strong evidence that those collections will really be small in practice.

@rvansa
Copy link
Member Author

rvansa commented Apr 10, 2015

@Sanne I can't tell you how many writes will the user usually do in one transaction. However, MiniSet set shouldn't be ever worse than HashSet (which delegates to internal HashMap anyway) and MiniList shouldn't be worse than ArrayList (for directly implemented operations), besides single switch.
I've seen similar optimizations elsewhere, therefore I've implemented this PR.

I am not objecting the fact that I should add the collections tests. And regarding performance - I've tried to do full-stack comparison of puts and gets, but for some reason I had much faster outliers sometimes so I don't trust the results :-/

@danberindei
Copy link
Member

@rvansa You can always ignore the outliers :) What test were you using?

@rvansa
Copy link
Member Author

rvansa commented Apr 11, 2015

@danberindei JMH, doing puts and gets against local cache. Code here https://github.com/rvansa/benchmarks/tree/master/ispntest

@Sanne And regarding LinkedList, the idea of MiniList was to replace allocating another instance (Node) with having that single object directly in the list. But I appreciate your note that size of the object is what matters rather than number of instances, I should bear that in mind, too.

@galderz
Copy link
Member

galderz commented Apr 16, 2015

Needs rebasing

rvansa added 3 commits May 5, 2015 13:21
* replaced HashSet, ArrayList, LinkedList and Collections.synchronizedXXX with memory-optimized collections
* AtomicFieldUpdater instead of AtomicInteger
* replaced ByRef with field in invoked function
* default metadata don't create copy
* EmbeddedMetadata$Builder is not used when not necessary
@rvansa
Copy link
Member Author

rvansa commented May 5, 2015

Added the additional check as recommended by @pruivo and rebased. Beyond that, I ran JMH benchmark few days ago and I can see little improvement in writes (testPutImplicit inserts single entry, possibly using auto-commit transaction with transactional cache):
https://gist.github.com/rvansa/ba6195ff6ebecfdf6011

@rvansa
Copy link
Member Author

rvansa commented May 11, 2015

Temporarily closing.

@rvansa rvansa closed this May 11, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants