-
-
Notifications
You must be signed in to change notification settings - Fork 8.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-38867] Optimize Actionable.getAllActions #2582
[JENKINS-38867] Optimize Actionable.getAllActions #2582
Conversation
This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am concerned about using guava as it has burned me many times in the past, but as this is internal only use and not exposed to plugin
🐝
LOGGER.log(Level.SEVERE, "Could not load actions from " + taf + " for " + this, e); | ||
List<Action> _actions = getActions(); | ||
boolean adding = false; | ||
synchronized (Actionable.class) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐛 ❗️ 💥 We're synchronizing every single getAllActions call on this single class. Instant lock contention all over the place.
Synchronize on this.getClass() I think. If we can avoid synchronization at all we should, though.
Edit: unless my coffee is still kicking in and I've misunderstood -- intent is to synchronize on this specific class, not the overall actionable if at all possible though, and better neither.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually @svanoort you are reading this wrong. factoryClass
is a lazy singleton cache. Would be better to leverage the lazy instantiation pattern...
private static final class ResourceHolder {
static final LoadingCache<Class<? extends Actionable>, Collection<? extends TransientActionFactory<?>>> factoryCache;
static {
@SuppressWarnings("rawtypes")
final ExtensionList<TransientActionFactory> allFactories = ExtensionList.lookup(TransientActionFactory.class);
factoryCache = CacheBuilder.newBuilder().build(new CacheLoader<Class<? extends Actionable>, Collection<? extends TransientActionFactory<?>>>() {
@Override
public Collection<? extends TransientActionFactory<?>> load(Class<? extends Actionable> implType) throws Exception {
List<TransientActionFactory<?>> factories = new ArrayList<>();
for (TransientActionFactory<?> taf : allFactories) {
if (taf.type().isAssignableFrom(implType)) {
factories.add(taf);
}
}
return factories;
}
});
allFactories.addListener(new ExtensionListListener() {
@Override
public void onChange() {
factoryCache.invalidateAll();
}
});
}
}
And then here we just go ResourceHolder.factoryCache
without care and the JVM can optimize better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, so more coffee it is then. Agree totally that the ResourceHolder approach is far better and will improve performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt that it matters, since after the cache is initialized during startup the code will just be doing a null check and the JVM is good at optimizing away contention on monitors in simple cases, but if it makes you happier I can switch to a resource holder pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that is silly. Any kind of cache lookup needs to acquire a lock anyway, so we might as well use just one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, there is a more subtle issue with restarting Jenkins which I will work to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, maybe not - explicit contention for something that will be invoked often by numerous threads raises a flag for me on general principles. I agree it shouldn't be as expensive as initially I thought when skimming through (and hopefully won't be a problem), but still worth a 🐜 on general principles.
Any kind of cache lookup needs to acquire a lock anyway, so we might as well use just one
IIRC most of the hashmaps and concurrent maps are based on locking per-slot and not on the whole, so contention is extremely rare. Actions are likely to get requested by many threads, and frequently, so contention will be high, even if very brief.
🐛 I think this is an excellent optimization target, and caching here could dramatically improve performance in some cases. However I feel there are some serious concerns with this specific implementation (sorry, I know probably not what you want to hear). Besides some serious thread-contention issues, I've seen the Guava cache perform... er, rather terribly in some cases. (Side comment: once Jenkins goes fully over to Java 8 for source, we will want to rip out every Guava caching use possible and replace with Caffeine which provides the same functionality but is MUCH faster in benchmarks, often 3-4x). Would want to know how many TransientActionFactories it's testing against, and if possible a benchmark. |
allFactories.addListener(new ExtensionListListener() { | ||
@Override | ||
public void onChange() { | ||
factoryCache.invalidateAll(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How often will this get called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After startup it should only get called if you dynamically install a plugin, which is rare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair. I can see other places this strategy might be useful, so may borrow it in the future.
Any details? We use Guava caches in many places. If there is a large demonstrable overhead, would be easy enough to hand-roll a cache (it is just a
Not sure I understand the question.
I am afraid we have no infrastructure for meaningful benchmarks. |
Guava has a different contract with regard to breaking changes from that followed by Jenkins. This means I have been burned by guava changes when using guava from plugins. When using guava from core as long as we do not expose it to plugins, there is no issue as that aligns with the usage model that guava's compatibility policy was developed for... as long as public/protected methods do not declare guava return types / parameters the use within core is safely encapsulated and I am ok with it. |
I think @svanoort is suffering from premature optimisation syndrome. The current code hits |
It is marked beta in the v11 that core currently bundles, but not in v19, so they are promising compatibility for it.
I was not planning on exposing it in API signatures.
Yes, definitely. |
· Move the cache code to TransientActionFactory itself, for better encapsulation. · Optimize getAction(Class) to not need to call getAllActions; avoids copying lists, and can avoid calling TransientActionFactory at all. · Ensure that we maintain a separate cache per ExtensionList instance, so that static state is not leaked across Jenkins restarts.
🐝 |
@jglick WRT to guava performance, there's a link to benchmarks in one of my comments here. See also: google/guava#2063
Such as being an order of magnitude slower than ConcurrentHashMap? Should we roll our own caching solution? I'd argue absolutely not because some of the overhead comes from helpful features, but we should be smart about what we expect caching to deliver.
@stephenc It's easy to do name-calling, but without any benchmarks or official profiling at this point... isn't it all more or less premature? Have you done profiling here at all? I have, and can confirm that getAction is more expensive than it "should" be... but can't say for sure if this will improve the situation. Personally I'm hesitant to recommend this be merged until we've at least run a trivial benchmark, given how much getAction/getActions are called. |
I would very much like to see a trivial benchmark showing the impact of this (I expect it to be positive but do not know how much). Pipeline is the easiest case, since you can load up a simple pipeline and visualize it. One case I've used is to create a pipeline with the following and run 10x: for (int i=0; i<15; i++) {
stage "stage $i"
echo "ran my stage is $i"
node {
sh 'whoami';
}
}
stage 'label based'
echo 'wait for executor'
node {
stage 'things using node'
for (int i=0; i<200; i++) {
echo "we waited for this $i seconds"
}
} Then I close all browser windows, restart Jenkins, go into chrome dev mode, visit the job page, and measure time to fetch the initial runs list in stage view. It's not a perfect benchmark (see also the ongoing work on frameworkized benchmarks) but it is fast to execute and gives us a rough measure. |
@svanoort still sounds like YAGNI and premature optimization... the effort spent debating the caching framework takes away from the ability to actually improve other things... You optimize the worst thing and only the worst thing, and you only optimize it until it is no longer the worst thing... then you stop optimizing it and start optimizing the new worst thing. Switching from the old code to guava removes one hot path, now we need to see where the next hot path lies... adding other frameworks to core comes with great risk (at least until we start locking down the classes exposed from Jenkins core)... rolling our own cache is only worth the effort if we know this is still the hot path... hence premature optimization syndrome... we all suffer from it from time to time... the siren's call is tempting... just ensure you've stocked up on beeswax |
@stephenc I'm not sure what debate of the caching framework you're seeing but I haven't added one -- just provided the evidence @jglick requested to back up my assertions. My concerns aren't about optimizing the framework used (again: until we drop Java 7 support, then we switch to Caffeine because it's a no-brainer). I just want people to be aware that the overheads of caching sometimes limit its value. The first rule and golden of benchmarking, even more than YAGNI, is "measure measure measure" though, and I do want to see a trivial measurement. This is because I too know the siren-song temptation of premature optimization... and also the Scylla-and-Charybdis struggle of optimizing for big-O performance and losing to constant-time overheads (see also: why LinkedList is usually a bad idea). Plus if this delivers big gains, it's super helpful to have a tidy number to advertise to users as "this is why you really need to upgrade to Jenkins version X - YY% performance improvement." |
OK. My guess is that this overhead would be modest compared to the cost of actual It is of course an option to retain one part of the patch—that of avoiding needless array copies, and in the case of Or we could switch to a manual cache using |
for (Action a : getAllActions()) | ||
if (type.isInstance(a)) | ||
// Shortcut: if the persisted list has one, return it. | ||
for (Action a : getActions()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is likely to deliver rather large benefits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this is the clearest win: if you imagine a malicious factory which just sleeps one second and then returns an empty collection, this part of the patch will skip the second delay in the common case that there is a persistent action of the requested type (or one provided by some factory earlier in the list).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it would be a great change
@@ -56,4 +65,37 @@ | |||
*/ | |||
public abstract @Nonnull Collection<? extends Action> createFor(@Nonnull T target); | |||
|
|||
@SuppressWarnings("rawtypes") | |||
private static final LoadingCache<ExtensionList<TransientActionFactory>, LoadingCache<Class<?>, List<TransientActionFactory<?>>>> cache = | |||
CacheBuilder.newBuilder().weakKeys().build(new CacheLoader<ExtensionList<TransientActionFactory>, LoadingCache<Class<?>, List<TransientActionFactory<?>>>>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow :)
@jglick I'd wager you're right too, just not sure how much. So, I have the first results from the analogous but much simpler change in pipeline itself: jenkinsci/workflow-api-plugin#21 -- it cuts runtime by 50% in a reasonably-constructed benchmark. My suspicion is that this change will generally improve performance significantly but because (unlike in that case) we have some caching overheads and still have to consider TransientActionFactories to some minimal extent... probably this will improve performance less than the workflow-api change. My gut says 10%-20% might be a good number. The bonus: it will improve performance everywhere we work with actions, not just pipeline. |
Would recommend against it initially, since we lose flexibility and some positive threading behavior that way. |
🐝 |
Have some other changes under development, hold on… |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for having benchmarks, but I do not see it as a blocker before we have such policy and a documented framework/guideline for them. 🐝
for (Action a : getAllActions()) | ||
if (type.isInstance(a)) | ||
// Shortcut: if the persisted list has one, return it. | ||
for (Action a : getActions()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it would be a great change
} | ||
// Otherwise check transient factories. | ||
for (TransientActionFactory<?> taf : TransientActionFactory.factoriesFor(getClass(), type)) { | ||
for (Action a : createFor(taf)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also catch/suppress exceptions? Not so good for performance, but commonly we should not trust extension points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createFor
now does that.
// Otherwise check transient factories. | ||
for (TransientActionFactory<?> taf : TransientActionFactory.factoriesFor(getClass(), type)) { | ||
for (Action a : createFor(taf)) { | ||
if (type.isInstance(a)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to get rid of this reflection instance check, but it seems to require the wider API changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it can be removed.
*/ | ||
public abstract @Nonnull Collection<? extends Action> createFor(@Nonnull T target); | ||
|
||
private static class CacheKey { // http://stackoverflow.com/a/24336841/12916 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to put such comments to Javadoc btw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is private
anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it complicates the life of contributors, who have to go to another page just to understand the reason
🐝 |
Parking this until @svanoort has a chance to weigh in. The patch as is does demonstrably avoid calls to potentially slow |
for (TransientActionFactory<?> taf : TransientActionFactory.factoriesFor(getClass(), type)) { | ||
_actions.addAll(Util.filter(createFor(taf), type)); | ||
} | ||
return Collections.unmodifiableList(_actions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this improvement really worth dealing with potential breakage in whatever code is dumb enough to modify the returned list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt there is any such code, but if there is, I am happy for it to break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Jesse. Modification of the filtered list is not a good idea in any case.
We either expose the internal representation or send changes to /dev/null
BTW, it would be great to Javadoc the fact that the list should not be modified
@jglick Are you still working on it? Would be great to get it integrated, but seems it's going to miss the LTS |
It is mergeable as far as I am concerned but @svanoort seemed reluctant. If you think it should go in, I can resolve the merge conflicts and tweak the Javadoc. |
I'd still like to hit it with an abbreviated version of the benchmark, but haven't had time to set it up yet |
I think blocking on a benchmark is a bad precedent |
Please do. It's required independently of the benchmarking story. Personally I do not see a strong requirement in benchmarks since it's not an adopted practice in Jenkins core. If @svanoort want to drive this practice and to provide framework/docs, it would be great. |
If you're making a performance optimization and there's some doubt about whether it will achieve its goal (or possibly do the opposite), burden of proof is on the PR author to provide evidence. Same rules as when @oleg-nenashev requested performance tests on #2446 (comment) and I don't see why this would be any different? That said I'll aim to work on getting the benchmark together tonight so this can get a full yea or nay vote. |
Exact cite:
I've asked if there is a plan to create such tests, but I have not bugged the PR. So all kinds of manual/automatic tests were on the PR creator and other reviewers |
@svanoort Ping |
Merging since there is no response from @svanoort since my response 3 weeks ago. The change is not going to LTS soon, hence in the case of the performance degradation we have enough time to fix it |
Saw some slow request stack traces that included threads apparently running inside, for example,
In this particular case the
CheckpointNodeAction
and other classes below it are proprietary (and @svanoort claims that apipeline-stage-view
update introduces its own caching layer here), but at any rate we can expectgetAllActions
to be called very frequently from all sorts of places, so it is worth optimizing. This patchgetActions
into a newArrayList
unless it is actually being extendedTransientActionFactory
s applicable to a given type@reviewbybees