
[JENKINS-38867] Optimize Actionable.getAllActions #2582

Merged: 13 commits, Nov 27, 2016

Conversation

jglick (Member) commented Oct 6, 2016

Saw some slow request stack traces that included threads apparently running inside, for example,

hudson.model.Actionable.getAllActions(Actionable.java:103)
hudson.model.Actionable.getAction(Actionable.java:165)
com.cloudbees.workflow.cps.checkpoint.CheckpointNodeAction.getAction(CheckpointNodeAction.java:59)
com.cloudbees.workflow.pipeline.stageview.rest.CloudBeesFlowNodeUtil.getStageCheckpoints(CloudBeesFlowNodeUtil.java:48)
com.cloudbees.workflow.pipeline.stageview.rest.RunCheckpointAPI.getCheckpointInfo(RunCheckpointAPI.java:65)
com.cloudbees.workflow.pipeline.stageview.rest.JobCheckpointAPI.doDynamic(JobCheckpointAPI.java:49)

In this particular case the CheckpointNodeAction and other classes below it are proprietary (and @svanoort claims that a pipeline-stage-view update introduces its own caching layer here), but at any rate we can expect getAllActions to be called very frequently from all sorts of places, so it is worth optimizing. This patch

  • avoids copying the mutable getActions into a new ArrayList unless it is actually being extended
  • caches the TransientActionFactorys applicable to a given type
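The copy-avoidance idea can be sketched as follows (a simplified standalone model for illustration, not the actual patch; `Action`, `Factory`, and the field names are stand-ins for the real Jenkins types):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Stand-ins for hudson.model.Action and TransientActionFactory.
interface Action {}
interface Factory {
    Collection<? extends Action> createFor(Object target);
}

class Actionable {
    private final List<Action> persisted = new ArrayList<Action>();
    private final List<Factory> factories = new ArrayList<Factory>();

    List<Action> getActions() { return persisted; }
    List<Factory> factories() { return factories; }

    // Copy the persisted list only if some factory actually contributes
    // actions; otherwise hand back an unmodifiable view of the original.
    List<Action> getAllActions() {
        List<Action> actions = getActions();
        boolean adding = false;
        for (Factory f : factories()) {
            Collection<? extends Action> extra = f.createFor(this);
            if (!extra.isEmpty()) {
                if (!adding) { // first extension: copy on write
                    actions = new ArrayList<Action>(actions);
                    adding = true;
                }
                actions.addAll(extra);
            }
        }
        return Collections.unmodifiableList(actions);
    }
}
```

In the common case where no factory contributes anything, no `ArrayList` is ever allocated.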

@reviewbybees

ghost commented Oct 6, 2016

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

stephenc (Member) left a comment

I am concerned about using Guava, as it has burned me many times in the past, but as this is internal-only use and not exposed to plugins, I am OK with it.

🐝

LOGGER.log(Level.SEVERE, "Could not load actions from " + taf + " for " + this, e);
List<Action> _actions = getActions();
boolean adding = false;
synchronized (Actionable.class) {
svanoort (Member) commented Oct 7, 2016

🐛 ❗️ 💥 We're synchronizing every single getAllActions call on this single class. Instant lock contention all over the place.

Synchronize on this.getClass() I think. If we can avoid synchronization at all we should, though.

Edit: unless my coffee hasn't kicked in yet and I've misunderstood -- the intent is to synchronize on this specific class rather than the overall Actionable if at all possible, and better on neither.

Member:

Actually @svanoort you are reading this wrong. factoryClass is a lazy singleton cache. Would be better to leverage the lazy instantiation pattern...

private static final class ResourceHolder {
    static final LoadingCache<Class<? extends Actionable>, Collection<? extends TransientActionFactory<?>>> factoryCache;
    static {
        @SuppressWarnings("rawtypes")
        final ExtensionList<TransientActionFactory> allFactories = ExtensionList.lookup(TransientActionFactory.class);
        factoryCache = CacheBuilder.newBuilder().build(new CacheLoader<Class<? extends Actionable>, Collection<? extends TransientActionFactory<?>>>() {
            @Override
            public Collection<? extends TransientActionFactory<?>> load(Class<? extends Actionable> implType) throws Exception {
                List<TransientActionFactory<?>> factories = new ArrayList<>();
                for (TransientActionFactory<?> taf : allFactories) {
                    if (taf.type().isAssignableFrom(implType)) {
                        factories.add(taf);
                    }
                }
                return factories;
            }
        });
        allFactories.addListener(new ExtensionListListener() {
            @Override
            public void onChange() {
                factoryCache.invalidateAll();
            }
        });
    }
}

And then here we just go ResourceHolder.factoryCache without care and the JVM can optimize better.

Member:

Cool, so more coffee it is then. Agree totally that the ResourceHolder approach is far better and will improve performance.

jglick (Member, Author):

I doubt that it matters, since after the cache is initialized during startup the code will just be doing a null check and the JVM is good at optimizing away contention on monitors in simple cases, but if it makes you happier I can switch to a resource holder pattern.

jglick (Member, Author):

Actually that is silly. Any kind of cache lookup needs to acquire a lock anyway, so we might as well use just one.

jglick (Member, Author):

However, there is a more subtle issue with restarting Jenkins which I will work to fix.

Member:

Maybe, maybe not - explicit contention for something that will be invoked often by numerous threads raises a flag for me on general principles. I agree it shouldn't be as expensive as I initially thought when skimming through (and hopefully won't be a problem), but it is still worth a 🐜.

Any kind of cache lookup needs to acquire a lock anyway, so we might as well use just one

IIRC most of the hashmaps and concurrent maps are based on locking per-slot and not on the whole, so contention is extremely rare. Actions are likely to get requested by many threads, and frequently, so contention will be high, even if very brief.
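For illustration (not part of the patch), a class-keyed cache built on `ConcurrentHashMap.computeIfAbsent` shows the per-bin locking behavior being described: concurrent lookups of already-computed entries never block each other, unlike a single `synchronized (Actionable.class)` block (Java 8 API, used here just for brevity):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative class-keyed cache: computeIfAbsent locks at most the bin
// holding the key while computing; plain hits never take a map-wide lock.
class PerClassCache {
    private final ConcurrentMap<Class<?>, String> cache =
            new ConcurrentHashMap<Class<?>, String>();
    final AtomicInteger computations = new AtomicInteger();

    String lookup(Class<?> type) {
        return cache.computeIfAbsent(type, t -> {
            computations.incrementAndGet(); // runs once per key
            return "factories applicable to " + t.getSimpleName();
        });
    }
}
```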

svanoort (Member) commented Oct 7, 2016

🐛 I think this is an excellent optimization target, and caching here could dramatically improve performance in some cases.

However I feel there are some serious concerns with this specific implementation (sorry, I know probably not what you want to hear). Besides some serious thread-contention issues, I've seen the Guava cache perform... er, rather terribly in some cases. (Side comment: once Jenkins goes fully over to Java 8 for source, we will want to rip out every Guava caching use possible and replace with Caffeine which provides the same functionality but is MUCH faster in benchmarks, often 3-4x).

Would want to know how many TransientActionFactories it's testing against, and if possible a benchmark.

allFactories.addListener(new ExtensionListListener() {
@Override
public void onChange() {
factoryCache.invalidateAll();
Member:

How often will this get called?

jglick (Member, Author):

After startup it should only get called if you dynamically install a plugin, which is rare.

Member:

Fair. I can see other places this strategy might be useful, so may borrow it in the future.

jglick (Member, Author) commented Oct 10, 2016

I've seen the Guava cache perform rather terribly in some cases

Any details? We use Guava caches in many places. If there is a large demonstrable overhead, would be easy enough to hand-roll a cache (it is just a Map really).

Would want to know how many TransientActionFactories it's testing against

Not sure I understand the question.

if possible a benchmark

I am afraid we have no infrastructure for meaningful benchmarks.

stephenc (Member):

Guava follows a different contract for breaking changes than Jenkins does, which means I have been burned by Guava changes when using it from plugins.

When using Guava from core, as long as we do not expose it to plugins there is no issue, since that aligns with the usage model Guava's compatibility policy was developed for... as long as public/protected methods do not declare Guava return types or parameters, the use within core is safely encapsulated and I am OK with it.

stephenc (Member):

I think @svanoort is suffering from premature optimisation syndrome.

The current code hits ExtensionList.lookup and friends, so it will hit a global lock anyway IIRC... replacing that with a class-local lock will not make things worse. The other concerns are excessive optimisation without evidence... I think this is fine to go in as is... if evidence shows the class lock is a hotspot, the obvious fix is the ResourceHolder singleton pattern... but doing that now smacks of YAGNI

jglick (Member, Author) commented Oct 10, 2016

Guava has a different contract with regard to breaking changes from that followed by Jenkins.

It is marked beta in the v11 that core currently bundles, but not in v19, so they are promising compatibility for it.

as long as public/protected methods do not declare guava return types / parameters

I was not planning on exposing it in API signatures.

The other concerns are excessive optimisation without evidence

Yes, definitely.

@jglick jglick changed the title Optimize Actionable.getAllActions [JENKINS-38867] Optimize Actionable.getAllActions Oct 10, 2016
· Move the cache code to TransientActionFactory itself, for better encapsulation.
· Optimize getAction(Class) to not need to call getAllActions; avoids copying lists, and can avoid calling TransientActionFactory at all.
· Ensure that we maintain a separate cache per ExtensionList instance, so that static state is not leaked across Jenkins restarts.
stephenc (Member):

🐝

svanoort (Member) commented Oct 10, 2016

@jglick WRT Guava performance, there's a link to benchmarks in one of my comments here. See also: google/guava#2063

If there is a large demonstrable overhead,

Such as being an order of magnitude slower than ConcurrentHashMap? Should we roll our own caching solution? I'd argue absolutely not because some of the overhead comes from helpful features, but we should be smart about what we expect caching to deliver.

"premature optimization syndrome"

@stephenc It's easy to do name-calling, but without any benchmarks or official profiling at this point... isn't it all more or less premature? Have you done profiling here at all? I have, and can confirm that getAction is more expensive than it "should" be... but can't say for sure if this will improve the situation.
My gut says yes, but my gut also has been known to make noises for no reason (especially after tacos).

Personally I'm hesitant to recommend this be merged until we've at least run a trivial benchmark, given how much getAction/getActions are called.

svanoort (Member):

I would very much like to see a trivial benchmark showing the impact of this (I expect it to be positive but do not know how much). Pipeline is the easiest case, since you can load up a simple pipeline and visualize it.

One case I've used is to create a pipeline with the following and run 10x:

for (int i=0; i<15; i++) {
    stage "stage $i" 
    echo "ran my stage is $i"        
    node {
        sh 'whoami';
    }
}

stage 'label based'
echo 'wait for executor'
node {
    stage 'things using node'
    for (int i=0; i<200; i++) {
        echo "we waited for this $i seconds"    
    }
}

Then I close all browser windows, restart Jenkins, go into chrome dev mode, visit the job page, and measure time to fetch the initial runs list in stage view. It's not a perfect benchmark (see also the ongoing work on frameworkized benchmarks) but it is fast to execute and gives us a rough measure.

stephenc (Member):

@svanoort still sounds like YAGNI and premature optimization... the effort spent debating the caching framework takes away from the ability to actually improve other things...

You optimize the worst thing and only the worst thing, and you only optimize it until it is no longer the worst thing... then you stop optimizing it and start optimizing the new worst thing.

Switching from the old code to guava removes one hot path, now we need to see where the next hot path lies... adding other frameworks to core comes with great risk (at least until we start locking down the classes exposed from Jenkins core)... rolling our own cache is only worth the effort if we know this is still the hot path... hence premature optimization syndrome... we all suffer from it from time to time... the siren's call is tempting... just ensure you've stocked up on beeswax

svanoort (Member):

@stephenc I'm not sure what debate of the caching framework you're seeing but I haven't added one -- just provided the evidence @jglick requested to back up my assertions. My concerns aren't about optimizing the framework used (again: until we drop Java 7 support, then we switch to Caffeine because it's a no-brainer). I just want people to be aware that the overheads of caching sometimes limit its value.

The first and golden rule of benchmarking, even more than YAGNI, is "measure, measure, measure" though, and I do want to see a trivial measurement. This is because I too know the siren-song temptation of premature optimization... and also the Scylla-and-Charybdis struggle of optimizing for big-O performance and losing to constant-time overheads (see also: why LinkedList is usually a bad idea).

Plus if this delivers big gains, it's super helpful to have a tidy number to advertise to users as "this is why you really need to upgrade to Jenkins version X - YY% performance improvement."

jglick (Member, Author) commented Oct 11, 2016

an order of magnitude slower than ConcurrentHashMap

OK. My guess is that this overhead would be modest compared to the cost of actual TransientActionFactory calls.

It is of course an option to retain one part of the patch—that of avoiding needless array copies, and in the case of getAction(Class) often avoiding calls to TransientActionFactory at all—while commenting out the cache of applicable TransientActionFactorys: could simply make factoriesFor do a filtering iterator.

Or we could switch to a manual cache using WeakHashMap at the top level and ConcurrentHashMap below. Only takes a few minutes to write; mainly it is just a bit more verbose. I have no strong opinion about it.
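A hand-rolled version along those lines might look like this (a hypothetical sketch of the described shape, not code from the PR): a synchronized `WeakHashMap` at the top level keyed by the extension-list instance, with a `ConcurrentHashMap` per entry below so steady-state lookups stay cheap:

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical two-level cache: weak top-level keys let a stale extension
// list be garbage-collected after a restart; the inner ConcurrentHashMap
// handles the hot per-class lookups without a global lock.
class TwoLevelCache<K, C, V> {
    private final Map<K, ConcurrentHashMap<C, V>> top =
            Collections.synchronizedMap(new WeakHashMap<K, ConcurrentHashMap<C, V>>());

    ConcurrentHashMap<C, V> forKey(K key) {
        synchronized (top) { // rare path: once per extension-list instance
            ConcurrentHashMap<C, V> inner = top.get(key);
            if (inner == null) {
                inner = new ConcurrentHashMap<C, V>();
                top.put(key, inner);
            }
            return inner;
        }
    }
}
```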

@jglick jglick added the needs-more-reviews Complex change, which would benefit from more eyes label Oct 11, 2016
for (Action a : getAllActions())
if (type.isInstance(a))
// Shortcut: if the persisted list has one, return it.
for (Action a : getActions()) {
Member:

I think this is likely to deliver rather large benefits.

jglick (Member, Author):

Right, this is the clearest win: if you imagine a malicious factory which just sleeps one second and then returns an empty collection, this part of the patch will skip the second delay in the common case that there is a persistent action of the requested type (or one provided by some factory earlier in the list).
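The shortcut under discussion can be modeled like this (a standalone paraphrase of the diff context, using stand-in types; not the literal core code):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Stand-ins for hudson.model.Action and TransientActionFactory.
interface Action {}
interface Factory {
    Collection<? extends Action> createFor(Object target);
}

class Actionable {
    final List<Action> persisted = new ArrayList<Action>();
    final List<Factory> factories = new ArrayList<Factory>();

    // Scan the cheap persisted list first; consult the potentially slow
    // transient factories only if nothing persisted matches.
    <T extends Action> T getAction(Class<T> type) {
        for (Action a : persisted) {
            if (type.isInstance(a)) {
                return type.cast(a);
            }
        }
        for (Factory f : factories) {
            for (Action a : f.createFor(this)) {
                if (type.isInstance(a)) {
                    return type.cast(a);
                }
            }
        }
        return null;
    }
}
```

If no persisted action matches, the factory loop still runs, so the worst case is unchanged; only the common case gets cheaper.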

Member:

Yep, it would be a great change

@@ -56,4 +65,37 @@
*/
public abstract @Nonnull Collection<? extends Action> createFor(@Nonnull T target);

@SuppressWarnings("rawtypes")
private static final LoadingCache<ExtensionList<TransientActionFactory>, LoadingCache<Class<?>, List<TransientActionFactory<?>>>> cache =
CacheBuilder.newBuilder().weakKeys().build(new CacheLoader<ExtensionList<TransientActionFactory>, LoadingCache<Class<?>, List<TransientActionFactory<?>>>>() {
Member:

wow :)

svanoort (Member) commented Oct 12, 2016

OK. My guess is that this overhead would be modest compared to the cost of actual TransientActionFactory calls.

@jglick I'd wager you're right too, just not sure how much. So, I have the first results from the analogous but much simpler change in pipeline itself: jenkinsci/workflow-api-plugin#21 -- it cuts runtime by 50% in a reasonably-constructed benchmark.

My suspicion is that this change will generally improve performance significantly but because (unlike in that case) we have some caching overheads and still have to consider TransientActionFactories to some minimal extent... probably this will improve performance less than the workflow-api change. My gut says 10%-20% might be a good number.

The bonus: it will improve performance everywhere we work with actions, not just pipeline.

svanoort (Member):

Or we could switch to a manual cache using WeakHashMap at the top level and ConcurrentHashMap below. Only takes a few minutes to write; mainly it is just a bit more verbose. I have no strong opinion about it.

Would recommend against it initially, since we lose flexibility and some positive threading behavior that way.

rsandell (Member):

🐝

jglick (Member, Author) commented Oct 13, 2016

Have some other changes under development, hold on…

oleg-nenashev (Member) left a comment

+1 for having benchmarks, but I do not see it as a blocker before we have such a policy and a documented framework/guideline for them. 🐝

for (Action a : getAllActions())
if (type.isInstance(a))
// Shortcut: if the persisted list has one, return it.
for (Action a : getActions()) {
Member:

Yep, it would be a great change

}
// Otherwise check transient factories.
for (TransientActionFactory<?> taf : TransientActionFactory.factoriesFor(getClass(), type)) {
for (Action a : createFor(taf)) {
Member:

Also catch/suppress exceptions? Not so good for performance, but generally we should not trust extension points.

jglick (Member, Author):

createFor now does that.

// Otherwise check transient factories.
for (TransientActionFactory<?> taf : TransientActionFactory.factoriesFor(getClass(), type)) {
for (Action a : createFor(taf)) {
if (type.isInstance(a)) {
Member:

Would be great to get rid of this reflection instance check, but it seems to require the wider API changes

jglick (Member, Author):

I do not think it can be removed.

*/
public abstract @Nonnull Collection<? extends Action> createFor(@Nonnull T target);

private static class CacheKey { // http://stackoverflow.com/a/24336841/12916
Member:

Better to put such comments to Javadoc btw

jglick (Member, Author):

It is private anyway.

oleg-nenashev (Member) commented Nov 5, 2016

But it complicates the life of contributors, who have to go to another page just to understand the reason

stephenc (Member):

🐝

jglick (Member, Author) commented Oct 19, 2016

Parking this until @svanoort has a chance to weigh in. The patch as is does demonstrably avoid calls to potentially slow TransientActionFactory implementations; whether the added complexity is actually justified by concrete gains (especially given the workarounds in flight in workflow-api) is an open question.

@jglick jglick added the work-in-progress The PR is under active development, not ready to the final review label Oct 19, 2016
for (TransientActionFactory<?> taf : TransientActionFactory.factoriesFor(getClass(), type)) {
_actions.addAll(Util.filter(createFor(taf), type));
}
return Collections.unmodifiableList(_actions);
Member:

Is this improvement really worth dealing with potential breakage in whatever code is dumb enough to modify the returned list?

jglick (Member, Author):

I doubt there is any such code, but if there is, I am happy for it to break.

Member:

I agree with Jesse. Modification of the filtered list is not a good idea in any case.
We either expose the internal representation or send changes to /dev/null

BTW, it would be great to Javadoc the fact that the list should not be modified

@oleg-nenashev oleg-nenashev added the unresolved-merge-conflict There is a merge conflict with the target branch. label Nov 5, 2016
oleg-nenashev (Member):

@jglick Are you still working on this? It would be great to get it integrated, but it seems it's going to miss the LTS.

jglick (Member, Author) commented Nov 5, 2016

It is mergeable as far as I am concerned but @svanoort seemed reluctant. If you think it should go in, I can resolve the merge conflicts and tweak the Javadoc.

svanoort (Member) commented Nov 5, 2016

I'd still like to hit it with an abbreviated version of the benchmark, but haven't had time to set it up yet

stephenc (Member) commented Nov 6, 2016

I think blocking on a benchmark is a bad precedent

oleg-nenashev (Member):

It is mergeable as far as I am concerned but @svanoort seemed reluctant. If you think it should go in, I can resolve the merge conflicts and tweak the Javadoc.

Please do. It's required independently of the benchmarking story. Personally I do not see a strong requirement for benchmarks, since it's not an adopted practice in Jenkins core. If @svanoort wants to drive this practice and provide a framework/docs, that would be great.

@jglick jglick removed unresolved-merge-conflict There is a merge conflict with the target branch. work-in-progress The PR is under active development, not ready to the final review labels Nov 6, 2016
svanoort (Member) commented Nov 7, 2016

If you're making a performance optimization and there's some doubt about whether it will achieve its goal (or possibly do the opposite), the burden of proof is on the PR author to provide evidence. Same rules as when @oleg-nenashev requested performance tests on #2446 (comment); I don't see why this would be any different.

That said I'll aim to work on getting the benchmark together tonight so this can get a full yea or nay vote.

oleg-nenashev (Member):

@svanoort

Same rules as when @oleg-nenashev requested performance tests on #2446 (comment) and I don't see why this would be any different

Exact quote:

We had a kind of caching in jenkinsci/role-strategy-plugin#13, which caused severe performance regressions even with cache. We need to be very accurate with this PR. Do you plan creating any performance tests?

I asked whether there was a plan to create such tests, but I have not bugged the PR. So all manual/automatic testing was left to the PR creator and the other reviewers.

daniel-beck (Member):

@svanoort Ping

@oleg-nenashev oleg-nenashev added ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback and removed needs-more-reviews Complex change, which would benefit from more eyes labels Nov 27, 2016
oleg-nenashev (Member):

Merging since there has been no response from @svanoort since my reply 3 weeks ago. The change is not going to LTS soon, so in the case of performance degradation we have enough time to fix it.

@oleg-nenashev oleg-nenashev merged commit 6360b96 into jenkinsci:master Nov 27, 2016
oleg-nenashev added a commit that referenced this pull request Nov 27, 2016
@jglick jglick deleted the TransientActionFactory-opt branch November 29, 2016 15:04
jglick added a commit that referenced this pull request Nov 29, 2016
@jglick jglick mentioned this pull request May 5, 2023