(PUP-5482) Flag that a type could not be found by nbarrientos · Pull Request #4427 · puppetlabs/puppet

nbarrientos · 2015-11-09T15:35:11Z

If a Ruby type is not available in the file system, then every time the parser
encounters it the Autoloader scans the directories in search_directories for it
all over again. This generates tons of stat()s that can be saved.

This patch records that a type could not be found, so the next time it is
required during the compilation of a catalog the file system is not hammered
again unnecessarily.

This optimization has given us a ~61% reduction in the number of stats per
compilation, as we make extensive use of the concat module, which only
provides Puppet "defined types".
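
To illustrate the idea for readers of this thread, here is a minimal sketch of caching lookup misses. It is not the code in this patch: the `TypeLookup` class, its `disk_scans` counter, and the list standing in for the file system are all hypothetical.

```ruby
# Hypothetical stand-in for a type lookup; not Puppet's actual implementation.
class TypeLookup
  attr_reader :disk_scans

  def initialize(available_on_disk)
    @available_on_disk = available_on_disk # names a disk scan would find
    @types = {}                            # name => loaded type (hits)
    @missing = {}                          # name => true (recorded misses)
    @disk_scans = 0
  end

  def type(name)
    name = name.to_s.downcase
    return @types[name] if @types.key?(name)
    return nil if @missing[name] # known miss: skip the scan entirely

    @disk_scans += 1 # stands in for the stat()-heavy autoloader scan
    if @available_on_disk.include?(name)
      @types[name] = name.to_sym
    else
      @missing[name] = true
      nil
    end
  end
end

lookup = TypeLookup.new(['service'])
3.times { lookup.type('concat') }  # defined type: no Ruby file to find
3.times { lookup.type('service') } # Ruby type: found on the first scan
puts lookup.disk_scans
```

In this sketch only the first lookup of each name touches the simulated disk; every later lookup is answered from one of the two hashes.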

joshcooper · 2015-11-10T00:04:40Z

Thank you for your contribution @nbarrientos. This is reminiscent of PUP-1592, in which we short-circuited the lookup process for defined types; see #2373. Perhaps we have regressed? /cc @hlindberg

hlindberg · 2015-11-10T00:45:50Z

Sure sounds familiar. What we did then was to not search for Ruby resource types when the name was namespaced (since currently types cannot be namespaced). We did nothing for user defined types though. Back when this was done, it was expected that adding a missing file would mean it got picked up. Now with environment caching it is different.

Note that the new loaders (currently only used for 4.x functions) scan the file system once for an environment and keep that information. The intent is to switch to the new loaders also for types and defines.

hlindberg · 2015-11-10T00:53:26Z

While I am not completely familiar with the lifecycle of the metatype/manager.rb stuff, the change looks ok: it is just checking whether it knows or not, and it does not affect anything having to do with how this information is bound/cached etc. Looks ok for 4.x. (This change would not have worked on 3.x due to the expectation of being able to find added files.)

hlindberg · 2015-11-10T01:01:49Z

I am curious - are there lots of user defined types that have a name that is not fully qualified? Otherwise I cannot see how the change in this PR helps since the first thing the type method does is to return nil if there is a ':' in the name.

nbarrientos · 2015-11-10T07:11:26Z

Thanks to both of you for looking at the patch. Probably we wouldn't have hit this in the first place if we were using concat fully qualified as you mentioned due to this. However, the optimization might still make sense to avoid degrading the performance if users don't use the types sensibly.

@hlindberg, we're actually testing the patch on 3.8.4 as we haven't moved to 4.x yet. Could you please elaborate on why it wouldn't work? We're seeing a significant improvement in performance with the patch; however, we might be breaking something else without noticing :)

nbarrientos · 2015-11-10T08:00:17Z

Actually in our case this makes a big difference as, even though our catalog only contains 6 resources of type Concat, type() is called with "concat" as parameter 172 times per compilation, forcing the autoloader to scan the disk. However, we have plenty of Concat::Fragment resources, so maybe there's a bug somewhere triggering a call to type("concat") when a Concat::Fragment is found. Otherwise I can't find an explanation right now for such a big number of calls. Nevertheless, with the patch on, the situation becomes less dramatic as only one of those 172 has to actually stat() the disk.

However, I can also see that the function is called several times with "::concat::fragment" as parameter and those calls return indeed immediately after the check in L140.

nbarrientos · 2015-11-10T10:00:50Z

Well, I've just repeated the same experiment with the Service type and the behavior described above showed up so I'm assuming that's normal. There are only 15 resources of type Service in our catalog, however type("service") is called 100 times. This has no impact though as the type is found the first time that the function is called so no disk hammering here.

However, these multiple calls kind of justify, in my opinion, the need for a protection aimed at types that are not Ruby ones, in order to avoid the unnecessary stat()s that are issued when resolving the few Puppet defined types that don't have "::" in their names (like concat).

hlindberg · 2015-11-10T14:16:42Z

The problem with 3.x (non directory environments) is that it is expected to be able to find files dynamically at any time. A 3.x environment keeps monitoring files that are in use, and if they change the cache is evicted. That works fine for existing files (or files being removed), but it does not work when files are added if the cache contains a blocking not-found/nil entry, since there will never again be a search for the corresponding file (until the cache is evicted for other reasons). You would need to touch another file if a type is added, or restart the master. In practice, say when using r10k, it is most likely that other files are changed at the same time. Thus for 3.x and some users' configurations, the proposed patch would change the behavior when switching between versions of the environment, and if a user is unlucky the change would not be recognized.
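
The blocking not-found entry described above can be sketched as follows. This is only an illustration under assumptions: `MissCachingLookup`, its `evict!` method, and the hash standing in for the file system are hypothetical and are not Puppet's environment cache.

```ruby
# Hypothetical model of a lookup that records misses; not Puppet's real cache.
class MissCachingLookup
  def initialize(disk)
    @disk = disk  # name => type, stands in for files on disk
    @missing = {} # names that were looked up and not found
  end

  def type(name)
    return nil if @missing[name] # blocking entry: the disk is not consulted

    found = @disk[name]
    @missing[name] = true if found.nil?
    found
  end

  # Stands in for the cache being evicted for some other reason.
  def evict!
    @missing.clear
  end
end

disk = {}
lookup = MissCachingLookup.new(disk)

first = lookup.type('mytype') # not on disk yet, so the miss is recorded
disk['mytype'] = :mytype      # the file is added afterwards
stale = lookup.type('mytype') # the recorded miss hides the new file
lookup.evict!
fresh = lookup.type('mytype') # found once the miss has been dropped

puts [first.inspect, stale.inspect, fresh.inspect].join(' ')
```

Until something evicts the recorded miss, the newly added file stays invisible in this model, which is the 3.x concern in a nutshell.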

This is in contrast to 4.x and directory based environments where the caching semantics are different.

I suggest that the patch is accepted for 4.x, but not for 3.x, and that module authors should be encouraged to change all unqualified references to qualified (since that also speeds up the resolution in other ways).

There are test errors to deal with as well before accepting.

nbarrientos · 2015-11-10T14:53:51Z

Okay, thanks. A question, though. Would it be safe to backport the patch and deploy it locally as we're using 3.x and directory environments?

hlindberg · 2015-11-10T15:51:32Z

@nbarrientos yes, in your own environment - I don't think we want to merge the patch into 3.x since it supports both dynamic and directory based environments. With 4.x (and directory based env in 3.x), everything (including types cache) is evicted when the environment expires.

nbarrientos · 2015-11-10T15:56:14Z

Okay, will then add it to our local 3.x Puppet build and roll it out. We'll take it out from our list of patches to apply once we're on 4.x :) I'm trying to turn the tests green now. Thanks again.

nbarrientos · 2015-11-10T16:27:37Z

Alright, so everything is green now. However, I'm not totally sure that the way I've made the tests happy is ideal. Let me know :)

nbarrientos · 2015-11-17T21:34:48Z

FTR, today we started to roll out the patch to one of our pools of Puppet masters at around 16:35. The following plot represents the number of stat()s made by all Puppet masters in that pool vs time. The results in production match what we saw during the testing phase, a 55-60% gain. The average compilation time has also decreased by 40%.

HAIL9000 · 2015-11-24T18:19:24Z

Ping @hlindberg, are you alright with cherry-picking and merging this after we do some testing?

hlindberg · 2015-11-24T23:06:32Z

@HAIL9000 I am fine with this patch for 4.x - so stable and master are fine targets. (Should not be cherry picked to 3.x as noted in comments earlier).

joshcooper · 2015-11-24T23:18:21Z

I'm not sure this is desired behavior on the agent, since pluginsync can make new types and providers available during its run.

It's probably ok on the puppet master, but I know some people rely on the agent running on the master to bootstrap the master (which is why we couldn't separate the agent and master libdir), so I think the same problem could happen on the master and would be considered a breaking change.

That said, we added an always_cache_features puppet setting that will cache the result of feature evaluation, so for example if Puppet.features.msgpack? is false, cache the result, and don't try to load msgpack again.

What about putting this new behavior behind a feature flag? In retrospect, it might have been better to create one puppet setting that meant "disable agent cache invalidation", rather than have N different settings...
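
A rough sketch of what guarding the behavior with a flag could look like. The `GuardedLookup` class and its `cache_misses` option are hypothetical names chosen for illustration; they are not an existing Puppet setting or API.

```ruby
# Hypothetical sketch; GuardedLookup and cache_misses are not Puppet APIs.
class GuardedLookup
  attr_reader :scans

  def initialize(disk, cache_misses:)
    @disk = disk                 # name => type, stands in for files on disk
    @cache_misses = cache_misses # the flag guarding the new behavior
    @missing = {}
    @scans = 0
  end

  def type(name)
    return nil if @cache_misses && @missing[name]

    @scans += 1 # stands in for scanning the search directories
    found = @disk[name]
    @missing[name] = true if found.nil? && @cache_misses
    found
  end
end

# Agent-like use: flag off, so a type delivered mid-run is still found.
agent_disk = {}
agent = GuardedLookup.new(agent_disk, cache_misses: false)
agent.type('newtype')
agent_disk['newtype'] = :newtype # e.g. delivered by pluginsync
agent_found = agent.type('newtype')

# Server-like use: flag on, so repeated misses cost a single scan.
server = GuardedLookup.new({}, cache_misses: true)
3.times { server.type('concat') }

puts "agent scans: #{agent.scans}, server scans: #{server.scans}"
```

With the flag off the lookup behaves as before and rescans on every miss; with it on, repeated misses are answered from memory.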

Another area of improvement would be to not clear gem paths when loading a feature: https://github.com/puppetlabs/puppet/blob/master/lib/puppet/util/feature.rb#L85 That logic is mainly for the agent, so that we can use a gem installed in the same run, but it probably has a negative performance impact on the server.

/cc @camlow325 @cprice404

kylog · 2015-12-14T19:56:55Z

I like the idea of putting this behind a feature flag.

See also #2951, which proposed a similar PR to address https://tickets.puppetlabs.com/browse/PUP-3038.

hlindberg · 2016-02-11T00:37:11Z

Long term, the compiler will use the 4.x loaders instead of the @AutoLoader route. It already does that for new things (4.x functions, type aliases), but will use those loaders more. In 4.4.0 we will be loading resource types that way (for type references; only light use still). As that continues the code paths will become more and more separate.

Frankly I would like to see as little change as possible to 3.x code - one fix always breaks something else... this one looked like it was worth it.

The 4.x loaders do cache the misses (and do some other optimizations like promoting values so that they are found as quickly as possible). When compiling, the environment is not supposed to change.

joshcooper · 2016-02-22T19:22:38Z

@hlindberg I understand your concerns about not adding new caching behaviors, and am +1 to using 4.x loaders in the future. Having well-defined environment lifetimes makes sense.

But if 4.x loaders cache misses then it will break use cases where hiera backends, functions, and report processors are delivered via agents running on the master. We ran into that issue when we tried to separate the agent and server libdir. IOW, separating the libdirs is a prerequisite for having 4.x loaders always cache misses. There are some ongoing discussions about that. /cc @kylog

Second, this PR adds a fairly significant performance improvement to puppetserver. It seems worthwhile to me to add this caching behavior under the existing always_cache_features feature for the time being.

@camlow325 can you take ownership of this PR and work with the contributor or close?

hlindberg · 2016-02-22T22:11:14Z

@joshcooper I would like to understand the use cases where agents bring new things and why that means that loaders cannot cache. The 4.x loaders are tied to the lifecycle of the environment when compiling; they could just as easily be tied to something else (or not at all, which would be the same thing as only caching hits and misses inside the task when the loaders were bound to something). This is because loaders are not in charge of their own overall life cycle. But maybe it is a case of allowing the "environment" on the agent to change during the course of the agent "apply"? Could it perhaps detect when it would need to reset the loaders?

Behind an "always cache" seems fine to me. I do think this is a valuable speed up.

kylog · 2016-02-23T00:18:20Z

My $.02:

👍 on adding this behind either a new always_cache setting or a more-specific new always_cache_types (or ??? - naming is hard) setting
I'd rather not reuse the always_cache_features setting for this because that will confuse future me

joshcooper · 2016-03-17T15:13:54Z

Ping @haus since you're on rotation for puppetserver.

ahpook · 2016-03-23T17:18:08Z

If you want a new setting I would reluctantly support that, but please make it default to on. Having N+1 settings where N > 200 and requiring the user to find it and flip the default in order to get a 40% performance boost is... perverse.

haus · 2016-03-23T21:48:11Z

@joshcooper thanks for the ping. I'll pick this up soon.

haus · 2016-03-24T23:45:40Z

@ahpook I don't think we need to default it to true. We could/should explicitly set it to true on the server side as was done for always_cache_features https://github.com/puppetlabs/puppet-server/blob/master/src/ruby/puppet-server-lib/puppet/server/puppet_config.rb#L26. Then there would be no needed work for folks running puppetserver, and for those running agents, they wouldn't experience regressions around loading new types that were delivered during runs, as @joshcooper mentioned in #4427 (comment).

I don't care much about new setting vs using always_cache_features, largely because I can't imagine a case where a user would need/want to enable one and not the other.

hlindberg · 2016-03-25T23:27:07Z

Ping @thallgren - in light of #4812 which is related to this.

haus · 2016-03-28T16:37:16Z

There is one case that worries me here, which is if a user begins using a new type in manifests before it is available (say they forgot to install the module first). In that case compiles will fail, but the type load miss will be cached for the life of the server process, so even after the user installs the type compiles will fail until the jruby expires or the server is restarted/reloaded.

I have a branch that would resolve this case here (the branch also guards the caching behind a setting): https://github.com/haus/puppet/commits/tickets/master/pup-5482

haus · 2016-03-28T17:25:53Z

I've added a few commits to put this caching behind a setting in #4818. So I'll close this out in favor of that PR.

HAIL9000 added the Pending Triage label Nov 9, 2015

nbarrientos force-pushed the pup5482 branch from 336d630 to 2692231 Compare November 10, 2015 15:58

nbarrientos added 3 commits November 11, 2015 08:53

Merge branch 'master' into pup5482

97d2aa2

Merge branch 'master' into pup5482

9129778

Merge branch 'master' into pup5482

ca0584d

HAIL9000 added Triaged and removed Pending Triage labels Nov 24, 2015

HAIL9000 added the #puppethack label Dec 14, 2015

haus closed this Mar 28, 2016

haus mentioned this pull request Mar 28, 2016

(PUP-5482) Flag that a type could not be found #4818

Merged
