Skip to content

(PUP-5482) Flag that a type could not be found#4427

Closed
nbarrientos wants to merge 4 commits intopuppetlabs:masterfrom
cernops:pup5482
Closed

(PUP-5482) Flag that a type could not be found#4427
nbarrientos wants to merge 4 commits intopuppetlabs:masterfrom
cernops:pup5482

Conversation

@nbarrientos
Copy link
Contributor

If a Ruby type is not available in the file system, every time it is found by
the parser, the Autoloader will scan all over again the directories in
search_directories for it. This generates tons of stat()s that can be saved.

This patch annotates that a type could not be found so the next time it is
required during the compilation of a catalog, the file system is not hammered
again unnecessarily.

This optimization has given us a ~61% reduction in the number of stats per
compilation, as we make an extensive use of the concat module which only
provides Puppet "defined types".

@joshcooper
Copy link
Contributor

Thank you for your contribution @nbarrientos. This is reminiscent of PUP-1592, in which we short circuited the lookup process for defined types, see #2373 Perhaps we have regressed? /cc @hlindberg

@hlindberg
Copy link
Contributor

Sure sounds familiar. What we did then was to not search for ruby resource types when the name was namespaced (since currently types cannot be name spaced). We did nothing for user defined types though. Back when this was done, it was expected that adding a missing file would mean it got picked up. Now with environment caching it is different.

Note that the new loaders (currently only used for 4.x functions) scans the file system once for an environment and keeps that information. The intent is to switch to the new loaders also for types and defines.

@hlindberg
Copy link
Contributor

While I am not completely familiar with the lifecycle of metatype/manager.rb stuff - the change looks ok - it is just checking if it knows or not, the change does not affect anything having to do with how this information is bound/cached etc. Looks ok for 4.x. (This change would not have worked on 3.x due to the expectancy to be able to find added files).

@hlindberg
Copy link
Contributor

I am curious - are there lots of user defined types that have a name that is not fully qualified? Otherwise I cannot see how the change in this PR helps since the first thing the type method does is to return nil if there is a ':' in the name.

@nbarrientos
Copy link
Contributor Author

Thanks to both of you for looking at the patch. Probably we wouldn't have hit this in the first place if we were using concat fully qualified as you mentioned due to this. However, the optimization might still make sense to avoid degrading the performance if users don't use the types sensibly.

@hlindberg, We're actually testing the patch on 3.8.4 as we haven't moved to 4.x yet, could you please elaborate on why it wouldn't work? We're seeing a significant improvement in the performance with the patch however we might be breaking something else without noticing :)

@nbarrientos
Copy link
Contributor Author

Actually in our case this makes a big difference as, even though our catalog only contains 6 resources of type Concat, type() is called with "concat" as parameter 172 times per compilation forcing the autoloader to scan the disk. However, we have plenty of Concat::Fragment resources so maybe there's a bug somewhere triggering a call to type("concat") when a Concat::Fragment is found. Otherwise I fail to find right now an explanation for such a big number of calls. Nevertheless, with the patch on, the situation becomes less dramatic as only one of those 172 has to actually stat() the disk.

However, I can also see that the function is called several times with "::concat::fragment" as parameter and those calls return indeed immediately after the check in L140.

@nbarrientos
Copy link
Contributor Author

Well, I've just repeated the same experiment with the Service type and the behavior described above showed up so I'm assuming that's normal. There are only 15 resources of type Service in our catalog, however type("service") is called 100 times. This has no impact though as the type is found the first time that the function is called so no disk hammering here.

However these multiple calls that are taking place kind of justify in my opinion the need to have a protection aimed to types that are not Ruby ones, in order to avoid unnecessary stat()s that are issued when resolving the few Puppet defined types that don't have "::" in their names (like concat).

@hlindberg
Copy link
Contributor

The problem with 3.x (non directory environments) is that it is expected to be able to find files dynamically at any time. A 3.x environment keeps monitoring files that are in use, and if they change the cache is evicted. That works fine for existing (or files being removed), but it does not work when files are added if the cache contains a blocking not-found/nil entry since there will never again be a search for the corresponding file (until cache is evicted for other reasons). You would need to touch another file if a type is added or restart the master. In practice, say when using r10k, it is most likely that other files are changed at the same time. Thus for 3.x and some users configurations, the proposed patch would change the behavior when switching between versions of the environment and if a user is unlucky the change would not be recognized.

This is in contrast to 4.x and directory based environments where the caching semantics are different.

I suggest that the patch is accepted for 4.x, but not for 3.x, and that module authors should be encouraged to change all unqualified references to qualified (since that also speeds up the resolution in other ways).

There are test errors to deal with as well before accepting.

@nbarrientos
Copy link
Contributor Author

Okay, thanks. A question, though. Would it be safe to backport the patch and deploy it locally as we're using 3.x and directory environments?

@hlindberg
Copy link
Contributor

@nbarrientos yes, in your own environment - I don't think we want to merge the patch into 3.x since it supports both dynamic and directory based environments. With 4.x (and directory based env in 3.x), everything (including types cache) is evicted when the environment expires.

@nbarrientos
Copy link
Contributor Author

Okay, will then add it to our local 3.x Puppet build and roll it out. We'll take it out from our list of patches to apply once we're on 4.x :) I'm trying to turn the tests green now. Thanks again.

If a Ruby type is not available in the file system, every time it is found by
the parser, the Autoloader will scan all over again the directories in
search_directories for it. This generates tons of stat()s that can be saved.

This patch annotates that a type could not be found so the next time it is
required during the compilation of a catalog, the file system is not hammered
again unnecessarily.

This optimization has given us a ~61% reduction in the number of stats per
compilation, as we make an extensive use of the concat module which only
provides Puppet "defined types".
@nbarrientos
Copy link
Contributor Author

Alright, so everything is green now. However, I'm not totally sure that the way I've made the tests happy is ideal. Let me know :)

@nbarrientos
Copy link
Contributor Author

FTR, today we started to roll out the patch to one of our pools of Puppet masters at around 16:35. The following plot represents the number of stat()s made by all Puppet masters in that pool vs time. The results in production match what he saw during the testing phase, a 55-60% gain. The average compilation time has also decreased 40%.

screen shot 2015-11-17 at 22 27 52

@HAIL9000
Copy link
Contributor

Ping @hlindberg, are you alright with cherry-picking and merging this after we do some testing?

@hlindberg
Copy link
Contributor

@HAIL9000 I am fine with this patch for 4.x - so stable and master are fine targets. (Should not be cherry picked to 3.x as noted in comments earlier).

@joshcooper
Copy link
Contributor

I'm not sure this is desired behavior on the agent, since pluginsync can make new types and providers available during its run.

It's probably ok on the puppet master, but I know some people rely on the agent running on the master to bootstrap the master (which is why we couldn't separate the agent and master libdir), so I think the same problem could happen on the master and would be considering a breaking change.

That said, we added an always_cache_features puppet setting that will cache the result of feature evaluation, so for example if Puppet.features.msgpack? is false, cache the result, and don't try to load msgpack again.

What about putting this new behavior behind a feature flag? In retrospect, it might have been better to create one puppet setting that meant "disable agent cache invalidation", rather than have N different settings...

Another area of improvement would be to not clear gem paths when loading a feature: https://github.com/puppetlabs/puppet/blob/master/lib/puppet/util/feature.rb#L85 That logic is mainly for the agent, so that we can use a gem installed in the same run. But probably has negative performance impact on the server.

/cc @camlow325 @cprice404

@kylog
Copy link

kylog commented Dec 14, 2015

I like the idea of putting this behind a feature flag.

See also #2951, which proposed a similar PR to address https://tickets.puppetlabs.com/browse/PUP-3038.

@hlindberg
Copy link
Contributor

Long term, the compiler will use the 4.x loaders instead of the @AutoLoader route. It already does that for new things; 4x functions, type aliases, but will use those loaders more. In 4.4.0 we will be loading resource types that way (for type references; only light use still). As that continues the code paths will become more and more separate.

Frankly I would like to see as little change as possible to 3.x code - one fix always breaks something else... this one looked like it was worth it.

The 4.x loaders do cash the misses (and does some other optimizations like promoting values so that they are found as quickly as possible). When compiling the environment is not supposed to change.

@joshcooper
Copy link
Contributor

@hlindberg I understand your concerns about not adding new caching behaviors, and am +1 to using 4.x loaders in the future. Having well-defined environment lifetimes makes sense.

But if 4.x loaders cache misses then it will break use cases where hiera backends, functions, and report processors are delivered via agents running on the master. We ran into that issue when we tried to separate the agent and server libdir. IOW, separate the libdirs is a prerequisite for having 4.x loaders always cache misses. There are some on-going discussions about that. /cc @kylog

Second, this PR adds a fairly significant performance improvement to puppetserver. It seems worthwhile to me to add this caching behavior under the existing always_cache_features feature for the time being.

@camlow325 can you take ownership of this PR and work with the contributor or close?

@hlindberg
Copy link
Contributor

@joshcooper I would like to understand the use cases where agents bring new things and why that means that loaders cannot cache. The 4.x loaders are tied to the lifecycle of the environment when compiling, they could just as easily be tied to something else (or not at all - which would be the same thing as only caching hits and misses inside the task when the loaders were bound to something. (This because loaders are not in charge of their own overall life cycle). But, maybe it is a case of allowing the "environment" on the agent to change during the course of the agent "apply" ? Could it perhaps detect when it would need to rest the loaders?

Behind an "always cache" seems fine to me. I do think this is a valuable speed up.

@kylog
Copy link

kylog commented Feb 23, 2016

My $.02:

  • 👍 on adding this behind either a new always_cache setting or a more-specific new always_cache_types (or ??? - naming is hard) setting
  • I'd rather not reuse the always_cache_features setting for this because that will confuse future me

@joshcooper
Copy link
Contributor

Ping @haus since you're on rotation for puppetserver.

@ahpook
Copy link
Contributor

ahpook commented Mar 23, 2016

If you want a new setting I would reluctantly support that, but please make it default to on. Having N+1 settings where N > 200 and requiring the user to find it and flip the default in order to get a 40% performance boost is... perverse.

@haus
Copy link
Contributor

haus commented Mar 23, 2016

@joshcooper thanks for the ping. I'll pick this up soon.

@haus
Copy link
Contributor

haus commented Mar 24, 2016

@ahpook I don't think we need to default it to true. We could/should explicitly set it to true on the server side as was done for always_cache_features https://github.com/puppetlabs/puppet-server/blob/master/src/ruby/puppet-server-lib/puppet/server/puppet_config.rb#L26. Then there would be no needed work for folks running puppetserver, and for those running agents, they wouldn't experience regressions around loading new types that were delivered during runs, as @joshcooper mentioned in #4427 (comment).

I don't care much about new setting vs using always_cache_features, largely because I can't imagine a case where a user would need/want to enable one and not the other.

@hlindberg
Copy link
Contributor

Ping @thallgren - in light of #4812 which is related to this.

@haus
Copy link
Contributor

haus commented Mar 28, 2016

There is one case that worries me here, which is if a user begins using a new type in manifests before it is available (say they forgot to install the module first). In that case compiles will fail, but the type load miss will be cached for the life of the server process, so even after the user installs the type compiles will fail until the jruby expires or the server is restarted/reloaded.

I have a branch that would resolve this case here (the branch also guards the caching behind a setting): https://github.com/haus/puppet/commits/tickets/master/pup-5482

@haus
Copy link
Contributor

haus commented Mar 28, 2016

I've added a few commits to put this caching behind a setting in #4818. So I'll close this out in favor of that PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants