(PUP-5482) Flag that a type could not be found#4427
(PUP-5482) Flag that a type could not be found#4427nbarrientos wants to merge 4 commits intopuppetlabs:masterfrom
Conversation
|
Thank you for your contribution @nbarrientos. This is reminiscent of PUP-1592, in which we short circuited the lookup process for defined types, see #2373 Perhaps we have regressed? /cc @hlindberg |
|
Sure sounds familiar. What we did then was to not search for ruby resource types when the name was namespaced (since currently types cannot be name spaced). We did nothing for user defined types though. Back when this was done, it was expected that adding a missing file would mean it got picked up. Now with environment caching it is different. Note that the new loaders (currently only used for 4.x functions) scans the file system once for an environment and keeps that information. The intent is to switch to the new loaders also for types and defines. |
|
While I am not completely familiar with the lifecycle of metatype/manager.rb stuff - the change looks ok - it is just checking if it knows or not, the change does not affect anything having to do with how this information is bound/cached etc. Looks ok for 4.x. (This change would not have worked on 3.x due to the expectancy to be able to find added files). |
|
I am curious - are there lots of user defined types that have a name that is not fully qualified? Otherwise I cannot see how the change in this PR helps since the first thing the type method does is to return nil if there is a ':' in the name. |
|
Thanks to both of you for looking at the patch. Probably we wouldn't have hit this in the first place if we were using concat fully qualified as you mentioned due to this. However, the optimization might still make sense to avoid degrading the performance if users don't use the types sensibly. @hlindberg, We're actually testing the patch on 3.8.4 as we haven't moved to 4.x yet, could you please elaborate on why it wouldn't work? We're seeing a significant improvement in the performance with the patch however we might be breaking something else without noticing :) |
|
Actually in our case this makes a big difference as, even though our catalog only contains 6 resources of type Concat, type() is called with "concat" as parameter 172 times per compilation forcing the autoloader to scan the disk. However, we have plenty of Concat::Fragment resources so maybe there's a bug somewhere triggering a call to type("concat") when a Concat::Fragment is found. Otherwise I fail to find right now an explanation for such a big number of calls. Nevertheless, with the patch on, the situation becomes less dramatic as only one of those 172 has to actually stat() the disk. However, I can also see that the function is called several times with "::concat::fragment" as parameter and those calls return indeed immediately after the check in L140. |
|
Well, I've just repeated the same experiment with the Service type and the behavior described above showed up so I'm assuming that's normal. There are only 15 resources of type Service in our catalog, however type("service") is called 100 times. This has no impact though as the type is found the first time that the function is called so no disk hammering here. However these multiple calls that are taking place kind of justify in my opinion the need to have a protection aimed to types that are not Ruby ones, in order to avoid unnecessary stat()s that are issued when resolving the few Puppet defined types that don't have "::" in their names (like concat). |
|
The problem with 3.x (non directory environments) is that it is expected to be able to find files dynamically at any time. A 3.x environment keeps monitoring files that are in use, and if they change the cache is evicted. That works fine for existing (or files being removed), but it does not work when files are added if the cache contains a blocking not-found/nil entry since there will never again be a search for the corresponding file (until cache is evicted for other reasons). You would need to touch another file if a type is added or restart the master. In practice, say when using r10k, it is most likely that other files are changed at the same time. Thus for 3.x and some users configurations, the proposed patch would change the behavior when switching between versions of the environment and if a user is unlucky the change would not be recognized. This is in contrast to 4.x and directory based environments where the caching semantics are different. I suggest that the patch is accepted for 4.x, but not for 3.x, and that module authors should be encouraged to change all unqualified references to qualified (since that also speeds up the resolution in other ways). There are test errors to deal with as well before accepting. |
|
Okay, thanks. A question, though. Would it be safe to backport the patch and deploy it locally as we're using 3.x and directory environments? |
|
@nbarrientos yes, in your own environment - I don't think we want to merge the patch into 3.x since it supports both dynamic and directory based environments. With 4.x (and directory based env in 3.x), everything (including types cache) is evicted when the environment expires. |
|
Okay, will then add it to our local 3.x Puppet build and roll it out. We'll take it out from our list of patches to apply once we're on 4.x :) I'm trying to turn the tests green now. Thanks again. |
If a Ruby type is not available in the file system, every time it is found by the parser, the Autoloader will scan all over again the directories in search_directories for it. This generates tons of stat()s that can be saved. This patch annotates that a type could not be found so the next time it is required during the compilation of a catalog, the file system is not hammered again unnecessarily. This optimization has given us a ~61% reduction in the number of stats per compilation, as we make an extensive use of the concat module which only provides Puppet "defined types".
|
Alright, so everything is green now. However, I'm not totally sure that the way I've made the tests happy is ideal. Let me know :) |
|
FTR, today we started to roll out the patch to one of our pools of Puppet masters at around 16:35. The following plot represents the number of stat()s made by all Puppet masters in that pool vs time. The results in production match what he saw during the testing phase, a 55-60% gain. The average compilation time has also decreased 40%. |
|
Ping @hlindberg, are you alright with cherry-picking and merging this after we do some testing? |
|
@HAIL9000 I am fine with this patch for 4.x - so stable and master are fine targets. (Should not be cherry picked to 3.x as noted in comments earlier). |
|
I'm not sure this is desired behavior on the agent, since pluginsync can make new types and providers available during its run. It's probably ok on the puppet master, but I know some people rely on the agent running on the master to bootstrap the master (which is why we couldn't separate the agent and master That said, we added an What about putting this new behavior behind a feature flag? In retrospect, it might have been better to create one puppet setting that meant "disable agent cache invalidation", rather than have N different settings... Another area of improvement would be to not clear gem paths when loading a feature: https://github.com/puppetlabs/puppet/blob/master/lib/puppet/util/feature.rb#L85 That logic is mainly for the agent, so that we can use a gem installed in the same run. But probably has negative performance impact on the server. /cc @camlow325 @cprice404 |
|
I like the idea of putting this behind a feature flag. See also #2951, which proposed a similar PR to address https://tickets.puppetlabs.com/browse/PUP-3038. |
|
Long term, the compiler will use the 4.x loaders instead of the @AutoLoader route. It already does that for new things; 4x functions, type aliases, but will use those loaders more. In 4.4.0 we will be loading resource types that way (for type references; only light use still). As that continues the code paths will become more and more separate. Frankly I would like to see as little change as possible to 3.x code - one fix always breaks something else... this one looked like it was worth it. The 4.x loaders do cash the misses (and does some other optimizations like promoting values so that they are found as quickly as possible). When compiling the environment is not supposed to change. |
|
@hlindberg I understand your concerns about not adding new caching behaviors, and am +1 to using 4.x loaders in the future. Having well-defined environment lifetimes makes sense. But if 4.x loaders cache misses then it will break use cases where hiera backends, functions, and report processors are delivered via agents running on the master. We ran into that issue when we tried to separate the agent and server libdir. IOW, separate the libdirs is a prerequisite for having 4.x loaders always cache misses. There are some on-going discussions about that. /cc @kylog Second, this PR adds a fairly significant performance improvement to puppetserver. It seems worthwhile to me to add this caching behavior under the existing @camlow325 can you take ownership of this PR and work with the contributor or close? |
|
@joshcooper I would like to understand the use cases where agents bring new things and why that means that loaders cannot cache. The 4.x loaders are tied to the lifecycle of the environment when compiling, they could just as easily be tied to something else (or not at all - which would be the same thing as only caching hits and misses inside the task when the loaders were bound to something. (This because loaders are not in charge of their own overall life cycle). But, maybe it is a case of allowing the "environment" on the agent to change during the course of the agent "apply" ? Could it perhaps detect when it would need to rest the loaders? Behind an "always cache" seems fine to me. I do think this is a valuable speed up. |
|
My $.02:
|
|
Ping @haus since you're on rotation for puppetserver. |
|
If you want a new setting I would reluctantly support that, but please make it default to on. Having N+1 settings where N > 200 and requiring the user to find it and flip the default in order to get a 40% performance boost is... perverse. |
|
@joshcooper thanks for the ping. I'll pick this up soon. |
|
@ahpook I don't think we need to default it to true. We could/should explicitly set it to true on the server side as was done for I don't care much about new setting vs using |
|
Ping @thallgren - in light of #4812 which is related to this. |
|
There is one case that worries me here, which is if a user begins using a new type in manifests before it is available (say they forgot to install the module first). In that case compiles will fail, but the type load miss will be cached for the life of the server process, so even after the user installs the type compiles will fail until the jruby expires or the server is restarted/reloaded. I have a branch that would resolve this case here (the branch also guards the caching behind a setting): https://github.com/haus/puppet/commits/tickets/master/pup-5482 |
|
I've added a few commits to put this caching behind a setting in #4818. So I'll close this out in favor of that PR. |

If a Ruby type is not available in the file system, every time it is found by
the parser, the Autoloader will scan all over again the directories in
search_directories for it. This generates tons of stat()s that can be saved.
This patch annotates that a type could not be found so the next time it is
required during the compilation of a catalog, the file system is not hammered
again unnecessarily.
This optimization has given us a ~61% reduction in the number of stats per
compilation, as we make an extensive use of the concat module which only
provides Puppet "defined types".