Rework cache timestamps #16885
Conversation
LGTM! Very nice improvements - when testing this functionality, I had systems with the clock starting way in the past, but none with the clock in the future, so I missed this corner case. Thanks for fixing it!
looks good, some minor comments
src/core/manager.c
@@ -1956,9 +1956,12 @@ bool manager_unit_file_maybe_loadable_from_cache(Unit *u) {
        if (u->manager->unit_cache_mtime == 0)
can you add a comment here, too, explaining what precisely this checks?
Hmm, it was added by @bluca in d904afc. I didn't think too much about it. But looking at it carefully, I'm not sure I understand the purpose. @bluca, can you explain? If we remove it, we would try to reload the unit even if the cache is not initialized. I'm not sure that could ever happen (we load the cache quite early), but either way, I don't see a point to the check.
If memory serves, there was another pre-existing check somewhere else that skipped using _mtime if it was not initialised, so I replicated it (sorry, I can't remember more precisely, it was a while ago). I don't remember any specific reason apart from that, so I think it can be removed.
Oh, I think I see: unit_file_build_name_map() had a check like this, but it is used in the inverse sense: we always rebuild the cache if the timestamp == 0. So yeah, it seems appropriate to remove this.
hmm, i wonder if a simple hash like XOR might be more appropriate here. i.e. is it a good thing or a bad thing to be dependent on the search path order? Might be it's good, dunno, just wanted to raise the question. opinions?
The search path order is fixed, so it doesn't really matter. I don't have a very strong opinion on siphash vs. xor: we're only hashing on the order of 10 values, and the hash cost is swamped by IO. But we don't need a strong hash. I can rework to use something else if you think it'd be better.
The name is misleading, since we aren't really loading the unit from cache — if this function returns true, we'll try to load the unit from disk, updating the cache in the process.
We really only care whether the cache has been reloaded between the time when we last attempted to load this unit and now. So instead of recording the actual time we try to load the unit, just store the timestamp of the cache. This has the advantage that we'll notice if the cache mtime jumps forward or backward.

Also rename fragment_loadtime to fragment_not_found_time: it only gets set when we failed to load the unit, while the old name suggested it is always set.

In https://bugzilla.redhat.com/show_bug.cgi?id=1871327 (and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1867930 and https://bugzilla.redhat.com/show_bug.cgi?id=1872068) we try to load a non-existent unit over and over from transaction_add_job_and_dependencies(). My understanding is that the clock was in the future during initial boot, so cache_mtime is always in the future (since we don't touch the fs after initial boot). So no matter how many times we try to load the unit and set fragment_loadtime / fragment_not_found_time, cache_mtime remains higher, and manager_unit_cache_should_retry_load() always returns true.
This check was added in d904afc. It would only apply in the case where the cache hasn't been loaded yet. I think we pretty much always have the cache loaded when we reach this point, but even if we didn't, it seems better to try to reload the unit. So let's drop this check.
Instead of assuming that more-recently modified directories have higher mtimes, just look for any mtime changes, up or down. Since we don't want to remember individual mtimes, hash them to obtain a single value. This should help us behave properly when the time jumps backwards during boot: various files might have mtimes that are in the future, but we won't care.

This fixes the following scenario: We have /etc/systemd/system with mtime T1, initially far in the past. We have /run/systemd/generator with mtime T2. The time is adjusted backwards, so T2 will be in the future for a while. Now the user writes new files to /etc/systemd/system, and T1 is updated to T1'. Nevertheless, T1 < T1' << T2, so we would falsely consider our cache to be up-to-date.
Force-pushed from 1b3be32 to c2911d4.
Updated. I added one more separate commit to drop the check you asked about, with the explanation in the commit message. I also added one more comment in that function. Otherwise, the only change is to rename the variables.
bionic-i386 timed out like usual. merging