Rework cache timestamps #16885

Merged
merged 4 commits into systemd:master on Sep 1, 2020
Conversation

keszybz
Member

@keszybz keszybz commented Aug 28, 2020

No description provided.

@keszybz keszybz added the pid1 and priority (Stuff that should enter master quickly, since it fixes a major bug, unbreaks CI or stalls other work) labels Aug 28, 2020
Member

@bluca bluca left a comment

LGTM! Very nice improvements - when testing this functionality, I had systems with the clock starting way in the past, but none with the clock in the future, so I missed this corner case. Thanks for fixing it!

@poettering poettering linked an issue Aug 31, 2020 that may be closed by this pull request
Member

@poettering poettering left a comment

looks good, some minor comments

@@ -1956,9 +1956,12 @@ bool manager_unit_file_maybe_loadable_from_cache(Unit *u) {
if (u->manager->unit_cache_mtime == 0)
Member

can you add a comment here, too, explaining what precisely this checks?

Member Author

Hmm, it was added by @bluca in d904afc. I didn't think too much about it. But looking at it carefully, I'm not sure I understand the purpose. @bluca, can you explain? If we remove it, we would try to reload the unit if the cache is not initialized. I'm not sure that could ever happen (because we load the cache quite early), but either way, I don't see a point to the check.

Member

@bluca bluca Aug 31, 2020

If memory serves, there was another pre-existing check somewhere else that skipped using _mtime if it was not initialised, so I replicated it (sorry, I can't remember more precisely; it was a while ago). I don't remember any specific reason apart from that, so I think it can be removed.

Member Author

Oh, I think I see: unit_file_build_name_map() had a check like this, but it is used in the inverse sense: we always rebuild the cache if the timestamp == 0. So yeah, it seems appropriate to remove this.

src/shared/unit-file.c (outdated, resolved)
src/shared/unit-file.c (outdated, resolved)
src/core/manager.h (outdated, resolved)
@poettering poettering added the reviewed/needs-rework 🔨 (PR has been reviewed and needs another round of reworks) label Aug 31, 2020
@poettering
Member

hmm, i wonder if a simple hash like XOR might be more appropriate here. i.e., is it a good thing to be dependent on search path order, or a bad one? might be it's good, dunno, just wanted to raise the question. opinions?

@keszybz
Member Author

keszybz commented Aug 31, 2020

hmm, i wonder if a simple hash like XOR might be more appropriate here. i.e., is it a good thing to be dependent on search path order, or a bad one? might be it's good, dunno, just wanted to raise the question. opinions?

The search path order is fixed, so it doesn't really matter. I don't have a very strong opinion on siphash vs. xor: we're only hashing on the order of 10 values, and the hash cost is swamped by I/O, but we don't need a strong hash either. I can rework it to use something else if you think that'd be better.

The name is misleading, since we aren't really loading the unit from cache — if
this function returns true, we'll try to load the unit from disk, updating the
cache in the process.
We really only care if the cache has been reloaded between the time when we
last attempted to load this unit and now. So instead of recording the actual
time we try to load the unit, just store the timestamp of the cache. This has
the advantage that we'll notice if the cache mtime jumps forward or backward.

Also rename fragment_loadtime to fragment_not_found_time. It is only set when
we fail to load the unit, while the old name suggested it is always set.

In https://bugzilla.redhat.com/show_bug.cgi?id=1871327
(and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1867930
and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1872068) we try
to load a non-existent unit over and over from transaction_add_job_and_dependencies().
My understanding is that the clock was in the future during initial boot,
so cache_mtime is always in the future (since we don't touch the fs after initial boot),
so no matter how many times we try to load the unit and set
fragment_loadtime / fragment_not_found_time, it is always higher than cache_mtime,
so manager_unit_cache_should_retry_load() always returns true.
This check was added in d904afc. It would only
apply in the case where the cache hasn't been loaded yet. I think we pretty
much always have the cache loaded when we reach this point, but even if we
didn't, it seems better to try to reload the unit. So let's drop this check.
Instead of assuming that more-recently modified directories have higher mtime,
just look for any mtime changes, up or down. Since we don't want to remember
individual mtimes, hash them to obtain a single value.

This should help us behave properly when the time jumps backwards during boot:
various files might have mtimes that are in the future, but we won't care.
This fixes the following scenario:

We have /etc/systemd/system with mtime T1; T1 is initially far in the past.
We have /run/systemd/generator with mtime T2.
The time is adjusted backwards, so T2 will be in the future for a while.
Now the user writes new files to /etc/systemd/system, and T1 is updated to T1'.
Nevertheless, T1 < T1' << T2, so we would falsely consider our cache to be up-to-date.
@keszybz
Member Author

keszybz commented Aug 31, 2020

Updated. I added one more separate commit to drop the check you asked about, with explanation in the commit message. I also added one more comment in that function. Otherwise, the only change is to rename the variables.

@keszybz keszybz removed the reviewed/needs-rework 🔨 (PR has been reviewed and needs another round of reworks) label Aug 31, 2020
@poettering poettering added the good-to-merge/waiting-for-ci 👍 (PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed) label Aug 31, 2020
@anitazha
Member

anitazha commented Sep 1, 2020

bionic-i386 timed out like usual. merging

@anitazha anitazha merged commit 0419dae into systemd:master Sep 1, 2020
@keszybz keszybz deleted the rework-cache-timestamps branch September 1, 2020 08:59
Successfully merging this pull request may close these issues.

Extremely slow boot and constant CPU utilization