Reload config after sighup #15

Merged: 8 commits merged into softdevteam:master from ltratt:reload_config_after_sighup on Jan 31, 2020

ltratt (Member) commented Jan 30, 2020

This PR enables snare to reload its config on SIGHUP. This is a bit more involved than it first seems because snare is multi-threaded. This PR thus goes through several stages (putting Config behind a Mutex and so on), before actually handling SIGHUP. Note that one case is handled somewhat, but not perfectly: reducing the value of maxjobs. I could do more here, but I think this is a niche case, and it's hard to test: the basic approach in this PR seems decent enough to me for the time being.

ltratt added 5 commits Jan 29, 2020
Although sharing allowed us to slightly reduce memory usage, the saving was
illusory: we still had to `clone()` `String`s and so on later. One possibility
is to cache `RepoConfig`s (and distribute them through `Arc` or similar), but
that seemed unnecessarily fussy, and would also mean we'd have to implement
extra stuff like cache eviction.

This commit thus simplifies things: every time we query the `Config` about a
repo, we get back a new `RepoConfig` that is not bound in any way to `Config`.
However, since the GitHub secret is a `SecStr`, my assumption is that a) it's
expensive to clone (since it's doing `mprotect()` and so on) and b) duplicating
it repeatedly throughout the heap might make it easier for an attacker to find
and decode it. We thus return it separately from the `RepoConfig`.

This is a necessary step towards allowing Configs to be reloaded.

This is correct, but somewhat crude: any errors in the config will cause the
whole process to terminate, for example.
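The first commit message describes a specific design. Each query returns a fresh, owned `RepoConfig`, and the secret is handed back separately. That design can be sketched as follows. All types here are hypothetical simplifications: a plain `Secret` wrapper stands in for `SecStr`, and the fields are invented. Returning the secret by reference is one assumed reading of "separately", chosen because it avoids cloning.

```rust
// Hypothetical, simplified stand-ins for snare's types (not the real definitions).
#[derive(Clone, Debug, PartialEq)]
pub struct RepoConfig {
    pub email: Option<String>,
    pub timeout: u64,
}

/// Stand-in for a `SecStr`-like secret that we want to avoid duplicating.
pub struct Secret(Vec<u8>);

pub struct Config {
    pub default_email: Option<String>,
    pub default_timeout: u64,
    pub secret: Secret,
}

impl Config {
    /// Build a fresh, owned `RepoConfig` for `owner/repo`. It shares nothing with
    /// `self`, so it stays valid even after `self` is dropped or replaced. The
    /// secret is returned separately, by reference, so it is never cloned.
    pub fn repoconfig(&self, _owner: &str, _repo: &str) -> (RepoConfig, &Secret) {
        let rconf = RepoConfig {
            email: self.default_email.clone(),
            timeout: self.default_timeout,
        };
        (rconf, &self.secret)
    }
}
```

Because the `RepoConfig` is owned, it can outlive the `Config` it came from. The secret borrow, by contrast, ends as soon as the caller is done with it.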

vext01 (Member) commented Jan 30, 2020

> putting Config behind a Mutex and so on

Is the config mutable after parsing then? That would surprise me.

ltratt (Member, Author) commented Jan 30, 2020

> Is the config mutable after parsing then? That would surprise me.

When you send SIGHUP, the entire config file is reparsed and a new Config produced. So the Config is not mutable as such, but it can be replaced by a new Config.
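The idea of "replaced, not mutated" can be sketched with the config shared between two threads behind `Arc<Mutex<_>>`. The `Config` fields and `parse_new_config` are hypothetical stand-ins for reparsing the file. The reader thread copies what it needs while holding the lock. It therefore sees either the whole old config or the whole new one, never a mixture.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical, minimal config: never mutated field by field, only replaced.
#[derive(Clone, Debug, PartialEq)]
pub struct Config {
    pub maxjobs: usize,
    pub email: String,
}

/// Hypothetical stand-in for reparsing the config file after SIGHUP.
pub fn parse_new_config() -> Config {
    Config { maxjobs: 4, email: "c@d".to_owned() }
}

/// Parse first (without holding the lock), then swap in the whole new value.
pub fn reload(conf: &Mutex<Config>) {
    let new = parse_new_config();
    *conf.lock().unwrap() = new;
}

pub fn run_reload_while_reading() -> Config {
    let conf = Arc::new(Mutex::new(Config { maxjobs: 2, email: "a@b".to_owned() }));
    let reader = {
        let conf = Arc::clone(&conf);
        thread::spawn(move || {
            // Copy out a snapshot; the guard is dropped at the end of this statement.
            let snapshot = conf.lock().unwrap().clone();
            snapshot
        })
    };
    reload(&conf);
    let seen = reader.join().unwrap();
    // The reader saw either the complete old config or the complete new one.
    assert!(
        seen == Config { maxjobs: 2, email: "a@b".to_owned() }
            || seen == Config { maxjobs: 4, email: "c@d".to_owned() }
    );
    let current = conf.lock().unwrap().clone();
    current
}
```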

vext01 (Member) commented Jan 31, 2020

> When you send SIGHUP, the entire config file is reparsed and a new Config produced. So the Config is not mutable as such, but it can be replaced by a new Config.

In the past, I've just had the program re-exec(3) itself; then you don't have to worry about any of this. I'm not sure if that would work for us here?

ltratt (Member, Author) commented Jan 31, 2020

> In the past, I've just had the program re-exec(3) itself

That would not be good here since we'd destroy the queue of jobs!

vext01 (Member) commented Jan 31, 2020

> That would not be good here since we'd destroy the queue of jobs!

Well, you'd have to wait for them to finish. I thought you were doing that already, but I guess from your response that you allow them to continue during the reload?

ltratt (Member, Author) commented Jan 31, 2020

Yes, the reason this PR is quite so fiddly is that it has no effect on ongoing jobs, but we still try to enact all the reasonable changes as soon as possible.

vext01 (Member) commented Jan 31, 2020

I see. I hadn't appreciated that!

Code review coming soon.

ltratt (Member, Author) commented Jan 31, 2020

Simplifying a bit: we don't change the config of any running job. So if a job was run when you said email="a@b" and you SIGHUP it so email="c@d" then the existing job will still send to a@b but all new jobs will send to c@d. As this suggests, each job has its own "unique" config in a sense (RepoConfig).

The major exception is 82af641: it's really fiddly -- and in the general case impossible -- to deal well with SIGHUP reducing the number of maxjobs if there are already running jobs. So I implemented something simple, given that this is a fairly niche case (see 82af641#diff-b319aab93ab499624a467ced0e18a2a8R364).
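The per-job semantics described above can be sketched minimally using hypothetical, simplified `Config`/`RepoConfig` types that carry only an `email` field. A job copies its `RepoConfig` when it starts. A later reload therefore affects only jobs started afterwards.

```rust
use std::sync::Mutex;

// Hypothetical, simplified types: the real ones carry much more than an email address.
#[derive(Clone, Debug)]
pub struct RepoConfig {
    pub email: String,
}

pub struct Config {
    pub email: String,
}

impl Config {
    /// Return an owned snapshot that shares nothing with `self`.
    pub fn repoconfig(&self) -> RepoConfig {
        RepoConfig { email: self.email.clone() }
    }
}

/// A job takes its own copy of the repo config when it starts and never
/// consults the shared `Config` again.
pub struct Job {
    pub rconf: RepoConfig,
}

pub fn start_job(conf: &Mutex<Config>) -> Job {
    // The guard is a temporary, dropped at the end of this statement.
    let rconf = conf.lock().unwrap().repoconfig();
    Job { rconf }
}
```

A SIGHUP reload corresponds to replacing the value behind the mutex. A job started before the reload keeps `a@b`, while one started after it gets `c@d`.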

vext01 (Member) commented Jan 31, 2020

> So if a job was run when you said email="a@b" and you SIGHUP it so email="c@d" then the existing job will still send to a@b

I think that's OK. As a user I'd kind of expect that.

> to deal well with SIGHUP reducing the number of maxjobs if there are already running jobs.

Hrm, yes. That's annoying.

I can think of two "possible improvements":

  • Wait until there are at most as many jobs as the new maxjobs, then create a new smaller array from the used slots of the old array. I think this is what your comment suggests with "compaction".

  • Keep a count of how many jobs there are, but let the backing array be larger. This may waste memory and is annoying to keep in sync.

I think the solution you have now is pretty good tbh.
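vext01's first option could look roughly like this. The idea is to wait until no more than the new `maxjobs` slots are in use, then rebuild a smaller vector from the occupied slots. `T` stands in for whatever per-job state is kept, and the function name is hypothetical. Compaction moves jobs to new indices. Anything indexed by slot number, such as the pollfds vector if it is laid out per slot (an assumption), would have to be rebuilt too.

```rust
/// Hypothetical helper for the "compaction" option. `running` holds one slot per
/// possible job (`None` = free). Returns `false`, leaving `running` untouched,
/// while more than `new_max` jobs are still running.
pub fn try_compact<T>(running: &mut Vec<Option<T>>, new_max: usize) -> bool {
    let occupied = running.iter().filter(|slot| slot.is_some()).count();
    if occupied > new_max {
        return false; // too many jobs still running: try again later
    }
    // Move the occupied slots, in order, into a new vector of exactly `new_max` slots.
    let mut compacted: Vec<Option<T>> =
        running.drain(..).filter(|slot| slot.is_some()).collect();
    compacted.resize_with(new_max, || None);
    *running = compacted;
    true
}
```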

pollfds.resize_with(snare.config.maxjobs * 2 + 1, || {
PollFd::new(-1, PollFlags::empty())
});
// If the unwrap() on the lock fails, the other thread has panicked.

vext01 (Member) commented Jan 31, 2020

Is it necessary to repeat this comment on every mutex lock? Mutex poisoning is well known among Rust programmers, and the documentation explains it well.

ltratt (Member, Author) commented Jan 31, 2020

I tend to agree. I think in a future PR I will put this comment on the relevant field of the struct.
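The comment could be documented once on the struct rather than at every call site. Below is a sketch of such a doc comment, using hypothetical `Snare`/`Config` types and a hypothetical `conf` field name. It also includes a small demonstration of the poisoning behaviour the comment describes.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical, cut-down types for illustration.
pub struct Config {
    pub maxjobs: usize,
}

pub struct Snare {
    /// The current config, replaced wholesale on SIGHUP. If a thread panics while
    /// holding this lock, the mutex is poisoned and every later `lock().unwrap()`
    /// panics too, so a panic in one thread propagates to the other the next time
    /// it takes the lock.
    pub conf: Mutex<Config>,
}

/// Returns whether `conf` is poisoned after a thread panics while holding it.
pub fn poisoned_after_panic() -> bool {
    let snare = Arc::new(Snare { conf: Mutex::new(Config { maxjobs: 2 }) });
    let s = Arc::clone(&snare);
    let res = thread::spawn(move || {
        let _guard = s.conf.lock().unwrap();
        panic!("simulated failure while holding the config lock");
    })
    .join();
    assert!(res.is_err()); // the spawned thread panicked
    snare.conf.is_poisoned()
}
```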

});
// If the unwrap() on the lock fails, the other thread has panicked.
let maxjobs = snare.config.lock().unwrap().maxjobs;
assert!(maxjobs < std::usize::MAX);

vext01 (Member) commented Jan 31, 2020

I'm not sure this is a meaningful assertion. Since both arguments are usize it's equivalent to:

assert!(maxjobs != std::usize::MAX);

Is that being used as a boundary condition elsewhere perhaps?

ltratt (Member, Author) commented Jan 31, 2020

Actually, this should be assert!(maxjobs < (std::usize::MAX - 1) / 2). Long story, but it's basically about the number of pollfds we create. In practice, of course, we're probably never going to have enough RAM for this to be an issue.

Fixed in 5ba7c25.

vext01 (Member) commented Jan 31, 2020

OK, that looks more valid at least.

I missed why the / 2, but if the story is really long, I'll trust you ;)

ltratt (Member, Author) commented Jan 31, 2020

Each job has stdout and stderr pipes, which is where the * 2 (or / 2) comes from.

assert!(maxjobs < std::usize::MAX);
let mut running = Vec::with_capacity(maxjobs);
running.resize_with(maxjobs, || None);
let mut pollfds = Vec::with_capacity(maxjobs * 2 + 1);

vext01 (Member) commented Jan 31, 2020

Perhaps the above assertion is to cater for the + 1 here?

ltratt (Member, Author) commented Jan 31, 2020

Correct.
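Here is a sketch of the arithmetic behind the corrected assertion. With two pipes per job plus one extra descriptor, the pollfds vector needs `maxjobs * 2 + 1` entries. The thread does not say what the extra entry is; the event pipe written to by the signal handler is a plausible guess. The function names are hypothetical, and checked arithmetic is shown as an alternative way of expressing the same guard.

```rust
/// Entries needed in the pollfds vector: stdout and stderr pipes for each job,
/// plus one extra descriptor (hypothetically the event pipe). `None` on overflow.
pub fn pollfds_len(maxjobs: usize) -> Option<usize> {
    maxjobs.checked_mul(2)?.checked_add(1)
}

/// The guard from 5ba7c25: if this holds, `maxjobs * 2 + 1` cannot overflow.
pub fn maxjobs_ok(maxjobs: usize) -> bool {
    maxjobs < (usize::MAX - 1) / 2
}
```

The bound excludes exactly one value that would still fit: `maxjobs == (usize::MAX - 1) / 2` gives exactly `usize::MAX` entries without overflowing.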

src/main.rs Outdated
impl Snare {
/// Check to see if we've received a SIGHUP since the last check. If so, we will reload the
/// config file. **Note that because snare has multiple threads, the config file can change at
/// any arbitrary point, not just after calling this function.**

vext01 (Member) commented Jan 31, 2020

I don't understand the "not just after calling this function" part of the comment.

When else can the config change?

ltratt (Member, Author) commented Jan 31, 2020

There are two threads in snare, so the config can change in another thread even if this thread hasn't called check_for_hup.

vext01 (Member) commented Jan 31, 2020

To be clear, the config may change when neither thread received a HUP?

ltratt (Member, Author) commented Jan 31, 2020

At the moment, the signal comes in, a global bool is set, and then jobrunner checks that bool and reloads the config if necessary. There are only two threads, so one of them has to handle it (and the config stuff is way too much for it to be signal safe, so the signal handler can't deal with it directly).

vext01 (Member) commented Jan 31, 2020

OK, understood. But the comment makes it sound like some other part of the program (not this function) is mutating the config. As I understand it, you instead mean that this function could be run in another thread.

ltratt (Member, Author) commented Jan 31, 2020

Is 3f03f44 better?

vext01 (Member) commented Jan 31, 2020

Much clearer, thanks.
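The mechanism described in this thread can be sketched minimally in a single process. The signal handler only sets a flag, since a lock-free atomic store is signal safe. The job-runner thread later notices the flag and does the non-signal-safe work of reloading. A spawned thread stands in for the real handler. `Config` is reduced to a hypothetical `generation` counter in place of reparsing the file. Using `swap` to read and clear the flag in one step is this sketch's choice, not something stated in the PR.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins: a real `Config` would be reparsed from the config file.
pub struct Config {
    pub generation: u32,
}

pub struct Snare {
    pub sighup_occurred: Arc<AtomicBool>,
    pub conf: Mutex<Config>,
}

/// The signal handler's only job: set the flag. An atomic store is signal safe;
/// parsing a config file is not.
pub fn on_sighup(flag: &AtomicBool) {
    flag.store(true, Ordering::Relaxed);
}

impl Snare {
    /// Called periodically by the job-runner thread. `swap` reads and clears the
    /// flag in one step, so a SIGHUP arriving during a reload triggers another
    /// reload on the next check instead of being lost.
    pub fn check_for_hup(&self) {
        if self.sighup_occurred.swap(false, Ordering::Relaxed) {
            let mut conf = self.conf.lock().unwrap();
            // Stand-in for reparsing: build a whole new Config and replace the old one.
            let next = Config { generation: conf.generation + 1 };
            *conf = next;
        }
    }
}
```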

src/main.rs Outdated
/// config file. **Note that because snare has multiple threads, the config file can change at
/// any arbitrary point, not just after calling this function.**
fn check_for_hup(&self) {
if self.sighup_occurred.load(Ordering::Release) {

vext01 (Member) commented Jan 31, 2020

Is the ordering supposed to be acquire here?

ltratt (Member, Author) commented Jan 31, 2020

I think Relaxed is fine here as we don't need any other read/writes to have occurred before/after this.

vext01 (Member) commented Jan 31, 2020

But we have Release.

Perhaps both this and the ordering a few lines later should be relaxed?

(If the config weren't in a mutex, you'd certainly want acquire/release, otherwise another thread may see the config mid-move)

ltratt (Member, Author) commented Jan 31, 2020

You're totally right: they should both be Relaxed, and one of the later commits (ff45ea1#diff-639fbc4ef05b315af92b4d836c31b023R66) fixes that. I don't know why I ever thought Release was the correct ordering!
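vext01's parenthetical about data shared without a mutex can be illustrated with the classic message-passing pattern. The writer makes a `Release` store of a ready flag, and the reader makes an `Acquire` load of it. Together, these guarantee that the reader sees the data written before the flag. With the config behind a `Mutex`, locking and unlocking already provide that ordering, which is why `Relaxed` suffices for the SIGHUP flag. The `Published` type is hypothetical. Incidentally, the standard library documents that `AtomicBool::load` panics if passed `Ordering::Release` or `Ordering::AcqRel`, because `Release` only applies to stores.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical data published *without* a mutex: a value plus a "ready" flag.
pub struct Published {
    pub maxjobs: AtomicUsize,
    pub ready: AtomicBool,
}

pub fn publish(p: &Published, maxjobs: usize) {
    p.maxjobs.store(maxjobs, Ordering::Relaxed);
    // Release: everything written before this store is visible to any thread
    // whose Acquire load observes `ready == true`.
    p.ready.store(true, Ordering::Release);
}

pub fn try_read(p: &Published) -> Option<usize> {
    // Acquire pairs with the Release store in `publish`.
    if p.ready.load(Ordering::Acquire) {
        Some(p.maxjobs.load(Ordering::Relaxed))
    } else {
        None
    }
}

pub fn publish_and_read() -> usize {
    let p = Arc::new(Published {
        maxjobs: AtomicUsize::new(0),
        ready: AtomicBool::new(false),
    });
    let writer = {
        let p = Arc::clone(&p);
        thread::spawn(move || publish(&p, 8))
    };
    // Spin until the writer has published; Acquire/Release then guarantees
    // that we read 8, never the initial 0.
    let seen = loop {
        if let Some(n) = try_read(&p) {
            break n;
        }
        thread::yield_now();
    };
    writer.join().unwrap();
    seen
}
```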

let sighup_occurred = Arc::new(AtomicBool::new(false));
{
let sighup_occurred = Arc::clone(&sighup_occurred);
if let Err(e) = unsafe {

vext01 (Member) commented Jan 31, 2020

If you stored the closure in a variable, I think you might be able to limit the scope of unsafe some more?

ltratt (Member, Author) commented Jan 31, 2020

I'm not sure I understand?

vext01 (Member) commented Jan 31, 2020

So if you did something like:

if let Err(e) = {
    let f = move || {
        // All functions called in this function must be signal safe. See signal(3).
        sighup_occurred.store(true, Ordering::Relaxed);
        unsafe { nix::unistd::write(event_write_fd, &[0]).ok() };
    };
    // No trailing semicolon: the block must evaluate to register()'s Result.
    unsafe { signal_hook::register(signal_hook::SIGHUP, f) }
} {
    // ... handle `e` as before ...
}

Then fewer lines can be in unsafe? I may be wrong.

ltratt (Member, Author) commented Jan 31, 2020

TBH, I'm fairly happy having the closure in unsafe because it's a signal handler and being clear that "here be dragons" is not a bad idea.

if self.sighup_occurred.load(Ordering::Relaxed) {
match Config::from_path(&self.conf_path) {
Ok(config) => *self.config.lock().unwrap() = config,
Err(msg) => eprintln!("{}", msg),

vext01 (Member) commented Jan 31, 2020

So the error is printed on stderr, which, if snare is running in the background, may go unnoticed.

Not sure how you can fix that though.

One potential idea would be to have a snare --reload which communicates with the existing snare instance and prints any errors to its own stderr (not the daemon's). However, this would probably need more complex IPC.

ltratt (Member, Author) commented Jan 31, 2020

A later PR will send this to syslog.

vext01 (Member) commented Jan 31, 2020

sounds good!

eprintln!("{}", msg);
} else {
eprintln!("{}.", msg);
}

vext01 (Member) commented Jan 31, 2020

I wonder if this is worth it :)

There are any number of silly things the caller might do.

msg("error..");
msg("error,");
...

ltratt (Member, Author) commented Jan 31, 2020

Yeah, I'm not sure either :/
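The diff above only shows the two `eprintln!` branches. The condition is assumed here to be whether the message already ends in a full stop, and the helper's name is hypothetical. Under that assumption, the logic and vext01's objection can be seen in a few cases. A message ending in some other punctuation gets a full stop appended after it.

```rust
/// Hypothetical helper mirroring the diff: append a full stop unless the message
/// already ends with one. Other trailing punctuation is left as-is.
pub fn with_full_stop(msg: &str) -> String {
    if msg.ends_with('.') {
        msg.to_owned()
    } else {
        format!("{}.", msg)
    }
}
```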

ltratt (Member, Author) commented Jan 31, 2020

I think that's everything?

vext01 (Member) commented Jan 31, 2020

LGTM. Please squash.

ltratt added 3 commits Jan 30, 2020
If we're running, and receive SIGHUP, it's possible that the user's changes to
the config file are incorrect. Rather than aborting, it's better that we report
the problem, ignore the new config file, and keep on running.
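The behaviour this commit message describes can be sketched as follows: if the new config fails to parse, the old one is kept. `parse` is a hypothetical stand-in for `Config::from_path`, and it parses a single number rather than reading a file. The error is printed to stderr, as in the reload code discussed in the review.

```rust
use std::sync::Mutex;

// Hypothetical, simplified config and parser.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub maxjobs: usize,
}

pub fn parse(text: &str) -> Result<Config, String> {
    let maxjobs = text
        .trim()
        .parse::<usize>()
        .map_err(|e| format!("Invalid maxjobs: {}", e))?;
    Ok(Config { maxjobs })
}

/// On success, replace the running config and return `true`; on failure,
/// report the problem, keep the old config, and return `false`.
pub fn reload(conf: &Mutex<Config>, text: &str) -> bool {
    match parse(text) {
        Ok(new) => {
            *conf.lock().unwrap() = new;
            true
        }
        Err(msg) => {
            eprintln!("{}", msg);
            false
        }
    }
}
```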

If the user asks for more jobs to be run, we have an easy task; if they ask for
fewer to be run, it is much trickier. The approach this commit takes for the
latter case is simple, but means that we can find ourselves in situations where
we are never actually able to reduce the maximum number of jobs that are
running.
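One simple shape such an approach could take is shown below, as an illustration only. The function name is hypothetical, and this is not necessarily what the commit does. Growing is immediate. Shrinking only drops trailing slots that are empty, so a job still running in a high-numbered slot keeps the vector longer than the new maximum.

```rust
/// Hypothetical sketch: resize the per-job slot vector towards `maxjobs`.
/// Growing happens at once; shrinking only removes trailing empty slots, so
/// the vector stays longer than `maxjobs` while a job occupies a high slot.
pub fn adjust_slots<T>(running: &mut Vec<Option<T>>, maxjobs: usize) {
    if running.len() < maxjobs {
        running.resize_with(maxjobs, || None);
    } else {
        while running.len() > maxjobs && matches!(running.last(), Some(None)) {
            running.pop();
        }
    }
}
```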

Previously we were inconsistent about whether variables were "conf" or "config".
This commit homogenises this to "conf" (though the types are still the longer
"Config").
ltratt force-pushed the ltratt:reload_config_after_sighup branch from 3f03f44 to 0a589e3 Jan 31, 2020

ltratt (Member, Author) commented Jan 31, 2020

Squashed.

vext01 (Member) commented Jan 31, 2020

bors r+

bors bot added a commit that referenced this pull request Jan 31, 2020
Merge #15
15: Reload config after sighup r=vext01 a=ltratt

This PR enables snare to reload its config on SIGHUP. This is a bit more involved than it first seems because snare is multi-threaded. This PR thus goes through several stages (putting `Config` behind a `Mutex` and so on), before actually handling SIGHUP. Note that one case is handled somewhat, but not perfectly: reducing the value of `maxjobs`. I could do more here, but I think this is a niche case, and it's hard to test: the basic approach in this PR seems decent enough to me for the time being.

Co-authored-by: Laurence Tratt <laurie@tratt.net>

ltratt (Member, Author) commented Jan 31, 2020

@vext01 Any idea why this failed? buildbot seems to have succeeded but bors failed?

vext01 (Member) commented Jan 31, 2020

Yeah, that doesn't look like our issue.

Let's try again.

bors r+

bors bot added a commit that referenced this pull request Jan 31, 2020

vext01 (Member) commented Jan 31, 2020

once more for luck

bors r+

bors bot added a commit that referenced this pull request Jan 31, 2020
vext01 merged commit 1913d29 into softdevteam:master Jan 31, 2020
1 check failed: bors (Canceled)
ltratt deleted the ltratt:reload_config_after_sighup branch Jan 31, 2020