Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
add limited metadata caching to journald and other journal improvements #6392
Conversation
poettering
added
the
journal
label
Jul 17, 2017
| @@ -200,3 +200,10 @@ DEFINE_TRIVIAL_CLEANUP_FUNC(char *, string_free_erase); | ||
| #define _cleanup_string_free_erase_ _cleanup_(string_free_erasep) | ||
| bool string_is_safe(const char *p) _pure_; | ||
| + | ||
| +static inline size_t strlen_ptr(const char *s) { |
| + | ||
| +/* This implements a metadata cache for clients, which are identified by their PID. Requesting metadata through /proc | ||
| + * is expensive, hence let's cache the data if we can. Note that this means the metadata might be out-of-date when we | ||
| + * store it, but it might already be anyway, as we request the data asynchronously from /proc at a different time the |
vcaputo
Jul 19, 2017
Member
nit: the current situation is actually the opposite of out of date; the metadata is sampled in the future relative to when the log was produced.
keszybz
Jul 25, 2017
Owner
Yeah, that's complicated. We might actually have accurate metadata more often after this patch, e.g. if a client logs twice, and exits immediately after, and we manage to cache metadata after the first log entry.
| + if (!c) | ||
| + return -ENOMEM; | ||
| + | ||
| + c->n_ref = 0; |
| + else | ||
| + (void) get_process_gid(c->pid, &c->gid); | ||
| + | ||
| + return 0; |
| + assert(c); | ||
| + assert(pid_is_valid(c->pid)); | ||
| + | ||
| + if (get_process_comm(c->pid, &t) >= 0) { |
vcaputo
Jul 19, 2017
Member
Why not just use the free_and_replace() macro from basic/alloc-util.h? These can all lose the {}s even then.
| + c->capeff = t; | ||
| + } | ||
| + | ||
| + return 0; |
| + if (!l) | ||
| + return -ENOMEM; | ||
| + | ||
| + memcpy(l, label, label_size); |
vcaputo
Jul 19, 2017
Member
is there really not a convenience helper in systemd for doing this null-terminated memcpy?
| + memcpy(l, label, label_size); | ||
| + l[label_size] = 0; | ||
| + | ||
| + free(c->label); |
| + /* If we got no SELinux label passed in, let's try to acquire one */ | ||
| + | ||
| + if (getpidcon(c->pid, &con) >= 0) { | ||
| + free(c->label); |
| + return 0; | ||
| + } | ||
| + | ||
| + free(c->cgroup); |
vcaputo
Jul 19, 2017
Member
lots of free_and_replace() in this function, I'll stop pointing them out.
| + assert_se(prioq_reshuffle(s->client_contexts_lru, c, &c->lru_index) >= 0); | ||
| + } | ||
| + | ||
| + return 0; |
| + if (label_size > 0 && (label_size != c->label_size || memcmp(label, c->label, label_size) != 0)) | ||
| + goto refresh; | ||
| + | ||
| + return 0; |
vcaputo
Jul 19, 2017
Member
another void return no? these all imply errors are propagated and that's simply not the case
| + return client_context_really_refresh(s, c, ucred, label, label_size, unit_id, timestamp); | ||
| +} | ||
| + | ||
| +static void client_context_make_room(Server *s, size_t limit) { |
vcaputo
Jul 19, 2017
Member
This function name makes me expect the argument to be how much room to make.
I think it would be better named something like client_context_try_shrink_to() or something.
| + if (add_ref) | ||
| + c->n_ref = 1; | ||
| + else { | ||
| + c->n_ref = 0; |
| + | ||
| + } | ||
| + | ||
| + return 0; |
|
Neat, something in this vein is long overdue. I just did a pretty casual review, mostly nits, otherwise It's good you addressed all the tiny allocation/frees for the various metadata fields. When I started reviewing I was concerned you wouldn't address that aspect while adding the cache. Your alloca approach is less invasive than mine was, though I avoided all copies. I think I like yours more. |
euank
referenced this pull request
in coreos/bugs
Jul 19, 2017
Open
journald miss the last few log lines when process exit #373
|
Thanks for the review! I have now force pushed a new version with almost all of your points fixed. I did leave some functions returning "int", even though the caller ignores it then. It just feels weird to eat obvious OOM issues right away in the callee, it felt more natural to leave this to the caller. I mean, ultimately it doesn't really matter anyway, the compiler should optimize all this away easily as this stuff is all static, non-exported stuff... Anway, I hope that makes some sense. Please have another look so that we can get this landed! |
vcaputo
approved these changes
Jul 20, 2017
•
You have my approval, FWIW. Note I've only looked at the last commit...
keszybz
requested changes
Jul 25, 2017
Some initial comments. i didn't review the main patch yet.
| + | ||
| + /* The same as memdup() but place a safety NUL byte after the allocated memory */ | ||
| + | ||
| + q = memdup(p, l+1); |
keszybz
Jul 25, 2017
Owner
This doesn't look right. Based on the commit description, this could be used to bytes from a fixed size buffer, right to the edge. Then this memdup will read one past the allowed area. This must be replaced by malloc + memcpy.
| @@ -388,7 +388,7 @@ int is_kernel_thread(pid_t pid) { | ||
| bool eof; | ||
| FILE *f; | ||
| - if (pid == 0 || pid == 1) /* pid 1, and we ourselves certainly aren't a kernel thread */ | ||
| + if (pid == 0 || pid == 1 || pid == getpid()) /* pid 1, and we ourselves certainly aren't a kernel thread */ |
keszybz
Jul 25, 2017
Owner
Shouldn't this be getpid_cached()? Without that, this additional check might slow things down.
poettering
Jul 31, 2017
Owner
yupp, this PR predates the getpid_cached() PR, hence it's based on a version without it. Will rebase.
| @@ -200,3 +200,10 @@ DEFINE_TRIVIAL_CLEANUP_FUNC(char *, string_free_erase); | ||
| #define _cleanup_string_free_erase_ _cleanup_(string_free_erasep) | ||
| bool string_is_safe(const char *p) _pure_; | ||
| + | ||
| +static inline size_t strlen_ptr(const char *s) { |
| +#include <selinux/selinux.h> | ||
| +#endif | ||
| + | ||
| +#include <assert.h> |
| +#include "user-util.h" | ||
| +#include "audit-util.h" | ||
| +#include "string-util.h" | ||
| +#include "cgroup-util.h" |
| + | ||
| +/* This implements a metadata cache for clients, which are identified by their PID. Requesting metadata through /proc | ||
| + * is expensive, hence let's cache the data if we can. Note that this means the metadata might be out-of-date when we | ||
| + * store it, but it might already be anyway, as we request the data asynchronously from /proc at a different time the |
vcaputo
Jul 19, 2017
Member
nit: the current situation is actually the opposite of out of date; the metadata is sampled in the future relative to when the log was produced.
keszybz
Jul 25, 2017
Owner
Yeah, that's complicated. We might actually have accurate metadata more often after this patch, e.g. if a client logs twice, and exits immediately after, and we manage to cache metadata after the first log entry.
| + /* Returns: | ||
| + * | ||
| + * 0 → the log message shall be suppressed, | ||
| + * 1 + n → if the log message shall be permitted, and n messages where dropped from the peer before |
| + /* Write a suppression message if we suppressed something */ | ||
| + if (rl > 1) | ||
| + server_driver_message(s, "MESSAGE_ID=" SD_MESSAGE_JOURNAL_DROPPED_STR, | ||
| + LOG_MESSAGE("Suppressed %u messages from %s", rl - 1, path), |
| + | ||
| + /* If that didn't work, we use the unit ID passed in as fallback, if we have nothing cached yet */ | ||
| + if (unit_id && !c->unit) { | ||
| + c->unit = strdup(unit_id); |
keszybz
Jul 26, 2017
Owner
Hmm, is it on purpose that in client_context_read_label free_and_replace is used, and here just normal assignment (no free)?
poettering
Jul 31, 2017
Owner
yes, it is... we trust the data from /proc more, and only use the data from the peer if we have nothing better. Hence we only set c->unit if it is NULL so far, as the comment is supposed to clarify.
| + if (cg_path_get_session(c->cgroup, &t) >= 0) | ||
| + free_and_replace(c->session, t); | ||
| + else | ||
| + c->session = mfree(c->session); |
keszybz
Jul 26, 2017
Owner
Maybe it's too much magic, but lines 276–279 can be replaced with
(void) cg_path_get_session(c->cgroup, &t);
free_and_replace(c->session, t);| + ClientContext **ret) { | ||
| + | ||
| + return client_context_get_internal(s, pid, ucred, label, label_len, unit_id, true, ret); | ||
| +}; |
keszybz
Jul 26, 2017
Owner
Shouldn't those be static inline functions? Or are we counting on lto to figure things out for us?
poettering
Jul 31, 2017
Owner
client_context_get_internal() is a static function, and gcc should be smart enough to optimize this away for us, and at least turn this into JMP rather than CALL, which I am very sure is good enough
| + | ||
| + if (!s->my_context) { | ||
| + struct ucred ucred = { | ||
| + .pid = getpid(), |
| static void dispatch_message_real( | ||
| Server *s, | ||
| struct iovec *iovec, unsigned n, unsigned m, | ||
| - const struct ucred *ucred, | ||
| + ClientContext *c, |
poettering
referenced this pull request
Jul 31, 2017
Open
Filter mechanism for logs in journald #6432
poettering
added some commits
Jul 14, 2017
|
force pushed a new version, please have a look. addressed all issues raised |
| + /* Write a suppression message if we suppressed something */ | ||
| + if (rl > 1) | ||
| + server_driver_message(s, "MESSAGE_ID=" SD_MESSAGE_JOURNAL_DROPPED_STR, | ||
| + LOG_MESSAGE("Suppressed %u messages from %s", rl - 1, c->unit), |
glasser
Jul 31, 2017
Contributor
I actually have a script which parses this message and looks at the path (so eg, right now it is looking for /system.slice/docker.service here). Can you confirm that after this change I should be looking for docker.service instead? That's the understanding I get from reading this PR but I don't actually know how to build and test systemd :)
poettering
Jul 31, 2017
Owner
i figure we should add _OBJECT_UNIT_NAME= to this line to make it easy to match against this log message in a structured way. But that should probably happen in a later commit
poettering
Jul 31, 2017
Owner
and yeah, you should check the unit name, not the cgroup path, as we do take liberty that the path might change
keszybz
Jul 31, 2017
Owner
It'd be nice to add SYSTEMD_UNIT= field too to that log message, so it shows up in 'journalctl -u' output, but that's not really related to this PR.
glasser
Jul 31, 2017
•
Contributor
yeah, that would certainly be nicer than us having to run journalctl -f _SYSTEMD_UNIT=docker.service + MESSAGE_ID=a596d6fe7bfa4994828e72309e95d61e and then filter out based on parsing a string but I agree this is probably separate ;)
poettering commentedJul 17, 2017
No description provided.