Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obey debug_level when syslog()ing #7

Merged
merged 1 commit into from Aug 31, 2020
Merged

Conversation

frozencemetery
Copy link
Member

(This is a manual migration of https://pagure.io/gssproxy/pull-request/251 , a downstream patch which makes our logging behavior more admin-pleasing)

@frozencemetery:

Second commit is the real fix. First commit is a conceptual change to merge "logging" and "debugging".

@simo5:

Sorry but the reason why they are separate is that I wanted to make very clear that loggin and debugging should be two different things.
And should emit different things.

Can you give more explanation of the rationale ?

I definitely and, on purpose, recall not wanting debug going to syslog which is generally not the place for debugging logs as it may be sourced and aggregated in remote systems and debug may spit sensitive data like keys.

@simo5:

I guess my mistake was not to have also two completely separate files, perhaps we should fix that.

@frozencemetery:

In developing the second patch (which I hope is okay?), I had a surprising amount of trouble figuring out where the messages were coming from. In normal operation, the "print" logging and the "syslog" logging get merged into one place, and if one doesn't touch the codebase frequently...

Additionally, the fix in the second patch causes the "syslog" system to depend on a variable (gp_debug) from the "debug" system, which made me think it made sense to merge them.

(I'd also be fine with merging the other direction - keep gp_log and remove gp_debug. I only did it this way for a smaller diff.)

@simo5:

I think the main issue is that I do not agree to merge debugging ad logging.
But perhaps I do not understand the problem, maybe ping me up for a high b/w discussion on that.

(@frozencemetery rebased)

@frozencemetery:

Per out-of-band review, we agreed to use an off-by-default parameter to configure logging, and keep logging separate from debugging.

When investigating implementation, I determined that all messages I'm concerned with pass through gp_log_status(); thus, this parameter only controls those. Messages from GPERROR() are all things that an admin really should fix.

Additionally, there's only a single usage of GPAUDIT(); this isn't really a problem, but we might move it to GPERROR() for consistency.

Please double-check my variable/parameter names for clarity.

(@frozencemetery rebased)

@simo5:

So if I understand correctly, we still output errors to syslog in general, except now we make it a config option to log the status of "acquire cred" calls to syslog.
I think that is fine, am I right that debug statements still print the output at a deeper level so nothing is lost of debug purposes ?

@frozencemetery:

Technically the debug logs are sufficient, but it's harder to follow without the syslogs. Because debugging and logging are separate, the two sets of logs are wholly disjoint. Here's my notes which might help explain why I wanted to tie the two systems together:

  • gp_log_status() is only called from a single place (in gp_creds.c), and will only show failures around credential acquisition. It is not called on success.
  • gp_log_failure() is also only called from a single place (in gp_rpc_indicate_mechs.c), and will only show failures around mech indication.
  • despite the name, gp_log_failure() sends output to debug, not syslog.
  • gp_log_failure() uses a prefix of "Failed with:", which no other functions do
  • gp_log_failure() will always log - it's not conditional on the value of gp_debug, unlike everything else that can send debug output.
  • GPAUDIT() is only used to indicate that a config directory couldn't be read - there are no other calls
  • almost everything logged with GPERROR() could be argued to be made fatal if we didn't have other constraints
  • GPERROR() syslogs at LOG_ERROR, while GPAUDIT() syslogs at LOG_INFO.
  • in light of that, it's confusing that GPDEBUG() sends to the debug output, rather than syslog at LOG_DEBUG.
  • there's a third logging system for the RPC layer, built on gp_debug_printf() (which always logs, but the RPC logging checks the value of gp_debug)

It's this RPC debugging layer that will log the return statuses, though they become harder to pick out, and haven't been run through gss_display_status().

@simo5:

Sounds all pretty confusing :-( Logging is hard.

Here are my intentions when I started:

  • send logs to syslog
  • send debug to a file
  • audit is for when admin care only to see who does what, but no verbose logging is required

All the information is complementary, however it may make sense to replicate all logging in debug files for correlation purposes.
In particular my intention was that an admin would get enough info from logs to know what is failing and not require to crank up debugging unless directed by a support person.
Debugging can be as verbose as needed because it will never end in syslog.

If you are inclined to improve the situation in this PR I am ok with it, otherwise I am ok with the current patch for now, and to open an issue to adjust back al logging and debugging according to these guidelines if possible.

If not possible I would rather stop completely sending to syslog ever and only have debug logs.

@frozencemetery:

I think the PR as it is now addresses my immediate issue (that admins don't like seeing failures here). However, it intentionally violates your assumption that "an admin would get enough info from logs to know what is failing".

@simo5:

I feared that, indeed.

@frozencemetery:

What would you like to have happen here? Are there changes in approach or code you'd like to see from this PR? I'm happy to open such an issue, but I am trying to break one of your intentions here so I'm not sure where to go from here?

@simo5:

Any chance we can apply this patch downstream for now and revisit this issue in a more complete way for upstream ?

@frozencemetery:

Well, this is upstream and I was hoping we could have a discussion to figure out what that would be :)

But yes, if it comes time to release and we don't have an easy solution, I can do the simple thing.

Add a parameter (syslog_status) to configuration and
CLI (--syslog-status).  This logs the results of GSSAPI calls at
LOG_DEBUG.  Typically these calls resemble:

    gssproxy[28914]: (OID: { 1 2 840 113554 1 2 2 }) Unspecified GSS
    failure.  Minor code may provide more information, No credentials
    cache found

Since these messages worry some admins, turn them off by default.

Signed-off-by: Robbie Harwood <rharwood@redhat.com>
@simo5
Copy link
Contributor

simo5 commented Aug 31, 2020

I like this PR

@simo5 simo5 merged commit 8d865bd into gssapi:main Aug 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants