
[query] Log at debug level resolved cluster namespaces, use same logger in query and coordinator #1775

Merged
merged 10 commits into master from r/debug-logs-resolve-cluster-namespaces on Jun 26, 2019

Conversation

robskillington
Collaborator

@robskillington robskillington commented Jun 25, 2019

What this PR does / why we need it:

A few people have requested at least some basic insight into which namespaces are being resolved; this adds the first level of debugging in this area. (More to come later; this we can get out quickly.)

Also, I needed to flow the config that controls the logging level through to the query/coordinator, so I ended up consolidating all our logging so that it's the exact same loggers, carried via instrument options, all the way down in query and coordinator.
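As a rough sketch of the consolidation idea (stand-in types only, not m3's real instrument options): the logger is constructed once from config and every component receives the same instance through the options value it is handed.

package main

import "go.uber.org/zap"

// instrumentOptions is a stand-in for m3's instrument options; the real type
// also carries metrics scopes and other instrumentation.
type instrumentOptions struct {
	logger *zap.Logger
}

func (o instrumentOptions) Logger() *zap.Logger { return o.logger }

// Both the coordinator handler and the query storage layer receive the same
// options value, so they log through the exact same logger instance.
func newHandler(opts instrumentOptions)       { opts.Logger().Info("handler ready") }
func newFanoutStorage(opts instrumentOptions) { opts.Logger().Info("storage ready") }

func main() {
	logger, err := zap.NewDevelopment() // level would come from the logging config
	if err != nil {
		panic(err)
	}
	opts := instrumentOptions{logger: logger}
	newHandler(opts)
	newFanoutStorage(opts)
}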

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

@vjtm

vjtm commented Jun 25, 2019

It works great.

Example output in debug mode (it shows two namespaces depending on the retention value):
jun 25 14:55:22 lgmadcengvic01v m3coordinator[53282]: 2019-06-25T14:55:22.268+0200 DEBUG query resolved cluster namespace, will use most granular per result {"query": "", "start": "2019-06-25T14:20:15.000+0200", "end": "2019-06-25T14:55:30.000+0200", "fanoutType": "coversPartialQueryRange", "namespace": "metricsagg", "type": "aggregated", "retention": "20m0s", "resolution": "10s"}
jun 25 14:55:32 lgmadcengvic01v m3coordinator[53282]: 2019-06-25T14:55:32.447+0200 DEBUG query resolved cluster namespace, will use most granular per result {"query": "", "start": "2019-06-25T14:46:45.000+0200", "end": "2019-06-25T14:54:00.000+0200", "fanoutType": "coversAllQueryRange", "namespace": "metrics", "type": "unaggregated", "retention": "10m0s", "resolution": "0s"}

Thx!

@vjtm

vjtm commented Jun 25, 2019

One detail: I put the log level back to info, but the message still appears.

@robskillington
Collaborator Author

@vjtm interesting, this may be due to the fact that two loggers exist when the coordinator and dbnode run side by side. I might fix that too in this PR.

@robskillington
Collaborator Author

Updated with all loggers now being constructed identically in the same process; that should avoid what you're seeing with the logs being emitted regardless of config.

@robskillington robskillington changed the title [query] Log at debug level resolved cluster namespaces [query] Log at debug level resolved cluster namespaces, use same logger in query and coordinator Jun 26, 2019
@vtomasr5

vtomasr5 commented Jun 26, 2019

Hi,
Something weird is happening now.

I have the following in place (just for testing):

  • three VMs for the etcd cluster
  • VM: prometheus + m3coordinator + m3dbnode1
  • VM: m3dbnode2

Note: both the m3dbnode and m3coordinator processes are recent builds from this PR's code.

m3dbnode1, with debug mode activated, shows debug messages in the logs, but no "query resolved cluster namespace" log lines.
m3dbnode2, with debug mode activated, doesn't show debug messages at all.

The cluster works fine. I see data in Grafana. Metrics are coming from https://github.com/open-fresh/avalanche.

I am @vjtm, just using a different account :)


@robskillington
Collaborator Author

Hey @vtomasr5 @vjtm, the reason I believe you're not seeing the message anymore is that you need to set the log level to debug for the coordinator specifically in its config file.

So add this at the top of m3coordinator.yml:

logging:
  level: debug

I now see the correct debug output for the resolved namespaces when sending queries to the coordinator itself.
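For reference, a minimal sketch (not m3's actual config plumbing) of how a level string such as debug from that YAML ends up gating zap output:

package main

import (
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

// buildLogger turns a configured level string ("debug", "info", ...) into a
// zap logger; only entries at or above that level are emitted.
func buildLogger(level string) (*zap.Logger, error) {
	var lvl zapcore.Level
	if err := lvl.UnmarshalText([]byte(level)); err != nil {
		return nil, err
	}
	cfg := zap.NewProductionConfig()
	cfg.Level = zap.NewAtomicLevelAt(lvl)
	return cfg.Build()
}

func main() {
	logger, err := buildLogger("debug")
	if err != nil {
		panic(err)
	}
	logger.Debug("query resolved cluster namespace, will use most granular per result",
		zap.String("namespace", "metricsagg"))
}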

@arnikola arnikola left a comment

Approved with nits; the only real point is that we may lose visibility into local vs remote storage.

SetLookbackDuration(h.lookbackDuration).
SetGlobalEnforcer(nil).
SetInstrumentOptions(h.instrumentOpts.
SetMetricsScope(h.instrumentOpts.MetricsScope().SubScope("debug_engine")))
Collaborator

nit: split this up into 2 statements?

}

// NewOptions enforces that fields are set when options is created.
func NewOptions(p OptionsParams) (Options, error) {
Collaborator

What's this for, just removing direct struct generation here?

Collaborator Author

Yeah, it was possible to create it without InstrumentOptions, which could easily lead to nil pointer panics. Now it's protected against that by always enforcing that the field is set when constructing Options.
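A compact sketch of that pattern (simplified stand-ins, not the actual m3 types): the constructor refuses to build Options when the required field is missing, instead of silently allowing a nil InstrumentOptions.

package options

import (
	"errors"

	"go.uber.org/zap"
)

// InstrumentOptions stands in for m3's instrument options interface.
type InstrumentOptions interface {
	Logger() *zap.Logger
}

// OptionsParams collects the required fields for constructing Options.
type OptionsParams struct {
	InstrumentOptions InstrumentOptions
}

// Options is a simplified stand-in for the real options type.
type Options struct {
	instrumentOpts InstrumentOptions
}

// NewOptions enforces that InstrumentOptions is set, so later calls can never
// hit a nil pointer.
func NewOptions(p OptionsParams) (Options, error) {
	if p.InstrumentOptions == nil {
		return Options{}, errors.New("instrument options not set")
	}
	return Options{instrumentOpts: p.InstrumentOptions}, nil
}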

SetStore(backendStorage).
SetLookbackDuration(*cfg.LookbackDuration).
SetGlobalEnforcer(perQueryEnforcer).
SetInstrumentOptions(instrumentOptions.
Collaborator

nit: split into 2 statements

debugLog := s.logger.Check(zapcore.DebugLevel,
"query resolved cluster namespace, will use most granular per result")
if debugLog != nil {
for _, n := range namespaces {
Collaborator

Quick note: currently I'm not sure you'll be able to tell if you're serving the query yourself or serving a remote query here.

Collaborator Author

That's probably ok for just simple debugging. We can get more complex later perhaps?

Collaborator

Sure, that's fine; only concern is if you get random spam in your logs from remote queries
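For readers unfamiliar with the Check call in the diff above, it is zap's standard pattern for gating expensive log construction on the configured level; a standalone sketch (not the exact m3 code):

package main

import (
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

func main() {
	logger, _ := zap.NewDevelopment() // debug enabled here
	namespaces := []string{"metrics", "metricsagg"}

	// Check returns nil when the entry would not be logged, so the
	// per-namespace loop and field construction are skipped entirely
	// when the logger runs at info level or above.
	msg := "query resolved cluster namespace, will use most granular per result"
	if ce := logger.Check(zapcore.DebugLevel, msg); ce != nil {
		for _, n := range namespaces {
			logger.Debug(msg, zap.String("namespace", n))
		}
	}
}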

@@ -241,7 +245,7 @@ func retrieveMetadata(streamCtx context.Context) context.Context {
}
}

- return logging.NewContextWithID(streamCtx, id)
+ return logging.NewContextWithID(streamCtx, instrumentOpts, id)
Collaborator

nit: put options last here

Collaborator Author

Sure thing.

// Attach a rqID with all logs so that its simple to trace the whole call stack
rqID := uuid.NewRandom()
- return NewContextWithID(ctx, rqID.String())
+ return NewContextWithID(ctx, instrumentOpts, rqID.String())
Collaborator

nit: options last here

Collaborator Author

Sure thing.

- func NewContextWithID(ctx context.Context, id string) context.Context {
+ func NewContextWithID(
+ 	ctx context.Context,
+ 	instrumentOpts instrument.Options,
Collaborator

nit: set the options as last param

Collaborator Author

Sure thing.
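For concreteness, a sketch of the signature with the options moved to the last parameter as suggested (stand-in types; the real m3 function may differ):

package logging

import (
	"context"

	"go.uber.org/zap"
)

// InstrumentOptions stands in for m3's instrument options interface.
type InstrumentOptions interface {
	Logger() *zap.Logger
}

type loggerKey struct{}

// NewContextWithID attaches a request-scoped logger (tagged with the request
// ID) to the context, with the options parameter placed last per the review
// nits above.
func NewContextWithID(ctx context.Context, id string, opts InstrumentOptions) context.Context {
	return context.WithValue(ctx, loggerKey{}, opts.Logger().With(zap.String("rqID", id)))
}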

@vtomasr5

I've added those lines at the top of m3coordinator.yml and restarted the service, but I am unable to see the debug logging.

This is what I see:

Jun 26 10:49:55 lgmadcengvic01v m3coordinator[8056]: 2019/06/26 10:49:55 Go Runtime version: go1.12.6
Jun 26 10:49:55 lgmadcengvic01v m3coordinator[8056]: 2019/06/26 10:49:55 Build Version:      v0.10.2
Jun 26 10:49:55 lgmadcengvic01v m3coordinator[8056]: 2019/06/26 10:49:55 Build Revision:     880ba430
Jun 26 10:49:55 lgmadcengvic01v m3coordinator[8056]: 2019/06/26 10:49:55 Build Branch:       r/debug-logs-resolve-cluster-namespaces
Jun 26 10:49:55 lgmadcengvic01v m3coordinator[8056]: 2019/06/26 10:49:55 Build Date:         2019-06-26-10:32:40
Jun 26 10:49:55 lgmadcengvic01v m3coordinator[8056]: 2019/06/26 10:49:55 Build TimeUnix:     1561537960
Jun 26 10:49:55 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561538995.7882032,"msg":"tracing disabled for m3query; set `tracing.backend` to enable"}
Jun 26 10:49:55 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561538995.8114905,"msg":"successfully loaded cache from file","file":"/var/lib/m3kv/_kv_default_env_m3db_embedded.json"}
Jun 26 10:49:55 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561538995.8630068,"msg":"waiting for dynamic topology initialization, if this takes a long time, make sure that a topology/placement is configured"}
Jun 26 10:49:55 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561538995.8637817,"msg":"adding a watch","service":"m3db","env":"default_env","zone":"embedded","includeUnhealthy":true}
Jun 26 10:49:55 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561538995.8656628,"msg":"successfully loaded cache from file","file":"/var/lib/m3kv/m3db_embedded.json"}
Jun 26 10:49:55 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561538995.8790286,"msg":"initial topology / placement value received"}
Jun 26 10:49:56 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561538996.265582,"msg":"resolved cluster namespace","namespace":"metrics"}
Jun 26 10:49:56 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561538996.2656374,"msg":"resolved cluster namespace","namespace":"metricsagg"}
Jun 26 10:49:56 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561538996.4539897,"msg":"successfully updated topology","numHosts":1}
Jun 26 10:49:56 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561538996.5211146,"msg":"successfully updated topology","numHosts":1}
Jun 26 10:49:56 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561538996.860387,"msg":"configuring downsampler to use with aggregated cluster namespaces","numAggregatedClusterNamespaces":1}
Jun 26 10:49:56 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561538996.8606749,"msg":"successfully loaded cache from file","file":"/var/lib/m3kv/_kv_default_env_m3db_embedded.json"}
Jun 26 10:50:06 lgmadcengvic01v m3coordinator[8056]: {"level":"error","ts":1561539006.886191,"msg":"error initializing namespaces values, retrying in the background","key":"/namespaces","error":"initializing value error:init watch timeout"}
Jun 26 10:50:06 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561539006.9341915,"msg":"received kv update","version":1,"key":"/placement"}
Jun 26 10:50:06 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561539006.934315,"msg":"election manager opened successfully"}
Jun 26 10:50:07 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561539007.9357057,"msg":"election state changed from follower to leader"}
Jun 26 10:50:07 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561539007.944032,"msg":"no m3msg server configured"}
Jun 26 10:50:07 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561539007.9440768,"msg":"starting API server","address":"0.0.0.0:7201"}
Jun 26 10:50:07 lgmadcengvic01v m3coordinator[8056]: {"level":"info","ts":1561539007.9444742,"msg":"registered new interrupt handler"}

On the other hand, on m3dbnode1 with the debug option I can see the debug messages, but not on m3dbnode2, which seems to behave the same way as the m3coordinator.

@robskillington
Collaborator Author

@vtomasr5 are you sure you are sending queries to that coordinator?

I'm seeing it in my tests:

{"level":"debug","ts":1561541480.8201895,"msg":"query resolved cluster namespace, will use most granular per result","query":"","start":1561537575,"end":1561541490,"fanoutType":"coversAllQueryRange","namespace":"metrics_10s_48h","type":"aggregated","retention":"48h0m0s","resolution":"10s"}

Relevant build details:

$ docker-compose logs -f m3coordinator01
Attaching to m3_stack_m3coordinator01_1
m3coordinator01_1  | 2019/06/26 09:30:25 Go Runtime version: go1.12.6
m3coordinator01_1  | 2019/06/26 09:30:25 Build Version:      v0.10.2
m3coordinator01_1  | 2019/06/26 09:30:25 Build Revision:     531cc35f
m3coordinator01_1  | 2019/06/26 09:30:25 Build Branch:       r/debug-logs-resolve-cluster-namespaces
m3coordinator01_1  | 2019/06/26 09:30:25 Build Date:         2019-06-26-07:06:32
m3coordinator01_1  | 2019/06/26 09:30:25 Build TimeUnix:     1561532792

@vjtm

vjtm commented Jun 26, 2019

I'm sorry, Prometheus was not starting properly.

It works perfectly now.

@robskillington robskillington merged commit bc2c906 into master Jun 26, 2019
@robskillington robskillington deleted the r/debug-logs-resolve-cluster-namespaces branch June 26, 2019 16:10