Add raw mysql fields to the mysql event #3001

ruflin · 2016-11-14T18:12:04Z

This will include all mysql stats into the event under mysql.stats.raw. All keys will change to lower case.

Discussions:

What should be namespace for these fields? raw, extended
Should we normalise the keys to lower case?
Should we replace _ by .?
- This can cause namespace conflicts if a_b and a_b_c has a value (happens in mysql)
- Also we could hit index.mapping.depth.limit (default 20)
Should this be experimental and available for all metricsets?
What are we doing about the dynamic filed limit? -> disable it? Add size or number of fields limit on dynamic mapping elasticsearch#11443 (default 1000)

Potential improvements:

Remove duplicates which are already part of the event

TODO:

Update docs
Update changelog

tsg · 2016-11-14T18:17:53Z

+1 for raw and I think we should stick to the original names as much as possible (since this is the "raw" section). That means keeping _ as they are, which also avoids the transformation issues.

I guess it would be nicer to have this in it's own metricset.

ruflin · 2016-11-15T08:04:08Z

@tsg I did some tests with key names in elasticsearch and also spaces and lots of special characters just work. That means also for MongoDB we can use the original keys. In case one of the services uses dots we then have to decide if these should be replaced by _ to prevent potential conflicts.

I would prefer to have the "raw" fields inside the same metricset as it is like an extension of the metricset itself and only one config option is need to enable the additional fields. Overtime I think more and more fields will be moving from the raw to the structured document.

Implementation wise I would probably make raw a default config option that is available in each metricset, but it is up to the MetricSet to implement the additional fields.

andrewkroh · 2016-11-15T14:32:27Z

Adding both the curated and raw fields seems like it creates a lot of duplicated data. As a user I think I would only want to store one copy of the data. Maybe we could remove fields that are in the schema from the raw fields. This means the module can send data for fields that are unknown a priori, and we can still provide an optimized experience for the fields that are known (like normalizing storage units to bytes so they render nicely in Kibana or using scaled_floats in mappings for percentages).

+1 for lower-casing raw fields

ruflin · 2016-11-15T15:17:35Z

Agree that we should not duplicate fields. It would be nice if we could use apply.Schema() to remove the fields which were actually found so we would only iterate over the existing fields afterwards.

tsg · 2016-11-15T17:19:15Z

I was trying to think through a likely user scenario:

user tries Metricbeat with default settings. She is generally happy.
But one day, she discovers that a metric that is exported by the service is not present in Metricbeat
Looks up a solution, learns about the raw option. Enables it.
Happy about the one metric she wanted, but not happy about the other 435 metrics added 😢

To me it sounds like we should not care about duplication but rather allow to cherry pick raw metrics. I'm not sure how to do that from the usability point of view, especially how to find out which metrics are available.

CC @brandonmensing

brandonmensing · 2016-11-15T17:37:59Z

I think part of the joy of Metricbeat is that it's easy so I'm with you on the sadness of that user scenario!

I think this usage pattern might not be the most common so maybe a phased approach here would be good. Maybe we do 1 today, 2 later?

Just grab the whole thing. Having the options is still better than not.
Allow the user to specify a list of extra fields to pick up. User has to look it up in the docs or a 3rd party reference to figure out what those fields are.
When we learn more about how much anyone uses this, maybe we'll migrate extra fields into the main metricset and/or come up with a better way to allow them to configure extra fields. Maybe we develop an extra metricset of 'advanced' fields.

ruflin · 2016-11-15T18:53:45Z

@tsg I think your use case will describe it pretty well. Our "standard" metricset should cover 90-95% of the uses cases. More experience users will look for additional fields. Based on the unhappiness in point 4 I hope 5 happens: Open Github issue or PR to add the missing metric. The hard part will be where do we say "no" and not add it to the base metricset?

If someone only wants to add one metric I would hope we can solve this with filters:

Drop event if (mysql.status.info.raw.* AND NOT mysql.status.info.raw.your_metric)

Otherwise a whitelist approach as proposed by @brandonmensing could also be interesting. But both approaches should probably only be added in a second step when we see how / if people are going to use it.

BTW: This reminds me of the first metricbeat implementation where there was a base, extended and full metricset instead of the filters.

andrewkroh · 2016-11-17T16:09:27Z

metricbeat/mb/mb.go

@@ -57,6 +57,7 @@ type MetricSet interface {
 	Host() string   // Host returns a hostname or other module specific value
 	// that identifies a specific host or service instance from which to collect
 	// metrics.
+	Raw() bool // Raw defines if raw metrics should be sent


I'd like to avoid adding new methods to this interface were possible. In this case, when you need this information it is available in the MetricSet by m.Module().Config().Raw.

agree, changed.

andrewkroh · 2016-11-17T16:12:21Z

metricbeat/mb/mb.go

@@ -111,6 +118,7 @@ type ModuleConfig struct {
 	MetricSets []string                `config:"metricsets" validate:"required"`
 	Enabled    bool                    `config:"enabled"`
 	Filters    processors.PluginConfig `config:"filters"`
+	Raw        bool                    `config:"raw"`


It would be nice to also add a godoc comment explaining the meaning of this field.

andrewkroh · 2016-11-17T16:13:56Z

metricbeat/module/mysql/status/status_integration_test.go

+	_, exists := rawData["Threads_cached"]
+	assert.False(t, exists)
+
+	// Check random oder raw field if available


Changed to Check a raw field if it is available

andrewkroh · 2016-11-17T16:28:58Z

metricbeat/mb/mb.go

@@ -111,6 +118,7 @@ type ModuleConfig struct {
 	MetricSets []string                `config:"metricsets" validate:"required"`
 	Enabled    bool                    `config:"enabled"`
 	Filters    processors.PluginConfig `config:"filters"`
+	Raw        bool                    `config:"raw"`


It would be nice to also add a godoc comment explaining the meaning of this field.

andrewkroh · 2016-11-17T16:43:28Z

metricbeat/schema/schema_test.go

+	assert.True(t, schema.HasKey("test_b"))
+	assert.True(t, schema.HasKey("test_a"))
+	assert.False(t, schema.HasKey("nooo"))
+}


I don't think HasKey should return true for both the "source" keys and the "output" keys. Here's the behavior I expected which returns true only for the "source" keys.

func TestHasKey(t *testing.T) { schema := Schema{ "test": Conv{Key: "Test", Func: nop}, "test_obj": Object{ "test_a": Conv{Key: "TestA", Func: nop}, "test_b": Conv{Key: "TestB", Func: nop}, }, } assert.True(t, schema.HasKey("Test")) assert.True(t, schema.HasKey("TestA")) assert.True(t, schema.HasKey("TestB")) assert.False(t, schema.HasKey("test")) assert.False(t, schema.HasKey("test_obj")) assert.False(t, schema.HasKey("test_a")) assert.False(t, schema.HasKey("test_b")) assert.False(t, schema.HasKey("other")) }

which would mean that hasKey() would be:

func hasKey(key string, mappers map[string]Mapper) bool { for _, mapper := range mappers { if mapper.HasKey(key) { return true } } return false }

obviously you are right -> wrong test leads to wrong implementation 👎

ruflin · 2016-11-17T18:43:28Z

@andrewkroh New version pushed. The current config option is generic. Even though I currently only documented it in the status metricset as this is the only place available. As soon as we add it to other metricsets, we should have a general documentation.

andrewkroh · 2016-11-17T18:47:07Z

metricbeat/module/mysql/status/status.go

@@ -32,6 +32,7 @@ type MetricSet struct {
 	mb.BaseMetricSet
 	dsn string
 	db  *sql.DB
+	raw bool


Is this unused?

andrewkroh · 2016-11-17T18:47:31Z

metricbeat/mb/mb.go

@@ -81,6 +81,7 @@ type BaseMetricSet struct {
 	name   string
 	module Module
 	host   string
+	raw    bool


Is this unused?

Each metricset can add raw fields under the `raw` namespace in addition to the existing fields. The raw fields should contain the raw data coming the service itself. Fields are not changed in any way. As a first example, raw fields were added to mysql status. Remove unused raw variables

ruflin added the discuss Issue needs further discussion. label Nov 14, 2016

ruflin force-pushed the mysql-extended branch from b997208 to d7237c0 Compare November 15, 2016 08:25

ruflin closed this Nov 15, 2016

ruflin reopened this Nov 15, 2016

ruflin mentioned this pull request Nov 15, 2016

Mysql and Postgres modules should allow addition of custom queries #2987

Closed

ruflin force-pushed the mysql-extended branch 2 times, most recently from 2f7537d to c6ee4d5 Compare November 17, 2016 15:51

ruflin added Metricbeat Metricbeat review in progress Pull request is currently in progress. and removed discuss Issue needs further discussion. labels Nov 17, 2016

andrewkroh reviewed Nov 17, 2016

View reviewed changes

ruflin force-pushed the mysql-extended branch from c6ee4d5 to 0ef4a3f Compare November 17, 2016 18:42

andrewkroh reviewed Nov 17, 2016

View reviewed changes

andrewkroh approved these changes Nov 17, 2016

View reviewed changes

ruflin removed the in progress Pull request is currently in progress. label Nov 18, 2016

ruflin force-pushed the mysql-extended branch from 71cda3d to c02a9a9 Compare November 18, 2016 09:39

andrewkroh merged commit 0b33ee3 into elastic:master Nov 18, 2016

ruflin deleted the mysql-extended branch November 18, 2016 13:23

ruflin mentioned this pull request Nov 30, 2016

Metricbeat: Apache 2.4.23 field not found errors #3074

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add raw mysql fields to the mysql event #3001

Add raw mysql fields to the mysql event #3001

ruflin commented Nov 14, 2016 •

edited

tsg commented Nov 14, 2016

ruflin commented Nov 15, 2016

andrewkroh commented Nov 15, 2016

ruflin commented Nov 15, 2016

tsg commented Nov 15, 2016

brandonmensing commented Nov 15, 2016

ruflin commented Nov 15, 2016 •

edited

andrewkroh Nov 17, 2016

ruflin Nov 17, 2016

andrewkroh Nov 17, 2016

ruflin Nov 17, 2016

andrewkroh Nov 17, 2016

ruflin Nov 17, 2016

andrewkroh Nov 17, 2016

andrewkroh Nov 17, 2016

ruflin Nov 17, 2016

ruflin commented Nov 17, 2016

andrewkroh Nov 17, 2016

ruflin Nov 17, 2016

andrewkroh Nov 17, 2016

ruflin Nov 17, 2016

Add raw mysql fields to the mysql event #3001

Add raw mysql fields to the mysql event #3001

Conversation

ruflin commented Nov 14, 2016 • edited

tsg commented Nov 14, 2016

ruflin commented Nov 15, 2016

andrewkroh commented Nov 15, 2016

ruflin commented Nov 15, 2016

tsg commented Nov 15, 2016

brandonmensing commented Nov 15, 2016

ruflin commented Nov 15, 2016 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ruflin commented Nov 17, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ruflin commented Nov 14, 2016 •

edited

ruflin commented Nov 15, 2016 •

edited