
Add full metricset and docs for redis #1063

Merged
merged 1 commit into elastic:master from ruflin/redis-metricbeat-enhancements on Mar 3, 2016

Conversation

@ruflin
Collaborator

commented Feb 29, 2016

Currently all data that is reported by redis is sent as an event.

In a second step, the notion of a level (minimal, basic, full) should be introduced that can be configured so that not all data is sent.

It is always possible to use drop_fields to minimize the number of fields sent. Fields are nested to make it easier, for example, to remove all server information.


@ruflin ruflin added the Metricbeat label Feb 29, 2016

"os": info["os"],
"process_id": info["process_id"],
"clients": common.MapStr{
"connected_clients": info["connected_clients"],

@ruflin

ruflin Feb 29, 2016

Author Collaborator

As we can use namespacing, should we remove clients from the field names, or keep it since people are used to it from redis?
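For illustration, a minimal Go sketch of the two naming options under discussion. MapStr here is a self-contained stand-in for libbeat's common.MapStr, and the shortened field name "connected" is hypothetical:

```go
package main

import "fmt"

// MapStr mimics libbeat's common.MapStr (a map[string]interface{}).
type MapStr map[string]interface{}

// buildClients shows the two options: keep redis' original field name
// under the "clients" namespace, or shorten it since the namespace
// already carries the "clients" meaning.
func buildClients(info map[string]string, keepPrefix bool) MapStr {
	if keepPrefix {
		// as in the PR: clients.connected_clients
		return MapStr{"connected_clients": info["connected_clients"]}
	}
	// hypothetical shortened variant: clients.connected
	return MapStr{"connected": info["connected_clients"]}
}

func main() {
	info := map[string]string{"connected_clients": "5"}
	fmt.Println(buildClients(info, true))
	fmt.Println(buildClients(info, false))
}
```

Either way the nested structure makes it easy to drop the whole clients subtree with drop_fields.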

"cluster": common.MapStr{
"cluster_enabled": info["cluster_enabled"],
},
"cpu": common.MapStr{

@ruflin

ruflin Feb 29, 2016

Author Collaborator

Field names would be shorter if we removed the used_cpu prefix, as it is the same for all of them.

@ruflin

Collaborator Author

commented Feb 29, 2016

@tsg @monicasarbu @andrewkroh @urso I would like to discuss some points based on this PR as it provides a good example:

  1. Field names: Should we use the identical field names as provided by the systems, or should we shorten them? Several systems report a flat hierarchy and use long field names to prevent conflicts. Because we can store nested documents, these prefixes can be removed. The advantage is shorter names; the disadvantage could be that people are used to looking up the full names.
  2. Reporting levels: Lots of metricsets contain too much information. They also report the version and other things that rarely change. I think it is good to have these values available in case someone wants to use them, but I don't think they should be reported by default. There are different options here:
    • Split up into different metricsets: Instead of putting everything into one metricset, it could be split up into multiple metricsets. This would mean multiple requests to fetch all data from a system. I prefer to keep all data that is retrieved with one request in one metricset.
    • Use reporting levels: I suggest introducing something like three levels of reporting (minimum, basic, full); based on the level set, more or less data is sent. What is sent for each level would have to be defined by the creator. The good thing is that if one level does not fit, full can always be used with drop_fields to get to the set needed. A better name for "level" and for the levels themselves would be nice.

Thoughts?

@tsg

Collaborator

commented Feb 29, 2016

Re 1: It's probably a case-by-case decision, and perhaps for some systems it makes more sense to rename the fields, but I have a slight preference for usually keeping the name as in the source. It signals that it is indeed the raw value as we read it.

Re 2: Just for reference, in Packetbeat and Topbeat we essentially use "sensible defaults" plus ways of enabling more metrics or disabling metrics one by one. In Topbeat, "sensible defaults" is almost "full" (I think only the per-core information is missing); in Packetbeat it's more restrictive by default (e.g. raw byte data is not included).

It would be interesting to look at the other community Beats to see if they felt the need for such levels, or whether full + drop_fields would be enough.

Another thing to take into account is that if a field has very few distinct values, it is stored quite efficiently in Elasticsearch, so it might not be such a big issue (note to self: test this).

"uptime_in_days": info["uptime_in_days"],
"hz": info["hz"],
"lru_clock": info["lru_clock"],
"config_file": info["config_file"],

@tsg

tsg Feb 29, 2016

Collaborator

I like the simplicity of this code, but it's also repetitive enough that it might be worth doing the assignments programmatically.

@ruflin

ruflin Mar 1, 2016

Author Collaborator

The problem is that I left out only a few fields. Some fields are computed from existing fields; I left these out as they can (hopefully) be computed in Kibana and do not have to be persisted.

It would be nice if we could simplify / generate the above in the future. I would wait for some more implementations; perhaps we will find a common pattern.

@urso

urso Mar 1, 2016

Collaborator

programmatically:

names := []string{ ... }
for _, name := range names {
    event[name] = info[name]
}

@ruflin

ruflin Mar 1, 2016

Author Collaborator

@urso you're forgetting the few fields which I didn't include. Of course, I could remove these fields manually instead. One other potential issue with automating this is the mapping: if we set it manually, we know exactly which fields are sent and can set the mapping. If we don't, and new fields are added on the server side (for example in a newer version of redis), new fields are sent without a mapping.
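A sketch combining both points (helper and field names are illustrative, not the PR's code): the programmatic loop suggested above, restricted to an explicit allow-list so that only fields with a known mapping are ever sent, and fields added by newer redis versions are silently skipped:

```go
package main

import "fmt"

// copyKnownFields copies only the allow-listed fields from the raw
// redis INFO data into the event, so the set of emitted fields (and
// therefore the required ES mapping) stays fixed even when the server
// reports new fields.
func copyKnownFields(info map[string]string, known []string) map[string]interface{} {
	event := map[string]interface{}{}
	for _, name := range known {
		if v, ok := info[name]; ok {
			event[name] = v
		}
	}
	return event
}

func main() {
	info := map[string]string{
		"uptime_in_days": "3",
		"hz":             "10",
		"new_in_future":  "x", // hypothetical field from a newer redis: skipped
	}
	known := []string{"uptime_in_days", "hz", "lru_clock"}
	fmt.Println(copyKnownFields(info, known))
}
```

The allow-list doubles as the single source of truth for which fields need a mapping entry.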

"used_cpu_sys": "210.63",
"used_cpu_sys_children": "0.00",
"used_cpu_user": "113.11",
"used_cpu_user_children": "0.00"

@tsg

tsg Feb 29, 2016

Collaborator

Doubles as strings in JSON could be a problem if explicit ES mappings are not set. As I think explicit mappings are needed anyway, it's good to keep it like this; otherwise we'd have the types defined in two places.
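For reference, if the string values ever needed to be typed in the Beat itself rather than via the ES mapping, the conversion would be a one-liner with Go's strconv (sketch only; parseCPUField is a hypothetical helper, not part of this PR):

```go
package main

import (
	"fmt"
	"strconv"
)

// The redis INFO output reports CPU times as strings ("210.63").
// parseCPUField converts such a value to a float64.
func parseCPUField(raw string) (float64, error) {
	return strconv.ParseFloat(raw, 64)
}

func main() {
	v, err := parseCPUField("210.63")
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // 210.63
}
```

Keeping the values as strings and typing them in the mapping avoids duplicating this knowledge in code.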

@ruflin

ruflin Mar 1, 2016

Author Collaborator

Yes, I'm still working on the mapping part. It would be nice if we only had to define it in one place.

@ruflin

Collaborator Author

commented Mar 1, 2016

  1. Looking at other implementations reveals a mix of "sensible defaults", everything configurable, and sending everything. Looking at Packetbeat, it seems there are protocol-specific configuration params (like include_additionals and include_authorities for dns). I would like to avoid that and find an approach which works for all implementations.

I think if someone wants "all" data, every metricset implementation should be able to send all data, but by default we should go with the sensible approach. I was thinking of the levels in order not to introduce metricset-specific configurations. In the above case that would be, for example, "cpu: true" to enable cpu.

In the redis example I would see the following three levels:

  • minimum: stats
  • basic: stats, memory, cpu
  • full: all
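A hypothetical configuration for such levels might look like the following (the level option does not exist and the surrounding structure is only a sketch, not this PR's config format):

```yaml
metricbeat:
  modules:
    - module: redis
      metricsets: ["info"]
      # hypothetical option, not implemented:
      # minimum -> stats; basic -> stats, memory, cpu; full -> all
      level: basic
```

Anyone needing a set in between could use level: full together with drop_fields.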
@ruflin

Collaborator Author

commented Mar 1, 2016

@monicasarbu For the filtering implementation (#830), is the following assumption correct, assuming the full redis document above is taken?

  • include_fields: ["redis-info.stats"]: This would send stats with all its nested fields but leave out all other fields.
  • exclude_fields: ["redis-info.stats.sync_full"]: This would send everything except the nested field defined.
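The include semantics asked about here can be sketched in Go (names are hypothetical, not the #830 implementation): keep only the subtree addressed by a dotted path, dropping all siblings along the way:

```go
package main

import (
	"fmt"
	"strings"
)

// MapStr mimics libbeat's common.MapStr for this self-contained sketch.
type MapStr map[string]interface{}

// includeField returns a document containing only the subtree at the
// given dotted path (e.g. "redis-info.stats"), or an empty document
// if the path does not exist.
func includeField(doc MapStr, path string) MapStr {
	parts := strings.Split(path, ".")
	out := MapStr{}
	cur := doc
	dst := out
	for i, p := range parts {
		v, ok := cur[p]
		if !ok {
			return MapStr{}
		}
		if i == len(parts)-1 {
			dst[p] = v // keep the whole subtree at the path
			return out
		}
		sub, ok := v.(MapStr)
		if !ok {
			return MapStr{}
		}
		next := MapStr{}
		dst[p] = next
		dst = next
		cur = sub
	}
	return out
}

func main() {
	doc := MapStr{
		"redis-info": MapStr{
			"stats": MapStr{"sync_full": 0},
			"cpu":   MapStr{"used_cpu_sys": "210.63"},
		},
	}
	fmt.Println(includeField(doc, "redis-info.stats"))
}
```

exclude_fields would be the mirror operation: walk to the path and delete that key while keeping everything else.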
@monicasarbu

Contributor

commented Mar 1, 2016

@ruflin That's right!

@@ -33,4 +33,4 @@ mysql:
- MYSQL_ROOT_PASSWORD=test

redis:
-  image: redis:3.0.6
+  image: redis:3.0.7

@urso

urso Mar 1, 2016

Collaborator

Also add versioning to libbeat's docker-compose.yml.

@urso

Collaborator

commented Mar 1, 2016

  1. Regarding naming I'm quite on the fence. On the one hand, we're reporting the raw values, and users might be accustomed to the names. On the other hand (even though we have some kind of namespacing), I like some kind of normalization: I think when similar metrics come from different kinds of systems, the names should be similar for searchability.
  2. I kind of like the idea of reporting levels, as it's quite simple. More advanced filtering can still be done with generic filters, Logstash, or node ingest. Regarding multiple calls to query stats (if two metricsets use the same data source), it makes sense to split processing into data acquisition and data extraction. That is, metricsets define (e.g. by name) the data source, and if multiple metricsets use the same data source, the query is done only once and all metricsets are fed from the raw data.
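The data-source sharing idea in point 2 can be sketched as follows (all names here are hypothetical, assuming a simple per-fetch-cycle cache): metricsets register a named data source, and the raw query runs at most once no matter how many metricsets read it:

```go
package main

import "fmt"

// fetchFunc performs the actual query against a service (e.g. redis INFO).
type fetchFunc func() map[string]string

// sourceCache caches raw data per named data source so that two
// metricsets sharing a source trigger only one underlying call.
type sourceCache struct {
	fetchers map[string]fetchFunc
	cache    map[string]map[string]string
	calls    map[string]int // instrumentation: how often each source was queried
}

func newSourceCache() *sourceCache {
	return &sourceCache{
		fetchers: map[string]fetchFunc{},
		cache:    map[string]map[string]string{},
		calls:    map[string]int{},
	}
}

func (s *sourceCache) register(name string, f fetchFunc) { s.fetchers[name] = f }

// get returns the raw data for a source, querying it at most once.
func (s *sourceCache) get(name string) map[string]string {
	if data, ok := s.cache[name]; ok {
		return data
	}
	s.calls[name]++
	data := s.fetchers[name]()
	s.cache[name] = data
	return data
}

func main() {
	c := newSourceCache()
	c.register("redis-info", func() map[string]string {
		return map[string]string{"connected_clients": "5"}
	})
	// Two metricsets reading the same source: one underlying call.
	_ = c.get("redis-info")
	_ = c.get("redis-info")
	fmt.Println("calls:", c.calls["redis-info"]) // prints "calls: 1"
}
```

Each metricset would then only implement the extraction step over the cached raw data.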
@ruflin

Collaborator Author

commented Mar 1, 2016

@urso

  1. Quite on the fence about changing the names, or about keeping them? Similarity of names between systems is something I think we should delay for later, as there are too many variables.
  2. In general I would prefer to avoid multiple metricsets making the same call, as from my perspective that is one of the "criteria" for a metricset.
@urso

Collaborator

commented Mar 2, 2016

  1. To change the names.
  2. That's what I mean: metricsets should be able to extract event data from raw data, and if any two metricsets would make the same call, it is done only once -> metricsets specify the 'data source' but do not make the call themselves.
@ruflin

Collaborator Author

commented Mar 2, 2016

As a summary:

  • We go with the same names that are provided by the services
  • A metricset should always have all data available. It is not yet clear on what basis the reduction happens:
    • Filtering (yes)
    • Levels
    • Specific Configuration options
  • Each metricset must have a mapping

As the only thing this PR does is provide the full data set for redis, I suggest moving forward with this PR and deciding on the data reduction later. I will make a proposal for that.

@urso Seems like we are on the same page.

tsg added a commit that referenced this pull request Mar 3, 2016

Merge pull request #1063 from ruflin/redis-metricbeat-enhancements
Add full metricset and docs for redis

@tsg tsg merged commit 5f75d2d into elastic:master Mar 3, 2016

4 checks passed

  • CLA: Commit author has signed the CLA
  • continuous-integration/appveyor/pr: AppVeyor build succeeded
  • continuous-integration/travis-ci/pr: The Travis CI build passed
  • default: Merged build finished.