Remove cont_id tag from docker plugin #1015

sparrc · 2016-04-12T21:00:24Z

this would make container_id a field

also renaming cont_name and cont_image to container_name and
container_image.

sparrc · 2016-04-12T21:02:42Z

including some potentially interested parties for review. See description above for an overview of the changes this would bring, some of which would require adjustment to your queries.

gena01 · 2016-04-12T21:22:18Z

@sparrc can you make this one optional?

I know it's an edge case at this point, but graphing metrics localized to a single container is based on container_id, and if we remove that then we can't track the behavior of the container over time.

jchauncey · 2016-04-12T21:27:14Z

Yeah container id is definitely needed

sparrc · 2016-04-12T21:27:26Z

@gena01 @jchauncey what if we made it a field instead of a tag?

gena01 · 2016-04-13T14:33:46Z

@sparrc can you remind me what the difference is?

sparrc · 2016-04-13T15:48:56Z

@gena01

In line-protocol terms:

cpu,host=localhost usage=99 <timestamp>
     ^tag            ^field

Basically, tags are indexed and fields are not. This means that WHERE queries on fields will be less performant than on tags. You will still be able to run SELECT queries on fields. The major drawback of tags is that each unique combination of measurement and tags creates a new "series" in InfluxDB, which increases the cardinality of the database (and that significantly affects performance).

There's a more detailed doc here: https://docs.influxdata.com/influxdb/v0.12/concepts/schema_and_data_layout/#encouraged-schema-design
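
As a rough illustration of the tag/field distinction, here is a minimal Python sketch that renders the same point with the container ID once as a tag and once as a field. The helper `to_line`, the ID `abc123`, and the timestamp are hypothetical, and the sketch skips line-protocol escaping of special characters.

```python
def to_line(measurement, tags, fields, timestamp):
    """Render one point in line protocol (illustrative only, no escaping)."""
    # Tags follow the measurement name, comma-separated and unquoted.
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))

    def fmt(value):
        # String field values are double-quoted; integers carry an "i" suffix.
        if isinstance(value, str):
            return f'"{value}"'
        if isinstance(value, int):
            return f"{value}i"
        return repr(value)

    field_part = ",".join(f"{k}={fmt(v)}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {timestamp}"


TS = 1460500000000000000  # hypothetical nanosecond timestamp

# Container ID as a tag: it becomes part of the series identity.
as_tag = to_line("docker_cpu", {"host": "localhost", "cont_id": "abc123"},
                 {"usage_percent": 99.0}, TS)

# Container ID as a field: it is stored as an ordinary string value.
as_field = to_line("docker_cpu", {"host": "localhost"},
                   {"usage_percent": 99.0, "container_id": "abc123"}, TS)

print(as_tag)
print(as_field)
```

In the second form the tag set is just `host=localhost`, so a new container ID does not change which series the point belongs to.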

jchauncey · 2016-04-13T15:56:25Z

What is the reasoning behind removing it? Just trying to get a little context.

sparrc · 2016-04-13T16:08:31Z

some users complained because when you bring a docker container up and down, the ID will change. This means that for every container restart you will create a new container_id tag value, which creates a new series in InfluxDB.

Creating a new series increases the cardinality and too many series can bog down the performance of the database.

So the reason is that a container ID tag can easily create a very poorly performing InfluxDB schema design, since most docker users stop & restart containers frequently.
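
The effect can be sketched with a small simulation. It assumes a series is identified by the measurement name plus its full tag set, as described above. The functions `series_key` and `count_series`, the container name `web`, and the generated IDs are made up for illustration.

```python
def series_key(measurement, tags):
    # A series is identified by the measurement name plus its full tag set.
    return (measurement, tuple(sorted(tags.items())))


def count_series(restarts, id_as_tag):
    """Count distinct series after `restarts` restarts of one named container."""
    seen = set()
    for i in range(restarts):
        container_id = f"{i:064x}"  # stand-in for a fresh ID on every restart
        tags = {"host": "localhost", "cont_name": "web"}
        if id_as_tag:
            tags["cont_id"] = container_id
        # When the ID is a field it never enters the series key.
        seen.add(series_key("docker_cpu", tags))
    return len(seen)


tag_series = count_series(1000, id_as_tag=True)
field_series = count_series(1000, id_as_tag=False)
print(tag_series, field_series)
```

With the ID as a tag, every restart adds a series; with the ID as a field, the restarted container keeps writing to the same one.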

jchauncey · 2016-04-13T16:11:31Z

Ah ok. My first thought is more philosophical, in that I wonder how they are graphing the data? I would want to slice the data along the container_id so when new containers come online or go away I would see that reflected in the graph. Container names are pointless in almost all cases because they are either going to be the same or some random junk that is hard to parse anyway.

If going from tag to field helps with performance then that's fine with me.

sparrc · 2016-04-13T16:24:19Z

@jchauncey, that should still work, SELECT container_id FROM docker will still be a valid query.

The main difference would be that SELECT <some_metric> FROM docker WHERE container_id='...' will have worse performance.
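
Why the second query is slower can be pictured with a toy in-memory model. This is only an analogy for "indexed versus not indexed", not a description of how InfluxDB stores data, and the point layout and values below are invented.

```python
from collections import defaultdict

# Nine hypothetical points: one tag (host) and two fields per point.
points = [
    {"tags": {"host": "a"},
     "fields": {"container_id": f"id{i % 3}", "usage_percent": float(i)}}
    for i in range(9)
]

# Tag lookup: an index maps each tag value straight to its points.
host_index = defaultdict(list)
for p in points:
    host_index[p["tags"]["host"]].append(p)
by_tag = host_index["a"]

# Field filter: there is no index, so every point has to be inspected.
by_field = [p for p in points if p["fields"]["container_id"] == "id1"]

print(len(by_tag), len(by_field))
```

Both lookups return correct results; the difference is that the field filter has to touch every stored point.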

jchauncey · 2016-04-13T16:28:57Z

K i doubt people are doing the second (unless they have some templated query they are using to filter a graph)

sparrc · 2016-04-13T16:52:28Z

cc @tripledes @asdfsx @AdithyaBenny @zstyblik @sporokh including some more people who I think will want to know about this potential change

sporokh · 2016-04-14T07:58:42Z

@sparrc Thanks a lot for notifying. Much appreciated.

asdfsx · 2016-04-14T08:12:38Z

thx for notifying！！！

zstyblik · 2016-04-14T08:55:21Z

@sparrc turning container ID into a field makes sense, as you've explained.

gena01 · 2016-04-15T17:22:26Z

@sparrc would it make sense to make cont_name a field as well?

sparrc · 2016-04-15T17:27:03Z

I suppose it depends on how the user is using container names in docker. Some people might set up their containers with the same name on restart, others might go with the randomized docker default names...

but since docker randomizes by default, I'm thinking that you're right and that we should make container_name a field as well.

Anyone here have other thoughts? Do you ever do WHERE queries using the cont_name tag? (i.e. SELECT <metric> FROM docker_cpu WHERE cont_name='foo')

jchauncey · 2016-04-15T17:30:09Z

Knowing how containers are named in kubernetes it would probably be best if container name was a field.

zarnovican · 2016-04-17T07:12:45Z

I'm one of those hit by the "high cardinality of docker measurements" issue. From my point of view, #1017 would fix the problem: you would simply filter out the tag that has a randomized value.

However, we have static container names, which are used in Grafana queries like:

SELECT "usage_percent" FROM "docker_cpu"
WHERE "host" = '$host' AND $timeFilter
GROUP BY "cont_name"

Would I be able to GROUP BY name if it were a field?

tripledes · 2016-04-17T11:01:02Z

@sparrc thanks for pinging.

I guess this change makes sense especially for InfluxDB, right? Just wondering whether the change would affect other outputs which are not that flexible from a data structure point of view... any idea?

whetherharder · 2016-04-17T13:14:50Z

Maybe the best way is to make it configurable? In my particular case, grouping data by cont_name is important for getting the required level of detail.

sparrc · 2016-04-17T15:52:51Z

Investigating it further, I actually think that we'll need to keep container_name a tag (but I will keep container_id a field).

The reason is that, if we don't make container_name a tag, then there is nothing to uniquely identify two containers with the same image. If container_name is creating too much cardinality then users can use the new tagexclude argument to remove it from their measurements if they don't need it (see #1017)

To @tripledes's point, this is true for InfluxDB, Graphite, and Prometheus.

sparrc · 2016-04-17T19:22:48Z

Thinking about this a bit more, I think that it makes the most sense to rename a few of the measurements, specifically:

docker_cpu
docker_mem
docker_net

would be renamed to:

docker_container_cpu
docker_container_mem
docker_container_net

Why? Because these metrics are specifically tracking per-container stats. The problem with per-container stats, in some use-cases, is that if containers are short-lived AND names are not kept consistent, then the series cardinality will balloon very quickly.

So adding "_container" to each metric would:

- make it more clear that these metrics are per-container, and
- allow users to easily drop per-container metrics if cardinality is an issue (namedrop = ["docker_container_*"])
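
To see which measurement names a glob such as `docker_container_*` would catch, here is a small Python sketch. It uses the standard-library `fnmatch` module as a stand-in for glob matching and is not Telegraf's implementation; `cpu` and `mem` are included only as examples of names that do not match.

```python
from fnmatch import fnmatchcase

# Pattern taken from the namedrop example above.
namedrop = ["docker_container_*"]

# Renamed per-container measurements plus two unrelated, illustrative names.
measurements = [
    "cpu",
    "docker_container_cpu",
    "docker_container_mem",
    "docker_container_net",
    "mem",
]

# Keep a measurement only if no namedrop pattern matches its name.
kept = [m for m in measurements
        if not any(fnmatchcase(m, pattern) for pattern in namedrop)]

print(kept)
```

A single pattern covers all three per-container measurements because they now share the `docker_container_` prefix.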

NOTE: to clarify, we will still be changing container_id to a field, and container_name will remain a tag (so GROUP BY and WHERE will still work on container_name).

thoughts?

jchauncey · 2016-04-17T19:24:04Z

Sounds good to me.

sporokh · 2016-04-18T08:28:32Z

@sparrc totally agree with you regarding container_name. I think it should be kept as a tag: first, as you've already mentioned, we would otherwise remove the ability to distinguish two containers with the same image; and second, stable production environments run by docker/docker-compose do not use randomly generated names.
I guess the problem only arises with:
- Kubernetes
- Docker when you don't set "--name"

- renaming cont_name and cont_image to container_name and container_image.
- cont_id is now a field, called container_id
- docker_cpu, docker_mem, docker_net measurements have been renamed to docker_container_cpu, docker_container_mem, and docker_container_net

closes #1014
closes #1052
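
For anyone updating saved queries after these renames, the mapping can be sketched as a small rewrite helper. The `RENAMES` table comes from the list above; the function `migrate` and the sample query are hypothetical, and this is a plain text substitution, not an InfluxQL parser.

```python
import re

# Old identifier -> new identifier, as listed in the commit message.
RENAMES = {
    "cont_name": "container_name",
    "cont_image": "container_image",
    "cont_id": "container_id",
    "docker_cpu": "docker_container_cpu",
    "docker_mem": "docker_container_mem",
    "docker_net": "docker_container_net",
}

# Match whole identifiers only, so already-migrated names are left alone.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate(query):
    """Rewrite old tag and measurement names in a query string."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], query)


old = 'SELECT "usage_percent" FROM "docker_cpu" GROUP BY "cont_name"'
new = migrate(old)
print(new)
```

Renaming alone does not restore tag behavior for container_id: since it is now a field, queries that grouped by the old cont_id tag need to be rethought, not just renamed.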

sparrc · 2016-04-18T21:56:46Z

OK, this has now been merged into master as I think I've given everyone time to provide input, and overall I felt like there was positive feedback on this change, despite it being a breaking change.

see the changelog for a full rundown

thanks all for the input!

sparrc force-pushed the cs1014 branch from 9183e6a to 0f5898e Compare April 13, 2016 16:44

sparrc force-pushed the cs1014 branch from 0f5898e to 0e9088f Compare April 17, 2016 22:17

sparrc force-pushed the cs1014 branch 3 times, most recently from 90dd487 to f559214 Compare April 18, 2016 17:39

sparrc force-pushed the cs1014 branch from f559214 to 36d330f Compare April 18, 2016 21:17

sparrc merged commit 36d330f into master Apr 18, 2016

sparrc deleted the cs1014 branch April 18, 2016 23:00

sparrc mentioned this pull request May 18, 2016

telegraf 0.13 docker plugin problem: why do you consider "container_id" as value not key? #1225

Closed

ctrlok mentioned this pull request Aug 28, 2017

Skip non-numerical values in graphite format #3179

Merged
