
No refresh of file service discovery targets if only __meta labels changed #3693

Closed
ajaegle opened this Issue Jan 16, 2018 · 11 comments

ajaegle commented Jan 16, 2018

What did you do?

I am using file_sd_config as the discovery mechanism for a target group generated by a custom Docker Swarm service discovery (similar to https://github.com/ContainerSolutions/prometheus-swarm-discovery). When updating the targets file, I couldn't see any update in the targets list of the Prometheus web interface. My first assumption was that there were issues with fsnotify inside container volumes, but this was not the case. The issue I faced was that no configuration update is visible in the targets view when only __meta labels change.

After some investigation, I found the reason for the omitted updates in the code where the list of scrape tasks is synced using a hash of the target. This hash is calculated from the URL and the labels, but not from the discoveredLabels, which is where these __meta labels live.

// hash returns an identifying hash for the target.
func (t *Target) hash() uint64 {
	h := fnv.New64a()
	h.Write([]byte(fmt.Sprintf("%016d", t.labels.Hash())))
	h.Write([]byte(t.URL().String()))

	return h.Sum64()
}

Is it intended behavior that an update of __meta labels does not trigger an update of the scrape targets, perhaps to prevent overly frequent updates? My assumption was that I could use these __meta labels to carry all the information I get from the Docker API and then make use of it during the relabeling phase. In my opinion, it is a regular use case for service labels to be updated without any further changes to the target URLs or regular labels. If one should not use __meta labels for this, what would be a better "namespace" for such custom labels?
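
To make the effect concrete, here is a minimal, illustrative sketch of the dedup-by-hash behavior described above. The target type, field names, and sync logic are simplified, hypothetical stand-ins, not the actual Prometheus code:

package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// target is a hypothetical, simplified stand-in for the scrape target type
// quoted above; it is for illustration only.
type target struct {
	url              string
	labels           map[string]string // target labels after relabelling
	discoveredLabels map[string]string // pre-relabelling labels, including __meta_*
}

// hash mirrors the scheme above: only the URL and the post-relabelling
// target labels are hashed; discoveredLabels are ignored.
func (t *target) hash() uint64 {
	h := fnv.New64a()
	keys := make([]string, 0, len(t.labels))
	for k := range t.labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for a stable hash
	for _, k := range keys {
		h.Write([]byte(k + "=" + t.labels[k] + ";"))
	}
	h.Write([]byte(t.url))
	return h.Sum64()
}

func main() {
	current := &target{
		url:              "http://localhost:9091/metrics",
		discoveredLabels: map[string]string{"__meta_label_target_will_be_refreshed": "false"},
	}
	refreshed := &target{
		url: "http://localhost:9091/metrics",
		discoveredLabels: map[string]string{
			"__meta_label_target_will_be_refreshed": "false",
			"__meta_label_new":                      "meta-only",
		},
	}

	active := map[uint64]*target{current.hash(): current}
	if _, ok := active[refreshed.hash()]; ok {
		// Same URL and same target labels => same hash, so a sync keyed on
		// the hash keeps the old target (and its stale discoveredLabels).
		fmt.Println("hash unchanged; the __meta-only update is effectively dropped")
	}
}

Because the hash covers only the URL and the post-relabelling labels, a targets.json update that changes nothing but __meta labels produces the same hash, and the refreshed target is never picked up.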

What did you expect to see?

Even if the targets and regular labels remain the same, the configuration should reflect the change/addition/removal of __meta labels.

What did you see instead? Under which circumstances?

The configuration update was not applied. The targets view still lists the old combination of __meta labels.

Environment

Prometheus 2.0.0 deployed using Docker Swarm; also reproduced with a local setup on a plain Ubuntu Server 16.04 LTS VM.

  • System information:

Host: Ubuntu 16.04.3 LTS
Docker 17.12.0-ce
Linux 4.4.0-108-generic x86_64

  • Prometheus version:

2.0.0

  • Prometheus configuration file:
scrape_configs:
  - job_name: 'services'
    file_sd_configs:
      - files:
        - "/xxx/targets.json"

brian-brazil commented Jan 16, 2018

The labels which are the output of relabelling are used. If a meta label is changed which has no impact on target labels, there's no need to update the target.


ajaegle commented Jan 17, 2018

Thanks for the clarification. I was confused by the information displayed in the /targets UI: it lists the labels "before relabeling", which made me assume that I could see all available labels of the listed targets, even ones unused during relabeling.

If I use one of the labels during relabeling, its value is picked up, but the information displayed doesn't match the real state of the target. Suppose I build and deploy a full relabeling configuration and later modify the process that generates the targets file for file_sd so that it emits one more __meta label: its value will be used during relabeling, but I will not see the original __meta label value (in the overlay) until Prometheus is restarted or the relabeling configuration changes.

So using the __meta label value works as intended, but the information displayed in the overlay does not reflect the latest state of all __meta labels.


brian-brazil commented Jan 17, 2018

That's a bit of a bug then, especially as the next release makes that information more obvious.


ajaegle commented Jan 17, 2018

Good to hear that you also see this as something of a bug. Where can I find more details on the mentioned extension for displaying this information? Please let me know if I can help.


brian-brazil commented Jan 17, 2018

If you compile master, you'll see it under the Status menu.


krasi-georgiev commented Jan 19, 2018

@ajaegle if you can give an example of the targets.json - before and after an update - it will help to replicate the issue and find the culprit.


ajaegle commented Jan 26, 2018

To reproduce this problem, which is present in 2.0.0 and 2.1.0, use a default Prometheus package with an additional job that uses file_sd_configs pointing at a file named targets/targets.json.

- job_name: 'services'
  file_sd_configs:
    - files:
      - "targets/targets.json"

Use this content as the initial targets configuration when starting Prometheus (as targets/targets.json inside the Prometheus installation):

[
  {
    "Targets": [
      "localhost:9091"
    ],
    "Labels": {
      "__meta_label_target_will_be_refreshed": "false"
    }
  },
  {
    "Targets": [
      "localhost:9092"
    ],
    "Labels": {
      "__meta_label_target_will_be_refreshed": "true"
    }
  }
]

When everything is up, check the targets or service discovery view (the latter available since 2.1.0) to see that there are two endpoints, each having exactly one __meta label besides __meta_filepath.

[Screenshot: service discovery view showing both endpoints with only their initial __meta labels]

Copy the following content over targets.json; both endpoints now have one additional label. The 9091 endpoint gains another __meta label, whereas 9092 gets an additional non-__meta label.

[
  {
    "Targets": [
      "localhost:9091"
    ],
    "Labels": {
      "__meta_label_target_will_be_refreshed": "false",
      "__meta_label_new": "meta-only"
    }
  },
  {
    "Targets": [
      "localhost:9092"
    ],
    "Labels": {
      "__meta_label_target_will_be_refreshed": "true",
      "__non_meta_label_new": "non-meta-label"
    }
  }
]

Refresh the targets / service discovery view to see that only one endpoint has updated labels:

[Screenshot: after the update, only the 9092 endpoint shows its new label]

Additional info: if the __meta_label_new of 9091 is used in a relabel config such that the list of target labels changes after relabeling, then the first endpoint is also updated (due to a new hash of the target labels).
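
For illustration, a relabel rule along these lines triggers that refresh; the target label name new_label is just a placeholder:

- job_name: 'services'
  file_sd_configs:
    - files:
      - "targets/targets.json"
  relabel_configs:
    # Copying the __meta label into a regular target label changes the
    # post-relabelling label set, and with it the target's hash.
    - source_labels: ['__meta_label_new']
      target_label: 'new_label'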


krasi-georgiev commented Jan 26, 2018

@ajaegle thanks, that info is super useful. I will try to find some time to work on this.


ajaegle commented Jan 26, 2018

@krasi-georgiev Please see the details in my initial post before researching yourself. The reason one endpoint is not updated is obviously that the cached version is used because the hash hasn't changed. But I'm not sure what the side effects would be of including those meta labels in the hash calculation. Probably there was a good reason not to do this...
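
For reference, a naive version of that change would simply fold the discovered labels into the hash function quoted in the issue description. This is only a sketch of the idea, assuming a discoveredLabels field with a Hash() method analogous to labels; it is not the fix from the open PR:

// hash returns an identifying hash for the target, additionally covering the
// discovered (__meta_*) labels so that a metadata-only update yields a new hash.
func (t *Target) hash() uint64 {
	h := fnv.New64a()
	h.Write([]byte(fmt.Sprintf("%016d", t.labels.Hash())))
	h.Write([]byte(fmt.Sprintf("%016d", t.discoveredLabels.Hash())))
	h.Write([]byte(t.URL().String()))

	return h.Sum64()
}

A possible side effect is that every __meta-only change would then tear down and recreate the scrape loop for that target, which may be exactly what the original scheme was meant to avoid.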


krasi-georgiev commented Feb 6, 2018

@ajaegle thanks, your info really saved some time 👍 PR is open


lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
