marathon_sd_config broken in 2.5.0 (using servicePort instead of actual port) #4855
Comments
#4499 is probably the culprit, but I'm not familiar enough with Marathon to understand what can/should be done.
Reverting to v2.4.3 solves this, btw.
simonpasquier added the kind/bug and component/service discovery labels on Nov 13, 2018
Hi @rjanovski, thanks for the report. This is indeed related to #4499. I'm missing some info in your report, but I can take a good guess at the cause of the problem: before 2.5.0, only services using host networking were discovered. 2.5.0 introduced support for all currently possible network configurations: host, bridge and container networking, using both [...]
Your scrape configuration looks overly broad; it will assume all discovered ports in the cluster are supposed to be scraped. While this might be correct in your environment (only scraping host-networked services), with 2.5.0 this assumption no longer holds.
I added an example Prometheus configuration to the repo a while ago at https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-marathon.yml. Both the Marathon app definition and the Prometheus configuration are documented there; please take a look and report back here.
@simonpasquier should this be added to some kind of release notes? The documentation was already extended, but every user blindly upgrading will run into this issue. Not sure what the project's way of dealing with this is.
@rjanovski: are you able to update your configuration with the details provided by @ti-mo? Regarding release notes, it was mentioned as [...]
adrien-f commented Nov 19, 2018
Greetings. I upgraded to 2.5 today expecting nothing to change even with those changes, but the SD picks up the wrong port and uses the servicePort instead of the task's port. We're using Marathon 1.5.6. The examples given don't work because the SD doesn't even see the task's ports, so I cannot rewrite them.
@adrien-f thanks for the report! As I wrote before, I'm not using Marathon and I don't think that any of the Prometheus maintainers is familiar with it either. Maybe you can share the output of [...]
@ti-mo any thoughts?
adrien-f commented Nov 19, 2018
@simonpasquier ofc, this was my next step as I was looking through the PR to see what could be done! Here's one task:
{
  "app": {
    "id": "$APPID",
    "backoffFactor": 1.15,
    "backoffSeconds": 1,
    "container": {
      "type": "DOCKER",
      "docker": {
        "forcePullImage": true,
        "image": "$DOCKER_IMAGE",
        "privileged": false
      },
      "volumes": [],
      "portMappings": [
        {
          "containerPort": 9503,
          "hostPort": 0,
          "labels": {},
          "protocol": "tcp",
          "servicePort": 10003
        }
      ]
    },
    "cpus": 0.5,
    "disk": 0,
    "env": {},
    "executor": "",
    "healthChecks": [],
    "instances": 1,
    "labels": {
      "prometheus": "true"
    },
    "maxLaunchDelaySeconds": 3600,
    "mem": 128,
    "gpus": 0,
    "networks": [
      {
        "mode": "container/bridge"
      }
    ],
    "requirePorts": false,
    "upgradeStrategy": {
      "maximumOverCapacity": 1,
      "minimumHealthCapacity": 1
    },
    "version": "2018-11-12T09:49:05.172Z",
    "versionInfo": {
      "lastScalingAt": "2018-11-12T09:49:05.172Z",
      "lastConfigChangeAt": "2018-11-08T16:57:23.245Z"
    },
    "killSelection": "YOUNGEST_FIRST",
    "unreachableStrategy": {
      "inactiveAfterSeconds": 0,
      "expungeAfterSeconds": 0
    },
    "tasksStaged": 0,
    "tasksRunning": 1,
    "tasksHealthy": 1,
    "tasksUnhealthy": 0,
    "deployments": [],
    "tasks": [
      {
        "ipAddresses": [
          {
            "ipAddress": "172.17.0.2",
            "protocol": "IPv4"
          }
        ],
        "stagedAt": "2018-11-19T09:04:41.778Z",
        "state": "TASK_RUNNING",
        "ports": [
          31978
        ],
        "startedAt": "2018-11-19T09:04:42.979Z",
        "version": "2018-11-12T09:49:05.172Z",
        "id": "$TASKID",
        "appId": "$APPID",
        "slaveId": "541512ae-594b-453b-bb09-01c62fda2e1a-S3",
        "host": "$SLAVEHOST",
        "healthCheckResults": []
      }
    ],
    "lastTaskFailure": {}
  }
}
I would expect to have a target of http://$SLAVEHOST:31978/metrics with this task. Marathon is a weird beast, I agree. Let me know if you need more info from me; I'll also keep looking into the code.
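To make the mismatch in this task concrete, here is a minimal Go sketch that decodes a trimmed copy of the payload and builds two addresses: one from `servicePort` (the kind of target this issue reports) and one from the task's entry in `ports` (the target expected above). The type and function names (`appResponse`, `targets`) are made up for illustration and are not the ones used in Prometheus's Marathon discovery code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net"
	"strconv"
)

// appResponse mirrors only the payload fields that matter here.
// Hypothetical type, not taken from the Prometheus source.
type appResponse struct {
	App struct {
		Container struct {
			PortMappings []struct {
				ContainerPort uint32 `json:"containerPort"`
				HostPort      uint32 `json:"hostPort"`
				ServicePort   uint32 `json:"servicePort"`
			} `json:"portMappings"`
		} `json:"container"`
		Tasks []struct {
			Host  string   `json:"host"`
			Ports []uint32 `json:"ports"`
		} `json:"tasks"`
	} `json:"app"`
}

// payload is a trimmed copy of the task posted in this comment.
const payload = `{
  "app": {
    "container": {
      "portMappings": [
        {"containerPort": 9503, "hostPort": 0, "servicePort": 10003}
      ]
    },
    "tasks": [
      {"host": "$SLAVEHOST", "ports": [31978]}
    ]
  }
}`

// targets returns the address built from servicePort and the address
// built from the host port Marathon assigned to the first task.
func targets(raw string) (fromServicePort, fromTaskPort string, err error) {
	var r appResponse
	if err = json.Unmarshal([]byte(raw), &r); err != nil {
		return "", "", err
	}
	if len(r.App.Tasks) == 0 || len(r.App.Tasks[0].Ports) == 0 ||
		len(r.App.Container.PortMappings) == 0 {
		return "", "", fmt.Errorf("payload has no task ports or port mappings")
	}
	t := r.App.Tasks[0]
	pm := r.App.Container.PortMappings[0]
	fromServicePort = net.JoinHostPort(t.Host, strconv.FormatUint(uint64(pm.ServicePort), 10))
	fromTaskPort = net.JoinHostPort(t.Host, strconv.FormatUint(uint64(t.Ports[0]), 10))
	return fromServicePort, fromTaskPort, nil
}

func main() {
	bad, good, err := targets(payload)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("built from servicePort:", bad)
	fmt.Println("built from task port:  ", good)
}
```

With the payload above, the first address ends in 10003 and the second in 31978; only the second is a port actually bound on the agent.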
@adrien-f Thanks for the report, this is indeed a significant case I've overlooked. I'll come up with a fix tomorrow. Turns out [...] As is usually the case with this family of products, it's all exceedingly hairy. I'll figure it out, sorry for any inconvenience caused.
adrien-f commented Nov 20, 2018
@ti-mo it's okay, between users of Marathon we must support each other. This is terribad, but in the meantime I hardcoded a fix while waiting for your new version. I'll review and test it.
index 32b9824b..40f21b80 100644
--- a/discovery/marathon/marathon.go
+++ b/discovery/marathon/marathon.go
@@ -503,7 +503,7 @@ func targetEndpoint(task *Task, port uint32, containerNet bool) string {
 		host = task.Host
 	}
-	return net.JoinHostPort(host, fmt.Sprintf("%d", port))
+	return net.JoinHostPort(host, fmt.Sprintf("%d", task.Ports[0]))
 }
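The hardcoded patch always picks the first task port, so it would give every port of a multi-port app the same target and would panic on a task with an empty `ports` list. A slightly more defensive variant, sketched below, looks the port up by index with a bounds check. This is not the fix that was eventually merged; `taskPort` and `endpoint` are hypothetical helpers, and the sketch assumes that Marathon lists a task's `ports` in the same order as the app's `portMappings`, which should be verified against your Marathon version.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// taskPort returns the host port allocated for the i-th port of a task.
// Assumption: taskPorts is ordered like the app's portMappings.
func taskPort(taskPorts []uint32, i int) (uint32, error) {
	if i < 0 || i >= len(taskPorts) {
		return 0, fmt.Errorf("task has %d ports, index %d is out of range", len(taskPorts), i)
	}
	return taskPorts[i], nil
}

// endpoint is a hypothetical helper, not the targetEndpoint from the diff.
// It joins the agent host with the task's i-th host port.
func endpoint(host string, taskPorts []uint32, i int) (string, error) {
	port, err := taskPort(taskPorts, i)
	if err != nil {
		return "", err
	}
	return net.JoinHostPort(host, strconv.FormatUint(uint64(port), 10)), nil
}

func main() {
	addr, err := endpoint("$SLAVEHOST", []uint32{31978}, 0)
	if err != nil {
		fmt.Println("skipping target:", err)
		return
	}
	fmt.Println(addr)
}
```

Returning an error instead of indexing blindly lets the caller skip a target rather than crash the discovery loop.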
no worries @ti-mo :) @simonpasquier, thanks for getting this fixed, guys!
rjanovski commented Nov 12, 2018 (edited)
Bug Report
What did you do?
Tried to configure Prometheus to scrape Marathon services.
What did you expect to see?
services being scanned on their correct port
What did you see instead? Under which circumstances?
services were scanned using their "servicePort", which is not exposed and hence did not return any result
Environment
dcos 2.10.2
System information:
Prometheus version:
Prometheus configuration file: