
Expose router stats port when prometheus is installed #6636

Closed
wants to merge 3 commits

Conversation

@davidaah (Contributor) commented Jan 6, 2018

Today prometheus is configured out of the box to try to scrape the openshift hosted router stats/metrics endpoint, but it is unable to do so because the firewall on each physical node does not permit this port to be exposed.

This change exposes the firewall port when prometheus is installed (the default here is True, corresponding to the openshift_prometheus_state default of present).

Currently getting: Get http://<ip of node hosting router>:1936/metrics: dial tcp <ip of node hosting router>:1936: getsockopt: no route to host

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1552235
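For context, the openshift_hosted router role drives node firewall openings from an allow-list in its defaults, so a change like this one amounts to a conditional entry for the stats port. A minimal sketch, assuming the role's usual `os_firewall_allow` list shape; the exact entry added by this PR is not shown in this thread:

```yaml
# Sketch only: open 1936/tcp when prometheus is installed.
# The list name and `cond` key follow the os_firewall role's conventions;
# this specific entry is an illustrative assumption.
r_openshift_hosted_router_os_firewall_allow:
- service: Router stats port
  port: 1936/tcp
  cond: "{{ r_openshift_hosted_router_prometheus_enabled | bool }}"
```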

@openshift-ci-robot

Hi @davidaah. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 6, 2018
@rh-atomic-bot

Can one of the admins verify this patch?
I understand the following commands:

  • bot, add author to whitelist
  • bot, test pull request
  • bot, test pull request once

@@ -26,6 +26,7 @@ openshift_cluster_domain: 'cluster.local'
##########
r_openshift_hosted_router_firewall_enabled: "{{ os_firewall_enabled | default(True) }}"
r_openshift_hosted_router_use_firewalld: "{{ os_firewall_use_firewalld | default(False) }}"
r_openshift_hosted_router_prometheus_enabled: "{{ openshift_prometheus_state == 'present' | default(True) }}"
@davidaah (Contributor, Author) commented on the diff:

Note to reviewer: this may need `openshift_prometheus_state and openshift_prometheus_state == 'present'`, but I wasn't sure.

A reviewer (Contributor) commented:

you probably need to verify openshift_prometheus_state is defined. openshift-hosted playbook could run independently of prometheus
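Both points stem from the same line of the diff. Jinja2's filter precedence means `openshift_prometheus_state == 'present' | default(True)` applies `default()` to the literal `'present'` rather than to the variable, so the expression neither supplies a sensible default nor tolerates an undefined variable. One way to express the stated intent (a sketch, not the final diff):

```yaml
# default() now applies to the variable itself; when openshift_prometheus_state
# is undefined it falls back to 'present', matching the intended default of True
r_openshift_hosted_router_prometheus_enabled: "{{ (openshift_prometheus_state | default('present')) == 'present' }}"
```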

@davidaah (Contributor, Author) commented Jan 10, 2018:

should be fixed now

@davidaah (Contributor, Author)

@zgalor @michaelgugino would you be able to review this?

@michaelgugino (Contributor)

I'm not sure that this is the correct way to expose the stats port. This would implement a firewall rule on the first_master; this workload might be scheduled anywhere. @sdodson thoughts?

@davidaah (Contributor, Author) commented Jan 10, 2018

Yeah, ultimately it needs to hit every server that the router will be deployed on. I believe this should be a well-known set of hosts, by leveraging the host labels indicating router deployment, and it should be static unless a user adds another host with the router-specific label.

Update: @michaelgugino after additional review, I agree it does seem the way firewall.yml is invoked from router.yml makes the firewall tasks not execute against the hosts where the router or registry is actually living. Unfortunately, I am not sure what exactly the original intent of firewall.yml was, but to use it this way I would need to refactor it out as a separate task in the top-level hosted router playbook which only runs on nodes that have the host var corresponding to the router node selector used by the daemonset.
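A hypothetical sketch of the refactor described above: invoke the firewall tasks from a play targeting the nodes whose labels match the router's node selector, instead of only the first master. The play, group, label, and file names here are illustrative assumptions, not the repo's actual layout:

```yaml
# Hypothetical: run the router firewall tasks only on nodes carrying
# the router node-selector label (default selector assumed to be region=infra).
- name: Open router stats port on router nodes
  hosts: oo_nodes_to_config
  tasks:
  - include_role:
      name: openshift_hosted
      tasks_from: firewall.yml
    when: (openshift_node_labels | default({})).get('region') == 'infra'
```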

@davidaah (Contributor, Author)

One potential concern is that this type of change would expose the stats port outside the cluster (this may be somewhat unavoidable, given that prometheus targets are set to the public IPs of the routers/router nodes).

@Klaas- (Contributor) commented Oct 26, 2018

Is there any progress on this? Prometheus just got to GA and this seems like it should be fixed for GA :)
Have you thought about opening the metrics port for all node ips as source addresses instead of opening it for everyone?
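A sketch of Klaas-'s suggestion, allowing 1936/tcp only from the cluster nodes' own IPs rather than from everywhere, using Ansible's iptables module. The inventory group, chain, and fact names are assumptions:

```yaml
# Hypothetical: restrict the stats port to traffic originating from
# other cluster nodes, one ACCEPT rule per node IP.
- name: Allow router stats port from cluster nodes only
  hosts: oo_nodes_to_config
  tasks:
  - name: Permit 1936/tcp from each node's primary IP
    iptables:
      chain: OS_FIREWALL_ALLOW
      protocol: tcp
      destination_port: "1936"
      source: "{{ hostvars[item].ansible_default_ipv4.address }}"
      jump: ACCEPT
    loop: "{{ groups['oo_nodes_to_config'] }}"
```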

@vrutkovs (Member)

This is no longer needed: in 3.11 the router is configured to expose metrics, and the prometheus operator automatically configures it as a target.

Please reopen this PR against release-3.10 if needed.

@vrutkovs vrutkovs closed this Oct 26, 2018
@Klaas- (Contributor) commented Nov 5, 2018

Hi,
I can say this is not working with 3.11 (ose).

I made a clean install and it still fails to access all routers in a multi-router setup.

It can only access one router; my guess: the one that's running on the same node as prometheus.

Greetings
Klaas

megian pushed a commit to vshn/openshift-ansible that referenced this pull request Nov 5, 2018
The PR 6636 has been closed in the official repository, but the port issue is not resolved.

7 participants