Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

duplicate router stats port #14759

Closed
kwoodson opened this issue Jun 19, 2017 · 9 comments
Closed

duplicate router stats port #14759

kwoodson opened this issue Jun 19, 2017 · 9 comments

Comments

@kwoodson
Copy link
Contributor

kwoodson commented Jun 19, 2017

While debugging bugzilla ( https://bugzilla.redhat.com/show_bug.cgi?id=1455459 ) I discovered that there is an issue with the oc adm router --dry-run -o yaml and the generated stats ports.

When generating routers we pass in the data that we expect when building and use the dry-run feature to determine if what is in memory is different than what we are currently running. This method works great.

In openshift 3.6, something changed with the generation of the dry-run data. It outputs multiple stats ports.

          ports:
          - containerPort: 80
          - containerPort: 443
          - containerPort: 1939
            name: stats
            protocol: TCP
          - containerPort: 1935
            name: router-stats
            protocol: TCP

When I dry-run the router creation with the --stats-port=1939 I get the expected output of the stats port being on 1939. The oddity is that I get a new router-stats port listening on 1935. No matter what is passed to the router command this is always produced. When running multiple routers this breaks as the 1935 is always taken by the first router to land.

This can be reproduced by the following command:
oc adm router router1 --expose-metrics=False --external-host-insecure=False --host-network=True --images=registry.ops.openshift.com/openshift3/ose-${component}:${version} --latest-images=False --ports=80:80,443:443 --replicas=1 --selector=type=infra --service-account=router --stats-port=1938 --dry-run=True -o yaml

Version

3.6.116

Steps To Reproduce
  1. oc adm router router1 --stats-port 1939 --dry-run -o yaml
  2. Verify there are 2 separate stats ports created.
Current Result

There are two separate stats ports available for stats.

Expected Result

There should be a single stats port.

Additional Information

It appears that the stats port and the health check port were shared on port 1936. With the introduction to prometheus the stats port changed to defaultstatsport - 1 which puts it on 1935.

It looks like this code was added:
https://github.com/openshift/origin/blame/master/pkg/cmd/admin/router/router.go#L684-L691

It would appear that the new code needs to reconcile the old stats port located here:
https://github.com/openshift/origin/blame/master/pkg/cmd/admin/router/router.go#L534-L543

The new router-stats should be removed and the original router stats should be updated. It should honor the command line parameter to stats as well.

@juanvallejo juanvallejo self-assigned this Jun 19, 2017
@juanvallejo
Copy link
Contributor

cc @smarterclayton

@smarterclayton
Copy link
Contributor

We generate two stats ports - one for haproxy, and one that is part of the haproxy prometheus stats. So there are actually two ports.

@kwoodson
Copy link
Contributor Author

@smarterclayton, @juanvallejo,

Can we make the prometheus stats port configurable? By requiring the port be 1935, this guarantees only a single router instance running on a node. This is problematic in a multi-router sharded environment.

@kwoodson kwoodson reopened this Jun 20, 2017
@smarterclayton
Copy link
Contributor

smarterclayton commented Jun 20, 2017 via email

@smarterclayton
Copy link
Contributor

Proxying.

@smarterclayton
Copy link
Contributor

Ok, here's what I'm going to do:

  1. if you ask for new style metrics (which oadm router will by default), i'm going to turn off the stats port.
  2. i'm going to change the reload script to use a better check for accessing the router
  3. i'm going to have the new healthz endpoint in the router stats perform a proxy health check to the router process, or lean on the stats endpoint to do it
  4. i'm going to change the health checks from oadm router back to be http probes, since openshift-router won't have the connection limits and is in a different process

@smarterclayton
Copy link
Contributor

@pecameron @knobunc FYI

@smarterclayton
Copy link
Contributor

Moving this to 3.6.0 and p1 because this has API compatibility concerns if we ship it.

@smarterclayton
Copy link
Contributor

Please see #14790 and comment - i think this preserves compatibility with old configs, custom configs (within reason), and reduces the number of overlapping flags (listen-addr overrides stats port, asking for new style metrics disables old stats port, specific handling of template router health checks from the openshift-router process)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants