Ability to lower the timeout for some replication checks #236

Closed
baryluk opened this issue Nov 24, 2021 · 2 comments

baryluk commented Nov 24, 2021

Hi,

We are considering using https://github.com/camptocamp/ipahealthcheck_exporter to monitor some of our IPA servers. It uses ipa-healthcheck internally, which I find extremely slow and not well suited to real-time monitoring.

One of the reasons is that ipa-healthcheck is non-local.

We would like to run it in a mode where it does not contact any other servers and only reports status based on locally available information (i.e. the IPA server already knows its own replication state and the state of its peers, so ipa-healthcheck does not need to recheck them).

A second, related issue is that ipa-healthcheck contacts other replicas, and if they are down it takes about 2 minutes for the checks to time out, which makes the entire run take about 8 minutes to finish.

We would like to either disable these remote checks, or reduce timeout to about 10 seconds.

Example of checks that are very slow:

[
...
  {
    "source": "ipahealthcheck.ds.replication",
    "check": "ReplicationCheck",
    "result": "ERROR",
    "uuid": "2b03e49b-7c70-45f7-9c40-3bf3cb7afd8a",
    "when": "20211124100425Z",
    "duration": "261.222873",
    "kw": {
      "key": "DSREPLLE0005",
      "items": [
        "Replication",
        "Agreement"
      ],
      "msg": "The replication agreement (foo.example.com-to-bar.example.com) under \"dc=example,dc=com\" is not in synchronization,\nbecause the consumer server is not reachable."
    }
  },

...
  {
    "source": "pki.server.healthcheck.clones.connectivity_and_data",
    "check": "ClonesConnectivyAndDataCheck",
    "result": "ERROR",
    "uuid": "37858400-a560-4e91-815c-1cc5ed6b0c1c",
    "when": "20211124100214Z",
    "duration": "131.126375",
    "kw": {
      "status": "ERROR:  pki-tomcat : Internal error testing CA clone. Host: bar.example.com Port: 443"
    }
  },
  {
    "source": "pki.server.healthcheck.clones.connectivity_and_data",
    "check": "ClonesConnectivyAndDataCheck",
    "result": "SUCCESS",
    "uuid": "f4d7f33b-76fe-4dee-872d-303753a40c60",
    "when": "20211124100214Z",
    "duration": "131.126410",
    "kw": {
      "instance_name": "pki-tomcat",
      "status": "KRA Clones tested successfully, or not present."
    }
  },
  {
    "source": "pki.server.healthcheck.clones.connectivity_and_data",
    "check": "ClonesConnectivyAndDataCheck",
    "result": "SUCCESS",
    "uuid": "64b9fbd8-adc1-489f-9337-f30c06a565ef",
    "when": "20211124100214Z",
    "duration": "131.126443",
    "kw": {
      "instance_name": "pki-tomcat",
      "status": "OCSP Clones tested successfully, or not present."
    }
  },
  {
    "source": "pki.server.healthcheck.clones.connectivity_and_data",
    "check": "ClonesConnectivyAndDataCheck",
    "result": "SUCCESS",
    "uuid": "4ff3be98-21d9-4a23-8012-bfa9ed63f423",
    "when": "20211124100214Z",
    "duration": "131.126460",
    "kw": {
      "instance_name": "pki-tomcat",
      "status": "TKS Clones tested successfully, or not present."
    }
  },
  {
    "source": "pki.server.healthcheck.clones.connectivity_and_data",
    "check": "ClonesConnectivyAndDataCheck",
    "result": "SUCCESS",
    "uuid": "f400bc3b-d961-4d6e-a159-55ab191ea82b",
    "when": "20211124100214Z",
    "duration": "131.126475",
    "kw": {
      "instance_name": "pki-tomcat",
      "status": "TPS Clones tested successfully, or not present."
    }
  },
...
]
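
To pick out the slow checks from output like the above, a small script can sort entries by their "duration" field. This is a minimal sketch, assuming the JSON has been trimmed to the fields shown in the report; the `slowest_checks` helper, the 10-second threshold, and the `example.source`/`FastCheck` entry are illustrative and not part of ipa-healthcheck.

```python
import json


def slowest_checks(results, threshold_seconds=10.0):
    """Return (duration, source, check, result) tuples above the threshold, slowest first."""
    slow = []
    for entry in results:
        # "duration" is reported as a string, so convert it before comparing.
        duration = float(entry["duration"])
        if duration > threshold_seconds:
            slow.append((duration, entry["source"], entry["check"], entry["result"]))
    return sorted(slow, reverse=True)


# Two entries copied from the report above, plus one hypothetical fast check.
sample = json.loads("""
[
  {"source": "ipahealthcheck.ds.replication", "check": "ReplicationCheck",
   "result": "ERROR", "duration": "261.222873"},
  {"source": "pki.server.healthcheck.clones.connectivity_and_data",
   "check": "ClonesConnectivyAndDataCheck", "result": "ERROR",
   "duration": "131.126375"},
  {"source": "example.source", "check": "FastCheck",
   "result": "SUCCESS", "duration": "0.012"}
]
""")

for duration, source, check, result in slowest_checks(sample):
    print(f"{duration:10.3f}s  {result:8}  {source}.{check}")
```

Only the two checks that exceed the threshold are listed, with ReplicationCheck first.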

Version: ipa-healthcheck-0.7-6.module_el8.5.0+921+2b5d5825.noarch

@rcritten
Collaborator

An RFE for excluding certain sources/checks already exists: #176.
I had always intended to wrap each check with a timer. That is something we can investigate.

@rcritten rcritten self-assigned this Dec 3, 2021
rcritten added a commit to rcritten/freeipa-healthcheck that referenced this issue Dec 3, 2021
A timeout will raise a new exception, TimeoutError. This
can be caught and handled inside an individual check, otherwise
it will be handled by run_plugin.

freeipa#236

Signed-off-by: Rob Crittenden <rcritten@redhat.com>
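
The behaviour described in the commit message (a timed-out check raises TimeoutError, which the runner handles if the check does not) might look roughly like the sketch below. This is not the code from the commit: it assumes a thread-based timeout, and `run_with_timeout`, `run_check`, and the result dictionaries are hypothetical names chosen for illustration. Note that a thread-based timeout stops waiting for the check but does not kill the worker thread.

```python
import concurrent.futures
import time


def run_with_timeout(check, timeout_seconds):
    """Run check() in a worker thread; raise TimeoutError if it overruns."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(check)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"check exceeded {timeout_seconds} seconds") from None
    finally:
        # Do not block on the (possibly still running) worker thread.
        executor.shutdown(wait=False)


def run_check(check, timeout_seconds=10):
    """Hypothetical runner: report a timeout the check did not handle as an ERROR."""
    try:
        value = run_with_timeout(check, timeout_seconds)
        return {"result": "SUCCESS", "value": value}
    except TimeoutError as exc:
        return {"result": "ERROR", "msg": str(exc)}


def slow_check():
    time.sleep(0.5)  # stands in for a replica that is not reachable
    return "done"


print(run_check(lambda: "ok", timeout_seconds=1))
print(run_check(slow_check, timeout_seconds=0.1))
```

The first call returns a SUCCESS result; the second is cut off after 0.1 seconds and reported as an ERROR instead of blocking the whole run.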
rcritten added a commit to rcritten/freeipa-healthcheck that referenced this issue Jan 11, 2022
rcritten added a commit to rcritten/freeipa-healthcheck that referenced this issue Feb 1, 2022
rcritten added a commit that referenced this issue Feb 1, 2022
@baryluk
Author

baryluk commented Feb 9, 2022

@rcritten Thanks a lot for the fix. We are now testing the new freeipa-healthcheck with ipahealthcheck_exporter, and we are getting really good scrape times: 1.95 seconds instead of ~7 minutes.

We will also test with one of the replicas down to see whether it is still reasonably fast, but it is very promising.

Closing tentatively.

@baryluk baryluk closed this as completed Feb 9, 2022
joeldavidparker added a commit to joeldavidparker/freeipa-healthcheck that referenced this issue Jun 24, 2022