Merge pull request #228 from nolar/liveliness
Liveliness/readiness probes
Showing 14 changed files with 378 additions and 2 deletions.
@@ -31,6 +31,7 @@ Kopf: Kubernetes Operators Framework
   hierarchies
   async
   loading
   probing
   peering
   scopes
   errors

@@ -0,0 +1,106 @@
=============
Health-checks
=============

Kopf provides a minimalistic HTTP server to report its health status.


Liveness endpoints
==================

By default, no endpoint is configured, and no health is reported.
To specify an endpoint to listen for probes, use :option:`--liveness`:

.. code-block:: bash

    kopf run --liveness=http://:8080/healthz --verbose handlers.py

Currently, only HTTP is supported.
Other protocols (TCP, HTTPS) can be added in the future.
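
As a rough illustration of how such an endpoint URL breaks down into a host, a port, and a path, here is a minimal sketch that uses only the standard library. The helper name ``parse_endpoint`` and the two default constants are assumptions made for this example and are not part of Kopf's public API; the fallback values mirror what the health-reporting code in this commit appears to do.

```python
import urllib.parse
from typing import Tuple

# Assumed defaults for illustration, mirroring the constants in this commit.
DEFAULT_HOST = 'localhost'
DEFAULT_PORT = 80


def parse_endpoint(endpoint: str) -> Tuple[str, int, str]:
    """Split a liveness endpoint URL into (host, port, path)."""
    parts = urllib.parse.urlsplit(endpoint)
    if parts.scheme != 'http':
        # Only plain HTTP is handled in this sketch.
        raise ValueError(f"Unsupported scheme: {endpoint}")
    host = parts.hostname or DEFAULT_HOST  # an empty host is parsed as None
    port = parts.port or DEFAULT_PORT      # a missing port is parsed as None
    return host, port, parts.path


print(parse_endpoint('http://:8080/healthz'))         # ('localhost', 8080, '/healthz')
print(parse_endpoint('http://0.0.0.0:8080/healthz'))  # ('0.0.0.0', 8080, '/healthz')
```

Under these assumptions, an endpoint with an empty host resolves to ``localhost``. If the probe has to be reachable from outside the loopback interface, an explicit host such as ``0.0.0.0`` may be needed.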


Kubernetes probing
==================

This port and path can be used in a liveness probe of the operator's deployment.
If the operator does not respond for any reason, Kubernetes will restart it.

.. code-block:: yaml

    apiVersion: apps/v1
    kind: Deployment
    spec:
      template:
        spec:
          containers:
          - name: the-only-one
            image: ...
            livenessProbe:
              httpGet:
                path: /healthz
                port: 8080

.. seealso::

    Kubernetes manual on `liveness and readiness probes`__.

__ https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/

.. seealso::

    Please be aware of the difference between readiness and liveness probing.
    In the case of operators, readiness probing makes no practical sense,
    as operators do not serve traffic behind load balancers or services.
    Liveness probing can help in disastrous cases (e.g. the operator is stuck),
    but will not help with partial failures (e.g. one of the API calls is stuck).
    You can read more here:
    https://srcco.de/posts/kubernetes-liveness-probes-are-dangerous.html

.. warning::

    Make sure that one and only one pod of an operator is running at a time,
    especially during the restarts --- see :doc:`deployment`.


Probe handlers
==============

The content of the response is empty by default. It can be populated with
probing handlers:

.. code-block:: python

    import datetime
    import kopf
    import random

    @kopf.on.probe(id='now')
    def get_current_timestamp(**kwargs):
        return datetime.datetime.utcnow().isoformat()

    @kopf.on.probe(id='random')
    def get_random_value(**kwargs):
        return random.randint(0, 1_000_000)

The probe handlers are executed on requests to the liveness URL,
and their results are cached for a short period (10 seconds in this commit's
implementation) to prevent overloading by mass-requesting the status.
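
The caching described above can be sketched as a double-checked refresh guarded by an ``asyncio.Lock``. This is only an illustration under stated assumptions: the class ``CachedProbe``, its ``collect`` callback, and the ``demo`` coroutine are hypothetical names for this example, not Kopf API, and the real implementation may differ in details.

```python
import asyncio
import datetime
from typing import Any, Awaitable, Callable, Dict, Optional, Tuple


class CachedProbe:
    """Caches the results of an expensive async collector for `max_age`."""

    def __init__(
        self,
        collect: Callable[[], Awaitable[Dict[str, Any]]],
        max_age: datetime.timedelta = datetime.timedelta(seconds=10),
    ) -> None:
        self._collect = collect
        self._max_age = max_age
        self._lock = asyncio.Lock()
        self._timestamp: Optional[datetime.datetime] = None
        self._results: Dict[str, Any] = {}

    def _is_stale(self) -> bool:
        now = datetime.datetime.now(datetime.timezone.utc)
        return self._timestamp is None or now - self._timestamp >= self._max_age

    async def get(self) -> Dict[str, Any]:
        # First check without the lock: the cheap path for fresh results.
        if self._is_stale():
            async with self._lock:
                # Second check: another request may have refreshed the cache
                # while this one was waiting for the lock.
                if self._is_stale():
                    self._results = dict(await self._collect())
                    self._timestamp = datetime.datetime.now(datetime.timezone.utc)
        return self._results


async def demo() -> Tuple[int, list]:
    calls = 0

    async def collect() -> Dict[str, Any]:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # pretend the handlers are slow
        return {"now": calls}

    probe = CachedProbe(collect)  # created inside the running event loop
    results = await asyncio.gather(*(probe.get() for _ in range(5)))
    return calls, list(results)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch, five concurrent requests trigger the collector only once; the other four wait on the lock and then reuse the cached results.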

The handler results will be reported as the content of the liveness response:

.. code-block:: console

    $ curl http://localhost:8080/healthz
    {"now": "2019-11-07T18:03:52.513803", "random": 765846}

.. note::
    The liveness status report is simplistic and minimalistic at the moment.
    It only reports success if the health-reporting task runs at all.
    It can happen that some of the operator's tasks, threads, or streams
    break, freeze, or become unresponsive, while the health-reporting task
    continues to run. The probability of such a case is low, but not zero.

    There are no checks that the operator actually operates anything
    (unless they are implemented explicitly with the probe handlers),
    as there are no reliable criteria for that: a total absence of handled
    resources or events can be an expected state of the cluster.

@@ -0,0 +1,92 @@
import asyncio
import datetime
import logging
import urllib.parse
from typing import Optional, Tuple, MutableMapping

import aiohttp.web

from kopf.reactor import causation
from kopf.reactor import handling
from kopf.reactor import lifecycles
from kopf.reactor import registries

logger = logging.getLogger(__name__)

LOCALHOST: str = 'localhost'
HTTP_PORT: int = 80

_Key = Tuple[str, int]  # hostname, port


async def health_reporter(
        endpoint: str,
        *,
        registry: registries.OperatorRegistry,
        ready_flag: Optional[asyncio.Event] = None,  # used for testing
) -> None:
    """
    Simple HTTP server to report the operator's health to K8s probes.

    Runs forever until cancelled (which happens if any other root task
    is cancelled or failed). Once it stops responding for any reason,
    Kubernetes will assume the pod is not alive anymore, and will restart it.
    """
    probing_container: MutableMapping[registries.HandlerId, registries.HandlerResult] = {}
    probing_timestamp: Optional[datetime.datetime] = None
    probing_max_age = datetime.timedelta(seconds=10.0)
    probing_lock = asyncio.Lock()

    async def get_health(
            request: aiohttp.web.Request,
    ) -> aiohttp.web.Response:
        nonlocal probing_timestamp

        # Recollect the data on demand, and only if it is older than a reasonable caching period.
        # Protect against multiple parallel requests performing the same heavy activity.
        now = datetime.datetime.utcnow()
        if probing_timestamp is None or now - probing_timestamp >= probing_max_age:
            async with probing_lock:
                # Re-check under the lock: another request could have refreshed the data.
                now = datetime.datetime.utcnow()
                if probing_timestamp is None or now - probing_timestamp >= probing_max_age:

                    activity_results = await handling.activity_trigger(
                        lifecycle=lifecycles.all_at_once,
                        registry=registry,
                        activity=causation.Activity.PROBE,
                    )
                    probing_container.clear()
                    probing_container.update(activity_results)
                    probing_timestamp = datetime.datetime.utcnow()

        return aiohttp.web.json_response(probing_container)

    # Only plain HTTP is supported; an empty host or port falls back to the defaults.
    parts = urllib.parse.urlsplit(endpoint)
    if parts.scheme == 'http':
        host = parts.hostname or LOCALHOST
        port = parts.port or HTTP_PORT
        path = parts.path
    else:
        raise Exception(f"Unsupported scheme: {endpoint}")

    app = aiohttp.web.Application()
    app.add_routes([aiohttp.web.get(path, get_health)])

    runner = aiohttp.web.AppRunner(app, handle_signals=False)
    await runner.setup()

    site = aiohttp.web.TCPSite(runner, host, port, shutdown_timeout=1.0)
    await site.start()

    # Log with the actual URL: normalised, with hostname/port set.
    url = urllib.parse.urlunsplit([parts.scheme, f'{host}:{port}', path, '', ''])
    logger.debug("Serving health status at %s", url)
    if ready_flag is not None:
        ready_flag.set()

    try:
        # Sleep forever. No activity is needed.
        await asyncio.Event().wait()
    finally:
        # On any reason of exit, stop reporting the health.
        await asyncio.shield(runner.cleanup())
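
The "run forever, clean up on exit" tail of the function above can be tried in isolation. The following is a minimal sketch using only ``asyncio``; the names ``serve_until_cancelled``, ``cleanup``, and ``main`` are hypothetical and stand in for the real server and ``runner.cleanup()``, so it illustrates the pattern rather than Kopf's actual behaviour.

```python
import asyncio
from typing import List


async def cleanup(log: List[str]) -> None:
    # Stand-in for an asynchronous shutdown routine.
    await asyncio.sleep(0.01)
    log.append("cleaned up")


async def serve_until_cancelled(log: List[str]) -> None:
    log.append("started")
    try:
        # This event is never set, so the wait only ends on cancellation.
        await asyncio.Event().wait()
    finally:
        # shield() keeps the cleanup running even if this task
        # is cancelled a second time while it is cleaning up.
        await asyncio.shield(cleanup(log))


async def main() -> List[str]:
    log: List[str] = []
    task = asyncio.create_task(serve_until_cancelled(log))
    await asyncio.sleep(0.05)  # let the task start and block
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("cancelled")
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['started', 'cleaned up', 'cancelled']
```

The cleanup completes before the cancellation propagates to the caller, which is the ordering the ``try``/``finally`` block is meant to guarantee.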