[serve] Add component logger + basic access logging #23558

edoakes · 2022-03-29T15:36:02Z

Why are these changes needed?

Adds a "component logger" to standardize logging across the HTTP proxy, controller, and deployment replicas. This logger has the format:
<level> <timestamp> <component> <component_id> - <message>

component will be one of {controller, http_proxy, <deployment_name>} and component_id is the {pid, node_ip, replica_id} for each of these, respectively.

For the HTTP proxy and replicas, it also adds a default access log with a message of the form:
<level> <timestamp> <component> <component_id> - <method> <route> <status> <latency_ms>

The format across HTTP and handle calls is standard, so a single log parser can be used to ingest both. For ServeHandle calls, the method will be HANDLE, the route will be the actor method, and the status will be OK or ERROR.

One more drive-by small change: this PR removes the status printing in serve run because it interleaves with the log messages and is a pretty bad UX. We could consider still monitoring the statuses and printing something if they enter UNHEALTHY, but the controller should log that anyways.

Example

Code:

import ray
from ray import serve

@serve.deployment
class F:
    def do_raise(self, *args):
        raise Exception("oh no")

@serve.deployment
class G:
    def __call__(self, *args):
        return "hi from g"

@serve.deployment
class A:
    def __init__(self, a, b):
        self._a = a
        self._b = b

    async def __call__(self, *args):
        await self._b.remote()
        await self._a.do_raise.remote()


a = A.bind(F.bind(), G.bind())

Logs for a single request:

(ray) eoakes@Edwards-MBP ray % serve run test:a
2022-03-29 08:47:32,860 INFO scripts.py:290 -- Loading app from import path: 'test:a'.
2022-03-29 08:47:34,344 INFO services.py:1460 -- View the Ray dashboard at http://127.0.0.1:8265
(ServeController pid=33746) INFO 2022-03-29 08:47:35,784 controller 33746 - Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=33746) INFO 2022-03-29 08:47:35,889 controller 33746 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
(HTTPProxyActor pid=33512) INFO 2022-03-29 08:40:36,929 http_proxy node:127.0.0.1-0 - GET / 500 9.2ms
(G pid=33515) INFO 2022-03-29 08:40:36,924 G G#WHxwaw - HANDLE __call__ OK 0.1ms
(A pid=33516) INFO 2022-03-29 08:40:36,928 A A#pcZGTw - HANDLE __call__ ERROR 6.2ms
(F pid=33514) INFO 2022-03-29 08:40:36,927 F F#kaJefb - HANDLE do_raise ERROR 0.2ms

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

…e-access-log

edoakes · 2022-03-29T15:50:20Z

@simon-mo I probably need to polish this up a bit, but PTAL and leave feedback on the high-level approach.

python/ray/serve/utils.py

simon-mo · 2022-03-29T20:50:21Z

High level looks good. Can we clarify the intended usage now for:

Ray Serve utils libraries (say http utils or checkpoint path)
Ray Serve component (say http proxy, replica, driver, rest api, and component)
User code

jiaodong

the output in PR summary looks awesome

python/ray/serve/controller.py

edoakes · 2022-03-31T18:49:08Z

@jiaodong my working definition of "component" here is a physical process, so all logs from controller/proxy/deployment replicas will have their component listed.

We could consider including the file name / line number if we want to narrow down where the logs come from in each process.

edoakes · 2022-03-31T18:55:16Z

@simon-mo @jiaodong to clarify the logging behavior (and I will add this to the documentation):

All physical components use the component logger, it prefixes log lines with component name, id, etc.
Users should use/configure the "ray.serve" logger inside of replicas.
Utils, subcomponents, etc. (like long poll host or http utils) should use the component logger of their parent component.

edoakes · 2022-03-31T18:58:18Z

@simon-mo @jiaodong I added the file:lineno to the log msg which makes it more clear when using long poll host or utils:

(ServeController pid=47573) INFO 2022-03-31 11:56:45,588 controller 47573 checkpoint_path.py:17 - Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=47573) INFO 2022-03-31 11:56:45,693 controller 47573 http_state.py:110 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
(HTTPProxyActor pid=47575) INFO:     Started server process [47575]
2022-03-31 11:56:47,164 SUCC scripts.py:298 -- Deployed successfully.
(HTTPProxyActor pid=47575) INFO 2022-03-31 11:56:51,536 http_proxy 127.0.0.1 http_proxy.py:318 - GET / 200 3.2ms
(A pid=47577) INFO 2022-03-31 11:56:51,535 A A#sJxQQl test.py:13 - hello eoakes
(A pid=47577) INFO 2022-03-31 11:56:51,535 A A#sJxQQl replica.py:448 - HANDLE __call__ OK 0.4ms
(HTTPProxyActor pid=47575) INFO 2022-03-31 11:57:06,279 http_proxy 127.0.0.1 http_proxy.py:318 - GET / 200 1.8ms
(A pid=47577) INFO 2022-03-31 11:57:06,278 A A#sJxQQl test.py:13 - hello eoakes
(A pid=47577) INFO 2022-03-31 11:57:06,278 A A#sJxQQl replica.py:448 - HANDLE __call__ OK 0.2ms
kj^C2022-03-31 11:57:27,776     INFO scripts.py:306 -- Got KeyboardInterrupt, shutting down...

…e-access-log

edoakes added 7 commits March 25, 2022 15:25

WIP

2ce3045

Merge branch 'master' of https://github.com/ray-project/ray into serv…

5b1c100

…e-access-log

Merge branch 'master' of https://github.com/ray-project/ray into serv…

970f4e7

…e-access-log

WIP

746667d

working!

8d6f8c4

Merge branch 'master' of https://github.com/ray-project/ray into serv…

77e45ef

…e-access-log

fix

bb14b9e

edoakes changed the title ~~[WIP][serve] Add HTTP & handle access logging~~ [WIP][serve] Add component logger + basic access logging Mar 29, 2022

edoakes assigned simon-mo Mar 29, 2022

edoakes requested review from jiaodong and shrekris-anyscale March 29, 2022 15:50

edoakes added 2 commits March 29, 2022 08:58

fix

e90378f

fix release tests

48411d6

simon-mo reviewed Mar 29, 2022

View reviewed changes

python/ray/serve/utils.py Show resolved Hide resolved

jiaodong reviewed Mar 30, 2022

View reviewed changes

python/ray/serve/controller.py Outdated Show resolved Hide resolved

python/ray/serve/controller.py Outdated Show resolved Hide resolved

edoakes added 3 commits March 31, 2022 11:04

add test_logging

10bdf89

fix

0d03ec2

fix

7dd635c

add debug

f855a70

add file and lineno

70b8072

edoakes added 5 commits March 31, 2022 12:01

fix

0d07850

fix

74ed32b

add exception line

cc5904d

add exception log

272b978

don't pass around logger

0e50fe8

edoakes requested review from kfstorm, raulchen and richardliaw as code owners April 7, 2022 18:19

Merge branch 'master' of https://github.com/ray-project/ray into serv…

9fc29ca

…e-access-log

edoakes force-pushed the serve-access-log branch from 674ab1f to 9fc29ca Compare April 7, 2022 18:22

edoakes added 2 commits April 7, 2022 13:31

docs

dc326f9

nit

5746c2e

edoakes removed request for ericl, scv119, jjyao, smorad, stephanie-wang, raulchen, kfstorm, richardliaw, sven1977, gjoliver, clarkzinzow, ArturNiederfahrenhorst, suquark, jovany-wang, avnishn and fishbone April 7, 2022 18:39

edoakes added 6 commits April 7, 2022 16:18

Merge branch 'master' of https://github.com/ray-project/ray into serv…

202434f

…e-access-log

Merge branch 'master' of https://github.com/ray-project/ray into serv…

2e94c54

…e-access-log

fix

d615f2f

Merge branch 'master' of https://github.com/ray-project/ray into serv…

363ac20

…e-access-log

bump timeout

5ba8bb4

skip for now

18cc255

edoakes merged commit de227ac into ray-project:master Apr 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[serve] Add component logger + basic access logging #23558

[serve] Add component logger + basic access logging #23558

edoakes commented Mar 29, 2022 •

edited

edoakes commented Mar 29, 2022

simon-mo commented Mar 29, 2022

jiaodong left a comment

edoakes commented Mar 31, 2022

edoakes commented Mar 31, 2022

edoakes commented Mar 31, 2022 •

edited

[serve] Add component logger + basic access logging #23558

[serve] Add component logger + basic access logging #23558

Conversation

edoakes commented Mar 29, 2022 • edited

Why are these changes needed?

Example

Related issue number

Checks

edoakes commented Mar 29, 2022

simon-mo commented Mar 29, 2022

jiaodong left a comment

Choose a reason for hiding this comment

edoakes commented Mar 31, 2022

edoakes commented Mar 31, 2022

edoakes commented Mar 31, 2022 • edited

edoakes commented Mar 29, 2022 •

edited

edoakes commented Mar 31, 2022 •

edited