Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[serve] Add component logger + basic access logging #23558

Merged
merged 91 commits into from
Apr 12, 2022

Conversation

edoakes
Copy link
Contributor

@edoakes edoakes commented Mar 29, 2022

Why are these changes needed?

Adds a "component logger" to standardize logging across the HTTP proxy, controller, and deployment replicas. This logger has the format:
<level> <timestamp> <component> <component_id> - <message>

component will be one of {controller, http_proxy, <deployment_name>} and component_id is the {pid, node_ip, replica_id} for each of these, respectively.

For the HTTP proxy and replicas, it also adds a default access log with a message of the form:
<level> <timestamp> <component> <component_id> - <method> <route> <status> <latency_ms>

The format across HTTP and handle calls is standard, so a single log parser can be used to ingest both. For ServeHandle calls, the method will be HANDLE, the route will be the actor method, and the status will be OK or ERROR.

One more drive-by small change: this PR removes the status printing in serve run because it interleaves with the log messages and is a pretty bad UX. We could consider still monitoring the statuses and printing something if they enter UNHEALTHY, but the controller should log that anyways.

Example

Code:

import ray
from ray import serve

@serve.deployment
class F:
    def do_raise(self, *args):
        raise Exception("oh no")

@serve.deployment
class G:
    def __call__(self, *args):
        return "hi from g"

@serve.deployment
class A:
    def __init__(self, a, b):
        self._a = a
        self._b = b

    async def __call__(self, *args):
        await self._b.remote()
        await self._a.do_raise.remote()


a = A.bind(F.bind(), G.bind())

Logs for a single request:

(ray) eoakes@Edwards-MBP ray % serve run test:a
2022-03-29 08:47:32,860 INFO scripts.py:290 -- Loading app from import path: 'test:a'.
2022-03-29 08:47:34,344 INFO services.py:1460 -- View the Ray dashboard at http://127.0.0.1:8265
(ServeController pid=33746) INFO 2022-03-29 08:47:35,784 controller 33746 - Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=33746) INFO 2022-03-29 08:47:35,889 controller 33746 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
(HTTPProxyActor pid=33512) INFO 2022-03-29 08:40:36,929 http_proxy node:127.0.0.1-0 - GET / 500 9.2ms
(G pid=33515) INFO 2022-03-29 08:40:36,924 G G#WHxwaw - HANDLE __call__ OK 0.1ms
(A pid=33516) INFO 2022-03-29 08:40:36,928 A A#pcZGTw - HANDLE __call__ ERROR 6.2ms
(F pid=33514) INFO 2022-03-29 08:40:36,927 F F#kaJefb - HANDLE do_raise ERROR 0.2ms

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@edoakes edoakes changed the title [WIP][serve] Add HTTP & handle access logging [WIP][serve] Add component logger + basic access logging Mar 29, 2022
@edoakes
Copy link
Contributor Author

edoakes commented Mar 29, 2022

@simon-mo I probably need to polish this up a bit, but PTAL and leave feedback on the high-level approach.

@simon-mo
Copy link
Contributor

High level looks good. Can we clarify the intended usage now for:

  • Ray Serve utils libraries (say http utils or checkpoint path)
  • Ray Serve component (say http proxy, replica, driver, rest api, and component)
  • User code

Copy link
Member

@jiaodong jiaodong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the output in PR summary looks awesome

python/ray/serve/controller.py Outdated Show resolved Hide resolved
python/ray/serve/controller.py Outdated Show resolved Hide resolved
@edoakes
Copy link
Contributor Author

edoakes commented Mar 31, 2022

@jiaodong my working definition of "component" here is a physical process, so all logs from controller/proxy/deployment replicas will have their component listed.

We could consider including the file name / line number if we want to narrow down where the logs come from in each process.

@edoakes
Copy link
Contributor Author

edoakes commented Mar 31, 2022

@simon-mo @jiaodong to clarify the logging behavior (and I will add this to the documentation):

  • All physical components use the component logger, it prefixes log lines with component name, id, etc.
  • Users should use/configure the "ray.serve" logger inside of replicas.
  • Utils, subcomponents, etc. (like long poll host or http utils) should use the component logger of their parent component.

@edoakes
Copy link
Contributor Author

edoakes commented Mar 31, 2022

@simon-mo @jiaodong I added the file:lineno to the log msg which makes it more clear when using long poll host or utils:

(ServeController pid=47573) INFO 2022-03-31 11:56:45,588 controller 47573 checkpoint_path.py:17 - Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=47573) INFO 2022-03-31 11:56:45,693 controller 47573 http_state.py:110 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
(HTTPProxyActor pid=47575) INFO:     Started server process [47575]
2022-03-31 11:56:47,164 SUCC scripts.py:298 -- Deployed successfully.
(HTTPProxyActor pid=47575) INFO 2022-03-31 11:56:51,536 http_proxy 127.0.0.1 http_proxy.py:318 - GET / 200 3.2ms
(A pid=47577) INFO 2022-03-31 11:56:51,535 A A#sJxQQl test.py:13 - hello eoakes
(A pid=47577) INFO 2022-03-31 11:56:51,535 A A#sJxQQl replica.py:448 - HANDLE __call__ OK 0.4ms
(HTTPProxyActor pid=47575) INFO 2022-03-31 11:57:06,279 http_proxy 127.0.0.1 http_proxy.py:318 - GET / 200 1.8ms
(A pid=47577) INFO 2022-03-31 11:57:06,278 A A#sJxQQl test.py:13 - hello eoakes
(A pid=47577) INFO 2022-03-31 11:57:06,278 A A#sJxQQl replica.py:448 - HANDLE __call__ OK 0.2ms
kj^C2022-03-31 11:57:27,776     INFO scripts.py:306 -- Got KeyboardInterrupt, shutting down...

@edoakes edoakes merged commit de227ac into ray-project:master Apr 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet