Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Serve] Add logger with backend and replica tags #14251

Merged
merged 11 commits into from
Feb 26, 2021

Conversation

architkulkarni
Copy link
Contributor

@architkulkarni architkulkarni commented Feb 22, 2021

Why are these changes needed?

This PR adds a new API method serve.get_backend_logger(). When called from a backend, it returns a logger which prepends the backend tag and replica tag to each log line.

class Counter:
    def __init__(self):
        self.logger = serve.get_backend_logger()
        self.count = 0

    def __call__(self, request):
        self.count += 1
        self.logger.info(f"count: {self.count}")
        return {"count": self.count}

The PR also includes an end-to-end tutorial for filtering Ray logs by backend tag using Loki and Grafana.
loki

Related issue number

Closes #13917
Closes #13916

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@simon-mo
Copy link
Contributor

I wonder if this can a bit more magical and easier to use:

logger = logging.getLogger("serve")

class MyBackend:
    def __call__(self, ..):
        logger.info("...")

@architkulkarni
Copy link
Contributor Author

That would be really cool @simon-mo, do you have an idea of how to implement that? I guess we would need to wrap the standard python logger and implement each method .info(), .debug(), etc, which seems complicated but maybe isn't that bad.

Comment on lines 214 to 224
For an general overview of logging in Ray, see `Ray Logging <../ray-logging.html>`__.

When looking through log files of your Ray Serve application, it is useful to know which backend and replica each log line originated from.
To automatically tag your logs with the current backend tag and replica tag (of the form ``backend_tag#<random letters>``), use the following function:

.. autofunction:: ray.serve.get_backend_logger

To write your own custom logger using Python's ``logging`` package, you may find the following two methods useful:

.. autofunction:: ray.serve.get_current_backend_tag
.. autofunction:: ray.serve.get_current_replica_tag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a subsection here saying "configuring logger" or something, then another below that says "Loki Tutorial"


Ray Serve logs can be ingested by your favorite external logging agent. Ray logs from the current session are exported to the directory `/tmp/ray/session_latest/logs` and remain there until the next session starts.

Here is a quick walkthrough of how to explore and filter your logs using `Loki <https://grafana.com/oss/loki/>`__.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note here that we'll set it up manually but it's much easier to configure on kubernetes. Actually, it might be worth having another tutorial on k8s. We can do that in the future.

@@ -1012,6 +1012,29 @@ def f(starlette_request):
}


def test_backend_logger(serve_instance):
# Tests that that backend logger can be created and used without errors.
# Does not test the correctness of the log output.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth testing this to avoid regressions. Can we redirect the logger to a tempfile and read that file or something similar?

@simon-mo
Copy link
Contributor

@architkulkarni the heavily lifting is already done by ray_logging.py wrote by @rkooo567.

You can just add this

diff --git a/python/ray/serve/backend_worker.py b/python/ray/serve/backend_worker.py
index 5740cf4f5..977cde45b 100644
--- a/python/ray/serve/backend_worker.py
+++ b/python/ray/serve/backend_worker.py
@@ -1,4 +1,5 @@
 import asyncio
+import logging
 import traceback
 import inspect
 from collections.abc import Iterable
@@ -258,6 +259,13 @@ class RayServeReplica:
 
         self.restart_counter.record(1)
 
+        ray_logger = logging.getLogger("ray")
+        for handler in ray_logger.handlers:
+            handler.setFormatter(
+                logging.Formatter(
+                    handler.formatter._fmt +
+                    f" component=serve backend={self.backend_tag} replica={self.replica_tag}"
+                ))
         asyncio.get_event_loop().create_task(self.main_loop())
 
     def get_runner_method(self, request_item: Query) -> Callable:

and ask user to use the ray logger:

logger = logging.getLogger("ray")

class Counter:
    def __init__(self):
        self.count = 0

    def __call__(self, request):
        self.count += 1
        logger.info(f"count: {self.count}")
        return {"count": self.count}

the context can be automatically filled in.

...
(pid=39154) 2021-02-22 17:19:31,201	INFO a.py:17 -- count: 10 component=serve backend=my_backend replica=my_backend#BmnjUg

This eliminate the need for the explicit get_backend_logger and make user's code cleaner.

Additionally, I think we should put get_backend_tag and others behind a context API. Something like serve.get_context().backend_tag or something.

Copy link
Contributor

@simon-mo simon-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure to make the following changes.

  • Use ray logger
  • move get_current_x -> some sort of context API


def get_backend_context():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edoakes thoughts on this API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for context, I based it off the API for get_runtime_context() in Ray Core: https://docs.ray.io/en/master/package-ref.html#runtime-context-apis.

Comment on lines +262 to +269
ray_logger = logging.getLogger("ray")
for handler in ray_logger.handlers:
handler.setFormatter(
logging.Formatter(
handler.formatter._fmt +
f" component=serve backend={self.backend_tag} "
f"replica={self.replica_tag}"))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rkooo567 I want to make sure this ok for the ray logging stack. Thoughts?

backend_tag = _INTERNAL_REPLICA_CONTEXT.backend_tag
replica_tag = _INTERNAL_REPLICA_CONTEXT.replica_tag

return BackendContext(backend_tag, replica_tag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just merge InternalReplicaContext and BackendContext? You can make the controller_name a private field _internal_controller_name or something. So we don't have an extra data structure that's just a subset of InternalReplicaContext.

additionally, naming wise, this is ReplicaContext, not BackendContext because it contains the replica_tag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that makes a lot of sense

@architkulkarni architkulkarni added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 25, 2021
@architkulkarni
Copy link
Contributor Author

Let me also move the doc image from this repo to the external images repo.

@edoakes edoakes assigned simon-mo and unassigned edoakes Feb 25, 2021
@architkulkarni architkulkarni added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Feb 26, 2021

.. code-block:: bash

INFO -- Some info! component=serve backend=my_backend replica=my_backend#krcwoa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update this? it should print ray logging style now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good catch. Updated

@simon-mo simon-mo merged commit f9364b1 into ray-project:master Feb 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
3 participants