Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging middleware #8

Closed
aturon opened this issue Nov 8, 2018 · 33 comments
Closed

Logging middleware #8

aturon opened this issue Nov 8, 2018 · 33 comments
Assignees
Labels
feature A feature that's ready for implementation good first issue Good for newcomers
Milestone

Comments

@aturon
Copy link
Collaborator

aturon commented Nov 8, 2018

Design and implement middleware for logging, drawing inspiration from (or even re-using) that from existing Rust web frameworks.

@aturon aturon added good first issue Good for newcomers feature A feature that's ready for implementation labels Nov 8, 2018
@bIgBV
Copy link
Contributor

bIgBV commented Nov 9, 2018

I'd like to take a crack at it, it seems like a good way to get to know the codebase and also a chance to understand how other rust frameworks work.

@yoshuawuyts
Copy link
Member

yoshuawuyts commented Nov 9, 2018

Some prior art in framework loggers:

edit (09/11/2018): I suspect tower-web is looking into building on top of tokio-trace, which would introduce sample-based logging, which seems similar to the system described in this paper. (Personally I think this would be super cool to support, but that's probably a few steps beyond what we'd want to start with, haha).


Some requirements for a complete logging solution I'm seeing:

  • Have structured logs, so the format can be created with a consistent schema, which helps when integrating with log aggregators.
  • Be able to switch between stdout, and syslog output, probably defaulting to stdout. With a future iteration also including journald output. This should provide a good balance between compatibility and performance.
  • Ideally decoupling the logger instantiation from the framework-specific logger. This would allow configuring the logger instantiation through command line modules such as clap-log-flag.

I think with these requirements in mind, basing a logger off the slog crate would probably make a lot of sense. We could coordinate with the CLI WG, and make sure there's a slog-compatible CLI frontend created also.

@aturon
Copy link
Collaborator Author

aturon commented Nov 9, 2018

@bIgBV great, please do! We can use this thread to coordinate any further design questions -- and thanks much @yoshuawuyts for the preliminary research! <3

@bIgBV
Copy link
Contributor

bIgBV commented Nov 9, 2018

@yoshuawuyts thank you so much for the great literature. I've got a nice 4 hour flight ahead of me, so I'm going to spend some time reading the paper and going through the codebases of the frameworks you mentioned here.

@aturon yeah, I was curious about where I can ask further questions, but using the issue would be a good idea for having a reference in the future if we ever want to revisit a certain aspect of the design

@KodrAus
Copy link
Contributor

KodrAus commented Nov 11, 2018

Tangentially related: we're trying to get some kind of structured logging into the log crate. It's still very much in the design phase though.

I think besides structured logging one of the big things for a web framework is contextual logging so that a correlation id can be associated with each request and used for later aggregation. One interesting design in this space is ASP.NET Core's hierarchical request ids.

The choice of logging framework might affect how loggers are exposed deep into web applications. For example it might not always be possible to explicitly pass a logger through to the callsite.

@bIgBV
Copy link
Contributor

bIgBV commented Nov 16, 2018

As I've been thinking of the design for logging, I'm just writing out some thoughts here.

One thing that came to mind is the difference between configuration between different environments in which the application is run. Coming from Django, the conventional way of doing it is having different settings which use the dictConfig format. In fact, Django exposes a plethora of knobs to fine tune the behavior. These settings live in different files which are loaded based on which environment the app is running in.

Of the frameworks mentioned in this thread, only rocket seems to have of setting up configuration based on the environment. This is something we need to consider for Tide as well. How will we configure the various settings used by a web framework? This would ideally be the place to have sane default logging options set up.

The next thing I was considering is how will logging hook into the request lifecycle. Here there seem to be different approaches employed by the frameworks I went through.

Middleware

Actix and Gotham seem to hook into the request lifecycle and use a middleware to log out request information to the terminal. From what I can tell, with actix you have to instantiate env_logger and then add a middleware. It provides a way to set a custom format for the logs themselves. But by having to directly instantiate env_logger users have to think about this configuration from the begging itself. Gotham on the other hand uses the Log crate and only provides configurable log levels. There are two options for the kinds of formats.

Warp too seems similar in this regard. It doesn't have a middleware in the traditional sense, but the logging is done by instantiating a log object which is passed as a filter. This wraps a function which logs request and response information passed to it by the framework.

I think actix's approach is good, but it could be better. A few things that I find not so good with this are:

  • You have to think about log configuration even when you are just getting started. This personally feels like an ergonomics issue for me.
  • A hard dependency on env_logger means that if someone wants something more customizable, you have to write your own middleware.
  • The configuration of logging is now a part of the application code. I feel like the web framework configuration should be it's own concern.

Custom loggers

Rocket in contrast doesn't use middleware for logging, even though when first writing a rocket application you see logs being written to the terminal. From what I can tell, rocket wraps over the Log crate and provides the user macros for logging and setting the log level for different environments as a part of the application configuration. Out of all the rust web frameworks I've seen this is the most flexible option as it provides with some default while providing with a way of adding custom instrumentation to the code.

This implementation comes closest to the configuration flexibility that is present in django which has defaults set up and provides a way of configuring various aspects of logging based on the level and environment. It also has support for logging to different backends and custom formats.

Conclusions

This turned out to be far longer than I anticipated, but these are some final thoughts.

The slog crate comes closest to providing the most flexible features that a modern web framework would need. Right now, it seems to be the best choice to use when you want to instrument you application code.

I have the following questions in mind.

  • How do we handle configuration.
  • How much configuration control do we want to provide to the

I think that the answer to these two questions will largely contribute to how the logging will look in Tide.

If you're read till here then thank you for humoring me! Please let me know your thoughts and any misconceptions I might have :)

@bIgBV
Copy link
Contributor

bIgBV commented Nov 18, 2018

Expanding on the previous message, I think a good starting point is to first have a private integration with slog, where we create a root logger and a corresponding logging middleware which is created by default when a new App instance is initialized.

Following this, we can provide macros/functions to perform logging which wrap a child logger of the root logger for any user to import into their code. To provide for flexibility, we can expose a Logger struct and a corresponding builder for the struct which would enable the user to fine tune logging if they so need.

We can iterate upon this further to consider broader questions such as app configuration and make appropriate changes. But since Tide is the one wrapping slog and providing an API for users, we can control the level of tuning and the API to expose.

@KodrAus
Copy link
Contributor

KodrAus commented Nov 18, 2018

I think a good starting point is to first have a private integration with slog, where we create a root logger and a corresponding logging middleware which is created by default when a new App instance is initialized.

Getting started with a private integration sounds like a good way to go to me 👍 There are also benefits to just picking a framework and bundling it though. If we didn't want to expose the logging framework directly then I imagine we might eventually end up producing a bridging API that frameworks would conform to so the user could decide exactly how they want to integrate logging into their application and Tide can just work with whatever its given. I don't think our logging ecosystem is mature enough to make those kinds of bets yet though. The wrapped logger you're proposing could evolve in that direction.

Following this, we can provide macros/functions to perform logging which wrap a child logger of the root logger for any user to import into their code.

I think it would also be good to be able to capture logs produced by the log crate by default too. slog can do this. Logging is a cross-cutting concern so I would personally be a little skeptical of pulling a dependency on a web framework or a web framework's specific logging infrastructure down to the lowest reaches of my application to support logging.

@bIgBV
Copy link
Contributor

bIgBV commented Nov 18, 2018

Logging is a cross-cutting concern so I would personally be a little skeptical of pulling a dependency on a web framework or a web framework's specific logging infrastructure down to the lowest reaches of my application to support logging.

Yeah, that definitely makes sense. I've begun work on getting a very simple logging middleware using slog integrated into the framework codebase. I should have a PR open for it soon. Once the base is in place, we could have more discussion around the external API.

@bIgBV
Copy link
Contributor

bIgBV commented Nov 20, 2018

I was thinking about the the way to add logging internal to the framework. One design idea I have is to to store the logger in the App struct. This was the RootLogger can be used wherever the App struct is passed to. But I don't think it's the best idea.

Any thoughts?

@bIgBV
Copy link
Contributor

bIgBV commented Nov 21, 2018

So I think I found out how to capture log statements from the log crate, see here. So if set up correctly, this will provide two way interoperability between log and slog making for a really flexible API.

@KodrAus
Copy link
Contributor

KodrAus commented Nov 22, 2018

@bIgBV Neat! The only challenge there is that it looks like slog_stdlog requires an old version of log, so I think folks might run into trouble with mismatched dependencies if they try use it. It looks like there's a PR to fix that though.

@bIgBV
Copy link
Contributor

bIgBV commented Nov 23, 2018

@KodrAus but looks like the PR has been inactive for a while :/

@KodrAus
Copy link
Contributor

KodrAus commented Nov 23, 2018

@bIgBV It looks like it, doesn't it. So we might just want to roll our own log -> slog integration. Oh I just noticed you've got a PR up, so maybe we could look at integration and public API once we've got the internal logging sorted?

@bIgBV
Copy link
Contributor

bIgBV commented Nov 23, 2018

@KodrAus Yeah, that was my initial plan. The PR is just to get internal logging started. At least something as basic as what requests are being served by the application. Once this is in place, I was planning for hooking into configuring different drains, the default being writing to the stdout, but adding support for writing to files and as such. Once this PR is merged, developers can start using logs directly during development of the framework as well.

And I was considering the option of rolling our own integration. I think for the most part we can just lift the functionality we want from slog-stdlog right, because I don't think we necessarily need slog->log integration.

@aturon aturon added this to the 0.1 Release milestone Nov 25, 2018
@bIgBV
Copy link
Contributor

bIgBV commented Nov 25, 2018

@aturon I've got a question regarding the 404 handler. I want to add a log statement to the handler to the following line:

https://github.com/rust-net-web/tide/blob/9ee7426fda8c337d2ead99c9f92612a23f31d243/src/app.rs#L125-L129

But in order to do that, I need access to the RootLogger which is a part of the middleware. Do you know of a way to go about doing that?

One idea I have is to override the logging macros and expose it to the crate so that people can use it internally in tide. Rocket does this and I think it also exposes these overriden log macros to users. We can do the same and also provide a way to capture all logs from the log crate.

@tzilist
Copy link
Contributor

tzilist commented Nov 26, 2018

@bIgBV I am hoping once I finish up the PR for allowing a fallback handler, this should be solved automatically :) Hopefully should have something in the next few days!

@bIgBV
Copy link
Contributor

bIgBV commented Nov 27, 2018

@tzilist I really liked the way the issue was handled. I really like this approach.

Now that a basic logger is a part of the framework, I'd like to now focus on providing with a simple API for developers of Tide to add logging statements. By default slog strips trace and debug logs, so it would be pretty useful for developers when implementing new features.

My idea is that once we have a basic API and some internal users, we can then further improve this by providing a way for capturing logs from the user applications. This can be in the form of a overridden macro which uses the RootLogger internally. On top of that, as @KodrAus mentioned, we can provide a way to bridge log -> slog statements as well.

@KodrAus
Copy link
Contributor

KodrAus commented Dec 3, 2018

Just to link things up, #98 proposes adding an elapsed field to the root logger. It's something we could do inline, but I feel like that's a good use-case we could start building contextual logging around.

@bIgBV
Copy link
Contributor

bIgBV commented Dec 3, 2018

@KodrAus could you expand on that? What exactly does contextual logging mean here?

@KodrAus
Copy link
Contributor

KodrAus commented Dec 4, 2018

Ah, sorry. I should've added some more context to my comment about contextual logging :)

For gathering timings, maybe it could look something like:

await!(logger.time("do some work", async move {
    await!(do_some_work());
});

Where logger.time automatically logs how long the inner async block took to complete when it finishes:

{ "elapsed": 13, "msg": "`do some work` completed in 13ms" }

For other pieces of context, like a correlation id, or that "do some work" tag in the above example, I imagine getting a hold of a derived logger that carries that extra piece of state and automatically adds it to logs recorded using that logger would be the most portable way to carry that state through a chain of futures that might execute on multiple threads. That's how slog supports contextual logging.

An alternative, if there's some reliable way to carry ambient state with a chain of futures now wherever they execute, is to stuff some context there and any logger can grab that ambient context to avoid threading loggers through your code manually.

@bIgBV
Copy link
Contributor

bIgBV commented Dec 4, 2018

@KodrAus love the pun!

This is an interesting approach, but I have a couple of questions regarding this design:

Is adding individual methods like time a good idea?

I ask this because I can't think of a lot of other contextual data which can be logged (caused my ignorance of the subject, not authority 😅 ). So I feel like a more generic approach which works with different kinds of data will work instead of individual methods for each tag we want to add.

Right now, we can use the macros from slog, but in order to completely abstract over it, we will have to have our own internal representation of a Logger, so adding helper methods on it is a good idea, but I feel that we don't have enough data to know what the most common usecases are. I feel having one generic way of adding information would be a good idea, and then once a particular case is used often, we can add a convenience method.

Will this be exposed to framework users as well?

In the same vein, based on the design on the Logger type, we'll have to design the user facing API and how they can wrap their own methods in contextual logs.

@KodrAus
Copy link
Contributor

KodrAus commented Dec 4, 2018

Is adding individual methods like time a good idea?

That's a fair question. Thinking about it more, time is a bit of a special case and probably not actually a good example of contextual logging at all. I probably should've treated them separately to begin with. The idea of time is to give users an easy way to log the entry and exit for a block of code along with how long it took. It's something I at least do regularly in web apps I build, usually wrapping around complete requests, and then around other significant pieces of work, like database operations. It's probably not so useful for tide itself to use internally and could probably be ignored for now.

Right now, we can use the macros from slog, but in order to completely abstract over it, we will have to have our own internal representation of a Logger, so adding helper methods on it is a good idea, but I feel that we don't have enough data to know what the most common usecases are.

Abstracting over the slog macros sounds good, because it leaves the door open for punting on whether ambient context is carried by the logger, or some other way within tide itself. I think the one most common cases for end-users would be associating a correlation id with all logs for a particular request. For tide itself, individual middleware might want to get a logger and push a contextual property that identifies itself in all the log statements it generates. In fact, we might not even allow you to get a hold of a logger without also providing some context:

fn handle<'a>(&'a self, ctx: RequestContext<'a, Data>) -> FutureObj<'a, Response> {
    let logger = ctx.logger("my_middleware");

    info!(logger, ...);
}

Right now I feel like I might just be randomizing this issue, but if we want to improve the base logging API from experience using it internally then I think we should be on the lookout for changes that could be supported by features that'll be useful for end-users too.

@bIgBV
Copy link
Contributor

bIgBV commented Dec 4, 2018

I think the one most common cases for end-users would be associating a correlation id with all logs for a particular request.

I've been thinking about this, and was wondering how we could approach this. Who would provide the correlation id? I think that tide should instrument log statements automatically with it, removing one concern from the user code and providing a consistent interface for logging in general. But in order to do that, we'll have to capture log records made using the log crate and attach correlation id's there as well.

Right now, the biggest blocker for me when it comes to experimenting with these changes is that there is no tide-global store of data. I think this would be the right place to store the RootLogger, and then use it to instantiate child loggers based on the context. For example, your in the your example snippet, the logger method on RequestContext could just instantiate a child logger and add the my_middleware tag to it and return that. Currently, RequestContext only contains request specific data, and no data store. I've been experimenting with such a global store in #100 , but the API right now is pretty un-ergonomic.

but if we want to improve the base logging API from experience using it internally then I think we should be on the lookout for changes that could be supported by features that'll be useful for end-users too.

And yes, I feel like internal experimentation with logging API's would give us enough experience when deciding on a user facing API. I like this approach because it means that if we want to change the API, we won't have to worry about backwards compatibility and breaking our users' code. I feel we should have a relatively set API by 0.1.

@yoshuawuyts
Copy link
Member

yoshuawuyts commented Jan 9, 2019

An alternative, if there's some reliable way to carry ambient state with a chain of futures now wherever they execute, is to stuff some context there and any logger can grab that ambient context to avoid threading loggers through your code manually.

I'm not sure if this is quite the answer you're looking for, but I made an example of a Drop based timer / logger (playground). Perhaps this might be a useful alternative to a closure-based API.

Usage

fn main () {
    time_scope!(1); // times the full duration of the current block
    println!("<handling request>");
    
    let timer = Timer::new(2);
    println!("<running extractors>");
    drop(timer);
    
    let timer = Timer::new(3);
    println!("<execution business logic>");
    drop(timer);
}

Output

<handling request>
<running extractors>
[2] elapsed 3.299µs
<execution business logic>
[3] elapsed 2.48µs
[1] elapsed 43.778µs

@bIgBV
Copy link
Contributor

bIgBV commented Jan 9, 2019

Isn't this similar to what tokio-trace does?

@yoshuawuyts
Copy link
Member

@bIgBV oh yeah, I guess it's similar! I looks like it still uses futures 0.1 tho, which is unfortunate. Perhaps it could be ported to a standalone crate and upgrade the futures version for use in Tide?

I'm not sure how much work that would be; we should probably figure out if it's indeed what we want before taking on a bunch of extra work I reckon, haha.

@bIgBV
Copy link
Contributor

bIgBV commented Jan 9, 2019

@yoshuawuyts I feel that once #109 lands, we can use the store to store(pun unintended) the RootLogger instance. This would mean we can provide simple log macros, and provide support to capture logs from the log crate. That way, users can begin adding logs to their applications, and people working on tide can also add log statements.

That gives us a starting point to go from. Once tokio-trace is stable, we can look at it again. The rust logging landscape seems to be going through some changes as the log crate is also considering adding structured logging (see earlier comment by @KodrAus ), so I feel that an MVP would be a good idea.

@KodrAus
Copy link
Contributor

KodrAus commented Jan 10, 2019

I think it is worth us spending some time figuring out exactly what kind of logging story we want to provide in tide, independent of the implementation details. That way we know what an MVP actually looks like.

If the goal is to offer some bridging API that an end-user can use to capture logs produced by tide internally, then questions like contextual logging and supporting convenient methods like time become questions about how much we need to surface in tide, and how much is necessary for end-users to integrate with their log framework of choice and make sense of their application's logs in the context of tide.

@bIgBV bIgBV mentioned this issue Jan 13, 2019
8 tasks
@bIgBV bIgBV self-assigned this Feb 22, 2019
@yoshuawuyts
Copy link
Member

During triage we've agreed on the following:

  1. we should prioritize logging over tracing, as it's easier to implement.
  2. we should create a separate logging crate for tide to experiment with logging.
  3. the crate should re-export the log macros so no other peer dependencies are needed.
  4. the crate should log information about requests.
  5. when structured logging is implemented in log (An RFC for structured logging rust-lang/log#296), we should move to start using that.

@prasannavl
Copy link
Contributor

prasannavl commented Apr 3, 2019

The agreed on triage above looks good. I haven't gone through the details of much of the discussion above, so sorry if they are redundant. But just faced an issue while testing so thought I'll chime in.

Some points:

  • I'm not at all a fan of using slog as the default logger in any form. I think it should just be the log crate, and everything else should come in as external adapter middleware (whether in crate or external), or from the application.
  • I think this should now be easier given An RFC for structured logging rust-lang/log#296 has landed.
  • Just ran a simple server, with some stress testing, end up with slog throwing out a bunch of errors due to the async logging dropping off messages.
Apr 03 03:22:05.929 INFO GET /hello 200
Apr 03 03:22:05.929 INFO GET /hello 200
Apr 03 03:22:05.930 INFO GET /hello 200
Apr 03 03:22:05.930 ERRO slog-async: logger dropped messages due to channel overflow, count: 11
Apr 03 03:22:05.930 ERRO slog-async: logger dropped messages due to channel overflow, count: 25

This could be a fine configuration handling back-pressure, but I think the decision should always be left to the application to reduce surprises, especially with respect to missing logs.

And a simple synchronous logger like env_logger instead of slog-async is simple, obvious and failure proof, integrating with most of the rust-ecosystem. I think the default logging middleware included as a part of tide (or in tide blessed repos) should likely start with env_logger, with the rest being optional choices.

Some more data points:

  • It's a common case to not use async logging in applications, and instead always throw them into stdout, and have a log collector do the rest of the job however they deem fit from there.
  • I personally tend to use env_logger first and structured logging is usually another layer in-between that sends off async messages to a log observer. So, when if starts to drop off messages, env_logger still does it's job and there is something.

I'd prefer tide's defaults to be simple, having less surprises than something esoteric or packed with functionality by default.

@bIgBV
Copy link
Contributor

bIgBV commented Apr 3, 2019

@prasannavl I agree with what you are saying. This has come up a while ago in the meetings and the current consensus is to simply use the log crate and it's macros and the framework user will simply use any off the shelf Logger implementations like env_logger, pretty_env_logger or even slog (or even tokio-trace) as they see fit. Right now I'm waiting for #156 to land so that I can go in and remove the logging middleware and update the examples to use env_logger instead.

This was referenced May 15, 2019
@drewish
Copy link

drewish commented Mar 15, 2020

Since #222 has been merged can this issue be closed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A feature that's ready for implementation good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

7 participants