Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] [Next] Query Logging Improvements #942

Open
abonander opened this issue Dec 29, 2020 · 31 comments
Open

[Proposal] [Next] Query Logging Improvements #942

abonander opened this issue Dec 29, 2020 · 31 comments
Labels

Comments

@abonander
Copy link
Collaborator

abonander commented Dec 29, 2020

A lot of people will probably agree with me when I say there's room for improvement with how we currently log queries.

  1. Logging is very noisy because we always print the whole query at INFO. I think at most we should be printing at INFO is the source file and line of the query (which we can do with #[track_caller] and std::panic::Location::caller()), maybe the first couple keywords in the query as context, and the time the query took. We should only be logging the full query (with parameters interpolated) at TRACE.

  2. Logging is expensive because we pretty-print the SQL. I think we shouldn't be doing nearly that much to it. At most, we should be removing excess indentation so the query isn't weirdly left-padded in logs. I think sqlformat can help us there instead of rolling it ourselves. (cc @shssoichiro)

  3. It doesn't make sense for ConnectOptions::{log_statements, log_slow_statements} to take a log::LevelFilter; instead, it should be Option<log::Level>. Then, if someone is using a crate like tracing that includes log but doesn't want to use log directly, they can still disable query logging by passing None (Add a way to disable query logging programmatically #939).

  4. (Bug) we don't actually obey the log-level setting for sqlx::query, only the global log setting. This doesn't need to be in next, though; I'll be fixing it shortly. Done: fix(logging): make query logging obey level setting for sqlx::query target #945

@abonander
Copy link
Collaborator Author

abonander commented Dec 29, 2020

I'm thinking we can change how much we log based on the log level set for sqlx::query:

At INFO, we only print a short summary of the query as mentioned in bullet point 1, so something like this:

[INFO sqlx::query] "SELECT foo, ... FROM bar ..." at src/db/my_query.rs:42 executed in 490.32ns

At DEBUG (and the slow-query logging at WARN or otherwise configured level), we print that plus query parameters:

[DEBUG sqlx::query] "SELECT foo, ... FROM bar ..." at src/db/my_query.rs:42 executed in 490.32ns; parameters: ["baz", 1234567, "2020-12-29T12:28:19-08:00"]

At TRACE, we print the whole query verbatim (only excess left-padding removed) with parameters interpolated:

[TRACE sqlx::query] query at src/db/my_query.rs:42 executed in 490.32ns:
SELECT 
    foo, 
    baz, 
    quux
FROM bar
WHERE 
    baz = 'baz' AND quux <> 1234567 AND created_at > `2020-12-29T12:28:19-08:00`
LIMIT 5

Thanks to #[track_caller], we can get source file and line info for both the query macros and functions.

@SomeoneToIgnore
Copy link
Contributor

SomeoneToIgnore commented Dec 30, 2020

Looks great, thanks for coming with this up.
I have nothing to add to the issue message, but the first comment with the query format proposal looks worth discussing.

  • On DEBUG level, to avoid accidentally printing a couple of thousands parameters, might be good to limit them with some arbitrary number.

  • I like the idea to separate the verbosity levels and wonder if we should drag the separation further and split logging level and logging verbosity?

If I've understood the current proposal right, sqlx query verbosity changes depending on the log::Level we set for either.
With this model, would it ever be possible to WARN slow query with parameters in the log message? Maybe somebody would be insterested in logging the whole slow query with some non-TRACE level.

If you think that might make sense, consider keeping the log level just the log level and add some kind of verbosity parameter to the query settings builder, allowing to switch the formats.

==
Come to think of it, this separation might serve as a better compromise with non-log users due to being framework agnostic.

@abonander
Copy link
Collaborator Author

abonander commented Dec 30, 2020

With this model, would it ever be possible to WARN slow query with parameters in the log message?

I didn't think of slow query logging when I wrote that out but yeah, it should definitely include the parameters so it's possible to repro the slowdown. So the slow query log at WARN should probably be the same format as DEBUG.

On DEBUG level, to avoid accidentally printing a couple of thousands parameters, might be good to limit them with some arbitrary number.

That's hard cause like, what arbitrary number is acceptable? What about for a slow-query log? Then you'd probably want all the parameters for reproducibility. Perhaps for slow-query logging we should have an option to dump the full query data with the complete SQL and all parameters to a file or some other sink.

@shssoichiro
Copy link
Contributor

I like the ideas for INFO and TRACE, but I don't think printing parameters is a good idea. It has the potential for injecting PII into logs.

@abonander
Copy link
Collaborator Author

abonander commented Dec 30, 2020

True, but it could come in handy for reproducing slow queries. Instead of changing output for DEBUG/WARN, maybe we have a setter on ConnectOptions?

/// Log query arguments (values substituted into value placeholders, i.e. `$N` / `?`).
///
/// ### Note: Beware of Personally Identifiable Information
/// This will likely expose personally identifiable information (PII), or otherwise infringe on privacy of users 
//// (which may be protected by law), if the logs are submitted to third-party providers like Sentry, Firebase, etc.
///
/// Take care when enabling this option.
pub fn log_arguments(&mut self, log_args: bool) -> &mut Self

As suggested in my previous comment, maybe we don't want to lean directly on log for saving reproducible queries but instead allow directly configuring a sink for query data for processing/storage.

@jkoudys
Copy link
Contributor

jkoudys commented Dec 30, 2020

@abonander big agree on the 2nd point. Personally I strongly dislike that the query is formatted at all -- we're losing valuable information in the logs! The original, unaltered string can have a lot of interesting hints at the cause of a problem that are simply being thrown away irrecoverably. I could always pipe the query through an sql formatter from vim if I really cared about it being pretty - taking that hit at runtime seems pointless. The query formatter it's running now is doing a full tokenization, which is so much more expensive than the remove-the-indent bit you suggest.

@SomeoneToIgnore
Copy link
Contributor

So the slow query log at WARN should probably be the same format as DEBUG.

I presume, you have to do the same for Error level too then, keeping in mind that any user can change the log levels for both queries with the code like

let mut connection_options = MySqlConnectOptions::from_str(connection_url)?;
connection_options
    .log_statements(log::LevelFilter::Error)
    .log_slow_statements(log::LevelFilter::Error, Duration::from_secs(1));
let pool = MySqlPoolOptions::new().connect_with(connection_options).await?;

So, putting all formats and queries together, did I get you right:

regular query format slow query format
off
trace full query, interpolated full query, interpolated
debug short summary + parameters short summary + parameters
info short summary short summary + parameters
warn short summary short summary + parameters
error short summary short summary + parameters

?

For me, it's

  • a bit odd to see inconsistent query formats on different log levels
  • just odd to see a library behavior to change due to different log level (configurable via env variables!), I fail to recall any similar cases for libraries. Keeping in mind the performance issues, still looks like a footgun feature in general

Thus said, I still would have preferred something like enum QueryLogFormat { Full, WithParameters(Option<usize>), Summary, None } with the way to set it simiarly to the log::LevelFilter now + some reasonable defaults for both queries.

But the choice is yours after all, I won't insist on this point of view much 🙂

@SomeoneToIgnore
Copy link
Contributor

what arbitrary number is acceptable? What about for a slow-query log?

I think you can either make it configurable, or just pick some arbitrary number (100?) and recommend to log the full log for all parameters.

Parameters' value inflate with their quantity drastically: 1-10 params can be valuable, but what about 1843, especially when they are displayed detached from the main query? :)

Then you'd probably want all the parameters for reproducibility. Perhaps for slow-query logging we should have an option to dump the full query data with the complete SQL and all parameters to a file or some other sink.

Great ideas in general, not sure if feasible to implement right away.
Slow sql detection + configurable query format + the pointer to the place where the query originates might be enough to cover the majority of the use cases.

I like the idea to try and be as failsafe as possible for the end user, but not sure it's possible in general to predict everything and log the query in the best possible format, keeping this whole shebang configurable might be just enough.

@shssoichiro
Copy link
Contributor

I'm also +1 for making query logging configurable, perhaps as an option when building the sqlx Connection, with some sensible default. Probably we would want two parameters, one to configure the standard log level (defaults to either off or the short summary) and one to configure the slow query log level (defaults to the short summary).

@mehcode
Copy link
Member

mehcode commented Dec 30, 2020

Let's take a step back here and go survey how other large SQL frameworks do this. Rails, Django, etc. There must be some inspiration on what works well to draw from.

@mehcode mehcode removed this from the 0.5 milestone Mar 16, 2021
@Fishrock123
Copy link

It would be nice if tracing events could be emitted alongside the logging events (with a separate loglevel) at some granularity.

We'd prefer this stuff in our honeycomb traces. I turn all the default "info" stuff from SQLx down to "debug" because it's not useful enough in production.

@Thomasdezeeuw
Copy link
Contributor

I just wanted to echo the request to change these logs from info to trace. We're actually forced to disable info logging because of SQLx, which normally includes useful stuff as well. Making the logging optional (perhaps by the setting the required LogLevel) and whether or not to log the parameters would also be a big plus for us.

@jplatte
Copy link
Contributor

jplatte commented Sep 3, 2021

The log level can already be customized (and had its default changed) on master since #1371, so that will be part of the next release.

@malobre
Copy link

malobre commented Sep 11, 2021

It would be nice to have query-logging / SQL pretty-printing feature-gated. sqlformat depends on regex which is quite long to compile.

@shssoichiro
Copy link
Contributor

I just published a new patch version of sqlformat that removes the regex dependency. Does this help?

@malobre
Copy link

malobre commented Sep 12, 2021

Yes it does! Thanks a lot.

@bin
Copy link

bin commented Oct 23, 2021

I'd suggest adding the ability to entirely disable logging of the actual query and print just a warning message with the line number when a query is slow, as this would be useful for very large queries. I use large batch insert queries that max out the number of placeholders, and formatting these to log causes my inserts to take about thirty times longer in debug builds and around three times longer in release. It chews up quite a lot of CPU in the process compared to the non-logging insertions (which are mainly I/O bound.)

@abonander
Copy link
Collaborator Author

Yeah, I think I tend to agree cause I personally don't really pay attention at all to the queries being logged, it just adds noise.

And if you're running in, say, Gcloud, you pay by the byte for log storage so noise isn't just annoying but can be costly too. Learned that the hard way recently.

With #[track_caller] we can report the file and line even when using the query functions and not the macros.

@taladar
Copy link
Contributor

taladar commented Jan 15, 2022

And if you're running in, say, Gcloud, you pay by the byte for log storage so noise isn't just annoying but can be costly too. Learned that the hard way recently.

I don't think libraries should tailor their log output to very specific use cases like that when there is the option to just set the log level per module in RUST_LOG. Primarily logging all queries should be something enabled in development and when trying to trace specific bugs only anyway.

@MaxFangX
Copy link

Just want to express a +1 for the original proposal which makes the INFO log much more concise, and moves the pretty-printed, multi-line log to TRACE.

My company is running a warp webserver which uses SeaORM which uses sqlx, and we log at INFO under normal circumstances. Our problem was that whenever anything happens on our server that triggers a DB hit, sqlx's extremely verbose INFO log ends up dominating about 70% of the vertical space in our log outputs, making it really hard to visually identify the high-level application logs which are really what we're interested in. As I understand it, many people are using sqlx relatively deep in their stack, and usually don't care to see full sqlx queries when everything is working fine - it's only when something is going wrong with the queries that people may want to see the full, verbose output - in which case they will set their log level to DEBUG or TRACE (and in which case seeing the full pretty-printed query is extremely helpful).

Of all the dependencies we have, sqlx is the only one where we had to adjust the log level because it was too verbose. I think this is probably the case for most people, and therefore that by default the INFO logs should print a relatively short, single-line log at most. My company ended up disabling sqlx logging entirely, and only turn it back on whenever we are debugging database queries.

@penso
Copy link

penso commented Aug 6, 2022

Agree and same situation than @MaxFangX at my company

@arifd
Copy link

arifd commented Aug 21, 2022

Beyond a discussion of how to format the logs, I would like to propose that any logging of the happy and typical path be reduced to TRACE. For reasons already stated by other people in this thread, but also because this allows me to optimize away slqx logs entirely at compile time by setting a compile time flag.

INFO is valuable to me in production for other reasons, so I can't disable that level at compile time.

Because of this, even with filtering, every database access unnecessarily incurs an extra atomic load and a boolean equality check in release builds.

@MahdiBaghbani
Copy link

MahdiBaghbani commented Nov 21, 2022

I recently developed an API with rust (Poem with Seaql which uses sqlx underneath),
I believe sqlx shouldn't emit anything at INFO level, it makes production environments totally unreadable, specially if the query is complex.
it is occurring for every single request on an API endpoint. it is useful while debugging but not otherwise.

@shssoichiro
Copy link
Contributor

As the original author of the log implementation, and after having used this for a while in production... I am +1 for moving the normal, happy-path logs to DEBUG level. It does add quite a bit of noise, and the comments I've seen about performance are valid as well.

@arifd
Copy link

arifd commented Jan 27, 2023

As the original author of the log implementation, and after having used this for a while in production... I am +1 for moving the normal, happy-path logs to DEBUG level. It does add quite a bit of noise, and the comments I've seen about performance are valid as well.

I don't filter DEBUG in production so I won't get any performance benefit from this change. Perhaps could set the level behind a feature flag or compile time env var, so that at least we have the choice for what works best for us?

on a separate note, I have slow query logging enabled, and the output looks like this:

  2023-01-27T14:53:24.137806Z  WARN sqlx::query: UPDATE entries SET processing_status …; rows affected: 18462, rows returned: 0, elapsed: 1.791s

UPDATE
  entries
SET
  processing_status = 'ready_to_process',
  processing_metadata_is_up_to_date = false,
  processing_error = NULL
WHERE
  id = ANY($1)

If the log event emits the SQL anyway, could we use the space where it prints a snippet of the SQL (UPDATE entries SET processing_status …;) to instead tell us the reason behind this WARN event (slow) - this will let me know i need to bring my attention to the elapsed field, and not otherwise be concerned that i can't understand why it's warning me (or anyone on the team)

@felipelalli
Copy link

felipelalli commented Feb 27, 2023

As the original author of the log implementation, and after having used this for a while in production... I am +1 for moving the normal, happy-path logs to DEBUG level. It does add quite a bit of noise, and the comments I've seen about performance are valid as well.

I would like to adjust the log levels of full queries to TRACE. Is it possible to make this configurable?

Furthermore, I propose setting slow queries to WARN level instead. It might be useful to have the "slow" interval as a configurable parameter, with a default setting of around 100ms.

@CosmicHorrorDev
Copy link
Contributor

I would like to adjust the log levels of full queries to TRACE. Is it possible to make this configurable?

https://docs.rs/sqlx/latest/sqlx/trait.ConnectOptions.html#tymethod.log_statements

Furthermore, I propose setting slow queries to WARN level instead. It might be useful to have the "slow" interval as a configurable parameter, with a default setting of around 100ms.

https://docs.rs/sqlx/latest/sqlx/trait.ConnectOptions.html#tymethod.log_slow_statements

@vexx32
Copy link

vexx32 commented Jun 30, 2023

So currently I am passing a connection string into PgPoolOptions::connect()... I can't find a way to disable statement logging without also having to go through the whole annoyance of splitting out the connection string into pieces in order to supply it to a PgConnectOptions instance. Why is there no way to set a full connection string on PgConnectOptions directly? 😩

@abonander
Copy link
Collaborator Author

abonander commented Jun 30, 2023

There is; you can parse it from the string and modify it in-place:

// Using the standard `FromStr` trait and `str::parse()` method
let mut opts: PgConnectOptions = "<your database URL>".parse()?;
opts.log_statements(...);

let pool = pool_options.connect_with(&opts).await?;

@vexx32
Copy link

vexx32 commented Jun 30, 2023

That's what I was missing, thank you so much! 💖

@temeddix
Copy link

temeddix commented Aug 20, 2023

This needs to be fixed, as this pollutes the terminal significantly. Especially on bulk insertion, the console(or the log file) becomes a mess. The content of the whole query should be excluded or be written as TRACE level of logging IMHO.

Below is the Rust code that I'm using:

let mut mysql_connect_option: MySqlConnectOptions = database_address.parse().unwrap();
mysql_connect_option = mysql_connect_option.log_statements(log::LevelFilter::Trace);
let mysql_pool_option = MySqlPoolOptions::new().max_connections(32);
let mysql_pool = mysql_pool_option
    .connect_with(mysql_connect_option)
    .await
    .expect("Unable to connect to the database");
tracing::info!("Connected to database");

This is the console output:
image
(...goes on and on...)
image

EDIT: This seems to be the trick:

let mut mysql_connect_option: MySqlConnectOptions = database_address.parse().unwrap();
mysql_connect_option = mysql_connect_option.disable_statement_logging();
let mysql_pool_option = MySqlPoolOptions::new().max_connections(32);
let mysql_pool = mysql_pool_option
    .connect_with(mysql_connect_option)
    .await
    .expect("Unable to connect to the database");
tracing::info!("Connected to the database");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests