Some debug-span and debug-log changes to help with filtering during tracing analysis #11289
…into more-tracing
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #11289 +/- ##
==========================================
+ Coverage 71.08% 71.10% +0.01%
==========================================
Files 783 783
Lines 156875 156813 -62
Branches 156875 156813 -62
==========================================
- Hits 111517 111495 -22
+ Misses 40522 40495 -27
+ Partials 4836 4823 -13
@@ -2525,7 +2519,7 @@ impl Chain {
     "get_state_response_part",
     shard_id,
     part_id,
-    %sync_hash)
+    ?sync_hash)
Why was this change necessary? By default, Display is intended for anything that reaches the user's eyes, including logs...
Looks like they are the same
impl fmt::Debug for CryptoHash {
    fn fmt(&self, fmtr: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(self, fmtr)
    }
}
impl fmt::Display for CryptoHash {
    fn fmt(&self, fmtr: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.to_base58_impl(|encoded| fmtr.write_str(encoded))
    }
}
(Though for hashes in particular, I have a habit of using Debug, because I think the Ethereum libraries implement Display by printing the hash with an ellipsis in the middle, and it's really annoying. I guess Near code doesn't do that, but the habit and the fear have already formed :) )
Also for consistency, because in the majority of other places hashes are logged with Debug.
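To illustrate the point made in this thread: when a type's Debug implementation simply forwards to Display, the `{}` and `{:?}` formatters (which the `%` and `?` field sigils select, respectively) produce the same text. This is a minimal std-only sketch. `Hash` is a hypothetical stand-in, and it hex-encodes rather than base58-encodes because `CryptoHash` and `to_base58_impl` are not reproduced here.

```rust
use std::fmt;

/// Hypothetical stand-in for a hash type; not the real `CryptoHash`.
struct Hash([u8; 4]);

impl fmt::Display for Hash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Hex encoding is an assumption made to keep the sketch std-only.
        for byte in &self.0 {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}

impl fmt::Debug for Hash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Same pattern as in the snippet quoted above: Debug forwards to Display.
        fmt::Display::fmt(self, f)
    }
}

fn main() {
    let hash = Hash([0xde, 0xad, 0xbe, 0xef]);
    // Both formatters go through the same code path, so the text is identical.
    assert_eq!(format!("{}", hash), format!("{:?}", hash));
    println!("{} == {:?}", hash, hash);
}
```

With such a type, switching a field from `%` to `?` changes nothing in the output, so the choice comes down to consistency with the rest of the codebase.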
chain/client/src/stateless_validation/chunk_endorsement_tracker.rs
What's the significance of the "target" parameter in the tracing analysis? How should we choose the target going forward?
It aids the ability to filter (out) the (un)interesting parts of the code. I personally think that a single word used in too many spans/events is insufficiently precise, but unfortunately this is something each of us needs to discover independently.
@nagisa I understand that it aids filtering at the logging level, but this is about tracing, where we have the ability to take in all traces and then filter/analyze them later. So I wanted to know how @tayfunelmas is using the target tag in his analysis and, if anyone changes the tags in the future, what impact that would have on the tracing analysis tooling.
In many cases it isn't actually practical to ship everything that's traced off to somewhere. We already see a constant >500KiB/s of ingest from the few things we trace at the debug level, and that is enough for Grafana Tempo to start dropping some of the ingest traffic at our fairly conservative default ingest limits. Raising those limits would raise our monthly cost (this holds regardless of where we ship the traces, unless they are held locally for immediate inspection, such as with your tool; but even then storage ain't free…). If we didn't filter our traces at the emitter we'd be looking at potentially dozens of megabytes of traces per second. Components like the compiler or hyper are particularly chatty. So even for traces it is important to have well thought-out targets and levels. Otherwise even just gathering a useful trace becomes a chore (as we recently found out: enabling host function tracing ended up with traces so truncated that they were largely incomprehensible…).
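As a rough illustration of what filtering at the emitter by target and level means, here is a minimal std-only sketch. It is not how `tracing-subscriber` implements its filters. The directive syntax, the exact-match rule on targets, and the `Filter` and `Level` types are all assumptions made for this example.

```rust
/// Verbosity levels, ordered from least to most verbose.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

fn parse_level(s: &str) -> Option<Level> {
    match s {
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

/// Per-target maximum verbosity plus a default for unlisted targets.
struct Filter {
    default: Level,
    targets: Vec<(String, Level)>,
}

impl Filter {
    /// Parses directives such as "warn,client=debug,network=debug".
    /// Malformed directives are skipped in this sketch.
    fn parse(spec: &str) -> Filter {
        let mut filter = Filter { default: Level::Info, targets: Vec::new() };
        for directive in spec.split(',').map(str::trim).filter(|d| !d.is_empty()) {
            match directive.split_once('=') {
                Some((target, level)) => {
                    if let Some(level) = parse_level(level) {
                        filter.targets.push((target.to_string(), level));
                    }
                }
                None => {
                    if let Some(level) = parse_level(directive) {
                        filter.default = level;
                    }
                }
            }
        }
        filter
    }

    /// Returns true if an event with this target and level should be emitted.
    fn enabled(&self, target: &str, level: Level) -> bool {
        let max = self
            .targets
            .iter()
            .find(|(name, _)| name.as_str() == target)
            .map(|(_, lvl)| *lvl)
            .unwrap_or(self.default);
        level <= max
    }
}

fn main() {
    // Keep debug output for the targets of interest, silence chatty ones.
    let filter = Filter::parse("warn,client=debug,network=debug");
    assert!(filter.enabled("client", Level::Debug));
    assert!(!filter.enabled("hyper", Level::Debug));
    println!("client debug: {}", filter.enabled("client", Level::Debug));
    println!("hyper debug: {}", filter.enabled("hyper", Level::Debug));
}
```

The sketch only works if targets are precise: a target shared by too many spans and events cannot be turned down without also losing the interesting ones.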
Co-authored-by: Simonas Kazlauskas <github@kazlauskas.me>
…r.rs Co-authored-by: Simonas Kazlauskas <github@kazlauskas.me>
Actually, it is both useful and not. First, I am planning to primarily use the span/log names and the fields. The target field will only help to narrow down the logs; the data that needs to be processed may still be large, but that is better than having everything. I expect that for certain analyses (e.g. chunk/block production) the interesting parts of the logs will come from certain targets such as "client" and "network", which is why I changed the target names for the events that I think are interesting.
Our style guide, by the way, has some guidance on that. It reads as such:
This is not the complete list of changes I plan to make, but I wanted to get a first batch reviewed.
There are two kinds of changes applied to debug logs and traces that help filter them when analyzing traces (either through the new tracing UI or through the log files):
Simplify the name: instead of a full sentence (unless a sentence makes more sense), use an identifier-like name (in most cases reflecting the type of operation or function), and move the params substituted into the string to separate log/span fields.
Add missing fields and attempt to standardize the field names, so that when filtering traces for a certain analysis we know which fields are available across the logs/traces for the same kind of entity/operation. For example:
shard_id for the shard id (also prefer shard_id over UID, as the UID carries an extra version and can be obtained by other means)
sync_hash for the hash of the state-sync
height for the block height (use last_has and prev_hash if it is not clear from the context)
block_hash for the hash of the block
chunk_hash/chunk_hashes for the hash of the chunk
part_id for the id of the state-sync part
error for the error type or message
sync_type for the sync type (eg. block, head, state)
height_included for the block height at which a chunk is included
height_created for the block height at which a chunk is created
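To show why identifier-like names and standardized field names help later analysis, here is a minimal std-only sketch. The `Record` type, the `record` and `with_field` helpers, and the sample targets and names other than `get_state_response_part` are hypothetical, and this is not the format used by the tracing UI. Once every span and event reports the shard under the same `shard_id` key, one query covers all of them, whatever their name.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory form of a recorded span or event.
struct Record {
    target: &'static str,
    name: &'static str,
    fields: BTreeMap<&'static str, String>,
}

/// Builds a record from a target, an identifier-like name and key/value fields.
fn record(
    target: &'static str,
    name: &'static str,
    fields: &[(&'static str, String)],
) -> Record {
    Record { target, name, fields: fields.iter().cloned().collect() }
}

/// Selects records that carry `key` with exactly `value`, regardless of name.
fn with_field<'a>(records: &'a [Record], key: &str, value: &str) -> Vec<&'a Record> {
    records
        .iter()
        .filter(|r| r.fields.get(key).map(String::as_str) == Some(value))
        .collect()
}

fn main() {
    // Sample data; the targets and the "produce_chunk" name are made up.
    let records = vec![
        record(
            "sync",
            "get_state_response_part",
            &[
                ("shard_id", "0".to_string()),
                ("part_id", "3".to_string()),
                ("sync_hash", "abc".to_string()),
            ],
        ),
        record(
            "client",
            "produce_chunk",
            &[("shard_id", "0".to_string()), ("height", "100".to_string())],
        ),
        record(
            "client",
            "produce_chunk",
            &[("shard_id", "1".to_string()), ("height", "100".to_string())],
        ),
    ];

    // One key name across all records makes the per-shard query trivial.
    for r in with_field(&records, "shard_id", "0") {
        println!("{} {}", r.target, r.name);
    }
}
```

Had the shard id been interpolated into a sentence-style message, the same query would need a separate text pattern for each message.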