[worker] - log unknown build error logs in Datadog#3386
Conversation
|
Subscribed to pull request
Generated by CodeMention |
|
Size Change: -492 B (0%) Total Size: 72.7 MB
|
| const rawMessage = | ||
| maybeRawError instanceof Error ? maybeRawError.message : 'An unknown error occurred'; | ||
| void datadogLogs.send({ | ||
| message: `Unknown build error: ${rawMessage}`, |
There was a problem hiding this comment.
is "unknown build error:" helpful here?
| void datadogLogs.send({ | ||
| message: `Unknown build error: ${rawMessage}`, | ||
| level: 'error', | ||
| tags: { build_id: this.buildId }, |
There was a problem hiding this comment.
Is the cardinality on this ok? I've heard high cardinality is where we would pay big bucks
There was a problem hiding this comment.
That doesnt apply to logs fields, that pricing is for custom metrics.
Adding a high-cardinality tag like a Build ID to your logs does not directly increase your per-log "Log Management" price, but it can trigger massive costs elsewhere in your Datadog bill.
Here is how that high-variable tag affects different parts of your bill:
- Log Management (Ingestion & Indexing)
Direct Cost: There is no extra charge for adding more tags or high-cardinality attributes to your logs.
The "Weight" Impact: Adding a long tag string increases the size (bytes) of each log. Since you are billed for ingestion by the gigabyte (~$0.10/GB), a very large number of long tags could slightly increase your ingestion costs over time.- The Trap: "Log-Based Metrics"
The real danger is if you use that tag in a Log-Based Metric.
The Cost: Datadog charges for custom metrics based on the number of unique tag combinations (cardinality).
The Result: If you create a metric from your logs and include the build_id tag, every single build will create a new "unique time series." This can quickly exceed your custom metric allotment and lead to thousands of dollars in overage fees ($0.05 per metric/month).
| if (err.errorCode === errors.ErrorCode.UNKNOWN_ERROR) { | ||
| const rawMessage = | ||
| maybeRawError instanceof Error ? maybeRawError.message : 'An unknown error occurred'; | ||
| void datadogLogs.send({ |
There was a problem hiding this comment.
| void datadogLogs.send({ | |
| datadogLogs.send({ |
?
There was a problem hiding this comment.
Also note technically we might finish the build before we get to send the request. Might be ok but worth mentioning.
| method: 'POST', | ||
| headers: { | ||
| 'Content-Type': 'application/json', | ||
| 'DD-API-KEY': this.apiKey, |
There was a problem hiding this comment.
If worker has access to DataDog API key we could as well set it here — users are going to have access to it.
For step metrics we decided to relay these through www. Why can we not do it for this?
| interface DatadogLogsOptions { | ||
| apiKey: string | null; | ||
| site: string; | ||
| service: string; | ||
| logger: bunyan; | ||
| } |
There was a problem hiding this comment.
Why does AI like to extract these interfaces so much instead of inlining in the constructor…
| ]), | ||
| }); | ||
| } catch (error) { | ||
| this.logger.error(error, 'Failed to send log to Datadog'); |
There was a problem hiding this comment.
| this.logger.error(error, 'Failed to send log to Datadog'); | |
| this.logger.error({ err: error }, 'Failed to send log to Datadog'); |
This is bunyan nomenclature
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3386 +/- ##
==========================================
+ Coverage 49.63% 52.31% +2.69%
==========================================
Files 673 804 +131
Lines 28355 33425 +5070
Branches 5903 6975 +1072
==========================================
+ Hits 14070 17482 +3412
- Misses 13082 14555 +1473
- Partials 1203 1388 +185 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| @@ -316,6 +317,9 @@ export default class BuildService { | |||
| const err = error instanceof errors.BuildError ? error : unknownError; | |||
| const maybeRawError = error instanceof errors.BuildError ? error.innerError : error; | |||
|
|
|||
| const rawErrorMessage = | |||
| maybeRawError instanceof Error ? maybeRawError.message : 'An unknown error occurred'; | |||
There was a problem hiding this comment.
Should we JSON.stringify(maybeRawError) if it's not an Error?
| @@ -316,6 +317,9 @@ export default class BuildService { | |||
| const err = error instanceof errors.BuildError ? error : unknownError; | |||
| const maybeRawError = error instanceof errors.BuildError ? error.innerError : error; | |||
|
|
|||
| const rawErrorMessage = | |||
There was a problem hiding this comment.
nit: move to inside if (== UNKNOWN)
| }, | ||
| shouldThrowOnNotOk: false, | ||
| } | ||
| ).catch(() => {}); |
There was a problem hiding this comment.
What would happen if we didn't catch it? Let's not catch if not necessary.
|
⏩ The changelog entry check has been skipped since the "no changelog" label is present. |
Why
Record Unknown Errors with Datadog logs so that we can index by build ID, track patterns, and determing regressions/fixes. More detailed here https://github.com/expo/universe/pull/24771
How
Use Datadogs log ingestion API https://docs.datadoghq.com/api/latest/logs/#send-logs