Redshift: Quote column names and send recovery statsd metrics #1329
Conversation
for {
  _ <- setStage(Stage.Committing)
  _ <- Manifest.add[F](discovery.origin.toManifestItem)
} yield LoadSuccess(loadedRecoveryTableNames)
Drawing attention to this change. Previously, we did an extra lookup to the manifest table, and the only reason was to get a timestamp (essentially "now"). After this change we simply use the loader's local notion of "now". The value is used in monitoring payloads.
Arguably it doesn't belong in this PR, which is supposed to be about adding a statsd metric for recovery tables. But it is related to how we pass around data about the load, data which is used for building metrics and monitoring payloads, so I smuggled it into this PR.
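A minimal sketch of the idea described above, assuming simplified, hypothetical shapes for `LoadSuccess` and `commitLoad` (the real code threads these values through an effect type; the names beyond `loadedRecoveryTableNames` are illustrative):

```scala
// Hypothetical, simplified sketch: instead of reading the manifest back just to
// obtain an ingestion timestamp, capture the loader's local "now" directly.
import java.time.Instant

// Illustrative result type carrying the data used for metrics and monitoring.
final case class LoadSuccess(recoveryTableNames: List[String], ingestTimestamp: Instant)

def commitLoad(loadedRecoveryTableNames: List[String]): LoadSuccess =
  // Previously: an extra manifest lookup supplied this timestamp.
  // Now: the loader's local clock is good enough for monitoring payloads.
  LoadSuccess(loadedRecoveryTableNames, Instant.now())
```

This avoids one round trip to the warehouse per load, at the cost of the timestamp reflecting the loader's clock rather than the manifest's.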
maxTstamp,
Some(shredderStart),
Some(shredderEnd),
Some(recoveryTablesLoaded)
I wonder if we should omit (i.e. not emit) the metric when it is zero? Or when we're loading to Snowflake/Databricks, for which it has no meaning.
I think this question depends on the downstream dependencies: we have StatsD, then the metrics relay / custom tooling that pushes the data to e.g. CloudWatch. I'm not sure I remember this well enough, but I think some of those things can fail if they expect a metric that isn't present; I might be wrong.
I think my opinion stands whatever the answer to the above consideration is, though:
- For Snowflake/Databricks, I would omit it, assuming the tf stacks are sufficiently separated that doing so is trivial and doesn't introduce complexity.
- For Redshift I would not omit it, even if no recovery tables were produced. In the case of some confusing issue, knowing that 0 were produced can be as valuable as knowing that 1 or more were produced. I encountered such a scenario recently in QA.
I just pushed an extra commit, which disables the metric for Snowflake/Databricks loaders.
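The behaviour agreed above could be sketched roughly as follows. This is an illustrative sketch, not the PR's implementation: the `Target` hierarchy and `recoveryTablesMetric` are hypothetical names, but the logic mirrors the decision (always emit for Redshift, even at zero; never emit for Snowflake/Databricks):

```scala
// Hypothetical sketch of per-target metric emission.
sealed trait Target
case object Redshift extends Target
case object Snowflake extends Target
case object Databricks extends Target

def recoveryTablesMetric(target: Target, count: Int): Option[Int] =
  target match {
    // Emit even when the count is zero: "0 recovery tables loaded" is itself
    // a useful signal when debugging a confusing issue.
    case Redshift => Some(count)
    // Recovery tables have no meaning for these loaders, so omit the metric
    // entirely rather than reporting a misleading zero.
    case Snowflake | Databricks => None
  }
```

Returning `Option` keeps the "omit vs. emit zero" distinction explicit at the call site.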
val expected = Metrics.KVMetrics.LoadingCompleted(
  KVMetric.CountGood(countGood),
  KVMetric.CountBad(countBad),
  Some(KVMetric.CollectorLatencyMin(collectorLatencyMin)),
  Some(KVMetric.CollectorLatencyMax(collectorLatencyMax)),
  KVMetric.ShredderLatencyStart(shredderStartLatency),
  KVMetric.ShredderLatencyEnd(shredderEndLatency),
  KVMetric.RecoveryTablesLoaded(2)
)
It seems like this tells us how many recovery tables were used in total, which would include tables that existed before this load; is that correct?
I think this would be useful, but I'd also like to know whether a new recovery table was created in this batch, since I imagine support will usually be most interested in whether there is a new problem when more than one existing recovery table is in use. Is this possible to do?
(Not entirely sure I've explained the thinking correctly, but shout and I'll be happy to clarify!)
Discussed in a meeting: we provide the ability to see that information in the data we send to ops1, so it would be possible to construct a query that tells us when a new recovery table is used.
The implementation in this PR gives us the ability to get fast feedback for rollout, and tells us broadly about recovery table usage, so I'm happy that it's sufficient.
Looks good as far as I can see. :)
I think the additional commit looks like a clean way to handle things. :)
I manually merged this into the develop branch, so closing it here.