Redshift: Quote column names and send recovery statsd metrics #1329
Conversation
for {
  _ <- setStage(Stage.Committing)
  _ <- Manifest.add[F](discovery.origin.toManifestItem)
} yield LoadSuccess(loadedRecoveryTableNames)
Drawing attention to this change. Previously, we did an extra lookup to the manifest table, and the only reason was to get a timestamp (essentially "now"). After this change we simply use the loader's local notion of "now". The value is used in monitoring payloads.
Arguably it doesn't belong in this PR, which is supposed to be about adding a statsd metric for recovery tables. But it is related to how we pass around data about the load, data which is used for building metrics and monitoring payloads, so I smuggled it into this PR.
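A minimal sketch of the idea described above, assuming simplified, hypothetical shapes for `LoadSuccess` and `commitLoad` (the real code threads these values through an effect type; the names beyond `loadedRecoveryTableNames` are illustrative):

```scala
// Hypothetical, simplified sketch: instead of reading the manifest back just to
// obtain an ingestion timestamp, capture the loader's local "now" directly.
import java.time.Instant

// Illustrative result type carrying the data used for metrics and monitoring.
final case class LoadSuccess(recoveryTableNames: List[String], ingestTimestamp: Instant)

def commitLoad(loadedRecoveryTableNames: List[String]): LoadSuccess =
  // Previously: an extra manifest lookup supplied this timestamp.
  // Now: the loader's local clock is good enough for monitoring payloads.
  LoadSuccess(loadedRecoveryTableNames, Instant.now())
```

This avoids one round trip to the warehouse per load, at the cost of the timestamp reflecting the loader's clock rather than the manifest's.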
maxTstamp,
Some(shredderStart),
Some(shredderEnd),
Some(recoveryTablesLoaded)
I wonder if we should omit (i.e. not emit) the metric when it is zero? Or when we're loading to Snowflake/Databricks, for which it has no meaning.
I think this question depends on the downstream dependencies: we have StatsD, then the metrics relay / custom tooling that pushes the data to e.g. CloudWatch. I'm not sure I remember this well enough, but I think some of those things can fail if they expect a metric that isn't present; I might be wrong.
I think my opinion stands whatever the answer to the above consideration is, though:
- For Snowflake/Databricks, I would omit it, assuming the tf stacks are sufficiently separated that doing so is trivial and doesn't introduce complexity.
- For Redshift I would not omit it, even if no recovery tables were produced. In the case of some confusing issue, knowing that 0 were produced can be as valuable as knowing that 1 or more were produced. I encountered such a scenario recently in QA.
I just pushed an extra commit, which disables the metric for Snowflake/Databricks loaders.
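The behaviour agreed above could be sketched roughly as follows. This is an illustrative sketch, not the PR's implementation: the `Target` hierarchy and `recoveryTablesMetric` are hypothetical names, but the logic mirrors the decision (always emit for Redshift, even at zero; never emit for Snowflake/Databricks):

```scala
// Hypothetical sketch of per-target metric emission.
sealed trait Target
case object Redshift extends Target
case object Snowflake extends Target
case object Databricks extends Target

def recoveryTablesMetric(target: Target, count: Int): Option[Int] =
  target match {
    // Emit even when the count is zero: "0 recovery tables loaded" is itself
    // a useful signal when debugging a confusing issue.
    case Redshift => Some(count)
    // Recovery tables have no meaning for these loaders, so omit the metric
    // entirely rather than reporting a misleading zero.
    case Snowflake | Databricks => None
  }
```

Returning `Option` keeps the "omit vs. emit zero" distinction explicit at the call site.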
val expected = Metrics.KVMetrics.LoadingCompleted(
  KVMetric.CountGood(countGood),
  KVMetric.CountBad(countBad),
  Some(KVMetric.CollectorLatencyMin(collectorLatencyMin)),
  Some(KVMetric.CollectorLatencyMax(collectorLatencyMax)),
  KVMetric.ShredderLatencyStart(shredderStartLatency),
  KVMetric.ShredderLatencyEnd(shredderEndLatency),
  KVMetric.RecoveryTablesLoaded(2)
)
It seems like this tells us how many recovery tables were used in total, which would include tables that existed before this load; is that correct?
I think this would be useful, but I'd also like to know whether a new recovery table was created in this batch, since I imagine support will usually be most interested in whether there is a new problem when more than one existing recovery table is in use. Is this possible to do?
(Not entirely sure I've explained the thinking correctly, but shout and I'll be happy to clarify!)
Discussed in a meeting: we provide the ability to see that information in the data we send to ops1, so it would be possible to construct a query that tells us when a new recovery table is used.
The implementation in this PR gives us the ability to get fast feedback for rollout, and tells us broadly about recovery table usage, so I'm happy that it's sufficient.
Looks good as far as I can see. :)
I think the additional commit looks like a clean way to handle things. :)
I manually merged this into the develop branch, so closing it here.