feat(generic-metrics): Add support for gauges in dataset and processors (#4912)

* add migrations for tables
* update to use anyLast (this doesn't work)
* try with argMax aggregate function
* rename migrations
* renumber migrations
* renumber migrations again
* add raw_timestamp column
* fix test distributed storage set key
* add granularity-specific retention day columns to raw table
* Sum aggregate function instead of count.
* remove storage set key creations
* remove extra column and add max timestamp aggregation
* add gauges storage set key to migration group
* feat(rust): Update all dependencies in lockfile (#4892)
* feat: Write received_p99 to commit log (#4872)

  This supports the subscriptions to opt into using received_p99 for scheduling instead of the current orig_message_ts

  Needs getsentry/arroyo#295
* Revert "feat: Write received_p99 to commit log (#4872)"

  This reverts commit c7db591.

  Co-authored-by: lynnagara <1779792+lynnagara@users.noreply.github.com>
* initial pass, writable storage
* fix output type for gauges messages
* note on dlq
* write path working
* add entity, readable storage for querying
* switch to mat view version2
* add tests for gauges processor
* add gauges entity key
* fix readable gauge storage schema
* remove avg from gauges migration
* Revert "feat: Write received_p99 to commit log (#4872)"

  This reverts commit c7db591.

  Co-authored-by: lynnagara <1779792+lynnagara@users.noreply.github.com>
* ref(subscriptions): Move --delay-seconds from CLI arg to yaml definition (#4915)

  The main motivations for this are:
  1. The amount of delay depends on the synchronization timestamp used, and this is defined at the storage level in code. For example if "orig_message_ts" is used, a longer delay will be applied than if "received_p99" is used, since received will be set earlier in the pipeline.
  2. The same CLI args get applied in all Sentry deployments, and this makes it easier to keep them in sync
  3. Rolling out different values per storage via CLI will probably break some of our templates and require too much rework.

  There are no functional changes here since we have 60 configured everywhere right now.
* feat(rust): Add strategy that does json schema validation (#4901)
* spans: add profile_id to tests (#4827)
* add test for spans profile_id and fix bug where test was dependent on local timezone
* test: Refactor API tests to not reference sessions so it can be removed (#4920)
* fix merge conflict
* remove avgs support in dataset (storage, entity) and processor
* fix aggregate function in entity
* add some comments

---------

Co-authored-by: Lyn Nagara <lyn.nagara@gmail.com>
Co-authored-by: getsentry-bot <bot@sentry.io>
Co-authored-by: lynnagara <1779792+lynnagara@users.noreply.github.com>
Co-authored-by: Dalitso Banda <dalitso.banda@sentry.io>
getsentry · Nov 1, 2023 · 62f2192
1 parent ae43829
Showing 9 changed files with 651 additions and 4 deletions.
diff --git a/snuba/cli/devserver.py b/snuba/cli/devserver.py
@@ -228,6 +228,16 @@ def devserver(*, bootstrap: bool, workers: bool) -> None:
                     *COMMON_CONSUMER_DEV_OPTIONS,
                 ],
             ),
+            (
+                "generic-metrics-gauges-consumer",
+                [
+                    "snuba",
+                    "consumer",
+                    "--storage=generic_metrics_gauges_raw",
+                    "--consumer-group=snuba-gen-metrics-gauges-consumers",
+                    *COMMON_CONSUMER_DEV_OPTIONS,
+                ],
+            ),
         ]
         if settings.ENABLE_METRICS_SUBSCRIPTIONS:
             if settings.SEPARATE_SCHEDULER_EXECUTOR_SUBSCRIPTIONS_DEV:

diff --git a/snuba/datasets/configuration/generic_metrics/dataset.yaml b/snuba/datasets/configuration/generic_metrics/dataset.yaml
@@ -7,3 +7,4 @@ entities:
   - generic_metrics_distributions
   - generic_metrics_counters
   - generic_org_metrics_counters
+  - generic_metrics_gauges
diff --git a/snuba/datasets/configuration/generic_metrics/entities/gauges.yaml b/snuba/datasets/configuration/generic_metrics/entities/gauges.yaml
@@ -0,0 +1,255 @@
+version: v1
+kind: entity
+name: generic_metrics_gauges
+
+schema:
+  [
+    { name: org_id, type: UInt, args: { size: 64 } },
+    { name: project_id, type: UInt, args: { size: 64 } },
+    { name: metric_id, type: UInt, args: { size: 64 } },
+    { name: rounded_timestamp, type: DateTime },
+    { name: bucketed_time, type: DateTime },
+    {
+      name: tags,
+      type: Nested,
+      args:
+        {
+          subcolumns:
+            [
+              { name: key, type: UInt, args: { size: 64 } },
+              { name: value, type: UInt, args: { size: 64 } },
+            ],
+        },
+    },
+    {
+      name: min,
+      type: AggregateFunction,
+      args: { func: min, arg_types: [{ type: Float, args: { size: 64 } }] },
+    },
+    {
+      name: max,
+      type: AggregateFunction,
+      args: { func: max, arg_types: [{ type: Float, args: { size: 64 } }] },
+    },
+    {
+      name: sum,
+      type: AggregateFunction,
+      args: { func: sum, arg_types: [{ type: Float, args: { size: 64 } }] },
+    },
+    {
+      name: count,
+      type: AggregateFunction,
+      args: { func: count, arg_types: [{ type: UInt, args: { size: 64 } }] },
+    },
+    {
+      name: last,
+      type: AggregateFunction,
+      args: { func: argMax, arg_types: [{ type: Float, args: { size: 64 } }, { type: DateTime }] },
+    },
+  ]
+
+storages:
+  - storage: generic_metrics_gauges
+    translation_mappers:
+      functions:
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: min
+            to_name: minMerge
+            aggr_col_name: min
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: minIf
+            to_name: minMergeIf
+            aggr_col_name: min
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: max
+            to_name: maxMerge
+            aggr_col_name: max
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: maxIf
+            to_name: maxMergeIf
+            aggr_col_name: max
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: sum
+            to_name: sumMerge
+            aggr_col_name: sum
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: sumIf
+            to_name: sumMergeIf
+            aggr_col_name: sum
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: count
+            to_name: sumMerge
+            aggr_col_name: count
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: countIf
+            to_name: sumMergeIf
+            aggr_col_name: count
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: last
+            to_name: argMaxMerge
+            aggr_col_name: last
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: lastIf
+            to_name: argMaxMergeIf
+            aggr_col_name: last
+      subscriptables:
+        - mapper: SubscriptableMapper
+          args:
+            from_column_table:
+            from_column_name: tags_raw
+            to_nested_col_table:
+            to_nested_col_name: tags
+            value_subcolumn_name: raw_value
+        - mapper: SubscriptableMapper
+          args:
+            from_column_table:
+            from_column_name: tags
+            to_nested_col_table:
+            to_nested_col_name: tags
+            value_subcolumn_name: indexed_value
+  - storage: generic_metrics_gauges_raw
+    is_writable: true
+    translation_mappers:
+      functions:
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: min
+            to_name: minMerge
+            aggr_col_name: min
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: minIf
+            to_name: minMergeIf
+            aggr_col_name: min
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: max
+            to_name: maxMerge
+            aggr_col_name: max
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: maxIf
+            to_name: maxMergeIf
+            aggr_col_name: max
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: sum
+            to_name: sumMerge
+            aggr_col_name: sum
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: sumIf
+            to_name: sumMergeIf
+            aggr_col_name: sum
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: count
+            to_name: sumMerge
+            aggr_col_name: count
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: countIf
+            to_name: sumMergeIf
+            aggr_col_name: count
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: last
+            to_name: argMaxMerge
+            aggr_col_name: last
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: last
+            to_name: argMaxMerge
+            aggr_col_name: last
+        - mapper: AggregateFunctionMapper
+          args:
+            column_to_map: value
+            from_name: lastIf
+            to_name: argMaxMergeIf
+            aggr_col_name: last
+      subscriptables:
+        - mapper: SubscriptableMapper
+          args:
+            from_column_table:
+            from_column_name: tags_raw
+            to_nested_col_table:
+            to_nested_col_name: tags
+            value_subcolumn_name: raw_value
+        - mapper: SubscriptableMapper
+          args:
+            from_column_table:
+            from_column_name: tags
+            to_nested_col_table:
+            to_nested_col_name: tags
+            value_subcolumn_name: indexed_value
+
+storage_selector:
+  selector: SimpleQueryStorageSelector
+  args:
+    storage: generic_metrics_gauges
+
+query_processors:
+  - processor: TagsTypeTransformer
+  - processor: MappedGranularityProcessor
+    args:
+      accepted_granularities:
+        10: 0
+        60: 1
+        3600: 2
+        86400: 3
+      default_granularity: 1
+  - processor: TimeSeriesProcessor
+    args:
+      time_group_columns:
+        bucketed_time: rounded_timestamp
+      time_parse_columns: [rounded_timestamp]
+  - processor: ReferrerRateLimiterProcessor
+  - processor: OrganizationRateLimiterProcessor
+    args:
+      org_column: org_id
+  - processor: ProjectReferrerRateLimiter
+    args:
+      project_column: project_id
+  - processor: ProjectRateLimiterProcessor
+    args:
+      project_column: project_id
+  - processor: ResourceQuotaProcessor
+    args:
+      project_field: project_id
+
+validators:
+  - validator: EntityRequiredColumnValidator
+    args:
+      required_filter_columns: ["org_id", "project_id"]
+required_time_column: rounded_timestamp
+partition_key_column_name: org_id
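
The translation mappers and the granularity mapping in gauges.yaml above can be read as two small lookup tables. The following is a minimal Python sketch of that reading, not Snuba's actual implementation: `FunctionRule`, `translate_function` and `map_granularity` are hypothetical names, the query is modelled as a function name plus a list of argument strings, and raising on an unsupported granularity is an assumption made for illustration.

```python
from typing import NamedTuple, Sequence


class FunctionRule(NamedTuple):
    """One `functions` mapper entry from the YAML (hypothetical helper type)."""

    to_name: str
    aggr_col_name: str


# Every rule in the YAML uses `column_to_map: value`.
COLUMN_TO_MAP = "value"

# from_name -> (to_name, aggr_col_name), copied from the entity definition.
RULES = {
    "min": FunctionRule("minMerge", "min"),
    "minIf": FunctionRule("minMergeIf", "min"),
    "max": FunctionRule("maxMerge", "max"),
    "maxIf": FunctionRule("maxMergeIf", "max"),
    "sum": FunctionRule("sumMerge", "sum"),
    "sumIf": FunctionRule("sumMergeIf", "sum"),
    "count": FunctionRule("sumMerge", "count"),
    "countIf": FunctionRule("sumMergeIf", "count"),
    "last": FunctionRule("argMaxMerge", "last"),
    "lastIf": FunctionRule("argMaxMergeIf", "last"),
}

# `accepted_granularities` from the MappedGranularityProcessor args:
# requested granularity in seconds -> stored granularity value.
ACCEPTED_GRANULARITIES = {10: 0, 60: 1, 3600: 2, 86400: 3}


def translate_function(name: str, args: Sequence[str]) -> str:
    """Rewrite e.g. min(value) into minMerge(min); leave other calls untouched."""
    rule = RULES.get(name)
    if rule is None or not args or args[0] != COLUMN_TO_MAP:
        return f"{name}({', '.join(args)})"
    # Swap the logical column for the aggregate-state column, keep extra args
    # (such as the condition of an -If function) in place.
    new_args = [rule.aggr_col_name, *args[1:]]
    return f"{rule.to_name}({', '.join(new_args)})"


def map_granularity(seconds: int) -> int:
    """Look up the stored granularity value for a requested granularity."""
    if seconds not in ACCEPTED_GRANULARITIES:
        # Assumption: unsupported granularities are rejected.
        raise ValueError(f"unsupported granularity: {seconds}")
    return ACCEPTED_GRANULARITIES[seconds]


if __name__ == "__main__":
    print(translate_function("min", ["value"]))
    print(translate_function("countIf", ["value", "cond"]))
    print(map_granularity(3600))
```

Note that `count` and `countIf` map to `sumMerge` and `sumMergeIf`, which matches the "Sum aggregate function instead of count" item in the commit message.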