
Apply a fixed window before writing row metrics #590

davidheryanto
Collaborator

What this PR does / why we need it:
Apply a fixed window and send aggregated Feature Row metrics once per window, instead of sending metrics for every single Feature Row. This prevents the metrics collector from being overwhelmed and dropping metrics.
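The windowing idea can be sketched in plain Python. This only illustrates fixed (tumbling) window bucketing and is not Feast's code: the ingestion job is an Apache Beam pipeline and presumably relies on Beam's own fixed windows. The 30-second `WINDOW_SIZE_MS`, the `(timestamp_ms, lag_ms)` row shape, and the function names are hypothetical choices for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical window length for illustration; not taken from the PR.
WINDOW_SIZE_MS = 30_000


def window_start(timestamp_ms: int, size_ms: int = WINDOW_SIZE_MS) -> int:
    """Start of the fixed (tumbling, non-overlapping) window containing timestamp_ms."""
    return timestamp_ms - timestamp_ms % size_ms


def bucket_by_window(
    rows: Iterable[Tuple[int, int]], size_ms: int = WINDOW_SIZE_MS
) -> Dict[int, List[int]]:
    """Group (timestamp_ms, lag_ms) pairs by the fixed window their timestamp falls in."""
    windows: Dict[int, List[int]] = defaultdict(list)
    for timestamp_ms, lag_ms in rows:
        windows[window_start(timestamp_ms, size_ms)].append(lag_ms)
    return dict(windows)


# 10,000 rows spread over 60 seconds collapse into two 30-second windows.
rows = [(i * 6, 100) for i in range(10_000)]
buckets = bucket_by_window(rows)
print(sorted(buckets))  # [0, 30000]
```

Each bucket is then reduced to a fixed set of aggregates, so the number of StatsD writes depends on how many windows elapse rather than on how many Feature Rows are ingested.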

Which issue(s) this PR fixes:

Fixes #528

Does this PR introduce a user-facing change?:
If Telegraf is used to export the StatsD metrics as Prometheus metrics, the names of the generated Prometheus metrics have changed:

- feast_ingestion_feature_row_lag_ms_90_percentile -> feast_ingestion_feature_row_lag_ms_percentile_90
- feast_ingestion_feature_row_lag_ms_99_percentile -> feast_ingestion_feature_row_lag_ms_percentile_99
...

This makes the names consistent with the feature value metrics, e.g. feature_value_percentile_90 and feature_value_percentile_99, i.e. percentile_x rather than x_percentile.
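Because the rename follows one mechanical rule, a small helper can map old names to new ones when updating Prometheus queries or dashboards. `to_new_metric_name` below is a hypothetical illustration; it is not part of Feast or Telegraf.

```python
import re

# Matches a trailing "_<digits>_percentile" suffix, e.g. "_90_percentile".
_OLD_PERCENTILE_SUFFIX = re.compile(r"_(\d+)_percentile$")


def to_new_metric_name(old_name: str) -> str:
    """Rewrite '..._<x>_percentile' as '..._percentile_<x>'; other names pass through unchanged."""
    return _OLD_PERCENTILE_SUFFIX.sub(r"_percentile_\1", old_name)


print(to_new_metric_name("feast_ingestion_feature_row_lag_ms_90_percentile"))
# feast_ingestion_feature_row_lag_ms_percentile_90
```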

The feature_row_event_time_epoch_ms metric is no longer written to StatsD: in our experience it is rarely used, and the lag metrics seem to suffice. Dropping it also reduces the number of metrics sent.

In summary, these are the Feature Row StatsD metrics written for every fixed window:

Gauge:

  • feature_row_lag_ms_min
  • feature_row_lag_ms_max
  • feature_row_lag_ms_mean
  • feature_row_lag_ms_percentile_90
  • feature_row_lag_ms_percentile_95
  • feature_row_lag_ms_percentile_99
  • feature_value_lag_ms_min
  • feature_value_lag_ms_max
  • feature_value_lag_ms_mean
  • feature_value_lag_ms_percentile_90
  • feature_value_lag_ms_percentile_95
  • feature_value_lag_ms_percentile_99

Count:

  • feature_row_ingested_count
  • feature_value_missing_count
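A minimal sketch of how one window's Feature Row lag values could be reduced to the metrics listed above. The function names, the nearest-rank percentile definition, and passing `missing_values` in as a precomputed number are assumptions for illustration; the PR does not state which percentile estimator is used. The per-feature `feature_value_lag_ms_*` gauges would presumably follow the same pattern.

```python
from typing import Dict, List


def percentile(sorted_values: List[int], p: int) -> int:
    """Nearest-rank percentile; one of several common definitions, assumed here."""
    if not sorted_values:
        raise ValueError("percentile of an empty window")
    rank = (p * len(sorted_values) + 99) // 100  # ceil(p / 100 * n) using integer math
    return sorted_values[max(rank, 1) - 1]


def window_metrics(lags_ms: List[int], missing_values: int) -> Dict[str, float]:
    """Summarise one window's Feature Row lags into gauges plus the two counts."""
    values = sorted(lags_ms)
    if not values:
        raise ValueError("no rows in this window")
    metrics: Dict[str, float] = {
        "feature_row_lag_ms_min": values[0],
        "feature_row_lag_ms_max": values[-1],
        "feature_row_lag_ms_mean": sum(values) / len(values),
    }
    for p in (90, 95, 99):
        metrics[f"feature_row_lag_ms_percentile_{p}"] = percentile(values, p)
    metrics["feature_row_ingested_count"] = len(values)
    # Hypothetical: the missing-value count is supplied by the caller as-is.
    metrics["feature_value_missing_count"] = missing_values
    return metrics
```

Sending a fixed set of summary values per window keeps the metric volume bounded, at the cost of losing per-row detail inside each window.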

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidheryanto

@woop
Member

woop commented Mar 31, 2020

/lgtm

Development

Successfully merging this pull request may close these issues.

Prevent Beam jobs from overloading StatsD
3 participants