feat: add support for dropping late arrival data #306
Codecov Report
@@             Coverage Diff              @@
##               main     #306      +/-   ##
============================================
+ Coverage     79.70%   79.77%   +0.07%
- Complexity     1297     1302       +5
============================================
  Files           118      118
  Lines          5159     5177      +18
  Branches        467      469       +2
============================================
+ Hits           4112     4130      +18
  Misses          837      837
  Partials        210      210
private Duration configureLateArrivalThreshold(Config jobConfig) {
  Duration configuredThreshold = jobConfig.getDuration(LATE_ARRIVAL_THRESHOLD_CONFIG_KEY);
  Duration minThreshold = Duration.of(30, ChronoUnit.SECONDS);
This was suggested by @laxman-traceable. I think he has suggestions for a max value too, but I have left that open for now.
Duration spanArrivalDelay =
    Duration.of(Math.abs(spanProcessedTime - spanStartTime), ChronoUnit.MILLIS);

if (spanStartTime > 0 && spanArrivalDelay.compareTo(lateArrivalThresholdDuration) > 0) {
The condition spanStartTime > 0 is required because the default proto value for this field is 0. Spans where the start time is not set are therefore treated as valid, for backward compatibility.
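The check discussed in this thread can be sketched in isolation as follows. This is only a minimal illustration: the class name LateArrivalCheck, the method isLateArrival, and its parameter names are hypothetical and are not taken from the PR. It assumes both timestamps are epoch milliseconds.

```java
import java.time.Duration;

final class LateArrivalCheck {
  // Hypothetical helper; name and signature are illustrative, not the PR's actual code.
  static boolean isLateArrival(
      long spanStartTimeMillis, long spanProcessedTimeMillis, Duration threshold) {
    // 0 is the proto default for an unset start time: keep such spans
    // for backward compatibility.
    if (spanStartTimeMillis <= 0) {
      return false;
    }
    // Absolute difference, mirroring the Math.abs in the diff, so a start time
    // far in the future is flagged as well as one far in the past.
    Duration arrivalDelay =
        Duration.ofMillis(Math.abs(spanProcessedTimeMillis - spanStartTimeMillis));
    // Strictly greater than the threshold means the span is late.
    return arrivalDelay.compareTo(threshold) > 0;
  }
}
```

With a 5 minute threshold, a span whose start time is 6 minutes before the processing time is reported as late, while a span with start time 0 is always kept.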
  String tenantId = "tenant-" + random.nextLong();
  Map<String, Object> configs = new HashMap<>(getCommonConfig());
- configs.putAll(Map.of("processor", Map.of()));
+ configs.putAll(Map.of("processor", Map.of("late.arrival.threshold.duration", "1d")));
Since the late.arrival.threshold.duration config is mandatory, all existing tests need to be updated to set it.
private Duration configureLateArrivalThreshold(Config jobConfig) {
  Duration configuredThreshold = jobConfig.getDuration(LATE_ARRIVAL_THRESHOLD_CONFIG_KEY);
  Duration minThreshold = Duration.of(30, ChronoUnit.SECONDS);
  if (minThreshold.compareTo(configuredThreshold) > 0) {
Is this a mandatory config?
Yes, @laxman-traceable suggested that.
private Duration configureLateArrivalThreshold(Config jobConfig) {
  Duration configuredThreshold = jobConfig.getDuration(LATE_ARRIVAL_THRESHOLD_CONFIG_KEY);
  Duration minThreshold = Duration.of(30, ChronoUnit.SECONDS);
nit: this can be defined as a constant: private static final Duration minLateArrivalThreshold = Duration.ofSeconds(30);
Done.
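For reference, the minimum-threshold validation discussed above might look roughly like the sketch below. The diff shown in this conversation is cut off after the comparison, so what happens when the configured value is too small is an assumption here: this sketch rejects it with an exception, whereas the actual PR may fall back to the minimum or do something else. The class name, method name, and constant name are hypothetical.

```java
import java.time.Duration;

final class LateArrivalThresholdValidator {
  // Constant form suggested in the review; the upper-case name is illustrative.
  static final Duration MIN_LATE_ARRIVAL_THRESHOLD = Duration.ofSeconds(30);

  // Assumption: values below the minimum are rejected. The PR's real behavior
  // is not visible in the truncated diff and may differ.
  static Duration validate(Duration configuredThreshold) {
    if (MIN_LATE_ARRIVAL_THRESHOLD.compareTo(configuredThreshold) > 0) {
      throw new IllegalArgumentException(
          "late arrival threshold must be at least "
              + MIN_LATE_ARRIVAL_THRESHOLD
              + ", got "
              + configuredThreshold);
    }
    return configuredThreshold;
  }
}
```

A value such as 1d, as used in the updated tests, passes this check unchanged, while a value such as 10s would be refused.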
Description
Data received by the platform can sometimes arrive late. This can cause problems in underlying stores such as Pinot.
In such cases, we need a configuration to drop late-arriving data.
As part of this PR, we are adding the configuration below to the span-normalizer component.
The above config specifies that if the difference between the span's start_time_millis and the time the span is received at span-normalizer is greater than 5m, the span will be dropped.
Testing
Checklist: