-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(processor): enhance reports to hold transformation and tracking plan metrics #3138
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #3138 +/- ##
==========================================
+ Coverage 51.26% 51.33% +0.07%
==========================================
Files 311 311
Lines 52463 52637 +174
==========================================
+ Hits 26893 27022 +129
- Misses 23943 23979 +36
- Partials 1627 1636 +9
... and 4 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with minor comments
utils/types/reporting_types.go
Outdated
TransformationId string `json:"transformationId"` | ||
TransformationVersionId string `json:"transformationVersionId"` | ||
TrackingPlanId string `json:"trackingPlanId"` | ||
TrackingPlanVersion int `json:"trackingPlanVersion"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] to keep with the idiomatic naming of go Id
-> ID
:
TransformationId string `json:"transformationId"` | |
TransformationVersionId string `json:"transformationVersionId"` | |
TrackingPlanId string `json:"trackingPlanId"` | |
TrackingPlanVersion int `json:"trackingPlanVersion"` | |
TransformationID string `json:"transformationId"` | |
TransformationVersionID string `json:"transformationVersionId"` | |
TrackingPlanID string `json:"trackingPlanId"` | |
TrackingPlanVersion int `json:"trackingPlanVersion"` |
I would propose the same for:
SourceDefinitionId string `json:"sourceDefinitionId"`
DestinationDefinitionId string `string:"destinationDefinitionId"`
SourceCategory string `json:"sourceCategory"`
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
converted Id to ID format for newly added properties.
@@ -99,7 +113,7 @@ type PUReportedMetric struct { | |||
StatusDetail *StatusDetail | |||
} | |||
|
|||
func CreateConnectionDetail(sid, did, strid, sjid, sjrid, sdid, ddid, sc string) *ConnectionDetails { | |||
func CreateConnectionDetail(sid, did, strid, sjid, sjrid, sdid, ddid, sc, trid, trvid, tpid string, tpv int) *ConnectionDetails { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest against using this function and directly define: ConnectionDetails
.
Having functions with more than 4 arguments is considered a bad practice. When dealing with many arguments is hard to understand which parameter has much with which arguments—making writing and reading harder. The struct, where you can provide the name of the fields is a better option.
utils/types/reporting_types.go
Outdated
SUCCEEDED = "succeeded" | ||
SUCCEEDED_WITHOUT_VIOLATIONS = "succeeded_without_violations" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the difference between succeeded
and succeeded_without_violations
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In trackingplan stage, there will be no status SUCCEEDED unlike other stages.
SUCCEEDED
is split into SUCCEEDED_WITHOUT_VIOLATIONS
+ SUCCEEDED_WITH_VIOLATIONS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we skip SUCCEEDED_WITHOUT_VIOLATIONS
? If it succeeded, it succeeded. If there are violations, I would also skip the SUCCEEDED
part and use something more straightforward, e.g. VIOLATIONS_FOUND
, since what you are interested in is mostly that violations were found, not that we are ignoring these violations by considering the events as succeeded and letting them flow further in the pipeline. This, from what I understand is about to change in the future anyway, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All events sent to tracking plan stage can be categorized into 3 disjoint sets SUCCEEDED_WITHOUT_VIOLATIONS
+ SUCCEEDED_WITH_VIOLATIONS
+ ABORTED (due to violations)
. User is more interested in VIOLATIONS_FOUND (SUCCEEDED_WITH_VIOLATIONS+ ABORTED)
and ABORTED
.Also User might ask for only succeeded without violations. In view of this, 3 disjoint status data are sent to reporting, graphs shown in UI can be constructed based on user requirements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Letting events succeed with violations is a configuration setting given to user. User might this setting always without dropping events.
messages = append(messages, eventsByMessageID[userTransformedEvent.Metadata.MessageID].SingularEvent) | ||
} | ||
|
||
for _, message := range messages { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why are we repeating this for every message? The only thing the message affects is the sample payload. Would it be more efficient to increment counters in bulk and use a random sample?
@@ -1289,10 +1370,10 @@ func (proc *Handle) processJobsForDest(partition string, subJobs subJob, parsedE | |||
inCountMap := make(map[string]int64) | |||
inCountMetadataMap := make(map[string]MetricMetadata) | |||
connectionDetailsMap := make(map[string]*types.ConnectionDetails) | |||
statusDetailsMap := make(map[string]*types.StatusDetail) | |||
statusDetailsMap := make(map[string]map[string]*types.StatusDetail) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of key does this second map level holds? I find it hard to understand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second map holds this key:fmt.Sprintf("%s:%d:%s:%s:%s", status, event.StatusCode, eventName, eventType, violationErrorType)
For each violationType, one report needs to be sent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also appear to be capturing another, aggregate report too for all types :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, complexity is ever growing.
I think we should plan for complete refactor of reporting metrics
utils/types/reporting_types.go
Outdated
SUCCEEDED = "succeeded" | ||
SUCCEEDED_WITHOUT_VIOLATIONS = "succeeded_without_violations" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we skip SUCCEEDED_WITHOUT_VIOLATIONS
? If it succeeded, it succeeded. If there are violations, I would also skip the SUCCEEDED
part and use something more straightforward, e.g. VIOLATIONS_FOUND
, since what you are interested in is mostly that violations were found, not that we are ignoring these violations by considering the events as succeeded and letting them flow further in the pipeline. This, from what I understand is about to change in the future anyway, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collection of reporting metrics through metric maps was already a complicated affair. Unfortunately, now it appears to be even more fragile, complex and hard to follow, comprehend or reason about.
With the current design, I fear that every time there is a need to capture even more granular statistics, the complexity will grow exponentially.
@@ -1289,10 +1370,10 @@ func (proc *Handle) processJobsForDest(partition string, subJobs subJob, parsedE | |||
inCountMap := make(map[string]int64) | |||
inCountMetadataMap := make(map[string]MetricMetadata) | |||
connectionDetailsMap := make(map[string]*types.ConnectionDetails) | |||
statusDetailsMap := make(map[string]*types.StatusDetail) | |||
statusDetailsMap := make(map[string]map[string]*types.StatusDetail) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also appear to be capturing another, aggregate report too for all types :/
Description
Notion Ticket
https://www.notion.so/rudderstacks/Send-violation-information-tp-id-version-violation-message-payload-c213bd096b1744189ce570d33026cdd2
https://www.notion.so/rudderstacks/Send-Transformation-Id-information-in-report-metrics-912964eb39144675b8905639e471c8a8?pvs=4
Security