-
Notifications
You must be signed in to change notification settings - Fork 8.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EBT] Validate event&context payloads in dev-mode #135061
Conversation
// debug-logging before checking the opt-in status to help during development | ||
if (this.initContext.isDev) { | ||
// TODO: In the future we may need to validate the eventData based on the eventType's registered schema (only if isDev) | ||
this.initContext.logger.debug(`Report event "${eventType}"`); | ||
// If in the browser, log the event as such (to make it easier to debug in the browser's console). | ||
// It needs the casting because the logger types expect debug to be a string | ||
// If in the server, stringify to avoid printing [object Object] | ||
this.initContext.logger.debug( | ||
global.window ? (event as unknown as string) : JSON.stringify(event) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to the original issue: I took advantage of this PR to add a debug logger that will print the events. It will help developers to check how their events look like when they implement them :)
d15727a
to
4188021
Compare
4188021
to
2ab26cb
Compare
*/ | ||
type: TelemetryCounterType; | ||
type: keyof typeof TelemetryCounterType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list the potential values from the enum so users can set the values if they want to without importing the enum
. This reduces bundle size because there is no need to import the @kbn/analytics-client
runtime code only for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely better to not use an enum ihmo. but the proper way is to change the TelemetryCounterType
implementation to no longer be an enum instead of using the keyof typeof
trick
type TelemetryCounterType = 'enqueued' | ... | ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose this approach mostly for documentation purposes. The type
format is harder to have documentation for each potential value, while enum
seems more straightforward.
But happy to remove the enum
. I bet it will also save some bundle size 😇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in 51f6fb6
(#135061)
@@ -39,6 +40,7 @@ import { TelemetryCounterType } from '../events'; | |||
import { ShippersRegistry } from './shippers_registry'; | |||
import { OptInConfigService } from './opt_in_config'; | |||
import { ContextService } from './context_service'; | |||
import { schemaToIoTs, validateSchema } from '../schema/validation'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried importing this conditionally and via await import
(inside the constructor
, not a top-level import 😅) and it still loads it in the initial chunk of core
... Could it be that the current Bazel-based packaging system doesn't understand how to break packages into chunks when the dynamic import occurs inside a package?
cc @elastic/kibana-operations
In any case, I think we can apply this improvement to reduce bundle size in production in a follow up PR, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inside the constructor
How can you await
from within a non-async method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let potentiallyUndefinedAPI: typeof API | undefined;
...
import(path).then((api) => {
potentiallyUndefinedAPI = api;
});
Pinging @elastic/kibana-core (Team:Core) |
@elasticmachine merge upstream |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we don't want to enable validation and logging in prod? I see value in having failed validation log messages in prod but understand there might be other consequences, such as end users not being able to do anything about "buggy" events.
packages/analytics/client/src/analytics_client/analytics_client.ts
Outdated
Show resolved
Hide resolved
packages/analytics/client/src/analytics_client/analytics_client.ts
Outdated
Show resolved
Hide resolved
} | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean we return true
regardless of the outcome from validation? If that's the case, all we're getting from a failed validation is an error log message but we still send the event. Am I reading this right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind the return false;
inside the catch
in line 54 😇
|
||
import * as t from 'io-ts'; | ||
import type { RootSchema, SchemaArray, SchemaObject, SchemaValue } from '../types'; | ||
import { excess } from './excess'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how we're supposed to interpret excess
here? Just glancing at the file name doesn't give me an idea of what to expect. Is excess
extra stuff/extra validation/data overflow? I know I'm being pedantic and naming is hard I'm just trying to future-proof ourselves in three years when we can't remember what was supposed to be in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason I called it excess
is because that was the name folks in the io-ts
library referred to it in the original issue where I found the code I based my solution upon: gcanti/io-ts#322
Happy to rename it if you think it's worth it. What would that name be?
packages/analytics/client/README.md
Outdated
|
||
Apart from documentation, the schema is used to validate the payload during the dev-cycle. | ||
This way it adds an extra layer of confidence over the data to be sent. | ||
The validation, however, is disabled in production because users cannot do anything to fix the bug after it is released. Additionally, receiving _buggy_ events can be considered an additional insight into how our users use our products. For example: the buggy event can be caused by a user following an unexpected path in the UI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using validation for two different purposes depending on if we're in dev or prod mode. In dev, it helps us ensure we're sending payloads we expect but in prod it seems that we're using validation failures for something else.
Are we sure we want to use validation for two distinct purposes?
If debug logging isn't already enabled in a cluster and there are issues with that cluster, how are we going to know to enable debug logs for the analytics logger? Are there other symptoms that would make one enable debug logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention with this paragraph was to explain why we only want to validate events in dev mode and never in production.
In production, it's OK to get malformed events because that highlights an unexpected bug in our logic or a corner case we didn't account for and we should allow the event through so we know about it. A good example is the file upload you fixed a couple of months ago: if we had EBT back then, we'd report the filename and import options. The developer tests didn't try hitting the import button without selecting a file, so the EBT validation in dev-mode would have accepted filename as a string. However, in production, getting filename
as undefined or null
, would have highlighted the issue to us via telemetry. If we validate and skip the events in production, we miss those insights.
I'll see if I can reword it to make it clearer because it's obvious it's not 😇
packages/analytics/client/src/analytics_client/analytics_client.test.ts
Outdated
Show resolved
Hide resolved
context as Record<string, unknown> | ||
); | ||
} catch (validationError) { | ||
this.logger.error(validationError); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not going to see these log messages in prod mode. Is that what we want? From the readme, we say that receiving a "buggy" event lets us interpret what happened and we can take that further by having access to these logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The agreement in the design phase was not to run validations in production because:
- We don't want the additional noise in the logs
- We don't want to add the validation overhead to production environments
- Invalid events are also a form of telemetry (they tell us there's an edge case we haven't considered in our testing).
- We don't collect browser logs, so even if we log them, we'd only benefit from those buggy events generated in the server and on Cloud. I don't think it's worth the overhead.
What do you think?
*/ | ||
type: TelemetryCounterType; | ||
type: keyof typeof TelemetryCounterType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely better to not use an enum ihmo. but the proper way is to change the TelemetryCounterType
implementation to no longer be an enum instead of using the keyof typeof
trick
type TelemetryCounterType = 'enqueued' | ... | ...
@@ -39,6 +40,7 @@ import { TelemetryCounterType } from '../events'; | |||
import { ShippersRegistry } from './shippers_registry'; | |||
import { OptInConfigService } from './opt_in_config'; | |||
import { ContextService } from './context_service'; | |||
import { schemaToIoTs, validateSchema } from '../schema/validation'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inside the constructor
How can you await
from within a non-async method?
packages/analytics/client/src/analytics_client/analytics_client.ts
Outdated
Show resolved
Hide resolved
context$.next(undefined); | ||
expect(logger.error).toHaveBeenCalledTimes(1); | ||
expect((logger.error.mock.calls[0][0] as Error).message).toContain( | ||
`Failed to validate payload coming from "Context Provider 'contextProviderA'"` | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ihmo this is bad testing isolation. The client tests should mock validateSchema
and only asserts that the function is properly called and that the system behaves correctly depending on the validateSchema
's response/behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move these tests to a new context_service.test.ts
file. I think they make more sense there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 25c8014
(#135061)
export const excess = <C extends t.HasProps>( | ||
codec: C, | ||
name: string = getExcessTypeName(codec) | ||
): ExcessType<C> => { | ||
const props: t.Props = getProps(codec); | ||
return new ExcessType<C>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: Are we sure there isn't a 3rd party lib for this?
If there's not, this should probably go to a kbn-io-ts-utils
package of something (also I think other teams already have such excess
implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: Are we sure there isn't a 3rd party lib for this?
I couldn't find any 😢
If there's not, this should probably go to a
kbn-io-ts-utils
package of something (also I think other teams already have suchexcess
implementation
We can't move it at the moment because that library imports @kbn/config-schema
and we could leak joi
to the browser 😞
@elasticmachine merge upstream |
e34085d
to
16dcd7e
Compare
16dcd7e
to
51f6fb6
Compare
@elasticmachine merge upstream |
💛 Build succeeded, but was flakyFailed CI StepsMetrics [docs]Module Count
Async chunks
Page load bundle
History
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@TinaHeiligers I think I addressed your comments, so I'll go ahead and merge so this task doesn't fall to our next sprint as incomplete. Please, let me know if I missed anything and I'll fix it in a follow-up PR :) |
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Summary
This PR uses the defined schemas in the event types' and context providers' registration steps to validate the actual payloads sent.
The validation only applies when running Kibana in dev-mode to highlight any errors developers may incur when iterating over the structure of the event and failing to keep the schema up-to-date. In production, the validation is disabled because the users can't do anything to fix it once a potential bug is released.
Resolves #132250.
Checklist
Delete any items that are not applicable to this PR.
For maintainers