  • Start Date: 2022-10-05
  • RFC Type: feature
  • RFC PR: #25

Summary

In this RFC the Quality Engineering team introduces the SDK Payload Tester (referred to as "Payload Tester" in the rest of this document) -- a new testing service to ensure better validation and integration testing of payloads produced by Sentry SDKs.

Motivation

Sentry develops and maintains SDKs for many different languages and frameworks. Every SDK has a test suite consisting of dozens (or hundreds) of tests that make sure the SDK’s functionality remains stable and no regressions are introduced when new changes are made. A big portion of those tests are unit tests -- low-level tests that make sure that specific functions or modules of the SDK operate as they should. Unit tests, however, cannot always catch higher-level kinds of bugs and errors, leading to situations where the test suite passes, but the produced event payload is not accepted or correctly parsed by Sentry ingestion (namely, Relay). Integration tests -- the kind of tests that are normally meant to identify those issues -- are not present in all SDK repositories, and vary in their setup and efficiency.

Some examples of cases that are not usually caught by unit tests, but should be caught by integration tests:

  • The produced payload is not a valid Envelope or JSON object
  • The SDK produces a valid payload body, but some HTTP headers are missing or incorrect
  • An invalid or unsupported compression method is used by the SDK

In order to ensure consistency in feature implementation and to facilitate integration testing, we propose a suite of components that can be used for testing payloads produced by Sentry SDKs, thereby setting a common integration-level functionality baseline for all new and existing SDKs.

Requirements

  • The approach should be mostly SDK-agnostic.
  • Payload data to be checked/validated by the tester: the entire HTTP request.
  • We want to be clear and consistent in what we expect from every SDK: what kind of event type should be produced, where it should be sent, and what errors a given payload generated.
  • It should be easy to add new validation criteria/stages to the system.
  • The selected approach should be easily automated.

Recommended Approach: Multi-Stage Payload Validation Suite

Payload validation will consist of a series of tests/validations, all of which must pass for the payload to be marked as correct.
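
As a rough illustration of this flow, the sketch below shows how stages could be modeled and aggregated. All names (`Stage`, `StageResult`, `run_stages`) are hypothetical, not an existing API:

```python
# Minimal sketch of the multi-stage validation flow (hypothetical names).
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StageResult:
    stage: str
    ok: bool
    error: str | None = None


# A stage takes the raw captured request (headers + body) and
# returns a StageResult describing its outcome.
Stage = Callable[[dict], StageResult]


def run_stages(request: dict, stages: List[Stage]) -> List[StageResult]:
    """Run every stage; the payload is correct only if all stages pass."""
    return [stage(request) for stage in stages]


def overall_ok(results: List[StageResult]) -> bool:
    return all(result.ok for result in results)
```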

Components

The proposed Payload Tester consists of the following components:

  • Ingestion endpoint

    This endpoint -- implementing a minimal Relay ingestion API -- is a simple mock service that can be targeted by SDKs to send events. It should be able to capture the "raw" payload as sent by the SDK, i.e. without any changes to the headers or the body of the event. (A sketch of such an endpoint is given after this list.)

  • Payload verifier

    The payload verifier is the main component of the Payload Tester; it verifies various "features" of the event payload. It contains a list of test stages that check various parts of a message.

    Stages might include:

    • Stage that performs the (JSON) schema validation
    • Stage that does partial snapshotting of the given event part
    • Stage that sends the payload to a real Relay to make sure that no errors are triggered, using Relay's capture mode to check for errors

    Every stage will return an OK/Error response; if all stages pass, the overall OK status is reported.

  • Test report generator

    The report generator is a component that works in collaboration with the Payload Tester and reports a summary of a test session. The summary contains information about which facilities were tested and which tests failed. Looking at the report, one can tell how much of the SDK is verified by the Payload Tester and which parts fail.

  • Echo service for generic integration testing

    This is how the Payload Tester handles a message that doesn't specify a test ID. It can be used in various integration tests to check different parts of the message, including the HTTP headers, the envelope headers, the envelope item headers, and the item bodies. Items with known JSON bodies are parsed and returned as JSON objects. Various variants of this are already used in SDK integration tests (e.g. integration-test-server.py), but this allows a single version, with whatever bells and whistles we wish to add, available to all SDK developers.
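
To make the ingestion endpoint and echo service more concrete, here is a minimal sketch assuming a Flask app serving Sentry's `/api/<project_id>/envelope/` route; all handler details are illustrative, not a finished implementation:

```python
# Sketch of a mock ingestion/echo endpoint (illustrative only).
import json

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/api/<int:project_id>/envelope/", methods=["POST"])
def capture_envelope(project_id):
    # Capture the raw payload exactly as the SDK sent it: headers and
    # body are kept untouched so later stages can inspect them.
    # (A real implementation would also handle compressed bodies.)
    raw_body = request.get_data()

    # An envelope is newline-delimited: one JSON envelope header line,
    # followed by pairs of JSON item-header lines and item payloads.
    lines = raw_body.decode("utf-8", errors="replace").splitlines()
    envelope_header = json.loads(lines[0]) if lines else {}

    # Echo everything back so integration tests can assert on it.
    return jsonify({
        "http_headers": dict(request.headers),
        "envelope_header": envelope_header,
        "raw_item_lines": lines[1:],
    })
```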

Basic architecture:

[Architecture diagram]

In order to implement the more "in-depth" stages of the payload verifier, a set of helper utilities will be written that will facilitate common tasks. Among the tools we are considering are:

  • Creating a simplified specification for writing rules that can be checked against the body of a message (using something like JMESPath). These rules can then be added in a test to quickly check particular parts of the message (see the sketch after this list).
  • Utilities to extract and query elements inside envelopes
  • Utilities to validate interrelated timestamps
  • Common utility functions for simplifying access to well known parts of the event (e.g. breadcrumb iterators, context accessors, span iterators, etc).
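
As an illustration of the rule specification and the timestamp utility mentioned above, here is a minimal sketch using the `jmespath` package; the `RULES` structure and every expression in it are invented for illustration:

```python
# Sketch of simple JMESPath-based rules checked against an event body.
import jmespath

# Each rule is a (description, JMESPath expression, predicate) triple.
RULES = [
    ("event has an event_id",
     "event_id",
     lambda v: isinstance(v, str) and len(v) == 32),
    ("sdk block reports a name and a version",
     "[sdk.name, sdk.version]",
     lambda v: all(isinstance(x, str) for x in v)),
]


def check_rules(event: dict) -> list:
    """Return (description, offending value) pairs for every failed rule."""
    failures = []
    for description, expression, predicate in RULES:
        value = jmespath.search(expression, event)
        if not predicate(value):
            failures.append((description, value))
    return failures


def check_timestamps(event: dict) -> bool:
    """Interrelated timestamps: an event must not end before it starts."""
    start, end = event.get("start_timestamp"), event.get("timestamp")
    return start is None or end is None or start <= end
```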

Pros:

  • Custom tests have the advantage that they can be very specific to the particular task they are designed to cover.
  • The multi-stage approach will allow adding new validation methods without changing the overall architecture.

Cons:

  • Writing and maintaining custom utilities will require more work than using "naive" approaches presented below.

Options Considered

Some alternative options are presented in this section.

Implementation language/platform

The QE team uses Python and Go as the primary languages for developing our internal tooling.

Option 1A: Python

Option 1B: Golang

The choice between the two is normally determined by the following concerns:

  1. Availability of libraries that support the development of the tool

    For the Payload Tester there are good options in both Python and Go, since it only needs a good framework for REST services and good JSON handling.

  2. Desired performance for the tool

    Performance is not the biggest issue for the Payload Tester.

From the point of view of ease of use and deployment on the testing infrastructure, we will use Docker/Kubernetes, so the Go and Python solutions have similar complexity.

Given the above, we propose to implement the Payload Tester in Python since it is the easiest solution.

Implementation approaches for payload testing

Option 2A: Naive Snapshots

The idea is to create a battery of event snapshots and test all SDKs against them.

The workflow could look as follows: emit events, manually verify that they are correct, and then use them as the “source of truth” by comparing future events generated by SDKs to the saved snapshots.
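
For illustration, a naive snapshot comparison could look like the sketch below; the volatile-field list and the function names are hypothetical:

```python
# Naive snapshot comparison sketch (illustrative only).

# Fields that differ between otherwise-identical events and must be
# stripped before comparison (see the cons below).
VOLATILE_FIELDS = {"event_id", "timestamp", "start_timestamp",
                   "span_id", "trace_id"}


def strip_volatile(obj):
    """Recursively drop fields that are unique per event."""
    if isinstance(obj, dict):
        return {key: strip_volatile(value)
                for key, value in obj.items()
                if key not in VOLATILE_FIELDS}
    if isinstance(obj, list):
        return [strip_volatile(value) for value in obj]
    return obj


def matches_snapshot(event: dict, snapshot: dict) -> bool:
    return strip_volatile(event) == strip_volatile(snapshot)
```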

Pros:

  • The process of adding new test cases (snapshots) is easy and fast, or at least can be simplified with proper tooling. Take Jest’s approach, for example.
  • Tests that fail due to various changes in the message generation (format, new items...) can be instantaneously fixed by accepting the changes (which sets the newly generated message as the reference snapshot and fixes the failing test).

Cons:

  • Events contain a lot of information that is generated once and is not constant between events, even when the events are constructed by the same API under the same conditions. The list of non-constant elements includes timestamps (we have a lot of them) and single-use IDs (like the event ID, span IDs, etc.). This information would have to be stripped from both the snapshot and the events; since various events create unique entries in various places, a generic stripping method is complicated. Trying to do this negates one of the major advantages of snapshots, i.e. easy creation of snapshots and tests.
  • Naive snapshot testing is not DRY. A snapshot test will not distinguish between relevant and irrelevant parts of a message; it will blindly compare the whole message, leading to duplicated checks. This causes the maintenance and diagnostic problems detailed below.
    • Maintaining lots of snapshots is tough: adding an attribute to the message, for example, would break a lot of snapshot tests, potentially all of them. This becomes a nightmare to maintain and is very error prone; when there are a lot of broken tests, people tend to regenerate all snapshots without paying much attention to each individual difference in each broken test.
  • Snapshot testing does not always point directly to the problem. Since snapshot tests compare the whole message, a failed test does not necessarily indicate a problem in a particular area (say, breadcrumbs) of the event; it just says that something has changed.
  • Common snapshots cannot be used across different SDKs. There are many valid ways to create semantically identical messages: two SDKs may legitimately construct slightly different payloads for the same type of message, for example by adding an SDK-specific version or SDK-specific attributes. Therefore, using the same snapshot for testing all SDKs is not practical, and using different snapshots for different SDKs negates one of the main benefits of the Payload Tester: having a common set of tests for all platforms.

Option 2B: JSON schema

Idea: we provide JSON schemas for the resulting events produced from selected test cases.
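
A minimal sketch of such a validation stage, using the Python `jsonschema` package; the schema shown is a deliberately tiny stand-in for a real event schema:

```python
# JSON schema validation sketch using the `jsonschema` package.
from jsonschema import ValidationError, validate

# A deliberately tiny schema; a real one would describe the full event.
EVENT_SCHEMA = {
    "type": "object",
    "properties": {
        "event_id": {"type": "string"},
        "platform": {"type": "string"},
    },
    "required": ["event_id"],
}


def validate_event(event: dict) -> str | None:
    """Return None if the event matches the schema, else the error message."""
    try:
        validate(instance=event, schema=EVENT_SCHEMA)
        return None
    except ValidationError as exc:
        return exc.message
```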

Pros:

  • Testing event shape is a good way to catch a certain range of errors: namely, when a change to the SDK suddenly changes the shape of well-known events.
  • There are a lot of existing validators written in all sorts of languages: https://json-schema.org/implementations.html

Cons:

  • The event format currently used by Sentry -- the envelope -- is not pure JSON. The whole envelope also needs to be tested, and that falls outside the abilities of JSON schema.
  • JSON schema can test structure but not content. Checking that the structure of a JSON message is correct is important but not sufficient; in many cases, checking the content itself is also important.
  • JSON schema can’t be used to check for optional fields that should be included in a particular test case (unless we start inventing schemas specific to particular tests).
    • If an object is specified (in a JSON schema) as having field “a” as a number and field “b” as an optional string, then both { “a”: 1, “b”: ”x” } and { "a": 1 } validate as correct messages. If we want to make sure that a particular test sets the “b” field, validating the object against the schema will not tell us anything. Since most of our fields are optional, schema validation doesn’t say a lot. We can, of course, devise another schema (applied only to that particular test) in which “b” is non-optional, but that would mean ending up with multiple test-specific schemas that have to be maintained. At that point it would be much easier to just look at the message for the particular field (see the sketch after this list).
  • JSON schemas also tend to suffer from the same generality problem as snapshots: they look at the whole message.
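
To make the optional-field point above concrete, here is a minimal sketch (schema and values invented for illustration):

```python
# Both objects validate against the same schema, so the schema alone
# cannot tell whether a test actually set the optional "b" field.
from jsonschema import validate

SCHEMA = {
    "type": "object",
    "properties": {"a": {"type": "number"}, "b": {"type": "string"}},
    "required": ["a"],
}

validate(instance={"a": 1, "b": "x"}, schema=SCHEMA)  # passes
validate(instance={"a": 1}, schema=SCHEMA)            # also passes
```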

As a tool, JSON schema can be useful and might be employed as one of the validation layers, but it is not sufficient on its own.

Drawbacks

The SDK folks will have to take an active part in implementing the tests and syncing with the QE team to align expectations and deliverables.

Unresolved questions

  • Will the approach require a lot of boilerplate code to be written/generated for every SDK?
  • Does Relay's capture mode have enough capabilities to be useful?