When on-disk and on-wire formats change as part of an upgrade, we are fairly disciplined about writing unit tests that verify the immediately preceding version of the format is handled correctly (e.g. it is read and decoded successfully). As recently pointed out by @jcsp, this is inherently error prone because the code used to test backwards compatibility is the same code that generates the test data.
Because of this we often pair the unit tests with a manual process in which we simulate an upgrade on top of a data directory that has been populated by an older version of redpanda via some sort of workload that targets the particular data format being changed. Being a manual process, this is not only error prone but also time-consuming and tedious.
What we need is tooling and processes that let us test compatibility (1) of data artifacts derived from a specific version of the code base and (2) across versions according to a support policy.
Data source
Where we get data is important because it establishes a source of truth. Generally we will generate test data as part of the normal release process and store this data alongside metadata such as the date, version, and any other salient inputs to the process. The data itself can be:

- Generated synthetically (e.g. via something like my_on_disk_state::generate_test_case())
- Scraped off disk and from the wire as a side effect of running a workload
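As a rough sketch of the synthetic path (the `my_on_disk_state` type, its toy encoding, and the output file names here are hypothetical stand-ins for the real redpanda types and encoders), a generator could emit the encoded bytes alongside the metadata captured at release time:

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a real on-disk structure; the real type and its
// encoder live in the redpanda tree.
struct my_on_disk_state {
    int32_t version;
    std::string payload;

    // Toy encoding: version, then a length-prefixed payload.
    std::vector<char> encode() const {
        std::vector<char> out;
        auto put = [&](const void* p, std::size_t n) {
            auto c = static_cast<const char*>(p);
            out.insert(out.end(), c, c + n);
        };
        put(&version, sizeof(version));
        uint32_t len = payload.size();
        put(&len, sizeof(len));
        put(payload.data(), len);
        return out;
    }

    // Produce a representative instance for the compatibility corpus.
    static my_on_disk_state generate_test_case() {
        my_on_disk_state s;
        s.version = 1;
        s.payload = "example record";
        return s;
    }
};

int main() {
    auto bytes = my_on_disk_state::generate_test_case().encode();

    // The binary artifact that gets checked into the data repository...
    std::ofstream("my_on_disk_state.v1.bin", std::ios::binary)
        .write(bytes.data(), bytes.size());

    // ...next to metadata describing how and when it was produced.
    std::ofstream("my_on_disk_state.v1.json")
        << R"({"type": "my_on_disk_state", "redpanda_version": "<release>", )"
        << R"("date": "<build date>", "version": 1, "payload": "example record"})";
}
```

The JSON sidecar written here doubles as the out-of-band representation discussed under verification below.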
The data should provide a good covering set, but the actual volume of data is less important. This data needs to be available in the context of tests that verify compatibility, so it is reasonable to store it in a separate Git repository and make it available as an artifact dependency (e.g. 3rdparty.cmake).
Compatibility testing
Input to the testing process is:

- The current version of redpanda being tested
- Binary data from a range of previous versions
The range of input data is defined by a support policy (e.g. 1 major version back), and the input data itself is taken from the data repository populated by the process outlined above in Data source.
There are two types of compatibility testing that can be performed:

- Data can be decoded (e.g. valid crc)
- The decoded content is correct
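A sketch of the first kind of check (pseudo-code; `load_artifact` and `decode_current` are hypothetical helpers, and the gtest-style assertion macro is only illustrative): the test cares only that bytes written by an older release still parse without error.

```cpp
// Type-1 check: the old artifact still decodes. decode_current() is assumed
// to validate the stored crc and throw on any malformed or truncated input.
auto v1_bytes = load_artifact("my_on_disk_state.v1.bin");
ASSERT_NO_THROW(decode_current(v1_bytes));
```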
Of these two tests, the second is the more challenging: once data is decoded, what is the source of truth that should be used to compare the content against? And when decoding into a type in which fields were added or removed, how does content verification change?
In pseudo-code, we need a mechanism for populating the value EXPECTED on the right-hand side of an assertion like the one sketched below. Of course, in practice, construction of EXPECTED may be more involved because v2 may have added or removed fields, so authoring a test may involve some manual work.
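The assertion has roughly this shape (pseudo-code, with the same hypothetical helpers as above):

```cpp
// Decode bytes produced by a previous release with the current (v2) decoder,
// then compare the result against a reference value. The open question is
// how EXPECTED gets populated.
auto decoded = decode_current(load_artifact("my_on_disk_state.v1.bin"));
ASSERT_EQ(decoded, EXPECTED);
```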
One option that works for synthetically derived test data is to use a random generator and store the seed value so that we can regenerate EXPECTED. This may be a decent compromise in complexity, but it would seem to at least partially suffer from the original problem: old code must be maintained that can regenerate old versions of data structures.
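A minimal sketch of the seed-based option, assuming the generator is deterministic for a given seed (again reusing the hypothetical `my_on_disk_state`):

```cpp
#include <cstdint>
#include <random>
#include <string>

// Deterministic generator: the same seed always yields the same instance,
// so EXPECTED can be rebuilt later from just the recorded seed.
my_on_disk_state generate_with_seed(uint64_t seed) {
    std::mt19937_64 rng(seed);
    my_on_disk_state s;
    s.version = 1;
    s.payload = std::to_string(rng());   // stand-in for randomized fields
    return s;
}

// Release time: record {seed, encoded bytes} in the data repository.
// Test time:    EXPECTED = generate_with_seed(recorded_seed), compared
//               against the value decoded from the recorded bytes.
```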
A trade-off here might be to maintain the older versions of the structures and encoders, but verify that they generate the same binary data as the stored input.
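A sketch of that trade-off (a hypothetical `v1` namespace holds the retained structures and encoder): rather than comparing decoded fields, regenerate the old structure and require its encoding to reproduce the stored artifact byte for byte.

```cpp
// The retained v1 code regenerates the structure (e.g. from the recorded
// seed) and must encode to exactly the bytes stored at release time.
auto stored      = load_artifact("my_on_disk_state.v1.bin");
auto regenerated = v1::generate_with_seed(recorded_seed);
ASSERT_EQ(v1::encode(regenerated), stored);
```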
One option that works for data from all input sources, and doesn't suffer from the above problem (or at least not as much), is to store the input data in two representations (e.g. the binary data plus a JSON representation). In this case we maintain support only for decoding old versions, and we rely on the out-of-band JSON representation of the original data as the source of values used in verification.
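A sketch of the dual-representation check, assuming each binary artifact is stored next to a JSON sidecar like the one produced in the generator sketch above, and using the same hypothetical helpers (nlohmann/json appears purely for illustration; the format and library are open choices):

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <nlohmann/json.hpp>   // illustration only; any json library works

// Only decoding of old versions is maintained in the tree; the expected
// values come from the out-of-band json, not from old encoding code.
void check_v1_artifact_content() {
    auto decoded = decode_current(load_artifact("my_on_disk_state.v1.bin"));

    std::ifstream sidecar("my_on_disk_state.v1.json");
    auto expected = nlohmann::json::parse(sidecar);

    assert(decoded.version == expected.at("version").get<int32_t>());
    assert(decoded.payload == expected.at("payload").get<std::string>());
}
```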
Testing framework
When the format of some on-wire or on-disk data changes, the new format is added to the test framework. The value of the test framework is primarily in automating the selection of data sources based on a support policy and making that data available to unit tests for each individual data type. The extent to which testing can be automated for a given type is an open question, but currently it is not particularly onerous given the small number of cases we have to deal with.
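As a rough sketch of the selection step, assuming artifacts are laid out on disk as `<corpus>/<version>/<type>/*.bin` and a support policy of N major versions back (all names here are hypothetical), the framework could hand each data type's unit test the set of artifacts it is responsible for:

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Returns every artifact for `type_name` whose version falls inside the
// support window, so each data type's unit test can simply iterate them.
std::vector<std::filesystem::path>
artifacts_in_support_window(const std::filesystem::path& corpus_root,
                            const std::string& type_name,
                            int current_major,
                            int majors_back) {
    std::vector<std::filesystem::path> out;
    for (int major = current_major - majors_back; major < current_major; ++major) {
        auto dir = corpus_root / ("v" + std::to_string(major)) / type_name;
        if (!std::filesystem::exists(dir)) {
            continue; // nothing recorded for that version
        }
        for (const auto& entry : std::filesystem::directory_iterator(dir)) {
            if (entry.path().extension() == ".bin") {
                out.push_back(entry.path());
            }
        }
    }
    return out;
}
```

Each data type's test then iterates the returned paths and applies whichever of the decode and content checks apply to it.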