This is another action-packed release from the Testground team, and a solid step on our path towards a robust and delightful platform for testing distributed systems at all scales! Read on to learn what's new.
Highlighted v0.2.0 features
- Testground now supports the `testground healthcheck --runner <runner_id> [--fix]` command, which automates the verification that all preconditions for the runner to operate properly are met.
- This includes things like Redis processes/containers, sidecar containers, directories, etc.
- The `--fix` switch will attempt automatic healing of healthchecks that fail.
- `testground run` implicitly performs a healthcheck on the target runner before scheduling the run.
- For now, the `local:docker` runners support healthchecks, with `cluster:k8s` joining the party soon.
- Builders will also support healthchecks in the near future.
- Testground now supports the `testground terminate --runner <runner_id>` command, which destroys the environment of a runner, including all started test jobs/containers, as well as precondition containers (Redis, sidecar, etc.).
- In the future, this command will be more flexible, allowing the user to indicate the scope of the destruction (#611).
- Supported runners:
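As a minimal sketch, the healthcheck and terminate commands described above could be wrapped in small POSIX shell helpers. This assumes the `testground` binary is on the `PATH`; the function names `check_runner`, `heal_runner`, and `terminate_runner` are hypothetical and exist only for this illustration. Pass whichever runner ID your setup actually supports.

```shell
# Sketch only: assumes `testground` is installed and on PATH.
# The helper names below are hypothetical, not part of Testground.

# Verify that the runner's preconditions (Redis, sidecar, directories, etc.) are met.
check_runner() {
  testground healthcheck --runner "$1"
}

# Same verification, but ask Testground to attempt automatic healing of failed checks.
heal_runner() {
  testground healthcheck --runner "$1" --fix
}

# Destroy the runner's environment, including test jobs and precondition containers.
terminate_runner() {
  testground terminate --runner "$1"
}
```

For example, `check_runner local:docker` reports on the `local:docker` runner, while `heal_runner local:docker` additionally tries to repair whatever fails. Keeping the report-only and healing variants separate makes it explicit when the environment may be modified.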
🧩 FEATURE: Manual build selectors.
- Testground composition files can now specify selectors to be applied to each build. These translate to build tags in Go builds, and can be used to construct shims for funnel-shaped wildcard test plans, such that a single test plan can target a variety of upstream dependency versions with changing APIs.
- Read more in the docs/EVOLVING_APIs.md design doc.
- In the future, we will introduce automatic build selectors, in the manner described in the above doc.
- The `testground run` command now supports the `--collect` and `--collect-into` flags, which automatically perform output collection (i.e. `testground collect`) into an archive named after the run ID (`--collect`) or into a user-specified file (`--collect-into`).
- The Kubernetes run environment now bundles Prometheus and pushgateway support, for test plans and other infrastructural components to be able to push metrics proactively.
- pprof enabled on sidecar.
- More improvements coming in this area soon.
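The `--collect` and `--collect-into` flags of `testground run` mentioned above can be sketched as shell helpers that append the flag to an otherwise ordinary run command. This assumes `testground` is on the `PATH`, that `--collect-into` takes the output file as its value, and that the flags may follow the other `testground run` arguments. The helper names are hypothetical, and the remaining run arguments are left to the caller.

```shell
# Sketch only: assumes `testground` is installed and on PATH.
# Helper names are hypothetical; placing the flag after the other arguments is an assumption.

# Run, then collect outputs into an archive named after the run ID.
run_and_collect() {
  testground run "$@" --collect
}

# Run, then collect outputs into a user-specified file.
# Usage: run_and_collect_into <output_file> <testground run arguments...>
run_and_collect_into() {
  output_file=$1
  shift
  testground run "$@" --collect-into "$output_file"
}
```

If the CLI expects these flags in a different position, adjust the helpers accordingly; the CLI's own help output is the authority on argument order.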
Fixes and improvements
We have merged TONS of bug fixes and improvements in the sidecar, sync service, Redis scaling, cluster:k8s runner, k8s networking setup, S3, etc.
- IMPROVEMENTS (observability): measure when testground is ready and testplans are running (#560), sdk runtime metrics (#545), expose pprof port on sidecar (#558), use shared ssh key for kops hosts; command to extract kube context (#551).
- FIXES (sidecar): listen for more Docker events, not just `stop` (#580), free netlink handle (#575), sidecar errors: check for context canceled (#526), fix: remove a runenv hack in the sidecar (#503).
- FIXES (sync service): increase redis max clients (#578), cancel all subscriptions when closing the watcher (#576), use a shared redis client (#574), test redis address resolution, resolve redis once, abort early when we get too many results, improve context abort error, don't panic on error if we're canceling anyways, add pprof port to pods (#553), adjust redis config (#595), avoid logging spurious errors when we shutdown the watcher, feat: use contexts for the sync service (#456), buffer barrier channel (#516).
- FIXES (runtime): runenv: flush logger on close (#518), feat: distinguish between runenv and run params.
- FIXES (AWS): aws ecr repo must be unique across regions (#536), fix pushing image to remote (aws ecr) (#493), extract S3_ENDPOINT in var, so that we can change region for bucket (#541), fix some minor issues with the S3 collection logic (#533).
- FIXES (k8s networking): wait for flannel initContainer (#563).
- FIXES (exec:go builder): don't hide output from go commands.
- FIXES (cluster:k8s runner): handle canceled context in cluster_k8s.go (#510), configurable pod resource requirements (#513), max allowed pods check in cluster_k8s.go (#509), extract outputs configs to toml configuration (#494).
- FIXES (docker:go builder): docker builder volume (#591).
- DOC: Proposal: dealing with upstream API evolution in test plans (#565).