Conversation
How come we're removing it entirely? I think it might still come in handy for latency cases we don't know about yet when we add newer features.
We discussed this with @jakmeier and the team during the MPC Sync. Our focus is shifting toward component tests. If we want to test delays, we should design explicit tests for that purpose instead of relying on random delays introduced by tools like Toxiproxy. Overall, after using Toxiproxy for over six months, it hasn't helped us uncover a single issue. We shouldn't keep things "just in case."
Just because it hasn't caught an issue yet doesn't mean it won't in the future. The whole point of it is that if we end up introducing something that takes an absurd amount of time, everything flakes out on the side where latency is higher than expected. If anything, I would like to keep it until we have an alternative where we know that increases in latency are caught and do not dramatically affect our network.
We will catch that on our DEV network, especially since we run an increased number of nodes there and have the related metrics.
But that is a much more manual check, and it sometimes becomes a guessing game as to what the actual issue is. I'd prefer that these tests actually cover that case for now, because then I can concretely say it's latency, and devnet can back that claim up if we see it there.
I'm okay with keeping it around until we have component-level testing that can handle such cases at least as well as Toxiproxy. Long-term, I argue it's better to remove this tool from the testing suite. Its random nature makes it a bad fit for tests that run in CI, where we should have quick and deterministic tests only.

Stretching that ideal determinism requirement a bit: with component-level testing we can have test cases with relatively long (but not random) latencies on messages, which I think is more valuable than what Toxiproxy offers and deterministic enough for CI tests. If we want heavier tests, we can add a fuzzer on top of component-based testing. Such a setup can also explore different timings on messages, and probably more efficiently than random delays ever could.

Personally, I don't see much value in random latency testing on top of what component-level testing can provide. Maybe I could see Toxiproxy making sense in a heavier system-level testing setup. It's just that the closer we are to a fully integrated system, the fewer testing tools we should add to the mix. The idea being that we want to test as close as possible to the production setup.
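A minimal sketch of the fixed-latency injection idea described above, using only std channels. The `delayed_sender` helper, the `send_and_receive` wrapper, the `"sign-request"` message, and the 50 ms figure are all hypothetical illustrations, not part of the actual test suite:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical helper: wraps a Sender so every message is held back by a
// fixed, test-controlled delay before being forwarded. Because the delay
// is a constant rather than random (as with Toxiproxy), the test stays
// deterministic enough for CI.
fn delayed_sender<T: Send + 'static>(
    tx: mpsc::Sender<T>,
    delay: Duration,
) -> mpsc::Sender<T> {
    let (delayed_tx, delayed_rx) = mpsc::channel::<T>();
    thread::spawn(move || {
        for msg in delayed_rx {
            thread::sleep(delay); // fixed latency injection point
            if tx.send(msg).is_err() {
                break; // receiver dropped; stop forwarding
            }
        }
    });
    delayed_tx
}

// Send one message across the delayed link and report what arrived and
// how long delivery took.
fn send_and_receive(delay: Duration, msg: &'static str) -> (&'static str, Duration) {
    let (tx, rx) = mpsc::channel();
    let slow_tx = delayed_sender(tx, delay);
    let start = Instant::now();
    slow_tx.send(msg).unwrap();
    let received = rx.recv().unwrap();
    (received, start.elapsed())
}

fn main() {
    // A component test can now assert behavior under a known 50 ms latency.
    let (received, elapsed) = send_and_receive(Duration::from_millis(50), "sign-request");
    assert_eq!(received, "sign-request");
    assert!(elapsed >= Duration::from_millis(50));
    println!("delivered {:?} after {:?}", received, elapsed);
}
```

A fuzzer layered on top could then enumerate different fixed delay values per message, rather than sampling them randomly at runtime, keeping each individual run reproducible.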