Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instream drop on demand #1975

Open
wants to merge 14 commits into
base: develop
from
Open

Instream drop on demand #1975

wants to merge 14 commits into from

Conversation

@willemolding
Copy link
Contributor

willemolding commented Dec 13, 2019

PR summary

To be able to test the behavior of various tests a way to artificially induce failures at different layers of the network stack is required.

We propose using env vars which are picked up inside the in_stream crate to drop connections, messages or packets depending on the test requirement. The currently implementation only drops ws connections by ignoring incoming and outgoing messages for the duration of an error burst.

As this crate is to be used by both the sim2h worker and sim2h server we can simulate failures at either end with no extra work.

Dropping ws Connections

The env var WS_FAILURE_MODEL is a tuple (S, MTBF, MFD) where MTBF is the mean time between failures in ms, MFD is the mean failure duration in ms and $S$ is a seed for the random number generator to ensure repeatability.

The failure model follows a telegraph process which switched between being in an error state (burst) or an ok state. The time between errors and the error burst duration are exponentially distributed random variables independent of each other.

It should be possible to run any of the existing tests with this env var set to see how they operate under unreliable network conditions

Example:

WS_FAILURE_MODEL=(42, 2000, 100) hc-app-spec-test-sim2h

Would run the app spec tests with the client experiencing an error burst on average ever 2 seconds for an average of 100ms.

WS_FAILURE_MODEL=(42, 2000, 100) sim2h_server

Would run the sim2h server with the same failure rates

It is also integrated with the trycp server so calling the RPC method spawn with a failureModel object will set the env vars internally e.g.:

await ws.call('spawn', {"id": "my-player-failure", "failureModel": {"seed": 42, "MTBF": 100, "MFD": 100}})

followups

( any new tickets/concerns that were discovered or created during this work but aren't in scope for review here )

changelog

  • if this is a code change that effects some consumer (e.g. zome developers) of holochain core, then it has been added to our between-release changelog with the format
- summary of change [PR#1234](https://github.com/holochain/holochain-rust/pull/1234)

documentation

@@ -281,6 +298,10 @@ impl<Sub: InStreamStd> InStream<&mut WsFrame, WsFrame> for InStreamWss<Sub> {
}

fn read(&mut self, data: &mut WsFrame) -> Result<usize> {
if self.is_in_failure() {

This comment has been minimized.

Copy link
@willemolding

willemolding Dec 13, 2019

Author Contributor

@neonphog These are the bits I'm not 100% on. I guess this will definitely make everything fail for the duration but do we maybe want to do this another way?

This comment has been minimized.

Copy link
@neonphog

neonphog Dec 13, 2019

Contributor

@willemolding - A couple other options:

  • One would be to move this code to a lower level, and in Tcp and Mem streams, call shutdown when your failure timing is triggered.
  • Alternately, if we wanted to go for more encapsulation, you could create InStreamFailure and InStreamListenerFailure structs that would mostly be pass-through like Tls, but instead of encrypting / decrypting things it would randomly drop its sub-connection object, letting rust close the underlying streams.
@willemolding willemolding requested review from neonphog and zippy Dec 13, 2019
Copy link
Contributor

neonphog left a comment

I like the InStreamFailure idea, but I'm also approving this, because from a CI test perspective, I think this will work fine, and continue working even if we switch up the underlying code organization.

Nice 👍

neonphog and others added 2 commits Dec 13, 2019
@zippy

This comment has been minimized.

Copy link
Member

zippy commented Dec 16, 2019

As per stand-up today, from an activating from testing point of view, adding being able to specify the values from trycp_server "spawn" command is needed. Will review as soon as that's ready.

willemolding and others added 6 commits Dec 17, 2019
// use super::*;
// use std::thread::sleep;

// #[test]

This comment has been minimized.

Copy link
@thedavidmeister

thedavidmeister Dec 18, 2019

Contributor

did you mean to leave this out @willemolding ?

Copy link
Contributor

thedavidmeister left a comment

this is cool, i approve with the caveat that we address @zippy 's concerns re: trycp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.