TuneableConsistencyScatter: filter out errors #1086

Merged: 2 commits merged into shotover:main on Mar 17, 2023

Conversation

rukai (Member) commented on Mar 16, 2023

Now that I better understand TuneableConsistencyScatter, this fix was pretty easy to put together. If I hit many more problems I will abandon it.

This PR is an attempt to fix this test failure I saw on CI:

shotover   04:43:36.664124Z  INFO shotover_proxy::runner: Starting Shotover 0.1.9
shotover   04:43:36.664157Z  INFO shotover_proxy::runner: configuration=Config { main_log_level: "info,shotover_proxy=info", observability_interface: "0.0.0.0:9001" }
shotover   04:43:36.664164Z  INFO shotover_proxy::runner: topology=Topology { sources: {"redis_prod": Redis(RedisConfig { listen_addr: "127.0.0.1:6379", connection_limit: None, hard_connection_limit: None, tls: None, timeout: None })}, chain_config: {"redis_chain": [TuneableConsistencyScatter(TuneableConsistencyScatterConfig { route_map: {"two": [RedisTimestampTagger, RedisSinkSingle(RedisSinkSingleConfig { address: "127.0.0.1:3332", tls: None, connect_timeout_ms: 3000 })], "three": [RedisTimestampTagger, RedisSinkSingle(RedisSinkSingleConfig { address: "127.0.0.1:3333", tls: None, connect_timeout_ms: 3000 })], "one": [RedisTimestampTagger, RedisSinkSingle(RedisSinkSingleConfig { address: "127.0.0.1:3331", tls: None, connect_timeout_ms: 3000 })]}, write_consistency: 2, read_consistency: 2 })]}, source_to_chain_mapping: {"redis_prod": "redis_chain"} }
shotover   04:43:36.664206Z  WARN shotover_proxy::transforms::distributed::tuneable_consistency_scatter: Using this transform is considered unstable - Does not work with REDIS pipelines
shotover   04:43:36.664249Z  INFO shotover_proxy::config::topology: Loaded chains ["redis_chain"]
shotover   04:43:36.664260Z  INFO shotover_proxy::sources::redis: Starting Redis source on [127.0.0.1:6379]
shotover   04:43:36.664830Z  INFO shotover_proxy::config::topology: Loaded sources [["redis_prod"]] and linked to chains
shotover   04:43:36.665030Z  INFO shotover_proxy::server: accepting inbound connections
shotover   04:43:36.666437Z  INFO connection{id=1 source="RedisSource"}: shotover_proxy::transforms::chain: Buffered chain two was shutdown
shotover   04:43:36.666457Z  INFO connection{id=3 source="RedisSource"}: shotover_proxy::transforms::chain: Buffered chain two was shutdown
shotover   04:43:36.666700Z  INFO connection{id=1 source="RedisSource"}: shotover_proxy::transforms::chain: Buffered chain one was shutdown
shotover   04:43:36.666903Z  INFO connection{id=1 source="RedisSource"}: shotover_proxy::transforms::chain: Buffered chain three was shutdown
shotover   04:43:36.666919Z  INFO connection{id=3 source="RedisSource"}: shotover_proxy::transforms::chain: Buffered chain three was shutdown
shotover   04:43:36.666518Z  INFO connection{id=3 source="RedisSource"}: shotover_proxy::transforms::chain: Buffered chain one was shutdown
test redis_int_tests::multi has been running for over 60 seconds
thread 'redis_int_tests::multi' panicked at 'assertion failed: `(left == right)`
  left: `Err(WRONGTYPE: Operation against a key holding the wrong kind of value)`,
 right: `Ok("OK")`', shotover-proxy/tests/redis_int_tests/assert.rs:12:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/core/src/panicking.rs:64:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
   4: lib::redis_int_tests::assert::assert_ok::{{closure}}
   5: lib::redis_int_tests::basic_driver_tests::test_tuple_args::{{closure}}
   6: lib::redis_int_tests::multi::{{closure}}::{{closure}}
   7: tokio::runtime::runtime::Runtime::block_on
   8: core::ops::function::FnOnce::call_once
   9: serial_test::serial_code_lock::local_serial_core
  10: core::ops::function::FnOnce::call_once
  11: core::ops::function::FnOnce::call_once
             at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
test redis_int_tests::multi ... FAILED

The failing test case:

    flusher.flush().await;
    assert_ok(
        redis::cmd("HMSET")
            .arg("my_key")
            .arg(&[("field_1", 42), ("field_2", 23)]),
        connection,
    )
    .await;

    assert_eq!(
        redis::cmd("HGET")
            .arg("my_key")
            .arg("field_1")
            .query_async(connection)
            .await,
        Ok(42)
    );
    assert_eq!(
        redis::cmd("HGET")
            .arg("my_key")
            .arg("field_2")
            .query_async(connection)
            .await,
        Ok(23)
    );

What is happening here is:

  1. hmset is run on redis instances 1 and 2 but not on 3. This results in the key my_key being defined on 1 and 2 but not on 3.
  2. hget is run against redis instances 1, 2 and 3.
  3. hget succeeds on 1 and 2 but returns a type error on instance 3 because an hget cannot be run against a null value.
  4. shotover has no preference among the values returned by 1, 2 and 3 and ends up picking the error from instance 3 as its final value.

This PR contains 2 improvements:

  1. filter_map is swapped for a map. It is not appropriate to drop messages, as that would break the transform invariants. If we do filter out all messages, we should just pick one at random instead.
  2. redis errors are now detected and filtered out. This prevents the issue explored above by ensuring we always pick a successful result instead of an error when possible.
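
To illustrate the two improvements, here is a minimal sketch of the selection logic. It is not the code in this PR: `Response`, `pick_response` and `resolve_all` are hypothetical names, and the fallback takes the first candidate rather than a random one so the example stays deterministic.

```rust
/// Hypothetical stand-in for the reply one redis instance gives to a request.
/// The real transform works on shotover messages, not on this enum.
#[derive(Debug, Clone, PartialEq)]
enum Response {
    Value(i64),
    Error(String),
}

impl Response {
    fn is_error(&self) -> bool {
        matches!(self, Response::Error(_))
    }
}

/// Pick one reply out of the replies collected for a single request.
/// Errors are skipped when at least one successful reply exists.
/// If every reply is an error, fall back to the first one (assumption:
/// the PR text says "at random"; first is used here for determinism).
fn pick_response(candidates: &[Response]) -> Option<Response> {
    candidates
        .iter()
        .find(|r| !r.is_error())
        .or_else(|| candidates.first())
        .cloned()
}

/// Resolve every request with `map` rather than `filter_map`, so the output
/// always has exactly one entry per request and nothing is silently dropped.
fn resolve_all(per_request: &[Vec<Response>]) -> Vec<Option<Response>> {
    per_request.iter().map(|c| pick_response(c)).collect()
}
```

With the replies from the scenario above (two successful values and one WRONGTYPE error), `pick_response` returns the successful value regardless of the order in which the replies arrive. An error is only returned when all instances report one.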

@rukai rukai requested a review from conorbros March 16, 2023 23:05
@rukai rukai enabled auto-merge (squash) March 17, 2023 00:56
@rukai rukai merged commit 218211d into shotover:main Mar 17, 2023
8 checks passed
3 participants