Skip to content

Conversation

@Invizory
Copy link
Contributor

@Invizory Invizory commented May 23, 2019

Proposed Changes

Add reject-publish-dlx overflow strategy, which is similar to reject-publish strategy, but also dead-letters rejected messages.

Closes #1443

Types of Changes

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CLA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

Further Comments

TODOs:

@Invizory Invizory force-pushed the dead-letter-rejected-published branch 4 times, most recently from 3a5d535 to 73f59b9 Compare May 23, 2019 19:24
@Invizory
Copy link
Contributor Author

It looks like confirms_rejects_SUITE tests are unstable and also fail in some test runs in master.

Here is the possible outputs of make ct-confirms_rejects_SUITE in both master and this branch:

TEST INFO: 1 test(s), 4 case(s) in 1 suite(s)

== confirms_rejects_SUITE ==

  * [parallel_tests]
    confirms_rejects_conflict (00:23.513)
    mixed_dead_alive_queues_reject (00:11.320)
    dead_queue_rejects (00:01.250)
    policy_resets_to_default (00:04.270)


confirms_rejects_SUITE > parallel_tests > mixed_dead_alive_queues_reject
    #1. {error,
            {timeout_waiting_for_ack,
                [{confirms_rejects_SUITE,mixed_dead_alive_queues_reject,1,
                     [{file,"test/confirms_rejects_SUITE.erl"},{line,176}]},
                 {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1748}]},
                 {test_server,run_test_case_eval1,6,
                     [{file,"test_server.erl"},{line,1263}]},
                 {test_server,run_test_case_eval,9,
                     [{file,"test_server.erl"},{line,1195}]}]}}
TEST INFO: 1 test(s), 4 case(s) in 1 suite(s)

== confirms_rejects_SUITE ==

  * [parallel_tests]
    confirms_rejects_conflict (00:22.307)
    policy_resets_to_default (00:04.322)
    dead_queue_rejects (00:01.218)
    mixed_dead_alive_queues_reject (00:01.331)

@hairyhum
Copy link
Contributor

@Invizory hi, the code looks good. I'm going to look at the test case. Can you also add this behaviour to confirms_rejects_SUITE to expect the same behaviour as reject-publish behaviour?

@hairyhum
Copy link
Contributor

I've increased some timeouts in the master branch. Can you try the mixed_dead_alive_queues_reject test again?
The failure may depend on the environment performance and I can't get it on my machine, but we had it in our CI pipeline once or twice.

@Invizory
Copy link
Contributor Author

Can you also add this behaviour to confirms_rejects_SUITE to expect the same behaviour as reject-publish behaviour?

@hairyhum, sure, that's a good idea.

@Invizory Invizory force-pushed the dead-letter-rejected-published branch from 73f59b9 to bc82718 Compare May 27, 2019 15:38
@Invizory
Copy link
Contributor Author

Invizory commented May 27, 2019

I've increased some timeouts in the master branch. Can you try the mixed_dead_alive_queues_reject test again?

@hairyhum, at the first glance that seems to fix the problem, however, after a few runs in the master branch, I've got the following:

TEST INFO: 1 test(s), 4 case(s) in 1 suite(s)

== confirms_rejects_SUITE ==

  * [parallel_tests]
    dead_queue_rejects (00:01.278)
    confirms_rejects_conflict (00:25.049)
    policy_resets_to_default (00:04.337)
    mixed_dead_alive_queues_reject (00:51.383)


confirms_rejects_SUITE > parallel_tests > mixed_dead_alive_queues_reject
    #1. {error,
            {timeout_waiting_for_ack,
                [{confirms_rejects_SUITE,mixed_dead_alive_queues_reject,1,
                     [{file,"test/confirms_rejects_SUITE.erl"},{line,177}]},
                 {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1748}]},
                 {test_server,run_test_case_eval1,6,
                     [{file,"test_server.erl"},{line,1263}]},
                 {test_server,run_test_case_eval,9,
                     [{file,"test_server.erl"},{line,1195}]}]}}

I've also played with timeouts before and it doesn't seem to help.

Update: Also tested locally on fcae736, still unstable:

TEST INFO: 1 test(s), 4 case(s) in 1 suite(s)

== confirms_rejects_SUITE ==

  * [parallel_tests]
    confirms_rejects_conflict (00:26.216)
    dead_queue_rejects (00:01.305)
    mixed_dead_alive_queues_reject (00:51.343)
    policy_resets_to_default (00:04.323)


confirms_rejects_SUITE > parallel_tests > mixed_dead_alive_queues_reject
    #1. {error,
            {{timeout_waiting_for_nack,{messages,[]}},
             [{confirms_rejects_SUITE,mixed_dead_alive_queues_reject,1,
                  [{file,"test/confirms_rejects_SUITE.erl"},{line,177}]},
              {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1748}]},
              {test_server,run_test_case_eval1,6,
                  [{file,"test_server.erl"},{line,1263}]},
              {test_server,run_test_case_eval,9,
                  [{file,"test_server.erl"},{line,1195}]}]}}

@Invizory
Copy link
Contributor Author

Invizory commented May 27, 2019

The rebased branch is also failing on CI, but on another test and for another reason:

confirms_rejects_SUITE:dead_queue_rejects failed on line 124
Reason: expecting_nack_got_ack

I have failed to reproduce the same behaviour locally.

Add `reject-publish-dlx` overflow strategy, which is similar to
`reject-publish` strategy, but also dead-letters rejected messages.

Closes rabbitmq#1443
@Invizory Invizory force-pushed the dead-letter-rejected-published branch from bc82718 to 89f464f Compare May 27, 2019 19:57
@michaelklishin
Copy link
Collaborator

The test in question tries to kill the queue process in order to simulate a nack. That requires pretty good timing as queues will be restarted when they run into an exception, so there is a natural race condition in this test. I see no good way to address this without changing the supervision tree/implementation.

Let's focus on the rest of the suite and test coverage for this new behavior.

@michaelklishin
Copy link
Collaborator

I guess we can alter the test to run a number of iterations and expect at least some nacks. That'd increase the probability of it passing but that's about it.

@hairyhum
Copy link
Contributor

There is another commit in master which makes the test try harder to kill the queue. But I agree that it's not related to this PR.

This change splits the tests into groups for both the existing `reject-publish`
overflow strategy and the new `reject-publish-dlx`.

The corresponding queue names have been suffixed to make the tests in the
groups independent and thus parallel.

See rabbitmq#1443
@Invizory Invizory force-pushed the dead-letter-rejected-published branch from 3a7d09f to f145c18 Compare May 27, 2019 23:18
@Invizory
Copy link
Contributor Author

Can you also add this behaviour to confirms_rejects_SUITE to expect the same behaviour as reject-publish behaviour?

@hairyhum, done in f145c18.

Add tests for `reject-publish-dlx` overflow strategy for all corresponding
tests for `reject-publish` strategy. The tests are grouped to avoid code
duplication.

See rabbitmq#1443
@Invizory
Copy link
Contributor Author

I’ve also added tests that cover the same behaviour covered by existing tests for reject-publish.

Copy link
Contributor

@hairyhum hairyhum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@hairyhum
Copy link
Contributor

This also needs doc update in https://github.com/rabbitmq/rabbitmq-website

@Invizory
Copy link
Contributor Author

This also needs doc update in rabbitmq/rabbitmq-website

@hairyhum, done in rabbitmq/rabbitmq-website#774.

@hairyhum hairyhum merged commit aeb23f7 into rabbitmq:master May 29, 2019
@hairyhum
Copy link
Contributor

Management help message rabbitmq/rabbitmq-management@1d4be7f

@michaelklishin
Copy link
Collaborator

@Invizory thank you for contributing this!

RedMu added a commit to RedMu/rabbitmq-mock that referenced this pull request Jul 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider an option to dead-letter rejected publishes

3 participants