feat: Add rebalance integration testing via pytest and setup integration testing on CI #53

Merged

enochtangg merged 36 commits into main from github-ci

Dec 3, 2024

Contributor

enochtangg commented Nov 26, 2024 •

Overview

This PR mainly adds an end-to-end test validating that consumer rebalances do not write duplicate messages to sqlite. More importantly, it introduces Python dependencies (pytest) into this project, as it is much simpler and quicker to write these end-to-end tests in Python.

Testing

Run make integration-test

enochtangg added 4 commits

November 26, 2024 16:54


          create template for sentry ci and end to end pytest

5838e3d


          fix path to sqlite3

d967e5a


          clean up

4b9c1e5


          add make command

80ab376

Contributor Author

enochtangg commented Nov 26, 2024

Looking for some feedback on the following:

Thoughts on using pytest for end-to-end integration testing with sentry?
Currently, the test_rebalancing_only_processed_once() doesn't actually produce any messages. One thought was to run the send_task CLI command as a previous step in CI before running the pytest. But this wouldn't be repeatable for other tests. Maybe we could run the CLI command in the python test?
The test could probably use some clean up.

enochtangg requested a review from a team

November 26, 2024 22:35

markstory reviewed

python/integration_tests/test_consumer_rebalancing.py Outdated (resolved)

python/integration_tests/test_consumer_rebalancing.py Outdated (resolved)

python/integration_tests/test_consumer_rebalancing.py Outdated (resolved)

evanh reviewed

Makefile (resolved)

python/integration_tests/test_consumer_rebalancing.py Outdated (resolved)

python/integration_tests/test_consumer_rebalancing.py

    
    import yaml

    def manage_consumer(

Member

evanh Nov 27, 2024

I'm not sure I understand the purpose of this function. At the moment it starts, and then after a random delay dies. Is that the intended behaviour? There's no guarantee it will process the messages assigned to it. I think Mark had a good idea in his PR, where he passes in the number of expected messages, and can track to see whether the consumer has processed them or not.

Contributor Author

enochtangg Nov 28, 2024 •

When the function starts, it executes the Rust binary in a new child process. The thread itself then sleeps for a random delay. This gives the consumer a chance to start up, receive assigned partitions, and process messages. Finally, a SIGINT is sent to the process and the consumer shuts down gracefully.
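For readers following along, a minimal sketch of that start, sleep, interrupt cycle. It is not the code in this PR: `run_then_interrupt`, its parameters, and the `timeout` fallback are hypothetical names chosen for illustration, and it assumes a POSIX system where `SIGINT` can be delivered to a child process.

```python
import random
import signal
import subprocess
import time


def run_then_interrupt(cmd, min_sleep, max_sleep, rng, timeout=10):
    """Start cmd, let it run for a random duration, then send SIGINT.

    Hypothetical helper: cmd would be something like
    [consumer_path, "-c", config_file_path] in the real test.
    """
    process = subprocess.Popen(cmd)
    # Give the child time to start up and do some work.
    time.sleep(rng.uniform(min_sleep, max_sleep))
    # Ask the child to shut down gracefully.
    process.send_signal(signal.SIGINT)
    try:
        return process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child ignored SIGINT; force it down so the test cannot hang.
        process.kill()
        return process.wait()
```

Passing a `random.Random` instance instead of using the module-level functions keeps each caller's sequence of delays independent of other threads.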

It is true that we aren't explicitly checking all messages are processed here. It is possible that the consumer is started and dies before it can process any messages (though unlikely given the min_sleep and max_sleep parameters). The purpose of the test is to check for duplicate messages during rebalancing, not necessarily that all messages have been processed.
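As an illustration of what a duplicate check can look like, here is a small sketch using the standard `sqlite3` module. The table name `messages` and the columns `kafka_partition` and `kafka_offset` are assumptions made for the example, not the schema used by this project.

```python
import sqlite3


def find_duplicates(conn):
    """Return (partition, offset, count) for every row stored more than once.

    Assumes a hypothetical `messages` table with `kafka_partition` and
    `kafka_offset` columns.
    """
    query = """
        SELECT kafka_partition, kafka_offset, COUNT(*) AS n
        FROM messages
        GROUP BY kafka_partition, kafka_offset
        HAVING COUNT(*) > 1
        ORDER BY kafka_partition, kafka_offset
    """
    return conn.execute(query).fetchall()
```

A test would then assert that `find_duplicates(conn)` is empty after all consumers have shut down, and could print the returned rows on failure to show which offsets were written twice.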

python/integration_tests/test_consumer_rebalancing.py (resolved)

enochtangg added 2 commits

November 27, 2024 17:50


          update num restarts and clean up comments

9986d38


          Merge remote-tracking branch 'origin' into github-ci

d45e3e6

markstory linked an issue

that may be closed by this pull request

Write end to end testing harness for loading sqlite #34

Closed

markstory reviewed

python/integration_tests/test_consumer_rebalancing.py (resolved)

enochtangg added 4 commits

November 28, 2024 14:53


          clean up and fix db path

31ebf8e


          send messages to kafka from sentry

313d6e4


          update gitignore

aaaa867


          use rnd display andom seed value

ae7904f

enochtangg force-pushed the github-ci branch from 3eb3a5e to 61da38e Compare

November 28, 2024 22:03


          fix path to test in ci

79eb4a2

enochtangg force-pushed the github-ci branch from 61da38e to 79eb4a2 Compare

November 28, 2024 22:15

markstory reviewed

python/integration_tests/helpers.py Outdated (resolved)

python/integration_tests/test_consumer_rebalancing.py Outdated

    
    process = subprocess.Popen(
        [consumer_path, "-c", config_file_path], stderr=subprocess.STDOUT, stdout=log_file
    )
    random.seed(random_seed)

Member

markstory Nov 29, 2024

You don't need to set the seed multiple times. Doing it once before you generate other random values is enough.

Contributor Author

enochtangg Nov 29, 2024

One outcome of seeding in each consumer is that all (8) consumers restart at the same time on every iteration. FWIW, I think this is also a good test to have because we'd want to simulate both rolling deploys and deploys that shut down multiple consumers at the same time.
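To make the two behaviours in this thread concrete, a small sketch. The function names and the `lo`/`hi` bounds are hypothetical and only mirror the idea of drawing one restart delay per consumer.

```python
import random


def delays_reseeded(seed, consumers, lo, hi):
    """Reseed before every draw: each consumer gets the same delay."""
    delays = []
    for _ in range(consumers):
        random.seed(seed)  # resets the generator to the same state
        delays.append(random.randint(lo, hi))
    return delays


def delays_seeded_once(seed, consumers, lo, hi):
    """Seed once up front: draws advance through one reproducible sequence."""
    random.seed(seed)
    return [random.randint(lo, hi) for _ in range(consumers)]
```

With reseeding, all consumers share one delay and restart together, which resembles a deploy that stops everything at once. Seeding once yields a reproducible sequence of draws, which is closer to a rolling deploy when the drawn delays differ.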

python/integration_tests/test_consumer_rebalancing.py

    
    min_restart_duration = 5
    max_restart_duration = 20
    topic_name = "task-worker"
    curr_time = int(time.time())

Member

markstory Nov 29, 2024

How do we fix the time/seed when trying to reproduce a failure? We could read from an environment variable like TEST_SEED 🤷

Contributor

john-z-yang Dec 2, 2024 •

~~What if we just have the seed as cli argument? And in our gha it's always set to 1 or something~~
Oops, never mind, I see we already went with an env var
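A sketch of the environment-variable approach discussed here, assuming the variable is named `TEST_SEED` as suggested above. The helper `resolve_seed` is hypothetical and the exact handling in the PR may differ.

```python
import os
import time


def resolve_seed(environ=None):
    """Use TEST_SEED when set, otherwise fall back to the current time.

    `environ` defaults to os.environ; passing a dict makes this easy to test.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("TEST_SEED")
    if raw:
        return int(raw)
    return int(time.time())
```

Printing the resolved seed at the start of the test lets a failing CI run be replayed locally by exporting the same value.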

enochtangg added 4 commits

November 29, 2024 15:23


          produce to kafka with sentry


          clean up


          accept test_seed environment variable

189b578


          small fix

9d152c9

enochtangg marked this pull request as ready for review

November 29, 2024 20:42

Contributor Author

enochtangg commented Nov 29, 2024 •

This integration test no longer relies on sentry to produce test messages for integration tests for two reasons:

It's nicer to separate the two concerns and functionally test each component without potentially breaking each other. Breaking sentry taskworker will not break taskbroker CI.
With the new devinfra tool, we might not need to clone and install deps for sentry in taskbroker CI. This removes that requirement.

markstory approved these changes

Member

markstory left a comment

Looking good. I think we should move forward with this. We can refine and expand it incrementally.

enochtangg changed the title ~~feat: Add template for sentry CI and setup rebalance integration testing~~ feat: Add rebalance integration testing via pytest and setup integration testing on CI

enochtangg force-pushed the github-ci branch 2 times, most recently from c566a7a to 75ca8e0 Compare

December 2, 2024 04:46

enochtangg force-pushed the github-ci branch 2 times, most recently from bd85f0a to 1a197a5 Compare

December 2, 2024 21:54


          upload artifact

1c83e31

enochtangg force-pushed the github-ci branch from 1a197a5 to 1c83e31 Compare

December 2, 2024 22:41

john-z-yang force-pushed the github-ci branch from c897e1a to e407bf9 Compare

December 2, 2024 23:00


          pretty print output

242c047

john-z-yang force-pushed the github-ci branch from e407bf9 to 242c047 Compare

December 2, 2024 23:01

john-z-yang added 6 commits

December 2, 2024 15:14


          show duplicate rows

8dfbc71


          move assertion to end of test

a891666


          print inserted offset and partition

537a7c3


          print highwater mark stored

c5dabda


          revert 2b9064c

96a4e9c


          remove extra logging

22ec37a

john-z-yang force-pushed the github-ci branch from 114f746 to 22ec37a Compare

December 3, 2024 00:49

john-z-yang added 10 commits

December 2, 2024 16:54


          add more restarts


          increase max pending count

86eccaf


          more restarts

b99a662


          more restart durations

1498b26


          better logs

34c98cb


          pretty print output

feccd50


          pretty print output

fa44ef8


          pretty print output

8b91cee


          add cargo build to integration test recipie

b3c53f2


          add more messages

5f25539

Contributor Author

enochtangg commented Dec 3, 2024 •

After a long investigation, @john-z-yang and I have found the change that was causing CI to fail this integration test. Strangely enough, the test passed consistently locally, which made the investigation troublesome. When the test was executed on Ubuntu, the consumer would write duplicate messages (typically 0-2 before and after the rebalance). Reverting this change: 2b9064c#diff-316edec7494fa460e0a4bdc62a604da6b4caa7933db418d023bbfd27c6277c4dR64 fixed the problem. With the select! macro, the consumer client shutdown step did not run to completion, because check_for_shutdown() yields first during rebalancing. As a result, the commit state was not being saved correctly, causing duplicate entries.
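A loose Python analogy for the general hazard, not the Rust code in this repository: when several futures are raced and the losers are dropped, any work still pending in a loser never runs. Here `asyncio.wait` with `FIRST_COMPLETED` plus cancellation stands in for `select!`, and `shutdown_consumer` and the `state` dict are hypothetical stand-ins for the client shutdown and its commit state.

```python
import asyncio


async def shutdown_consumer(state):
    """Stand-in for a shutdown step that needs time before saving state."""
    await asyncio.sleep(0.05)
    state["committed"] = True


async def check_for_shutdown():
    """Stand-in for a branch that becomes ready almost immediately."""
    await asyncio.sleep(0)


async def race(state):
    """Race both branches and drop whichever has not finished."""
    shutdown = asyncio.create_task(shutdown_consumer(state))
    check = asyncio.create_task(check_for_shutdown())
    _, pending = await asyncio.wait(
        {shutdown, check}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # the slower branch is abandoned mid-way
    await asyncio.gather(*pending, return_exceptions=True)


async def run_to_completion(state):
    """Await each step in turn so the shutdown always finishes."""
    await check_for_shutdown()
    await shutdown_consumer(state)
```

After `race`, `state["committed"]` is still false because the shutdown branch was cancelled; after `run_to_completion` it is true.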


          clean up and fix seed value

f3261b8

evanh reviewed

src/consumer/kafka.rs

    
            res
        }
    }
    handle_os_signals(event_sender.clone());

Member

evanh Dec 3, 2024

I don't think this now works the way we want: how do handle_events and handle_consumer_client signal a shutdown if they error out? I think if those threads just stop running, the consumer won't shut down?

Contributor

john-z-yang Dec 3, 2024 •

Do you mind elaborating on this? I'm having a hard time coming up with a scenario where those 2 threads panic

enochtangg merged commit 7c9107c into main

3 of 4 checks passed

enochtangg deleted the github-ci branch

December 3, 2024 19:32

john-z-yang mentioned this pull request

test(kafka-test): adds a simple test for task worker integrations getsentry/sentry#80958

Closed
