More robust handling of ZMQ/RPC errors #2120

Merged Jun 27, 2022 (14 commits)

solowalker27
Contributor

When using Boomer as a worker runner, I encountered occasional errors on the master when it received reports from the workers. In that situation, current Locust breaks itself by attempting to reset the entire RPC server. First, various issues prevented that reset from completing successfully, which broke the entire test instead of just recovering the one worker. Second, a full reset isn't always necessary. In most cases, subsequent reports from the workers are fine. In other cases, the worker that sent the bad message can resolve the problem by resetting its own connection to the master. With this change, the master resets its server only when it gets an error while attempting to send to a worker, or when any other unexpected RPCError is thrown.
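To illustrate the intended behaviour, here is a minimal, self-contained sketch of the error-handling policy described above. It is not the code from this PR: `FakeServer`, `master_listener`, `ask_reconnect`, and the `addr` attribute are hypothetical names made up for the example, and the exception classes are defined locally (with a class hierarchy assumed for the sketch) rather than imported from Locust.

```python
class RPCError(Exception):
    """Generic RPC failure (defined locally for this sketch)."""


class RPCSendError(RPCError):
    """Assumed name: the master failed to send to a worker."""


class RPCReceiveError(RPCError):
    """Assumed name: the master received a bad message from a worker."""

    def __init__(self, message, addr=None):
        super().__init__(message)
        self.addr = addr  # hypothetical: identity of the offending worker


class FakeServer:
    """Hypothetical stand-in for the master's RPC server.

    `events` is a list of messages; any Exception instance in the list
    is raised instead of returned, to simulate a failure.
    """

    def __init__(self, events):
        self.events = list(events)
        self.resets = 0
        self.reconnect_requests = []

    def recv_from_client(self):
        event = self.events.pop(0)
        if isinstance(event, Exception):
            raise event
        return event

    def reset(self):
        self.resets += 1

    def ask_reconnect(self, addr):
        self.reconnect_requests.append(addr)


def master_listener(server):
    """Drain the server's events, resetting it only when necessary."""
    received = []
    while server.events:
        try:
            msg = server.recv_from_client()
        except RPCReceiveError as e:
            # One bad report: keep the server up and, if the sender is
            # known, let that worker reset its own connection.
            if e.addr is not None:
                server.ask_reconnect(e.addr)
            continue
        except RPCError:
            # Send failures and any other unexpected RPC error:
            # only now is the whole server reset.
            server.reset()
            continue
        received.append(msg)
    return received
```

Feeding the listener a mix of good reports, one `RPCReceiveError`, and one `RPCSendError` should leave all good reports processed, trigger a single server reset, and ask only the offending worker to reconnect. The `except RPCReceiveError` clause has to come before `except RPCError` because, in this sketch, the former is a subclass of the latter.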

@cyberw
Collaborator

cyberw commented Jun 23, 2022

Lgtm. Will merge (probably as squash) once we sort out the timeout-issue, probably some time next week...

cyberw merged commit 357f106 into locustio:master Jun 27, 2022
danigoland added a commit to danigoland/locust that referenced this pull request Aug 9, 2022
* More robust handling of ZMQ/RPC errors (locustio#2120)

* More robust RPC error handling on msg from worker

* Use dedicated exceptions, fewer nested try blocks

* Fix test_zmqrpc.py

* Undo function split since added new exceptions

* Fix more tests

* Fix some tests

* Fix typo

* Fix scoping of variables

* Add tests for RPC/ZMQ changes

* flake and black fixes

* Remove debug print line

Co-authored-by: Ryan Warner <ryan.warner@edgecast.com>

* Remove timeout parameter from FastHttpUser unit tests

* Update changelog for 2.10

* Increase CONNECT_RETRY_COUNT to avoid workers giving up too soon if master is not up yet

* Escape user supplied data in html report (locustio#2126) (locustio#2127)

* Escape user supplied data in html report (locustio#2126)

authored-by: Tom Herrmann <t.herrmann@sab-engineering.com>

* Replace the MD5 usage by SHA256

MD5 is old, insecure, and can create problems for people using this package when they are trying to pass some compliance requirements (for example, FIPS).

* Fix escaping for exceptions in normal web ui (related to locustio#2126)

* implement table-sorting in report.html

* fix: Fix typo at user/wait_time.py

* improve report sorting

* enabled sorting of error messages as well as stacktraces

* Minor edits to the documentation

* Small documentation correction

* Minor edits to the documentation

* Log an error for every failed attempt to connect to master

The connection timeout and number of attempts are hardcoded, so a failure takes a very long time to surface.
These log lines make it possible to troubleshoot issues with the connection to master.

* Minor edits to the documentation

* Minor edits to the documentation

* Minor edits to the documentation

* Stop calling attributes 'properties' in some places.

* Give a better error message when someone accidentally sets User.task instead of User.tasks

* Fix detection of accidental TaskSet.task attribute

* fix spelling in comment

* style: add a report favicon

* Removed cache_timeout kwarg from request_stats_full_history_csv for flask 2.2.0

* temporary change to see logs for py38

* restored resource warning masking

* enabled tracemalloc temporarily

* removed tracemalloc

* Ensure no caching of stats history csv (replaces cache_timeout=None which was removed in locustio#2148)

* Update changelog for 2.10.2 (automatic changelog generation is broken, so CHANGELOG.md is incomplete)

* test: Implement failing test for issue locustio#2135

* fix: Set users_dispatcher to None when test is stopped

* chore: Remove misleading docstring in test

* chore: Do not use intermediate variable for one-use

* perf(test): Decrease test runtime

Co-authored-by: solowalker27 <ryan.subscriptions@me.com>
Co-authored-by: Ryan Warner <ryan.warner@edgecast.com>
Co-authored-by: Lars Holmberg <lars.holmberg@svenskaspel.se>
Co-authored-by: Tom Herrmann <linux@randoom.org>
Co-authored-by: Renan Gomes Barreto <RenanGBarreto@users.noreply.github.com>
Co-authored-by: Tom Herrmann <t.herrmann@sab-engineering.com>
Co-authored-by: Lukas Lanzner <l.lanzner@sab-engineering.com>
Co-authored-by: Dmytro Litvinov <me@dmytrolitvinov.com>
Co-authored-by: Xavier Sosnovsky <xso@sosna.ws>
Co-authored-by: Andy Byrne <andybyrne@users.noreply.github.com>
Co-authored-by: gdm85 <gdm85@users.noreply.github.com>
Co-authored-by: Xavier Sosnovsky <sosna@users.noreply.github.com>
Co-authored-by: Lars Holmberg <lars.holmberg@redshirt.se>
Co-authored-by: Lijiawei <1456470136@qq.com>
Co-authored-by: Michael Nester <mike.nester0@gmail.com>
Co-authored-by: Maxence Boutet <maxenceboutet@outlook.com>
Co-authored-by: Maxence Boutet <52334444+mboutet@users.noreply.github.com>