Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modified check_stopped condition #1769

Merged
merged 1 commit into from May 28, 2021
Merged

Conversation

stanislawskwark
Copy link
Contributor

added STATE_INIT to condition in check_stopped condition for clients - it looks like there's possibility that master is not in STATE_INIT anymore, but some client didn't have enough time to change the state

added STATE_INIT to condition in check_stopped condition for clients - it looks like there's possibility that master is not in STATE_INIT anymore, but some client didn't have enough time to change the state
@stanislawskwark
Copy link
Contributor Author

From what I tested - this seems to be fixing #1762

@cyberw
Copy link
Collaborator

cyberw commented May 28, 2021

thx!

@cyberw cyberw merged commit 088a98b into locustio:master May 28, 2021
@mboutet
Copy link
Contributor

mboutet commented May 31, 2021

@stanislawskwark @cyberw, when I merged this change into #1621, one of the tests began to fail because the assertion checking if the master's state is stopped fails. (the test is not in the locust's master branch, it is new from #1621).

I also observed the behavior when a distributed load test is stopped (i.e. self.stop() is called in the master), the state of the master stays at stopping indefinitely.

I investigated and found that the master stays stopping because when a worker receives the stop message, it also sends back a client_ready message to the master. Thus, the check_stopped never evaluates to True. In summary this is what happens:

  1. Master sends the stop message to all workers.
  2. Upon reception by the worker, the worker sends a client_stopped to the master which result in the master removing the client from its list of workers.
  3. Immediately after sending the client_stopped, the worker also sends a client_ready message which result in the master adding back the worker to its list of workers.
  4. Because the list of worker is now populated with client being in the STATE_INIT state, the check_stopped never evaluates to True and the master never transitions to the STATE_STOPPED state.

Is this intentional?

@stanislawskwark
Copy link
Contributor Author

Intention was not too stop the master right at the beginning of test, when some workers didn't have enough time to switch from init to spawning. Should workers really reconnect and be in init state? If so then this will indeed break stopping, I'm wondering though, wouldn't it make more sense if stopped workers stayed in stopped status without disconnecting? I would really like to avoid reverting this change because it fixes serious issue. Without it tests were unpredictable and restarting whole test, building environments and so on using 80 or more workers takes a lot of time. Unless there's some other solution for this problem

@mboutet
Copy link
Contributor

mboutet commented May 31, 2021

How about the master keeps track of when a test was started and use this in the check_stopped method? That way, the condition would be inhibited for the first t seconds of starting a load test thus giving time for all workers to transition state.

Something like:

    def check_stopped(self):
        if (
            time.time() - self.started_at > int(os.getenv("STARTED_AT_THRESHOLD", 10))
            and self.state not in [STATE_INIT, STATE_STOPPED]
            and all(map(lambda x: x.state not in (STATE_RUNNING, STATE_SPAWNING), self.clients.all))
        ):
            self.update_state(STATE_STOPPED)

By making it configurable via an environment variable, it would be possible to accommodate different load test scale such as yours with lots of workers.

mboutet added a commit to mboutet/locust that referenced this pull request Jun 2, 2021
This fixes the issue of the master staying in the "stopping" state in certain situations. The issue was introduced by locustio#1769.
@jbarz1
Copy link

jbarz1 commented Jun 26, 2021

curious if this will be released soon

@cyberw
Copy link
Collaborator

cyberw commented Jun 27, 2021

curious if this will be released soon

Yes, 1.6.0 should be out in a minute or two!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants