Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance hit after upgrade to 3.2.1 #601

Closed
CyrilMazur opened this issue May 28, 2019 · 8 comments
Closed

Performance hit after upgrade to 3.2.1 #601

CyrilMazur opened this issue May 28, 2019 · 8 comments
Labels

Comments

@CyrilMazur
Copy link

  • Horizon Version: 3.2.1
  • Laravel Version: 5.8.17
  • PHP Version: 7.2.18
  • Redis Driver & Version: predis 1.1.1
  • Database Driver & Version: MySQL 5.7.25

Description:

I noticed a performance hit in production after upgrading to Horizon 3.2.1, which is probably in relation to #589. This is on a EC2 t2 instance, with constant workload, the hit is an additional 5-6% of CPU usage.

See graph below, we upgraded to 3.2.1 on May 22 (you will notice also that the CPU credits went down to zero way faster after the upgrade):

Screenshot 2019-05-28 at 18 53 38

And this graph is after reverting to 3.1.2:

Screenshot 2019-05-28 at 18 56 16

Steps To Reproduce:

Upgrade Horizon from 3.1.2 to 3.2.1.

@AJenbo
Copy link
Contributor

AJenbo commented May 29, 2019

Can you check if this is related to the scheduler or one of the other processes?

@CyrilMazur
Copy link
Author

How can I check that?

@AJenbo
Copy link
Contributor

AJenbo commented Jun 6, 2019

I think this may be because it now calls ps twice, once to find out if it need to auto-scale and once to get the current system load. Unfortunately on my system the performance difference is so minute that I'm unable to tell the difference.

I have 3 options for possible improvements.
1: Differentiate the two calls so that the auto-scaler only get the list of commands and not CPU and mem load.
2: Only call ps once and then store the values in the class for the telementry
3: Tightly couple telementry and autoscaling

The pros:
1: Only getting the relevant data for the caller is probably good idea
2: Only one call will be made each tick
3: Only one call will be made each tick

The cons:
1: It is unclear from my tests if ps collects less data or simply just dosen't return it, so the savings might be negligible.
2: If caller A is no longer called each tick in a later version then caller B will get stale data
3: Tightly coupling things is a bad idea

@CyrilMazur
Copy link
Author

Option 4: boolean to enable / disable telemetry?

For option 1, if you can give me an example of ps command, I can try to benchmark it on my EC2 t2 instance.

@AJenbo
Copy link
Contributor

AJenbo commented Jun 7, 2019

exec ps axo %cpu,%mem,command | grep "supervisor=SomeHorizonName" | grep -v "grep"
vs
exec ps axo command | grep "supervisor=SomeHorizonName" | grep -v "grep"

@CyrilMazur
Copy link
Author

Over 50,000 iterations, no significant difference...

time ./test1.sh
./test1.sh  83.36s user 146.00s system 78% cpu 4:53.98 total
➜  time ./test2.sh
./test2.sh  80.47s user 152.67s system 78% cpu 4:56.86 total

@AJenbo
Copy link
Contributor

AJenbo commented Jun 7, 2019

Hmm less then 4% increased performance dosent look promising for option 1. There would also be a reduction in time spent parsing the result, since scaling would only need the number of lines. But I wouldn't expect this to much of a performance issue either.

Currently this only appears to cost 0.16% cpu at the tick rate on the given system so perhaps more then just ps is to blame here.

@driesvints
Copy link
Member

The original PR that caused the performance hit was reverted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants