salt-cloud being strangely slow #34776
Description of the problem
However, my use case is critical on VM deployment time because we offer a PaaS solution with VMs deployed on customer request, so we need to deliver ready VMs at the click of a button within seconds.
And what puzzles me is why
I have created a simple head-to-head test deploying three VMs based on the default GCE CentOS7 image using both
Is there something obvious that I'm missing about
I've measured the time from the start to the end of the hosts' bootstrap. Salt-cloud tells you in the console when the initial machine bootstrap is over and it moves on to provisioning, i.e. minion installation, master-minion connection and certificates, further provisioning according to Salt states, etc. So that is excluded from the timeline I've explained above.
Hi @alexykot. With most providers, it really does only take a few seconds to request a machine. The bulk of the time spent waiting is for machines to become responsive so that Salt Cloud can log in, upload files, and run the deploy script.
You should be able to see all of this activity by adding
I would expect so, and it would be totally fine to wait until the machines come online. It's also fine to wait a little longer to provision salt-minion and exchange keys. But it seems to take much longer than needed to do that. I will retry it with
You need to change the logging format, look at https://docs.saltstack.com/en/latest/ref/configuration/logging/#log-fmt-console and https://docs.saltstack.com/en/latest/ref/configuration/logging/#log-fmt-logfile, more particularly, `
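Salt's log format options take Python `logging` format strings, so the effect of adding a timestamp field can be sketched with the standard library alone. The format string, logger name and `format_record` helper below are assumptions made for illustration; they are not the values recommended in this comment.

```python
import io
import logging

# Assumed format string: standard Python logging fields, with a timestamp
# (asctime plus milliseconds) placed in front of the level and message.
FMT = "%(asctime)s,%(msecs)03d [%(levelname)-8s] %(message)s"
DATEFMT = "%Y-%m-%d %H:%M:%S"


def format_record(message, level=logging.INFO):
    """Return `message` rendered through a handler that uses FMT."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(FMT, datefmt=DATEFMT))

    logger = logging.getLogger("timestamp_demo")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        logger.log(level, message)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue().rstrip("\n")


if __name__ == "__main__":
    print(format_record("Creating GCE instance"))
```

With a timestamp on every line, gaps between consecutive log entries show where the time is being spent.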
I think I know where the speed problem lies. I did something that totally changed the way it works, and here is what it was.
Initially I was doing the test using the same Salt instance I use for managing our main infrastructure. I have the provider for the production GCE cloud account configured in there, and I added another provider for this performance testing, so I had two providers configured at once.
So what I did was comment out the production provider. I've just renamed its
And I can point out the exact place in the log where the time difference happened.
with prod provider:
without prod provider:
That's 86 seconds without the production provider versus 266 seconds with it.
I have the rest of the log, but I don't see much else of interest in there.
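For anyone repeating the comparison, the gap between two timestamped log lines can be computed as in the sketch below. It assumes timestamps in the `YYYY-MM-DD HH:MM:SS,mmm` form; the sample timestamps are hypothetical and only chosen to reproduce the 86 and 266 second figures, they are not taken from the actual logs.

```python
from datetime import datetime

# Assumed timestamp layout: date, time, then milliseconds after a comma.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Seconds between two timestamp strings in TS_FORMAT."""
    delta = datetime.strptime(end, TS_FORMAT) - datetime.strptime(start, TS_FORMAT)
    return delta.total_seconds()


if __name__ == "__main__":
    # Hypothetical timestamps, not copied from the real log output.
    without_prod = elapsed_seconds("2016-07-19 10:00:00,000", "2016-07-19 10:01:26,000")
    with_prod = elapsed_seconds("2016-07-19 11:00:00,000", "2016-07-19 11:04:26,000")
    print(without_prod, with_prod, round(with_prod / without_prod, 2))
```

On the figures quoted here, the run with the production provider configured takes roughly three times as long.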
An important background note: our production provider has around 300 live instances at the moment.
referenced this issue
Mar 28, 2017
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.