Production ready? #25
Hi @commarla. Short answer: yes, I believe Chemtrail is ready and safe to run in a production environment. That being said, I would advise some caution because of the nature of the work it performs, which can have monetary impacts and generally a bigger impact than Sherpa. If you have the ability, I would suggest building off master and using this version, which includes a no-op provider. This provider does not perform any actions on your cluster or cloud environment; it only logs its intentions for analysis. This should give you some initial insight into its behaviour before you allow it to perform actions.

The one bigger item currently missing, which I am looking into this weekend, is the leadership locking method. I hope to have something basic in before releasing v0.1.0, but Chemtrail's internal workings when scaling are more complicated than Sherpa's, so it would benefit from better state machine modelling, which will take time to develop.

Thanks for the kind words on Sherpa; I hope Chemtrail treats you the same. I will leave this issue open, so please add any questions you have here and I'll be happy to help.
Hi @jrasell
I tried it quickly; maybe I am missing a config or something.
@commarla I have not seen this before; I'll take a look into it.
@commarla I am unable to reproduce this locally right away, but I have had a quick look into the code to double-check what is going on. Would you be able to share the returned JSON from the call
@jrasell I just called the API and saw that it's an old version of Nomad (0.8.6). Let me try to update all my nodes before bothering you more. In this environment I have at least four different versions of Nomad...
@commarla that makes sense. I was looking at the particular struct section of the Nomad API I use and noticed it was added late in 2018, so I thought it could be an older Nomad version. It would make sense to add some safety checks and a fallback in Chemtrail to help avoid this in the future. I'll open a ticket and link it here.
Hi @jrasell Correct me if I am wrong, but since your fix I can run Chemtrail with both Nomad 0.10.2 and Nomad 0.9.4, yet only the 0.10.2 nodes seem to be taken into account in the … I'm running an ASG with 40 nodes, half of which were still on 0.9.4, and I got a CPU actual of 2 percent. I knew it was wrong.
@commarla that seems weird. Do you have any other information which could help track this down?
Hi @jrasell I don't have any other info to give you, but I have observed another issue which might be related. At startup Chemtrail works fine, and scaling in and out works as expected, but after a few days I have the feeling that some nodes are forgotten by Chemtrail and not taken into account during the resource calculation.

I track the node count: during the night we observe a scale-in, but after a few days the minimum instance count increases, when it should be approximately the same every night. To confirm this, if I restart Chemtrail I observe a large scale-in operation each time, as you can see: the Chemtrail restarts are marked in red, and you can see the lows increasing over time. After a restart, the night-time instance count is correct again.

My setup is a bit complex: I am using spot instances (so I have a high number of new/dead Nomad agents) with mixed instance types ("c5.4xlarge", "c5.2xlarge", "c5.9xlarge", "m5.4xlarge", "r5.4xlarge").
Hi @jrasell
Do you think I can try Chemtrail on my Nomad production cluster? I really need to scale my ASG in line with Nomad.
Given my really good experience with Sherpa, I think I can trust Chemtrail!
I'm just asking whether there are any known issues or anything else I should know before trying it.
Thanks,