Virtual machines connected to Jenkins via Compute Engine plugin are terminated periodically within an hour #61
Comments
I'll have to take a closer look, but at first glance it might have to do with this feature: |
Any update here please? |
no, instances are not inactive. Some job is running on them and they are terminated. That's the problem. |
Oh, that is very interesting. Can you show me your instance configuration and more logs if possible? |
Did you see the following in your logs at all:
|
No, i did not see this INFO. I have attached some screenshots of used template on Google Cloud Engine and the Jenkins configuration. On the screenshot you can see that we use our own company image for CentOS 7, but the same problem appears on other systems (CentOS 6, Debian 9) and also when I try official CentOS 7 image provided by GCE. What other logs would you like to see? |
today's logs (jenkins slave log, jenkins full log and jenkins job failure) jenkins_system_log.txt I also archived the instance disk, so if you want some logs from the instance, let me know which ones. |
I have created an archive with /var/log/ directory of crashed CentOS 7 instance: |
Ok, so you seem to be SSH'ing in just fine because of the connect fresh as root INFO log. Can you SSH into your instance manually and see if this is a valid path? I think it might not be, and we'll have to look further into that. I feel like I've seen this issue before and it has to do with faulty directories. Just not quite sure how these incorrect paths get generated. |
Ok, so I just tried out with my remote and with ./ as my remote location i get ./home/jenkins
|
Yes, I forgot to mention it. I've also tried to find out why the slave was copied into the /tmp directory, but I didn't find anything about it. I also looked for some event that could delete the agent in /tmp, but no cron job or anything like that was started, and agent.jar is still present in the /tmp directory. It is weird that the instance is always terminated at the same minute of every hour. Yesterday it was every XX:47. One more question: why is the agent named agent.jar, while when I connect the instance to Jenkins manually the jar is called /home/jenkins/remoting.jar? Is that OK? Attaching the agent.jar. You may want to check it. |
When i try to change directory to /home/jenkins/./.jenkins-slave with command 'cd /home/jenkins/./.jenkins-slave' it is valid command. The directory is present (it is ~/.jenkins-slave). So it does not seem to be a problem. |
Attaching also the System Information about the new CentOS-7 instance I've created few minutes ago. |
#69 is this issue a similar problem? But I have Java 8 installed: Sometimes the slave is connected and working almost one hour and then suddenly terminated. |
#69 is because java 8 was not installed. You don't seem to be having that issue since your logs print a bunch of Java errors. |
I doubt this is the issue, but worth trying, can you try using the same image as me and see what happens (Debian cloud) |
yes i can try Cloud Debian. How do you connect these machines to jenkins? Did you generate some ssh keys? Did you create some jenkins account? What other special changes did you make on this machine? |
ok, so i tried to run Debian official image and it is the same situation. Attaching jenkins logs. |
For Linux images, we generate the SSH keys for you. And like I said before, you seem to have no issue SSH'ing. For some reason your agent has trouble running the job. We're going to put out a new release today and I wonder if that will resolve your issues... I'm not able to reproduce this error and it's not clear at all from the stack trace why this is happening. I will work on this extensively the coming week since I will be on bug duty and can dedicate more bandwidth to issues. |
When you say start some job and during an hour, is the job still running when the instance is terminated or did the job complete and you just kept the instance there and it was deleted? |
i have already generated some ssh key,it is ok. |
I start the job, the instance is created and the job is running, then at a specific time (today every XX:47) the instance is terminated (on the Google Cloud operations page I can see that the request to terminate comes from the Jenkins account: stop and delete the instance). Then the machine is no longer available in Cloud or Jenkins. I can set an option not to delete the disk when the instance is terminated, which I already do (so when I need logs from a deleted machine, I create a new one manually in Google Cloud, attach the deleted instance's disk and connect to it via SSH). As you can see in the logs I have already attached here, the running job is not completed; it is terminated because the agent was deleted. |
It is weird that the termination happens all day at a specific minute of the hour. It does not matter whether the instance (job) runs 10 minutes or 50 minutes. If I run the job at 18:40, the instance is terminated at 18:47. The same happens when I run the job at 17:50: it also crashes at 18:47. |
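The pattern described above (termination at a fixed minute, independent of when the job started) is what one would expect from a periodic task that fires once per hour rather than from a per-instance timeout. A minimal sketch of that reasoning follows; `next_fire` is a hypothetical helper, the date is arbitrary, and the minute 47 is taken from the report. It is not the plugin's code.

```python
from datetime import datetime, timedelta


def next_fire(job_start: datetime, fire_minute: int) -> datetime:
    """Return the first time at or after job_start whose minute is fire_minute.

    Models an hourly periodic task (assumption), not a per-instance timer.
    """
    candidate = job_start.replace(minute=fire_minute, second=0, microsecond=0)
    if candidate < job_start:
        # This hour's firing time has already passed, so wait for the next hour.
        candidate += timedelta(hours=1)
    return candidate


if __name__ == "__main__":
    # Start times from the report; the calendar date is arbitrary.
    for start in (datetime(2019, 4, 5, 18, 40), datetime(2019, 4, 5, 17, 50)):
        fired = next_fire(start, 47)
        print(start.strftime("%H:%M"), "->", fired.strftime("%H:%M"))
```

Both start times from the report map to 18:47 under this model, whereas an idle or lifetime timeout would give a termination time that shifts with the start time.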
any update here please? |
Sorry about the delay. The reason I ask is because I want to see if some other plugin or retention strategy is interfering with the agents and terminating them improperly since I notice the following plugin might have something to do with what's happening:
|
Hello, ok, i am attaching
thanks |
was it helpful? |
no update here please? |
I can't seem to open any of these files, did you just save everything from the website? There are lots of web-related files. I had wanted to see the logs since I am guessing there could be other plugins interfering with the instances. |
Ok, I've managed to open them. At 9:52:57 i see the following:
However, I'm not seeing any log statements that would indicate we found any instances to terminate. This is possible if no remote instances were found. Line 97 in ce5761e
This may be overkill, but I wonder if you could run Jenkins with your own local build of the plugin and insert more log statements... |
Alright, at 9:52:58 am, which is not too long after 9:52:57 where we saw
The timing of this statement makes me suspect it is because of CleanLostNodesWorker. @ingwarsw Care to contribute any input? |
@lukasova Are you using latest version of plugin? |
@lukasova Do you perhaps have several Jenkins instances configured with the same cloud? |
Logs seem to show only 1 cloud? |
I mean several Jenkins instances using one cloud, not many clouds on one Jenkins. |
@lukasova Check that your plugin version is at least 3.1.1. |
That's true. We have 2 Jenkins instances configured with the same cloud. I never realized it could be related. I will update both plugins to version 3.2.0, and if that does not help I will disable the testing Jenkins instance and we'll see. Thank you |
problem seems to be fixed after updating plugin to version 3.2.0. Hope it won't appear again :) thank you |
@lukasova thank you for being patient with us! Glad it worked out. |
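The diagnosis reached in this thread (two Jenkins instances sharing one cloud, with CleanLostNodesWorker under suspicion) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the plugin's implementation: it assumes a cleanup pass that treats every remote instance without a matching local node as lost. All function names, instance names and controller ids below are hypothetical, and the scoped variant is just one conceivable mitigation, not necessarily what changed in 3.2.0.

```python
from typing import Dict, Set


def find_lost_instances(remote_instances: Set[str], local_nodes: Set[str]) -> Set[str]:
    """Naive cleanup (assumption): any remote instance unknown locally is 'lost'."""
    return remote_instances - local_nodes


def find_lost_instances_scoped(
    remote_owners: Dict[str, str], local_nodes: Set[str], controller_id: str
) -> Set[str]:
    """Hypothetical variant: only consider instances tagged as created by this controller."""
    owned = {name for name, owner in remote_owners.items() if owner == controller_id}
    return owned - local_nodes


if __name__ == "__main__":
    # One cloud project shared by a production and a testing Jenkins (made-up names).
    remote_owners = {"prod-agent-1": "prod", "test-agent-1": "test"}
    test_nodes = {"test-agent-1"}  # nodes the testing Jenkins knows about

    # The testing Jenkins sees the production agent as lost and would delete it.
    print(find_lost_instances(set(remote_owners), test_nodes))  # {'prod-agent-1'}

    # With ownership scoping, the testing Jenkins leaves the production agent alone.
    print(find_lost_instances_scoped(remote_owners, test_nodes, "test"))  # set()
```

In this model the production agent is deleted while its job is still running, which matches the symptom reported here: the delete request comes from the Jenkins account, at whatever time the other controller's periodic cleanup happens to run.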
Hi rachely3n, I'm trying to use the Google Compute Engine Plugin but I'm getting the error "Could not list in region in project ". Please look into it. I couldn't find out where I am going wrong. |
@Mukhtarali212 Usually that issue has to do with your service account credentials. Make sure the credentials you created have the proper permissions. |
Hi rachely3n, thanks for the pointer on that issue. I have one more new issue, please take a look: when a new instance is provisioned by the GCE plugin, the VM launches and the job is triggered, but I get an error when cloning the Git repository in Jenkins. |
I have upgraded Jenkins to 2.426.1 and I am facing a similar issue. @Mukhtarali212, can you suggest what can be checked here? |
I use the Compute Engine Plugin (v. 3.0.0) for connecting GCE instances to Jenkins CI (v. 2.159). Jenkins automatically creates the instances (e.g. CentOS 6, 7, Debian 9; I tried the official images provided by Google Compute Engine) when a job is started, but at a specific time in every hour (e.g. every XX:57, yesterday it was every XX:53) all these machines are terminated no matter how long they have been running. In the machines' logs there is just information about the shutdown, nothing special:
...
08:46:33 jenkins-gce-cent-7-cv5jlc systemd: Startup finished in 1min 30.753s.
08:47:54 jenkins-gce-cent-7-cv5jlc systemd-logind: Power key pressed.
08:47:54 jenkins-gce-cent-7-cv5jlc systemd-logind: Powering Off...
...
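To check whether shutdowns cluster at one minute of the hour, the "Power key pressed" events can be tallied by minute. The sketch below assumes log lines shaped like the trimmed excerpt above (an `HH:MM:SS` time followed by the host name); `power_off_minutes` is a hypothetical helper, and the sample contains only the three lines quoted here.

```python
import re
from collections import Counter

# Matches the time and host directly before the systemd-logind power-key message.
# Uses search(), so a leading syslog date prefix (if present) is tolerated.
POWER_KEY_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}) (?P<host>\S+) systemd-logind: Power key pressed"
)


def power_off_minutes(lines):
    """Count power-key events per minute-of-hour ('00'..'59')."""
    counts = Counter()
    for line in lines:
        match = POWER_KEY_RE.search(line)
        if match:
            minute = match.group("time")[3:5]  # the MM part of HH:MM:SS
            counts[minute] += 1
    return counts


SAMPLE = [
    "08:46:33 jenkins-gce-cent-7-cv5jlc systemd: Startup finished in 1min 30.753s.",
    "08:47:54 jenkins-gce-cent-7-cv5jlc systemd-logind: Power key pressed.",
    "08:47:54 jenkins-gce-cent-7-cv5jlc systemd-logind: Powering Off...",
]

if __name__ == "__main__":
    print(power_off_minutes(SAMPLE))
```

Run over the logs of several terminated instances, a single dominant minute would point to an hourly external trigger rather than something inside the guest.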
Steps to reproduce:
Prepare a template in GCE, use it in Jenkins with the Google Compute Engine plugin, start a job, and within an hour the machines will be terminated.
I attach the Jenkins log for the connected machine and the /var/log/messages log from the virtual machine:
messages-20190405.txt
jenkins_slave_log.txt