-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable direct connections to TCP agent listener #602
Conversation
On Windows test `org.csanchez.jenkins.plugins.kubernetes.KubernetesFactoryAdapterTest.autoConfig()` fails because `/src/test/resources/kubenamespace` contains a "\n" line separator but `io.fabric8.kubernetes.client.Config.tryNamespaceFromPath(Config)` uses `System.lineSeparator()` for replacement which is "\r\n" on Windows. ``` config.setNamespace(namespace.replace(System.lineSeparator(), "")); ```
[Remoting#3.34](https://github.com/jenkinsci/remoting/releases/tag/remoting-3.34) introduced [direct inbound TCP agent connections](jenkinsci/remoting#338) (to masters without Web UI/HTTP port). The [feature is documented here](https://github.com/jenkinsci/remoting/blob/master/docs/tcpAgent.md#connect-directly-to-tcp-port).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like it should work. Probably @Vlatombe or I need to try it in a few contexts.
src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesCloud.java
Show resolved
Hide resolved
src/main/java/org/csanchez/jenkins/plugins/kubernetes/PodTemplateBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/csanchez/jenkins/plugins/kubernetes/PodTemplateBuilder.java
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,6 @@ | |||
<div> | |||
When this option is enabled a <a href="https://github.com/jenkinsci/remoting/blob/master/docs/tcpAgent.md#connect-directly-to-tcp-port">direct connection</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use target="_blank"
, and never link to a GitHub branch head reference from the product, as the link could break at any moment. You could use a tagged version, but better is to create a redirect on jenkins.io (example: jenkins-infra/jenkins.io#2290), or simply make sure that everything the user needs to know is listed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -27,6 +27,10 @@ | |||
|
|||
<f:validateButton title="${%Test Connection}" progress="${%Testing...}" method="testConnection" with="name,serverUrl,credentialsId,serverCertificate,skipTlsVerify,namespace" /> | |||
|
|||
<f:entry title="${%Direct Connection}" field="directConnection"> | |||
<f:checkbox default="true"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are optimistic!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I switch to false
? Assuming the feature works, from my perspective this is much more straight forward than configuring and using the HTTP(S) URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main drawback I see with this is that it won't work if the Jenkins instance is set to use a random port for inbound TCP. In case the master restarts, the agent won't be able to reconnect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the default.
For our (Jenkinsfile Runner in Kubernetes) scenario Jenkins restarts are not relevant, but I'll try to test the behaviour.
How important is this topic for you? I mean the agents created by the Kubernetes plugin are typically not living very long.
I can also add some notes to the help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the agents created by the Kubernetes plugin are typically not living very long
But they can survive a restart of the Jenkins master, which might occur within a matter of minutes after the agent pod has started. It is expected to reconnect to the restarted master, even with a random TCP port (Jenkins.slaveAgentPort == 0
).
The existing tests passed, but I suspect this was only because you made this field enabled by default in the GUI form whereas KubernetesTestUtil.setupCloud
uses KubernetesCloud.<init>(String)
which leaves it off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be interesting to validate against cases where Jenkins restarts to make sure the behaviour is correct.
@@ -27,6 +27,10 @@ | |||
|
|||
<f:validateButton title="${%Test Connection}" progress="${%Testing...}" method="testConnection" with="name,serverUrl,credentialsId,serverCertificate,skipTlsVerify,namespace" /> | |||
|
|||
<f:entry title="${%Direct Connection}" field="directConnection"> | |||
<f:checkbox default="true"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main drawback I see with this is that it won't work if the Jenkins instance is set to use a random port for inbound TCP. In case the master restarts, the agent won't be able to reconnect.
@Vlatombe I tested the behaviour. With fixed port the agent has been able to reconnect to the master. With random port, as expected, this is not possible. Below you see the jnlp container log with fixed port. The reconnect works after some time.
The pipeline log (fixed port) looks like this:
|
So based on observation, one thing that could be done is to enable the new way by default unless the incoming TCP port is random. Is it even needed to have a checkbox ? |
Or, if we still have an explicit option for the new mode, add form validation with a warning if you turn it while using a random port that your builds will not survive a master restart. |
I implemented Jesses proposal. I would keep the mode configurable, at least in the beginning, to have the possibility to use the old behaviour in case there are side effects in some scenarios. |
...resources/org/csanchez/jenkins/plugins/kubernetes/KubernetesCloud/help-directConnection.html
Show resolved
Hide resolved
@jglick @Vlatombe |
src/main/java/org/csanchez/jenkins/plugins/kubernetes/PodTemplateBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesCloud.java
Outdated
Show resolved
Hide resolved
src/main/resources/org/csanchez/jenkins/plugins/kubernetes/KubernetesCloud/config.jelly
Show resolved
Hide resolved
@oleg-nenashev thanks for the review. I pushed the changes. |
if (!cloud.isDirectConnection()) { | ||
env.put("JENKINS_URL", cloud.getJenkinsUrlOrDie()); | ||
} else { | ||
Jenkins jenkins = Jenkins.getInstanceOrNull(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Jenkins.get()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -107,6 +110,7 @@ | |||
private boolean capOnlyOnAlivePods; | |||
|
|||
private String namespace; | |||
private boolean directConnection; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To enable it by default for new instances (and also for unit tests)
private boolean directConnection; | |
private boolean directConnection = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I also had to adjust/fix the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed the tests running only in K8s clusters.
Is Jenkinsfile-kubernetes
the easiest way to test this "locally"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's the one
The PodTemplateBuilderTests now test both, with directConnection=true and directConnection=false (old test behaviour). JenkinsRule is active now since some Jenkins APIs are accessed by the new functionalities.
src/main/java/org/csanchez/jenkins/plugins/kubernetes/PodTemplateBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/csanchez/jenkins/plugins/kubernetes/PodTemplateBuilder.java
Show resolved
Hide resolved
switch to TcpSlaveAgentListener.getAdvertisedHost() Co-Authored-By: Jesse Glick <jglick@cloudbees.com>
This comment has been minimized.
This comment has been minimized.
@Vlatombe @jglick Tests are successful now. Only KubernetesPipelineTest#cascadingDelete is skipped. |
@@ -27,6 +27,10 @@ | |||
|
|||
<f:validateButton title="${%Test Connection}" progress="${%Testing...}" method="testConnection" with="name,serverUrl,credentialsId,serverCertificate,skipTlsVerify,namespace" /> | |||
|
|||
<f:entry title="${%Direct Connection}" field="directConnection"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use an optional block to wrap the jenkinsUrl
in, since the two fields exclude each other.
Something like
- Disable direct connection
then display the Jenkins URL field if it is ticked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't like the reverse logic. And for those already having a jenkinsUrl
configured we would need to activate this flag ("if jenkinsUrl
configured, set disableDirectConnection
to true"). Sounds unnecessary complicated.
What do you think? If you still prefer it this way, I can implement it.
@@ -94,23 +102,43 @@ public void testParseLivenessProbe() { | |||
} | |||
|
|||
@Test | |||
public void testBuildWithoutSlave() throws Exception { | |||
public void testBuildWithoutSlave_directConnection() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like something that could be written using https://github.com/junit-team/junit4/wiki/parameterized-tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I first implemented org.junit.runners.Parameterized
but then switched to junitparams.JUnitParamsRunner
since Parameterized
runs all test methods with the configured Parameters, not only the ones where it is wanted.
org.junit.runners.Parameterized does not allow to run only specific test methods with parameters.
@oleg-nenashev @Vlatombe @jglick how shall we continue? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@oleg-nenashev @Vlatombe any chance to review (Oleg) and release this? |
I can't reproduce the merge voter issue. All 27 KubernetesPipelineTests are successful in my cluster for the merge commit of d64a8b2 and 3386474. |
Looks like an infra issue. |
I believe this feature broke my existing Jenkins k8s setup. Rolling back to 1.20.1 fixed things. JNLP container image kept on crashing when Jenkins provisioned pods. Tried updating the jnlp-slave 3.35-5 image (no alpine available yet it seems?), but that did not fix it either it seems. |
Hello Simon,
Btw. jnlp-slave 3.35-5-alpine is available now (see https://github.com/jenkinsci/docker-jnlp-slave/issues/125). |
This setting of
I would guess that is also what @siwyd might have run into? Yes, you see the warning in the ui but it seems it should be an error with red letters? :) If this is the way forward I can do a pr disabling direct connect for the |
This went poorly for us using Helm chart
edit: |
Fyi; created PR here helm/charts#18410 |
@bergemalm You PR should solve your problem from my perspective. I tried to reproduce the issue.
JNLP Log:
Next steps I think we have to allow |
@alxsap The log of the crashing JNLP container. This is using 3.35-5-alpine.
I'm using JCasC to provision the Kubernetes cloud:
When looking in the settings, seems like direct connection is set by default then: |
Not sure how you updated but setting kubernetes plugin to version 1.20.2 (via helm) sets |
@bergemalm @siwyd I think this is related to how the casc plugin creates/injects the config. It seems when it does not set a specific parameter ( |
Maybe we should just change back the default value for that checkbox, since this causing so much trouble. |
@siwyd @bergemalm @wgj Release 1.21.1 switched the |
I did a fix for it which created some issues. Since it defaults to false now a pr removing the fix was merged this morning. Shouldn't be an issue anymore. Thx. |
@alxsap I just reprovisioned with 1.21.1 and things seem to be working! |
Remoting#3.34 introduced direct inbound TCP agent connections (to masters without Web UI/HTTP port). The feature is documented here.
Oleg released jnlp-slave 3.35-5 which contains the required changes. Thanks, @oleg-nenashev!
This Pull Request passes the required parameters to the JNLP container. In particular this change is needed to enable pipeline runs in Jenkinsfile Runner on Kubernetes (without exposed HTTP port).