Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-44785] - Built-in request timeout support #174

Open

Conversation

Projects
None yet
5 participants
@oleg-nenashev
Copy link
Member

commented Jun 29, 2017

This is a PoC implementation of the timeout support in Remoting API, See one of the use-cases in JENKINS-44785

  • - Implement working PoC
  • - TBD make decision about switching to Java 8. The PoC needs it for the java.time.Duration class, but it can be replaced if we stay on Java 7
  • - If Java 8 is selected, implement new call() default implementations in interfaces (with TimeoutExceptions) to address the JENKINS-44785 request from @jglick
  • - channel#callAsync() should also support timeouts on the remote side
  • - TBD: If we stay on Java 7, stick to InterruptedException?
  • - Weaponize it! Set some default timeouts for user requests and RPC calls

@reviewbybees @jglick

@@ -0,0 +1,108 @@
/*

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Jun 29, 2017

Author Member

This class is an adapted version of jenkinsci/workflow-support-plugin@c810b38

@reviewbybees

This comment has been minimized.

Copy link

commented Jun 29, 2017

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

@jglick jglick self-requested a review Jun 29, 2017

@jglick

This comment has been minimized.

Copy link
Member

commented Jun 29, 2017

channel#callAsync() should also support timeouts on the remote side

Why? I see no need for it.

Show resolved Hide resolved pom.xml Outdated

public <V,T extends Throwable>
V call(@Nonnull Callable<V,T> callable, @CheckForNull Duration performTimeout, @CheckForNull Duration executionTimeout)
throws IOException, T, InterruptedException {

This comment has been minimized.

Copy link
@jglick

jglick Jun 29, 2017

Member

What, no TimeoutException?

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Jun 30, 2017

Author Member

The call gets interrupted now instead of timeout. See the PR TODO list

This comment has been minimized.

Copy link
@jglick

jglick Jun 30, 2017

Member

I interpreted the TODO in the PR as referring to methods added to VirtualChannel with default implementations. This is just being added to a class so there is no Java 8 dependency. You chose the throws clause for the method and I am arguing that it should be throwing TimeoutException. See signatures of Future.get and so on.

}

public <V,T extends Throwable>
V call(@Nonnull Callable<V,T> callable, @CheckForNull Duration performTimeout, @CheckForNull Duration executionTimeout)

This comment has been minimized.

Copy link
@jglick

jglick Jun 29, 2017

Member

I do not think executionTimeout is necessary. Just keep it to performTimeout. No one cares about the difference. A caller simply wants to ensure that a network call does not block forever.

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Jun 30, 2017

Author Member

Yes and no. We want to prevent hanging of requests on both sides.

This comment has been minimized.

Copy link
@jglick

jglick Jun 30, 2017

Member

I do not see any need for special infrastructure to control delays on the remote side, i.e., in the body of the callable. That is just up to the discretion of whoever is writing that callable—if it is in fact doing something which might block indefinitely, it may instead select a method variant with an appropriate timeout. But from the perspective of an RPC caller, what is important is just that the calling thread is not blocked for too long in networking.

t.setStackTrace(thread.getStackTrace());
LOGGER.log(Level.FINE, "Interrupting " + thread + " after " + delay + " " + unit, t);
}
thread.interrupt();

This comment has been minimized.

Copy link
@jglick

jglick Jun 29, 2017

Member

I think this utility is inappropriate here.

I introduced it in Pipeline code (later elsewhere) because I was calling predefined APIs (Channel.call, ultimately) which offered no timeout (à la Future.get(long, TimeUnit)), and from reading the source code I knew that the implementation would typically be inside an Object.wait() loop in Request.call, which I had no control over and which waited with no bound except a response, channel closing, or a thread interruption. So this was certainly a workaround.

But if you are implementing timeouts in remoting itself, there is no reason to resort to that. You can simply make Request.call not wait forever. It can wait for the configured timeout, and not use the while loop.

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Jun 30, 2017

Author Member

But if you are implementing timeouts in remoting itself, there is no reason to resort to that. You can simply make Request.call not wait forever. It can wait for the configured timeout, and not use the while loop.

It will solve hanging of the calls only on one side of the channel. I want to address it on both

This comment has been minimized.

Copy link
@jglick

jglick Jun 30, 2017

Member

See above. I think this is not the right approach.

* @since TODO
*/
@Restricted(NoExternalUse.class)
public class Timer {

This comment has been minimized.

Copy link
@jglick

jglick Jun 29, 2017

Member

As above, unnecessary if you implement timeouts in the more natural way.

@@ -45,6 +45,7 @@
import java.io.UnsupportedEncodingException;
import java.lang.ref.WeakReference;
import java.net.URL;
import java.time.Duration;

This comment has been minimized.

Copy link
@jglick

jglick Jun 29, 2017

Member

Could use long + TimeUnit if you wanted to keep Java 7 compatibility. (But I see no reason not to move to 8 immediately.)

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Jun 30, 2017

Author Member

Yes, no reasons from the Jenkins community PoV since the LTS is on Java 8

@jtnord

This comment has been minimized.

Copy link
Member

commented Jun 30, 2017

channel#callAsync() should also support timeouts on the remote side

Why? I see no need for it.

One reason is that you ask for something to happen on the remote side and it takes a while to process and if it can not return within the default you do not want to waste resources continuing to do it.
This may not be say an Agent where you may decide resources are cheap, but where remoting is used inter master.
For example - executing a groovy script on a bunch of inter-connected masters - but the groovy script accidentally uses "while (true)" without a break clause, using the timeout on the remote side the script can be interrupted whilst it is still running.

@oleg-nenashev

This comment has been minimized.

Copy link
Member Author

commented Jun 30, 2017

I suppose @jglick just expected the timeout on one side. In this PR I have intentionally added timeouts on both sides. As @jtnord says, such timeout may be useful if the request hangs due to whatever reason, especially since we have a limited threadpool for them.

@jglick

This comment has been minimized.

Copy link
Member

commented Jun 30, 2017

w.r.t. callAsync

you ask for something to happen on the remote side and it takes a while to process and if it can not return within the default you do not want to waste resources continuing to do it

So the body of the callable is free to impose any time limit it sees appropriate. For that matter a poorly written callable might be consuming hundreds of megs of heap for no good reason. This is no different from local method calls—not a concern of Remoting.

@oleg-nenashev

This comment has been minimized.

Copy link
Member Author

commented Jul 3, 2017

This is no different from local method calls—not a concern of Remoting.

As a maintainer of Remoting, I think having a timeout for such calls is important. Do you block the PR due to that? Or just "IMHO YAGNI"?

@stephenc

This comment has been minimized.

Copy link
Member

commented Jul 3, 2017

Imho I agree with @jglick

Timingboutvthe remote operation should not be a concern of the remoting api... (or at best it should be something that the caller opts-in e.g. By using a wrapper channel)

I am inclined to object, but at this point I will just give a stern look and ask you to critically self-review ;-)

@oleg-nenashev

This comment has been minimized.

Copy link
Member Author

commented Jul 3, 2017

or at best it should be something that the caller opts-in

Currently it's opt-in since the default timeout is "no timeout"

@stephenc

This comment has been minimized.

Copy link
Member

commented Jul 3, 2017

Currently it's opt-in since the default timeout is "no timeout"

But you are littering the API by multiplying methods.

When somebody needs a timeout on the remote execution they probably just want you to provide them with a wrapping callable that does the timeout for them.

That would simplify the API and simplify the implementation.

IMHO less methods to choose from is better. I could probably go so far as to provide a wrapper to the callable for the caller timeout too so that, in effect, the caller just adds the timeouts to their callables

ch.call(localTimeout(remoteTimeout(new Callable<...>(...) {...})));

but it is fine to keep the local timeout as a parameter

ch.call(withTimeout(new Callable<...>(...){...}), localTimeout);

as that way the withTimeout static callable decorator can also be reused locally without confusion

@jglick

This comment has been minimized.

Copy link
Member

commented Jul 3, 2017

provide a wrapper to the callable for the caller timeout too

That is essentially what I attempted to do with the Timeout utility in Pipeline code, but it is not a good solution here—hence my RFE (which is overshot by this PR). More: #174 (comment)

Do you block the PR due to that?

Well, -0.5. I think this PR is solving problems it did not need to solve, and solving the problems it did need to solve the wrong way.

To reiterate, my original complaint in JENKINS-44785 was that, say,

FilePath f;
try {
    if (f.isDirectory()) {
        //
    }
} catch (IOException | InterruptedException x) {
    //
}

could block indefinitely due to problems in the network layer, being an example of one of the fundamental fallacies of distributed computing; and my request was for some variant like

FilePath f;
try {
    if (f.isDirectoryInterruptible()) {
        //
    }
} catch (IOException | InterruptedException | TimeoutException x) {
    //
}

or

FilePath f;
try {
    if (f.isDirectory(30, TimeUnit.SECONDS)) {
        //
    }
} catch (IOException | InterruptedException | TimeoutException x) {
    //
}

which would make explicit the fact that the network operation might hang for various arbitrary reasons and the caller must be prepared to deal with it.

@jglick jglick dismissed their stale review Jul 4, 2017

Premature to approve or reject PR until there is consensus on goals.

Merge branch 'master' into feature/JENKINS-44785-remoting-timeouts
# Conflicts:
#	pom.xml
#	src/main/java/hudson/remoting/Channel.java
#	src/main/java/hudson/remoting/RemoteInvocationHandler.java
#	src/main/java/hudson/remoting/Request.java
#	src/test/java/hudson/remoting/ChannelTest.java
@oleg-nenashev
Copy link
Member Author

left a comment

I merged it, almost no mistakes

@@ -553,10 +555,10 @@ public void handle(Command cmd) {
lastCommandReceivedAt = receivedAt;
if (logger.isLoggable(Level.FINE)) {
logger.fine("Received " + cmd);
} else if (logger.isLoggable(Level.FINER)) {
logger.log(Level.FINER, "Received command " + cmd, cmd.createdAt);

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Nov 12, 2017

Author Member

Merge defect To be fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.