[JENKINS-52165] Controller.watch API #60

Merged: 37 commits, Sep 13, 2018

Conversation

@jglick (Member) commented Feb 9, 2018:

JENKINS-52165 Just the minimum changes pulled out of #29 to enable push rather than pull logging from durable tasks.

private static ScheduledExecutorService watchService;
private synchronized static ScheduledExecutorService watchService() {
    if (watchService == null) {
        watchService = new /*ErrorLogging*/ScheduledThreadPoolExecutor(5, new NamingThreadFactory(new DaemonThreadFactory(), "FileMonitoringTask watcher"));
Member:
TODO Use the new ClassLoaderSanityThreadFactory when available from core
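For reference, that TODO would presumably end up looking something like the following sketch, assuming the hudson.util.ClassLoaderSanityThreadFactory(ThreadFactory) wrapper available in newer cores; the eventual change may differ:

import hudson.util.ClassLoaderSanityThreadFactory;
import hudson.util.DaemonThreadFactory;
import hudson.util.NamingThreadFactory;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;

class WatchServiceSketch {
    // Sketch only: same five-thread daemon pool, but with the delegate factory wrapped
    // so watcher threads get a sane context classloader rather than whatever created them.
    static ScheduledExecutorService create() {
        return new ScheduledThreadPoolExecutor(5, new NamingThreadFactory(
                new ClassLoaderSanityThreadFactory(new DaemonThreadFactory()),
                "FileMonitoringTask watcher"));
    }
}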

@jglick mentioned this pull request, Feb 13, 2018
try (FileChannel ch = FileChannel.open(Paths.get(logFile.getRemote()), StandardOpenOption.READ)) {
    CountingInputStream cis = new CountingInputStream(Channels.newInputStream(ch.position(lastLocation)));
    handler.output(cis);
    lastLocationFile.write(Long.toString(lastLocation + cis.getByteCount()), null);
jglick (author):
Seems that this would not suffer from JENKINS-37575-like problems. We intentionally write lastLocationFile only after a successful call to output, so that if there is an agent channel outage we do not lose output. In such a case this Watcher simply dies; if and when the agent is reconnected, a new one should start at the same position.

oleg-nenashev (Member) left a comment:
I do not see any blocking issues here, but IMHO the code could be improved a bit. I will re-review after discussing the comments with Jesse.

* @param handler a remotable callback
* @param listener a remotable destination for messages
* @throws UnsupportedOperationException when this mode is not available, so you must fall back to polling {@link #writeLog} and {@link #exitStatus(FilePath, Launcher)}
*/
Member:
IOException needs to be documented. jenkinsci/workflow-durable-task-step-plugin#65 suggests that it happens iff the node is offline, but there is no evidence of that here.

jglick (author):
Yes that could be clarified.

@@ -86,7 +100,7 @@
* Intended for use after {@link #exitStatus(FilePath, Launcher)} has returned a non-null status.
* The result is undefined if {@link DurableTask#captureOutput} was not called before launch; generally an {@link IOException} will result.
* @param workspace the workspace in use
* @param launcher a way to start processes
* @param launcher a way to start processes (currently unused)
Member:
same

jglick (author):
(ditto)

@@ -55,7 +69,7 @@
/**
* Checks whether the task has finished.
* @param workspace the workspace in use
* @param launcher a way to start processes
* @param launcher a way to start processes (currently unused)
Member:
The method is overridable, and a downstream implementation may actually use the parameter while inheriting the Javadoc, so I would rather not touch it.

jglick (author):
The parameter is no longer used in any existing implementation and likely never will be, but it harms nothing to leave it there.

/**
* Tails a log file and watches for an exit status file.
* Must be remotable so that {@link #watch} can transfer the implementation.
*/
protected static class FileMonitoringController extends Controller {
Member:
SerializableOnlyOverRemoting once this gets to 2.107+? Not the best API, but we do not have better annotations right now (should be fixed eventually once we get to some kind of static analysis).

jglick (author):
Well SerializableOnlyOverRemoting indicates that it should not be serializable except when remoted. For this purpose I would want to emphasize that it must be serializable when remoted; whether it is also serializable in other contexts is of no relevance. We do not have such a marker interface.

(For this feature I also need a way of emphasizing that a ConsoleLogFilter must be serializable over Remoting.)
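To make the distinction concrete, the missing marker would be something like the following purely hypothetical interface; nothing of the sort exists in core today:

import java.io.Serializable;

/**
 * Hypothetical marker, not an existing Jenkins API: it would assert that
 * instances must stay serializable when sent over a Remoting channel,
 * while saying nothing about serialization in other contexts.
 */
interface MustBeSerializableOverRemoting extends Serializable {
}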

/**
* Like {@link #exitStatus(FilePath, Launcher, TaskListener)} but not requesting a {@link Launcher}, which would not be available in {@link #watch} mode anyway.
*/
protected Integer exitStatus(FilePath workspace, TaskListener listener) throws IOException, InterruptedException {
Member:
Add @CheckForNull so that nobody unboxes it to int while FindBugs stays silent?

jglick (author):
Could do that, but actually I think I want to mark this

@Restricted(ProtectedExternally.class)

as a reminder that only this class is expected to be calling it.

jglick (author):
Never mind: plain protected already accomplishes that. I can still annotate the nullness.
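For illustration, the annotated overload would look roughly like this; a sketch that assumes the FindBugs annotations already used across Jenkins and simplifies the parameters:

import edu.umd.cs.findbugs.annotations.CheckForNull;

class ExitStatusSketch {
    /**
     * Sketch: a null return means "the process has not exited yet", so @CheckForNull
     * keeps callers from blindly unboxing the result to int.
     */
    @CheckForNull
    Integer exitStatus(java.io.File controlDir) {
        // the real controller reads an exit-status file here if it exists
        return null;
    }
}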

lastLocation = Long.parseLong(lastLocationFile.readToString());
}
FilePath logFile = controller.getLogFile(workspace);
long len = logFile.length();
Member:
Maybe check the file's existence just in case? Otherwise the watcher will give up immediately (IOException). But IIUC the code may want to wait anyway to avoid race conditions. Or not?

jglick (author):
If the file does not exist (because the wrapper script is still launching, or failed to launch), length() will be zero, thus nothing will happen on this iteration and we check again in another 100ms.
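For intuition, here is a minimal standalone sketch (plain JDK, not the plugin's actual Watcher class) of the tail-and-reschedule loop being described: a missing or still-empty file means nothing is delivered on this iteration, and the task simply reschedules itself after 100 ms.

import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Toy tailer: copies newly appended bytes of a log file to stdout every 100 ms. */
public class TailSketch implements Runnable {
    private final Path logFile;
    private final ScheduledExecutorService service;
    private long lastLocation; // bytes already delivered

    TailSketch(Path logFile, ScheduledExecutorService service) {
        this.logFile = logFile;
        this.service = service;
    }

    @Override public void run() {
        try {
            long len = Files.exists(logFile) ? Files.size(logFile) : 0; // a missing file behaves like length 0
            if (len > lastLocation) {
                try (FileChannel ch = FileChannel.open(logFile, StandardOpenOption.READ)) {
                    // deliver only the bytes we have not yet seen
                    Channels.newInputStream(ch.position(lastLocation)).transferTo(System.out);
                    lastLocation = ch.position(); // advance only after successful delivery
                }
            }
            // nothing new (or no file yet): just check again shortly
            service.schedule(this, 100, TimeUnit.MILLISECONDS);
        } catch (IOException x) {
            x.printStackTrace(); // a real watcher would give up here and rely on a later re-watch
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService service = Executors.newScheduledThreadPool(1);
        service.schedule(new TailSketch(Paths.get(args[0]), service), 100, TimeUnit.MILLISECONDS);
    }
}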

* If it needs to communicate with the master, you may use {@link VirtualChannel#export}.
* @see Controller#watch
*/
public abstract class Handler implements Serializable {
Member:
SerializableOnlyOverRemoting after 2.107+

jglick (author):
Perhaps, yes. As discussed above.

* though in the typical case of line-oriented output it is likely that it will end in a newline.
* <p>Buffering is the responsibility of the caller, and {@link InputStream#markSupported} may be false.
* @param stream a way to read process output which has not already been handled
* @throws Exception if anything goes wrong, this watch is deactivated
Member:
I would rather use a narrower exception type in throws so that Handler implementers do not start throwing RuntimeExceptions. But you will have to catch them when invoking Handler methods anyway...

jglick (author):
I do not really care what type is thrown. If anything is caught, the watch task shuts down.

    try (InputStream is = getOutputFile(workspace).read()) {
        return IOUtils.toByteArray(is);
    }
}

@Override public final void stop(FilePath workspace, Launcher launcher) throws IOException, InterruptedException {
    launcher.kill(Collections.singletonMap(COOKIE, cookieFor(workspace)));
    // TODO after 10s, if the control dir still exists, write a flag file and have the Watcher shut down (interrupting any ongoing handler.output call if possible)
Member:
It makes sense to at least add logging here.

        watchService().schedule(this, 100, TimeUnit.MILLISECONDS);
    }
} catch (Exception x) {
    // note that LOGGER here is going to the agent log, not master log
Member:
So runtime exceptions will be non-recoverable here... Looks fine, assuming we want Handler implementations to be written properly.

jglick (author):
If there is an error because the channel is closed, then if and when a new agent is started, a new watcher should be created; so that is fine. If there is some other kind of error, then yes, this is fatal: we stop sending status updates from this process.

dwnusbaum (Member), Aug 10, 2018:
If there is a non-remoting error, then it looks like the build will keep running until controller.exitStatus returns null in DurableTaskStep.Execution#check. Would it be helpful/possible to add some way to indicate to DurableTaskStep that logging appears to be broken? A passing build with broken logs would be strange, although failing the build because logging is broken doesn't seem any better.

If the channel is closed, then everything should retry in DurableTask#getWorkspace during the next call to DurableTaskStep.Execution#check so that seems fine to me, although the 5 minute recurrence period seems a little long.

jglick (author):
Would it be helpful/possible to add some way to indicate to DurableTaskStep that logging appears to be broken?

Possibly. I will think about another method in Handler for this purpose. As you note, in the case of a closed channel, the agent likely cannot communicate that error by definition, and getWorkspace is responsible for fixing things up a few minutes later. But in the case of a cloudy log implementation it could be useful to indicate to the master that some other error occurred, logs may be faulty, and we should rewatch when possible. Whether anything else in the build gets successfully logged at that point, who knows.

A passing build with broken logs would be strange, although failing the build because logging is broken doesn't seem any better.

Yeah it seems a bit of a toss-up. My general expectation is that if something strange happens without a clear recovery procedure, we should err on the side of just failing the build.

jglick (author):
jenkinsci/workflow-durable-task-step-plugin@c859378 may produce better outcomes for cloudy log sinks. Without a test case it is hard to predict exactly what the failure modes will be, much less what response a user would actually find helpful.

ExecutorStepTest.shellOutputAcrossRestart does now exercise the code path under discussion in the case of the stock Remoting sink. The aforementioned commit does not change the behavior in that test, because it is indeed a ChannelClosedException that you get when the agent is disconnected from the master side. (If you just kill the agent JVM via either SIGTERM or SIGKILL, then the watcher task dies with it and you are also OK.)
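To make the idea of an extra Handler method (floated a couple of comments up) concrete, here is a purely hypothetical sketch; nothing like outputFailed exists in the actual API:

import java.io.InputStream;
import java.io.Serializable;

/** Hypothetical sketch only, not the plugin's actual Handler class. */
public abstract class HandlerSketch implements Serializable {
    /** Called with process output that has not yet been handled, as in the real API. */
    public abstract void output(InputStream stream) throws Exception;

    /**
     * Hypothetical hook: the watcher could call this when output delivery failed
     * for a non-Remoting reason, so the master side could warn the user that logs
     * may be incomplete and attempt to re-watch when possible.
     */
    public void outputFailed(Throwable cause) {
        // default no-op so that existing implementations would keep compiling
    }
}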

oleg-nenashev (Member) left a comment:
Requesting changes to indicate there are comments to be discussed before merging.

jglick (author) left a comment:
Some responses. Certain things will need to be cleaned up when I revisit these PRs.

/**
* Like {@link #getOutput(FilePath, Launcher)} but not requesting a {@link Launcher}, which would not be available in {@link #watch} mode anyway.
*/
protected byte[] getOutput(FilePath workspace) throws IOException, InterruptedException {
jglick (author):
Not sure I follow what you are saying.

To be clear, this overload needed to be introduced so it can be called from the watch task, which cannot produce a (usable) Launcher; the original overload was not using its launcher parameter anyway.

assertEncoding(description, file, charset, expected, expectedEncoding, true, dockerWS, dockerLauncher);
}
private void assertEncoding(String description, String file, String charset, String expected, String expectedEncoding, boolean output, FilePath dockerWS, Launcher dockerLauncher) throws Exception {
System.err.println(description + " (output=" + output + ")"); // TODO maybe this should just be moved into a new class and @RunWith(Parameterized.class) for clarity
jglick (author):
Did that refactoring, adding the watch parameter in the process.
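The Parameterized shape hinted at in the TODO might look roughly like this sketch; the parameter names are made up here and the actual test class is organized differently:

import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameter;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class EncodingSketchTest {
    @Parameters(name = "output={0} watch={1}")
    public static Collection<Object[]> data() {
        return Arrays.asList(new Object[][] {
            {false, false}, {false, true}, {true, false}, {true, true},
        });
    }

    @Parameter(0) public boolean output;
    @Parameter(1) public boolean watch;

    @Test public void encoding() throws Exception {
        // would launch the task with the given flags and assert on the resulting log encoding
    }
}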

…nly what Handler.output actually received, in case of some exception halfway followed by a fresh agent connection.
* @throws IOException if initiating the watch fails, for example with a {@link ChannelClosedException}
* @throws UnsupportedOperationException when this mode is not available, so you must fall back to polling {@link #writeLog} and {@link #exitStatus(FilePath, Launcher)}
*/
public void watch(@Nonnull FilePath workspace, @Nonnull Handler handler, @Nonnull TaskListener listener) throws IOException, InterruptedException, UnsupportedOperationException {
Member:
Speculative, but what's the expected behavior in the face of an InterruptedException here?

jglick (author):
During DurableTaskStep.Execution.start: the sh step would fail. During getWorkspace: we retry later.

try {
    handler.output(utf8EncodedStream);
} finally {
    lastLocationFile.write(Long.toString(lastLocation + cis.getByteCount()), null);
Member:
Could you use ch.position() instead and get rid of CountingInputStream entirely, or am I missing something?

jglick (author):
Probably. My concern here is to have last-location.txt record the last position that was actually delivered to the Handler, in case something broke the streaming midway. Counting with CountingInputStream arguably corresponds more clearly to how much of the locallyEncodedStream was processed than relying on the behavior of newInputStream, but OTOH that method does claim that the stream is unbuffered, so I guess ch.position() should work and be simpler. (And anyway I have not managed to test this scenario.) I will try it.
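A sketch of the two variants being weighed, assuming Commons IO on the classpath (the existing code already uses CountingInputStream); the sink type is a stand-in for Handler.output:

import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import org.apache.commons.io.input.CountingInputStream;

class ResumePositionSketch {
    /** Stand-in for Handler.output. */
    interface Sink {
        void output(InputStream stream) throws IOException;
    }

    /** Variant 1: count the bytes the sink actually read. */
    static long deliverCounting(Path log, long lastLocation, Sink sink) throws IOException {
        try (FileChannel ch = FileChannel.open(log, StandardOpenOption.READ)) {
            CountingInputStream cis = new CountingInputStream(Channels.newInputStream(ch.position(lastLocation)));
            sink.output(cis);
            return lastLocation + cis.getByteCount();
        }
    }

    /** Variant 2: rely on the channel position; equivalent so long as the stream really is unbuffered. */
    static long deliverByPosition(Path log, long lastLocation, Sink sink) throws IOException {
        try (FileChannel ch = FileChannel.open(log, StandardOpenOption.READ)) {
            sink.output(Channels.newInputStream(ch.position(lastLocation)));
            return ch.position();
        }
    }
}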

/**
* Like {@link #getOutput(FilePath, Launcher)} but not requesting a {@link Launcher}, which would not be available in {@link #watch} mode anyway.
*/
protected byte[] getOutput(FilePath workspace) throws IOException, InterruptedException {
Member:
If we're going to rework the way output is done, I'd really like to return a streamable data implementation rather than a raw byte[]; this ensures we're not allocating giant wads of memory to collect all the data if something produces logs quickly. Whether it's a Stream, Channel, or whatever, I do not care as much.

Consider also something like 'writeOutput' that internally feeds data to a sink.

It's also just plain more elegant and easier to work with in general.

Member:
Actually... we could still track the location that way using a CountingOutputStream, I believe.

jglick (author):
This was already discussed earlier. The method is only used when you specify returnStdout: true, a fairly specialized case which is anyway only suited for small amounts of output; in the case of a workflow-cps caller, the entire resulting String gets saved to program.dat.

Member:
Damnit, we've gone in circles on these PRs so long that I'm repeating myself -- at least I'm consistent though! 😆
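For reference, the streamable/sink-based shape being suggested in this thread could look roughly like the following hypothetical sketch; writeOutput is a made-up name here, not part of the plugin API:

import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical sketch: instead of materializing captured output as a byte[],
 * the controller would copy it straight into a caller-supplied sink.
 */
interface StreamingOutputSketch {
    void writeOutput(OutputStream sink) throws IOException, InterruptedException;
}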

private synchronized static ScheduledExecutorService watchService() {
    if (watchService == null) {
        // TODO 2.105+ use ClassLoaderSanityThreadFactory
        watchService = new /*ErrorLogging*/ScheduledThreadPoolExecutor(5, new NamingThreadFactory(new DaemonThreadFactory(), "FileMonitoringTask watcher"));
Member:
Wish we could use a thread pool that grows as needed for load (with an upper limit) but still supports scheduling. Sigh.

Member:
( I know, I know, not part of the standard library )

/**
* Like {@link #getOutput(FilePath, Launcher)} but not requesting a {@link Launcher}, which would not be available in {@link #watch} mode anyway.
*/
protected byte[] getOutput(FilePath workspace) throws IOException, InterruptedException {
    return getOutputFile(workspace).act(new MasterToSlaveFileCallable<byte[]>() {
        @Override public byte[] invoke(File f, VirtualChannel channel) throws IOException, InterruptedException {
            byte[] buf = FileUtils.readFileToByteArray(f);
Member:
Sigh, so much wish for streaming use.

jglick (author):
Ditto, unnecessary here.

try {
    handler.output(utf8EncodedStream);
} finally {
    lastLocationFile.write(Long.toString(ch.position()), null);
Member:
🐜 Note that without atomic writes here, if the agent is restarted during the write, we lose the position within the log and, I think, re-send the entire log. There is potential for looping if we somehow get repeated failures (very unlikely, though).
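As a general illustration of the atomic-write pattern being referred to (plain NIO here, not the plugin's FilePath API; just the standard temp-file-plus-rename trick):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class AtomicPositionWrite {
    /** Writes the new position to a sibling temp file, then atomically renames it into place. */
    static void writePosition(Path lastLocationFile, long position) throws IOException {
        Path tmp = lastLocationFile.resolveSibling(lastLocationFile.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(position).getBytes(StandardCharsets.UTF_8));
        // readers never observe a half-written file, so a crash mid-write
        // cannot corrupt the recorded position
        Files.move(tmp, lastLocationFile, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }
}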

Member:
What if the handler throws an exception (likely IOException) while writing the output and the output isn't actually saved? Suppose, for example that we can't connect to whatever is supposed to receive the log output.

svanoort (Member), Aug 8, 2018:
Wait, at that point will we bomb out completely and never try to send the log again? Gah, this code is giving me headaches to follow because the flow of execution is so indirect.

@jglick C'mon man, a few more explanatory comments please? I almost think the overall flow merits a flow chart so you can see what's passed back and forth.

Member:
Let me say what I expect: if the handler throws an exception (probably IOException) while trying to handle output, or takes longer than some timeout period, we break out (without mutating the lastLocationFile) and retry at least once after an appropriate delay.

At some point if issues are resolved so the handler passes, we should (via mechanism unspecified) be able to get the entire log re-sent from the point of failure, including any new content that was added.

jglick (author):
What if the handler throws an exception (likely IOException) while writing the output and the output isn't actually saved?

Then we assume the log sink is hopeless and the Watcher task dies, after recording the last position it at least attempted to deliver. If and when the agent is reconnected, a new watch call will be made and we will try to resume.

a few more explanatory comments please

ack

jglick (author):
if the handler throws an exception (probably IOException)

I need to note in a comment that the default handler does not currently ever throw IOException as a result of a sink issue (only as a result of a local file error).

InputStream locallyEncodedStream = Channels.newInputStream(ch.position(lastLocation));
InputStream utf8EncodedStream = cs == null ? locallyEncodedStream : new ReaderInputStream(new InputStreamReader(locallyEncodedStream, cs), StandardCharsets.UTF_8);
try {
    handler.output(utf8EncodedStream);
Member:
@jglick Is there an enclosing Timeout somewhere here so we can't hang indefinitely? It may be somewhere non-obvious, since this has so many levels of indirection that my brain is throwing a StackOverflowException trying to process it; but if there isn't one, there should be, and if there is one, please note it somewhere obvious so this doesn't get flagged again.

Note also: as long as you haven't advanced the location marker it should be safe to try re-sending later... assuming this doesn't cause the whole thing to bomb out.

jglick (author):
Is there an enclosing Timeout somewhere here so we can't hang indefinitely?

No. Unlike the code path in polling mode, which wraps everything of interest in a Timeout, this is happening on the agent, so no master thread is being blocked. We no more apply a timeout to this than we would to any plugin code that calls something like

listener.getLogger().println("Whatever");

If it hangs for a while, it hangs for a while. If and when it recovers, it will proceed, and reschedule itself after 100ms as usual. If the master-side 5m sanity poll comes around (twice) and sees that the process has already exited yet no exit event was received, the step gets aborted.

Member:
What if it halts? I'm not clear what the result is there, and what if that happens a bunch of times in a row?

jglick (author):
Not sure what you mean by “it halts” in this context.

svanoort (Member) left a comment:
Blocking to reflect comments that need addressing, and, uh, a very unclear and perhaps incorrect handling of the case where you encounter a transitory exception that prevents sending log output via Handler.output but which is recoverable.

Which sounds obscure except that with networks, it could happen pretty regularly due to hiccups.

Also: my review isn't done per se yet; this is kind of a first pass (I will probably need to sit down and actually diagram the flow from start to end).

jglick (author) left a comment:
It is instructive to watch ExecutorStepTest.buildShellScriptAcrossDisconnect in jenkinsci/workflow-durable-task-step-plugin#63, as I walked @dwnusbaum through.

…current write succeeded (err on the side of duplication).
InputStream locallyEncodedStream = Channels.newInputStream(ch.position(lastLocation));
InputStream utf8EncodedStream = cs == null ? locallyEncodedStream : new ReaderInputStream(new InputStreamReader(locallyEncodedStream, cs), StandardCharsets.UTF_8);
handler.output(utf8EncodedStream);
lastLocationFile.write(Long.toString(ch.position()), null);
Member:
I think it would be helpful to add a comment explaining that this is intentionally not in a finally block, so that on an error we duplicate data rather than skip it.

jglick (author):
Just added that.

dwnusbaum (Member) left a comment:
I don't have any unaddressed concerns now that we attempt to retry after non-remoting failures in the default Handler in workflow-durable-task.

svanoort (Member) left a comment:
Seems fine to me now, AFAICT.
