3.5.5 recover gets stuck waiting for cancel and eventually exits with recover error #214

NiclasLindgren · 2020-12-02T17:19:03Z

If you simulate network issues by throwing an IOException in send, you will notice that JmDNS never finishes its recover state. Instead it calls the delegate to let it know it couldn't recover. As far as I can tell, this is because the canceler task never finishes (it is cancelled), so you are stuck in Cancel_1, or in Cancel_2 if there was a packet to send on the first cancel.

This leaves JmDNS in a state where you can't restart it: the two Timer/Task threads are left running, so you leak threads if you create a new instance. The current instance has obviously stopped announcing and receiving, since the multicast socket is closed in the recover code.

I need some help figuring out how to correct this, as the state machine isn't obvious. It seems to me that the HostInfo state is dissociated incorrectly during cancel (or maybe it is not moved to the cancelled state when that happens).

NiclasLindgren · 2020-12-02T19:29:56Z

I think the problem is here

        if (!out.isEmpty()) {
            logger.debug("{}.run() JmDNS {} #{}", this.getName(), this.getTaskDescription(), this.getTaskState());
            this.getDns().send(out);

            // Advance the state of objects.
            this.advanceObjectsState(stateObjects);

When send throws an IOException, advanceObjectsState isn't called. Instead recoverTask is called on the Canceler, which stops, so HostInfoState never reaches cancelled.

So if the canceler instead does this (changed code):

    protected void recoverTask(Throwable e) {
        if (this.getTaskState().isCanceling()) {
            // Keep the cancel sequence moving even though the send failed.
            this.getDns().advanceState(this);
        } else {
            this.getDns().recover();
        }
    }

It works, but it doesn't seem right. Perhaps JmDNSImpl should advance the state in

    // We have an IO error so lets try to recover if anything happens lets close it.
    // This should cover the case of the IP address changing under our feet
    if (this.isClosing() || this.isClosed() || this.isCanceling() || this.isCanceled()) {
        return;
    }

Instead of just returning?
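To illustrate why the cancel sequence never completes, here is a toy model of the behaviour described above. It is not JmDNS code: the class CancelModel, the method run and the state names are stand-ins chosen for this sketch, and the only assumption is that each cancel step advances the state exactly once when it completes.

```java
public final class CancelModel {
    /** Simplified stand-in for the cancel states; not the real JmDNS enum. */
    public enum State { CANCELING_1, CANCELING_2, CANCELING_3, CANCELED }

    private CancelModel() {}

    /**
     * Runs up to maxTicks cancel steps and returns the state that was reached.
     *
     * @param sendFails        simulates an IOException thrown by send on every step
     * @param advanceOnFailure models the proposed recoverTask change
     */
    public static State run(boolean sendFails, boolean advanceOnFailure, int maxTicks) {
        State state = State.CANCELING_1;
        for (int tick = 0; tick < maxTicks && state != State.CANCELED; tick++) {
            if (!sendFails || advanceOnFailure) {
                // Step completed (or the failure is tolerated): move to the next state.
                state = State.values()[state.ordinal() + 1];
            } else {
                // The task stops after the failed send and nothing advances the state again.
                break;
            }
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(run(true, false, 10)); // prints CANCELING_1 (stuck)
        System.out.println(run(true, true, 10));  // prints CANCELED
    }
}
```

With the failure tolerated, the model reaches CANCELED after three steps. Without it, anything waiting for CANCELED waits until its timeout.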

NiclasLindgren · 2020-12-02T19:45:57Z

To repro, just put a throw new IOException("fake") in JmDNSImpl.send in place of ms.send(packet).

NiclasLindgren · 2020-12-03T16:17:46Z

Another issue: when going in and out of hibernate, Linux can remove network interfaces, and you then get the exception "No such device". The only way out of that is to call

            _interfaze = NetworkInterface.getByInetAddress(_address);

again before opening the socket in recover. Otherwise you get an exception while recovering and the state machine gets stuck.
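A minimal sketch of that re-resolution as a standalone helper. The class InterfaceRefresh and the method refresh are hypothetical names, not part of JmDNS. The one library fact relied on is that NetworkInterface.getByInetAddress returns null when no interface currently has the address, so the helper fails early instead of handing a null interface to the socket.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.NetworkInterface;

public final class InterfaceRefresh {
    private InterfaceRefresh() {}

    /**
     * Looks up the interface that owns the address right now.
     * Throws if the address is not bound to any interface, for example
     * because the device disappeared across a hibernate/resume cycle.
     */
    public static NetworkInterface refresh(InetAddress address) throws IOException {
        // May itself throw SocketException (an IOException) on an I/O error.
        NetworkInterface nif = NetworkInterface.getByInetAddress(address);
        if (nif == null) {
            throw new IOException("No network interface currently has address " + address);
        }
        return nif;
    }

    public static void main(String[] args) {
        try {
            NetworkInterface nif = refresh(InetAddress.getLoopbackAddress());
            System.out.println("Resolved interface: " + nif.getName());
        } catch (IOException e) {
            System.out.println("Cannot recover yet: " + e.getMessage());
        }
    }
}
```

A caller in the recover path could retry later when refresh throws, rather than opening the socket on a stale or missing interface.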

NiclasLindgren · 2020-12-03T16:55:24Z

It seems if you call closeMulticastSocket() on any exception in

public void send(DNSOutgoing out) throws IOException {

before rethrowing the original exception, recover won't get stuck.
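The shape of that change, sketched against a hypothetical Channel interface rather than the real JmDNSImpl internals. In JmDNS the close call would be closeMulticastSocket(); everything else here is an assumption made for illustration.

```java
import java.io.IOException;

public final class SendWithCleanup {
    /** Hypothetical stand-in for the multicast socket owned by JmDNSImpl. */
    public interface Channel {
        void send(byte[] payload) throws IOException;
        void close();
    }

    private SendWithCleanup() {}

    /** Sends the payload; on failure closes the channel first, then rethrows the same exception. */
    public static void send(Channel channel, byte[] payload) throws IOException {
        try {
            channel.send(payload);
        } catch (IOException e) {
            channel.close(); // release the broken socket before recovery starts
            throw e;         // the caller still sees the original failure
        }
    }
}
```

The original exception is preserved, so existing error handling in the callers keeps working.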

NiclasLindgren · 2020-12-04T12:57:29Z

It also incorrectly thinks it has recovered when this happens:

[local6.warni] 23:32:46,868 jmdns.impl.JmDNSImpl Creating multicast socket on interface name:eth0 (eth0)
[local6.warni] 23:32:46,872 jmdns.impl.HostInfo Find new interface for address /192.168.3.30
[local6.warni] 23:32:46,873 jmdns.impl.JmDNSImpl Creating multicast socket on new interface null
[local6.warni] 23:32:46,874 jmdns.impl.JmDNSImpl cts-va-20041634.recover() Start services exception
[local6.warni] java.net.SocketException: bad argument for IP_MULTICAST_IF2
[local6.warni] at java.net.AbstractPlainDatagramSocketImpl.setOption(Unknown Source)
[local6.warni] at java.net.MulticastSocket.setNetworkInterface(Unknown Source)
[local6.warni] at javax.jmdns.impl.JmDNSImpl.openMulticastSocket(JmDNSImpl.java:472)
[local6.warni] at javax.jmdns.impl.JmDNSImpl.__recover(JmDNSImpl.java:1883)
[local6.warni] at javax.jmdns.impl.JmDNSImpl$6.run(JmDNSImpl.java:1836)
[local6.warni] 23:32:46,876 jmdns.impl.JmDNSImpl cts-va-20041634.recover() We are back!
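In the log above, the SocketException from openMulticastSocket is followed by "We are back!". A rough sketch of the reporting one would expect instead, with hypothetical names (RecoverOutcome, Step) and no claim about how __recover is actually structured: success is reported only when every recovery step finished without throwing.

```java
public final class RecoverOutcome {
    /** Hypothetical recovery step, such as opening the socket or restarting services. */
    public interface Step {
        void run() throws Exception;
    }

    private RecoverOutcome() {}

    /** Returns true only if all steps ran to completion; stops at the first failure. */
    public static boolean recover(Step... steps) {
        for (Step step : steps) {
            try {
                step.run();
            } catch (Exception e) {
                return false; // do not report recovery after a failed step
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean ok = recover(() -> {
            throw new java.net.SocketException("bad argument for IP_MULTICAST_IF2");
        });
        System.out.println(ok ? "We are back!" : "Recover failed");
    }
}
```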
