IDLE can't be interrupted using idleTerm() #15

dinhvh · 2015-10-13T18:52:29Z

Here's a sample code:

  idleChannel := make(chan int8)
  for {
    interrupted := false
    log.Printf("before idle\n");
    _, idleErr := c.Idle()
    log.Printf("after idle %v\n", idling);

    go func() {
      log.Printf("receiving data\n");
      c.Recv(-1)
      log.Printf("received data xx\n");
      if (!interrupted) {
        idleChannel <- 1 // received
      }
    }()

    log.Printf("select")
    timeout := time.After(2 * time.Second)
        select {
    case value := <- idleChannel:
      if (value == 1) {
        log.Printf("received data %v\n", c.Data);
      }
      c.IdleTerm()
    case timeValue := <- timeout:
      log.Printf("interrupted %v\n", timeValue)
      interrupted = true
      c.IdleTerm()
      log.Printf("interrupted done\n")
    }
    log.Printf("out of select")
  }

When calling, c.IdleTerm() it will interrupt IDLE but the execution flow will be blocked at c.IdleTerm().
The log "interrupted done" won't show.

@iragsdale has a suggestion of fix here:
boxer@796926b

And it seems like it's the right thing to do:

don't try to read and parse the response in c.IdleTerm()
let c.Recv() do the receiving and parsing work

The text was updated successfully, but these errors were encountered:

mxk · 2015-10-13T20:34:52Z

Please post the output of your sample code along with the raw message log (imap.DefaultLogMask = imap.LogConn | imap.LogRaw).

dinhvh · 2015-10-13T22:18:13Z

Here are the logs:

C: BQMUP4 SELECT "INBOX"
S: * FLAGS (\Answered \Flagged \Draft \Deleted \Seen $Forwarded $Junk $MDNSent $MailFlagBit0 $NotJunk $NotPhishing $Phishing JunkRecorded NotJunk Seen)
S: * OK [PERMANENTFLAGS (\Answered \Flagged \Draft \Deleted \Seen $Forwarded $Junk $MDNSent $MailFlagBit0 $NotJunk $NotPhishing $Phishing JunkRecorded NotJunk Seen \*)] Flags permitted.
S: * OK [UIDVALIDITY 2] UIDs valid.
S: * 189 EXISTS
S: * 0 RECENT
S: * OK [UIDNEXT 207898] Predicted next UID.
S: * OK [HIGHESTMODSEQ 17940080]
S: BQMUP4 OK [READ-WRITE] INBOX selected. (Success)
2015/10/13 15:16:42 before idle
C: BQMUP5 IDLE
S: + idling
2015/10/13 15:16:42 after idle
2015/10/13 15:16:42 select
2015/10/13 15:16:42 receiving data
2015/10/13 15:16:44 interrupted 2015-10-13 15:16:44.535483765 -0700 PDT
C: DONE
S: BQMUP5 OK IDLE terminated (Success)
2015/10/13 15:16:44 received data xx

dinhvh · 2015-10-13T22:19:12Z

I also tried the patch for @iragsdale but I think there's still a race condition somewhere :(

iragsdale · 2015-10-13T22:42:37Z

Hmm, I've been using that code for quite a while handling a lot of clients. What doesn't appear to be working correctly?

dinhvh · 2015-10-13T23:51:17Z

I retried and when used properly, it works :)
Which probably means that the API for now is not so great.

iragsdale · 2015-10-13T23:56:13Z

Yeah, a better API overall might be to have the idle command take a channel that you can send to for canceling. I think I tried to make that work and had a hard time with it, but it was a while back and I don't recall.

mxk · 2015-10-14T01:18:37Z

The problem that you guys are running into is that it's not safe to call IdleTerm concurrently with Recv. The documentation says: "The Client and its Command objects cannot be used concurrently from multiple goroutines." You have one goroutine blocked in the Recv(-1) call and another one in IdleTerm, which also calls Recv via cmd.Result(OK). Only one will get the command completion response, while the other will remain blocked.

You aren't fixing this problem by removing the cmd.Result(OK) call. It's still not safe to call IdleTerm while a Recv is in progress. You would need to add mutexes in a number of places to ensure that any outstanding Recv calls are actually blocked on network I/O and are not in the middle of processing a new response. I started out with such design, but it quickly turned into a mess, so I went with the simpler implementation.

The best way to deal with this is to Recv for a short interval (e.g. 100ms), or to poll for new responses, and call IdleTerm from the same goroutine when you want to stop idling.

dinhvh · 2015-10-14T01:37:48Z

When coroutine are executing concurrently, I don't think we need mutex. We need probably need mutexes only when it's running in a thread.
Recv() for a short interval will be CPU intensive and it's likely that it won't scale.
Could you let us know what's not safe once we move the cmd.Result(OK) call?

mxk · 2015-10-14T11:46:51Z

Run that code with the race detector on. I've never tried it, but I'm betting that it will complain about a lot of things. The rest of the client state is unprotected, starting with concurrent access to c.tags and c.cmds. What you're doing is definitely not safe and will break in strange, unpredictable, and unreproducable ways later on. 100ms is an eternity for a CPU and there is nothing wrong in doing that.

dinhvh · 2015-10-14T15:20:21Z

My concern is that if we run ten thousands (or more) IDLE loops, 100ms won't look like a small number any more. I'm going to investigate the race detector. Thanks!

dinhvh · 2015-10-14T18:24:33Z

The race detector isn't showing anything.

iragsdale · 2015-10-14T22:44:41Z

@dinhviethoa I do believe you want to take precautions to use this safely. What we do looks something like the following:

idleCmd, err := c.Idle()
if err != nil {
    log.Printf("Error sending idle for %v: %v", &account, err)
    return err
}

// wait for a notification or a timeout in a goroutine, because the receive will block
// this allows us to cancel the idle if needed
serverChan := make(chan bool)
doneChan := make(chan bool)
waitc := c
go func() {
    select {
    // wait for cancel of the timeout
    case _ = <-cancel:
        waitc.IdleEnd()
        // wait for the main routine to continue
        <-serverChan
        // now read the result
        idleCmd.Result(imap.OK)
    // if we got here first, the server notified us or the timeout expired
    case <-serverChan:
        // this both sends the done command AND reads the result
        waitc.IdleTerm()
    }
    doneChan <- true
}()

// now wait for a response
err = c.Recv(25 * time.Minute)
if err != nil && err != imap.ErrTimeout {
    return err
}
serverChan <- true

// wait for the idle response to be read
<-doneChan

This could probably be cleaned up a bit, but what you see here is that if we hit the timeout, we trigger the end of the IDLE by sending the command to the server. After that we wait for the main thread receive to finish before reading the result and allowing the main thread to continue. That means the ONLY place we can use the client concurrently is to trigger send the end of the idle command.

We've run this against the race checker with no issue, and we've been using this to watch many thousands of accounts concurrently with good results and no crashes. It's a bit tricky to use properly, and I'm sure we could find some way to encapsulate that pattern a bit better, but I can confirm it works.

mxk · 2015-10-14T23:55:57Z

I just pushed a new interrupt branch, which I think may be the better solution to your problem. I haven't tested these changes, but the idea is to allow one goroutine to interrupt a blocked Recv call in another goroutine by sending an error via a channel. You would still need to use mutexes, conditions, or other channels to synchronize both goroutines so that only one is calling Client methods at any given time, but this should allow you to block on Recv indefinitely.

Test it out, let me know what you think. If you like this interface, I'll add some tests for it and merge it into master. I need to think about this a bit more, though. It's been a long time since I last looked at this code.

dinhvh · 2015-10-15T16:29:11Z

Thanks! I'm taking a look at it!

iragsdale · 2015-10-15T16:33:33Z

At first glance, this makes WAY more sense to me. I think it should be far simpler to use and also solves for the case where you send the term to the server but it isn't responding (which currently would continue to block the main thread indefinitely).

dinhvh · 2015-10-15T18:25:29Z

I just tried it. It works really well!

Here's an API suggestion:

Usage:

idleCmd, err := c.Idle()

idleData := make(chan bool)
go func() {
  err := c.IdlePoll();
  if (err == errorInterrupted) {
    return
  }
  idleData <- true
}()

timeout := time.After(pollDelay * time.Second)
select {
   case <- idleData:
     log.Printf("has data")
   case <- timeout:
     log.Printf("timeout")
     c.IdleInterrupt()
}

c.IdleTerm()

Approximate implementation:

// When creating the client, always register an interruption channel: c.interruptChan(c.idleChannel)

func (c *imap.Client) IdlePoll() (error err) { // I'm not sure whether we should return an imap command so that the user of the API could collect the data on the command instead of collecting it on imap.Client
        err = c.Recv(-1)
        return
}

func  (c *imap.Client) IdleInterrupt() {
        c.idleChannel <- errorInterrupted
}

dinhvh · 2015-10-19T21:28:31Z

@iragsdale are you planning to merge that change in https://github.com/boxer/go-imap ?

iragsdale · 2015-10-21T16:58:19Z

Maybe? We're not really doing any active development on the tool that uses this now, as it's currently running pretty stable and doing what we need it to do.

mikaa123 · 2016-10-21T21:50:52Z

Late to the party, but I'm also running into this issue.
I went with polling c.Recv(100), as @mxk suggested, and it works well.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IDLE can't be interrupted using idleTerm() #15

IDLE can't be interrupted using idleTerm() #15

dinhvh commented Oct 13, 2015

mxk commented Oct 13, 2015

dinhvh commented Oct 13, 2015

dinhvh commented Oct 13, 2015

iragsdale commented Oct 13, 2015

dinhvh commented Oct 13, 2015

iragsdale commented Oct 13, 2015

mxk commented Oct 14, 2015

dinhvh commented Oct 14, 2015

mxk commented Oct 14, 2015

dinhvh commented Oct 14, 2015

dinhvh commented Oct 14, 2015

iragsdale commented Oct 14, 2015

mxk commented Oct 14, 2015

dinhvh commented Oct 15, 2015

iragsdale commented Oct 15, 2015

dinhvh commented Oct 15, 2015

dinhvh commented Oct 19, 2015

iragsdale commented Oct 21, 2015

mikaa123 commented Oct 21, 2016

IDLE can't be interrupted using idleTerm() #15

IDLE can't be interrupted using idleTerm() #15

Comments

dinhvh commented Oct 13, 2015

mxk commented Oct 13, 2015

dinhvh commented Oct 13, 2015

dinhvh commented Oct 13, 2015

iragsdale commented Oct 13, 2015

dinhvh commented Oct 13, 2015

iragsdale commented Oct 13, 2015

mxk commented Oct 14, 2015

dinhvh commented Oct 14, 2015

mxk commented Oct 14, 2015

dinhvh commented Oct 14, 2015

dinhvh commented Oct 14, 2015

iragsdale commented Oct 14, 2015

mxk commented Oct 14, 2015

dinhvh commented Oct 15, 2015

iragsdale commented Oct 15, 2015

dinhvh commented Oct 15, 2015

dinhvh commented Oct 19, 2015

iragsdale commented Oct 21, 2015

mikaa123 commented Oct 21, 2016