Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delay before entering listen mode with multithreading #1013

Closed
monkbroc opened this issue Jun 1, 2016 · 23 comments · Fixed by #1105

Comments

@monkbroc
Copy link
Member

commented Jun 1, 2016

Entering listen mode through WiFi.listen() in multithreading has a delay to wait for the current connection attempt to finish. The system firmware should cancel the connection attempt before entering listen mode.

Calling WiFi.disconnect() first from the app isn't sufficient because the system thread races to reconnect before the app has time to call WiFi.listen().

A workaround is to call Particle.disconnect() before WiFi.disconnect() and Particle.connect() after listen mode is done.

Test case to reproduce behavior (tested with Photon 0.5.1):

  • Configure a wifi network with a bogus name into the Photon
  • Ground D0 to call WiFi.listen() while the connection attempt is ongoing
#include "application.h"

SYSTEM_THREAD(ENABLED);
SYSTEM_MODE(AUTOMATIC);

void reconnectAfterListen() {
  Particle.connect();
}

void setup() {
  System.on(wifi_listen_end, reconnectAfterListen);
  pinMode(D0, INPUT_PULLUP);
}

void loop() {
  if(digitalRead(D0) == LOW) {
    // Delay of ~20 seconds to enter listen mode if these 2 disconnect calls are missing
    Particle.disconnect();
    WiFi.disconnect();
    WiFi.listen(true);
  }
}

/*
Main source of delay:

While connecting to the WiFi, wlan_connect_finalize calls
wiced_interface_up -> wiced_join_ap -> wiced_join_ap_specific ->
wwd_wifi_join -> host_rtos_get_semaphore (aka xQueueGenericReceive)
which blocks the system thread until connection attempt finishes.

Reason delay cannot be stopped by WiFi.disconnect():

With multithreading on, calling WiFi.disconnect() in the app calls
network.disconnect() to the system thread, then the system background
task is run (Spark_Idle_Events) which runs manage_network_connection
which immediately calls network.connect() which blocks the system thread
while the WiFi connects.
*/

Completeness:

  • Minimum test case added
  • Device, system and user firmware versions stated
  • Particle confirmed
@pomplesiegel

This comment has been minimized.

Copy link

commented Jun 23, 2016

From our testing it appears that the workaround of entering listening mode using

Particle.disconnect();
WiFi.disconnect();
WiFi.listen(true);

does not work in all cases. We're seeing delays to enter listening mode, and sometimes never entering at all. @karlhenricksen, can you elaborate regarding this scenario?

However, using the setup button directly on the Photon has always worked consistently. Is there any reason why the implementations for WiFi.listen(true) and the handler for a photon setup button hold have diverged?

To be fully honest, as customers we were quite disappointed to see this regression, as entering listening mode was an issue (which was then resolved) this last February: #859. This is a pretty barebones essential feature in any ioT device to set up a wifi connection. Any reason why this wasn't caught within 0.5.1 testing?

@technobly

This comment has been minimized.

Copy link
Member

commented Jun 29, 2016

@pomplesiegel I'll look into this one as well to figure out what happened with the regression you're seeing.

@pomplesiegel

This comment has been minimized.

Copy link

commented Jun 29, 2016

great, thank you

@pomplesiegel

This comment has been minimized.

Copy link

commented Jul 7, 2016

Hey @technobly, to be more specific regarding our questions

  1. Why has the implementation of listen(true) deviated from that of the setup button hold? The latter method has always worked reliably, while the former's efficacy has changed over time.
  2. We're curious why this regression wasn't caught in 0.5.1 testing.

Hope this is more clear. Thank you!

@pomplesiegel

This comment has been minimized.

Copy link

commented Jul 12, 2016

Hi @technobly and @m-mcgowan, is there a PR we could use which fixes this? As of the moment, our users can't reliably put the device into listening mode using WiFi.listen(true). We're seeing all kinds of weird in-between system behavior as a result

@karlhenricksen

This comment has been minimized.

Copy link

commented Jul 20, 2016

@technobly can we expect a fix for this in the 0.5.3 release? Is this even on the radar? This is the typical use-case for our customers to ever have to enter into listening mode, and I'm seeing about 60% of the time it takes ~20s to enter listening using the 0.5.3rc1 pre-release.

I haven't done enough testing to verify if anything has changed, but in recent system versions I saw cases where it seemed unable to enter listening at all, though I have not reproduced that failure yet. Will report with more findings if I get any updates there.

@technobly

This comment has been minimized.

Copy link
Member

commented Jul 20, 2016

@karlhenricksen @pomplesiegel a fix for this is not currently in 0.5.3-rc.1 but could be for 0.5.3-rc.2 before 0.5.3 is default. We've been spending most of the time ensuring that other bugfixes are integrated and well tested in 0.5.2 and 0.5.3-rc.1.

I spent a bit of time with the test case in the first post, and modified it to pop in and out of listening mode. Even though it's a workaround, it works very reliably for me with 0.5.3-rc.1 on a Photon. It would be good to understand your minimum test case and how it doesn't always work for you as we are considering a fix for this issue. For your test case, I'd be interested to know what happens when you wrap these in

SINGLE_THREADED_BLOCK() {
  Particle.disconnect();
  WiFi.disconnect();
  WiFi.listen(true);
}

The worst delay I've seen is when trying to enter Listening Mode during the blinking cyan (cloud handshake process), which I don't believe has an method to cancel currently. In that case though it's in the middle of connecting to the Cloud so I'm not sure why Listening Mode would be desired immediately. Is it usually during Blinking Green (WiFi connecting) that your users would want to immediately enter Listening Mode?

It is unclear why the manual button version of entering listening mode has changed from the software version, but they are both fairly different entry points. The manual method catches the action in an interrupt, and immediately cancels the network connection, then enters listening mode. The software method requires multithreading to execute fast like the interrupt driven method of the button, but by itself it does not force the network to be disconnected first (thus the particle.disconnect() wifi.disconnect() calls as a workaround). Perhaps the suggested: "The system firmware should cancel the connection attempt before entering listen mode." in the first post should be implemented as WiFi.listen(true) would force a network disconnection within the .listen() function, and this would bring the Setup button and Software implementations back inline with each other.

There are some tests for Threading that exercise WiFi.listen(true) and while I ran them and they passed, if I run them many times a couple seem prone to fail intermittently. We will have to spend some time re-working these tests to ensure this failure case is captured and make sure the test reliably passes 100% of the time.

@technobly technobly added this to the 0.6.x milestone Jul 20, 2016

@technobly

This comment has been minimized.

Copy link
Member

commented Jul 20, 2016

I added the 0.6.x milestone with the understanding that it would be back-patched in 0.5.x

@karlhenricksen

This comment has been minimized.

Copy link

commented Jul 20, 2016

@technobly users would typically be attempting to enter listening when in blinking green, but breathing cyan is valid as well.

There are some tests for Threading that exercise WiFi.listen(true) and while I ran them and they passed, if I run them many times a couple seem prone to fail intermittently.

Yes this seems to be what we're seeing as well, intermittent failures.

FYI, I've seen that I can reproduce the 20s delay 100% if I attempt to enter right after exiting listening, like:

//each of these called individually using a serialEvent handler:
WiFi.listen(true);
WiFi.listen(false);
WiFi.listen(true); //fails here

Not necessarily a real-life use-case but exposes the problem reliably. Hope that helps narrow the search.

@technobly

This comment has been minimized.

Copy link
Member

commented Aug 1, 2016

Related to #1067

@technobly technobly added the priority label Aug 9, 2016

@markterrill

This comment has been minimized.

Copy link

commented Aug 14, 2016

ps, thanks for that demo code with the kicker of: System.on(wifi_listen_end, reconnectAfterListen);
That was pure magic, it helped with multiple issues. I use a ISR timer to timeout listen and connect modes and previously whenever softap API had been used it would require a power cycle to neatly go back to pure connect mode (including correct LED colour), that system event listener meant I could switch to my internal connect mode as well as doing the particle.connect.

Could I suggest adding that magic to the docs because that event is currently undocumented. https://docs.particle.io/reference/firmware/photon/#system-events-overview

@karlhenricksen

This comment has been minimized.

Copy link

commented Aug 23, 2016

@technobly any updates on this issue, is this still planned to be included in the 0.6.0 release?

@technobly

This comment has been minimized.

Copy link
Member

commented Aug 23, 2016

Hi @karlhenricksen yes this issue is still a priority for 0.6.0.

@sergeuz sergeuz self-assigned this Aug 25, 2016

@sergeuz sergeuz referenced this issue Aug 29, 2016
7 of 7 tasks complete
@technobly

This comment has been minimized.

Copy link
Member

commented Aug 29, 2016

@karlhenricksen @pomplesiegel I would appreciate it if you could test PR #1105 asap which is a proposed fix for this issue, so we can get this into v0.6.0-rc.2 early this week. Thanks!

@karlhenricksen

This comment has been minimized.

Copy link

commented Aug 29, 2016

@technobly sounds good, will check it out now, thanks!

@karlhenricksen

This comment has been minimized.

Copy link

commented Aug 29, 2016

@technobly I see no difference (100% reproducible) with that PR (sys version: "0.6.2-rc2") using the method mentioned earlier - enter/exit/enter listen when router is offline and photon searching (blinking green):

//each of these called individually using a serialEvent handler:
WiFi.listen(true);
WiFi.listen(false);
WiFi.listen(true); //fails here --> 20s delay to enter
@technobly

This comment has been minimized.

Copy link
Member

commented Aug 29, 2016

@karlhenricksen I wasn't aware that it was the second time for you previously (even though I see now you mentioned the same thing above) I must have skipped over the fact that it was the second call... are you consistently seeing the delay on the second use of WiFi.listen(true)? I was seeing a similar thing with the Electron via Cellular.listen(true), see my comments over here #1105 which we are currently looking into as well.

I'll test again on the Photon now.

@technobly

This comment has been minimized.

Copy link
Member

commented Aug 29, 2016

@karlhenricksen if you look at my code over in #1105 it works on successive calls ok for me with the Photon continuously trying to connect. May I please get a copy of your serialEvent handler code to try?

@technobly

This comment has been minimized.

Copy link
Member

commented Aug 29, 2016

@karlhenricksen no need to send something... I just wrote this app that does seem to exhibit the issue. Not necessarily on the 2nd try, but on various tries. If you open a serial terminal and type 121212121212121212 at about a 1 second rate you'll see it not respond to one of the 1's and continue trying to connect. You can do this with credentials as well... just be sure to press 1 when it's blinking green. You'll see the D7 LED light up when a 1 has been received or off with 2 has been received. We'll use this to hunt down this remaining issue.

#include "Particle.h"

SYSTEM_THREAD(ENABLED)

void setup()
{
   Serial.begin(9600);
   pinMode(D7, OUTPUT);
}

void serialEvent()
{
    char c = Serial.read();
    Serial.print(c); // echo
    if (c == '1') {
        digitalWrite(D7, HIGH);
        #if Wiring_WiFi
            WiFi.listen(true);
        #elif Wiring_Cellular
            Cellular.listen(true);
        #endif
    } else if (c == '2') {
        digitalWrite(D7, LOW);
        #if Wiring_WiFi
            WiFi.listen(false);
        #elif Wiring_Cellular
            Cellular.listen(false);
        #endif
    }
}
@karlhenricksen

This comment has been minimized.

Copy link

commented Aug 29, 2016

great, thanks, glad you're seeing it now

@technobly

This comment has been minimized.

Copy link
Member

commented Sep 1, 2016

@karlhenricksen @pomplesiegel We've made a few updates to PR #1105 and it's looking really nice now. I would love it if you could confirm. We are also currently targeting Tuesday the 6th for a 0.6.0-rc.2 release, however I would love to wrap this issue up before Monday. Thanks!

@karlhenricksen

This comment has been minimized.

Copy link

commented Sep 2, 2016

@technobly looks so far like you've nailed this issue - I'm still testing around that area to be confident there are no side-effects but the delay problem looks properly wrapped up. Thanks! Unless something else pops out of it, I'd consider this done.

@technobly

This comment has been minimized.

Copy link
Member

commented Sep 3, 2016

Good to hear! Thanks for testing @karlhenricksen

@sergeuz sergeuz referenced this issue May 18, 2017
6 of 6 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.