Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Red SoS re Wifi.on/connect and particle.connect #1958

Closed
markterrill opened this issue Nov 6, 2019 · 23 comments

Comments

@markterrill
Copy link

@markterrill markterrill commented Nov 6, 2019

New SoS with old code after 1.1.1

From my experience it's worked perfectly fine for many years, now doesn't. Zero documentation on how it is meant to work. What's your test case code for checking that 1.4.2 would connect reliably to cloud in semi auto / system thread mode? With and without credentials stored.

Bug Report

Expected Behavior

What should it do? Not red sos, it should work.

Observed Behavior

What did it actually do? Red SoS, did not work

Steps to Reproduce

Running 1.4.2 with semi automatic and system thread enabled.

I'm guessing anything older than 1.1.1 changes system behaviour.

Test App

Fails:

SYSTEM_MODE(SEMI_AUTOMATIC);
SYSTEM_THREAD(ENABLED);

void timer_10sec_connect() {
    Serial.println("in 10sec timer going to do connect()");
    Particle.connect();
}
Timer timerParticleConnect(10000, timer_10sec_connect, true); // one shot timer for connectivity

void setup() {

    delay (3000);
    // Start serial at 9600 baud
    Serial.begin(9600);
    Serial1.begin(115200); // BT module

    // Update version as property functions can't be in global scope
    sV.concat("_");
    sV.concat(System.version());
    sV.toCharArray(strVersion, 32);

    // Spark read/write variables
      Particle.variable("version", strVersion, STRING);
      Particle.variable("temps", publishString, STRING);
      // Spark functions
      // These are limited to 63 chars in arguments https://docs.particle.io/reference/api/#call-a-function
      Particle.function("listenMode", listenMode);
      Particle.function("stopLidOpen", cancelLidOpen); // cancel was too long for 12 chars
// .... more functions, redacting for privacy
      Serial.println("start of setup 0 functions");
    delay(500);
    WiFi.connect();
    Serial.println("start of setup 1 connect");
    timerParticleConnect.start();

   //more setup down here. 
}

Yes, that comment was correct. 'spark variables', that code has worked that long.

To make this shambles work, you can insert a delay(7000); before timerParticleConnect.start(); and it magically starts working.

Note, I added WiFi.connect(); today to the code because without it 1.4.2 would stay on breathing white light, not bothering to connect despite particle.connect being called. To see it explode in radiant red flashing, just remove the WiFi.connect call and wait for it to call particle.connect. See below in references on how the docu for particle.connect is clearly out of sync with the dystopian future provided by 1.4.2

References

Smells like this #1449

Also the documentation is clearly wrong.
"Particle.connect() connects the device to the Cloud. This will automatically activate the Wi-Fi connection and attempt to connect to the Particle cloud if the device is not already connected to the cloud."
https://docs.particle.io/reference/device-os/firmware/photon/#particle-connect-


Feature/Enhancement Request

How about an actual example in the docs of how you expect an user to start wifi and connect according to best practice (instead of SOS red lights) with 1.4.2?

How about you guys link your test case application code against the documentation? If it doesn't exist, how about writing some tests?

@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 6, 2019

Incidentally, using particle.connect in the main setup thread instead of wifi.connect() works. running a 'particle list' against the device seems to show all the functions/variables, but prior experience is that without the 10s delay to particle connect then publishing the variables/functions was not reliable and particle cloud would list the device as having no functions. Only reliable way was to delay connect. (which may have been a bit of column a Particle Cloud and column b Client Internet Connection, but we have to be pragmatic as it has to work)

So, what is recommended?

@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 6, 2019

ps, either way there are some nasty hard faults in there. without a serious delay after wifi.connect the timer start causes a hard fault. Without wifi.connect in the main thread then particle.connect in the timer doesn't work.

@technobly

This comment has been minimized.

Copy link
Member

@technobly technobly commented Nov 6, 2019

@markterrill can you try adding WiFi.on() before WiFi.connect()? I don't think it should be required for SEMI_AUTOMATIC though, but just as a work around for now.

@Dan-Kouba

This comment has been minimized.

Copy link

@Dan-Kouba Dan-Kouba commented Nov 6, 2019

@markterrill A couple things:

Particle.function and Particle.variable should be called at the very top of your setup() routine. Please refer to our documentation for these functions for the warning about using them with SYSTEM_THREAD(ENABLED):

When using SYSTEM_THREAD(ENABLED) you must be careful of when you register your variables. At the beginning of setup(), before you do any lengthy operations, delays, or things like waiting for a key press, is best. The reason is that variable and function registrations are only sent up once, about 30 seconds after connecting to the cloud. Calling Particle.variable after the registration information has been sent does not re-send the request and the variable will not work.

Also, software timers have restrictions which I believe you may be running into and that may be causing your SOS. Please refer to the following information from our docs:

You should not use functions like Particle.publish from a timer callback.

Software timers run with a smaller stack (1024 bytes vs. 6144 bytes). This can limit the functions you use from the callback function.

Playing with threads is inherently risky on a resource-limited device like this, and the implementation of proper code takes a long time (and likely includes a lot of frustrating debugging). If you're interested in reading more, one of our engineers has a post in the community that illustrates some of the risks of threading and has some implementation suggestions: https://community.particle.io/t/particle-threads-tutorial/41362

My recommendation is the following:

  1. Determine if you actually need threading - can what you are doing be done using a state machine instead? If not, follow the guidelines in the last link I provided for best practices.
  2. Reorganize your setup() routine such that Particle.variable and Particle.function calls are at the very top, before any delays or other blocking calls.
  3. Connect to the internet at the end of your setup routine, or even better, consider using a state machine to orchestrate the startup of your device.

Hope that helps.

@technobly

This comment has been minimized.

Copy link
Member

@technobly technobly commented Nov 6, 2019

@markterrill also how many blinks are you getting after the SOS pattern? That could point to the problem.

Also in your timer_10sec_connect() function, try setting a flag that gets processed in loop() to avoid any issues like Dan mentioned or:

You should not call Serial.println from a software timer because Serial is not thread-safe.

The timer callback is similar to an interrupt - it shouldn't block. However, it is less restrictive than an interrupt.

@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 6, 2019

@Dan-Kouba I agree, careful coding is required. Probably thousands of hours has gone into our code and is run on 5k devices. This code has been running since spark core days, the setup section untouched for the past 2 years.

So, skipping over the new kid being patronising, a couple issues:

  • Code works fine on 0.5.4 to 1.1.1. That's called a regression bug.
  • In the real world you don't refactor a large application because the supplier introduced a regression bug. You contact them, find out the workaround (wifi.connect) and ask them to fix their stuff up.
  • Dan's reply suggests he didn't actually look at the code above. That's the code tripping the sos. You can run it. You'll see pretty red lights.
  • @technobly the serial.println isn't tripping the sos. You get to see it print before the sos. It prints without sos when you put a delay or use particle.connect in the main thread instead of the timer. I'll run it later and see if with no large timer and no serial if it sos's. I'll try counting the sos, but it's pretty hard unless its obviously the actual sos pattern instead of like 8 blinks.
@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 6, 2019

Hi, related thread from @ScruffR

#1631

@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 6, 2019

Lastly, I'd suggest the documentation writer needs to wear the dunces hat for the day.

Updated the argon docu highlighting the issue:
https://docs.particle.io/reference/device-os/firmware/argon/#particle-connect-

Did not update the photon/p1 docu:
https://docs.particle.io/reference/device-os/firmware/photon/#particle-connect-

Thanks to whomever did that, that cost me a few hours figuring that out. I just love sitting around all day figuring out particle bugs.

@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 7, 2019

Incidentally, speaking of hard to decipher sos codes, side thought, I recently had to code a status light library for ESP32 so I could have something similar to particle. I made a few departures from the Particle approach.

Green - Cloud, ie when it's breathing it's breathing green to traffic light signal all ok. Cyan was a poor choice, it uses two led segments of the cree to make cyan, chews more power, and isn't universal traffic light color code.
Blue - BT
White - Wifi codes, starts that way and moves so fast through wifi connection it may as well stay white

Key point of differentiation. If there is a warning (orange) or fault (red) with a service then it blinks the color of the service and every 2 seconds blinks the error code. ie green green green orange green green green orange. ..or red. You get the point.

What happens when you get the luxury of hindsight.

@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 7, 2019

PS @technobly

I tested some variations with/without, both SOS, bold to show the difference:

  • wifi.connect in setup, no delay, timer call to particle.connect with serial println
  • wifi.connect in setup, no delay, timer call to particle.connect with NO serial println

The best I can tell from the nonsense red light is it's not the SOS pattern, think its 8 red relatively steady blinks a pause then another long blink. Confusing AF. Video: https://www.dropbox.com/s/c52jnorr8qs8pwp/Video%207-11-19%2C%2012%2054%2055.mov?dl=0

Fixes:

  • Adding delay fixes the timer call.
  • Not using a timer call and just using particle.connect in the main thread instead of wifi.connect() works. That doesn't seem to precisely match up with the argon bug with requiring wifi.* before particle.connect.

So, there is a christmas hamper of bugs in there. I'm not exactly stoked about my workaround options:

  • A) execute wifi.connect and run a long delay (which slows down boot time) and keep the timer for particle.connect
  • B) run particle.connect in the main setup thread, though not 110% sure on implications and probably requires a lot more testing than A)
@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 7, 2019

@Dan-Kouba thanks for the friendly information on system threading, it's amazing what we can learn from the new generation of coders.

What I suggest is a recap on regression bugs:

If you're feeling adventurous you can comment out the timer and comment in and out the wifi and particle connect commands. Just make sure to check what happens on a soccer field with a weak 3g signal with particle cloud registering the test variable and function!

FYI, why 0.5.5? Mid this year we were still getting tape and reels of 0.5.4 shipped to us, but I'm guessing 0.5.5 with the WPA2 patch is the lowest you should be going. Our code was working on up to 1.1.1. Weirdly the latest reel shipped with 1.2.1.

@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 7, 2019

PS, I'll repeat my request for the Particle test case code:

  • semi automatic

  • system thread enabled

  • particle.connect

  • typical prepare/setup code before and after particle.connect to validate that the (now) blocking .connect doesn't cause issues

  • particle functions and variables successfully registering

You must have this test case being run automatically in your CI system. Share it and put it in your docs.

@ScruffR

This comment has been minimized.

Copy link
Contributor

@ScruffR ScruffR commented Nov 7, 2019

@markterrill, with regards to this

Lastly, I'd suggest the documentation writer needs to wear the dunces hat for the day.

That's not a fault of any Particle employee adding the note for one but not the other platform.
It was actually me who added the note and back then it was only needed for Gen3 since Gen1&2 did work as layed out.

I have also provided that info somewhat more elaborate in the forum post here
https://community.particle.io/t/wifi-on-or-wifi-connect/50735/13?u=scruffr

While I do agree that @Dan-Kouba may have jumped to conclusions with his inital statement

Particle.function and Particle.variable should be called at the very top of your setup() routine.

as this does not actually apply as long the actual cloud connection is established after the registration, it's still not necessary to get provocative and it won't help anybody's cause.

Looking at your red-blink video I'd say this is a Safe Mode (magenta) followed by Hard Fault SOS+1

@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 7, 2019

Yeah the lights are hard to decipher. It's all red, no magenta. Normally if it's doing an actual SoS its easy to spot, when it does n number of red blinks it's quite hard to count. There was bit of a pause during the blinks but still didn't (to me) resemble the sos pattern as only one blink went long imho. Someone who stares at boards all day probably will know instantly what it is meant to be.

@avtolstoy

This comment has been minimized.

Copy link
Member

@avtolstoy avtolstoy commented Nov 8, 2019

I've tried reproducing this issue on both Photon and P1 today to no avail both with the original app (which doesn't build as-is by the way) and slimmed down variant below. Anything else I should do? The device doesn't go into panic state and correctly connects to both the wifi and the cloud.

SYSTEM_MODE(SEMI_AUTOMATIC);
SYSTEM_THREAD(ENABLED);

void timer_10sec_connect() {
    Particle.connect();
}
Timer timerParticleConnect(10000, timer_10sec_connect, true); // one shot timer for connectivity

void setup() {
    timerParticleConnect.start();
}
@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 9, 2019

@avtolstoy hah, oops, my bad, I posted the code with the delay(2000), which sidesteps the problem.

Try this: https://gist.github.com/markterrill/a8bd527bbacf898c4970686a82670009.js

The key part is ln59 delay(2000); - if it's there the timer works. If it's not, red sos.

The reason for the timer is experience with slow chinese internet shows that if you delay 10 seconds then the functions register accurately and quickly. If you particle.connect immediately (ie in setup after a wifi.connect) then your luck widely varies. At the moment we have issues with 1.4.2 even running particle.connect in the main setup function. Functions are taking a few minutes to register so our automated api call to test the board are failing. We've had to disable the final check and just trust that they're operational. It's pretty hard to test remotely over a flakey VPN to know what is particle code sequencing and what is their internet (and the best way to structure our code to sidestep/curtail the internet lag)

@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 9, 2019

ps, your further trimmed code doesn't enable testing if the variables/functions actually registered.

The serial/strVersion setup is optional, I've validated that it makes no difference to the SOS.

@avtolstoy

This comment has been minimized.

Copy link
Member

@avtolstoy avtolstoy commented Nov 9, 2019

@markterrill Thanks for the full app. I can say with a great amount of certainty that the crash is actually due to WiFi.macAddress() call before WiFi.on(). We are tracking this issue as #1858 and it's planned to be resolved in 1.5.0-rc.1.

I'm not sure I understand the additional info about variables and functions not registering though, as the main issue discussed here is about the SOS and Particle.connect() from a timer. I would appreciate if you could create another GitHub issue describing what you are seeing and perhaps contact the support team to take a look at the cloud side of things for particular devices as the DESCRIBE message containing information about variables/functions is requested by the cloud.

@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 9, 2019

I agree. Just compiled with wifi.macaddress commented out and no sos crash.

Re the timer and cloud functions, I was just providing context as to why we've historically stashed the particle.connect away in a timer.

@avtolstoy

This comment has been minimized.

Copy link
Member

@avtolstoy avtolstoy commented Nov 9, 2019

@markterrill Thanks for the confirmation. Would you mind if we close this issue as a duplicate of #1858?

@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 9, 2019

Perfect, thanks!

@markterrill markterrill closed this Nov 9, 2019
@avtolstoy avtolstoy added the duplicate label Nov 9, 2019
@avtolstoy

This comment has been minimized.

Copy link
Member

@avtolstoy avtolstoy commented Nov 9, 2019

Duplicate of #1858.

@markterrill

This comment has been minimized.

Copy link
Author

@markterrill markterrill commented Nov 9, 2019

Yeah I've just commented it out in our main code and all is working fine. I'll test Monday how we're now faring with function/variable registration via the great firewall of china :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.