HardFault in Photon setup during WiFi scan #651

Closed
monkbroc opened this issue Sep 30, 2015 · 14 comments


@monkbroc
Member

commented Sep 30, 2015

When I try to set up a new Photon through the CLI and let it scan for WiFi networks, it crashes with a hard fault in wwd_scan_result_handler. The Photon has firmware version 0.4.5.

Here's the CLI interaction. I'm on Linux, so automatic WiFi network switching doesn't work; I manually connect to the Photon's WiFi network before starting particle setup.

The crash only happens if I answer "No" to "Would you like to manually enter your Wi-Fi network configuration?"

                  _   _      _        _
 _ __   __ _ _ __| |_(_) ___| | ___  (_) ___
| '_ \ / _` | '__| __| |/ __| |/ _ \ | |/ _ \
| |_) | (_| | |  | |_| | (__| |  __/_| | (_) |
| .__/ \__,_|_|   \__|_|\___|_|\___(_)_|\___/
|_|                     https://particle.io/

> Setup is easy! Let's get started...
> It appears as though you are already logged in as XXXXXXXX
? Would you like to log in with a different account? No

! PROTIP: Hold the MODE/SETUP button on your device until it blinks blue!
! PROTIP: Please make sure you are connected to the internet. 

> I have detected a Photon connected via USB.
? Would you like to continue with this one? Yes
! The Photon supports secure Wi-Fi setup. We'll try that first.

! PROTIP: Wireless setup of Photons works like a wizard!
! PROTIP: We will automagically change the Wi-Fi network to which your computer is connected.
! PROTIP: You will lose your connection to the internet periodically.

> No nearby Photons detected. Try the `particle help` command for more information.
? Would you like to wait and monitor for Photons entering setup mode? Yes
> Monitoring nearby Wi-Fi networks for Photons. This may take up to a minute.
? Found "Photon-3JFT". Would you like to perform setup on this one now? Yes

! PROTIP: You will need to know the password for your Wi-Fi network (if any) to proceed.
! PROTIP: You can press ctrl + C to quit setup at any time.

> Obtained magical secure claim code.


! I am unable to automatically connect to Wi-Fi networks (-___-)

? We can still proceed in 'manual' mode. Would you like to continue? Yes
? Please connect to the Photon-3JFT network now. Press enter when ready. 

> Now to configure our precious Photon

! PROTIP: If you want to skip scanning, or your network is configured as a
! PROTIP: non-broadcast network, please enter manual mode to proceed...
? Would you like to manually enter your Wi-Fi network configuration? No
O Asking the Photon to scan for nearby Wi-Fi networks...
@m-mcgowan

Contributor

commented Sep 30, 2015

I'm pretty sure this must be environmental. Are you able to deactivate certain routers to see if that helps?
Of course, it's a bug in the WICED code too, but being able to tweak the environment might help us narrow down the cause.

@monkbroc

Member Author

commented Sep 30, 2015

My computer can see 26 SSIDs from my desk. Most of them belong to other offices, so I can't turn them off.

@m-mcgowan

Contributor

commented Sep 30, 2015

Napalm? ;-)

@monkbroc

Member Author

commented Sep 30, 2015

I'll just increase the power on the router on my desk until it knocks all the other ones offline! 💥

@jphgross

commented Jan 23, 2016

Any updates on this issue? I have a problem that appears to be similar. I'm attempting to set up a Photon, and it crashes every time it scans for networks. Manually configuring the network never completes successfully either, so I'm rather stuck.

In my case there are only 4 networks visible and only one that has a strong enough signal to connect to.

My router is a Linksys WRT160Nv2, firmware revision v2.0.02.

It's set up with 40/64-bit WEP security.

The router isn't mine and so resetting it or changing any of the config isn't really an option at the moment.

I'm at a loss on how to debug this any further or figure out a workaround. Any thoughts?

Update: I was able to connect. I didn't realize that the WEP configuration required the prefix on the key; with it, the manual configuration worked.

@embedded-creations

Contributor

commented Jan 30, 2016

I'm seeing a similar issue, but it's at a remote site, so my debugging is limited. We are seeing a hard fault during a WiFi scan in our firmware. I tried the example from the Particle docs, modified slightly, and we see hard faults running the code below with 0.4.7. I see some empty SSIDs in the log. Perhaps some nearby routers have SSIDs with unusual characters or long names?



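// EXAMPLE using a callback
// Called once for each access point found by WiFi.scan()
void wifi_scan_callback(WiFiAccessPoint* wap, void* data)
{
    WiFiAccessPoint& ap = *wap;
    Serial.print("SSID: ");
    Serial.println(ap.ssid);      // hidden networks show up as an empty SSID
    Serial.print("Security: ");
    Serial.println(ap.security);
    Serial.print("Channel: ");
    Serial.println(ap.channel);
    Serial.print("RSSI: ");
    Serial.println(ap.rssi);
}

void setup() {
    Serial.begin(9600);           // open the USB serial port before printing
    Serial.println();
    Serial.println("startup");
}

void loop()
{
    // The callback runs once per access point; the return value is the number found
    int result_count = WiFi.scan(wifi_scan_callback);
    Serial.print(result_count);
    Serial.println(" APs scanned.");
    Serial.println();

    delay(2000);                  // wait 2 seconds between scans
}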

Log after running a few times with hard faults:

startup
SSID: sweets
Security: 3
Channel: 1
RSSI: -67
SSID: xfinitywifi
Security: 0
Channel: 1
RSSI: -63
SSID: DIRECT-EB-HP ENVY 5660 series
Security: 3
Channel: 1
RSSI: -60
SSID: SA2100-FA38
Security: 3
Channel: 1
RSSI: -19
SSID:
Security: 3
Channel: 1
RSSI: -62
SSID: Clope_4
Security: 3
Channel: 11
RSSI: -42
SSID: stan305
Security: 3
Channel: 11
RSSI: -80
SSID: linksys
Security: 2
Channel: 6
RSSI: -74
SSID:
Security: 3
Channel: 6
RSSI: -87
SSID: HOME-7A3D-2.4
Security: 3
Channel: 6
RSSI: -87
SSID: HOME-2F72
Security: 3
Channel: 11
RSSI: -90
11 APs scanned.

0 APs scanned.

SSID: DIRECT-EB-HP ENVY 5660 series
Security: 3
Channel: 1
RSSI: -59
SSID: sweets
Security: 3
Channel: 1
RSSI: -60
SSID: xfinitywifi
Security: 0
Channel: 1
RSSI: -60
SSID: SA2100-FA38
Security: 3
Channel: 1
RSSI: -21
SSID:
Security: 3
Channel: 1
RSSI: -61
SSID: Clope_4
Security: 3
Channel: 11
RSSI: -43
SSID: HOME-2F72
Security: 3
Channel: 11
RSSI: -84
SSID: HP-Print-26-Photosmart 6520
Security: 3
Channel: 3
RSSI: -89
SSID: stan305
Security: 3
Channel: 11
RSSI: -83
SSID: linksys
Security: 2
Channel: 6
RSSI: -69
SSID:
Security: 3
Channel: 6
RSSI: -82
SSID: HOME-7A3D-2.4
Security: 3
Channel: 6
RSSI: -83
SSID:
Security: 3
Channel: 11
RSSI: -89
13 APs scanned.

SSID: sweets
Security: 3
Ch
startup
SSID: sweets
Security: 3
Channel: 1
RSSI: -55
SSID: SA2100-FA38
Security: 3
Channel: 1
RSS
startup
SSID: sweets
Security: 3
Channel: 1
RSSI: -57
SSID: SA2100-FA38
Security: 3
Channel: 1
RSSI: -21
SSID: xfinitywifi
Security: 0
Channel: 1
RSSI: -59
SSID: DIRECT-EB-HP ENVY 5660 series
Security: 3
Channel: 1
RSSI: -63
SSID:
Security: 3
Channel: 1
RSSI: -59

startup
SSID: DIRECT-EB-HP ENVY 5660 series
Security: 3
Channel: 1
RSSI: -60
SSID: sweets
Security: 3
Channel: 1
RSSI: -59
SSID: SA2100-FA38
Security: 3
Channel: 1
RSSI: -22
SSID: xfinitywifi
Security: 0
Channel: 1
RSSI: -60
SSID:
Security: 3
Channel: 1
RSSI: -60

startup
SSID:
Security: 3
Channel: 1
RSSI: -62
SSID: SA2100-FA38
Security: 3
Channel: 1
RSSI: -23
SSID: sweets
Security: 3
Channel: 1
RSSI: -59
SSID: DIRECT-EB-HP ENVY 5660 series
Security: 3
Channel: 1
RSSI: -60
SSID: xfinitywifi
Security: 0
Channel: 1
RSSI: -62
SSID: Clope_4
Security: 3
Channel: 11
RSSI: -40
SSID: stan305
Security: 3
Channel: 11
RSSI: -80
SSID: HOME-2F72
Security: 3
Channel: 11
RSSI: -97
SSID:
Security: 3
Channel: 6
RSSI: -83
SSID: linksys
Security: 2
Channel: 6
RSSI: -75
SSID: HOME-7A3D-2.4
Security: 3
Channel: 6
RSSI: -81
SSID:
Security: 3
Channel: 11
RSSI: -89
12 APs scanned.

0 APs scanned.

SSID: sweets
Security: 3
Channel: 1
RSSI: -55
SSID: SA2100-FA38
Security: 3
Channel: 1
RSSI: -23
SSID:
Security: 3
Channel: 1
RSSI: -59
SSID: xfinitywifi
Security: 0
Channel: 1
RSSI: -59
SSID: DIRECT-EB-HP ENVY 5660 series
Security: 3
Channel: 1
RSSI: -62
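One way to test the "unusual characters or a long name" theory is to log each SSID's raw bytes instead of printing it as a C string. Printing as a C string stops at the first NUL byte, so a hidden network and an SSID made of zero bytes both appear as the blank "SSID:" lines seen in the log above. The following is a minimal sketch, assuming you can obtain the SSID bytes and their length from each scan result; `describe_ssid` is a hypothetical helper, and the 32-byte cap comes from the 802.11 SSID length limit.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical logging helper: render raw SSID bytes so that hidden (empty),
// over-long, or non-printable SSIDs are visible in a serial log.
std::string describe_ssid(const uint8_t* ssid, size_t length)
{
    if (length == 0) {
        return "<hidden>";
    }
    if (length > 32) {  // 802.11 limits an SSID to 32 bytes
        return "<invalid length " + std::to_string(length) + ">";
    }
    static const char hex[] = "0123456789abcdef";
    std::string out;
    for (size_t i = 0; i < length; ++i) {
        const uint8_t c = ssid[i];
        if (c >= 0x20 && c < 0x7f && c != '\\') {
            out += static_cast<char>(c);  // printable ASCII is kept as-is
        } else {
            out += "\\x";                 // everything else is hex-escaped
            out += hex[c >> 4];
            out += hex[c & 0x0f];
        }
    }
    return out;
}
```

Printed next to each scan entry, this would distinguish a genuinely hidden network (`<hidden>`) from a name made of NUL or other non-printable bytes (for example `\x00\x00\x00`).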
@monkbroc

Member Author

commented Jan 30, 2016

It's hard to make progress on this issue since there are no clear steps for Particle developers to reproduce the crash. I'll see with @m-mcgowan if there's something I can do to narrow this down since the issue occurs at my office.

@embedded-creations

Contributor

commented Jan 31, 2016

We were also seeing a hard fault during a 2-argument call to WiFi.setCredentials(). Looking more closely at how that call works, it appears to do a WiFi scan to get the missing network information.

@m-mcgowan Can you recommend a Windows-based WiFi scanner tool that would give the full SSID of nearby networks? I can have someone run it at the remote location and hopefully get some information that will help reproduce this issue.

@embedded-creations

Contributor

commented Jan 31, 2016

I forgot to mention that there's no good workaround when using the Particle smartphone app to set up a Photon in this environment. The Photon would hard fault during the WiFi scan step of setup, and the option to enter credentials manually only appears after the scan.

Our workaround was to put the device containing the Photon into an ammo box, with the router and phone right outside. About 50% of the time, this attenuated the offending network's signal enough for us to get past the WiFi scan and add credentials.

@monkbroc

Member Author

commented Feb 12, 2016

I tried reproducing the issue at my office again today, but it didn't recur (maybe some WiFi networks have changed since last September).

@jamesr66a

Contributor

commented Mar 11, 2016

Hi all,

I'm writing an application that makes extensive use of the WiFi scan facility and I've encountered this issue as well.

Via JTAG, I tracked the hard fault down to an illegal memory access within the tlv_find_tlv8() function (old_find_tlv8.txt), specifically during the ldrb r4, [r3, #0] instruction. That instruction performs an illegal access to memory location 0x20020080 (well beyond .bss in my case), so I set out to investigate why this was happening.

During the hard fault condition, $r1 (the message_length parameter, according to GDB) is 0xfffffff0 (after subtracting 2), so the function enters an effectively infinite traversal until it hits an invalid memory location and hard faults. This looked like the caller passing in an invalid parameter, but the immediate caller (wlu_parse_tlvs, inlined into wwd_scan_result_handler) was too deeply buried to fix, so I made a stopgap solution:

I created a C implementation equivalent to the tlv_find_tlv8() function (let me know if anything differs; I've only checked for cursory equivalence) and compiled it into a library to replace the Lib_TLV.a archive in hal/src/photon/lib:

arm-none-eabi-gcc -c -O3 -mthumb -fPIC Lib_TLV.c
arm-none-eabi-ar rcs Lib_TLV.a Lib_TLV.o

and just dropped it into the location of the old archive. After a full clean, compile, and flash, to my surprise my MTBF improved significantly! (I was expecting to have to add bounds checks or something to my C function, but I guess not.) Here is the output of GCC when compiling this new function. Note that tlv_find_tlv8() is the only function that gets linked from that archive, so I left the other ones out. I really have no idea why this workaround works in my case, but my code has been running fault-free since I made this substitution, so I'm not going to mess with it unless the problem arises again.
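Neither jamesr66a's Lib_TLV.c nor WICED's original function is reproduced in this thread. For anyone trying a similar substitution, here is a minimal sketch of a TLV8 search that refuses to read an element extending past the end of the buffer. It assumes the usual 802.11 information-element layout (1-byte type, 1-byte length, then the payload). The name `safe_find_tlv8` is hypothetical, and this is not a verified drop-in for WICED's `tlv_find_tlv8()`; check the real signature in your SDK. Also note that no check inside the search helps when `message_length` is itself corrupt, as in the 0xfffffff0 case: the loop trusts the bogus length just as the original did.

```cpp
#include <cstdint>

// Hypothetical bounds-checked TLV8 search (not WICED's implementation).
// Each element is laid out as: type (1 byte), length (1 byte), payload (length bytes).
// Returns a pointer to the type byte of the first element whose type matches,
// or nullptr if there is no match or an element would extend past the buffer.
const uint8_t* safe_find_tlv8(const uint8_t* message, uint32_t message_length, uint8_t type)
{
    const uint8_t* p = message;
    uint32_t remaining = message_length;

    // At least the 2-byte header is needed to read an element's type and length.
    while (remaining >= 2) {
        const uint32_t element_size = 2u + p[1];
        if (element_size > remaining) {
            return nullptr;  // truncated element: stop rather than read past the end
        }
        if (p[0] == type) {
            return p;
        }
        p += element_size;
        remaining -= element_size;
    }
    return nullptr;
}
```

For example, in the buffer `{0x00, 0x03, 'a', 'b', 'c', 0x30, 0x02, 0x01, 0x00}`, searching for type 0x30 (the RSN element) returns a pointer 5 bytes in. If the buffer is cut short so that the second element's declared length overruns it, the search returns nullptr instead.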

I hope this gives you all some clues and a potential short-term workaround. A "proper" solution would probably involve figuring out what is wrong with the parameters passed in by the callers. Unfortunately, I don't have much to offer in the way of repro steps. I've just consistently hit the hard fault at my house, with ~4 networks within range.
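The "proper" fix described above means validating lengths before they reach the TLV search. Here is a minimal sketch of that kind of caller-side check. It assumes the caller knows the total size of the received scan-result buffer and the offset and length of its IE region. The function names and parameters are hypothetical, not WICED's. The subtraction form can never wrap. The naive sum, by contrast, is computed modulo 2^32, so it can accept a corrupt length such as 0xfffffff0, the value seen in r1.

```cpp
#include <cstdint>

// Hypothetical check: is [ie_offset, ie_offset + ie_length) inside a buffer of
// buffer_length bytes? Written so no intermediate value can wrap around.
bool ie_region_is_valid(uint32_t buffer_length, uint32_t ie_offset, uint32_t ie_length)
{
    if (ie_offset > buffer_length) {
        return false;  // region starts past the end of the buffer
    }
    return ie_length <= buffer_length - ie_offset;  // cannot underflow here
}

// For contrast: the naive form. The 32-bit sum wraps, so a huge corrupt
// ie_length can produce a small total and wrongly pass the check.
bool ie_region_is_valid_naive(uint32_t buffer_length, uint32_t ie_offset, uint32_t ie_length)
{
    return static_cast<uint32_t>(ie_offset + ie_length) <= buffer_length;
}
```

With a 100-byte buffer and an IE region at offset 36, a length of 64 passes both checks. A length of 0xfffffff0 makes the naive sum wrap to 20, which passes, while `ie_region_is_valid` rejects it.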

@monkbroc

Member Author

commented Mar 11, 2016

Great detective work @jamesr66a! It's weird that a functionally equivalent implementation of tlv_find_tlv8 improved the situation for you.

I'll trace through wwd_scan_result_handler with GDB to see if I can see huge lengths passed to wlu_parse_tlvs. I had the crash last October at my office, but I haven't been able to reproduce the crash recently. Maybe the problem is still there, just not triggering crashes.

monkbroc added a commit that referenced this issue Mar 15, 2016
Avoid crash during wifi scan due to reading invalid RAM
Update WICED so that marginal scans of WPA protected networks don't lead to a crash.

Fixes #651
@monkbroc

Member Author

commented Mar 15, 2016

@embedded-creations @jamesr66a If you want to build the firmware using the feature/safe-scan-wpa-ie branch you can check if that fixes the issue for you. Thanks again for the help pinpointing the issue.
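The change on the feature/safe-scan-wpa-ie branch isn't shown in this thread. The following sketch illustrates only what defensive handling of a WPA information element involves. WPA (v1) is carried in a vendor-specific element (ID 0xdd) whose payload begins with the OUI 00:50:F2 followed by OUI type 1. In a marginal, truncated scan result, a parser must therefore confirm that both the element and those four payload bytes are actually present before reading them. The function name `is_wpa_ie` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical check: does the element at `element` (type, length, payload)
// describe a WPA (v1) IE? `available` is the number of bytes from `element`
// to the end of the received buffer.
bool is_wpa_ie(const uint8_t* element, size_t available)
{
    if (available < 2) {
        return false;  // not even a type/length header
    }
    const uint8_t id = element[0];
    const uint8_t length = element[1];
    if (static_cast<size_t>(length) + 2 > available) {
        return false;  // truncated element: payload is not fully present
    }
    if (id != 0xdd || length < 4) {
        return false;  // not vendor-specific, or too short for OUI + type
    }
    const uint8_t* payload = element + 2;
    return payload[0] == 0x00 && payload[1] == 0x50 &&
           payload[2] == 0xf2 && payload[3] == 0x01;  // Microsoft OUI, type 1 = WPA
}
```

The length checks come before any payload read, so a truncated element is rejected rather than read past.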

@m-mcgowan m-mcgowan closed this Mar 15, 2016

@m-mcgowan m-mcgowan added this to the 0.5.x milestone Mar 15, 2016

@embedded-creations

Contributor

commented Mar 15, 2016

Glad you found a fix! I can't easily reproduce this (it's at a customer site, not near me), but I'll give it a try the next chance I have.

5 participants