Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Photon/P1: A number of networking-related fixes [v2] #1500

Merged
merged 7 commits into from Apr 5, 2018

Conversation

@avtolstoy
Copy link
Member

commented Mar 14, 2018

Solution

  1. Fixes a socket resource leak in TCPServer::available()
  2. Default write timeouts added to TCPClient (30s) and cloud socket (20s)
  3. Removed socket_close_all() from wlan_disconnect_now()
  4. Makes socket_xxx() calls thread safe
  5. particle-iot/photon-wiced#29

This PR is based on #1492 and targets its branch for ease of review.

Steps to Test

N/A

References


Completeness

  • User is totes amazing for contributing!
  • Contributor has signed CLA (Info here)
  • Problem and Solution clearly stated
  • Run unit/integration/application tests on device
  • Added documentation
  • Added to CHANGELOG.md after merging (add links to docs and issues)

@avtolstoy avtolstoy added this to the 0.8.0-rc.3 milestone Mar 14, 2018

@avtolstoy avtolstoy requested a review from sergeuz Mar 14, 2018

@elcojacobs

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2018

Good to see you're working on this again.

Any other steps needed to test this than merging the branch?
Any tests you want me to re-run?

Could/should this branch be rebased on 0.8.0-rc.2?

@avtolstoy

This comment has been minimized.

Copy link
Member Author

commented Mar 14, 2018

Hey @elcojacobs.

Apart from fixing the hardfault you were seeing in #1371, there are a couple of preventive measures implemented here that should guard the system thread from being blocked (at the very least for a substantial amount of time) due to socket and networking operations.

I did test this with your wifi-test application, but I'd appreciate if you could try these changes in your environment.

The only issue that this PR does not address (because I couldn't reproduce it) is the latest blinking green reported in #1492, but I'm hoping it will no longer present itself.

Let me know how it goes, and I'm hoping we can release these changes in 0.8.0-rc.3 as soon as possible.

@avtolstoy

This comment has been minimized.

Copy link
Member Author

commented Mar 14, 2018

Could/should this branch be rebased on 0.8.0-rc.2?

You may merge this branch into 0.8.0-rc.2 if needed, but it's based on #1492 originally, which in turn should be based on develop, which should be at or around 0.8.0-rc.2. So in theory you can use it as-is as well.

@elcojacobs

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2018

Allright, I'll skip the 0.8.0-rc.2 rebase. The changes seem to mostly in version flags.
I think the fast blinking green could be explained by a hanging system thread. If that was caused by a hanging TCP write, this would indeed prevent that.

As a last failsafe watchdog, I would like to reset the network stack if no communication has taken place for 15 minutes.
What would be the safest and most complete way to do that without rebooting? I don't want to reset the user thread, but I do want to reset everything related to network and TCP.

@avtolstoy

This comment has been minimized.

Copy link
Member Author

commented Mar 14, 2018

Depending on the meaning of "no communication has taken place for 15 minutes", I'd say something like running a separate thread (or using an application watchdog) and doing WiFi.off() and WiFi.on() in it should do it, but it's not a bullet-proof solution, and might not help in certain cases.

You might try wlan_restart(nullptr);, but that function is not thread-safe and might make things even worse.

@elcojacobs

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2018

Ok, based on your edits, I guess WiFi.off() and Wifi.on() are causing problems? What did you encounter?

I will stop TCP communication before taking down WiFi on purpose.

The user thread is not blocked, so I can add the check there. I already have a hard reboot watchdog for when the user thread stalls.
I can wrap parts of the code in SINGLE_THREADED_BLOCK(), as long as the code inside that block has no potential to block.

Ideally, I would be able to set a flag that causes the system thread to restart WiFi in the background, without blocking the user thread.
If there are flags I need to monitor or set to prevent problems and to prevent using waitFor(), that would be best.

Pseudo code example:

if(no communication since 15 min){
   if (wifi enabled){
      tcpClient.stop();
      tcpServer.stop();
      Particle.disconnect();
      wifi.disconnect();
      wifi.off()
   }
   if(wifi fully shut down){
       wifi.on();
       wifi.connect(WIFI_SKIP_LISTEN);
   }
}
@elcojacobs

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2018

wlan_restart(nullptr) results in hard fault and wrapping it in a SINGLE_THREADED_BLOCK() results in going from breathing cyan to breathing green, without connectivity (ping).

                    SINGLE_THREADED_BLOCK(){
                        // wlan_restart is not thread safe
                        wlan_restart(nullptr);
                    }
@avtolstoy

This comment has been minimized.

Copy link
Member Author

commented Mar 15, 2018

@elcojacobs Wrapping wlan_restart() in SINGLE_THREADED_BLOCK() will definitely not work, as some of the internal WICED threads need to cleanly exit during that call.

If your system thread is working normally (and I hope that now we've eliminated any potential network-related issues/lockups), then as I originally suggested WiFi.off()/on() should work just fine. Otherwise it's a bug that should be fixed.

@elcojacobs

This comment has been minimized.

Copy link
Contributor

commented Mar 15, 2018

WiFi.off();
WiFi.on();

Seems to cause no crashes. It does result in a strange status LED when I turn off WiFi on the router.

State: breathing cyan, connected to cloud, everyone is happy.

Action: turn off wifi on router
New state: breathing dark blue

Action: re-enable WiFi
New state: fast blinking green (connecting to WiFi).

It remains in this state, even after the WiFi connection has been established. It responds to ping and local TCP traffic. WiFi.ready() returns false.
If I reset the device, it connects to the cloud and breaths cyan again.

Alternative:

WiFi.off();
WiFi.on();
Particle.connect();

State: breathing cyan, connected to cloud, everyone is happy.

Action: turn off wifi on router
New state: blinking green

Action: re-enable WiFi
New state: fast blinking cyan, followed by breathing cyan.


Seems like the second solution works. Leaving out Particle.connect() seems to trigger a bug.

@elcojacobs

This comment has been minimized.

Copy link
Contributor

commented Mar 15, 2018

Upon further testing, first behavior also occurred with the code below to reset:

                    Particle.disconnect();
                    WiFi.off();
                    WiFi.on();
                    Particle.connect();

Going to slow breathing dark blue on WiFi loss, staying in fast blinking green on WiFi reconnect.

It is probably timing related. Maybe the WiFi.off() call is made just when the wifi is reconnecting.

@elcojacobs

This comment has been minimized.

Copy link
Contributor

commented Mar 15, 2018

Wrapping in if(WiFi.connecting()) seems to be a good workaround.

            ticks_seconds_t sinceReceive = ticks.timeSinceSeconds(lastReceive);
            ticks_seconds_t sinceReset = ticks.timeSinceSeconds(lastReset);
            ticks_seconds_t now = ticks.seconds();
            if(sinceReceive > TCP_RESTART_ON_NO_RECEIVE_INTERVAL && sinceReset > TCP_RESTART_ON_NO_RECEIVE_INTERVAL){
                // no receive for the past 10 minutes
                if(tcpServerRunning){
                    stopTcp();
                    lastReset = now;
                    // restart will be handled next time this function is called if WiFi is ready.
                }
                if(sinceReceive > WIFI_RESTART_ON_NO_RECEIVE_INTERVAL){
                    // Restarting TCP didn't help
                    stopTcp();
                    if(!WiFi.connecting()){
                        Particle.disconnect();
                        WiFi.off();
                        WiFi.on();
                        Particle.connect();
                    }
                    lastReceive = now; // reset last receive to delay next restart
                    lastReset = now;
                }
            }
            else if(!WiFi.ready() || WiFi.RSSI() >= 0){
                // WiFi is in error state, stop TCP server
                if(tcpServerRunning){
                    stopTcp();
                }
            }
            else{
                if(!tcpServerRunning){
                    startTcp();
                }
            }
@elcojacobs

This comment has been minimized.

Copy link
Contributor

commented Mar 17, 2018

I found a new issue.

When building with MODULAR=n

  • The photon does not connect to WiFi, it keeps just blinking green
  • When I enter listening mode and use the Particle app to set up the photon, I get a hard fault SOS (1 blink)

The same code works with a modular build.

I wasn't sure whether to list this as a new issue or comment here.

WiFi test app output for non-modular build:

TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1112136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1113136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1114136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
0001115130 [hal.wlan] ERROR: wiced_join_ap_specific(), result: 1024
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
network event 11
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1115136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
network event 6
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
0001115272 [hal.wlan] INFO: Joining BrouwerijDeSchutter
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
0001115272 [hal.wlan] TRACE: Free RAM connect: 51992
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1116136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1117136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1118136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1119136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1120136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1121136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1122136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
0001122371 [hal.wlan] ERROR: wiced_join_ap_specific(), result: 1024
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
0001122371 [hal.wlan] INFO: Joining BrouwerijDeSchutter
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
0001122371 [hal.wlan] TRACE: Free RAM connect: 51992
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1123136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1124136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1125136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1126136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1127136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1128136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
WiFi.ready(): 0         IP: 0.0.0.0             RSSI: 2         TCP client connected: 0         millis(): 1129136
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
0001129471 [hal.wlan] ERROR: wiced_join_ap_specific(), result: 1024
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
0001129471 [hal.wlan] INFO: Joining BrouwerijDeSchutter
TCP Error: Could not open port socket://192.168.1.61:6666: [Errno 113] No route to host)
@elcojacobs

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2018

To confirm that this is a bug in Particle code:

Just run in the /main directory of this repo (on this branch):

make clean all PLATFORM=p1 WARNINGS_AS_ERRORS=n MODULAR=n program-dfu

And confirm that the device does not connect to WiFi anymore.
This is with the default Particle app!

@elcojacobs

This comment has been minimized.

Copy link
Contributor

commented Mar 30, 2018

In addition to the bug above, please do some testing of listening mode with these changes. A user testing this for me experienced issues setting up WiFi with the Particle app.

@technobly technobly requested review from technobly and removed request for sergeuz Apr 5, 2018

@technobly technobly merged commit 5466956 into fix/photon-tcp Apr 5, 2018

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@technobly technobly deleted the fix/photon-networking-1 branch Apr 5, 2018

@technobly technobly restored the fix/photon-networking-1 branch Apr 5, 2018

@technobly technobly deleted the fix/photon-networking-1 branch Apr 13, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.