Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACM MVP #2689

Merged
merged 24 commits into from Nov 30, 2023
Merged

ACM MVP #2689

merged 24 commits into from Nov 30, 2023

Conversation

scott-brust
Copy link
Member

@scott-brust scott-brust commented Aug 24, 2023

Problem

MSoM hardware needs support for basic ACM. This PR adds that support

Solution

ACM MVP consists of a number of specific items:

  1. Add system_connection_manager to manage the preferred network interface as well as the connection tester logic. The connection tester will test all of the ready network interfaces and collect metrics on their latency
  2. Do not power on the modem when cellular_device_info() is called. When the modem is on, cache the response to the AT commands. Return the cached data if cellular_device_info() is called when the modem is off
  3. Implementing Particle.connect(WiFi) to force the cloud connection over the specified network. This will bind the cloud socket to that interface, even if that network is not ready/up/routable.
  4. Implement Network.preferred() / Network.isPreferred(). This will prioritize the specified network, and assuming it is up and ready when opening the cloud connection, that network will be bound to the socket.
  5. Implement Particle.connectionInterface() to return which network is in use for the current cloud connection.
  6. Separate Wifi + Cellular signal structures now that a device can have both. The currently active interface will be returned in the primary signal structure. An alternate signal structure will be sent with the other interfaces statistics. This needs server side work to support. Currently the alternate signal data is sent as application diagnostics and are parsed as raw integers.

Steps to Test

Run example app, bring up any number of network interfaces, use the 1 command to test the interface, then 2 to print some stats about the test diagnostics. p to force a publish and see the combined vitals signal data

Example App

See blank/application.cpp

Example output:

Running internet test by pressing 1, with all 3 interfaces configured and connected:

0003755362 [app] INFO: Ethernet Ready: 1 WiFi ready: 1 Cellular ready: 1 Cloud conn: 3
0003755365 [system.cm] TRACE: Resolving publish-receiver-udp.particle.io#40000
0003755666 [system.cm] INFO: test socket # 1, wl4 bound to WiFi     connected to 54.234.140.231#40000
0003755671 [system.cm] INFO: test socket # 2, en2 bound to Ethernet connected to 54.234.140.231#40000
0003755679 [system.cm] INFO: test socket # 3, pp3 bound to Cellular connected to 54.234.140.231#40000
0003755690 [system.cm] TRACE: sock 1 packet # 1 tx > 203
0003755739 [system.cm] TRACE: sock 2 packet # 1 tx > 203
0003755741 [system.cm] TRACE: sock 3 packet # 1 tx > 203
0003756771 [system.cm] TRACE: sock 1 packet # 1 rx < 203
0003756773 [system.cm] TRACE: sock 1 packet # 2 tx > 203
0003757002 [system.cm] TRACE: sock 2 packet # 1 rx < 203
0003757014 [system.cm] TRACE: sock 2 packet # 2 tx > 203
0003757527 [system.cm] TRACE: sock 3 packet # 1 rx < 203
0003757529 [system.cm] TRACE: sock 3 packet # 2 tx > 203
0003757862 [system.cm] TRACE: sock 1 packet # 2 rx < 203
0003757864 [system.cm] TRACE: sock 1 packet # 3 tx > 203
0003758295 [system.cm] TRACE: sock 2 packet # 2 rx < 203
0003758323 [system.cm] TRACE: sock 2 packet # 3 tx > 203
0003758808 [system.cm] TRACE: sock 3 packet # 2 rx < 203
0003758810 [system.cm] TRACE: sock 3 packet # 3 tx > 203
0003758955 [system.cm] TRACE: sock 1 packet # 3 rx < 203
0003758957 [system.cm] TRACE: sock 1 packet # 4 tx > 203
0003759616 [system.cm] TRACE: sock 2 packet # 3 rx < 203
0003759631 [system.cm] TRACE: sock 2 packet # 4 tx > 203
0003760041 [system.cm] TRACE: sock 1 packet # 4 rx < 203
0003760044 [system.cm] TRACE: sock 1 packet # 5 tx > 203
0003760088 [system.cm] TRACE: sock 3 packet # 3 rx < 203
0003760090 [system.cm] TRACE: sock 3 packet # 4 tx > 203
0003760931 [system.cm] TRACE: sock 2 packet # 4 rx < 203
0003761125 [system.cm] TRACE: sock 1 packet # 5 rx < 203
0003761367 [system.cm] TRACE: sock 3 packet # 4 rx < 203
0003761369 [system.cm] INFO: WiFi     sent 5 packets, got 5 packets, avg rtt: 1085
0003761375 [system.cm] INFO: Ethernet sent 4 packets, got 4 packets, avg rtt: 1285
0003761382 [system.cm] INFO: Cellular sent 4 packets, got 4 packets, avg rtt: 1405

Example combined vitals:

network":{
"cellular":{
"radio_access_technology":"LTE"
"operator":"Verizon Wireless"
"cell_global_identity":{
"mobile_country_code":311
"mobile_network_code":"480"
"location_area_code":8194
"cell_id":8348674
}
}
"signal":{
"at":"LTE Cat-M1"
"strength":100
"strength_units":"%"
"strengthv":-67
"strengthv_units":"dBm"
"strengthv_type":"RSRP"
"quality":62.5
"quality_units":"%"
"qualityv":-9
"qualityv_units":"dB"
"qualityv_type":"RSRQ"
}
...
"app":{
"vital_44":1     // This is the cloud connection. It will be a shared enum between server and device. see below for definition
"vital_49":1   // This is the alternate signal RAT see `hal_net_access_tech_t` for definitions
"vital_46":25087
"vital_47":24773
"vital_45":-51
"vital_48":39
}

vital_44 is the cloud connection enum:

    enum NetworkInterface {
        UNKNOWN = 0,
        ETHERNET = 1,
        CELLULAR = 2,
        WIFI = 3
    };

hal/src/msom/network/network.cpp Outdated Show resolved Hide resolved
system/src/system_network_diagnostics.cpp Show resolved Hide resolved
system/src/system_network_diagnostics.h Outdated Show resolved Hide resolved
system/src/system_network_manager.cpp Show resolved Hide resolved
system/src/system_task.cpp Outdated Show resolved Hide resolved
@eberseth
Copy link
Contributor

How are the interfaces encoded for this output? IF#1 is cellular and IF#2 WiFi?

0000756758 [app] INFO: Ethernet Ready: 0 WiFi ready: 1 Cellular ready: 1
0000756831 [system.nd] INFO: IF #3 failed to resolve DNS for publish-receiver-udp.particle.io : 0.0.0.0
0000756855 [system.nd] INFO: Testing IF #4 with publish-receiver-udp.particle.io : 54.234.140.231
0000758420 [system.nd] INFO: UDP begin 1 send: 5 receive: 5 message: Hello
0000758442 [system.nd] INFO: Testing IF #5 with publish-receiver-udp.particle.io : 54.234.140.231
0000759523 [system.nd] INFO: UDP begin 1 send: 5 receive: 5 message: Hello
0000804680 [app] INFO: Ethernet Ready: 0 WiFi ready: 1 Cellular ready: 1
0000804682 [app] INFO: LWIP link  xmit: 199 recv 193 drop 1 err 1
0000804688 [app] INFO: LWIP udp   xmit: 56 recv 87 drop 10 err 0
0000804694 [app] INFO: interface 3 dns_att:2 dns_fail: 2 tx 0 rx 0
0000804699 [app] INFO: interface 4 dns_att:2 dns_fail: 0 tx 10 rx 10
0000804706 [app] INFO: interface 5 dns_att:2 dns_fail: 1 tx 5 rx 5

@scott-brust
Copy link
Member Author

scott-brust commented Aug 29, 2023

How are the interfaces encoded for this output? IF#1 is cellular and IF#2 WiFi?

These are network_interface_index enum values used within Device OS, ie:

/**
 * Network interfaces.
 */
typedef enum network_interface_index {
    NETWORK_INTERFACE_ALL = 0,
    NETWORK_INTERFACE_LOOPBACK = 1,
    // NETWORK_INTERFACE_MESH = 2, // Deprecated
    NETWORK_INTERFACE_ETHERNET = 3,
    NETWORK_INTERFACE_CELLULAR = 4,
#if !(HAL_PLATFORM_CELLULAR && HAL_PLATFORM_WIFI)
    // XXX: we cannot easily change these numbers for existing platforms,
    // as it will break ABI compatibility.
    NETWORK_INTERFACE_WIFI_STA = 4,
    NETWORK_INTERFACE_WIFI_AP = 5
#else
    // For new platforms that have both cellular and wifi interfaces available
    // we use new correct definitions.
    NETWORK_INTERFACE_WIFI_STA = 5,
    NETWORK_INTERFACE_WIFI_AP = 6
#endif // !(HAL_PLATFORM_CELLULAR && HAL_PLATFORM_WIFI)
} network_interface_index;

ie
Ethernet = 3
Cellular = 4
Wifi station = 5

Note these are different than the LWIP internal indexes for the interfaces.
These are the numbers that are on the interface names. so
Ethernet = en2 = LWIP interface index 2
Cellular = pp3 = 3
Wifi station = wl4 = 4
When looking up name<->interface index with LWIP stack these are the numbers used

@scott-brust
Copy link
Member Author

Binding the cloud connection to a specific network interface is still a work in progress. There are issues with closing/recreating the socket with it bound to a different interface. Will keep working through the details but the last commit should be roughly the direction we want to go in.

Cellular + Wifi connection binding works as expected. I can use the test app to bind to either and I can see the cloud traffic goes through the bound interface. Swapping between the two works fine with the cloud connection

Ethernet binding is much less reliable, but I think that is due to the current issues with ethernet + cloud connection in general with p2/msom platform

@scott-brust scott-brust force-pushed the feature/sc-120714/network-tester branch 2 times, most recently from 03739dd to acb1028 Compare September 29, 2023 20:55
@scott-brust scott-brust changed the title Network Interface Tester First Draft ACM MVP Oct 3, 2023
@scott-brust scott-brust force-pushed the feature/sc-120714/network-tester branch from e1e2be8 to 5fe67ab Compare October 17, 2023 16:29
@scott-brust
Copy link
Member Author

Refactored to use sock_poll() instead of UDPClient
Running the reachability test looks like this now (Ethernet is disconnected):

0001170027 [system.cm] TRACE: Resolving publish-receiver-udp.particle.io#40000
0001170107 [system.cm] INFO: test socket # 1, name en2 bound to Ethernet connected to 54.234.140.231#40000
0001170112 [system.cm] INFO: test socket # 2, name wl4 bound to WiFi     connected to 54.234.140.231#40000
0001170122 [system.cm] INFO: test socket # 3, name pp3 bound to Cellular connected to 54.234.140.231#40000
0001170131 [system.cm] TRACE: sock 1 packet # 1 tx > 262
0001170137 [system.cm] TRACE: sock 2 packet # 1 tx > 262
0001170142 [system.cm] TRACE: sock 3 packet # 1 tx > 262
0001171219 [system.cm] TRACE: sock 2 packet # 1 rx < 262
0001171222 [system.cm] TRACE: sock 2 packet # 2 tx > 262
0001171642 [system.cm] TRACE: sock 3 packet # 1 rx < 262
0001171645 [system.cm] TRACE: sock 3 packet # 2 tx > 262
0001172305 [system.cm] TRACE: sock 2 packet # 2 rx < 262
0001172307 [system.cm] TRACE: sock 2 packet # 3 tx > 262
0001172923 [system.cm] TRACE: sock 3 packet # 2 rx < 262
0001172925 [system.cm] TRACE: sock 3 packet # 3 tx > 262
0001173391 [system.cm] TRACE: sock 2 packet # 3 rx < 262
0001173393 [system.cm] TRACE: sock 2 packet # 4 tx > 262
0001174204 [system.cm] TRACE: sock 3 packet # 3 rx < 262
0001174206 [system.cm] TRACE: sock 3 packet # 4 tx > 262
0001174476 [system.cm] TRACE: sock 2 packet # 4 rx < 262
0001174478 [system.cm] TRACE: sock 2 packet # 5 tx > 262
0001175483 [system.cm] TRACE: sock 3 packet # 4 rx < 262
0001175484 [system.cm] INFO: Ethernet sent 1 packets, got 0 packets, avg rtt: 0
0001175491 [system.cm] INFO: WiFi     sent 5 packets, got 4 packets, avg rtt: 1083
0001175498 [system.cm] INFO: Cellular sent 4 packets, got 4 packets, avg rtt: 1333

@scott-brust scott-brust added the 5.x label Nov 9, 2023
@scott-brust scott-brust added this to the 5.6.0 milestone Nov 9, 2023
@@ -62,6 +62,11 @@ void spark_cloud_udp_port_set(uint16_t port)
cloud_udp_port = port;
}

uint16_t spark_cloud_udp_port_get()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more authoritative source for the cloud connection port to use?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloud_udp_port is it, so I think you are good. Do you want to update port = cloud_udp_port; to port = spark_cloud_udp_port_get() as well? If we are not planning on using spark_cloud_udp_port_set(), we could remove it.

system/src/system_connection_manager.cpp Show resolved Hide resolved
system/src/system_network_diagnostics.cpp Outdated Show resolved Hide resolved
system/src/system_network_diagnostics.h Show resolved Hide resolved
@scott-brust scott-brust force-pushed the feature/sc-120714/network-tester branch from 108a4ea to 73a5f67 Compare November 14, 2023 18:46
@scott-brust scott-brust force-pushed the feature/sc-120714/network-tester branch from 21e73fa to 36d256d Compare November 20, 2023 16:30

const uint8_t REACHABILITY_TEST_MSG = 252;
const unsigned REACHABILITY_MAX_PAYLOAD_SIZE = 256;
const unsigned REACHABILITY_TEST_DURATION_MS = 5000;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is arbitrary, what should this be?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we said maybe this should be changed so it doesn't blindly run for 5s, but maybe try to run 5 times and timeout after 5 seconds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently its a tradeoff of how much traffic do we want to generate vs how long do we want to block the system task, as the connection tester is done on the system thread currently.
We should probably lower the test time, or run the test on its own thread.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that there is a more reliable amount of data used for this tester, a predetermined count of packets seems reasonable? If you do it based on time alone, you will have more or less packets depending on how fast the networks are at that given time, but more packets than a certain amount will likely not contribute to a better test value. Still though you might have some really slow networks and a timeout will be required.

@scott-brust scott-brust force-pushed the feature/sc-120714/network-tester branch from 36d256d to 5f05700 Compare November 20, 2023 17:56
snprintf(tmpserv, sizeof(tmpserv), "%u", tmpport);
LOG(TRACE, "Resolving %s#%s", tmphost, tmpserv);
// TODO: get addrinfo/server IP from DNS lookup using the specific interface?
r = netdb_getaddrinfo(tmphost, tmpserv, &hints, &info);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is becoming a problem when different interfaces DNS servers resolve the device service hostname to different IPs and we try to use the same IP to try to connect with all the interfaces.
IE Some situations im seeing are wifi+ethernet sockets will connect but cellular doesnt and vice versa.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example
Cellular + WIFI DNS servers:

[2023-11-28 11:45:14.380] 0000204617 [app] INFO: DNS IP 8.8.4.4
[2023-11-28 11:45:14.380] 0000204621 [app] INFO: DNS IP 8.8.8.8
[2023-11-28 11:45:14.385] 0000204624 [app] INFO: DNS IP 192.168.1.1

Server IP Resolves to 3.222.143.118#5684

Ethernet + Cellular + WIFI DNS servers:

[2023-11-28 11:45:51.616] 0000241854 [app] INFO: DNS IP 192.168.0.1
[2023-11-28 11:45:51.622] 0000241858 [app] INFO: DNS IP 8.8.4.4
[2023-11-28 11:45:51.622] 0000241861 [app] INFO: DNS IP 8.8.8.8
[2023-11-28 11:45:51.627] 0000241865 [app] INFO: DNS IP 192.168.1.1

Server IP resolves to a different address 3.215.122.234#5684
For whatever reason, cellular sometimes cannot reach this IP, or the traffic is much slower than with its own resolved IP

@scott-brust scott-brust force-pushed the feature/sc-120714/network-tester branch from d53d078 to 08b5d6f Compare November 28, 2023 18:40
@scott-brust scott-brust merged commit 61f4702 into develop Nov 30, 2023
13 checks passed
@scott-brust scott-brust deleted the feature/sc-120714/network-tester branch November 30, 2023 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants