HIP 94: Response Time Windows for Witness Rewarding #749

Merged Aug 25, 2023 (32 commits)
32 commits (the diff below shows changes from 11 commits):
- 7dd0486 Propose to switch witness rewards from 14 first responding hotspots t… (disk91, Jul 24, 2023)
- 36817ed Merge branch 'helium:main' into main (disk91, Jul 26, 2023)
- fcd8ca6 Comply with template + in progress additional elements (disk91, Jul 26, 2023)
- 647c778 Add information about the ECC timings (disk91, Jul 26, 2023)
- 77a9b99 Create directory (disk91, Jul 26, 2023)
- 2cae783 add illustration files (disk91, Jul 26, 2023)
- 29511d2 Add the highly valuable hotspot illustration (disk91, Jul 26, 2023)
- 6dc3032 Add suburb illustration (disk91, Jul 26, 2023)
- c26fb99 Add files via upload (disk91, Jul 26, 2023)
- 38f8cee Add the data packet waterfall (disk91, Jul 26, 2023)
- c835ba2 some precisions (disk91, Jul 26, 2023)
- 16e43cb Fix RX windows typo and add precision on windows advantages (disk91, Jul 26, 2023)
- ce6c78c Precision about coverage area scale (disk91, Jul 26, 2023)
- 0630929 typo (disk91, Jul 26, 2023)
- 86dda7f precisions on goal (disk91, Jul 26, 2023)
- 4d69830 Add element related to MCU performance on different solutions (disk91, Jul 27, 2023)
- e980d01 Add the notion of EXTENEDE_TIME_WINDOWS and add exemple to illustrate… (disk91, Jul 27, 2023)
- fb3d044 Fix typo and add open idea (disk91, Jul 27, 2023)
- 7fc2bc2 Alternate proposal, to be discussed (disk91, Jul 28, 2023)
- 72f0522 Add files via upload (disk91, Jul 28, 2023)
- d2bb3ce ADD impact of ECC on Witness selection (disk91, Jul 28, 2023)
- 761e068 Add Witness waterfall details (disk91, Jul 29, 2023)
- 91ec9cb Add data about hotspot reduction (disk91, Jul 29, 2023)
- ebfc486 add illustrations (disk91, Jul 29, 2023)
- 38b0a41 Fix typo in file name (disk91, Jul 29, 2023)
- b7d0f2a Add success metrics (disk91, Jul 31, 2023)
- b8c6a84 Clarification & Data Addition (disk91, Aug 9, 2023)
- 5ed50e2 HIP edits to XX-Response-Time-Windows-for-Witness-Rewarding.md (waveform06, Aug 9, 2023)
- bade6d6 Merge pull request #1 from waveform06/main-1 (disk91, Aug 9, 2023)
- 3e754f3 try to make alternate solution more understandable (disk91, Aug 9, 2023)
- 1a33c3c Number / Updating frontmatter and formatting (hiptron, Aug 25, 2023)
- a8291f6 rename HIP file (hiptron, Aug 25, 2023)
188 changes: 188 additions & 0 deletions XX-Response-Time-Windows-for-Witness-Rewarding.md
@@ -0,0 +1,188 @@
# HIP XX: Response Time Windows for Witness Rewarding

- Author: @disk91, @jmarcelino
- Start Date: 2023/07/20
- Category: Economic, Technical
- Tracking issue:
- Voting Requirements: veIoT Holders

# Summary

Currently, the Proof-of-Coverage Oracles reward the first 14 hotspots reporting a witness. This rewards
the fastest hotspots, incentivizing fiber backhauls and specific hardware models that happen to be able
to produce fast signatures. The result is that the same hotspots keep being selected, making others unviable
even if they provide unique and useful coverage for the network. In other words, the current selection punishes hotspots for
falling short of millisecond optimizations, while the LoRaWAN protocol operates on the order of seconds.
A hotspot's utility in providing LoRaWAN coverage depends on "good enough" response times, not on being the
absolute fastest, as absolute speed provides no marginal utility: uplinks have no particular time window,
downlink time windows are up to 2 seconds, and the join process up to 6 seconds.

This HIP proposes to restore random hotspot selection while adding a response time window that eliminates only
slow hotspots failing to meet LoRaWAN-grade timing constraints, and that pushes Helium hotspots to reasonably improve
their response time over time.

# Motivation

The LoRaWAN network has timing constraints that must be considered, related to the JOIN mechanism
and the ACK/Downlink mechanism. A JOIN requires a full loop within 5 seconds, up to 6 seconds. An ACK/Downlink requires a full loop within 1
second for the RX1 window, up to 2 seconds for the RX2 window. Outside of these time frames, the response is ignored by the device. The [Appendix](#packet-processing-and-lorawan-time-constraints) describes the consequences of these constraints on the expected hotspot response time.

LoRaWAN timing is a matter of seconds, not milliseconds; this is why creating a competition between hotspots and network connectivity at the millisecond scale does not achieve any network goal.

Hotspots with highly valuable locations, such as mountaintops, cell towers, and even rooftops,
sometimes rely on higher-latency connectivity (4G/5G, HomePlug, satellite), which adds anywhere from
10ms to 100ms. These hotspots generally have a higher operating cost due to dedicated connectivity and hosting costs,
and they are unfairly impacted by the current selection algorithm even though they still operate within LoRaWAN timing specifications.
See the [appendix](#highly-valuable-coverage) about highly valuable coverage.

Hotspots outside city centers get a slower Internet response time than those in the city center,
even with fast Internet access, due to extra network hops or downgraded connectivity technologies such as xDSL.
See the [Appendix](#suburb-valuable-coverage) about suburb valuable coverage.

Hotspots with the worst locations (indoors, in the city center) get the best response times
thanks to direct fiber connectivity.

We have seen that, depending on the hotspot hardware, the witness processing time largely depends
on the packet signature, as described in the [appendix](#ecc-signature-impact). As the signature is delegated to a hardware
ECC chip on most deployed hotspots, the processing time depends on the hardware solution soldered onto the board.
Even if firmware can be improved, the current solution disqualifies certain hardware regardless of the hotspot owner's efforts.

More generally, it disqualifies lower-end hardware such as light hotspots based on microcontrollers since, by definition, their capacity to process the same workload is lower than that of a CPU-based hotspot. This makes no sense, as such hardware is a
better fit for stability, long life, energy savings, and lower cost, and should be favored in the long term.

Contributor:
Is there any proof backing this up? Because current light hotspots in the field do not exhibit the behavior you describe.

Contributor Author:
Evidence does not really need to be shown. MCU computation power is lower than CPU computation power. Code on an MCU may be better optimized in some ways, but the gateway-rs code is the same for all. Currently, time is mostly spent on the ECC signature, which is equivalent and hides the other differences. Let's be a bit scientific, or prove to me that there will never be a difference in computation power in favor of CPU-based systems compared to MCU-based systems.

Contributor:
Fully agree, let's be scientific and back up your claim.


The People's Network must be accessible to anyone and not be a competition of millisecond optimizations limited to a
small group of experts.

The witness response time is not representative of data packet processing. Witness response time has no timing
constraint. Witnesses are processed by an Oracle, and the response time depends on the Oracle's location and
on the network path to reach that Oracle, which is totally different from the LNS network path.
Response time can differ from one witness to another, so the absolute response time is not a good way to measure performance against the LoRaWAN timing constraints. The [Witness process](#witness-processing-waterfall) and the [Packet process](#packet-processing-waterfall) are detailed in the respective appendixes.

Since, for a given witness, hotspots from a similar area report it to the same Oracle, a response time window
starting from the first witness received by the Oracle is a viable solution to eliminate the variable timing offset
and to eliminate only hotspots that are really slower and could cause problems for data processing.

# Rationale and Alternatives

This HIP proposes to select valid witnesses from those arriving within a time window of MAX_WITNESS_WAIT_WINDOWS_MS starting
from the first witness received by the Oracle.

The MAX_WITNESS_WAIT_WINDOWS_MS parameter will initially be set to 200ms, according to the calculation described in the [appendix](#packet-processing-waterfall).
It could later be adjusted between 100ms and 300ms by the Helium Foundation to optimize network quality without the need for a new
vote. The purpose of this adjustment is to push hardware manufacturers to optimize their solutions on a planned schedule. The initial 200ms takes into account the normal impact of the ECC signature and of radio backhauls, compared to DIY hotspots, within the LoRaWAN constraints.

This means:
1. Different hotspots receive a beacon and send the witness information to the related Oracle.
2. The Oracle receives the first witness notification and opens a witness reception window of MAX_WITNESS_WAIT_WINDOWS_MS
milliseconds. This first witness is marked valid.
3. The Oracle receives the next witnesses during the MAX_WITNESS_WAIT_WINDOWS_MS window and marks them valid.
4. The Oracle receives the next witnesses after the MAX_WITNESS_WAIT_WINDOWS_MS window has closed and marks them invalid.
5. The Oracle randomly selects 14 of the valid witnesses to be rewarded.
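The five steps above can be sketched as follows. This is a minimal Python illustration, not the Oracle's actual code: the `Witness` type, the function names, treating a witness that arrives exactly at the window boundary as valid, and the 100-300ms guard (taken from the adjustable range described above) are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical constants mirroring the parameters discussed in this HIP.
MAX_WITNESS_WAIT_WINDOWS_MS = 200  # initial window proposed by the HIP
WINDOW_BOUNDS_MS = (100, 300)      # adjustable range mentioned in the HIP
MAX_REWARDED_WITNESSES = 14


@dataclass(frozen=True)
class Witness:
    hotspot: str
    received_at_ms: int  # reception time measured by the Oracle


def partition_witnesses(
    witnesses: List[Witness], window_ms: int = MAX_WITNESS_WAIT_WINDOWS_MS
) -> Tuple[List[Witness], List[Witness]]:
    """Steps 2-4: a witness is valid if received within window_ms of the first one."""
    low, high = WINDOW_BOUNDS_MS
    if not low <= window_ms <= high:
        raise ValueError(f"window_ms must be within {low}-{high} ms")
    if not witnesses:
        return [], []
    window_start = min(w.received_at_ms for w in witnesses)
    deadline = window_start + window_ms  # assumption: the boundary is inclusive
    valid = [w for w in witnesses if w.received_at_ms <= deadline]
    invalid = [w for w in witnesses if w.received_at_ms > deadline]
    return valid, invalid


def select_rewarded(
    witnesses: List[Witness],
    window_ms: int = MAX_WITNESS_WAIT_WINDOWS_MS,
    rng: Optional[random.Random] = None,
) -> List[Witness]:
    """Step 5: pick up to 14 valid witnesses uniformly at random."""
    if rng is None:
        rng = random.Random()
    valid, _ = partition_witnesses(witnesses, window_ms)
    if len(valid) <= MAX_REWARDED_WITNESSES:
        return valid
    return rng.sample(valid, MAX_REWARDED_WITNESSES)


# Twenty witnesses arriving every 20 ms: only those within 200 ms of the first are valid.
arrivals = [Witness(f"hotspot-{i}", 1_000 + 20 * i) for i in range(20)]
valid, invalid = partition_witnesses(arrivals)
print(len(valid), len(invalid))  # 11 9
```

Because the window is measured from the first witness the Oracle receives, a constant offset shared by all hotspots reporting to the same Oracle (such as the network distance to that Oracle) does not change which witnesses end up valid.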

# Unresolved Questions

- Packet processing currently involves an ECC signature for every packet, which is time-consuming and not mandatory.
Some discussions are already open about replacing it with a software signature using a key derived from the ECC and negotiated with the Helium
Packet Router for a given period of time.

# Deployment Impact
The Oracle PoC rewarding code needs to be modified to take this into consideration. Deployment is global; hotspots are not impacted.

# Success Metrics

# Appendix

## Highly Valuable Coverage

Some hotspots have a very large coverage because they are installed in very good, high locations. Due to their
geographical location and height, these hotspots cover a wide zone, larger than the city, and therefore offer unique coverage. The following picture illustrates this: the blue area represents the coverage offered by this single hotspot, and the orange area
represents the city coverage provided by the mass of other hotspots in the same area. All these hotspots compete for the same witnesses.

![Highly valuable coverage illustration](XX-response-time-windows-for-witness-rewarding/valuable-coverage.png)

This illustration is based on a real example, but it is meant as a general illustration for hotspots placed on cell towers and high elevation points. The hotspot used as the model is Attractive-Olive-Cybord, located in Clermont-Ferrand, France, and deployed on a high elevation point. Its coverage can be seen on mappers.helium.com, with 80km of coverage from north to south.

## Suburb Valuable Coverage

The hotspots located in the suburbs and beyond, shown in blue in the illustration, expand the network with unique
coverage zones while competing with the hotspots inside the city. They are also the hotspots that open PoC rewarding to the next hotspots outside the city and suburbs.

![Suburb illustration](XX-response-time-windows-for-witness-rewarding/suburb-coverage.png)

These hotspots do not benefit from the fastest Internet connection, as their fiber connectivity passes through the city center to reach the main Internet highways. For most of them, fiber connectivity will not be available and they will rely on xDSL connectivity before reaching the ISP's fiber Internet backhaul.

Contributor:
That's a pretty big assumption without supporting evidence. I don't think this holds true in general, but it might hold true in the particular example you show.

Contributor Author:
This is also evident: it is the way networks are built.

Contributor:
An evidence of what? If you're saying that fiber will only pass city centers and nowhere else then I don't think your statement holds true.

Reply:
https://fttx.gr/

That's a basic example in response to your requests for evidence:
on the map you can see the fiber cabinets. The city center has the densest fiber infrastructure, while rural areas are barely covered; the unique coverage is where there is no fiber.
Hope that helps to draw a picture of what your HIP 83 actually did.


These hotspots participate in the same PoC as the city-center hotspots that benefit from the fastest Internet connection.

## Packet Processing and LoRaWAN Time Constraints

The Helium Packet Router (HPR) accepts all incoming packets up to the max_copies limit set on the route or EUI in
the config service: first come, first paid. The HPR applies a time limit only for roaming. The time limit for a packet is
decided by the LNS after the HPR; it is an LNS setting and can vary from one LNS to another.
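As a rough model of the "first come, first paid" rule, the sketch below accepts copies of the same packet in arrival order until `max_copies` is reached. It is hypothetical Python, not HPR code: the `CopyLimiter` class and keying copies by a packet identifier are assumptions, and the model ignores roaming time limits and LNS-side deduplication windows.

```python
from collections import defaultdict
from typing import Dict, List


class CopyLimiter:
    """Hypothetical first-come, first-paid acceptance of packet copies."""

    def __init__(self, max_copies: int) -> None:
        self.max_copies = max_copies
        self._accepted: Dict[str, List[str]] = defaultdict(list)

    def offer(self, packet_id: str, hotspot: str) -> bool:
        """Accept the copy if max_copies has not been reached for this packet yet."""
        copies = self._accepted[packet_id]
        if len(copies) >= self.max_copies:
            return False  # beyond max_copies: rejected in this model
        copies.append(hotspot)
        return True

    def paid_hotspots(self, packet_id: str) -> List[str]:
        """Hotspots whose copies were accepted, in arrival order."""
        return list(self._accepted.get(packet_id, []))


limiter = CopyLimiter(max_copies=2)
print([limiter.offer("pkt-1", h) for h in ("a", "b", "c")])  # [True, True, False]
```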

LoRaWAN time constraints are the following:
- UPLINK: the arrival of the first copy has no time constraint; subsequent copies must arrive within the deduplication time window defined by the LNS.
- JOIN REQUEST must be answered within 5s for RX1 (same frequency, standard power) or 6s for RX2 (other frequency, potentially higher power). The LNS dynamically selects RX1 or RX2 according to the time available.
- DOWNLINK REQUEST / ACK must be answered within 1s for RX1 (same frequency, standard power) or 2s for RX2 (other frequency, potentially higher power). The LNS dynamically selects RX1 or RX2 according to the time available.
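To make these constraints concrete, the sketch below checks which receive window a response can still reach, given when the LNS has the response ready and how long the order takes to reach the hotspot. It uses the default LoRaWAN delays (1s/2s for RX1/RX2 after an uplink, 5s/6s for a join accept) and, like most implementations, tries RX1 first. The function name is hypothetical, and the sketch ignores the scheduling margin a real hotspot needs before the window opens.

```python
from typing import Optional

# Default LoRaWAN receive delays, in ms after the end of the uplink:
# RECEIVE_DELAY1/2 for data downlinks and ACKs, JOIN_ACCEPT_DELAY1/2 for joins.
RX_DELAYS_MS = {
    "downlink": (1_000, 2_000),
    "join": (5_000, 6_000),
}


def pick_rx_window(kind: str, ready_at_ms: int, transit_ms: int) -> Optional[str]:
    """Return "RX1", "RX2", or None if both windows are missed.

    ready_at_ms: when the LNS has the response ready, relative to the uplink end.
    transit_ms: time for the transmit order to travel from the LNS to the hotspot.
    """
    rx1, rx2 = RX_DELAYS_MS[kind]
    at_hotspot = ready_at_ms + transit_ms
    if at_hotspot <= rx1:
        return "RX1"  # tried first by most implementations
    if at_hotspot <= rx2:
        return "RX2"
    return None  # too late: the device ignores the response


print(pick_rx_window("downlink", 700, 200))    # RX1
print(pick_rx_window("downlink", 1_500, 300))  # RX2
print(pick_rx_window("join", 5_800, 300))      # None
```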

Contributor:
Downlinks are 1 second RX1 and 2 second RX2.

Contributor Author:
Totally agree, thank you for reporting the typo.


There is no reason to prefer RX1 over RX2. Most implementations try to reach RX1 first, but RX2 provides different advantages, in particular in Europe, where the RX2 duty cycle is more favorable and the higher power gives a better chance of reaching the device.

Contributor:
If RX1 suffices, it certainly has benefits over RX2 because it uses the same channel (= more bandwidth available) and the device can return to sleep faster (= energy saving).

Contributor Author:
It also has larger negative impacts: it has a higher risk of collision because it makes greater use of the same bandwidth (so basically you have less bandwidth); it has a 1% duty cycle, which significantly impacts the gateway duty cycle shared across all devices; and the limited power in Europe is a great disadvantage, with larger reception loss due to the unbalanced noise level and antenna radio gain at the device level. This leads to retransmissions, which cost more than waiting 1s. Downlink usage in a correct implementation is limited to a couple per day, so in the end the global power impact is limited. This is what an experienced device developer will tell you. RX1 / RX2 is an LNS choice, not a network choice. As the HIP demonstrates, within a 200ms window the factor impacting the choice between RX1 and RX2 is primarily the distance between the device and the LNS.

Contributor Author:
Concerning the global remark about PoC reward coverage, take that example (and this is just an example, since you requested examples, to illustrate a general principle that applies to any cell-tower hotspot): this hotspot has a 60km radius (80km in reality), covering 60 x 60 x 3.14 ≈ 11,304 km² of territory. This is to be compared with a regular good hotspot at 10 x 10 x 3.14 ≈ 314 km²: basically, the coverage is 36x larger. Compared to the average coverage, which is closer to a 1km range (1 x 1 x 3.14 km²), it is about 3600x. You may consider 7x in rewards to be well paid; I don't agree with this.
The question is not about the amount of reward this particular hotspot gets. The question is why this hotspot gets a penalty that can lead some of them to get 0 reward, compared to a city-center hotspot with fiber, providing smaller coverage, that benefits from the current HIP-83 situation.
HIP-83 did not show any evidence of its expected positive impact when it was written. So please show us evidence that, since HIP-83, rewarding is better balanced to extend network coverage. That way we will have something interesting to discuss.


Basically, only DOWNLINK REQ / ACK creates a time constraint at the time scale discussed in this HIP. This constraint ensures that the order to send the ACK or the downlink reaches the desired hotspot before the RX2 window.

Within this time constraint, we need to execute the [full packet processing steps](#packet-processing-waterfall).

Based on this, the LNS is able to accept incoming packets within a time window with a minimum duration of 210ms while keeping a 200ms margin,
so in practice up to 350-400ms.

## ECC Signature impact

The ECC signature process has a significant impact on overall processing. Some measurements have been conducted by
community members such as Miroslav (heliootics) & co and Jose Marcelino, using the gateway-mfr-rs test kit. For the tested hotspot providers, we get the following average signature times:

| Manufacturer Brand | Model | Avg Signature Time | Note |
| ------------------ | ----- | ------------------ | ---- |
| Calchip v1 | | ? | no ECC |
| DiY | x86 | ? | no ECC |
| DiY | RPI 4 | ? | no ECC |
| Bobcat | RK3566 | 105 ms | |
| Milesight | | 105 ms | |
| Nebra | indoor CM3 v1 | 113 ms | |
| Synchrobit | CM4 | 120 ms | |
| Cotx | X3 | 130 ms | |
| Heltec | HT-M2808 | 135 ms | |
| Bobcat | Others | 150 ms | |
| Linxdot | | 152 ms | |
| Dusun | | 154 ms | |
| Nebra | indoor CM3 v2 | 157 ms | |
| RAK / MNTD | Gold | 165 ms | |
| RAK / MNTD | Black | 174 ms | |
| Pycom | Other | 175 ms | |
| Sensecap | M1 | 175 ms | |
| PantherX1 | | 179 ms | |
| Pisces | | 180 ms | |
| Controllino | | 180 ms | |
| Heltec | Other | 242 ms | |

The signature impact on the first witness to arrive shows a variability of up to 250ms (to be completed later with data for non-ECC devices).
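A quick way to read the table against the proposed 200ms window is to compute the spread between the fastest and slowest measured averages. The sketch below uses only the rows with a measured value; it assumes signature time is the only difference between hotspots, so it ignores backhaul latency and the not-yet-measured non-ECC devices.

```python
# Average ECC signature times (ms) from the table above; rows marked "?" are omitted.
AVG_SIGNATURE_MS = {
    "Bobcat RK3566": 105,
    "Milesight": 105,
    "Nebra indoor CM3 v1": 113,
    "Synchrobit CM4": 120,
    "Cotx X3": 130,
    "Heltec HT-M2808": 135,
    "Bobcat Others": 150,
    "Linxdot": 152,
    "Dusun": 154,
    "Nebra indoor CM3 v2": 157,
    "RAK / MNTD Gold": 165,
    "RAK / MNTD Black": 174,
    "Pycom Other": 175,
    "Sensecap M1": 175,
    "PantherX1": 179,
    "Pisces": 180,
    "Controllino": 180,
    "Heltec Other": 242,
}

WINDOW_MS = 200  # initial MAX_WITNESS_WAIT_WINDOWS_MS

fastest = min(AVG_SIGNATURE_MS.values())
slowest = max(AVG_SIGNATURE_MS.values())
spread = slowest - fastest
# Time left in the window for other delays (e.g. backhaul) for the slowest
# model relative to the fastest one, if everything else were equal.
headroom = WINDOW_MS - spread
print(spread, headroom)  # 137 63
```

Under these assumptions, even the slowest measured model stays within a 200ms window opened by the fastest one, with about 63ms left for other differences such as backhaul.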


## Packet Processing Waterfall

The following waterfall represents the different steps of data packet processing for a packet requesting an ACK or a downlink. The HPR is geo-replicated, so the time to reach it stays within the zone.

Two scenarios are identified:
- the first one, in gray, is the best-case scenario: the hotspot is fast and the LNS is in the same zone as the device, so the communication from HPR to LNS and then from LNS to hotspot is short.
- the second one, in orange, is the worst-case scenario: the hotspot is slower, and the LNS and the device are at the longest Internet distance, considered to be 600ms.

The first scenario gives an idea of the acceptable copy reception window to match the RX1 window, in blue: up to 600ms. The second scenario gives an idea of the acceptable copy reception window to match the RX2 window (RX1 being unreachable due to the round-trip delay): up to 210ms. Both scenarios include a 200ms margin for unforeseen delays.
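How a copy reception window falls out of such a waterfall can be expressed as a simple budget: the RX deadline, minus the safety margin, minus every other step in the chain. The sketch below assumes the steps are sequential and additive; the function name, step names, and durations in the example are hypothetical placeholders, not values read from the waterfall figure.

```python
from typing import Dict


def copy_reception_window_ms(rx_deadline_ms: int, margin_ms: int, steps_ms: Dict[str, int]) -> int:
    """Time left for the LNS to wait for packet copies (never negative)."""
    remaining = rx_deadline_ms - margin_ms - sum(steps_ms.values())
    return max(remaining, 0)


# Hypothetical step durations, for illustration only.
example_steps = {
    "hotspot processing and signature": 250,
    "uplink to HPR and LNS": 300,
    "LNS processing": 100,
    "downlink order back to hotspot": 300,
}
print(copy_reception_window_ms(2_000, 200, example_steps))  # 850
```

Every millisecond added to any step, including a slower signature, comes straight out of the window left for collecting copies, which is why the margin is kept as a separate term.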

![Packet Processing Waterfall](XX-response-time-windows-for-witness-rewarding/packet-processing-waterfall.png)

This minimum reception window, obtained for the worst case, has been used to propose the initial time window for accepting
witnesses.


## Witness Processing Waterfall




1 change: 1 addition & 0 deletions XX-response-time-windows-for-witness-rewarding/empty
@@ -0,0 +1 @@
