HIP: LongFi Semantics #3

Merged
merged 14 commits into from
Sep 18, 2019
213 changes: 213 additions & 0 deletions text/0000-longfi.md
@@ -0,0 +1,213 @@
- Start Date: 2019-08-16
- HIP PR: <!-- leave this empty -->
- Tracking Issue: <!-- leave this empty -->

Table of Contents
=================

* [Summary](#summary)
* [Regulatory](#regulatory)
* [Protocol](#protocol)
* [Versioning](#versioning)
* [Joining](#joining)
* [Datagram](#datagram)
* [Uplink](#uplink)
* [Downlink](#downlink)
* [Fragmentation](#fragmentation)
* [Channels](#channels)

## Summary
[summary]: #summary

This whitepaper introduces the high-level semantics of LongFi, the Helium network's wireless protocol.

Providing wide-area wireless connectivity is the Helium network's _raison d'être_. Providing this connectivity in a manner that is implementable by both Helium and third parties requires a free and open protocol that devices, hotspots, and routers understand.

This proposal is not an all-encompassing specification but lays the foundation for further HIPs which will serve as the specification.

## Versioning
[versioning]: #versioning

LongFi is versioned so that it can be improved in future revisions without breaking backward compatibility.

## Regulatory
[regulatory]: #regulatory

Regulations on intentional radiators vary region by region. These regulations inform much of LongFi's design, primarily...

> TODO:
> - time on air
> - duty cycle

## Protocol
[protocol]: #protocol

LongFi is a session-oriented protocol. However, unlike most wireless protocols, which operate within a network of trusted base stations, devices in the Helium network communicate _through_ untrusted hotspots. Therefore, sessions in the Helium network are between devices and routers. Sessions persist regardless of which or how many hotspots receive their packets.
Contributor @fvasquez, Sep 9, 2019

So hotspots have no awareness of sessions? They merely see individual packets from devices that they forward on to their intended routers.

Are sessions mandatory for LongFi? Is there no session-less means of communication like UDP?

What about downlink communications from routers to devices? Are sessions bi-directional or are downlink sessions something different?

Contributor

You need some form of session somewhere to have any downlink. The association between a connected device and its target hotspot is less certain when the session is associated with the router, but it may be manageable. We won't know how manageable this is until this is deployed in the wild.

I believe you will need to establish a session from the device side, where they can query if a down-link packet is available or not.

Contributor

> Are sessions mandatory for LongFi? Is there no session-less means of communication like UDP?

From a practical standpoint, I don't think we can enforce sessions if a device/router pair decides not to use them; at least, there's no way for us to stop a device/router pair from using the hotspots to move packets despite a lack of session. And I think your UDP analogy is really good.

That being said, I think the LongFi spec describes a possible device/router protocol in addition to the hotspot routing protocol.

As a side-note, have we dropped LoFi/HiFi nomenclature, @JayKickliter?

Contributor Author

> have we dropped LoFi/HiFi nomenclature

Yes, at least for now. I didn't want to force the separation from the outset.

Contributor

Using fountain codes, we will implicitly have sessions between routers and devices to fulfill the droplet aggregation process.

Contributor Author

I'm curious why droplet aggregation has to be on the hotspot vs router?

If we use fountain code, reconstruction will be on the router and not on the hotspot.

Contributor

I'm curious what the reasoning was though? I had dreamed up schemes where aggregation/reassembly could happen router side

Contributor Author

That's what I thought I was saying:

  • Hotspot: does not do reassembly
  • Router: does reassembly

Contributor Author

Or am I misreading something?

Contributor

I am the one misreading something. My bad


```
     ┌──────────┐
     │  Router  │
     └──────────┘
     ┌─────┴─────┐
     ▼           ▼
┌─────────┐ ┌─────────┐
│ Hotspot │ │ Hotspot │
└─────────┘ └─────────┘
     ▲           ▲
     └─────┬─────┘
    ┌────────────┐
    │   Device   │
    └────────────┘
```

### Joining
[joining]: #joining

When a device starts up, it is session-less: it is not yet connected to its organization's router. The process of establishing a session is called joining. The send and response layer is called a super frame. All call/response messages have the following fields at a minimum:

Datagram Kind (DGK), Organizationally Unique Identifier (OUI), Device ID (DID), Fingerprint (FP)

| DGK | OUI | DID | FP |
|-----|-----|-----|----|

Sessions have a finite lifetime.
Contributor

Based on what? A known timeout?

Contributor

Once the second standard deviation of total payload size in addition to the payload (198% total) is collected without getting a full message, the session is terminated and it needs to be re-initiated right now. This puts a cap on chattiness of devices and gives us a fixed memory overhead required for embedded.

Contributor

You may have answered a different question, @refugeesus? I think the maximum length of a packet after encoding is different from the lifetime of a session, which I think is what @fvasquez is asking about.

I don't think we have defined lifetime of a session. It needs to be as infrequent as possible, to save on resources, but frequent enough to be secure. So it might depend on how many bytes you've sent using that session key?

Contributor

I am working from payloads being collections of packets.

The total lifetime of a session, given a payload 100 bytes in length, is 198 bytes collected or the maximum amount of time it would take to send this amount of data.

Contributor

Unique session key per ACK'ed payload sounds expensive as I mention here #3 (comment)


Super frames contain the connection information and the requested payload size needed to facilitate communication. The added fields are Session ID (SID) and Payload Size (PLS). The generic super frame structure is as follows:

| DGK | OUI | DID | FP | SID | PLS |
|-----|-----|-----|----|-----|-----|

The send structure of an unconnected device has the following fields completed:

| DGK | OUI | DID | FP | SID | PLS |
|-----|-----|-----|----|-----|-----|
| X | X | X | X | _ | X |

The received structure for an unconnected device has the following fields completed:

| DGK | OUI | DID | FP | SID | PLS |
|-----|-----|-----|----|-----|-----|
| X | X | X | X | X | X |

Once the device receives a complete super frame structure (has been assigned a session ID) it is considered connected and can begin transmitting data frames.
Contributor

I would suggest that the device allocate the session ID and make the acknowledgement optional so devices can be 'fire and forget'

Contributor

If you give one-shot transmissions a different DGK value the optional ack is implied and session ID (which should probably be the last field) can be ignored.

Contributor

I think your definition of session ID is different than what I previously understood from @Vagabond.

My previous understanding was that you might "connect" and create a session ID once a day, let's say, and then use that key to send every packet during the day, with or without ACKs.

From what you're writing, it sounds like anything that's not fire and forget requires a unique session key which expires after the payload is sent. Which sounds expensive to me...


#### Connection Frame

Before sending a payload, a device must broadcast identifying information and receive confirmation from a router, via a hotspot, that it is ready to receive data. The call/response for this, described above, is again:

*Call*

| DGK | OUI | DID | FP | SID | PLS |
|-----|-----|-----|----|-----|-----|
| X | X | X | X | _ | X |

*Response*

| DGK | OUI | DID | FP | SID | PLS |
|-----|-----|-----|----|-----|-----|
| X | X | X | X | X | X |

A completed response indicates that a receiver is in range and a router is capable of receiving/forwarding data to a desired endpoint.
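
To make the call/response shape concrete, the following is a minimal Python sketch of the super frame described above. Field sizes and encodings are still open TODOs in this HIP, so plain integers and bytes stand in for them, and the names `SuperFrame`, `join_call`, and `join_response` are hypothetical, not part of any specification.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SuperFrame:
    """Super frame fields from the tables above; all types are placeholders."""
    dgk: int            # datagram kind tag
    oui: int            # organizationally unique identifier
    did: int            # device identifier
    fp: bytes           # fingerprint
    sid: Optional[int]  # session ID; None models the blank "_" column
    pls: int            # requested payload size

    @property
    def is_complete(self) -> bool:
        # A device counts as connected once it holds a frame with a session ID.
        return self.sid is not None


def join_call(dgk: int, oui: int, did: int, fp: bytes, pls: int) -> SuperFrame:
    """Build the call sent by an unconnected device (SID left blank)."""
    return SuperFrame(dgk, oui, did, fp, None, pls)


def join_response(call: SuperFrame, assigned_sid: int) -> SuperFrame:
    """Build the router's response: the same fields with the SID filled in."""
    return replace(call, sid=assigned_sid)
```

In this sketch the response simply echoes the call with a session ID added. Whether a real response repeats every field unchanged is not settled by this document.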

#### Data Frame

Once a connection has been established, the device, referred to as the sender, will transmit an upper-bounded number of packets to the receiver. The added fields are Seed 1 (S1), Seed 2 (S2), and Payload (PL). The general structure of the data frame is as follows:

*Call*

| DGK | OUI | DID | FP | S1 | S2 | PL |
|-----|-----|-----|----|----|----|----|

Once the router has successfully received the entire message (one or many transmissions by the sender), the device will receive a response data frame. The added field is Acknowledge (ACK). The response data frame has the following structure:

*Response*

| DGK | OUI | DID | FP | ACK |
|-----|-----|-----|----|-----|
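
As a rough illustration of the data frame exchange, the sketch below buffers data frames on the router side and emits an ACK only once a full message is available. The decoder is passed in as a function because this HIP does not specify it. The names `DataFrame`, `AckFrame`, `Collector`, and `ack_dgk` are hypothetical, and echoing the header fields in the ACK is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataFrame:
    dgk: int
    oui: int
    did: int
    fp: bytes
    s1: int    # seed 1
    s2: int    # seed 2
    pl: bytes  # payload carried by this transmission


@dataclass(frozen=True)
class AckFrame:
    dgk: int
    oui: int
    did: int
    fp: bytes
    ack: bool


# Hypothetical decoder signature: returns the message once enough frames
# have arrived, otherwise None. The real decoder is not specified here.
Decoder = Callable[[List[DataFrame]], Optional[bytes]]


class Collector:
    """Router-side buffer of data frames, keyed by (OUI, DID)."""

    def __init__(self, try_decode: Decoder, ack_dgk: int) -> None:
        self._try_decode = try_decode
        self._ack_dgk = ack_dgk  # placeholder tag value for ACK datagrams
        self._frames: Dict[Tuple[int, int], List[DataFrame]] = {}

    def receive(self, frame: DataFrame) -> Optional[AckFrame]:
        key = (frame.oui, frame.did)
        frames = self._frames.setdefault(key, [])
        frames.append(frame)
        if self._try_decode(frames) is None:
            return None  # message incomplete: send nothing back
        del self._frames[key]
        # Assumption: the ACK echoes the header fields of the last frame.
        return AckFrame(self._ack_dgk, frame.oui, frame.did, frame.fp, True)
```

Keying the buffer by (OUI, DID) means frames relayed by different hotspots land in the same buffer, which matches the idea that sessions are between devices and routers rather than hotspots.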

> TODO:
> - Detailed information on when ACKs can occur may be needed. Alternatively, should it be excluded for brevity in this doc?
Contributor

I don't think ACKs should be excluded from this document. IoT developers want some guidance on what to expect from a LongFi router.

Contributor

I can include it from the prior doc then.

> - Size of fields to be included
> - S1/S2 may be converted to sequence number?
Contributor

Are S1 and S2 simply sequence numbers? If so then why are there two for each "Call" datagram? I thought LDPC didn't require sequence numbers to distinguish between droplets.

Contributor

No, S1 and S2 are seed numbers used for the fountain code process.


Contributor

I know they're a pain but UML sequence diagrams might help others visualize session message flow.

Contributor

yeah....

I'll get to it then

@vagabond

> TODO:
> - diffie hellman?
> - how long do sessions live for?
Contributor

I remember @refugeesus telling me that an LDPC decoder can determine how much of a message has been received. A device can decide to terminate a session and start a new one when a download stops making progress.

Contributor

Yes, see second standard deviation comment below.

> - explanations of each field

---
**DGK**

Datagram kind. A tag value indicating this datagram's variant type.

> **TODO:**
> - How many variants are needed?
Contributor @fvasquez, Sep 9, 2019

At the lowest level, a sensor can be thought of as a collection of registers that you can read from and write to. So I would start with Read and Write variants. Request and Response variants can be used to describe uplink and downlink communications. A short list of variants that satisfies these basic protocol requirements would include ReadRequest, WriteRequest, ReadResponse and WriteResponse DGK types. A WriteResponse is always preceded by a WriteRequest but a ReadResponse can be unsolicited for things like uplink sensor status values. Alternatively, you could define a fifth type like Status for unsolicited ReadResponse[s] if you do not want to overload the meaning of ReadResponse.

Contributor

I'm not sure that is the kind of Type that should be here as it seems more relevant to Payload contents than Datagram function.

Contributor @fvasquez, Sep 9, 2019

> I'm not sure that is the kind of Type that should be here as it seems more relevant to Payload contents than Datagram function.

Fair enough. At the very least I think the protocol needs some way to distinguish between uplink and downlink datagrams so the network knows how to route them. That may not need to be a type encoded in the datagram. Could simply be whether the packet originated from the internet or a radio receiver.

Contributor

Agreed, distinguishing uplink/downlink and/or first fragment vs not-first fragments would be good Datagram kinds.

Contributor

We don't need to order fragments anymore.

Why do you need to distinguish between uplink/downlink? A device will have to query if data is available before receiving some payload. If the query returns true, the device inherently knows it is a receiver and the gateway the sender. Otherwise the direction is implied to be the other way. The only other bi-directional communication is joining acknowledgement and a full message received acknowledgement.

Contributor

> We don't need to order fragments anymore.

I don't think there's enough information in the spec to state this.

> Why do you need to distinguish between uplink/downlink?

There is, again, not enough information to indicate that uplink/downlink could be implied.

I only propose those as potential types of Datagram kind. As the spec gets fleshed out, perhaps Datagram kind can be omitted as an explicit field.

Contributor

> We don't need to order fragments anymore.

Fountain codes remove this problem.

> Why do you need to distinguish between uplink/downlink?

DGK is included in generic format for now.

> - Can this tag serve both versioning and variant disambiguation?
Contributor

I think versioning should be separate from DGK. That simplifies variant encoding and version parsing.

> - Is this enough?

---
**OUI**

Organizationally unique identifier. A globally unique number which hotspots use to forward datagrams to the correct organization's router.
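
The forwarding role of the OUI can be pictured as a table lookup on the hotspot. The sketch below is only an illustration: how hotspots learn the OUI-to-router mapping is not covered by this document, and the table contents, the `.invalid` addresses, and the name `route_for` are all made up for the example.

```python
from typing import Dict, Optional

# Hypothetical routing table: OUI -> router address. The addresses are
# placeholders; this HIP does not say how the mapping is populated.
ROUTERS: Dict[int, str] = {
    1: "router.org-a.invalid",
    2: "router.org-b.invalid",
}


def route_for(oui: int, table: Dict[int, str] = ROUTERS) -> Optional[str]:
    """Return the router address for a datagram's OUI, or None if unknown."""
    return table.get(oui)
```

A hotspot that gets `None` back has no organization to forward to. What it should do with such a datagram is left open here.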

---
**DID**

Device identifier. DIDs are assigned to devices by organizations. Every hardware device in an organization _should_ have a unique DID, but sharing DIDs is not forbidden.
Contributor @fvasquez, Sep 9, 2019

So an OUI combined with a DID constitute a unique device address on the network?

Contributor Author

That's my thinking


> TODO:
> - Can DIDs really be shared?
Contributor

From our perspective, we don't care if organizations want to do weird things with DIDs.


---
**PAY**

Datagram payload. Payload lengths depend on spreading and coding, but not on their actual content. Payloads are intended to be encrypted and opaque to hotspots.
Contributor @fvasquez, Sep 9, 2019

Did you mean to say "Payload lengths [do not] depend on spreading and coding ..."?

Contributor Author @JayKickliter, Sep 9, 2019

but not


> **TODO:**
> - ~intended~ required to be encrypted?
Contributor

Again, we don't care what organizations want to do. They probably should encrypt the information but it's not required. Imagine open networks of weather stations, etc.


---
**FP**
Contributor

This was previously called MAC/HMAC. I like Fingerprint better because it's more expressive, but I know the blog post by Dal and perhaps the white paper make reference to MAC.


Fingerprint. Packet brokerage between hotspots and routers depends on fingerprints. They allow a hotspot to prove to a router that the hotspot has a datagram destined for that router, without divulging the datagram's payload. This ability is core to hotspots earning data credits for forwarding datagrams.
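
One way to picture the offer-then-deliver exchange is sketched below. How fingerprints are derived is an open TODO in this HIP, so the sketch assumes, purely as a stand-in, an HMAC-SHA256 over the payload using a key shared between the device and its router. The names `fingerprint`, `Offer`, and `Router`, and the per-device key table, are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, Set


def fingerprint(key: bytes, payload: bytes) -> bytes:
    """Stand-in fingerprint: HMAC-SHA256 over the payload (an assumption)."""
    return hmac.new(key, payload, hashlib.sha256).digest()


@dataclass(frozen=True)
class Offer:
    """What a hotspot shows a router: identifiers and fingerprint, no payload."""
    oui: int
    did: int
    fp: bytes


class Router:
    def __init__(self, oui: int, device_keys: Dict[int, bytes]) -> None:
        self.oui = oui
        self.device_keys = device_keys  # DID -> key shared with that device
        self.seen: Set[bytes] = set()   # fingerprints already accepted

    def wants(self, offer: Offer) -> bool:
        """Ask for the payload only for known devices and unseen fingerprints."""
        return (offer.oui == self.oui
                and offer.did in self.device_keys
                and offer.fp not in self.seen)

    def accept(self, offer: Offer, payload: bytes) -> bool:
        """Check the delivered payload against the fingerprint that was offered."""
        expected = fingerprint(self.device_keys[offer.did], payload)
        ok = hmac.compare_digest(expected, offer.fp)
        if ok:
            self.seen.add(offer.fp)
        return ok
```

With a keyed fingerprint the hotspot cannot forge an offer for a payload it never received, and the router can decline duplicates from other hotspots before any payload changes hands. Payment and settlement are deliberately left out of the sketch.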
Contributor @fvasquez, Sep 9, 2019

Packet brokerage gets complicated. Do we reward hotspots for forwarding duplicate packets? Do hotspots stop forwarding packets they aren't being rewarded for? I don't know the answers to these questions.

Contributor

If there is no hardware link layer session, it seems you can either pay for packets until they are no longer needed, or drop duplicates and make gateways eat the cost of doing the work.

Contributor

I believe the model is:

  • hotspot presents fingerprint (MAC) + device ID pair
  • router says, gimme the payload
  • hotspot delivers payload
  • router is expected to burn DCs on behalf of router

The interaction is supposed to happen quickly and with a small amount of trust; i.e., the hotspot does not wait to see the transaction post before delivering the payload. The theory is that if a router continuously stiffs the hotspot, the hotspot will blacklist said router.

Contributor

> The theory is that if a router continuously stiff the hotspot, the hotspot will blacklist said router.

Works for me but I don't think the blacklist period should be permanent. A hotspot should eventually give a router another chance to pay in the event that router's favorite hotspot goes down.

Contributor

> A hotspot should eventually give a router another chance to pay in the event that router's favorite hotspot goes down.

I think the key thing is that the router must say "I like that fingerprint, give me the payload" if and only if it intends on burning a DC. Just providing the fingerprint and the router ignoring or NAK'ing should not be what makes a hotspot blacklist the router.

In the case where a router has said "I like that fingerprint, give me the payload" and repeatedly doesn't pay, this is pretty nefarious behavior and I think a very long or permanent blacklisting of that router is in order.

Contributor

> In the case where a router has said "I like that fingerprint, give me the payload" and repeatedly doesn't pay, this is pretty nefarious behavior and I think a very long or permanent blacklisting of that router is in order.

Agreed. That is definitely bad behavior on the part of a router. So a router won't ask a hotspot for a packet payload if it already just received a packet from that same device via a different hotspot? I'm good with that.

Contributor Author

We currently do not have any method to atomically swap credits for packets via the blockchain, so this is still all up in the air and might be unspecified.

Contributor

My understanding was atomic swap is just unfeasible due to latencies, so it would be more of an untrusted microtransaction with eventual settlement.


> **TODO:**
> - What data are fingerprints derived from?
> - Are they SHA or something more exotic?
> - Is the fingerprint, along with non-payload data, a
> zero-knowledge-proof (ZKP)?


### Uplink
[uplink]: #uplink

Uplink communication is from device to router.

> TODO:
> - Unacknowledged vs acknowledged
> - Listen before talk
> - Spreading-factor vs dwell-time vs range
> - Initial uplink spreading factor

### Downlink
Contributor

Do we have two categories of devices?

Category Sync: downlink triggered only by uplink packets
Category Async: devices are always on and ready to receive downlink

Or do we say everything is "sync" but that some devices have a long dwell time waiting for downlink packets. If they pulse out every 6 hours and wait for 6 hours on some channel, they're pretty much always on and this might allow for more dynamic behavior (whereas "device category" feels pretty static).

Contributor @fvasquez, Sep 9, 2019

That's a good question. The expectation that a device is "always on" seems unreasonable to me. Most downlink communications are what I call WriteRequests and hence synchronous in nature because a corresponding WriteResponse is expected by the router. I did not consider how downlink communications would be triggered by uplink packets but that makes sense in an environment where devices sleep most of the time to conserve battery power.

Contributor

Downlink communication will have to be initiated by a query from the device to router due to the class of device we deal with. Devices that are not dependent upon power conservation and network models which have stronger device-to-gateway association (sessions and handoff) are capable of asynchronously receiving in practice. We can't guarantee the former and explicitly do not have the latter for our devices.

[downlink]: #downlink

Downlink communication is from router to device.

> TODO:
> - Unacknowledged vs acknowledged
> - Listen before talk
> - Spreading-factor vs dwell-time vs range
> - Initial downlink spreading factor

### Fragmentation
[fragmentation]: #fragmentation
Contributor

Based on what @refugeesus described to me in our meeting last week I believe the LDPC layer handles all message fragmentation, reassembly and retransmission.

Contributor

fragmentation, reassembly, and FEC are covered by the process linked in a previous comment. No need for retransmission anymore. The paper excerpt also covers the ACK and rolling window timeout concepts needed to make the link layer mostly complete.


Datagrams are the fundamental unit of messaging in LongFi. Additionally, they have maximum payload sizes imposed by regulation. This poses a problem for applications needing to send data of arbitrary length. The solution to this problem is fragmentation. Fragmentation is the process of decomposing large application-level messages into several datagrams and reassembling those fragments at the recipient's end of the link. A naive implementation of this process is fraught with peril when communicating over unreliable links.
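
For illustration, here is what a naive fragmentation scheme might look like in Python. It is not a proposal for LongFi's actual scheme: the `(index, total, chunk)` fragment layout and the names `fragment` and `reassemble` are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical fragment layout: (index, total fragment count, chunk bytes).
Fragment = Tuple[int, int, bytes]


def fragment(message: bytes, max_payload: int) -> List[Fragment]:
    """Split a message into numbered chunks of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    chunks = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)] or [b""]
    total = len(chunks)
    return [(index, total, chunk) for index, chunk in enumerate(chunks)]


def reassemble(fragments: List[Fragment]) -> Optional[bytes]:
    """Rebuild the message, or return None if any fragment is missing."""
    if not fragments:
        return None
    total = fragments[0][1]
    by_index = {index: chunk for index, _total, chunk in fragments}
    if set(by_index) != set(range(total)):
        return None  # at least one fragment never arrived
    return b"".join(by_index[i] for i in range(total))
```

Arrival order does not matter here, but losing any single fragment leaves the whole message unrecoverable until that exact fragment is retransmitted, which is the kind of peril the paragraph above refers to on an unreliable link.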

### Channels
[channels]: #channels