# XDP experiments

This notebook groups experiments related to XDP in the context of stable kernel bypass.

## XDP POLL

### XDP Polling Overwrite-Apocalypse

#### Context
While executing the pingpong test with `pp_poll`, the following error messages would appear sooner or later in most runs:

In [72]:
!head -n 20 data/xdp/overwrite_apocalypse/err

WARN: missed 64 packets between 1278788 and 1278853
WARN: missed -16 packets between 1278855 and 1278840
WARN: missed -15 packets between 1278855 and 1278841
WARN: missed -14 packets between 1278855 and 1278842
WARN: missed -13 packets between 1278855 and 1278843
WARN: missed -12 packets between 1278855 and 1278844
WARN: missed -11 packets between 1278855 and 1278845
WARN: missed -10 packets between 1278855 and 1278846
WARN: missed -9 packets between 1278855 and 1278847
WARN: missed -8 packets between 1278855 and 1278848
WARN: missed -7 packets between 1278855 and 1278849
WARN: missed -6 packets between 1278855 and 1278850
WARN: missed -5 packets between 1278855 and 1278851
WARN: missed -4 packets between 1278855 and 1278852
WARN: missed 13 packets between 1278855 and 1278869
WARN: missed -16 packets between 1278871 and 1278856
WARN: missed -15 packets between 1278871 and 1278857
WARN: missed -14 packets between 1278871 and 1278858
WARN: missed -13 packets between 1278871 and 1278859
W

This behaviour would often result in either a large packet loss or, at least, in an out-of-order reception of the packets.
It was empirically observed that the higher the packet rate, the sooner and the more likely the problem would occur.
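
The counts in the warnings above are consistent with `current - previous - 1` (for example `1278853 - 1278788 - 1 = 64` and `1278840 - 1278855 - 1 = -16`). The following is a minimal sketch of such a check, assuming that formula. The names `missed_between` and `check_sequence` are hypothetical and are not taken from `pp_poll`.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: number of packets skipped between two ids.
 * A negative result means `current` is older than `previous`,
 * i.e. the packet was received out of order.
 */
static int64_t missed_between(uint64_t previous, uint64_t current)
{
    return (int64_t)current - (int64_t)previous - 1;
}

/* Print a warning in the same format as the log when ids are not consecutive. */
static int64_t check_sequence(uint64_t previous, uint64_t current)
{
    int64_t missed = missed_between(previous, current);
    if (missed != 0)
        fprintf(stderr, "WARN: missed %lld packets between %llu and %llu\n",
                (long long)missed, (unsigned long long)previous,
                (unsigned long long)current);
    return missed;
}
```

With this reading, a positive count is a real loss, while a negative count only signals that an older packet was read after a newer one.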

#### Environment
|  |  | 
| --- | --- | 
| **Experiment date** |  22/02/2024 - 23/02/2024 |
| **Code** |  [3562befa5052519193277147264496251f4ce7de](https://github.com/swystems/det-bypass/tree/3562befa5052519193277147264496251f4ce7de) |
| **Machine** |  xl170 (Cloudlab) |
| **Isolation** |  none |
| **Artificial machine workload** |  none |
| **Pingpong send interval** |  1000ns |
| **eBPF map size** |  16 payloads |

#### Procedure
The main investigation tool for this problem was an additional thread that constantly dumps the map. This way, at any time, it is possible to analyze what was inside the map and what the userspace app was reading.

#### Analysis
Running the experiment with such a debugging tool gives the following output:

In [73]:
!head -n 50 data/xdp/overwrite_apocalypse/out

  EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY   
                                        ^^^^^^^^^                                                                                                               
  EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY   
                                        ^^^^^^^^^                                                                                                               
  EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY   
                                        ^^^^^^^^^                                                                                                               
  EMPTY     EMPTY     EMPTY     EM

In this output, `EMPTY` refers to an empty cell, `(x)` refers to a cell containing the payload with id `x`, and `^^^^^^^^^` marks the position the userspace app is currently polling.
Note that the currently polled position is almost always shown as `EMPTY`: the busy-wait loop reads and erases a newly written value much faster than the dumping thread can scan the whole map, so the dumping thread is very unlikely to observe an available packet there before it is consumed.
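
To make the dump format concrete, here is a minimal sketch of how one snapshot row and its marker line could be rendered. It assumes a 10-character column (inferred from the output above) and that id `0` means an empty slot. `render_row`, `render_marker`, `MAP_SIZE` and `CELL_WIDTH` are hypothetical names, not the ones used by the actual dumping thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAP_SIZE 16   /* matches the "16 payloads" map size */
#define CELL_WIDTH 10 /* column width inferred from the dump (assumption) */

/* Render one snapshot: "EMPTY" for id 0 (assumption), "(id)" otherwise, centered per column. */
static void render_row(const uint64_t ids[MAP_SIZE], char out[MAP_SIZE * CELL_WIDTH + 1])
{
    for (int i = 0; i < MAP_SIZE; i++) {
        char text[32];
        if (ids[i] == 0)
            snprintf(text, sizeof text, "EMPTY");
        else
            snprintf(text, sizeof text, "(%llu)", (unsigned long long)ids[i]);

        size_t len = strlen(text);
        if (len > CELL_WIDTH)
            len = CELL_WIDTH; /* truncate ids too long for the column */
        size_t left = (CELL_WIDTH - len) / 2;

        char *cell = out + (size_t)i * CELL_WIDTH;
        memset(cell, ' ', CELL_WIDTH);
        memcpy(cell + left, text, len);
    }
    out[MAP_SIZE * CELL_WIDTH] = '\0';
}

/* Render the marker line: carets under the column currently polled by userspace. */
static void render_marker(unsigned poll_index, char out[MAP_SIZE * CELL_WIDTH + 1])
{
    assert(poll_index < MAP_SIZE);
    memset(out, ' ', MAP_SIZE * CELL_WIDTH);
    memset(out + (size_t)poll_index * CELL_WIDTH, '^', CELL_WIDTH - 1);
    out[MAP_SIZE * CELL_WIDTH] = '\0';
}
```

Calling `render_row` on a snapshot and `render_marker` with the poll index, then printing both buffers, produces a pair of lines in the same layout as the dump.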

Every time the problem occurs, it starts with the userspace app blocked: even though packet `1278773` is available, userspace does not handle it. At some unpredictable point the app resumes and, since many packets are available, it handles them all very quickly (the polling app reads much faster than the XDP program writes, because `interval between polling < send interval`).

When XDP receives a new packet but the slot it should go to still holds an old unread packet, it drops the new one. For this reason packet `1278773` is not overwritten by `1278788`. As the error messages show, 64 packets were actually lost: the "block" in the polling gave XDP time to receive 64 packets, which it had to drop because the queue was already full.

After this problem, a repeating pattern of new packets followed by old packets starts, because packet `1278840` is written 3 positions ahead of the userspace polling index;
userspace then sees no new packet at its own position and keeps waiting there.

In [74]:
!sed -n "45,175p" data/xdp/overwrite_apocalypse/out

  EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY   (1278840)   EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY   
                                        ^^^^^^^^^                                                                                                               
  EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY   (1278840)   EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY   
                                        ^^^^^^^^^                                                                                                               
  EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY   (1278840)   EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY     EMPTY   
                                        ^^^^^^^^^                                                                                                               
  EMPTY     EMPTY     EMPTY     EM

Only when XDP writes packet `1278853` at the polling index does the app resume polling. It reads the new packets until it reaches the old ones (`1278840`-`1278852`), which it reads very quickly because they are already available.

After all the old packets are read, the app waits again, in the same position as before, and XDP again writes at its advanced position (in the last rows, packet `1278856`), repeating the same pattern over and over.

These cycles often repeat until the end of the experiment. However, they can also resolve themselves, as in this example.

The window of empty slots, which creates this effect, can grow larger and larger until it covers the whole map, restoring the original and correct behaviour. In the example above, the initial window size is 3, because the XDP write index is 3 steps in front of the poll index, leaving a gap of 3 empty cells.

The window can expand at arbitrary moments: if XDP wraps around and fills the window, but userspace happens to be too slow (for any reason) and has not yet read some packets, XDP finds those slots full and moves on. This leads to the loss of that number of packets, but also to an expansion of the window by the same amount.

Take the following section of the output file as an example:

In [75]:
!sed -n "45670,45730p" data/xdp/overwrite_apocalypse/out

                                        ^^^^^^^^^                                                                                                               
(1279873) (1279874)   EMPTY     EMPTY     EMPTY     EMPTY     EMPTY   (1279864) (1279865) (1279866) (1279867) (1279868) (1279869) (1279870) (1279871) (1279872) 
                                        ^^^^^^^^^                                                                                                               
(1279873) (1279874)   EMPTY     EMPTY     EMPTY     EMPTY     EMPTY   (1279864) (1279865) (1279866) (1279867) (1279868) (1279869) (1279870) (1279871) (1279872) 
                                        ^^^^^^^^^                                                                                                               
(1279873) (1279874)   EMPTY     EMPTY     EMPTY     EMPTY     EMPTY   (1279864) (1279865) (1279866) (1279867) (1279868) (1279869) (1279870) (1279871) (1279872) 
                                  

Here, the userspace app is stuck on index 4, while XDP started writing from index 7 (just like the previous case). However, after packet `1279877` unlocks the polling, userspace proceeds too slowly, giving XDP time to try writing packet `1279880` to slot 7 while userspace has not yet read the packet in that position. XDP therefore drops the packet and moves to the next position with packet `1279881`. By then the userspace app has started reading, so XDP finds an empty slot and puts the new packet there.

After this process, position 7 is empty as well, which widens the window. If this happens repeatedly, the window grows until it covers the whole queue, effectively restoring normal behaviour.

#### Considerations
After the analysis, it is clear that this problem happens in two steps:
- The userspace app blocks
- XDP fills the queue and then starts writing ahead of userspace

The reason for the latter can be simply explained by looking at the code:
```c
    bpf_spin_lock (&lock->value);
    __u32 key = lock->index;
    lock->index = (lock->index + 1) % PACKETS_MAP_SIZE;
    bpf_spin_unlock (&lock->value);

    struct pingpong_payload *old_payload = bpf_map_lookup_elem (&last_payload, &key);
    if (!old_payload)
    {
        bpf_printk ("Failed to lookup element at index: %u\n", key);
        return -1;
    }

    if (valid_pingpong_payload (old_payload))
    {
        return -1;
    }
```
`lock->index` is the index at which the new packet is written: it is first read (where to put the current packet) and then incremented (where to put the next one). When a valid packet is already present in the current slot, the function simply exits, dropping the packet; however, the position is never reset: if a packet is dropped in position 5, the next packet should be written to position 5 again, not position 6.

This is the cause of the problem: when the queue is full, XDP finds valid pingpong payloads and drops the new packets, but the position is still incremented every time. When empty slots become available because userspace managed to recover, XDP writes the packets at whatever position it has reached, which is essentially arbitrary. In the previous example, it is 3 positions ahead of userspace.
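
The mechanism described above can be reproduced in a small single-threaded userspace simulation. This is only a sketch under stated assumptions: a slot value of `0` stands for an empty slot, and `struct ring`, `produce_always_advance`, `consume` and `offset_after_stall` are hypothetical names, not part of the XDP program.

```c
#include <stdint.h>

#define MAP_SIZE 16

struct ring {
    uint64_t slots[MAP_SIZE]; /* 0 means EMPTY (assumption of this sketch) */
    unsigned write_index;     /* plays the role of lock->index */
    unsigned poll_index;      /* userspace polling position */
};

/* Producer with the original behaviour: the index advances even on a drop. */
static int produce_always_advance(struct ring *r, uint64_t id)
{
    unsigned key = r->write_index;
    r->write_index = (r->write_index + 1) % MAP_SIZE;
    if (r->slots[key] != 0)
        return -1; /* slot still holds an unread packet: drop */
    r->slots[key] = id;
    return 0;
}

/* Consumer: read and clear the polled slot, or return 0 if it is empty. */
static uint64_t consume(struct ring *r)
{
    uint64_t id = r->slots[r->poll_index];
    if (id == 0)
        return 0; /* nothing to read: keep waiting at the same position */
    r->slots[r->poll_index] = 0;
    r->poll_index = (r->poll_index + 1) % MAP_SIZE;
    return id;
}

/*
 * Scenario: userspace stalls while `burst` packets arrive, then drains the
 * queue. Returns how far the write index ends up ahead of the poll index.
 */
static unsigned offset_after_stall(unsigned burst)
{
    struct ring r = {0};
    for (uint64_t id = 1; id <= burst; id++)
        produce_always_advance(&r, id);
    while (consume(&r) != 0)
        ;
    return (r.write_index + MAP_SIZE - r.poll_index) % MAP_SIZE;
}
```

For instance, `offset_after_stall(19)` returns 3: the first 16 packets fill the map, the next 3 are dropped but still advance the index, and after the drain the producer is 3 slots ahead of the consumer.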

The code was corrected to make sure that dropped packets do not increment the position of the next packet:
```c
    bpf_spin_lock (&lock->value);
    __u32 key = lock->index;
    lock->index = (lock->index + 1) % PACKETS_MAP_SIZE;
    bpf_spin_unlock (&lock->value);

    struct pingpong_payload *old_payload = bpf_map_lookup_elem (&last_payload, &key);
    if (!old_payload)
    {
        bpf_printk ("Failed to lookup element at index: %u\n", key);
        return -1;
    }

    if (valid_pingpong_payload (old_payload))
    {
        /*
         * If there is already a packet at the current index, it means that the map is full.
         * Drop the packet and reset the next index to the current one.
         *
         * TODO: Fix the data race in the reset of the index
         */
        bpf_spin_lock (&lock->value);
        lock->index = key;
        bpf_spin_unlock (&lock->value);
        return -1;
    }
```
After this update, the unexpected behaviour no longer occurs: packets are still lost when userspace polling blocks, but at least there is no "Apocalypse" afterwards.

The reasons for the blocking of userspace are hard to identify: the most likely cause is some operating system overhead (e.g. a context switch) that stops the polling thread from actually polling. This will likely be checked in future stability tests by tweaking kernel parameters.
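
The effect of resetting the index on a drop can be checked with a small single-threaded userspace simulation. As before, this is a sketch under assumptions: `0` marks an empty slot, and `struct pkt_ring`, `produce_rewind_on_drop`, `poll_once` and `offset_after_stall_rewind` are hypothetical names. Being single-threaded, it says nothing about the data race mentioned in the TODO.

```c
#include <stdint.h>

#define MAP_SIZE 16

struct pkt_ring {
    uint64_t slots[MAP_SIZE]; /* 0 means EMPTY (assumption of this sketch) */
    unsigned write_index;     /* plays the role of lock->index */
    unsigned poll_index;      /* userspace polling position */
};

/* Producer mirroring the corrected logic: a drop rewinds the index. */
static int produce_rewind_on_drop(struct pkt_ring *r, uint64_t id)
{
    unsigned key = r->write_index;
    r->write_index = (r->write_index + 1) % MAP_SIZE;
    if (r->slots[key] != 0) {
        r->write_index = key; /* same effect as `lock->index = key` */
        return -1;            /* map is full: drop the packet */
    }
    r->slots[key] = id;
    return 0;
}

/* Consumer: read and clear the polled slot, or return 0 if it is empty. */
static uint64_t poll_once(struct pkt_ring *r)
{
    uint64_t id = r->slots[r->poll_index];
    if (id == 0)
        return 0;
    r->slots[r->poll_index] = 0;
    r->poll_index = (r->poll_index + 1) % MAP_SIZE;
    return id;
}

/*
 * Scenario: userspace stalls while `burst` packets arrive, then drains the
 * queue. Returns how far the write index ends up ahead of the poll index.
 */
static unsigned offset_after_stall_rewind(unsigned burst)
{
    struct pkt_ring r = {0};
    for (uint64_t id = 1; id <= burst; id++)
        produce_rewind_on_drop(&r, id);
    while (poll_once(&r) != 0)
        ;
    return (r.write_index + MAP_SIZE - r.poll_index) % MAP_SIZE;
}
```

In this model the offset is 0 for any burst size: packets beyond the map capacity are still dropped, but the producer and the consumer stay aligned once userspace resumes.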