# IPv6 topology measurements: BGP target lists vs. hitlist target lists using RIPE Atlas

RIPE Atlas runs IPv6 "topology" measurements from all RIPE Atlas probes using a target list populated from a recent full BGP table. These measurements target only the `::1` address in each announced prefix.

The [TUM IPv6 hitlist](https://ipv6hitlist.github.io/) is an attempt to gather useful targets for IPv6 network measurement. All RIPE Atlas probes also run measurements to targets taken from that hitlist.

This notebook presents a brief, first look at the difference between these two datasets.

## RIPE Atlas measurement specifications

RIPE Atlas probes each operate [built-in measurements](https://atlas.ripe.net/docs/built-in/). The IPv6 topology built-in measurements are:

* [measurement 6052](https://atlas.ripe.net/measurements/6052/): traceroute with UDP outbound
* [measurement 6152](https://atlas.ripe.net/measurements/6152/): traceroute with ICMP outbound

Some time back I set up all-probe measurements with the intent of using hitlist targets. These are:

* [measurement 24304870](https://atlas.ripe.net/measurements/24304870/): traceroute with UDP outbound
* [measurement 24304869](https://atlas.ripe.net/measurements/24304869/): traceroute with ICMP outbound

I'll typically refer to these as the "BGP" measurements and the "hitlist" measurements.

## Measurement operation and cycles

RIPE Atlas probes can be provided with either a name ("resolve on probe") or an IP address to target; if they are provided with a name as part of a recurring measurement, they'll resolve that name each time. RIPE Atlas probes cannot be provided directly with a list of targets.

These four measurements make use of two DNS names that iterate in a simple round-robin over whatever list of IP addresses is provided. These lists are updated daily. Each successful resolution is followed by a traceroute measurement to that IP.

The upside of this approach is simplicity: little coordination is required, and multiple targets are reached. The downside is that the probes cannot control which IP addresses they receive, and the DNS service is not designed to provide per-probe lists. (This isn't infeasible, but the service isn't designed this way.)
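
To illustrate that downside, here is a minimal sketch of a shared round-robin. It assumes a single pointer that advances on every query regardless of which probe asked; that is an assumption about how a "simple round-robin" behaves, not a description of the real DNS service. The function name, probe names and documentation-prefix addresses are all hypothetical.

```python
from itertools import cycle

def round_robin_answers(targets):
    """Hypothetical stand-in for the DNS service: one address per query, looping forever."""
    return cycle(targets)

# Toy target list (documentation prefix, not real measurement targets).
targets = ["2001:db8::1", "2001:db8:1::1", "2001:db8:2::1"]
answers = round_robin_answers(targets)

# Queries from two probes arrive interleaved; each probe only sees
# the answers to its own queries.
query_order = ["probe-a", "probe-b", "probe-a", "probe-a", "probe-b", "probe-a"]
seen = {"probe-a": [], "probe-b": []}
for probe, addr in zip(query_order, answers):
    seen[probe].append(addr)

print(seen["probe-a"])  # ['2001:db8::1', '2001:db8:2::1', '2001:db8::1', '2001:db8:2::1']
print(seen["probe-b"])  # ['2001:db8:1::1', '2001:db8:1::1']
```

In this toy run neither probe covers the whole list, and both receive repeats, purely because of how their queries happened to interleave.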


### DNS resolution & traceroute

The two pairs of measurements (BGP vs. hitlist) operate at different speeds, because the BGP measurements run *extremely* slowly from the perspective of one probe. The hitlist measurements still run slowly, but less so.

The pacing of each is:

* BGP topo runs one traceroute every 15 minutes, for each of UDP & ICMP.

That is, each probe will likely measure 4 distinct targets per hour over UDP, and 4 distinct targets per hour over ICMP, totalling 192 targets in 24 hours. (Note: there is no guarantee the probe won't receive the same target multiple times; the DNS service will loop over the list multiple times in a day, but in general each probe will receive 192 distinct targets.)

* hitlist topo runs one traceroute every 2 minutes, for each of UDP & ICMP.

That is, 30 distinct targets per hour over UDP, and 30 distinct targets per hour over ICMP, meaning 1,440 distinct targets per day.

That's a 7.5x difference in *rate* that ought to be taken into account later for a direct result comparison.
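
The arithmetic behind those figures, as a small sketch. The intervals are the ones given above; the function and constant names are just for illustration.

```python
MINUTES_PER_DAY = 24 * 60
PROTOCOLS = 2  # UDP and ICMP each run on their own schedule

def targets_per_day(interval_minutes, protocols=PROTOCOLS):
    """Upper bound on the distinct targets one probe can trace in 24 hours."""
    return (MINUTES_PER_DAY // interval_minutes) * protocols

bgp_per_day = targets_per_day(15)     # one traceroute every 15 minutes
hitlist_per_day = targets_per_day(2)  # one traceroute every 2 minutes
rate_ratio = hitlist_per_day / bgp_per_day

print(bgp_per_day, hitlist_per_day, rate_ratio)  # 192 1440 7.5
```

These are upper bounds per probe: repeated answers from the DNS service reduce the number of distinct targets actually reached.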

## Filtering the hitlist

The TUM hitlist is accumulating a *lot* of addresses in a small number of ASNs, which I'm filtering for the purposes of RIPE Atlas being able to exhaust the target list based on the pacing above. An example of the list of heavy-hitters in the hitlist in mid-January 2022 [is here](https://gist.github.com/sdstrowes/165b5c62e8078bc70a96bff4300e60f0).

To work around this, I minimise the hitlist by taking the first 128 addresses per ASN (according to a BGP match against a RIS BGP table from the same date). This reduces the list of targets to approximately 750,000.
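
A minimal sketch of that per-ASN cap. The real filter matches against a RIS BGP table; here a two-entry toy table stands in for it, and `cap_per_asn`, `toy_origin_asn` and `TOY_TABLE` are hypothetical names. Dropping addresses with no covering prefix is an assumption of this sketch, not something the filtering step above specifies.

```python
import ipaddress
from collections import defaultdict

MAX_PER_ASN = 128  # cap described in the text

def cap_per_asn(addresses, origin_asn, limit=MAX_PER_ASN):
    """Yield addresses in input order, keeping at most `limit` per origin ASN."""
    kept = defaultdict(int)
    for addr in addresses:
        asn = origin_asn(addr)
        if asn is None:
            continue  # assumption: skip addresses with no covering BGP prefix
        if kept[asn] < limit:
            kept[asn] += 1
            yield addr

# Toy prefix-to-origin table (documentation prefix, private-use ASNs).
TOY_TABLE = {
    ipaddress.ip_network("2001:db8:a::/48"): 64500,
    ipaddress.ip_network("2001:db8:b::/48"): 64501,
}

def toy_origin_asn(addr):
    """Longest-prefix match against TOY_TABLE; None if nothing covers addr."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in TOY_TABLE if ip in net]
    if not matches:
        return None
    return TOY_TABLE[max(matches, key=lambda net: net.prefixlen)]

# 300 addresses in one ASN, one in another, one unrouted.
hitlist = [f"2001:db8:a::{i:x}" for i in range(1, 301)]
hitlist += ["2001:db8:b::1", "2001:db8:ffff::1"]

filtered = list(cap_per_asn(hitlist, toy_origin_asn))
print(len(filtered))  # 129: 128 from AS64500, 1 from AS64501
```

A linear scan over the table is fine for a toy example; a full BGP table would want a radix tree or similar for the lookup.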

### 0: notebook preamble

These cells only need to run so that the queries in the notebook work.

In [None]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')


Authenticated


In [None]:
measurements = [6052, 6152, 24304869, 24304870]

## running totals: targets attempted

In [None]:
%%bigquery target_counter --project netsys-162413

with classified as (
  select * except(msm_id),
    case
      when (msm_id = 6052 or msm_id = 6152) then
        case
          when protoc = "UDP"  then "BGP targets (UDP)"
          when protoc = "ICMP" then "BGP targets (ICMP)"
        end
      when (msm_id = 24304870 or msm_id = 24304869) then
        case
          when protoc = "UDP"  then "hitlist targets (UDP)"
          when protoc = "ICMP" then "hitlist targets (ICMP)"
        end
    end as msm_label
  from `bq-test-237918.v6topo.data`
  where start_time >= "2022-02-02 03:00:00" and start_time < "2022-02-04 03:00:00"
),

data as (
  select msm_label,
    min(timestamp_trunc(start_time, minute)) start_time, dst_addr
  from classified
  group by msm_label, dst_addr
),

counted as (
  select msm_label, start_time, count(*) c
  from data
  group by msm_label, start_time
)

select *, SUM(c)
OVER
  (PARTITION BY msm_label
  ORDER BY start_time asc) AS running_c
from counted
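
The query above does three things: it finds the first minute in which each target was seen, counts how many targets were first seen in each minute, and accumulates those counts. The same logic for a single `msm_label`, as a plain-Python sketch over made-up rows (the function name and the toy data are hypothetical; nothing here touches BigQuery):

```python
from collections import Counter
from itertools import accumulate

def running_first_seen(rows):
    """rows: iterable of (minute, dst_addr) pairs.

    Returns a list of (minute, new_targets, running_total) tuples,
    counting each dst_addr only in the first minute it appears.
    """
    first_seen = {}
    for minute, addr in rows:
        if addr not in first_seen or minute < first_seen[addr]:
            first_seen[addr] = minute
    per_minute = Counter(first_seen.values())
    minutes = sorted(per_minute)
    totals = accumulate(per_minute[m] for m in minutes)
    return [(m, per_minute[m], t) for m, t in zip(minutes, totals)]

rows = [(0, "a"), (0, "b"), (1, "a"), (2, "c"), (2, "a")]
print(running_first_seen(rows))  # [(0, 2, 2), (2, 1, 3)]
```

Minute 1 does not appear in the output because it contributed no new targets; the plot only gains a point when the running total changes.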

In [None]:
%%bigquery combined_target_counter --project netsys-162413

with classified as (
  select * except(msm_id),
    case
      when (msm_id = 6052 or msm_id = 6152)         then "BGP targets (combined)"
      when (msm_id = 24304869 or msm_id = 24304870) then "hitlist targets (combined)"
    end as msm_label
  from `bq-test-237918.v6topo.data`
  where start_time >= "2022-02-02 03:00:00" and start_time < "2022-02-04 03:00:00"
),

data as (
  select msm_label, min(timestamp_trunc(start_time, minute)) start_time, dst_addr
  from classified
  group by msm_label, dst_addr
),

counted as (
  select msm_label, start_time, count(*) c
  from data
  group by msm_label, start_time
)

select *, SUM(c)
OVER
  (PARTITION BY msm_label
  ORDER BY start_time asc) AS running_c
from counted

In [None]:
#@title 
import plotly.express as px
import pandas as pd

data = pd.concat([target_counter, combined_target_counter])

# Stack the series in a fixed order so the legend lists hitlist before BGP.
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
label_order = [
    "hitlist targets (combined)", "hitlist targets (ICMP)", "hitlist targets (UDP)",
    "BGP targets (combined)", "BGP targets (ICMP)", "BGP targets (UDP)",
]
# Sort each series by time: the queries have no final ORDER BY,
# so BigQuery does not guarantee row order.
ordered = pd.concat(
    [data[data["msm_label"] == label].sort_values("start_time") for label in label_order]
)

fig = px.line(ordered, x="start_time", y="running_c", color='msm_label')


fig.show()



## responsive hops (cumulative, by RIPE Atlas measurement ID)

In [None]:
%%bigquery responsive_hop_counter --project netsys-162413

with classified as (
  select * except(msm_id),
    case
      when (msm_id = 6052 or msm_id = 6152) then
        case
          when protoc = "UDP"  then "BGP targets (UDP)"
          when protoc = "ICMP" then "BGP targets (ICMP)"
        end
      when (msm_id = 24304870 or msm_id = 24304869) then
        case
          when protoc = "UDP"  then "hitlist targets (UDP)"
          when protoc = "ICMP" then "hitlist targets (ICMP)"
        end
    end as msm_label
  from `bq-test-237918.v6topo.data`
  where start_time >= "2022-02-02 03:00:00" and start_time < "2022-02-04 03:00:00"
),

data as (
  select msm_label, min(timestamp_trunc(start_time, minute)) start_time, hop_addr
  from classified, unnest(hops) h
  group by msm_label, hop_addr
),

counted as (
  select msm_label, start_time, count(*) c
  from data
  group by msm_label, start_time
)

select *, SUM(c)
OVER
  (PARTITION BY msm_label
  ORDER BY start_time asc) AS running_c
from counted

In [None]:
%%bigquery combined_responsive_hop_counter --project netsys-162413

with classified as (
  select * except(msm_id),
    case
      when (msm_id = 6052 or msm_id = 6152)         then "BGP targets (combined)"
      when (msm_id = 24304869 or msm_id = 24304870) then "hitlist targets (combined)"
    end as msm_label
  from `bq-test-237918.v6topo.data`
  where start_time >= "2022-02-02 03:00:00" and start_time < "2022-02-04 03:00:00"
),

data as (
  select msm_label, min(timestamp_trunc(start_time, minute)) start_time, hop_addr
  from classified, unnest(hops) h
  group by msm_label, hop_addr
),

counted as (
  select msm_label, start_time, count(*) c
  from data
  group by msm_label, start_time
)

select *, SUM(c)
OVER
  (PARTITION BY msm_label
  ORDER BY start_time asc) AS running_c
from counted

In [None]:
#@title
import plotly.express as px
import pandas as pd

data = pd.concat([responsive_hop_counter, combined_responsive_hop_counter])

# Stack the series in a fixed order so the legend lists hitlist before BGP.
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
label_order = [
    "hitlist targets (combined)", "hitlist targets (ICMP)", "hitlist targets (UDP)",
    "BGP targets (combined)", "BGP targets (ICMP)", "BGP targets (UDP)",
]
# Sort each series by time: the queries have no final ORDER BY,
# so BigQuery does not guarantee row order.
ordered = pd.concat(
    [data[data["msm_label"] == label].sort_values("start_time") for label in label_order]
)

fig = px.line(ordered, x="start_time", y="running_c", color='msm_label')

fig.show()