BISmark Active Dataset Description

Nick Feamster edited this page Aug 5, 2015 · 3 revisions

BISmark Dataset - Active Measurements

BISmark active measurements are automated commands that BISmark routers execute periodically to generate network performance statistics. The following measurement types are currently collected:

  • Download Throughput
  • Upload Throughput
  • Round Trip Latency
  • Last Mile Latency
  • Shape Rate
  • Under Load
  • Traceroute

Standard Linux commands such as ping, fping, paris-traceroute, netperf, ditg and shaperprobe generate compressed XML data that is available for public download from our dataset repository. The dataset feeds the BISmark Network Dashboard, which displays histograms per device, ISP, or country. Most of the measurements are taken against MLab servers. On a typical day the active measurements run approximately 144 times (once every 10 minutes), generating the following number of XML lines per tool:

378 DITG, 2341 FPING, 448 HOST, 1150 paristraceroute, 28 NETPERF, 313 PING
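The schedule arithmetic above can be checked in a few lines of Python. This is only a sketch: the variable names are illustrative, and the per-tool counts are copied from the list above.

```python
# Runs per day for a 10-minute interval, as stated in the text.
MINUTES_PER_DAY = 24 * 60
interval_minutes = 10
runs_per_day = MINUTES_PER_DAY // interval_minutes

# XML lines per tool on a typical day, copied from the list above.
lines_per_tool = {
    "DITG": 378,
    "FPING": 2341,
    "HOST": 448,
    "paristraceroute": 1150,
    "NETPERF": 28,
    "PING": 313,
}
total_lines = sum(lines_per_tool.values())

print(runs_per_day)  # 144
print(total_lines)   # 4658
```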

The measurement frequency can be changed on demand in the bismark-active.conf configuration file on the router's file system.

XML Format

Each XML file accumulates multiple measurements, as shown below. The root tag <measurements ...> encloses one <measurement param="..." tool="..." ... /> tag per test result, identified by tool and parameter. Our parser server uses the version="1.3" attribute to select the appropriate parser version for the data contained in the file. XML syntax errors invalidate only the lines where they are found, not the entire XML document.

Header:

 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
     <measurements version="1.3">
        <info deviceid="OWA81AXXXXXXXX" />

Content:

  ...
  <measurement param="LMRTT" tool="PING" srcip="XX.XX.XX.XX" dstip="XX.XX.XX.XX" timestamp="1409703495" avg="7.286200" std="2.578413" min="4.073000" max="11.356000" med="7.996500" iqr="4.247000"  />
  <measurement param="BITRATE" tool="NETPERF_1" srcip="XX.XX.XX.XX" dstip="XX.XX.XX.XX" timestamp="1409703533" avg="13565.300000" std="0" min="13565.300000" max="13565.300000" med="13565.300000" iqr="0" direction="dw" />
  ...

New active measurements may use slightly different formats with additional parameters, but our parsing tools feed our databases whenever this generic template is followed:

  <measurement param="" tool="" srcip="XX.XX.XX.XX" dstip="XX.XX.XX.XX" timestamp="" avg="" std="" min="" max="" med="" iqr=""  />

Footer:

  </measurements>
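One way to read such a file so that a syntax error invalidates only its own line is to parse each <measurement> line separately. The sketch below is not the BISmark server parser, which is not shown on this page. The function name parse_measurement_lines is hypothetical, and the second measurement line in the sample is deliberately broken (missing closing quote) to show the behaviour.

```python
import xml.etree.ElementTree as ET

def parse_measurement_lines(text):
    """Parse every <measurement .../> line on its own.

    Returns (parsed, skipped): attribute dicts for well-formed lines and
    the raw text of the lines that failed to parse.
    """
    parsed, skipped = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # The trailing space keeps the <measurements> root tag out.
        if not line.startswith("<measurement "):
            continue  # header, <info>, footer, blank lines
        try:
            parsed.append(dict(ET.fromstring(line).attrib))
        except ET.ParseError:
            skipped.append(line)
    return parsed, skipped

sample = '''<measurements version="1.3">
<info deviceid="OWA81AXXXXXXXX" />
<measurement param="LMRTT" tool="PING" srcip="XX.XX.XX.XX" dstip="XX.XX.XX.XX" timestamp="1409703495" avg="7.286200" std="2.578413" min="4.073000" max="11.356000" med="7.996500" iqr="4.247000"  />
<measurement param="BITRATE" tool="NETPERF_1" avg="13565.300000 direction="dw" />
</measurements>'''

parsed, skipped = parse_measurement_lines(sample)
print(len(parsed), len(skipped))  # 1 1
print(parsed[0]["param"])         # LMRTT
```

Parsing line by line relies on each measurement occupying exactly one line, as in the samples on this page.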

For data from any new tool to be parsed automatically by our servers, this template has to be adopted. Otherwise, the data will be ignored. The attributes are:

param - tool parameter;
tool(id) - Linux tool in use;
srcip - source IPv4 address, non-anonymous;
dstip - destination IPv4 address, non-anonymous;
timestamp - epoch format timestamp;
avg/std/min/max/med/iqr - average/standard deviation/minimum/maximum/median/interquartile range readings;
direction - download ("dw") or upload ("up") direction.

An XML entry may repeat the avg value for min/max whenever those values are unavailable. Values may also be set to zero when no data is available.
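The template rule and the note about repeated or zeroed values can be expressed as two small checks. This is a minimal sketch with hypothetical helper names (follows_template, looks_like_placeholder), not the logic of the BISmark parser. Note that the second check is only a heuristic: a measurement with a single sample also has min = avg = max.

```python
# Attributes of the generic template, in the order shown on this page.
REQUIRED = ("param", "tool", "srcip", "dstip", "timestamp",
            "avg", "std", "min", "max", "med", "iqr")
STATS = ("avg", "std", "min", "max", "med", "iqr")

def follows_template(attrs):
    """True if every attribute of the generic template is present."""
    return all(name in attrs for name in REQUIRED)

def looks_like_placeholder(attrs):
    """Heuristic: min/max repeat avg, or all statistics are zero."""
    avg, low, high = (float(attrs[k]) for k in ("avg", "min", "max"))
    repeated = low == avg == high
    zeroed = all(float(attrs[k]) == 0 for k in STATS)
    return repeated or zeroed

ping = {"param": "LMRTT", "tool": "PING", "srcip": "XX.XX.XX.XX",
        "dstip": "XX.XX.XX.XX", "timestamp": "1409703495",
        "avg": "7.286200", "std": "2.578413", "min": "4.073000",
        "max": "11.356000", "med": "7.996500", "iqr": "4.247000"}
netperf = dict(ping, param="BITRATE", tool="NETPERF_1",
               avg="13565.300000", std="0", min="13565.300000",
               max="13565.300000", med="13565.300000", iqr="0")
incomplete = {k: v for k, v in ping.items() if k != "iqr"}

print(follows_template(ping), looks_like_placeholder(ping))        # True False
print(follows_template(netperf), looks_like_placeholder(netperf))  # True True
print(follows_template(incomplete))                                # False
```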

Download Throughput


Figure 1. Download Throughput

Download Throughput is the rate at which content is transferred from the Internet to your home, and is what most people mean when they talk about the "speed" of Internet access. BISmark performs two slightly different measurements to measure download throughput:

Single-threaded TCP is the performance of a single data transfer, and approximates the performance you would normally observe from your Internet service (e.g. when downloading a single file).

Multi-threaded TCP is the performance of several simultaneous data transfers, and approaches the maximum performance you can hope to observe from your Internet service (e.g. aggregate speed when downloading multiple files simultaneously). A small difference is expected between the two numbers. If the difference is large, something may be wrong with your connection or our measurement server, or you may be very far away from our nearest measurement server.

XML entry

Below is a typical XML entry for download throughput. The tool attribute encodes the mode in which netperf was executed: NETPERF_1 for single-threaded and NETPERF_2 for multi-threaded execution.

     <measurement param="BITRATE" tool="NETPERF_1" srcip="XX.XX.XX.XX" dstip="XX.XX.XX.XX" timestamp="1409703533" avg="13565.300000" std="0" min="13565.300000" max="13565.300000" med="13565.300000" iqr="0" direction="dw" />

Upload Throughput


Figure 2. Upload Throughput view

Upload Throughput is the rate at which content is transferred from your home to the Internet. While upload throughput is often less than download throughput for most types of residential Internet service, it is still important for users who frequently upload large photos, videos, or work documents. BISmark performs two slightly different measurements to measure upload throughput:

Single-threaded TCP is the performance of a single data transfer, and approximates the performance you would normally observe from your Internet service (e.g. when uploading a single file).

Multi-threaded TCP is the performance of several simultaneous data transfers, and approaches the maximum performance you can hope to observe from your Internet service (e.g. aggregate speed when uploading multiple files simultaneously). A small difference is expected between the two numbers. If the difference is large, something may be wrong with your connection or our measurement server, or you may be very far away from our nearest measurement server.

XML entry

An upload throughput measurement follows the same format as a download measurement. The only differences are that the tool attribute is NETPERF_3 for single-threaded and NETPERF_4 for multi-threaded upload, and that direction is set to "up".

     <measurement param="BITRATE" tool="NETPERF_3" srcip="XX.XX.XX.XX" dstip="XX.XX.XX.XX" timestamp="1409703590" avg="6783.600000" std="0" min="6783.600000" max="6783.600000" med="6783.600000" iqr="0" direction="up" />
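The four netperf tool identifiers described in the download and upload sections can be kept in a small lookup table. The sketch below is illustrative only: the names NETPERF_MODES and describe_throughput are hypothetical, and the throughput value is passed through without a unit because this page does not state one.

```python
# tool attribute -> (direction attribute, threading mode), per the text above.
NETPERF_MODES = {
    "NETPERF_1": ("dw", "single-threaded"),
    "NETPERF_2": ("dw", "multi-threaded"),
    "NETPERF_3": ("up", "single-threaded"),
    "NETPERF_4": ("up", "multi-threaded"),
}

def describe_throughput(attrs):
    """Return (direction label, threading mode, avg) for a BITRATE entry."""
    direction, mode = NETPERF_MODES[attrs["tool"]]
    if attrs.get("direction") != direction:
        raise ValueError("direction does not match the netperf mode")
    label = "download" if direction == "dw" else "upload"
    return label, mode, float(attrs["avg"])

entry = {"param": "BITRATE", "tool": "NETPERF_3",
         "avg": "6783.600000", "direction": "up"}
print(describe_throughput(entry))  # ('upload', 'single-threaded', 6783.6)
```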

Round Trip Latency


Figure 3. Round Trip Latency view

Round Trip Latency is the amount of time it takes for a message from your home to reach a particular server on the Internet and then return to your home. In general, lower latency is better, and low latency is especially important for Internet telephone (VoIP) and video calls, gaming, and video streaming. High latency also affects the performance of most web traffic.

Latency increases with geographical distance, which means that latencies between points across the United States or between Europe and the U.S. are expected to be high (more than 100 milliseconds). If no server in the plot has a round trip latency of less than 50 milliseconds, it may mean that you are far away from the nearest measurement server, or that your Internet connection has high latency. To check whether your Internet connection is the cause, also look at the last mile latency.

XML entry

The fping tool is used for round-trip latency measurements. The XML entry reports ICMP round-trip times between the router and key MLab servers, which are located on different continents.

    <measurement param="RTT" tool="FPING" srcip="XX.XX.XX.XX" dstip="XX.XX.XX.XX" timestamp="1412644339" avg="144.600000" std="17.037214" min="127.000000" max="182.000000" med="143.000000" iqr="20.750000"  />
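The statistics attributes of an entry like the one above summarize several individual probes. This page does not say how BISmark computes them, so the following sketch is an assumption throughout: it uses the population standard deviation and inclusive quartiles from Python's statistics module, and the sample RTTs are made up for illustration. Values are formatted with six decimals to match the entries shown here.

```python
import statistics

def summarize(samples):
    """Summarize probe results into the six statistics attributes."""
    # Assumption: inclusive quartiles; BISmark's method is not documented here.
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    stats = {
        "avg": statistics.fmean(samples),
        "std": statistics.pstdev(samples),  # assumption: population std
        "min": min(samples),
        "max": max(samples),
        "med": statistics.median(samples),
        "iqr": q3 - q1,
    }
    return {name: f"{value:.6f}" for name, value in stats.items()}

# Hypothetical RTT samples in milliseconds.
rtts = [127.0, 135.0, 143.0, 150.0, 182.0]
summary = summarize(rtts)
print(summary["avg"], summary["med"], summary["iqr"])
# 147.400000 143.000000 15.000000
```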

Last Mile Latency


Figure 4. Last Mile Latency view

Last Mile Latency is the amount of time it takes for a message from your home to reach your Internet service provider's (ISP's) network and then return to your home. As this delay is usually introduced by the length of cable and other equipment between your home and your ISP, it is often called the last mile latency.

As with round trip latency, high last mile latency negatively affects the performance of many common Internet applications. Last mile latency is particularly important because all traffic that travels between your home and the Internet experiences at least this much delay.

XML entry

Last mile latency is measured with the standard ping tool, which measures the latency between the router and the first gateway of the ISP serving the user.

     <measurement param="LMRTT" tool="PING" srcip="XX.XX.XX.XX" dstip="XX.XX.XX.XX" timestamp="1412718759" avg="4.267900" std="5.078689" min="1.554000" max="18.538000" med="2.595500" iqr="0.787750"  />

Shape Rate


Figure 5. Shape Rate view

Shape rate measurements (rate limit detection) are taken by the ShaperProbe tool. It actively generates UDP traffic in order to detect traffic degradation by ISPs. For both upload and download, ShaperProbe reports the expected Internet link capacity and the estimated link degradation in Mbits/sec.

XML entry

The shaperprobe measurements produce four XML entries: two for the estimated link capacity (upload and download) and two for the estimated throughput degradation (upload and download). The traffic consists of random payloads sent over UDP.

     <measurement param="CAPACITY" tool="SP" srcip="XX.XX.XX.XX" dstip="XX.XX.XX.XX" timestamp="1409703536" avg="2880" std="0" min="2880" max="2880" med="2880" iqr="0" direction="up" />
     <measurement param="CAPACITY" tool="SP" srcip="XX.XX.XX.XX" dstip="XX.XX.XX.XX" timestamp="1409703536" avg="14901" std="0" min="14901" max="14901" med="14901" iqr="0" direction="dw" />
     <measurement param="SHAPERATE" tool="SP" srcip="XX.XX.XX.XX" dstip="XX.XX.XX.XX" timestamp="1409703536" avg="0" std="0" min="0" max="0" med="0" iqr="0" direction="up" />
     <measurement param="SHAPERATE" tool="SP" srcip="XX.XX.XX.XX" dstip="XX.XX.XX.XX" timestamp="1409703536" avg="0" std="0" min="0" max="0" med="0" iqr="0" direction="dw" />
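The four entries are easier to work with when grouped by direction. The sketch below uses a regular expression that accepts both quoted and unquoted attribute values, so it does not depend on a strict XML parser. The helper names are hypothetical, and treating a SHAPERATE of 0 as "no shaping detected" is an assumption that this page does not confirm.

```python
import re

# name="value" or name=value (unquoted values end at whitespace).
ATTR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def parse_attrs(line):
    """Extract attributes from one measurement line into a dict."""
    return {m.group(1): m.group(2) if m.group(2) is not None else m.group(3)
            for m in ATTR.finditer(line)}

def summarize_shaperprobe(lines):
    """Group SP entries as {direction: {param: avg}}."""
    summary = {}
    for line in lines:
        attrs = parse_attrs(line)
        if attrs.get("tool") != "SP":
            continue
        by_param = summary.setdefault(attrs["direction"], {})
        by_param[attrs["param"]] = float(attrs["avg"])
    return summary

lines = [
    '<measurement param=CAPACITY tool=SP timestamp=1409703536 avg=2880 direction="up" />',
    '<measurement param="CAPACITY" tool="SP" timestamp="1409703536" avg="14901" direction="dw" />',
    '<measurement param=SHAPERATE tool=SP timestamp=1409703536 avg=0 direction="up" />',
    '<measurement param="SHAPERATE" tool="SP" timestamp="1409703536" avg="0" direction="dw" />',
]
result = summarize_shaperprobe(lines)
# Assumption: a SHAPERATE of 0 means no shaping was detected.
shaped = {d: v["SHAPERATE"] > 0 for d, v in result.items()}
print(result["dw"]["CAPACITY"], shaped)
# 14901.0 {'up': False, 'dw': False}
```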

Traceroutes

For the active traceroute measurements, BISmark uses the paris-traceroute tool. Two types of traceroute measurements are taken: the forward traceroute and the reverse traceroute. Currently, no Network Dashboard charts are generated from the traceroute data.

Forward Paris-Traceroute

For the forward paris-traceroute, BISmark takes two measurements every 30 minutes. The first uses the static list of selected IPv4 addresses below as targets. For the second, the nearest MLab server is determined first and then used as the target.

California 128.48.110.150
Italy 143.225.229.254
ATL 4.71.254.166
LAX 38.98.51.12
AMS 213.244.128.169
JNB 196.24.45.146
NBO 197.136.0.108
HND 203.178.130.210
SYD 175.45.79.44
IIT 180.149.52.20
BRAZIL 143.54.2.20
GOOG 8.8.8.8
TUN 41.231.21.44

The generated data entry follows the format below:

    <traceroute srcip="X.X.X.X" dstip="128.48.110.150" timestamp="1432089154" hops="15" direction="up" toolid="paristraceroute">
            <hop id="1" ip="192.168.X.X" rtt="1.047" />
            <hop id="2" ip="X.X.X.121" rtt="9.602" />
            <hop id="3" ip="X.X.X.118" rtt="10.024" />
            <hop id="4" ip="X.X.0.235" rtt="9.414" />
            <hop id="5" ip="Y.X.Y.229" rtt="9.972" />
            <hop id="6" ip="Y.Y.Y.13" rtt="19.758" />
            <hop id="7" ip="Y.Y.50.201" rtt="19.795" />
            <hop id="8" ip="Y.Y.15.246" rtt="19.275" />
            <hop id="9" ip="*" rtt="" />
            <hop id="10" ip="*" rtt="" />
            <hop id="11" ip="Z.Z.122.46" rtt="156.688" />
            <hop id="12" ip="Z.Z.46.144" rtt="157.179" />
            <hop id="13" ip="Z.Z.1.133" rtt="159.210" />
            <hop id="14" ip="Z.Z.86.5" rtt="157.719" />
            <hop id="15" ip="128.48.110.150" rtt="159.673" />
    </traceroute>
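A traceroute entry like the one above can be inspected with Python's standard XML parser. The following sketch uses a shortened, hypothetical sample. It lists the hops that did not answer (ip="*") and finds the largest RTT increase between consecutive responsive hops. The function name analyze_traceroute and the returned keys are illustrative.

```python
import xml.etree.ElementTree as ET

def analyze_traceroute(xml_text):
    """Report silent hops and the largest RTT jump in one <traceroute>."""
    root = ET.fromstring(xml_text)
    hops = root.findall("hop")
    silent = [int(h.get("id")) for h in hops if h.get("ip") == "*"]
    # Hops without an answer carry an empty rtt attribute and are skipped.
    answered = [(int(h.get("id")), float(h.get("rtt")))
                for h in hops if h.get("rtt")]
    # (increase in ms, from hop id, to hop id) for neighbouring answers.
    jumps = [(b[1] - a[1], a[0], b[0]) for a, b in zip(answered, answered[1:])]
    return {
        "declared_hops": int(root.get("hops")),
        "found_hops": len(hops),
        "silent": silent,
        "largest_jump": max(jumps, default=None),
    }

sample = '''<traceroute srcip="X.X.X.X" dstip="128.48.110.150" timestamp="1432089154" hops="5" direction="up" toolid="paristraceroute">
    <hop id="1" ip="192.168.X.X" rtt="1.047" />
    <hop id="2" ip="X.X.X.121" rtt="9.602" />
    <hop id="3" ip="*" rtt="" />
    <hop id="4" ip="Z.Z.122.46" rtt="156.688" />
    <hop id="5" ip="128.48.110.150" rtt="159.673" />
</traceroute>'''

report = analyze_traceroute(sample)
print(report["silent"])            # [3]
print(report["largest_jump"][1:])  # (2, 4)
```

In the full example above, the same kind of jump appears between hops 8 and 11, where the RTT rises from about 19 ms to about 157 ms.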

Reverse Paris-Traceroute

The reverse traceroute also uses a static list, in this case as the sources of the measurement, with the BISmark router as the target. A second measurement is taken from the nearest MLab server, also targeting the BISmark router.

The output format for the reverse paris traceroute looks like this:

    <traceroute srcip="X.X.X.X" dstip="Y.Y.Y.Y" timestamp="1437874094" hops="8" direction="dw" toolid="paristraceroute">
            <hop id="1" ip="X.X.X.1" rtt="2.752" />
            <hop id="2" ip="X.246.0.1" rtt="0.625" />
            <hop id="3" ip="Z.Z.95.77" rtt="43.799" />
            <hop id="4" ip="Z.Z.128.201" rtt="41.888" />
            <hop id="5" ip="Y.Y.56.138" rtt="41.245" />
            <hop id="6" ip="*" rtt="" />
            <hop id="7" ip="*" rtt="" />
            <hop id="8" ip="Y.Y.Y.Y" rtt="47.938" />
    </traceroute>
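In the two examples on this page, forward traceroutes carry direction="up" and reverse traceroutes carry direction="dw". Assuming that convention holds for the whole dataset, a record can be classified and checked for completion as sketched below. The function name classify_traceroute is hypothetical, and the sample is a shortened, illustrative record.

```python
import xml.etree.ElementTree as ET

# Assumption: "up" marks forward and "dw" marks reverse traceroutes.
KINDS = {"up": "forward", "dw": "reverse"}

def classify_traceroute(xml_text):
    """Return (kind, reached) for one <traceroute> element."""
    root = ET.fromstring(xml_text)
    kind = KINDS.get(root.get("direction"), "unknown")
    hops = root.findall("hop")
    # The trace reached its target if the last hop is the destination.
    reached = bool(hops) and hops[-1].get("ip") == root.get("dstip")
    return kind, reached

sample = '''<traceroute srcip="X.X.X.X" dstip="Y.Y.Y.Y" timestamp="1437874094" hops="3" direction="dw" toolid="paristraceroute">
    <hop id="1" ip="X.X.X.1" rtt="2.752" />
    <hop id="2" ip="*" rtt="" />
    <hop id="3" ip="Y.Y.Y.Y" rtt="47.938" />
</traceroute>'''

print(classify_traceroute(sample))  # ('reverse', True)
```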