throttling: measure HTTP body download speed #2493

Closed
bassosimone opened this issue Jun 19, 2023 · 2 comments
Assignees
Labels
data quality enhancement improving existing code or new feature funder/drl2022-2024 methodology issues related to the testing methodology ooni/probe-engine priority/high

Comments

bassosimone commented Jun 19, 2023

This issue is part of ooni/ooni.org#1296. We aim to modify dslx and Web Connectivity LTE to include support for measuring the HTTP body download speed. This change will be instrumental to make sure we can detect heavy throttling of specific HTTP URLs by observing the body download speed.

@bassosimone bassosimone added enhancement improving existing code or new feature priority/high methodology issues related to the testing methodology data quality ooni/probe-engine funder/drl2022-2024 labels Jun 19, 2023
@bassosimone bassosimone self-assigned this Jun 19, 2023
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 27, 2023
The ndt7 server uses memoryless to avoid sampling the download speed
at constant intervals, which provides PASTA properties (where PASTA
means Poisson Arrivals See Time Averages).

In other words, ndt7 has a better download speed observation mechanism
than one that samples at fixed intervals because the observation is
independent of possibly cyclical events happening in the network that
could synchronize with the download speed polling period.

We want to have the same properties for measuring the download
speed inside of Web Connectivity LTE.

To this end, let us import the package m-lab uses for ndt7 server.

This work is part of ooni/probe#2493.
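
The synchronization risk described above can be illustrated with a small simulation. This is only a sketch under assumed numbers, not code from ndt7 or OONI: the on/off speed pattern, the one-second period, and the names `speedAt`, `fixedAverage`, and `poissonAverage` are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// speedAt is a made-up cyclical signal: the link delivers 10 units/s during
// the first half of every one-second period and nothing during the second
// half. Its true time average is therefore 5.
func speedAt(t float64) float64 {
	if math.Mod(t, 1.0) < 0.5 {
		return 10
	}
	return 0
}

// fixedAverage samples once per second starting at t=0.25s. The polling
// period is synchronized with the signal, so every sample lands in the
// "on" phase and the average is biased.
func fixedAverage(n int) float64 {
	sum := 0.0
	for k := 0; k < n; k++ {
		sum += speedAt(0.25 + float64(k))
	}
	return sum / float64(n)
}

// poissonAverage samples at exponentially distributed intervals with a mean
// of one second (a Poisson process), so samples land in both phases.
func poissonAverage(seed int64, n int) float64 {
	rng := rand.New(rand.NewSource(seed))
	t, sum := 0.0, 0.0
	for k := 0; k < n; k++ {
		t += rng.ExpFloat64() // exponential with mean 1.0
		sum += speedAt(t)
	}
	return sum / float64(n)
}

func main() {
	fmt.Printf("fixed-interval average: %.2f\n", fixedAverage(100000))
	fmt.Printf("poisson average:        %.2f\n", poissonAverage(1, 100000))
}
```

With these assumptions, the fixed-interval poller only ever observes the "on" phase, while the memoryless poller converges towards the time average of the signal.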
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 27, 2023
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 27, 2023
This commit backports 554ad6d from
the master development branch.

bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 27, 2023
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 27, 2023
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 27, 2023
This commit backports 1428fb1 from
the main development branch.

I noticed this issue while working on
ooni/probe#2493
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 27, 2023
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 27, 2023
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 27, 2023
Otherwise, when we're measuring the download speed, we end up
with too many events in the JSON file.

Part of ooni/probe#2493
bassosimone added a commit to ooni/2023-05-richer-input that referenced this issue Jun 27, 2023
bassosimone added a commit to ooni/2023-05-richer-input that referenced this issue Jun 27, 2023
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 29, 2023
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 29, 2023
This diff implements a lightweight approach to throttling that takes
advantage of the step-by-step design and should also be suitable to
measure throttling using `dslx` (see
ooni/probe#2493).

Before discussing the approach implemented here, it is important to
point out that:

1. if we're using step-by-step, we're collecting up to 64 network events
for a single network connection;

2. with step-by-step, each trace is bound to a single network connection
or DNS round trip;

3. both Web Connectivity v0.5 and dslx use the step-by-step approach;

4. therefore, for extreme throttling of a single connection, 64 I/O
events are *a lot of events* to observe throttling;

5. additionally, we're currently limited to downloading `1<<19` bytes of
the body, so there is not much room for collecting *lots of data*
anyway;

6. additionally, if we were to collect more bytes, the bottleneck would
become collecting and uploading the HTTP response body to the OONI
backend.
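
For reference, a body-size cap like the one in point 5 can be sketched with `io.LimitReader`. This is a minimal illustration, not the code used by the engine: `readCapped` and `cappedLen` are hypothetical helpers, and only the `1<<19` value comes from the text above.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// maxBodySize mirrors the 1<<19 bytes limit mentioned above.
const maxBodySize = 1 << 19

// readCapped reads at most limit bytes from r and silently ignores the rest.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	return io.ReadAll(io.LimitReader(r, limit))
}

// cappedLen is a demo helper: it feeds an n-byte in-memory body to
// readCapped and reports how many bytes were kept (-1 on error).
func cappedLen(n int, limit int64) int {
	data, err := readCapped(bytes.NewReader(make([]byte, n)), limit)
	if err != nil {
		return -1
	}
	return len(data)
}

func main() {
	fmt.Println("small body kept:", cappedLen(100, maxBodySize))
	fmt.Println("large body kept:", cappedLen(1<<20, maxBodySize))
}
```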

That said, because step-by-step binds each trace to a single network
connection, we can add passive, atomic collection of the bytes received
by a trace. Because we're also dealing with unconnected UDP sockets, we
need to attribute the received bytes to the peer that sent them. To this
end, we maintain a map from the remote endpoint address and protocol to
the number of bytes received. The trace allows one to export the current
map. Because data collection is passive, we can start sampling as late as
the HTTP download and still collect correct cumulative data.
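
The accounting described above can be sketched as a small mutex-protected map. This is an illustration under assumptions, not the measurexlite code: the names `endpointKey`, `byteCounter`, `Add`, and `Snapshot` are hypothetical, and the addresses come from the documentation range.

```go
package main

import (
	"fmt"
	"sync"
)

// endpointKey identifies a remote peer by protocol and endpoint address.
type endpointKey struct {
	Proto   string // "tcp" or "udp"
	Address string // for example "192.0.2.1:443"
}

// byteCounter is a hypothetical stand-in for the per-trace accounting.
type byteCounter struct {
	mu    sync.Mutex
	bytes map[endpointKey]int64
}

func newByteCounter() *byteCounter {
	return &byteCounter{bytes: make(map[endpointKey]int64)}
}

// Add records n bytes received from the given peer. A wrapped connection
// would call this after each successful Read or ReadFrom.
func (c *byteCounter) Add(proto, address string, n int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.bytes[endpointKey{Proto: proto, Address: address}] += int64(n)
}

// Snapshot returns a copy of the cumulative counters, so callers can
// inspect it without holding the lock.
func (c *byteCounter) Snapshot() map[endpointKey]int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[endpointKey]int64, len(c.bytes))
	for k, v := range c.bytes {
		out[k] = v
	}
	return out
}

func main() {
	c := newByteCounter()
	c.Add("tcp", "192.0.2.1:443", 1200)
	c.Add("tcp", "192.0.2.1:443", 300)
	c.Add("udp", "192.0.2.1:443", 64)
	for k, v := range c.Snapshot() {
		fmt.Printf("%s %s: %d bytes\n", k.Proto, k.Address, v)
	}
}
```

Since the counters are cumulative, a sampler that starts reading snapshots late still sees totals that include the bytes received before it started.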

We also introduce a new sampler for measuring throttling. The design of
the sampler is similar to the design we're using inside of ndt7. We use
a memoryless ticker to avoid sampling periodically, but we clamp the
distribution so that we will typically receive the expected number of
samples for each time period.
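
A clamped memoryless schedule can be sketched using only the standard library. The interval values below and the names `nextInterval` and `drawIntervals` are assumptions made for illustration; they are not taken from this diff or from the memoryless package.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Illustrative parameters (assumptions, not the values used by the sampler).
const (
	expectedInterval = 250 * time.Millisecond
	minInterval      = 25 * time.Millisecond
	maxInterval      = 650 * time.Millisecond
)

// nextInterval draws a waiting time from an exponential distribution with
// mean expectedInterval and clamps it to [minInterval, maxInterval].
func nextInterval(rng *rand.Rand) time.Duration {
	d := time.Duration(rng.ExpFloat64() * float64(expectedInterval))
	if d < minInterval {
		return minInterval
	}
	if d > maxInterval {
		return maxInterval
	}
	return d
}

// drawIntervals returns n clamped intervals from a seeded generator, which
// keeps the demo reproducible.
func drawIntervals(seed int64, n int) []time.Duration {
	rng := rand.New(rand.NewSource(seed))
	out := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, nextInterval(rng))
	}
	return out
}

func main() {
	for _, d := range drawIntervals(1, 5) {
		fmt.Println(d)
	}
}
```

Clamping trades strict memorylessness for predictability: no two samples are closer than the minimum or farther apart than the maximum, which bounds the number of samples per time period.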

It is also worth noting that I believe the already collected 64 network
events are fine to determine throttling, but we cannot know for sure,
hence it makes sense to improve our data collection capabilities.

The related spec PR is ooni/spec#276.

Once this diff is merged, we would still need to do the following:

- [ ] update dslx to use this functionality
- [ ] land Web Connectivity LTE

The latter is fundamental to collect speed samples. We're not doing that
with Web Connectivity v0.4.

While there, this diff also improves the measurexlite documentation a
bit.
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 29, 2023
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 29, 2023
bassosimone added a commit to ooni/probe-cli that referenced this issue Jul 4, 2023
We should have done this in #1166.

We'll account this diff as part of
ooni/probe#2493.
bassosimone added a commit to ooni/probe-cli that referenced this issue Jul 4, 2023
This diff backports 7b88651 from
the master branch to the release/3.18 branch.

bassosimone added a commit to ooni/probe-cli that referenced this issue Jul 4, 2023
This commit cherry-picks 7fc1b70
from the master to the release/3.18 branch.

bassosimone commented Jul 4, 2023

I have also backported the relevant patches to the release/3.18 branch. We can now consider this issue done.

This issue was originally about adding support for dslx only. But then I realized I also needed to add this functionality to Web Connectivity LTE to collect samples. So, I ended up working on both of them.

bassosimone commented Jul 4, 2023

This is a measurement collected with Web Connectivity LTE v0.5.24 that includes extra information useful to detect throttling: https://explorer.ooni.org/m/20230704134254.363856_IT_webconnectivity_06739f10803ce20e.

Here's the new throttling information we collect extracted from such a measurement:

{
  "address": "157.240.231.35:443",
  "failure": null,
  "num_bytes": 3271,
  "operation": "bytes_received_cumulative",
  "proto": "tcp",
  "t0": 0.170633,
  "t": 0.170633,
  "transaction_id": 7
}
{
  "address": "31.13.86.36:443",
  "failure": null,
  "num_bytes": 107792,
  "operation": "bytes_received_cumulative",
  "proto": "tcp",
  "t0": 0.317107,
  "t": 0.317107,
  "transaction_id": 5
}
{
  "address": "31.13.71.36:443",
  "failure": null,
  "num_bytes": 3242,
  "operation": "bytes_received_cumulative",
  "proto": "tcp",
  "t0": 1.341794,
  "t": 1.341794,
  "transaction_id": 8
}

The operation is bytes_received_cumulative. The address and transaction_id are consistent with the rest of the trace. Every connection collects at least one such sample when we're done reading the body. It is possible that we collect more samples; whether this happens is time dependent. We schedule sampling on average every 0.25 milliseconds using a truncated exponential distribution. If the download is slow, we'll have more samples.
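
When a connection has more than one such sample, the download speed between consecutive samples follows from the difference in num_bytes divided by the difference in t. The following is a sketch of that computation, not part of any OONI analysis tool: the helper names `parseSamples` and `intervalSpeeds` and the input values in `main` are hypothetical, while the JSON field names are those shown above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"sort"
)

// sample holds the fields of a bytes_received_cumulative event we need.
type sample struct {
	Address       string  `json:"address"`
	NumBytes      int64   `json:"num_bytes"`
	Operation     string  `json:"operation"`
	Proto         string  `json:"proto"`
	T             float64 `json:"t"`
	TransactionID int64   `json:"transaction_id"`
}

// connKey groups samples that belong to the same trace and remote peer.
type connKey struct {
	TransactionID int64
	Proto         string
	Address       string
}

// parseSamples decodes a stream of concatenated JSON objects and keeps
// only the bytes_received_cumulative events.
func parseSamples(data []byte) ([]sample, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	var out []sample
	for {
		var s sample
		err := dec.Decode(&s)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		if s.Operation == "bytes_received_cumulative" {
			out = append(out, s)
		}
	}
}

// intervalSpeeds returns, for each connection, the speed in bytes/s
// between each pair of consecutive samples ordered by time.
func intervalSpeeds(samples []sample) map[connKey][]float64 {
	sorted := append([]sample(nil), samples...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].T < sorted[j].T })
	last := make(map[connKey]sample)
	speeds := make(map[connKey][]float64)
	for _, s := range sorted {
		k := connKey{TransactionID: s.TransactionID, Proto: s.Proto, Address: s.Address}
		if prev, ok := last[k]; ok && s.T > prev.T {
			delta := float64(s.NumBytes - prev.NumBytes)
			speeds[k] = append(speeds[k], delta/(s.T-prev.T))
		}
		last[k] = s
	}
	return speeds
}

func main() {
	// Hypothetical samples for a slow download (not real measurement data).
	data := []byte(`
{"address":"192.0.2.1:443","num_bytes":1000,"operation":"bytes_received_cumulative","proto":"tcp","t":0.25,"transaction_id":7}
{"address":"192.0.2.1:443","num_bytes":6000,"operation":"bytes_received_cumulative","proto":"tcp","t":0.75,"transaction_id":7}
{"address":"192.0.2.1:443","num_bytes":7000,"operation":"bytes_received_cumulative","proto":"tcp","t":1.25,"transaction_id":7}`)
	samples, err := parseSamples(data)
	if err != nil {
		panic(err)
	}
	for k, v := range intervalSpeeds(samples) {
		fmt.Printf("tx=%d %s %s: %v bytes/s\n", k.TransactionID, k.Proto, k.Address, v)
	}
}
```

A connection with a single sample yields no speed estimate here, because t is relative to the start of the measurement rather than to the start of the download.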

Data analysis should initially focus on the read and read_from network events, which have a coarser granularity than this event. The purpose of this event is to have aggregate data should the download run for quite some time.

In practice, I do not think this would happen as long as (a) we limit the body size and (b) we don't include URLs specifically meant for measuring throttling. OTOH, should we include larger URLs, we would see the samples. But larger URLs mean larger measurement JSONs to submit, so we also need to strike a balance here and perhaps avoid submitting the body when we are focusing only on measuring throttling.

@bassosimone bassosimone changed the title throttling: modify dslx to measure HTTP body download speed throttling: measure HTTP body download speed Jul 4, 2023
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
This commit shuffles around the `netemx` implementation to simplify
construction and usage. Here are some design goals that underpin this
set of changes:

1. construction using optional functions and sane defaults, so one needs
to write much less;

2. we always want an HTTP server listening on 80/tcp, 443/tcp, and 443/udp,
so we can simplify the code *a lot*.
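
The "optional functions and sane defaults" style from point 1 is commonly written in Go as functional options. The following is a generic sketch of that pattern, not the `netemx` API: `scenarioConfig`, `Option`, `WithServerName`, and `newScenarioConfig` are hypothetical names, and only the default ports come from point 2.

```go
package main

import "fmt"

// scenarioConfig is a hypothetical configuration for a test scenario.
type scenarioConfig struct {
	HTTPPort   int    // cleartext HTTP over TCP
	HTTPSPort  int    // HTTPS over TCP
	QUICPort   int    // HTTP/3 over UDP
	ServerName string // name served by the HTTP server
}

// Option mutates the configuration during construction.
type Option func(*scenarioConfig)

// WithServerName overrides the default server name.
func WithServerName(name string) Option {
	return func(c *scenarioConfig) { c.ServerName = name }
}

// newScenarioConfig starts from the defaults and then applies the options,
// so callers only spell out what differs from the common case.
func newScenarioConfig(opts ...Option) *scenarioConfig {
	cfg := &scenarioConfig{
		HTTPPort:   80,
		HTTPSPort:  443,
		QUICPort:   443,
		ServerName: "www.example.com",
	}
	for _, opt := range opts {
		opt(cfg)
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", *newScenarioConfig())
	fmt.Printf("%+v\n", *newScenarioConfig(WithServerName("www.example.org")))
}
```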
 
I worked on this change while trying to wrap up
ooni/probe#2493. My main aim was to add netem-based
smoke testing to Web Connectivity LTE so that we can start
increasing the number of users using it with more confidence
than in the current situation, where code coverage is very low.
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
This diff moves measurement-time functions to the internal/model
package. This makes it easier to write some basic smoke tests for
Web Connectivity LTE, which in turn supports
ooni/probe#2493: with more smoke testing for
Web Connectivity LTE, we can enable it for more users and
therefore collect more data about potential throttling.
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
This PR starts adding test coverage to Web Connectivity LTE. With
increasing test coverage, we can start thinking about increasing the
number of users using this version of Web Connectivity. In turn, this
means that we can collect more data about throttling, which we
implemented as part of ooni/probe#2493.
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
We should have done this in ooni#1166.

We'll account this diff as part of
ooni/probe#2493.
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
cyBerta pushed a commit to cyBerta/probe-cli that referenced this issue Aug 4, 2023
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this issue Feb 13, 2024
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this issue Feb 13, 2024
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this issue Feb 13, 2024
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this issue Feb 13, 2024
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this issue Feb 13, 2024
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this issue Feb 13, 2024
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this issue Feb 13, 2024
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this issue Feb 13, 2024