
Reprocess measurements following the 2026-04-10 incident #398

@hellais


Following the incident in #396, we now have 867286 measurements stored in the ooniprobe-failed-reports-eu-central-1-1d24426a bucket.

During the call yesterday we discussed several options for reprocessing them, and we seemed to be leaning towards making use of the existing fastpath as much as possible.

After some investigation into the fastpath code, this seems to be possible, and it would not affect the size of buckets, contrary to what we thought during the call. The bucket timestamp is derived from the measurement_uid, which means that a measurement with an older measurement_uid ends up in a bucket with an older date.
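To illustrate the point about bucket dates, here is a minimal sketch of deriving a date from a measurement_uid. It assumes the uid begins with a `YYYYMMDDhhmmss` UTC timestamp followed by a dot; the example uid in the usage note is made up, the function name `bucket_date_from_uid` is hypothetical, and this is not the fastpath's actual code.

```python
from datetime import datetime, timezone


def bucket_date_from_uid(measurement_uid: str) -> str:
    """Return the YYYY-MM-DD date encoded at the start of a measurement_uid.

    Assumption: the uid starts with a UTC timestamp formatted as
    YYYYMMDDhhmmss, followed by a "." and further fields.
    """
    prefix = measurement_uid.split(".", 1)[0]
    ts = datetime.strptime(prefix, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return ts.strftime("%Y-%m-%d")
```

Under that assumption, `bucket_date_from_uid("20260410123456.123456_IT_webconnectivity_abcdef0123456789")` gives the date of the original measurement, not the date on which it is reprocessed.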

One critical thing, though, is that we MUST NOT run the ooniapi-uploader with the same configuration it is running with in the fastpath, otherwise the postcans and jsonl files will be overwritten due to a path conflict (see: https://github.com/ooni/devops/blob/main/ansible/roles/fastpath/templates/ooni_api_uploader.py#L190). Instead, the collector_id should be set to something unique (e.g. s3) and the ooni_api_uploader should be run on a different host.
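The path conflict can be illustrated with a toy key builder. The key layout and the `fastpath-main` collector_id below are purely hypothetical (the real layout is in the linked ooni_api_uploader.py). The only point is that if the collector_id is the sole host-specific component of the key, two uploaders sharing it produce the same object key for the same hour, and the later upload replaces the earlier one.

```python
def object_key(hour: str, cc: str, test_name: str, collector_id: str) -> str:
    """Build an object key. Hypothetical layout, for illustration only."""
    return f"{hour}/{cc}/{test_name}.{collector_id}.tar.gz"


# Two uploaders configured with the same (hypothetical) collector_id collide.
live = object_key("2026041012", "IT", "webconnectivity", "fastpath-main")
clash = object_key("2026041012", "IT", "webconnectivity", "fastpath-main")

# A unique collector_id, such as "s3", yields a distinct key.
safe = object_key("2026041012", "IT", "webconnectivity", "s3")
```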

In summary, what we should do to reprocess measurements is as follows:

  1. Set up a new fastpath host with a different, unique collector_id
  2. Write a script that
    • takes measurements from s3 and performs a POST to the localhttpfeeder, as if they were sent directly from the OONI API, setting the correct measurement_uid
    • upon successful read, deletes the original file from s3
  3. Once all of these have been posted, wait for the ooniapi-uploader to run and populate the buckets in s3
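Step 2 could be sketched roughly as follows. Everything here is an assumption rather than existing code: the feeder URL shape (`<feeder_url>/<measurement_uid>`), the 200 status on success, and the helper names. The s3 access is passed in as plain callables so the sketch stays independent of any particular client library. This sketch also deletes an object only after its POST succeeds, so a failed request leaves the file in place for a retry.

```python
import urllib.request
from typing import Callable, Iterable, Tuple


def post_to_feeder(feeder_url: str, measurement_uid: str, body: bytes) -> bool:
    """POST one measurement body to the local HTTP feeder.

    Assumption: the feeder accepts POST <feeder_url>/<measurement_uid> and
    answers 200 on success. Check this against the fastpath code first.
    """
    req = urllib.request.Request(
        f"{feeder_url}/{measurement_uid}", data=body, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False


def reprocess(
    keys: Iterable[str],
    fetch: Callable[[str], bytes],
    uid_for_key: Callable[[str], str],
    post: Callable[[str, bytes], bool],
    delete: Callable[[str], None],
) -> Tuple[int, int]:
    """Post every object to the feeder and delete the ones that succeeded.

    Returns (number posted and deleted, number left in place).
    """
    ok = failed = 0
    for key in keys:
        body = fetch(key)
        if post(uid_for_key(key), body):
            delete(key)
            ok += 1
        else:
            failed += 1
    return ok, failed
```

In a real run, `fetch` and `delete` would wrap the s3 client, `uid_for_key` would recover the original measurement_uid for each stored object (how it is recorded in the bucket needs checking), and `post` would be `post_to_feeder` bound to the feeder address on the new host.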

We can probably optimize step 2 a bit by updating the reprocessor code and batching writes instead of doing them sequentially, but it is probably simpler to accept a longer run and apply some kind of throttling to the requests.
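One simple way to throttle is to pace the loop so that items are handed out no faster than a fixed rate. The sketch below is a hypothetical helper (`throttled`), not part of the reprocessor; the `sleep` and `clock` parameters exist only so the pacing can be tested without waiting.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    rate_per_second: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield items no faster than rate_per_second."""
    interval = 1.0 / rate_per_second
    next_slot = clock()
    for item in items:
        now = clock()
        if now < next_slot:
            # Too early for the next request: wait until its slot opens.
            sleep(next_slot - now)
        yield item
        # Schedule the following slot; never try to "catch up" after a stall.
        next_slot = max(next_slot, now) + interval
```

Wrapping the list of s3 keys as `throttled(keys, 50)` caps the script at 50 requests per second without changing the rest of the loop.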

If we throttle the requests to 50 per second, reprocessing all 867286 measurements should take ~4.8h (867286 / 50 ≈ 17346 s); finishing in ~2.5h would need roughly 100 requests per second.
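The duration estimate can be checked with a few lines of arithmetic, using the measurement count from this issue and assuming one request per measurement with no retries or per-request overhead:

```python
TOTAL_MEASUREMENTS = 867286  # count reported in this issue


def estimated_hours(total: int, rate_per_second: float) -> float:
    """Hours needed to send `total` requests at a fixed rate."""
    return total / rate_per_second / 3600


hours_at_50 = estimated_hours(TOTAL_MEASUREMENTS, 50)    # about 4.8
hours_at_100 = estimated_hours(TOTAL_MEASUREMENTS, 100)  # about 2.4
```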

Labels: priority/medium, task
