
Investigate missing jsonl files from s3 bucket #538

@hellais


A community member reported that certain raw JSONL files from 2020 are missing from the S3 bucket.

For example, looking at the autoclavedlookup table, I see this measurement entry:

8325347	36787	96223	2989	2020-08-01/20200801T003343Z-BY-AS6697-web_connectivity-20200801T003344Z_AS6697_JwVqBeEB4cqXDHwlgTdigXaFA1XiUkplDhRLucQ7YNkrdTWzZy-0.2.0-probe.json	2020-08-01/web_connectivity.02.tar.lz4	20200801T003344Z_AS6697_JwVqBeEB4cqXDHwlgTdigXaFA1XiUkplDhRLucQ7YNkrdTWzZy	http://www.metacafe.com/

and the archive it points to is present in the bucket:

aws s3 --no-sign-request ls --recursive s3://ooni-data-eu-fra/autoclaved/jsonl.tar.lz4/2020-08-01/web_connectivity.02.tar.lz4

2020-08-02 03:25:27   13981487 autoclaved/jsonl.tar.lz4/2020-08-01/web_connectivity.02.tar.lz4

Yet when I list the matching prefix in the jsonl directory tree, I get nothing:

% aws s3 --no-sign-request ls --recursive s3://ooni-data-eu-fra/jsonl/webconnectivity/BY/20200801/

~ %
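
To repeat this check for other autoclavedlookup rows, the jsonl prefix can be derived from the report filename. The sketch below is only that: the helper name `expected_jsonl_prefix` is hypothetical, and the layout `jsonl/<testname>/<CC>/<YYYYMMDD>/` is an assumption inferred from the single listing above, not a documented contract.

```shell
#!/bin/sh
# Hypothetical helper: map a report filename from autoclavedlookup to the
# jsonl/ prefix where its measurements would be expected.
# ASSUMPTION: layout is jsonl/<testname>/<CC>/<YYYYMMDD>/, with underscores
# removed from the test name (web_connectivity -> webconnectivity), as
# suggested by the one example in this issue.
expected_jsonl_prefix() (
  base=$(basename "$1")                                  # drop "2020-08-01/"
  ts=$(printf '%s' "$base" | cut -d- -f1)                # 20200801T003343Z
  cc=$(printf '%s' "$base" | cut -d- -f2)                # BY
  test_name=$(printf '%s' "$base" | cut -d- -f4 | tr -d '_')
  day=${ts%%T*}                                          # 20200801
  printf 'jsonl/%s/%s/%s/\n' "$test_name" "$cc" "$day"
)

# Usage (needs network access, so it is left commented out here):
# report='2020-08-01/20200801T003343Z-BY-AS6697-web_connectivity-...-probe.json'
# aws s3 --no-sign-request ls --recursive \
#   "s3://ooni-data-eu-fra/$(expected_jsonl_prefix "$report")"
```

An empty listing for a prefix derived this way would point to the same gap as the one shown above, assuming the layout guess holds for that date and test.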

Labels: bug, ooni/pipeline, priority/low
