response_bodies.2018_12_01_mobile missing data #70

Closed
rviscomi opened this issue Dec 11, 2018 · 5 comments

@rviscomi
Member

The 12/1 mobile table is much smaller and missing a lot of data compared to the previous crawl and the current desktop crawl.

2018_11_15_mobile: 35,441,289 rows, 1.41 TB
2018_12_01_mobile: 18,084,199 rows, 152 GB

2018_11_15_desktop: 45,975,086 rows, 2.00 TB
2018_12_01_desktop: 46,284,186 rows, 2.01 TB

I just reran the 12/1 mobile HAR dataflow pipeline and it produced identical results.
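As a rough sanity check on the figures above, the sketch below compares row counts, table sizes, and average bytes per row. It assumes decimal units (1 TB = 1000 GB), and the names `TABLES` and `bytes_per_row` are illustrative only. Under those assumptions the 12/1 mobile table has roughly half the rows of the 11/15 table but only about a tenth of the bytes, so the rows that are present also carry far less body data on average.

```python
# Row counts and sizes copied from this issue.
# Assumption: sizes use decimal units (1 TB = 1e12 bytes, 1 GB = 1e9 bytes).
TABLES = {
    "2018_11_15_mobile": (35_441_289, 1.41e12),
    "2018_12_01_mobile": (18_084_199, 152e9),
    "2018_11_15_desktop": (45_975_086, 2.00e12),
    "2018_12_01_desktop": (46_284_186, 2.01e12),
}


def bytes_per_row(table):
    """Average stored bytes per row for one table."""
    rows, size_bytes = TABLES[table]
    return size_bytes / rows


old_rows, old_bytes = TABLES["2018_11_15_mobile"]
new_rows, new_bytes = TABLES["2018_12_01_mobile"]

row_ratio = new_rows / old_rows      # share of rows that survived
size_ratio = new_bytes / old_bytes   # share of bytes that survived

print(f"mobile rows:  {row_ratio:.1%} of previous crawl")
print(f"mobile bytes: {size_ratio:.1%} of previous crawl")
for name in TABLES:
    print(f"{name}: {bytes_per_row(name) / 1e3:.1f} KB per row")
```

The desktop tables stay near the same average size per row across both crawls, which is consistent with the problem being specific to mobile.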

cc @jeffposnick

@rviscomi rviscomi added the bug label Dec 11, 2018
@rviscomi
Member Author

@pmeenan is looking into this from the WPT side

@pmeenan
Member

pmeenan commented Dec 14, 2018

At this point I have no idea why the bodies aren't there, but the HARs are definitely light (around 1/2 the bytes in the directory). I reset the archiving code to the latest in GitHub just in case I changed something manually while debugging. From the agent perspective, there should be no reason it would differ from desktop, so it has to be on the server side.

@pmeenan
Member

pmeenan commented Dec 20, 2018

Hmm, I'm not optimistic that it was just a transient issue. The Dec 15th crawl is ~1/2 done and the mobile HARs are trending to be around the same size as in the Dec 1 crawl. It doesn't make sense that mobile would differ from desktop, but I'll see if it's something I can reproduce.

@pmeenan
Member

pmeenan commented Dec 21, 2018

I can reproduce it when Lighthouse capture is enabled, which probably explains why it is only hitting mobile. I assume something in the prep for the Lighthouse run is deleting the bodies, so hopefully it will be easy to track down and fix (being able to reproduce it makes it MUCH easier to track down).

@pmeenan
Member

pmeenan commented Dec 21, 2018

Fixed. Will take effect with the Jan 1 crawl. Not sure why it only showed up recently (or maybe we only noticed it recently).

The bodies zip file was being created at the point where the browser was prepared. In the case of a Lighthouse test we go through the prep a second time, and that deleted the zip file from the actual test. The only reason we had some bodies was that there's a pass to backfill any missing bodies, but it would have been trying to grab them from the Lighthouse instance. Either way, all fixed now and the bodies are reliably coming back (and correct) again.
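To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is not the actual agent code: the `Agent` class, its method names, and the `reset_in_prepare` flag are all hypothetical, and the real fix may look quite different. It only shows why resetting an archive inside a prep step that runs twice loses what the first pass stored.

```python
import os
import tempfile
import zipfile


class Agent:
    """Hypothetical stand-in for a test agent that archives response bodies."""

    def __init__(self, workdir, reset_in_prepare):
        self.bodies_zip = os.path.join(workdir, "bodies.zip")
        self.reset_in_prepare = reset_in_prepare

    def _reset_archive(self):
        # Delete any existing archive and start a fresh, empty one.
        if os.path.exists(self.bodies_zip):
            os.remove(self.bodies_zip)
        zipfile.ZipFile(self.bodies_zip, "w").close()

    def start_test(self):
        # Fixed behaviour: reset once per test, before any browser prep.
        if not self.reset_in_prepare:
            self._reset_archive()

    def prepare_browser(self):
        # Buggy behaviour: reset on every prep, including the Lighthouse one.
        if self.reset_in_prepare:
            self._reset_archive()

    def record_body(self, name, text):
        with zipfile.ZipFile(self.bodies_zip, "a") as archive:
            archive.writestr(name, text)

    def body_names(self):
        with zipfile.ZipFile(self.bodies_zip) as archive:
            return archive.namelist()


def simulate(reset_in_prepare):
    """Run one page load followed by a second prep for a Lighthouse pass."""
    with tempfile.TemporaryDirectory() as workdir:
        agent = Agent(workdir, reset_in_prepare)
        agent.start_test()
        agent.prepare_browser()                    # prep for the actual test
        agent.record_body("index.html", "<html></html>")
        agent.prepare_browser()                    # second prep, for Lighthouse
        return agent.body_names()


print("reset in prep: ", simulate(reset_in_prepare=True))
print("reset at start:", simulate(reset_in_prepare=False))
```

With the reset tied to browser prep, the archive is empty after the second prep. With the reset moved to the start of the test, the body from the actual page load survives.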

While I was in there I also added SVG images to the list of text bodies that we capture, so we'll start getting data for those now as well.

I'll leave this open until after we can verify in the Jan 1 crawl.
