Prefetching in the SiteBaker is too eager #3514

Open
ikesau opened this issue Apr 18, 2024 · 0 comments

ikesau commented Apr 18, 2024

The idea behind prefetched attachments was that instead of each of our ~530 gdocs fetching all their own (often identical) attachments from the DB and S3 in series, we could prefetch and cache them.

But at the moment we're doing this naively: we select ALL the data, even though much of it isn't used in any gdoc.

This is especially painful because we prefetch ~6000 charts and ~1100 indicators. These require calls to S3, take upwards of 4 minutes, and frequently time out.

Here's a snippet from a local bake I ran, with some added logging:

node itsJustJavascript/baker/buildLocalBake.js --steps gdocPosts gdriveImages
Baking site locally with baseUrl 'http://localhost:3000/' to dir 'localBake'
--- BakeAll [=---------] 1/10 0.0s ✅ cache flushed
--- BakeAll [==--------] 2/10 0.3s ✅ Prefetched 532 gdocs
--- BakeAll [===-------] 3/10 0.1s ✅ Prefetched 1442 images
--- BakeAll [====------] 4/10 8.3s ✅ Prefetched 45 explorers
--- BakeAll [=====-----] 5/10 116.5s ✅ Prefetched 6179 charts
TypeError: fetch failed at async fetchS3MetadataByPath
  code: 'UND_ERR_CONNECT_TIMEOUT'

Given that we're currently only using linkedIndicators for 8 key-indicators on the homepage, this seems like a grossly counterproductive optimization 😅

I propose that for charts and linkedIndicators, we analyze our gdocs, compute the attachments we need, and only prefetch those. That way we still only have to handle things inside the prefetchAttachments function, but we don't have to worry about it shooting us in the foot if we happen to add a bunch more datapages.
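As a rough sketch of what that could look like (not the actual baker code): the `GdocLike` shape, its `linkedChartSlugs` / `linkedIndicatorIds` fields, and the `collectRequiredAttachments` and `prefetchOnly` helpers below are all hypothetical names made up for illustration, and `fetchOne` stands in for whatever DB/S3 lookup the real prefetch does.

```typescript
// Hypothetical shape: the real gdoc types in the codebase will differ.
interface GdocLike {
    id: string
    linkedChartSlugs: string[]
    linkedIndicatorIds: number[]
}

interface RequiredAttachments {
    chartSlugs: Set<string>
    indicatorIds: Set<number>
}

// Walk every gdoc once and collect the union of what they reference.
// Using Sets means an attachment shared by many gdocs is only listed once.
function collectRequiredAttachments(gdocs: GdocLike[]): RequiredAttachments {
    const required: RequiredAttachments = {
        chartSlugs: new Set<string>(),
        indicatorIds: new Set<number>(),
    }
    for (const gdoc of gdocs) {
        for (const slug of gdoc.linkedChartSlugs) required.chartSlugs.add(slug)
        for (const id of gdoc.linkedIndicatorIds) required.indicatorIds.add(id)
    }
    return required
}

// Prefetch only the given keys and return them as a cache.
// `fetchOne` is a placeholder for the real (assumed) DB/S3 fetch.
async function prefetchOnly<K, V>(
    keys: readonly K[],
    fetchOne: (key: K) => Promise<V>
): Promise<Map<K, V>> {
    const cache = new Map<K, V>()
    for (const key of keys) {
        cache.set(key, await fetchOne(key))
    }
    return cache
}
```

Inside prefetchAttachments this would amount to calling `collectRequiredAttachments` on the prefetched gdocs and then passing `Array.from(required.chartSlugs)` and `Array.from(required.indicatorIds)` to `prefetchOnly`, so the amount of S3 traffic scales with what the gdocs reference rather than with the total number of charts and indicators.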
