Prefetching in the SiteBaker is too eager #3514

ikesau · 2024-04-18T20:04:45Z

The idea behind prefetched attachments was that instead of each of our ~530 gdocs fetching all their own (often identical) attachments from the DB and S3 in series, we could prefetch and cache them.

But at the moment we're doing this naively, and selecting ALL the data even though much of it isn't used in any gdoc.

This is especially painful as we prefetch ~6000 charts and ~1100 indicators, which require calls to S3, take upwards of 4 minutes, and frequently time out.

Here's a snippet from a local bake I ran, with some added logging:

node itsJustJavascript/baker/buildLocalBake.js --steps gdocPosts gdriveImages
Baking site locally with baseUrl 'http://localhost:3000/' to dir 'localBake'
--- BakeAll [=---------] 1/10 0.0s ✅ cache flushed
--- BakeAll [==--------] 2/10 0.3s ✅ Prefetched 532 gdocs
--- BakeAll [===-------] 3/10 0.1s ✅ Prefetched 1442 images
--- BakeAll [====------] 4/10 8.3s ✅ Prefetched 45 explorers
--- BakeAll [=====-----] 5/10 116.5s ✅ Prefetched 6179 charts
TypeError: fetch failed at async fetchS3MetadataByPath
  code: 'UND_ERR_CONNECT_TIMEOUT'

Given that we're currently only using linkedIndicators for 8 key-indicators on the homepage, this seems like a grossly counterproductive optimization 😅

I propose that for charts and linkedIndicators, we analyze our gdocs, compute the attachments we need, and only prefetch those. That way we still only have to handle things inside the prefetchAttachments function, but we don't have to worry about it shooting us in the foot if we happen to add a bunch more datapages.
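
As a rough illustration of that proposal, here is a minimal sketch. It assumes each gdoc can report the chart slugs and indicator IDs it links to; `GdocLike`, `collectRequiredAttachments`, `prefetchOnlyRequired`, and the `fetchOne` callback are hypothetical names for illustration, not the actual SiteBaker API. The real change would live inside `prefetchAttachments`.

```typescript
// Hypothetical shape: the real gdoc types in the codebase may differ.
interface GdocLike {
    id: string
    linkedChartSlugs: string[]
    linkedIndicatorIds: number[]
}

interface RequiredAttachments {
    chartSlugs: Set<string>
    indicatorIds: Set<number>
}

// Walk every gdoc once and collect the de-duplicated set of attachments
// that are actually referenced somewhere.
function collectRequiredAttachments(gdocs: GdocLike[]): RequiredAttachments {
    const chartSlugs = new Set<string>()
    const indicatorIds = new Set<number>()
    gdocs.forEach((gdoc) => {
        gdoc.linkedChartSlugs.forEach((slug) => chartSlugs.add(slug))
        gdoc.linkedIndicatorIds.forEach((id) => indicatorIds.add(id))
    })
    return { chartSlugs, indicatorIds }
}

// Prefetch only the required keys into a cache. `fetchOne` stands in for
// whatever DB or S3 lookup is needed per attachment (an assumption here).
// Fetches run one at a time to keep the sketch simple.
async function prefetchOnlyRequired<K, V>(
    keys: ReadonlySet<K>,
    fetchOne: (key: K) => Promise<V>
): Promise<Map<K, V>> {
    const cache = new Map<K, V>()
    for (const key of Array.from(keys)) {
        cache.set(key, await fetchOne(key))
    }
    return cache
}
```

With this shape, the number of S3 calls scales with what the gdocs reference rather than with the total number of charts and indicators in the database.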

ikesau added the bug, needs triage, and site labels on Apr 18, 2024

ikesau mentioned this issue Apr 18, 2024

Don't fetch image metadata during Algolia sync (with tags) #3515

Merged

marcelgerber added the perf label Apr 19, 2024

danyx23 mentioned this issue May 7, 2024

Knex migration slowed down content bakes #3462

Closed

danyx23 added the priority 3 - nice to have label and removed the needs triage label on May 8, 2024
