
fix(gatsby-source-drupal): cache backlink records #33444

Merged (12 commits) on Oct 25, 2021
10 changes: 10 additions & 0 deletions packages/gatsby-source-drupal/src/__tests__/index.js
@@ -20,6 +20,15 @@ jest.mock(`gatsby-source-filesystem`, () => {
}
})

function makeCache() {
const store = new Map()
return {
get: async id => store.get(id),
set: async (key, value) => store.set(key, value),
store,
}
}
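The `makeCache` helper above mocks Gatsby's plugin cache: an async `get`/`set` pair backed by a plain `Map`, which is all the plugin code needs. A minimal sketch of how a test can exercise it (the key and value here are illustrative, not from the PR):

```javascript
// Async cache mock, mirroring the makeCache test helper above.
function makeCache() {
  const store = new Map()
  return {
    get: async id => store.get(id),
    set: async (key, value) => store.set(key, value),
    store,
  }
}

async function roundTrip() {
  const cache = makeCache()
  // Values are stored as-is; the plugin stores pre-serialized JSON strings.
  await cache.set(`backRefsNamesLookup`, `[["a",["b"]]]`)
  return cache.get(`backRefsNamesLookup`)
}
```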

const normalize = require(`../normalize`)
const downloadFileSpy = jest.spyOn(normalize, `downloadFile`)

@@ -75,6 +84,7 @@ describe(`gatsby-source-drupal`, () => {
store,
getNode: id => nodes[id],
getNodes,
cache: makeCache(),
}

beforeAll(async () => {
9 changes: 9 additions & 0 deletions packages/gatsby-source-drupal/src/gatsby-node.js
@@ -12,6 +12,8 @@ const { setOptions, getOptions } = require(`./plugin-options`)

const { nodeFromData, downloadFile, isFileNode } = require(`./normalize`)
const {
initRefsLookups,
storeRefsLookups,
handleReferences,
handleWebhookUpdate,
handleDeletedNode,
@@ -150,6 +152,8 @@ exports.sourceNodes = async (
} = pluginOptions
const { createNode, setPluginStatus, touchNode } = actions

await initRefsLookups({ cache, getNode })

// Update the concurrency limit from the plugin options
requestQueue.concurrency = concurrentAPIRequests

@@ -202,6 +206,7 @@ ${JSON.stringify(webhookBody, null, 4)}`
}

changesActivity.end()
await storeRefsLookups({ cache })
return
}

@@ -232,6 +237,7 @@ ${JSON.stringify(webhookBody, null, 4)}`
return
}
changesActivity.end()
await storeRefsLookups({ cache })
return
}

@@ -362,6 +368,7 @@ ${JSON.stringify(webhookBody, null, 4)}`

drupalFetchIncrementalActivity.end()
fastBuildsSpan.finish()
await storeRefsLookups({ cache })
return
}

@@ -372,6 +379,7 @@ ${JSON.stringify(webhookBody, null, 4)}`
initialSourcing = false

if (!requireFullRebuild) {
await storeRefsLookups({ cache })
return
}
}
@@ -635,6 +643,7 @@ ${JSON.stringify(webhookBody, null, 4)}`
initialSourcing = false

createNodesSpan.finish()
await storeRefsLookups({ cache, getNodes })
return
}

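The diff above follows one pattern throughout `sourceNodes`: hydrate the in-memory lookups once at the start, then persist them on every return path. A condensed sketch of that lifecycle, assuming a simplified cache and a single lookup Map (the `refsLookup` key and node names are mine, not from the PR):

```javascript
// Sketch of the load-once / persist-on-every-exit pattern the PR applies
// to sourceNodes. makeCache stands in for Gatsby's plugin cache.
function makeCache() {
  const store = new Map()
  return {
    get: async key => store.get(key),
    set: async (key, value) => store.set(key, value),
  }
}

let refsLookup = new Map()

async function initRefsLookups({ cache }) {
  // Hydrate the in-memory Map from the previous build, if present.
  const str = await cache.get(`refsLookup`)
  if (str) {
    refsLookup = new Map(JSON.parse(str))
  }
}

async function storeRefsLookups({ cache }) {
  // Persist the Map as JSON so the next build can reuse it.
  await cache.set(`refsLookup`, JSON.stringify(Array.from(refsLookup.entries())))
}

async function sourceNodes({ cache }) {
  await initRefsLookups({ cache })
  refsLookup.set(`node-1`, [`backRef`])
  // Every return path must persist, or the next build starts cold.
  await storeRefsLookups({ cache })
}
```

Missing `storeRefsLookups` on any early `return` would silently drop the backlink records for the next incremental build, which is why the diff touches each exit point.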
38 changes: 35 additions & 3 deletions packages/gatsby-source-drupal/src/utils.js
@@ -9,8 +9,38 @@ const {

const { getOptions } = require(`./plugin-options`)

const backRefsNamesLookup = new Map()
const referencedNodesLookup = new Map()
let backRefsNamesLookup = new Map()
let referencedNodesLookup = new Map()

const initRefsLookups = async ({ cache }) => {
const backRefsNamesLookupStr = await cache.get(`backRefsNamesLookup`)
const referencedNodesLookupStr = await cache.get(`referencedNodesLookup`)

if (backRefsNamesLookupStr) {
backRefsNamesLookup = new Map(JSON.parse(backRefsNamesLookupStr))
}

if (referencedNodesLookupStr) {
referencedNodesLookup = new Map(JSON.parse(referencedNodesLookupStr))
Comment on lines +20 to +24

Contributor:
I wonder if this can become very memory expensive. Does this run on PSU without problems? If not we should look into http://ndjson.org/

Contributor (author):
I tested it on a smaller site, but good point; let me test how expensive this is for a PSU-sized site.

Contributor (author):
RSS increases by about 400 MB while stringifying the two maps into a total of ~85 million characters. Execution time is ~600 ms.

I think that's reasonable given how big PSU is: they're already using ~6.5 GB at this point, so this is a temporary ~15% bump in memory use that will be GCed fairly soon afterwards.

}
}
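The reviewer's NDJSON suggestion above is an alternative to one giant `JSON.stringify` call: serialize one Map entry per line, so each line can be written and parsed independently. A hypothetical sketch (function names are mine, not from the PR; a real implementation would stream lines to disk rather than concatenate a string, which is where the memory savings would come from):

```javascript
// One Map entry per NDJSON line: JSON.stringify([key, value]) + "\n".
function mapToNdjson(map) {
  let out = ``
  for (const entry of map) {
    out += JSON.stringify(entry) + `\n`
  }
  return out
}

// Rebuild the Map line by line; blank lines are skipped.
function ndjsonToMap(str) {
  const map = new Map()
  for (const line of str.split(`\n`)) {
    if (line) {
      const [key, value] = JSON.parse(line)
      map.set(key, value)
    }
  }
  return map
}
```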

exports.initRefsLookups = initRefsLookups

const storeRefsLookups = async ({ cache }) => {
await Promise.all([
cache.set(
`backRefsNamesLookup`,
JSON.stringify(Array.from(backRefsNamesLookup.entries()))
),
cache.set(
`referencedNodesLookup`,
JSON.stringify(Array.from(referencedNodesLookup.entries()))
),
])
}

exports.storeRefsLookups = storeRefsLookups
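The `Array.from(map.entries())` dance in `storeRefsLookups` exists because `JSON.stringify` on a `Map` yields `"{}"` (a Map has no own enumerable properties), so the entries must be serialized as an array of `[key, value]` pairs and the Map rebuilt from that array on the next build. A round-trip sketch with an illustrative key (not taken from the PR):

```javascript
// A backlink lookup: node id -> field names that reference it.
const backRefsNamesLookup = new Map([
  [`node--article/uuid-1`, [`field_related`]],
])

// storeRefsLookups direction: Map -> JSON string of [key, value] pairs.
const serialized = JSON.stringify(Array.from(backRefsNamesLookup.entries()))

// initRefsLookups direction: JSON string -> Map (the Map constructor
// accepts any iterable of [key, value] pairs).
const restored = new Map(JSON.parse(serialized))
```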

const handleReferences = (
node,
@@ -333,7 +363,9 @@ ${JSON.stringify(nodeToUpdate, null, 4)}
}
node.internal.contentDigest = createContentDigest(node)
createNode(node)
reporter.log(`Updated Gatsby node: ${node.id}`)
reporter.log(
`Updated Gatsby node: id: ${node.id} — type: ${node.internal.type}`
)
}
}
