Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document best practice usage of pageContext #6467

Closed
hswolff opened this issue Jul 16, 2018 · 18 comments · Fixed by #23136
Closed

Document best practice usage of pageContext #6467

hswolff opened this issue Jul 16, 2018 · 18 comments · Fixed by #23136
Assignees
Labels
type: documentation An issue or pull request for improving or updating Gatsby's documentation

Comments

@hswolff
Copy link

hswolff commented Jul 16, 2018

Edit (by @m-allanson): original issue was solved - this is now a documentation issue. See #6467 (comment) and follow up comments.

Description

Recently upgraded my blog to Gatsby v2 and am seeing it take ~50 seconds longer to build.

Steps to reproduce

This is the PR that upgrades my PR to Gatsby v2: hswolff/blog#4

You can check out master and run install deps and then run yarn build on both branches to see the change in behavior.

@KyleAMathews
Copy link
Contributor

Specifically updating his schema after createPages is taking nearly 2 minutes. Seems like it's hitting some sort of pathological case.

@KyleAMathews
Copy link
Contributor

@pieh could you look into this?

@pieh
Copy link
Contributor

pieh commented Jul 16, 2018

Yeah, I'm on it

@pieh pieh self-assigned this Jul 16, 2018
@pieh
Copy link
Contributor

pieh commented Jul 16, 2018

This is interesting:

===PROFILE===
├─ ROOT: 126363.297062ms | 100.00%
└─ generate schema: 106728.97009ms | 84.46%
   ├─ extractFieldExamples SitePage: 264.611494ms | 0.25%
   ├─ extractFieldExamples SitePlugin: 4.2849900000000005ms | 0.00%
   ├─ extractFieldExamples Site: 0.277943ms | 0.00%
   ├─ extractFieldExamples Directory: 1.577542ms | 0.00%
   ├─ extractFieldExamples File: 55.946252ms | 0.05%
   ├─ extractFieldExamples MarkdownRemark: 35.786373ms | 0.03%
   ├─ buildNodeTypes: 1343.161175ms | 1.26%
   ├─ buildNodeConnections: 217.83265400000002ms | 0.20%
   └─ new GraphQLSchema: 105165.92856ms | 98.54%

The slowdown is in this part:

const schema = new GraphQLSchema({
query: new GraphQLObjectType({
name: `RootQueryType`,
fields: { ...connections, ...nodes },
}),
})

Will update when I get to the bottom of it

@pieh
Copy link
Contributor

pieh commented Jul 16, 2018

https://github.com/hswolff/blog/blob/4d9410166b7d0cc2da0e4ed89fc794ffce0107a0/gatsby-node.js#L152-L165

If you change it like that:

  edges.forEach(({ node }) => {
    if (node.frontmatter.tags) {
      node.frontmatter.tags.forEach(tag => {
        if (!tags[tag]) {
          tags[tag] = {
            name: tag,
            slug: createFullUrl(`tag/${tag}`),
-            nodes: [],
+            nodes___NODE: [],
          };
        }
-        tags[tag].nodes.push(node);
+        tags[tag].nodes___NODE.push(node.id);
      });
    }
  });

Before:

createNodeFields SitePage: 106747.933646ms | 98.17%

After:

createNodeFields SitePage: 1082.52279ms | 54.36%

Not sure why this is different from v1 to be honest ... there should be no difference in this regard

@pieh
Copy link
Contributor

pieh commented Jul 16, 2018

So this is interesting:
v1vsv2
longer schema generation is because v2 have this fileAbsolutePath field (not there in v1) which we will do intensive processing to try to link to File node (95% of overall schema generation time is spent in FileType.shouldInfer). It also takes very long in general because tags is object with over 1000 fields and for each of those fields we do separate FileType.shouldInfer call

@pieh
Copy link
Contributor

pieh commented Jul 17, 2018

I've opened PR that fixes this.

But really full data shouldn't be passed by context to page - you should only pass minimal needed data ( like ids or slugs) to be used in page queries to get full data there.

@hswolff
Copy link
Author

hswolff commented Jul 17, 2018

Exciting you found a fix! Should be a good thing to document to educate users - that there are ways you can misuse Gatsby resulting in unexpected performance/behavior.

@pieh
Copy link
Contributor

pieh commented Jul 17, 2018

I agree. TBH I thought we had this in docs, but apparently I was wrong.

Looking through docs I see those spots that could be good place for that information:

Additionally - maybe we could try to display warning if passed context is really heavy - just not sure how to determine what's too heavy

@KyleAMathews
Copy link
Contributor

We also might want to start removing context from the page nodes to prevent a schema for being created for them. It's of pretty limited utility I think.

@m-allanson
Copy link
Contributor

@pieh's fix published in:

  • gatsby@2.0.0-beta.46
  • gatsby-source-filesystem@2.0.1-beta.6
  • gatsby-transformer-toml@2.1.1-beta.4

I'm going to re-open this and mark it as a docs issue. See #6467 (comment) and the follow up comments.

@m-allanson m-allanson added help wanted Issue with a clear description that the community can help with. type: documentation An issue or pull request for improving or updating Gatsby's documentation labels Jul 18, 2018
@m-allanson m-allanson reopened this Jul 18, 2018
@m-allanson m-allanson changed the title Gatsby v2 builds site slower than Gatsby v1 Document best practice usage of pageContext Jul 18, 2018
@dlbnco
Copy link
Contributor

dlbnco commented Dec 3, 2018

We also might want to start removing context from the page nodes to prevent a schema for being created for them. It's of pretty limited utility I think.

Is this an issue with gatsby or with how it is implemented? I don't know if it's related, but I just migrated to v2 and now the build is creating empty json files for some pages, and they look like this:

{"pageContext":{}}

I'm creating pages from markdown files in gatsby-node.js using createPage and also setting context for them, and they are working fine. The rest of the pages that are just .js files under /src/pages with no pageContext are the ones showing up with those empty json files.

I found this after testing the page with PageSpeed. They are showing up under the "Serve resources from a consistent URL" recommendation because they are 5 different files with the same content.

@gatsbot
Copy link

gatsbot bot commented Feb 4, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

Thanks for being a part of the Gatsby community! 💪💜

@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Feb 4, 2019
@gatsbot
Copy link

gatsbot bot commented Feb 15, 2019

Hey again!

It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.

Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to reopen this issue or create a new one if you need anything else.

Thanks again for being part of the Gatsby community!

@gatsbot gatsbot bot closed this as completed Feb 15, 2019
@NWRichmond
Copy link
Contributor

NWRichmond commented Feb 7, 2020

I would be very excited to see some documentation on pageContext best practices.

My interest in pageContext best practices is at an all-time-high after reading this blog post, which suggests that it's more performant to stuff data fetched in gatsby-node into pageContext for large sites which rely on queries from external data sources (3rd-party GraphQL APIs etc.). The idea being that you don't want to have hundreds or even thousands of page queries firing off separate requests for data that can be collected in a single query in gatsby-node.

The reasoning makes sense to me at face value, but I'm also not familiar enough with the Gatsby internals to understand the trade-offs. Some guidance in the docs (especially https://www.gatsbyjs.org/docs/creating-and-modifying-pages/, I think) would be a blessing, especially to those of us building very large sites.

@NWRichmond NWRichmond reopened this Feb 7, 2020
@NWRichmond
Copy link
Contributor

NWRichmond commented Feb 7, 2020

But really full data shouldn't be passed by context to page - you should only pass minimal needed data ( like ids or slugs) to be used in page queries to get full data there.

@pieh can you shed some light on why that's the convention?

When it comes to performance, I'm unsure of the material difference between:

  • Fetching all of the data data is gatsby-node.js and sending results to page components via pageContext
  • Fetching identifying data in gatsby-node.js, sending it to the page component via pageContext, then fetching the rest of the data in the page component itself via a Page Query

Ultimately, doesn't the data end up in page-data.json using either approach? I suspect there are implications to passing lots of data via pageContext that I'm not appreciating.

@laurieontech laurieontech removed the stale? Issue that may be closed soon due to the original author not responding any more. label Mar 10, 2020
@marcysutton marcysutton added this to To prioritize in Documentation Roadmap via automation Mar 11, 2020
@marcysutton marcysutton removed the help wanted Issue with a clear description that the community can help with. label Apr 9, 2020
@NWRichmond
Copy link
Contributor

If you have similar questions to the ones I posed above, most of the answers can now be found in the updated Creating and Modifying Pages docs. Many thanks to @Ekwuno for adding this helpful documentation!

With respect to the question about flooding 3rd-party APIs at build time with requests in page or component queries:

My interest in pageContext best practices is at an all-time-high after reading this blog post, which suggests that it's more performant to stuff data fetched in gatsby-node into pageContext for large sites which rely on queries from external data sources (3rd-party GraphQL APIs etc.). The idea being that you don't want to have hundreds or even thousands of page queries firing off separate requests for data that can be collected in a single query in gatsby-node.

One way to avoid this issue is to create a source plugin which will do all of the data fetching at once (in gatsby-node.js) but will expose that data via Gatsby's internal GraphQL node interface so that it can be referenced by many pages or components. If the API that your sourcing data from is a GraphQL API, consider using gatsby-graphql-source-toolkit to scaffold the source plugin.

@wvbe
Copy link

wvbe commented Jan 10, 2021

Thanks for posting that info @NWRichmond. For those of us googling for pageContext/query best practices in a hurry, let me summarise my main takeaways;

  • Both pageContext and query exports provide page templates with information.
  • pageContext is calculated during gatsby-node, which is a phase where multiple pages may be created. This allows you to run a cost-effective "bulk" GraphQL queries and generate many pages out of it, which may save time versus the alternative;
  • The query export might run a similar GraphQL query as you might have done during gatsby-node, but it will do so for every page that is generated. Thus, you may be paying a part of the cost multiple times.
  • As an advantage over pageContext, a page template can hot-reload itself if any of the data returned by the query changes -- not so when using pageContext.

Source, source

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: documentation An issue or pull request for improving or updating Gatsby's documentation
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

10 participants