feat(gatsby): Schema rebuilding #19092

Merged
merged 54 commits into from Nov 19, 2019

Conversation

@vladar
Contributor

vladar commented Oct 28, 2019

Description

Enables schema rebuilding so that gatsby develop restart is not required when nodes change. It should also help with incremental builds.

This whole feature is related to types and fields created via inference. Types and fields created via schema customization are considered static and shouldn't rebuild.

TODO

  • Initial PoC
  • Remove stale inferred fields
  • Handle ADD_FIELD_TO_NODE in the reducer
  • Rework existing tests to use new APIs (utilizing metadata vs. list of nodes for inferring)
  • Derived types (filter/sort types, nested inferred types, etc.)
  • Root fields / arguments
  • Handle implicit parent-child relations and ADD_CHILD_NODE_TO_PARENT_NODE
  • Updating extensions for node and derived types
  • New tests for this specific feature (relied on existing tests so far which are green)
  • Make sure types and fields created via schema customization are not rebuilt
  • Move schema customization before node sourcing (to skip inference for types marked with @dontInfer)
  • Parse type defs before dispatching CREATE_TYPES action
  • Fix eslint to pick up new schema after rebuild
  • Re-run queries after rebuild (as they may be broken after schema change)
  • Improve dirty-checks to minimize false-positives
  • Test with multiple medium to big size projects

Use cases supported in develop

  1. Adding new node types
  2. Adding nodes with new or structurally modified fields
  3. Modifying nodes (adding new fields, changing field types, etc.)
  4. Deleting fields from node types
  5. Deleting a type when no nodes or fields remain that could produce it

In these cases, the schema rebuilds without a restart. Non-structural node changes (e.g., nodes added with the same structure) do not trigger a rebuild.

Inference Refactoring

These changes required refactoring of the inference process (or more specifically - getExampleValue).

Before this change, getExampleValue looped through all the nodes of a given type to construct an example value, which was later used by type inference. This approach doesn't play well with incremental schema rebuilding because creating the example value takes O(N*M) time (where N is the number of node fields, including nested fields, and M is the number of nodes of the given type).

After this PR, getExampleValue uses node metadata that is stored in the redux store and updated on every node-related action. Initial metadata building during bootstrap is still O(N*M), but getExampleValue is now O(N), which makes it reasonably fast for incrementally rebuilding the example value.
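
The idea can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: the helper names typeOf and updateMetadata are hypothetical, only scalar fields are handled, and the { total, example } shape is borrowed from the usage comments in the diff.

```javascript
// Hypothetical sketch, not Gatsby's actual implementation.

// Returns the metadata key for a scalar value, or null for anything else.
function typeOf(value) {
  if (typeof value === `string`) return `string`
  if (typeof value === `boolean`) return `boolean`
  if (typeof value === `number`) return Number.isInteger(value) ? `int` : `float`
  return null // nested objects and arrays are omitted to keep the sketch short
}

// Updates metadata for one node: delta is +1 when a node is added, -1 when deleted.
// The cost depends only on the fields of this node, not on the number of nodes.
function updateMetadata(metadata, node, delta) {
  for (const [field, value] of Object.entries(node)) {
    const type = typeOf(value)
    if (type === null) continue
    const byType = (metadata[field] = metadata[field] || {})
    const entry = (byType[type] = byType[type] || { total: 0, example: value })
    entry.total += delta
    // Note: the stored example is not refreshed when its source node is deleted.
    if (entry.total <= 0) delete byType[type]
    if (Object.keys(byType).length === 0) delete metadata[field]
  }
  return metadata
}

// Builds the example value from metadata alone: O(N) in the number of fields.
function getExampleValue(metadata) {
  const example = {}
  for (const [field, byType] of Object.entries(metadata)) {
    const types = Object.keys(byType)
    // A field seen with more than one type is treated as a conflict and skipped.
    if (types.length === 1) example[field] = byType[types[0]].example
  }
  return example
}
```

Because every node action only touches the counters of its own fields, the expensive pass over all nodes happens once during bootstrap, and later rebuilds read the already aggregated metadata.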

Caveat: conflict tracking

Conflict tracking for arrays is tricky. For example, a single node { a: [5, "foo"] } and a pair of nodes { a: [5] }, { a: ["foo"] } are represented identically in metadata and reported identically. To work around this, we additionally track the first node id for each type:

{ a: { array: { item: { int: { total: 1, first: "1" }, string: { total: 1, first: "1" } } } } }
{ a: { array: { item: { int: { total: 1, first: "1" }, string: { total: 1, first: "2" } } } } }

This helps produce more useful conflict reports. Rare edge cases remain where the report may be confusing, e.g. when a node is deleted.
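
A minimal sketch of how the first node id could distinguish the two cases above. The function names are hypothetical, and counting each item type once per node is an assumption made for this illustration.

```javascript
// Hypothetical sketch: tracks only scalar item types inside one array field.
function itemType(value) {
  if (typeof value === `string`) return `string`
  if (typeof value === `number`) return Number.isInteger(value) ? `int` : `float`
  return typeof value
}

// Records the item types of one node's array, remembering the first node id per type.
function addArrayItems(itemMeta, nodeId, items) {
  // Assumption: each item type is counted at most once per node.
  for (const type of new Set(items.map(itemType))) {
    const entry = (itemMeta[type] = itemMeta[type] || { total: 0, first: nodeId })
    entry.total += 1
  }
  return itemMeta
}

// Describes a conflict. A single node id means one node holds a mixed array;
// several ids mean different nodes disagree about the item type.
function describeConflict(itemMeta) {
  const types = Object.keys(itemMeta)
  if (types.length < 2) return null
  const nodeIds = [...new Set(types.map(type => itemMeta[type].first))]
  return { types, nodeIds }
}
```

With this shape, a report can name either one offending node or one node per conflicting type, instead of treating both situations the same way.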

Caveat: dirty checking

Some plugins delete nodes and then re-create them on any change, so metadata is marked as dirty even when its final state is identical. To avoid needless rebuilds, we additionally compare the dirty metadata between calls and skip rebuilding if the inferred structure is the same.
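
One way such a comparison could look, as a rough sketch: structureOf and createRebuildCheck are hypothetical names, and the metadata is assumed to be a flat map of field name to per-type counters.

```javascript
// Hypothetical sketch: reduces metadata to its structure (field names and types),
// ignoring counters and example values, so that delete + re-create cycles that end
// in the same shape produce the same snapshot.
function structureOf(metadata) {
  return JSON.stringify(
    Object.keys(metadata)
      .sort()
      .map(field => [field, Object.keys(metadata[field]).sort()])
  )
}

function createRebuildCheck() {
  let lastStructure = null
  // Returns true only when the inferred structure differs from the previous call.
  return function needsRebuild(metadata) {
    const structure = structureOf(metadata)
    const changed = structure !== lastStructure
    lastStructure = structure
    return changed
  }
}
```

The first call always reports a change because there is no earlier snapshot to compare against.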

Caveat: rebuild granularity

Currently, we rebuild the full schema instance from scratch (but only when there are structural changes). Granular updates are complicated because graphql-compose wasn't designed for mutations like this: it is great for complex schema builds but provides little help when you delete or modify types / fields / arguments.

Possible follow-ups

  • Rebuild granularity (see caveat above). This will likely require some coordination with the graphql-compose author or our own type/field/arg dependency tracking. But the complexity involved might not be worth the effort.

  • We could use metadata directly for inference (vs. the example value). Metadata has more information for inference or granular rebuilds. For example, we could handle data conflicts more gently: say, when < 1% of nodes have a conflicting field type, add the majority field and warn about the specific conflicting node id (currently we remove the field on conflicts and report both nodes).

  • Investigate webpack invalidation on schema change, see #19092 (review)
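
The gentler conflict handling mentioned in the second follow-up could be sketched like this. It is only an illustration of the proposal, not something this PR implements: resolveFieldType is a hypothetical name, and the per-type { total, first } counters are assumed to match the metadata shape described in the conflict tracking caveat.

```javascript
// Hypothetical sketch; the 1% default threshold comes from the example in the text.
function resolveFieldType(byType, threshold = 0.01) {
  const entries = Object.entries(byType)
  const total = entries.reduce((sum, [, meta]) => sum + meta.total, 0)
  if (total === 0) return { type: null, conflicts: [] }

  // Pick the type seen on the most nodes.
  const [majorityType, majorityMeta] = entries.reduce((best, current) =>
    current[1].total > best[1].total ? current : best
  )
  // Everything else is a conflict worth warning about, with one node id each.
  const conflicts = entries
    .filter(([type]) => type !== majorityType)
    .map(([type, meta]) => ({ type, nodeId: meta.first }))
  const minorityShare = (total - majorityMeta.total) / total

  // Keep the majority type only when conflicting nodes are rare enough;
  // otherwise fall back to dropping the field (type: null).
  return { type: minorityShare < threshold ? majorityType : null, conflicts }
}
```

For example, 999 nodes with an int value and one node with a string value would keep the field as int and return the string node's id for a warning.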

Related Issues

Fixes #18939

@vladar vladar requested a review from freiksenet Oct 28, 2019
Contributor

freiksenet left a comment

This is so cool! 👍

I think you are missing the child/parent relationship cases, i.e. when a child/parent relationship is added either through an action or by setting the 'parent' field.

We usually test against .org (www folder in monorepo) using gatsby-dev-cli. You can test against multiple sites using develop-runner. You need to modify source code a bit, but it's still very handy.

We often publish a pre-release version for packages with big changes, which you can then use to test with develop-runner. E.g. you can reuse the tag @schema-customization. (To do so, go to gatsby/packages/gatsby and run yarn publish --tag schema-customization.)

@@ -0,0 +1,346 @@
/*

@freiksenet

freiksenet Oct 29, 2019

Contributor

Nice!

const report = require(`gatsby-cli/lib/reporter`)

// API_RUNNING_QUEUE_EMPTY could be emitted multiple times
// in a short period of time, so debounce seems reasonable

@pieh

pieh Oct 29, 2019

Contributor

This is true. We should probably look into debouncing the API_RUNNING_QUEUE_EMPTY event elsewhere: most systems that listen to it also run potentially expensive operations, so a global debounce would help.

@vladar

vladar Nov 5, 2019

Author Contributor

Oops, missed this comment, sorry. It makes sense. But I suggest doing this in a separate PR
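
For reference, a global debounce of this kind could look roughly like the sketch below. The helper debounceEvent and the event name API_RUNNING_QUEUE_EMPTY_DEBOUNCED are hypothetical and not part of Gatsby; the timers argument exists only so the logic can be exercised without real timeouts.

```javascript
const { EventEmitter } = require(`events`)

// Hypothetical helper: re-emits sourceEvent as debouncedEvent once the source
// has been quiet for `wait` milliseconds. Only the last payload is forwarded.
function debounceEvent(
  emitter,
  sourceEvent,
  debouncedEvent,
  wait,
  timers = { setTimeout, clearTimeout }
) {
  let handle = null
  emitter.on(sourceEvent, payload => {
    // Every new event cancels the pending emit and restarts the timer.
    if (handle !== null) timers.clearTimeout(handle)
    handle = timers.setTimeout(() => {
      handle = null
      emitter.emit(debouncedEvent, payload)
    }, wait)
  })
}
```

Expensive listeners would then subscribe to the debounced event instead of the raw one, so a burst of API_RUNNING_QUEUE_EMPTY events triggers them only once.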

// bar: {
// string: { total: 1, example: 'str' },
// },
// }

@pieh

pieh Oct 29, 2019

Contributor

How are nested objects represented here? How are arrays represented?

I.e.
what would

const node2 = { id: '1', nested: { foo: 'bar' }, array: [1, 2, 3, "string"] }

produce

(if this is handled already)

@pieh

pieh Oct 29, 2019

Contributor

Ah there are types below - ignore above ;)

@vladar

vladar Oct 30, 2019

Author Contributor

Yeah, those are just simple usage examples, didn't want to distract with all the recursion stuff here %)

@vladar vladar force-pushed the vladar-rebuild-schema branch from 0149d75 to 0607c9e Nov 12, 2019
@vladar vladar changed the title WIP: schema rebuilding feat(gatsby): Schema rebuilding Nov 13, 2019
@vladar vladar marked this pull request as ready for review Nov 13, 2019
@vladar vladar requested review from gatsbyjs/core as code owners Nov 13, 2019
Member

wardpeet left a comment

I've only looked at the webpack plugin and it looks great! 👏 (well done!)
If we want to invalidate the webpack config when the schema changes, we need to do some more hackery 😂

app.use(
  require(`webpack-dev-middleware`)(compiler, {
    logLevel: `silent`,
    publicPath: devConfig.output.publicPath,
    stats: `errors-only`,
  })
)

The webpack-dev-middleware returns an instance that has an invalidate method. This method tells webpack to invalidate itself which reruns eslint on all watched files.

I'm thinking we should do something like

  const webpackDevMiddleware = require(`webpack-dev-middleware`)(compiler, {
    logLevel: `silent`,
    publicPath: devConfig.output.publicPath,
    stats: `errors-only`,
  });

  app.use(webpackDevMiddleware)
  
  // this should only be triggered when it actually has changed.
  emitter.on('SCHEMA_REBUILD', () => {
    webpackDevMiddleware.invalidate()
  })
Contributor

freiksenet left a comment

LGTM 👍

@vladar

Contributor Author

vladar commented Nov 14, 2019

@wardpeet I tried webpack invalidation, but for some reason eslint-loader does not run after it. For now, we re-run queries here:

await rebuild({ parentSpan: activity })
await updateStateAndRunQueries(false, { parentSpan: activity })

which will fail with an error anyway if the schema change broke a query. I will add invalidation to the possible follow-ups and will probably need your help to debug why it is not working as expected.

@wardpeet

Member

wardpeet commented Nov 14, 2019

Sweet! Sounds great, I'm not 100% sure my pseudo-code was the right call. Happy to debug later :)

@gatsbybot gatsbybot merged commit e4dae4d into master Nov 19, 2019
20 of 21 checks passed
Gatsby Build Service Gatsby Build Service
Danger All good
Peril All green. Yay.
ci/circleci: bootstrap Your tests passed on CircleCI!
ci/circleci: e2e_tests_development_runtime Your tests passed on CircleCI!
ci/circleci: e2e_tests_gatsby-image Your tests passed on CircleCI!
ci/circleci: e2e_tests_path-prefix Your tests passed on CircleCI!
ci/circleci: e2e_tests_production_runtime Your tests passed on CircleCI!
ci/circleci: integration_tests_gatsby_pipeline Your tests passed on CircleCI!
ci/circleci: integration_tests_long_term_caching Your tests passed on CircleCI!
ci/circleci: integration_tests_structured_logging Your tests passed on CircleCI!
ci/circleci: lint Your tests passed on CircleCI!
ci/circleci: starters_validate Your tests passed on CircleCI!
ci/circleci: themes_e2e_tests_development_runtime Your tests passed on CircleCI!
ci/circleci: themes_e2e_tests_production_runtime Your tests passed on CircleCI!
ci/circleci: unit_tests_node10 Your tests passed on CircleCI!
ci/circleci: unit_tests_node12 Your tests passed on CircleCI!
ci/circleci: unit_tests_node8 Your tests passed on CircleCI!
ci/circleci: unit_tests_www Your tests passed on CircleCI!
ci/circleci: windows_unit_tests Your tests passed on CircleCI!
cypress: default-group 83 tests passed in 00:31
@delete-merged-branch delete-merged-branch bot deleted the vladar-rebuild-schema branch Nov 19, 2019
@vladar

Contributor Author

vladar commented Nov 19, 2019

Published in gatsby 2.18.0

@AileenCGN

Contributor

AileenCGN commented Nov 25, 2019

Hey @vladar 👋🏼
I have a question re this task:

Test with multiple medium to big size projects

Has this been done actually? I'm asking because the jump to Gatsby@2.18.0 significantly slowed down the development process for us:

With Gatsby@2.18.0 after changing some markup in a page, createPages takes almost 30s:

success Building development bundle - 22.864s
info added file at /Users/aileen/code/G3/src/pages/members.js
success write out requires - 0.007s
success createPages - 28.390s
success run queries - 0.104s - 0/1 9.58/s
success extract queries from components - 2.147s
success write out requires - 0.007s
success Re-building development bundle - 3.067s
success run queries - 4.639s - 385/385 82.99/s

With Gatsby@2.17.17 after changing some markup in a page, createPages takes 4s:

success Building development bundle - 21.763s
info added file at /Users/aileen/code/G3/src/pages/members.js
success write out requires - 0.006s
success createPages - 3.962s
success run queries - 0.105s - 0/1 9.54/s
success extract queries from components - 2.090s
success write out requires - 0.004s
success Re-building development bundle - 3.149s
success run queries - 4.879s - 385/385 78.90/s
success run queries - 0.078s - 10/10 127.74/s

Is this known? Do you want me to open an issue for that? For now, we'll have to stick with the pre-2.18 version, as developing is kinda unbearable otherwise.

@vladar

Contributor Author

vladar commented Nov 25, 2019

There are two known performance regressions. One in 2.18.0 after schema rebuilding and another one in 2.18.1 after #17681

I guess we need to figure out which one affects you the most. Could you maybe try the exact version 2.18.0 and 2.18.1 and compare results?

@AileenCGN

Contributor

AileenCGN commented Nov 26, 2019

Oh!! Good call! Sorry, for not being accurate enough 😬

Seems like the longer rebuilding time is actually caused by 2.18.1 👇🏼

With Gatsby@2.18.0 (~3.2s):

info added file at /Users/aileen/code/G3/src/pages/members.js
success write out requires - 0.003s
success createPages - 3.158s
success run queries - 0.103s - 1/1 9.69/s
success extract queries from components - 1.753s
success write out requires - 0.006s
success Re-building development bundle - 2.537s
success run queries - 4.605s - 385/385 83.60/s
success run queries - 0.060s - 10/10 165.95/s

With Gatsby@2.18.1 (~21.8s):

info added file at /Users/aileen/code/G3/src/pages/members.js
success extract queries from components - 0.201s
success write out requires - 0.006s
success createPages - 21.790s
success run queries - 0.118s - 1/1 8.46/s
success write out requires - 0.004s
success Re-building development bundle - 22.950s
success run queries - 4.647s - 385/385 82.84/s
@vladar

Contributor Author

vladar commented Nov 27, 2019

@AileenCGN Potential fix for this regression is published in gatsby 2.18.4 (see PR #19774). Could you try again and maybe post here if it improves things for you?

@AileenCGN

Contributor

AileenCGN commented Nov 28, 2019

@vladar Awesome!! It's fixed now 🎉 Back to old rebuild times now 🤗

@tu4mo


tu4mo commented Dec 11, 2019

Hey @vladar, this seems to have somehow broken gatsby-transformer-react-docgen, see #20043.

9 participants