Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More work to prevent queries from running when there's in-progress node processing #8859

Merged
merged 4 commits into from
Oct 9, 2018

Conversation

KyleAMathews
Copy link
Contributor

@KyleAMathews KyleAMathews commented Oct 6, 2018

@gaearon reported that reactjs.org was still seeing problems with broken queries
when saving a markdown file multiple times in quick succession.

I investigated and confirmed and found two improvements to our query running
system that prevent this. With these fixes, I'm not able to trigger the bug anymore
with VSCode set to autosave every 100ms.

This PR has two fixes:

  • Changing a file can trigger pages to be recreated which often trigger
    TOUCH_NODE which would change the state back to idle. This wasn't accurrate
    as queries haven't yet been run so I added a new state to source-filesystem's
    state machine to track that a query has been queued so we actually wait to emit
    new nodes until queries have run.
  • Pause query running when new nodes are created and resume once the processing
    is finished. A file node could get created which would dirty a page with a
    markdown query and the query sometimes runs before the markdown file node had
    been recreated which results in an empty query result. Pausing and resuming
    queue running fixes that.

After more investigation, it occurred to me that all that was necessary was to pause query running when node processing was happening which is easy to detect. That removed the need for special code in gatsby-source-filesystem & means we'll solve this same problem for any other plugin which might encounter it in the future.

@KyleAMathews KyleAMathews requested a review from a team as a code owner October 6, 2018 02:06
@KyleAMathews
Copy link
Contributor Author

@gatsbyjs/core it'd be great to write tests for handling these coordination problems between nodes being created and queries being run.

@thebigredgeek it'd be great to talk to you sometime about how you solved similar problems in https://github.com/thebigredgeek/papq

@KyleAMathews
Copy link
Contributor Author

Thoughts on how to test this.

Add a new integration test site that has a fake source and transformer plugin. The source plugin emits updates to a node every 10ms and then the transformer plugin transforms the node with an artificial delay of 20ms. There's one page that queries the transformed node. The test could be starting gatsby develop and then after the source plugin has created the node a few times, it waits for the query to finish running and then quits. The test is successful if the resulting query file has the correct info.

We could also just change a markdown file directly but that's not as controllable around timings so probably not what we want.

@@ -61,8 +61,22 @@ const queue = new Queue((plObj, callback) => {
)
}, queueOptions)

// Pause running queries when new nodes are added (processing starts).
emitter.on(`CREATE_NODE`, () => {
queue.pause()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be converted to use xstate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't pause task that is already in progress, right? Just will stop any queued tasks from executing? So there is case where we still will be executing query in parallel to CREATE_NODE processing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Though in practice this is fine as most queries are sync and what's not sync is background jobs or fetching data from elsewhere, neither of which would be affectes by node transformations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but markdown processing is async (and most of reports about problem this PR solves is about markdown). That's why I was asking if we should delay dispatching CREATE_NODE action while query are executed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a query of a markdown node is in progress, it gets the markdown node syncly (this should be a word) at the start so even if a new file node immediately causes the markdown remark node to be deleted, it won't affect the in-progress query.

That being said, querying will eventually be fully async so we'll need a better solution for that. I think the current PR is fine as it's simple and solves the problem. Next action would be to add tests and then ensure we have a solution when we go async e.g. delaying API/redux work work until the current queries are finished.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I was looking in wrong spot - at the transformer level we are already shielded from this issue as node is already passed. But before that we use run-sift to actually get the node and this one is async as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR seems good as-is, just trying to see if we need to pursue further

@KyleAMathews
Copy link
Contributor Author

It could be actually that my adding xstate to gatsby-source-filesystem to only emit new File nodes when the previous work was done was a red herring and all that was necessary to do (as included in this PR) was to pause query running when a new node is created and only resume when plugin work is finished.

I should have a bit of time Monday to test this theory.

@pieh
Copy link
Contributor

pieh commented Oct 8, 2018

I'm thinking if maybe we should add some redux middleware to delay dispatching CREATE_NODE (and other node related actions) while we run queries. Plugins wouldn't need to implement their own state that hooks into gatsby internals and createNode would just be safe for them to use

@KyleAMathews
Copy link
Contributor Author

@pieh turns out it's what I suspected that pausing query running on CREATE_NODE is all that's required... which is both great as this simplifies the problem a lot and annoying as I had been chasing the wrong thing for a while haha :-D

Oh well.

Would appreciate some manual testing from someone to verify this fixes things on e.g. reactjs.org!

@pieh
Copy link
Contributor

pieh commented Oct 9, 2018

Checked reactjs.org before and after. Definitely could reproduce cannot read frontmatter of null before. Couldn't reproduce it after, so seems like it fixes it. There is no downside to getting this in, so let's merge it!

@pieh pieh merged commit 00eeef0 into master Oct 9, 2018
@pieh pieh deleted the source-file branch October 9, 2018 19:44
KyleAMathews added a commit to KyleAMathews/reactjs.org that referenced this pull request Oct 11, 2018
This includes a number of fixing including this PR which prevents errors
during development when editing markdown files gatsbyjs/gatsby#8859
gaearon pushed a commit to reactjs/react.dev that referenced this pull request Oct 11, 2018
This includes a number of fixing including this PR which prevents errors
during development when editing markdown files gatsbyjs/gatsby#8859
pieh pushed a commit that referenced this pull request Oct 26, 2018
m-allanson added a commit to Bouncey/gatsby that referenced this pull request Dec 7, 2018
Mostly this involved including the changes from gatsbyjs#8859 into file-watcher.js
gpetrioli pushed a commit to gpetrioli/gatsby that referenced this pull request Jan 22, 2019
OhIAmFine pushed a commit to OhIAmFine/zh-hans.reactjs.org that referenced this pull request May 6, 2019
This includes a number of fixing including this PR which prevents errors
during development when editing markdown files gatsbyjs/gatsby#8859
mwfrost pushed a commit to mwfrost/gatsby that referenced this pull request Apr 20, 2023
…ss node processing (gatsbyjs#8859)

* More work to prevent queries from running when there's in-progress node processing

* Remove unused code

* Turns out having query running pause on CREATE_NODE is all that's needed

* Make state machine local to each instance of the plugin
mwfrost pushed a commit to mwfrost/gatsby that referenced this pull request Apr 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants