indexer/walker: Avoid running jobs where not needed #1006

Merged: 7 commits, merged Aug 10, 2022


@radeksimko (Member) commented Jul 19, 2022

Closes #1002

Depends on #1027


Background

Removing de-duplication at job enqueue time is necessary to avoid various bugs and race conditions in which we would deduplicate jobs that appear to be the same but have different Defer or DependsOn values. In other words, the de-duplication was overly aggressive.
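To illustrate the problem, here is a minimal sketch of enqueue-time de-duplication that identifies a job only by its module directory and job type. The `Job` type, its fields, `dedupKey` and `naiveQueue` are hypothetical simplifications, not the terraform-ls job store API. The point is that two jobs with the same key can still differ in Defer or DependsOn, so dropping the second one silently loses its dependencies and deferred work.

```go
package main

import "fmt"

// Job is a hypothetical, simplified job record (not the real terraform-ls type).
type Job struct {
	Dir       string   // module directory the job operates on
	Type      string   // job type, e.g. "parse"
	DependsOn []string // IDs of jobs that must finish first
	Defer     func()   // follow-up work to run after the job (nil = none)
}

// dedupKey identifies a job by directory and type only. This is the overly
// aggressive part: DependsOn and Defer are ignored.
func dedupKey(j Job) string {
	return j.Dir + "|" + j.Type
}

// naiveQueue drops any job whose key has already been queued.
type naiveQueue struct {
	seen map[string]bool
	jobs []Job
}

// Enqueue reports whether the job was actually queued.
func (q *naiveQueue) Enqueue(j Job) bool {
	k := dedupKey(j)
	if q.seen[k] {
		return false // treated as a duplicate and dropped
	}
	q.seen[k] = true
	q.jobs = append(q.jobs, j)
	return true
}

func main() {
	q := &naiveQueue{seen: map[string]bool{}}
	walkerJob := Job{Dir: "./mod", Type: "parse"}
	openJob := Job{
		Dir:       "./mod",
		Type:      "parse",
		DependsOn: []string{"job-1"},
		Defer:     func() { fmt.Println("decode references") },
	}
	fmt.Println(q.Enqueue(walkerJob)) // true
	fmt.Println(q.Enqueue(openJob))   // false: its DependsOn and Defer are lost
	fmt.Println(len(q.jobs))          // 1
}
```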

It is worth clarifying that this PR doesn't affect jobs which run as part of text synchronization, i.e. textDocument/didOpen, textDocument/didChange or workspace/didChangeWatchedFiles. It only aims to reduce duplicated work done by the walker (which is triggered by initialize). That duplication occurs when the user has a deep workspace which takes a while to index via the walker, and opens a few not-yet-indexed modules, which get indexed as a priority; the walker would then index those modules again.
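The walker-side idea can be sketched roughly as follows, assuming a hypothetical `indexState` that records which module directories have already been indexed (for example because the user opened them and they were indexed as a priority). This is an illustrative sketch, not the code from this PR: `indexState`, `walk` and the `enqueue` callback are made-up names. The walker consults the state before enqueuing and skips directories that need no further work.

```go
package main

import (
	"fmt"
	"sync"
)

// indexState is a hypothetical record of module directories that have
// already been indexed (e.g. by didOpen-triggered priority indexing).
type indexState struct {
	mu      sync.Mutex
	indexed map[string]bool
}

func newIndexState() *indexState {
	return &indexState{indexed: make(map[string]bool)}
}

func (s *indexState) MarkIndexed(dir string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.indexed[dir] = true
}

func (s *indexState) IsIndexed(dir string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.indexed[dir]
}

// walk enqueues indexing for every discovered module directory unless it
// has already been indexed, and returns the directories it enqueued.
func walk(dirs []string, state *indexState, enqueue func(dir string)) []string {
	var enqueued []string
	for _, dir := range dirs {
		if state.IsIndexed(dir) {
			continue // already indexed, nothing to do
		}
		enqueue(dir)
		enqueued = append(enqueued, dir)
	}
	return enqueued
}

func main() {
	state := newIndexState()
	// The user opened ./b before the walker reached it.
	state.MarkIndexed("./b")

	enqueued := walk([]string{"./a", "./b", "./c"}, state, func(dir string) {
		fmt.Println("indexing", dir)
		state.MarkIndexed(dir)
	})
	fmt.Println(enqueued) // [./a ./c]
}
```

Text synchronization events would bypass this check entirely, since for those we know (or assume) that something has changed.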

For textDocument/didChange and workspace/didChangeWatchedFiles, the jobs are expected to run, since we know something has changed. We could attempt some de-duplication there as well, but it would probably have low impact.

For textDocument/didOpen, we currently treat any opened document as a change, because we have no way to tell whether its contents match what is on disk. IMO this is a common case: the user starts without any open files, the walker indexes everything, then they open a module and we re-index the whole module, and we do it again for every single file they open. Reducing duplicated work here is a little more involved, so I filed a separate ticket for it: #1031

Benchmarks

I am (finally) updating the benchmarks as part of this PR. The numbers may suggest that this PR has a negative performance impact, but it's a little more complicated than that: they really reflect several PRs, each contributing to the numbers in a slightly different way:

@radeksimko added the enhancement (New feature or request) label Jul 19, 2022
@radeksimko added this to the v0.29.0 milestone Jul 19, 2022
@radeksimko force-pushed the f-job-ignore-state branch 2 times, most recently from 7cf644d to 044747d, July 19, 2022 11:42
@radeksimko self-assigned this Jul 19, 2022
@radeksimko force-pushed the f-job-ignore-state branch 7 times, most recently from e27d3d0 to e774671, August 9, 2022 11:10
@radeksimko marked this pull request as ready for review August 9, 2022 13:01
@radeksimko requested a review from a team as a code owner August 9, 2022 13:01
@radeksimko changed the title from "indexer: Avoid running jobs where not needed" to "indexer/walker: Avoid running jobs where not needed" Aug 9, 2022
@dbanck (Member) left a comment

Nice work and thanks for the detailed PR description! 👍

I found just two minor things:

internal/indexer/walker.go (outdated, resolved)
internal/indexer/watcher.go (outdated, resolved)
@radeksimko (Member, Author)

@dbanck I found a slightly different solution, although the difference is really minor given how rarely (if ever) scheduling a job would fail; such a failure would have something to do with memdb or the underlying indexes. So if/when that happens, we would more likely be unable to schedule any jobs at all, and most of this conditional logic will likely never come into play.

PTAL
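The kind of conditional logic discussed in this comment can be sketched as follows: a dependent job only lists the IDs of jobs that were actually scheduled, so a failed enqueue never leaves an empty or dangling dependency. This is a hypothetical illustration, not the code from this PR; `jobStore`, `EnqueueJob`, `scheduleModule` and the "parse"/"decode" job types are made-up stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// JobID is a hypothetical job identifier type.
type JobID string

// jobStore is a hypothetical in-memory stand-in for a job store.
type jobStore struct {
	next      int
	failTypes map[string]bool   // job types whose enqueuing fails, for illustration
	dependsOn map[JobID][]JobID // recorded dependencies of each scheduled job
}

func newJobStore(failTypes ...string) *jobStore {
	s := &jobStore{failTypes: map[string]bool{}, dependsOn: map[JobID][]JobID{}}
	for _, t := range failTypes {
		s.failTypes[t] = true
	}
	return s
}

// EnqueueJob schedules a job and returns its ID, or an error if scheduling fails.
func (s *jobStore) EnqueueJob(jobType string, deps []JobID) (JobID, error) {
	if s.failTypes[jobType] {
		return "", errors.New("unable to schedule " + jobType)
	}
	s.next++
	id := JobID(fmt.Sprintf("job-%d", s.next))
	s.dependsOn[id] = deps
	return id, nil
}

// scheduleModule enqueues a parse job and a decode job that depends on it.
// The check happens at creation time: the dependency is only recorded when
// the parse job was actually scheduled.
func scheduleModule(s *jobStore) (JobID, error) {
	var deps []JobID
	if parseID, err := s.EnqueueJob("parse", nil); err == nil {
		deps = append(deps, parseID)
	}
	return s.EnqueueJob("decode", deps)
}

func main() {
	ok := newJobStore()
	id, _ := scheduleModule(ok)
	fmt.Println(id, ok.dependsOn[id]) // job-2 [job-1]

	failing := newJobStore("parse")
	id, _ = scheduleModule(failing)
	fmt.Println(id, failing.dependsOn[id]) // job-1 []
}
```

Every call site that builds DependsOn needs a check like this, which is the repetition the follow-up suggestion below aims to remove.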

@radeksimko requested a review from dbanck August 10, 2022 13:43
@dbanck (Member) commented Aug 10, 2022

(not meant as part of this PR)

Maybe we could avoid all the checks when creating jobs and continue on empty job IDs here:

```go
dependsOn := make(job.IDs, 0)
for _, jobId := range newJob.DependsOn {
	// Only keep dependencies which haven't finished yet
	isDone, err := js.isJobDone(txn, jobId)
	if err != nil {
		return "", err
	}
	if !isDone {
		dependsOn = append(dependsOn, jobId)
	}
}
```

@radeksimko (Member, Author)

> Maybe we could avoid all the checks when creating jobs and continue on empty job IDs here

Hmm, I like that idea! 🤔

@github-actions

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked this pull request as resolved and limited conversation to collaborators Sep 10, 2022
Labels: enhancement (New feature or request)
Linked issue: terraform/module: Avoid running a job if nothing has changed
Participants: 2