indexer/walker: Avoid running jobs where not needed #1006

radeksimko · 2022-07-19T09:35:15Z

Depends on #1027

Background

Removing de-duplication on job enqueuing is necessary to avoid various bugs and race conditions where we'd de-duplicate jobs which appear to be the same but have a different Defer or DependsOn, i.e. we'd be overly aggressive in the de-duplication efforts.
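
To illustrate the problem, here is a minimal sketch of an overly aggressive de-duplication on enqueuing. The `Job` and `Queue` types and the `Dir`/`Type` matching rule are hypothetical simplifications for illustration only, not the actual job store in terraform-ls.

```go
package main

import "fmt"

// Job is a hypothetical, simplified stand-in for a queued indexing job.
type Job struct {
	Dir       string   // module directory the job operates on
	Type      string   // kind of work, e.g. "DecodeReferences"
	DependsOn []string // IDs of jobs that must finish first
}

// Queue is a hypothetical in-memory queue used only for illustration.
type Queue struct {
	jobs   map[string]Job
	nextID int
	dedupe bool
}

func NewQueue(dedupe bool) *Queue {
	return &Queue{jobs: make(map[string]Job), dedupe: dedupe}
}

// Enqueue returns the ID of the queued job. With dedupe enabled it returns
// the ID of an existing job with the same Dir and Type, ignoring DependsOn.
func (q *Queue) Enqueue(j Job) string {
	if q.dedupe {
		for id, existing := range q.jobs {
			if existing.Dir == j.Dir && existing.Type == j.Type {
				return id // the new job's DependsOn is silently dropped
			}
		}
	}
	q.nextID++
	id := fmt.Sprintf("job-%d", q.nextID)
	q.jobs[id] = j
	return id
}

func main() {
	first := Job{Dir: "/mod", Type: "DecodeReferences"}
	second := Job{Dir: "/mod", Type: "DecodeReferences", DependsOn: []string{"job-0"}}

	withDedupe := NewQueue(true)
	a := withDedupe.Enqueue(first)
	b := withDedupe.Enqueue(second)
	fmt.Println(a == b, len(withDedupe.jobs)) // true 1

	withoutDedupe := NewQueue(false)
	c := withoutDedupe.Enqueue(first)
	d := withoutDedupe.Enqueue(second)
	fmt.Println(c == d, len(withoutDedupe.jobs)) // false 2
}
```

With de-duplication on, the second job collapses into the first even though it was meant to wait for another job, so the ordering the caller asked for is lost.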

It is worth clarifying that this PR doesn't affect jobs which run as part of text synchronization, i.e. textDocument/didOpen, textDocument/didChange or workspace/didChangeWatchedFiles. It only aims to reduce duplicated work in the walker (which is triggered by initialize). Such duplication occurs when the user has a deep workspace which takes a while to index via the walker and opens a few (as yet unindexed) modules, which get indexed as a priority.

For textDocument/didChange and workspace/didChangeWatchedFiles it is expected that the jobs do need to run since we know something has changed. We could attempt to do some de-duplication there as well, but it's probably going to have low impact.

For textDocument/didOpen we currently treat any opened document as a change, because we have no way to tell whether it matches the contents on disk. This is IMO a common case: the user starts without any open files, the walker indexes everything, then they open any module and we re-index the whole module, and we do it again for every single file they open. Reducing duplicated work here is a little more involved, so I filed a separate ticket for that: #1031

Benchmarks

I am (finally) updating the benchmarks as part of this PR. The new numbers may suggest that this PR has a negative performance impact, but it's a little more complicated, as they really reflect a few PRs, each contributing to those numbers in a slightly different way:

- walker: Index uninitialized modules #997 increased the total number of jobs in most cases
- Parse provider versions from lock file before obtaining schema #1014 reduced time as we no longer run terraform providers schema -json when it's not necessary. Previously we'd run it on every indexed & initialized module.
  - This impacts cases of deep workspace hierarchies with initialized modules which use the same provider versions, which also makes more effective use of the embedded schemas now.
- Pick core schema only based on required_version constraint #1027 removed 1 job (terraform version) entirely from the walker indexing. It now runs only on the text synchronization methods.

dbanck

Nice work and thanks for the detailed PR description! 👍

I've found just two minor things

internal/indexer/walker.go

internal/indexer/watcher.go

radeksimko · 2022-08-10T13:43:54Z

@dbanck I found a slightly different solution, although the difference is really minor given how rarely (if ever) scheduling of a job would fail. Such a failure would have something to do with memdb or the underlying indexes, so if/when it happens, it's more likely we'd be unable to schedule any/all jobs, and most of this conditional logic will likely never come into play.

PTAL

dbanck · 2022-08-10T14:07:24Z

(not meant as part of this PR)

Maybe we could avoid all the checks when creating jobs and continue on empty job IDs here:

terraform-ls/internal/state/jobs.go

Lines 66 to 75 in a2fa3a0

    
    dependsOn := make(job.IDs, 0)
    for _, jobId := range newJob.DependsOn {
    	isDone, err := js.isJobDone(txn, jobId)
    	if err != nil {
    		return "", err
    	}
    	if !isDone {
    		dependsOn = append(dependsOn, jobId)
    	}
    }

radeksimko · 2022-08-10T14:24:12Z

> Maybe we could avoid all the checks when creating jobs and continue on empty job IDs here

Hmm, I like that idea! 🤔

github-actions · 2022-09-10T03:48:47Z

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

radeksimko added the enhancement New feature or request label Jul 19, 2022

radeksimko added this to the v0.29.0 milestone Jul 19, 2022

radeksimko force-pushed the f-job-ignore-state branch 2 times, most recently from 7cf644d to 044747d Compare July 19, 2022 11:42

radeksimko self-assigned this Jul 19, 2022

This was referenced Aug 1, 2022

state: Introduce DependsOn for N-to-1 job dependencies #1021

Merged

Parse provider versions from lock file before obtaining schema #1014

Merged

radeksimko force-pushed the f-job-ignore-state branch 7 times, most recently from e27d3d0 to e774671 Compare August 9, 2022 11:10

radeksimko mentioned this pull request Aug 9, 2022

Avoid parsing files if content has not changed #1031

Open

radeksimko marked this pull request as ready for review August 9, 2022 13:01

radeksimko requested a review from a team as a code owner August 9, 2022 13:01

radeksimko changed the title ~~indexer: Avoid running jobs where not needed~~ indexer/walker: Avoid running jobs where not needed Aug 9, 2022

radeksimko added 6 commits August 9, 2022 15:07

indexer cleanup: Replace Defer with DependsOn

ae14c64

Introduce StateIgnore & plumb context through

e71a213

state: Avoid deduplicating jobs on enqueuing

1ac6cae

indexer: use IgnoreState

15f0482

ci: bump benchmarks total timeout

0111077

update benchmarks thresholds

d425b9a

radeksimko force-pushed the f-job-ignore-state branch from e774671 to d425b9a Compare August 9, 2022 14:08

dbanck requested changes Aug 10, 2022

internal/indexer/walker.go (outdated)

internal/indexer/watcher.go (outdated)

indexer: run ObtainSchema even if scheduling of an upstream job fails

a2fa3a0

radeksimko requested a review from dbanck August 10, 2022 13:43

dbanck approved these changes Aug 10, 2022

radeksimko merged commit a12c9af into main Aug 10, 2022

radeksimko deleted the f-job-ignore-state branch August 10, 2022 14:24

radeksimko mentioned this pull request Aug 18, 2022

vsCode not formatting Terraform files hashicorp/vscode-terraform#1206

Closed

github-actions bot locked as resolved and limited conversation to collaborators Sep 10, 2022
