Refactor job scheduler to use memdb for jobs #782

radeksimko · 2022-02-03T22:02:35Z

This represents the second stage of the refactoring described in #719.

Depends on #771

Closes #719
Closes #768
Closes #775
Closes #800

Why

As described in #719, the existing logic is hard to test and requires time.Sleep in cases where dependent pieces of work are scheduled, i.e. synchronization is difficult or non-existent. This leads to a number of flaky tests, which makes testing any other bug fix or feature more difficult.

The lack of synchronization also causes, for example, completion to use outdated data, as reported in #768 and #775: textDocument/completion could run before textDocument/didChange had a chance to finish applying changes to the document and re-parsing it.

New `job` package

A separate package makes enqueueing and consuming jobs from various places easier without running into import cycles.

type Job struct {
	// Func represents the job to execute
	Func func(ctx context.Context) error

	// Dir describes the directory which the job belongs to,
	// which is used for deduplication of queued jobs (along with Type)
	// and prioritization
	Dir document.DirHandle

	// Type describes type of the job (e.g. GetTerraformVersion),
	// which is used for deduplication of queued jobs along with Dir.
	Type string

	// Defer is a function to execute after Func is executed
	// and before the job is marked as done (StateDone).
	// This can be used to schedule jobs dependent on the main job.
	Defer DeferFunc
}

// DeferFunc represents a deferred function scheduling more jobs
// based on jobErr (any error returned from the main job).
// Newly queued job IDs should be returned to allow for synchronization.
type DeferFunc func(ctx context.Context, jobErr error) IDs

Example

id, err = w.jobStore.EnqueueJob(job.Job{
	Dir: modHandle,
	Func: func(ctx context.Context) error {
		return ParseModuleManifest(w.fs, w.modStore, dir)
	},
	Type:  op.OpTypeParseModuleManifest.String(),
	Defer: decodeCalledModulesFunc(w.fs, w.modStore, w.schemaStore, w.watcher, dir),
})

New `jobs` memdb table

A new memdb table was created to store queued/running jobs and to provide synchronization via watch channels in memdb.

type ScheduledJob struct {
	job.ID
	job.Job
	IsDirOpen bool
	State     State

	// JobErr contains error when job finishes (State = StateDone)
	JobErr error
	// DeferredJobIDs contains IDs of any deferred jobs
	// set when job finishes (State = StateDone)
	DeferredJobIDs job.IDs
}

New general-purpose `scheduler` (package)

Scheduling was previously done via moduleLoader, which was essentially an implementation detail of ModuleManager, so everything depended on ModuleManager. Internally, moduleLoader also used a priority queue implemented via container/heap. The queue had to be re-sorted on every insertion, even though no actual reordering took place most of the time (only when the user opened or closed affected files). De-duplication was done via individual Module fields indicating the state of each operation, which made the whole logic very verbose and error-prone.

terraform-ls/internal/terraform/module/module_loader.go

Lines 192 to 217 in 526f22c

	if operationState(mod, modOp.Type) == op.OpStateQueued {
		// avoid enqueuing duplicate operation
		modOp.markAsDone()
		return nil
	}

	switch modOp.Type {
	case op.OpTypeGetTerraformVersion:
		ml.modStore.SetTerraformVersionState(modOp.ModulePath, op.OpStateQueued)
	case op.OpTypeObtainSchema:
		ml.modStore.SetProviderSchemaState(modOp.ModulePath, op.OpStateQueued)
	case op.OpTypeParseModuleConfiguration:
		ml.modStore.SetModuleParsingState(modOp.ModulePath, op.OpStateQueued)
	case op.OpTypeParseVariables:
		ml.modStore.SetVarsParsingState(modOp.ModulePath, op.OpStateQueued)
	case op.OpTypeParseModuleManifest:
		ml.modStore.SetModManifestState(modOp.ModulePath, op.OpStateQueued)
	case op.OpTypeLoadModuleMetadata:
		ml.modStore.SetMetaState(modOp.ModulePath, op.OpStateQueued)
	case op.OpTypeDecodeReferenceTargets:
		ml.modStore.SetReferenceTargetsState(modOp.ModulePath, op.OpStateQueued)
	case op.OpTypeDecodeReferenceOrigins:
		ml.modStore.SetReferenceOriginsState(modOp.ModulePath, op.OpStateQueued)
	case op.OpTypeDecodeVarsReferences:
		ml.modStore.SetVarsReferenceOriginsState(modOp.ModulePath, op.OpStateQueued)
	}

Instead, each pending job now has its own entry in the new jobs memdb table, which allows for more efficient querying when checking for duplicate jobs and when distinguishing between open and closed directories.
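
A rough sketch of the idea, using a plain map in place of memdb indexes: queued jobs are keyed by (Dir, Type) for de-duplication and carry an IsDirOpen flag for prioritization. All names here (`queue`, `jobKey`, `Enqueue`, `Next`) are assumptions made for the example, not the actual store API.

```go
package main

import "fmt"

// jobKey identifies a queued job for de-duplication purposes.
type jobKey struct {
	Dir  string
	Type string
}

type queuedJob struct {
	ID        int
	Key       jobKey
	IsDirOpen bool
}

// queue is a hypothetical, map-backed replacement for the jobs table.
type queue struct {
	nextID int
	byKey  map[jobKey]*queuedJob
	order  []*queuedJob // insertion order
}

func newQueue() *queue {
	return &queue{byKey: make(map[jobKey]*queuedJob)}
}

// Enqueue adds a job unless one with the same Dir and Type is already
// queued, in which case the existing job's ID is returned instead.
func (q *queue) Enqueue(dir, typ string, isDirOpen bool) (id int, added bool) {
	key := jobKey{Dir: dir, Type: typ}
	if j, ok := q.byKey[key]; ok {
		return j.ID, false
	}
	q.nextID++
	j := &queuedJob{ID: q.nextID, Key: key, IsDirOpen: isDirOpen}
	q.byKey[key] = j
	q.order = append(q.order, j)
	return j.ID, true
}

// Next removes and returns the oldest job belonging to an open directory,
// falling back to the oldest job overall.
func (q *queue) Next() (*queuedJob, bool) {
	if len(q.order) == 0 {
		return nil, false
	}
	idx := 0
	for i, j := range q.order {
		if j.IsDirOpen {
			idx = i
			break
		}
	}
	j := q.order[idx]
	q.order = append(q.order[:idx], q.order[idx+1:]...)
	delete(q.byKey, j.Key)
	return j, true
}

func main() {
	q := newQueue()
	q.Enqueue("/closed", "ParseModuleConfiguration", false)
	id, added := q.Enqueue("/closed", "ParseModuleConfiguration", false)
	fmt.Println("duplicate:", id, added)

	q.Enqueue("/open", "ParseModuleConfiguration", true)
	next, _ := q.Next()
	fmt.Println("next:", next.Key.Dir)
}
```

Compared with per-operation state fields on Module, the duplicate check here is a single lookup that works the same way for every job type.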

Relatedly, a (mostly) general-purpose scheduler was created to simply execute arbitrary Funcs against Dirs and then execute any Defer functions (which schedule dependent jobs).

Removal of module manager

The module manager is gone; its individual responsibilities are now split between the scheduler and the jobs memdb table.

In the old test we checked *all* schemas available for a given path after indexing via the walker. This part of the API is, however, being deprecated in favour of a more straightforward one (see state.ProviderSchemaStore -> ProviderSchema()), which picks the single most appropriate schema from a list of candidates and keeps the decision logic as an implementation detail. The whole list of candidates is therefore not available from the outside (by design), so it is not something we should test from the outside. ProviderSchemaStore itself already has plenty of tests covering the decision logic, which is a better place for testing it anyway.

jpogran

Works on my machine!

dbanck

Awesome work!

🚢 !

internal/langserver/handlers/command/modules.go

github-actions · 2022-03-21T12:08:21Z

This functionality has been released in v0.26.0 of the language server.
If you use the official Terraform VS Code extension, it will prompt you to upgrade to this version automatically upon next launch or within the next 24 hours.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

github-actions · 2022-04-21T03:30:19Z

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

radeksimko added the technical-debt label Feb 3, 2022

radeksimko force-pushed the f-jobs-memdb-refactoring branch from 9fd093d to 6cca57f Compare February 4, 2022 17:55

radeksimko marked this pull request as ready for review February 4, 2022 19:12

radeksimko force-pushed the f-jobs-memdb-refactoring branch 11 times, most recently from 9a89195 to 9223958 Compare February 11, 2022 15:07

radeksimko requested a review from a team February 11, 2022 16:55

radeksimko force-pushed the f-jobs-memdb-refactoring branch 4 times, most recently from 5da77d4 to ed4aa4d Compare February 18, 2022 08:22

radeksimko force-pushed the f-jobs-memdb-refactoring branch from ed4aa4d to 7f558b9 Compare February 22, 2022 13:56

radeksimko added 8 commits February 22, 2022 15:10

internal/job: Introduce Job & related types

606cd0b

internal/state: Introduce new 'jobs' table

c18b99e

internal/scheduler: Introduce general-purpose job scheduler

14421ca

internal/terraform/module(watcher+walker): Update references

4fe2b54

internal/langserver/handlers: Update references

58c6c70

internal/terraform/module: Remove module manager & loader & queue

84fd114

jobs benchmarks

ff634f4

radeksimko force-pushed the f-jobs-memdb-refactoring branch from 7f558b9 to ff634f4 Compare February 22, 2022 15:14

jpogran approved these changes Feb 22, 2022

radeksimko added this to the v0.26.0 milestone Feb 23, 2022

dbanck approved these changes Feb 24, 2022

internal/langserver/handlers/command/modules.go

radeksimko merged commit 99e30e1 into main Feb 24, 2022

radeksimko deleted the f-jobs-memdb-refactoring branch February 24, 2022 16:52

This was referenced Feb 24, 2022

Auto completion doesn't work on Mac #768

Closed

Autocompletion is slow with all 0.25 versions on Windows #775

Closed

Formatting extremely slow #800

Closed

fix: decode all submodules with the right path #810

Merged

github-actions bot locked as resolved and limited conversation to collaborators Apr 21, 2022
