-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[gosrc2cpg] Partial gosrc2cpg frontend perforamnce optimisations #4668
[gosrc2cpg] Partial gosrc2cpg frontend perforamnce optimisations #4668
Conversation
1. Changed parsing of AST json inside AST Creator itself. 2. Made changes in download dependency processing to generate AST only for the used packages and subsequently only processing used packages. There was a change required inside `goastgen` utility to support the `-include` input option. TODO: At this moment `AstCreator` gets instantiated while first-level processing for building the cache and it stays in memory till `AstCreationPass` is done. In subsequent changes, we will split the processing in such a way that we create `AstCreator` for that pass and destroy it once it's used. Instantiate it again for `AstCreationPass`. The above change is done as part of the larger change.
astGenResult.parsedModFile.flatMap(modFile => GoAstJsonParser.readModFile(Paths.get(modFile)).map(x => x)) | ||
) | ||
new MethodAndTypeCacheBuilderPass(None, astGenResult.parsedFiles, depConfig, goMod, goGlobal).process() | ||
private class DependencyProcessorQueue extends Runnable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DavidBakerEffendi This dependency processor queue I created to start the processing of downloaded dependency in parallel to downloading other dependencies. Earlier, it was downloading one dependency process it and then pickup another dependency.
NOTE: We cannot parallelise downloading dependency as we use go get <depdency>
command, which can be executed one at a time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we would have to download directly from the repository otherwise, but this is fine, we don't want to put too much strain on the dependency server either as some have DDoS protection
1. `goastgen` version upgrade, which fixes exclusion rule handling for path as well and graceful handling of mod file parsing in the even module name is not present. (Earlier changes to include packages with include regex, didn't filter all the packages and had a limitation to include all sub folders/packages if root package is being used. However, that wasn't the case. Hence, we introduced a mechanism to check the entire package folder path with goastgen with a separate flag -includePacakges and updated integration.) 2. Changed parsing of AST json inside AST Creator itself. 3. Made changes in download dependency processing to generate AST only for the used packages and subsequently only processing used packages. There was a change required inside `goastgen` utility to support the `-include` input option. Restructured the code to isolate the common handling for main source code pre-processing to build the cache, downloaded dependencies and AST creation. With this change, we now have created separate traits which handle, main source code pre-processing, processing of downloaded dependencies for caching and AST Creation along with their respective passes. With this, we get the following benefits. 1. Clarity in the code as the processing is now isolated in different passes and traits. 2. Earlier we held the AstCreator instances for each file at the beginning of CPG generation while pre-processing the AST for building the metadata cache which can be used and referred to by the AST creator to resolve the types. Resuing that instance at the last stage inside AstCreationPass. Which blocks large amounts of memory which is noticeable with a big code base. With this change, we are creating an AstCreator instance and destroying it once the pass is done with its work. Earlier, if we pass the directory path which contains multiple go modules. Processing was done with all the .go files mapped to the single go.mod file. With this change, we have segregated the processing by first isolating all the files mapped to respective go.mod. This will make sure to clean up the memory footprint after every module is being processed. However, this will increase the processing time when used with a download dependency as it will process all the go.mod files by identifying and processing used dependencies. tests covering multiple module scenarios Original PRS joernio#4724 joernio#4703 joernio#4678 joernio#4668 joernio#4475
* Go frontend memory optimisation 1. `goastgen` version upgrade, which fixes exclusion rule handling for path as well and graceful handling of mod file parsing in the even module name is not present. (Earlier changes to include packages with include regex, didn't filter all the packages and had a limitation to include all sub folders/packages if root package is being used. However, that wasn't the case. Hence, we introduced a mechanism to check the entire package folder path with goastgen with a separate flag -includePacakges and updated integration.) 2. Changed parsing of AST json inside AST Creator itself. 3. Made changes in download dependency processing to generate AST only for the used packages and subsequently only processing used packages. There was a change required inside `goastgen` utility to support the `-include` input option. Restructured the code to isolate the common handling for main source code pre-processing to build the cache, downloaded dependencies and AST creation. With this change, we now have created separate traits which handle, main source code pre-processing, processing of downloaded dependencies for caching and AST Creation along with their respective passes. With this, we get the following benefits. 1. Clarity in the code as the processing is now isolated in different passes and traits. 2. Earlier we held the AstCreator instances for each file at the beginning of CPG generation while pre-processing the AST for building the metadata cache which can be used and referred to by the AST creator to resolve the types. Resuing that instance at the last stage inside AstCreationPass. Which blocks large amounts of memory which is noticeable with a big code base. With this change, we are creating an AstCreator instance and destroying it once the pass is done with its work. Earlier, if we pass the directory path which contains multiple go modules. Processing was done with all the .go files mapped to the single go.mod file. With this change, we have segregated the processing by first isolating all the files mapped to respective go.mod. This will make sure to clean up the memory footprint after every module is being processed. However, this will increase the processing time when used with a download dependency as it will process all the go.mod files by identifying and processing used dependencies. tests covering multiple module scenarios Original PRS joernio#4724 joernio#4703 joernio#4678 joernio#4668 joernio#4475 * compilation error fixes
goastgen
utility to support the-include
input option.TODO:
At this moment
AstCreator
gets instantiated while first-level processing for building the cache and it stays in memory tillAstCreationPass
is done. In subsequent changes, we will split the processing in such a way that we createAstCreator
for that pass and destroy it once it's used. Instantiate it again forAstCreationPass
.The above change is done as part of the larger change.