Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gosrc2cpg] Partial gosrc2cpg frontend perforamnce optimisations #4668

Merged
merged 5 commits into from
Jun 18, 2024

Conversation

pandurangpatil
Copy link
Contributor

  1. Changed parsing of AST json inside AST Creator itself.
  2. Made changes in download dependency processing to generate AST only for the used packages and subsequently only processing used packages. There was a change required inside goastgen utility to support the -include input option.

TODO:
At this moment AstCreator gets instantiated while first-level processing for building the cache and it stays in memory till AstCreationPass is done. In subsequent changes, we will split the processing in such a way that we create AstCreator for that pass and destroy it once it's used. Instantiate it again for AstCreationPass.

The above change is done as part of the larger change.

1. Changed parsing of AST json inside AST Creator itself.
2. Made changes in download dependency processing to generate AST only
for the used packages and subsequently only processing used packages.
There was a change required inside `goastgen` utility to support the
`-include` input option.

TODO:
At this moment `AstCreator` gets instantiated while first-level
processing for building the cache and it stays in memory till
`AstCreationPass` is done. In subsequent changes, we will split the
processing in such a way that we create `AstCreator` for that pass and
destroy it once it's used. Instantiate it again for `AstCreationPass`.

The above change is done as part of the larger change.
astGenResult.parsedModFile.flatMap(modFile => GoAstJsonParser.readModFile(Paths.get(modFile)).map(x => x))
)
new MethodAndTypeCacheBuilderPass(None, astGenResult.parsedFiles, depConfig, goMod, goGlobal).process()
private class DependencyProcessorQueue extends Runnable {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavidBakerEffendi This dependency processor queue I created to start the processing of downloaded dependency in parallel to downloading other dependencies. Earlier, it was downloading one dependency process it and then pickup another dependency.

NOTE: We cannot parallelise downloading dependency as we use go get <depdency> command, which can be executed one at a time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we would have to download directly from the repository otherwise, but this is fine, we don't want to put too much strain on the dependency server either as some have DDoS protection

@DavidBakerEffendi DavidBakerEffendi merged commit 53c2c77 into joernio:master Jun 18, 2024
5 checks passed
@DavidBakerEffendi DavidBakerEffendi added performance Improves the performance of Joern go Relates to gosrc2cpg labels Jun 18, 2024
pandurangpatil added a commit to Privado-Inc/joern that referenced this pull request Jul 26, 2024
1. `goastgen` version upgrade, which fixes exclusion rule handling for
path as well and graceful handling of mod file parsing in the even
module name is not present. (Earlier changes to include packages with
include regex, didn't filter all the packages and had a limitation to
include all sub folders/packages if root package is being used. However,
that wasn't the case. Hence, we introduced a mechanism to check the
entire package folder path with goastgen with a separate flag
-includePacakges and updated integration.)
2. Changed parsing of AST json inside AST Creator itself.
3. Made changes in download dependency processing to generate AST only
for the used packages and subsequently only processing used packages.
There was a change required inside `goastgen` utility to support the
`-include` input option.

Restructured the code to isolate the common handling for main source
code pre-processing to build the cache, downloaded dependencies and AST
creation.

With this change, we now have created separate traits which handle, main
source code pre-processing, processing of downloaded dependencies for
caching and AST Creation along with their respective passes.

With this, we get the following benefits.

1. Clarity in the code as the processing is now isolated in different
passes and traits.
2. Earlier we held the AstCreator instances for each file at the
beginning of CPG generation while pre-processing the AST for building
the metadata cache which can be used and referred to by the AST creator
to resolve the types. Resuing that instance at the last stage inside
AstCreationPass. Which blocks large amounts of memory which is
noticeable with a big code base. With this change, we are creating an
AstCreator instance and destroying it once the pass is done with its
work.

Earlier, if we pass the directory path which contains multiple go
modules. Processing was done with all the .go files mapped to the single
go.mod file.

With this change, we have segregated the processing by first isolating
all the files mapped to respective go.mod. This will make sure to clean
up the memory footprint after every module is being processed. However,
this will increase the processing time when used with a download
dependency as it will process all the go.mod files by identifying and
processing used dependencies.

tests covering multiple module scenarios

Original PRS

joernio#4724
joernio#4703
joernio#4678
joernio#4668
joernio#4475
pandurangpatil added a commit to Privado-Inc/joern that referenced this pull request Jul 30, 2024
* Go frontend memory optimisation

1. `goastgen` version upgrade, which fixes exclusion rule handling for
path as well and graceful handling of mod file parsing in the even
module name is not present. (Earlier changes to include packages with
include regex, didn't filter all the packages and had a limitation to
include all sub folders/packages if root package is being used. However,
that wasn't the case. Hence, we introduced a mechanism to check the
entire package folder path with goastgen with a separate flag
-includePacakges and updated integration.)
2. Changed parsing of AST json inside AST Creator itself.
3. Made changes in download dependency processing to generate AST only
for the used packages and subsequently only processing used packages.
There was a change required inside `goastgen` utility to support the
`-include` input option.

Restructured the code to isolate the common handling for main source
code pre-processing to build the cache, downloaded dependencies and AST
creation.

With this change, we now have created separate traits which handle, main
source code pre-processing, processing of downloaded dependencies for
caching and AST Creation along with their respective passes.

With this, we get the following benefits.

1. Clarity in the code as the processing is now isolated in different
passes and traits.
2. Earlier we held the AstCreator instances for each file at the
beginning of CPG generation while pre-processing the AST for building
the metadata cache which can be used and referred to by the AST creator
to resolve the types. Resuing that instance at the last stage inside
AstCreationPass. Which blocks large amounts of memory which is
noticeable with a big code base. With this change, we are creating an
AstCreator instance and destroying it once the pass is done with its
work.

Earlier, if we pass the directory path which contains multiple go
modules. Processing was done with all the .go files mapped to the single
go.mod file.

With this change, we have segregated the processing by first isolating
all the files mapped to respective go.mod. This will make sure to clean
up the memory footprint after every module is being processed. However,
this will increase the processing time when used with a download
dependency as it will process all the go.mod files by identifying and
processing used dependencies.

tests covering multiple module scenarios

Original PRS

joernio#4724
joernio#4703
joernio#4678
joernio#4668
joernio#4475

* compilation error fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go Relates to gosrc2cpg performance Improves the performance of Joern
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants