indexing runs out of memory for large projects #1219

martinlippert · 2024-04-03T13:43:44Z

The indexing infrastructure is running out of memory when indexing projects with a large number of source files (as reported in #1212).

We need to improve the implementation to reduce the overall memory consumption, especially to decouple the memory consumption from the size of the project or the number of projects being parsed.

Step 1: we need to split the set of source files into smaller, well-defined chunks so that the garbage collector can free up memory while indexing is still in progress.

Step 2: we need to clean up the lookup environment of the parser after each parsing attempt in order to avoid leaking memory or keeping zip files open.
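
To make the two steps more concrete, here is a minimal sketch of the idea. It is not the actual implementation in the language server: `LookupEnvironment`, `EnvironmentFactory`, `ChunkParser`, and the chunk size are hypothetical names introduced only for illustration. Each chunk gets its own environment, which is closed before the next chunk starts, so per-chunk parser state can be garbage collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedIndexer {

    /** Hypothetical stand-in for the parser's lookup environment (caches, open zip files). */
    public interface LookupEnvironment extends AutoCloseable {
        @Override
        void close(); // releases cached data and closes open zip files
    }

    /** Hypothetical factory that creates a fresh environment for each chunk. */
    public interface EnvironmentFactory {
        LookupEnvironment create();
    }

    /** Hypothetical parse step: parses one chunk of files and reports the symbols it finds. */
    public interface ChunkParser {
        void parse(List<String> files, LookupEnvironment env, Consumer<String> symbols);
    }

    /** Step 1: split the full list into consecutive chunks of at most chunkSize elements. */
    public static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            chunks.add(items.subList(start, end)); // a view, no copying
        }
        return chunks;
    }

    /** Step 2: parse chunk by chunk and always clean up the environment afterwards. */
    public static void index(List<String> files, int chunkSize, EnvironmentFactory environments,
            ChunkParser parser, Consumer<String> symbols) {
        for (List<String> filesInChunk : chunk(files, chunkSize)) {
            // try-with-resources closes the environment even if parsing fails
            try (LookupEnvironment env = environments.create()) {
                parser.parse(filesInChunk, env, symbols);
            }
        }
    }
}
```

With this shape, peak memory depends mainly on the chunk size plus the symbols that are kept, rather than on the total number of source files.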

martinlippert · 2024-04-03T13:46:11Z

Inviting @licam to this issue in order to provide additional feedback and test early builds, once available.

martinlippert · 2024-04-04T15:25:26Z

@licam The latest pre-release builds for VSCode should already contain a few early optimizations. It would be interesting to hear whether they run any better in your environment and with your large projects. To switch to the pre-release in VSCode, click on the Spring Boot Tools entry in the list of extensions and then select the pre-release version.

martinlippert · 2024-04-09T08:02:10Z

Here are some early, rough results that measure the progress so far (using my sample project):

Version 1.53.0 is able to:

parse projects with 6,500 source code files
generate 100,000 symbols

Version 1.54.0 is able to:

parse projects with 65,000 source code files
generate 1,000,000 symbols

Both measurements used the default max heap setting of 512m for the language server process.
This is a 10x improvement, so quite a good step forward here, I think.

The exact numbers will vary quite a bit, depending on the size of the individual source code files and the number of symbols generated for the concrete project, of course.

If you have larger projects than this, you have to increase the heap space for the language server.
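
The maximum heap of a JVM is controlled by the standard `-Xmx` option (for example `-Xmx1024m`). How that option is passed to the language server depends on the IDE or extension settings, which are not covered in this thread. As a small, generic sketch (the class name `HeapInfo` is made up for illustration), this is one way to check which maximum heap a JVM process actually received:

```java
public class HeapInfo {

    /** Returns the maximum heap the current JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // The reported value can be slightly below the -Xmx setting, depending on the garbage collector.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it once with and once without an explicit `-Xmx` value shows whether the setting took effect.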

licam · 2024-04-09T08:08:37Z

@martinlippert Sounds promising. We will test and adapt the new version once it is released. Thank you!

martinlippert added the type: enhancement, theme: spring index & symbols, for: eclipse, and for: vscode labels Apr 3, 2024

martinlippert added this to the 4.22.1.RELEASE milestone Apr 3, 2024

martinlippert self-assigned this Apr 3, 2024

martinlippert mentioned this issue Apr 3, 2024

Set the vmArg HeapDumpOnOutOfMemoryError to be optional #1212

Closed

martinlippert added a commit that referenced this issue Apr 3, 2024

GH-1219: java indexer now splits set of java source files to scan into smaller chunks to reduce overall memory needs

0efe8cd

martinlippert added a commit that referenced this issue Apr 4, 2024

GH-1219: preparing the AST parsing for environment cleanup activities after bulk parsing to close zip files and free up memory

9c30e6c

martinlippert added a commit that referenced this issue Apr 4, 2024

GH-1219: another small optimization to avoid converting supertype sets and arrays all the time + reusing common sets instead of creating new set objects all the time

88ae931

martinlippert closed this as completed Apr 9, 2024
