
Out of memory crash pyright-langserver #3239

Closed
m-novikov opened this issue Mar 23, 2022 · 28 comments
Labels
addressed in next version, bug

Comments

@m-novikov

m-novikov commented Mar 23, 2022

Describe the bug

pyright-langserver crashes when heap allocation exceeds 2GB.

Traceback:

[ERROR][2022-03-23 13:16:03] .../vim/lsp/rpc.lua:420    "rpc"   "pyright-langserver"    "stderr"    "\n<--- Last few GCs --->\n\n[218990:0x60e5f90]   111947 ms: Mark-sweep (reduce) 2016.8 (2082.9) -> 2015.4 (2082.7) MB, 3144.2 / 0.0 ms  (average mu = 0.123, current mu = 0.004) allocation failure scavenge might not succeed\n[218990:0x60e5f90]   115108 ms: Mark-sweep (reduce) 2016.6 (2082.7) -> 2015.7 (2082.9) MB, 3149.6 / 0.0 ms  (average mu = 0.067, current mu = 0.004) allocation failure scavenge might not succeed\n\n\n<--- JS stacktrace --->\n\nFATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory\n"
[ERROR][2022-03-23 13:16:03] .../vim/lsp/rpc.lua:420    "rpc"   "pyright-langserver"    "stderr"    " 1: 0xb17ec0 node::Abort() [node]\n"
[ERROR][2022-03-23 13:16:03] .../vim/lsp/rpc.lua:420    "rpc"   "pyright-langserver"    "stderr"    " 2: 0xa341f4 node::FatalError(char const*, char const*) [node]\n"
[ERROR][2022-03-23 13:16:03] .../vim/lsp/rpc.lua:420    "rpc"   "pyright-langserver"    "stderr"    " 3: 0xcfe71e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]\n"
[ERROR][2022-03-23 13:16:03] .../vim/lsp/rpc.lua:420    "rpc"   "pyright-langserver"    "stderr"    " 4: 0xcfea97 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]\n"
[ERROR][2022-03-23 13:16:03] .../vim/lsp/rpc.lua:420    "rpc"   "pyright-langserver"    "stderr"    " 5: 0xee8d35  [node]\n"
[ERROR][2022-03-23 13:16:03] .../vim/lsp/rpc.lua:420    "rpc"   "pyright-langserver"    "stderr"    " 6: 0xee987c  [node]\n"
[ERROR][2022-03-23 13:16:03] .../vim/lsp/rpc.lua:420    "rpc"   "pyright-langserver"    "stderr"    " 7: 0xef77b1 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]\n"
[ERROR][2022-03-23 13:16:03] .../vim/lsp/rpc.lua:420    "rpc"   "pyright-langserver"    "stderr"    " 8: 0xefad0c v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]\n"
[ERROR][2022-03-23 13:16:03] .../vim/lsp/rpc.lua:420    "rpc"   "pyright-langserver"    "stderr"    " 9: 0xec72bb v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]\n"
[ERROR][2022-03-23 13:16:03] .../vim/lsp/rpc.lua:420    "rpc"   "pyright-langserver"    "stderr"    "10: 0x123052b v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]\n"
[ERROR][2022-03-23 13:16:03] .../vim/lsp/rpc.lua:420    "rpc"   "pyright-langserver"    "stderr"    "11: 0x16147d9  [node]\n"

I think it started happening after commit c56d750.

That commit uses process.memoryUsage().rss as an upper limit, which should probably be v8.getHeapStatistics().heap_size_limit instead; I'm not sure, but I consulted this SO question.
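
For reference, here is a quick way to compare the metrics involved (a minimal Node snippet; rss and heapUsed come from the documented process.memoryUsage() API, heap_size_limit from v8.getHeapStatistics()):

import * as v8 from 'v8';

const mem = process.memoryUsage();
const heap = v8.getHeapStatistics();

// rss is the resident set size of the whole process (JS heap plus native
// memory), while heap_size_limit is the static V8 heap ceiling that
// --max-old-space-size raises or lowers.
console.log(`rss:             ${Math.round(mem.rss / 1024 / 1024)} MB`);
console.log(`heapUsed:        ${Math.round(mem.heapUsed / 1024 / 1024)} MB`);
console.log(`heap_size_limit: ${Math.round(heap.heap_size_limit / 1024 / 1024)} MB`);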

To Reproduce
Use a big enough project.
You can try forcing the issue using export NODE_OPTIONS="--max-old-space-size=1024"

Expected behavior
The crash doesn't occur; the cache is discarded before the critical threshold is reached.

Additional context
I use this language server in Neovim, so the startup method probably differs from the VS Code extension.

@erictraut
Collaborator

The heap can grow up to the resident size limit, which is why it's appropriate to use rss rather than heap_size_limit.

Which version of node do you have installed? Is it relatively recent? Which OS platform? And is it a 32-bit processor architecture or 64-bit?

@erictraut added the question label Mar 23, 2022
@m-novikov
Author

m-novikov commented Mar 23, 2022

I have node 16.3 on a 64-bit Linux system.

I tried starting the server with pyright-langserver --max-old-space-size=3072 --stdio, but for some reason the argument is not honored; the only way to pass it to node is via an environment variable.

For me, rss grows dynamically, while total_available_size and heap_size_limit are static values inferred from --max-old-space-size, as far as I understand. When rss exceeds the total_available_size threshold, the crash occurs.

Here is the patch that worked for me. Note that comparing heapUsed with heap_size_limit also didn't work: my project's memory profile was such that the heap consumed only 1.2GB while rss exceeded the 2GB threshold and crashed node.

diff --git a/packages/pyright-internal/src/analyzer/program.ts b/packages/pyright-internal/src/analyzer/program.ts
index aabf36a5..08c0a18b 100644
--- a/packages/pyright-internal/src/analyzer/program.ts
+++ b/packages/pyright-internal/src/analyzer/program.ts
@@ -85,6 +85,7 @@ import { createTypeEvaluatorWithTracker } from './typeEvaluatorWithTracker';
 import { PrintTypeFlags } from './typePrinter';
 import { Type } from './types';
 import { TypeStubWriter } from './typeStubWriter';
+import * as v8 from 'v8';
 
 const _maxImportDepth = 256;
 
@@ -2106,10 +2107,10 @@ export class Program {
         // drop immediately after we empty the cache due to garbage collection timing.
         if (typeCacheSize > 750000 || this._parsedFileCount > 1000) {
             const memoryUsage = process.memoryUsage();
-
+            const heapStats = v8.getHeapStatistics();
             // If we use more than 90% of the available heap size, avoid a crash
             // by emptying the type cache.
-            if (memoryUsage.heapUsed > memoryUsage.rss * 0.9) {
+            if (memoryUsage.rss > heapStats.total_available_size * 0.9) {
                 const heapSizeInMb = Math.round(memoryUsage.rss / (1024 * 1024));
                 const heapUsageInMb = Math.round(memoryUsage.heapUsed / (1024 * 1024));

@erictraut added the bug label and removed the question label Mar 23, 2022
@erictraut
Collaborator

Yeah, I think your patch makes sense. I've incorporated it, and this will be included in the next release.

@erictraut added the addressed in next version label Mar 27, 2022
@m-novikov
Author

I tested it with the patch provided in commit 43b7459, but it still crashes: RSS reaches the threshold while the heap is still small enough not to trigger the cleanup.
For example: RSS at 2GB (heap at 1.2GB), and the crash occurs.
#3266

@erictraut removed the addressed in next version label Mar 30, 2022
@tobiasdiez

For the record, we experienced similar issues when running pyright in a GitHub Actions workflow:

Loading configuration file at /__w/sagetrac-mirror/sagetrac-mirror/pyrightconfig.json
Assuming Python platform Linux
stubPath /__w/sagetrac-mirror/sagetrac-mirror/typings is not a valid directory.
Searching for source files
Found 2390 source files
Emptying type cache to avoid heap overflow. Used 1887MB out of 2096MB
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

<--- Last few GCs --->

[22360:0x30eaef0]   271756 ms: Scavenge 1976.3 (2063.7) -> 1961.7 (2065.2) MB, 17.3 / 0.0 ms  (average mu = 0.216, current mu = 0.202) allocation failure 
[22360:0x30eaef0]   271893 ms: Scavenge 1977.5 (2065.2) -> 1962.4 (2065.4) MB, 15.1 / 0.0 ms  (average mu = 0.216, current mu = 0.202) allocation failure 
[22360:0x30eaef0]   271987 ms: Scavenge 1978.6 (2065.9) -> 1963.1 (2066.2) MB, 15.5 / 0.0 ms  (average mu = 0.216, current mu = 0.202) allocation failure 


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x140dff9]
Security context: 0x2392523808d1 <JSObject>
    1: push [0x23925239a281](this=0x2065088444e9 <JSArray[0]>,0x206508844721 <Object map = 0x327c4c398949>)
    2: /* anonymous */(aka /* anonymous */) [0x206508844509] [/__t/node/12.22.10/x64/lib/node_modules/pyright/dist/pyright-internal.js:1] [bytecode=0x9af9dd20649 offset=51](this=0x1dafbe8804b1 <undefined>,0x186a2fc428e1 <Object map = 0xa27353f7349>)
    3: ...

 1: 0xa1a640 node::Abort() [node]
 2: 0xa1aa4c node::OnFatalError(char const*, char const*) [node]
 3: 0xb9a9fe v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb9ad79 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xd57ff5  [node]
 6: 0xd58686 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
 7: 0xd64f45 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
 8: 0xd65df5 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: 0xd688ac v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
10: 0xd2f2cb v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
11: 0x107189e v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
12: 0x140dff9  [node]
Aborted (core dumped)

See, e.g., https://github.com/sagemath/sagetrac-mirror/runs/5779736990?check_suite_focus=true

@tobiasdiez

I think the problem we encounter is different from the one @m-novikov experiences. For us, it first occurred with pyright@1.1.233, while 232 worked fine. In fact, we have two runs of pyright on the same code that differ only in the (automatic) upgrade 232 > 233; see
https://github.com/sagemath/sage/runs/5710759985?check_suite_focus=true (works)
https://github.com/sagemath/sage/runs/5710763334?check_suite_focus=true (fails)

And indeed, after downgrading to 232, everything works again. Hope that helps.

@erictraut
Collaborator

@tobiasdiez, that's very strange. The code that manages heap space in pyright had been unmodified for more than a year. I only recently made a change in 1.1.234 to try to address the problem @m-novikov reported above. I've since reverted that change because I think it was incorrect, so version 1.1.235 (once I publish it) will revert to the older (pre-1.1.234) behavior. Are you sure that you didn't see the problem with 1.1.234? I can't explain why you would have seen a change with 1.1.233.

@tobiasdiez

Yes, we are seeing this problem with both 233 and 234, although there is a difference: with 233 it takes about 40 mins before it stops, but with 234 only 4 mins (a normal successful run with 232 takes about 12 mins). Could it be that, in addition to the problem reported in this issue, 233 introduced an infinite loop (or something similar) somewhere?

Let me know if you want me to test some new versions/configs/combinations. However, at the moment the issue is best reproduced on GitHub Actions, so extensive debugging is not really possible.

@erictraut
Collaborator

I've done a bunch more investigation, and I think my earlier patch was correct. The current logic is not; it just happens to work some of the time.

I've added back my previous patch along with some additional logging when verboseOutput is enabled in pyrightconfig.json (or --verbose is specified on the command line). This should help us diagnose the problem if the new logic doesn't work.

@tobiasdiez, I think the problem you saw in 233 is related to this other issue, which was fixed in 234. It sounds like some of my recent perf optimizations have dropped your total analysis time from 12 min to 4 min, but 4 min is still a long time. It must be a big source base! Is it public, by any chance? If so, I can look at it to see if there are any further performance wins that could further decrease the time.

@erictraut added the addressed in next version label Apr 3, 2022
@tobiasdiez

Okay, so we will test the new version 235 as soon as it's out, which should then fix both issues, if I understand you correctly.

And yes, sage is a huge project, and we only recently started to introduce type annotations. We'll happily serve as a test case: https://github.com/sagemath/sage. Feel free to contact me if things are unclear or you encounter any issues.

@erictraut
Collaborator

@tobiasdiez, I found a few more optimizations that help reduce the analysis time for sage. It now takes < 100sec on my local machine to analyze the full source base. This time will continue to drop if you add type annotations because the analyzer won't need to do as much work to infer types.

Pyright currently emits 22K+ warnings across the entire sage source base. You could reduce this significantly by providing annotations for a few key decorator functions (most notably lazy_attribute) and by adding some type stub files that define the interfaces to classes and functions that are implemented in native code. This will allow you to focus on the actual bugs that pyright identifies. FWIW, I'm seeing many actual bugs in the code.

@erictraut
Collaborator

This is addressed in pyright 1.1.235, which I just published. It will also be included in the next release of pylance.

If you still see crashes due to heap overflow, please enable verboseOutput in pyrightconfig.json (or use the --verbose command-line option) and report what it shows.

@tobiasdiez

Thanks for your work and your suggestions. I can confirm that v235 is considerably quicker. Good job!

Sadly, the out-of-memory error persists, though. The last few lines are:

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

<--- Last few GCs --->

[22357:0x3d81f30]   180417 ms: Mark-sweep 1983.6 (2051.1) -> 1982.5 (2051.3) MB, 3058.8 / 0.0 ms  (average mu = 0.133, current mu = 0.047) allocation failure scavenge might not succeed
[22357:0x3d81f30]   182475 ms: Mark-sweep 1983.6 (2051.3) -> 1982.4 (2051.3) MB, 2039.5 / 0.0 ms  (average mu = 0.084, current mu = 0.009) allocation failure scavenge might not succeed


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x140dff9]
    1: StubFrame [pc: 0x1394d61]
Security context: 0x161f935808d1 <JSObject>
    2: /* anonymous */ [0x20abd9939ec1] [/__t/node/12.22.11/x64/lib/node_modules/pyright/dist/pyright-internal.js:~1] [pc=0xa1d1107ae7b](this=0x073ea7bb9391 <Object map = 0x11d5405e3ed9>,0x38f150171289 <Object map = 0x11d5405f4879>)
 1: 0xa1a640 node::Abort() [node]
    3: _makeStringNode [0x30474e9427d1] [/__t/node/12.22.11/x64/lib/node_modules/pyright/dist/pyrig...

 2: 0xa1aa4c node::OnFatalError(char const*, char const*) [node]
 3: 0xb9a9fe v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb9ad79 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xd57ff5  [node]
 6: 0xd58686 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
 7: 0xd64f45 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
 8: 0xd65df5 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: 0xd688ac v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
10: 0xd2f2cb v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
11: 0x107189e v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
12: 0x140dff9  [node]
Aborted (core dumped)

and the full log with --verbose is available at https://github.com/sagemath/sagetrac-mirror/runs/5820315673?check_suite_focus=true.

@erictraut reopened this Apr 4, 2022
@erictraut removed the addressed in next version label Apr 4, 2022
@erictraut
Collaborator

Reopening to investigate. Based on the logs, a few things are immediately evident:

  1. You're running pyright on a 32-bit system (or on a 64-bit system with a 32-bit version of node), and the default heap size is 2GB.
  2. Pyright correctly detected when it went past the 90% heap usage mark and dumped its type cache, but the process was terminated shortly thereafter. That seems to indicate that the type cache is not getting deallocated as intended (perhaps there's a dangling reference to it), or that something other than the type cache is consuming significant memory. Those will be my next areas of investigation.

@erictraut
Collaborator

This took a bit of spelunking, but I think I found the root cause. It appears that it's a bug in the v8 garbage collector. It's not able to detect that the type cache (which has many internal references) is no longer referenced. I needed to add code that manually breaks some of the references, and then it was able to collect the garbage.
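
The general shape of that workaround, as a hypothetical sketch (pyright's actual cache types and field names differ), is to sever the internal links before dropping the outer reference:

// Hypothetical cache structure: entries hold cross-references to one
// another, which older V8 collectors struggled to prove unreachable.
interface CacheEntry {
    value: unknown;
    related: CacheEntry[];
}

class TypeCache {
    private _entries = new Map<string, CacheEntry>();

    empty(): void {
        // Manually break the internal references first...
        for (const entry of this._entries.values()) {
            entry.related.length = 0;
            entry.value = undefined;
        }
        // ...then drop the container itself so the GC can reclaim it all.
        this._entries = new Map<string, CacheEntry>();
    }
}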

I tested my fix with a 1GB heap limit, which is much lower than the default of 2GB for 32-bit systems and 4GB for 64-bit systems.

Thanks for your patience on this one. It was a tricky one.

@erictraut added the addressed in next version label Apr 5, 2022
@m-novikov
Author

With the latest main (up to commit c2eaa87), I get the following traceback:

[ERROR][2022-04-05 12:47:26] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	"
    <--- Last few GCs --->
       [15996:0x56d9f00]    58061 ms: Scavenge 2021.1 (2076.1) -> 2015.2 (2077.8) MB, 4.7 / 0.0 ms           (average mu = 0.474, current mu = 0.273) allocation failure 
       [15996:0x56d9f00]    58101 ms: Scavenge 2023.4 (2078.3) -> 2017.7 (2080.1) MB, 4.0 / 0.0 ms  (average mu = 0.474, current mu = 0.273) allocation failure
       [15996:0x56d9f00]    58415 ms: Scavenge 2024.9 (2080.1) -> 2019.0 (2097.3) MB, 297.3 / 0.0 ms  (average mu = 0.474, current mu = 0.273) allocation failure 
    <--- JS stacktrace --->
    FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory\n"
[ERROR][2022-04-05 12:47:26] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 1: 0xb17ec0 node::Abort() [node]\n"
[ERROR][2022-04-05 12:47:26] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 2: 0xa341f4 node::FatalError(char const*, char const*) [node]\n"
[ERROR][2022-04-05 12:47:26] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 3: 0xcfe71e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]\n"
[ERROR][2022-04-05 12:47:26] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 4: 0xcfea97 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]\n"
[ERROR][2022-04-05 12:47:26] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 5: 0xee8d35  [node]\n"
[ERROR][2022-04-05 12:47:26] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 6: 0xef7ab1 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]\n"
[ERROR][2022-04-05 12:47:26] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 7: 0xefad0c v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]\n"
[ERROR][2022-04-05 12:47:26] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 8: 0xec72bb v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]\n"
[ERROR][2022-04-05 12:47:26] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 9: 0x123052b v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]\n"
[ERROR][2022-04-05 12:47:26] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	"10: 0x16147d9  [node]\n"

@erictraut
Collaborator

Hmm, apparently there's more to the problem. I was able to repro the issue prior to my change, but it went away with my "fix". In my experimentation, I was trying to replicate the heap limits on your system (i.e. I ran node with the command-line parameter "--max-old-space-size=2000").

I'm running short on theories here.

@erictraut
Collaborator

erictraut commented Apr 5, 2022

I'm no longer able to repro the problem if I constrain the max heap size to 2000MB, but I can still repro it if I constrain the heap to 1000MB. I was able to repro it on node v12, v14 and v16. However, with the latest version of node (v17.8), the problem goes away.

I'm pretty confident this is a bug in older versions of the v8 (Chromium) Garbage Collector. The Edge browser team recently found and fixed several bugs of this nature. They documented it in this blog post. These fixes are presumably in the latest version of node, which explains why the problem no longer occurs.

It's also interesting to see the performance improvements the v8 team has made over time. Here are the analysis times for sage using different versions of node.

Node 12 (2GB heap limit): 172.6s
Node 14 (2GB heap limit): 119.0s
Node 16 (2GB heap limit): 104.7s
Node 17.8 (2GB heap limit): 96.2s
Node 17.8 (4GB heap limit): 76.2s

If my theory above is correct, then I don't think there's anything more I can do within pyright to work around the issue.

Your options include:

  1. If you're on a 64-bit machine, switch from node v12 to v14 or newer. These newer versions of node default to 4GB of heap size on a 64-bit machine.
  2. When launching pyright, use the node command-line option "--max-old-space-size=3000" (or bigger) to give pyright at least 3GB of space. This appears to be enough to handle the full analysis of sage. It might not be sufficient for other (larger) code bases.
  3. Upgrade node to v17.8, which eliminates the problem entirely. This has the added advantage of additional speed increases.

Thanks to @jakebailey for suggesting that this might be an issue related to the node version.

@jakebailey
Member

jakebailey commented Apr 5, 2022

@tobiasdiez I'd also recommend using https://github.com/jakebailey/pyright-action to run pyright rather than DIYing it; you'll end up with faster builds thanks to artifact caching, plus it uses Node 16.

Per the above, I'll see if I can get it up to Node 17 for the nice perf boosts and bug fixes. I already bumped the action up from Node 12 to Node 16 a few weeks ago, which was a nice change to be able to do.

@m-novikov
Author

@erictraut, thank you for the thorough investigation of this issue.

  1. If you're on a 64-bit machine, switch from node v12 to v14 or newer. These newer versions of node default to 4GB of heap size on a 64-bit machine.

I run the server inside a Fedora toolbox container; it's a 64-bit system, but for some reason it still has a 2GB memory limit even on newer node versions (v12, v14).

2. When launching pyright, use the node command-line option "--max-old-space-size=3000" (or bigger) to give pyright at least 3GB of space. This appears to be enough to handle the full analysis of `sage`. It might not be sufficient for other (larger) code bases.

This is the option I went with, by setting the NODE_OPTIONS environment variable.

3. Upgrade node to v17.8, which eliminates the problem entirely. This has the added advantage of additional speed increases.

I tried running with node 17.8; the crash still occurs when RSS reaches 2GB. See the traceback:

[ERROR][2022-04-06 17:46:01] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	"
<--- Last few GCs --->
[119949:0x5cf9e00]    52147 ms: Mark-sweep (reduce) 2034.4 (2083.2) -> 2033.0 (2083.5) MB, 1079.0 / 0.0 ms  (average mu = 0.293, current mu = 0.275) allocation failure scavenge might not succeed
[119949:0x5cf9e00]    53487 ms: Mark-sweep (reduce) 2034.1 (2083.5) -> 2033.0 (2083.7) MB, 1337.5 / 0.0 ms  (average mu = 0.156, current mu = 0.002) allocation failure scavenge might not succeed

<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
"
[ERROR][2022-04-06 17:46:01] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 1: 0xb30950 node::Abort() [/home/maksim.novikov/.nvm/versions/node/v17.8.0/bin/node]\n"
[ERROR][2022-04-06 17:46:01] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 2: 0xa4219e node::FatalError(char const*, char const*) [/home/maksim.novikov/.nvm/versions/node/v17.8.0/bin/node]\n"
[ERROR][2022-04-06 17:46:01] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 3: 0xd22ffe v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/home/maksim.novikov/.nvm/versions/node/v17.8.0/bin/node]\n"
[ERROR][2022-04-06 17:46:01] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 4: 0xd23377 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/home/maksim.novikov/.nvm/versions/node/v17.8.0/bin/node]\n"
[ERROR][2022-04-06 17:46:01] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 5: 0xedc7a5  [/home/maksim.novikov/.nvm/versions/node/v17.8.0/bin/node]\n"
[ERROR][2022-04-06 17:46:01] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 6: 0xeedbed v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/home/maksim.novikov/.nvm/versions/node/v17.8.0/bin/node]\n"
[ERROR][2022-04-06 17:46:01] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 7: 0xef091e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/maksim.novikov/.nvm/versions/node/v17.8.0/bin/node]\n"
[ERROR][2022-04-06 17:46:01] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 8: 0xeb233a v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/home/maksim.novikov/.nvm/versions/node/v17.8.0/bin/node]\n"
[ERROR][2022-04-06 17:46:01] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	" 9: 0x1232798 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/home/maksim.novikov/.nvm/versions/node/v17.8.0/bin/node]\n"
[ERROR][2022-04-06 17:46:01] .../vim/lsp/rpc.lua:420	"rpc"	"pyright-langserver"	"stderr"	"10: 0x1635c39  [/home/maksim.novikov/.nvm/versions/node/v17.8.0/bin/node]\n"

@erictraut
Collaborator

@m-novikov, it sounds like node thinks it should be able to allocate more memory, but it's being artificially limited to 2GB. You said that you're running this inside a container. Perhaps you need to configure your container runtime settings?

Alternatively, you could tell node explicitly that it shouldn't grow its heap over 2GB. Try setting "--max-old-space-size=1500". This relies on proper GC behavior, so you would need to use node 17.8.
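
For editor setups that spawn the server directly, the environment-variable route looks roughly like this (a sketch assuming a Node-based launcher; NODE_OPTIONS is a standard node mechanism, but the surrounding launcher code is illustrative):

import { spawn } from 'child_process';

// Flags placed after the pyright-langserver script name are passed to the
// script rather than to node itself, which is why --max-old-space-size was
// not honored earlier in this thread. NODE_OPTIONS, by contrast, is read
// by the node process that runs the bundled script.
const server = spawn('pyright-langserver', ['--stdio'], {
    env: { ...process.env, NODE_OPTIONS: '--max-old-space-size=3072' },
});

server.stderr.on('data', (chunk) => process.stderr.write(chunk));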

@m-novikov
Author

@m-novikov, it sounds like node thinks it should be able to allocate more memory, but it's being artificially limited to 2GB. You said that you're running this inside a container. Perhaps you need to configure your container runtime settings?

I don't think this is the reason, as I can set a bigger heap via the NODE_OPTIONS environment variable and it works. If the container had a memory limit, the process would be killed by the OOM killer.

Alternatively, you could tell node explicitly that it shouldn't grow its heap over 2GB. Try setting "--max-old-space-size=1500". This relies on proper GC behavior, so you would need to use node 17.8.

Setting the value to 1500 doesn't help.

<--- Last few GCs --->
[124010:0x5daeda0]    46064 ms: Mark-sweep (reduce) 1526.7 (1562.2) -> 1525.4 (1562.2) MB, 715.5 / 0.0 ms  (average mu = 0.112, current mu = 0.050) allocation failure scavenge might not succeed
[124010:0x5daeda0]    47168 ms: Mark-sweep (reduce) 1526.9 (1562.4) -> 1525.7 (1562.7) MB, 1100.9 / 0.0 ms  (average mu = 0.049, current mu = 0.003) allocation failure scavenge might not succeed
<--- JS stacktrace --->
...

To me it seems like a race condition of some kind: when I set the limit to 3072, memory usage comes close to the limit but then drops to 700MB or so; this never happens with 2GB, where it just grows until it crashes.

@jakebailey
Member

jakebailey commented Apr 6, 2022

To test what the default limit is, you can try running this oneliner inside of your container:

$ node -e 'console.log(v8.getHeapStatistics().total_available_size / 1024 / 1024)'

For me on Node 14 on Windows, I get back ~4000 or so. Running:

$ node --max-old-space-size=1500 -e 'console.log(v8.getHeapStatistics().total_available_size / 1024 / 1024)'

Gives me back ~1500, so this should at least show that the old space size flag is working. (Technically, old space actually controls the space of freed but not yet GCd/reused/compacted/etc memory, not the heap limit, but they're similar enough.)

To be clear, is what we're trying to fix the fact that it crashes at any max heap size? i.e. there should be some change that ensures that it doesn't OOM but dumps caches in time? Or is this just a case of "give it more heap" and making sure that is working?

@m-novikov
Author

m-novikov commented Apr 6, 2022

For me on Node 14 on Windows, I get back ~4000 or so.

For me, on node 17.8 and 16.3, both inside and outside of the container, on 64-bit Linux (Fedora 35), I get the following:

v16.3.0 2093.7490234375
v17.8.0 2092.8155670166016

Providing the flag via NODE_OPTIONS="--max-old-space-size=3072" adjusts the values as expected. You can also observe the values in the logs I provided.

v16.3.0 3117.747459411621
v17.8.0 3116.8155670166016

In my opinion it should work with the default heap limit, but my case may be an outlier.

@m-novikov
Author

I also tested it with the default memory limit outside the container, on the host system (Fedora 35); it crashes with a log similar to the one I provided in an earlier comment.
So the container is not the culprit here.
On the other hand, it survives longer, and there are successful cache cleanups, as I observe drops in memory usage (e.g. 1.8GB -> 700MB).

@erictraut
Collaborator

I'm out of ideas on this one. I haven't been able to repro the problem since my latest round of changes (when running node 17.8 on MacOS).

@m-novikov, I think it's somewhat suspicious that you're seeing a default heap limit of 2GB. On MacOS and Windows, the 64-bit version of node defaults to 4GB. Maybe the default is different on Linux?

I agree that pyright should work with the default heap limit configuration, but I don't have any further ideas about what pyright could do differently in this case. It is correctly detecting that it's running short on heap space and is eliminating references to its type cache, thus allowing the GC to free up space.

Am I correct in assuming that you have a viable workaround by manually configuring a larger heap size (say, 4GB)?

Let's see if anyone else reports a similar problem. That may provide additional clues.

@ragu-manjegowda

@erictraut would it be possible to try this again on Ubuntu (instead of Mac) on a large project?

I saw an OOM exception on Ubuntu 20.04 with node v19.4.0. As suggested in the comments above, setting NODE_OPTIONS="--max-old-space-size=4096" made the error go away.

However, after a while (less than 10 minutes), completion just stops working, and there are no errors/warnings in the log. Diagnostics and static checks still work fine; the issue is only with completion.

On Mac, completion works just fine no matter how many buffers I open (my editor is (n)vim) or how long the editor is kept open.

Please let me know if you need any other information.

My current workaround is to mount the project on a Mac and run the LSP there. I would really love to see it working on Ubuntu instead of this hack.

@erictraut
Collaborator

This issue was closed many months ago. If you would like to report a new problem, please create a new bug report with detailed repro steps. There is currently a separate open issue related to an out-of-memory crash, so if you have any repro steps to add to that one, please do so.
