Out of memory crash pyright-langserver #3239
Comments
The heap can grow up to the resident size limit, which is why it's appropriate to use that value here.

Which version of node do you have installed? Is it relatively recent? Which OS platform? And is it a 32-bit or 64-bit processor architecture?
I have node 16.3 on a 64-bit Linux system. I tried starting the server with a larger heap size, and that works for me. Here is the patch that worked for me. Note that comparing `heapUsed` against `rss` isn't meaningful here; the patch compares `rss` against `v8.getHeapStatistics().total_available_size` instead:

```diff
diff --git a/packages/pyright-internal/src/analyzer/program.ts b/packages/pyright-internal/src/analyzer/program.ts
index aabf36a5..08c0a18b 100644
--- a/packages/pyright-internal/src/analyzer/program.ts
+++ b/packages/pyright-internal/src/analyzer/program.ts
@@ -85,6 +85,7 @@ import { createTypeEvaluatorWithTracker } from './typeEvaluatorWithTracker';
 import { PrintTypeFlags } from './typePrinter';
 import { Type } from './types';
 import { TypeStubWriter } from './typeStubWriter';
+import * as v8 from 'v8';

 const _maxImportDepth = 256;

@@ -2106,10 +2107,10 @@ export class Program {
         // drop immediately after we empty the cache due to garbage collection timing.
         if (typeCacheSize > 750000 || this._parsedFileCount > 1000) {
             const memoryUsage = process.memoryUsage();
-
+            const heapStats = v8.getHeapStatistics();
             // If we use more than 90% of the available heap size, avoid a crash
             // by emptying the type cache.
-            if (memoryUsage.heapUsed > memoryUsage.rss * 0.9) {
+            if (memoryUsage.rss > heapStats.total_available_size * 0.9) {
                 const heapSizeInMb = Math.round(memoryUsage.rss / (1024 * 1024));
                 const heapUsageInMb = Math.round(memoryUsage.heapUsed / (1024 * 1024));
```
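The distinction between the two measurements is the crux of the patch. Here is a small sketch (not pyright code, just Node's built-in `process` and `v8` APIs) that prints the values being compared:

```typescript
import * as v8 from 'v8';

// `rss` is the whole process's resident memory (JS heap + native allocations
// + code pages), while the v8 heap statistics describe only the JS heap and
// its configured limit.
const mem = process.memoryUsage();
const stats = v8.getHeapStatistics();

const mb = (n: number) => Math.round(n / (1024 * 1024));
console.log(`rss:                  ${mb(mem.rss)} MB`);
console.log(`heapUsed:             ${mb(mem.heapUsed)} MB`);
console.log(`heap_size_limit:      ${mb(stats.heap_size_limit)} MB`);
console.log(`total_available_size: ${mb(stats.total_available_size)} MB`);
```

Because `rss` includes non-heap memory, comparing `heapUsed` against `rss * 0.9` can pass or fail for reasons unrelated to the actual heap headroom, which is what the patch addresses.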
Yeah, I think your patch makes sense. I've incorporated it, and this will be included in the next release.
For the record, we experience similar issues when running pyright in a GitHub Actions workflow. See, e.g., https://github.com/sagemath/sagetrac-mirror/runs/5779736990?check_suite_focus=true
I think the problem we encounter is different from the one @m-novikov experiences. For us, it first occurred with pyright@1.1.233, while 1.1.232 works fine. In fact, we have two runs of pyright on the same code that only differ in the (automatic) upgrade 232 > 233. And indeed, after downgrading to 232 everything works again. Hope that helps.
@tobiasdiez, that's very strange. The code that manages heap space in pyright went unmodified for more than a year. I only recently made a change in 1.1.234 to try to address the problem @m-novikov reported above. I've since reverted that change because I think it was incorrect, so version 1.1.235 (once I publish it) will revert to the older (pre-1.1.234) behavior. Are you sure that you didn't see the problem with 1.1.234? I can't explain why you would have seen a change with 1.1.233.
Yes, we are seeing this problem with both 233 and 234, although there is a difference: with 233 it takes about 40 min before it stops, but with 234 only 4 min (a normal successful run with 232 takes about 12 min). Could it be that, in addition to the problem reported in this issue, 233 introduced an infinite loop (or something similar) somewhere? Let me know if you want me to test some new versions/configs/combinations. However, at the moment the issue is best reproduced on GitHub Actions, so extensive debugging is not really possible.
I've done a bunch more investigation, and I think my earlier patch was correct. The current logic is not; it just happens to work some of the time. I've added back my previous patch along with some additional logging.

@tobiasdiez, I think the problem you saw in 233 is related to this other issue, which was fixed in 234. It sounds like some of my recent perf optimizations have dropped your total analysis time from 12 min to 4 min, but 4 min is still a long time. It must be a big source base! Is it public, by any chance? If so, I can look at it to see if there are any further performance wins that could further decrease the time.
Okay, so we will test the new version 235 as soon as it's out, which should then fix both issues if I understand you correctly. And yes, sage is a huge project, and we only recently started to introduce type annotations. We happily serve as a test case: https://github.com/sagemath/sage. Feel free to contact me if things are unclear or you encounter any issues.
@tobiasdiez, I found a few more optimizations that help reduce the analysis time for sage. It now takes < 100 sec on my local machine to analyze the full source base. This time will continue to drop as you add type annotations, because the analyzer won't need to do as much work to infer types. Pyright currently emits 22K+ warnings across the entire sage source base. You could reduce this significantly by providing annotations for a few key decorator functions.
This is addressed in pyright 1.1.235, which I just published. It will also be included in the next release of pylance. If you still see crashes due to heap overflow, please enable verbose logging.
Thanks for your work and your suggestions. I can confirm that v235 is considerably quicker. Good job! Sadly, the out-of-memory error persists, though. The last few lines are:

and the full log with verbose output:
Reopening to investigate. Based on the logs, a few things are immediately evident:
This took a bit of spelunking, but I think I found the root cause. It appears that it's a bug in the v8 garbage collector. It's not able to detect that the type cache (which has many internal references) is no longer referenced. I needed to add code that manually breaks some of the references, and then it was able to collect the garbage. I tested my fix with a 1GB heap limit, which is much lower than the default of 2GB for 32-bit systems and 4GB for 64-bit systems. Thanks for your patience on this one. It was a tricky one.
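A minimal illustration of the kind of fix described (hypothetical names, not pyright's actual cache code): when cache entries hold references to each other, explicitly severing those links before dropping the container can make the collector's job tractable.

```typescript
// Hypothetical sketch; names are illustrative, not pyright's actual code.
// Cache entries that reference each other form cycles; severing the links
// before dropping the container gives the GC less work to prove the whole
// graph unreachable.
interface CacheEntry {
    value: unknown;
    related: CacheEntry[]; // internal cross-references between entries
}

class TypeCache {
    private _entries = new Map<string, CacheEntry>();

    set(key: string, entry: CacheEntry): void {
        this._entries.set(key, entry);
    }

    get size(): number {
        return this._entries.size;
    }

    // Rather than just replacing the Map and hoping the GC notices, break
    // the entry-to-entry references explicitly, then clear the Map.
    empty(): void {
        for (const entry of this._entries.values()) {
            entry.related.length = 0;
            entry.value = undefined;
        }
        this._entries.clear();
    }
}
```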
With latest main (up to commit c2eaa87):
Hmm, apparently there's more to the problem. I was able to repro the issue prior to my change, but it went away with my "fix". In my experimentation, I was trying to replicate the heap limits on your system (i.e., I ran node with the command-line parameter "--max-old-space-size=2000"). I'm running short on theories here.
I'm no longer able to repro the problem if I constrain the max heap size to 2000MB, but I can still repro it if I constrain the heap to 1000MB. I was able to repro it on node v12, v14, and v16. However, with the latest version of node (v17.8), the problem goes away.

I'm pretty confident this is a bug in older versions of the v8 (Chromium) garbage collector. The Edge browser team recently found and fixed several bugs of this nature; they documented this in a blog post. These fixes are presumably in the latest version of node, which explains why the problem no longer occurs.

It's also interesting to see the performance improvements the v8 team has made over time. For example, with a 2GB heap limit, the analysis time on Node 12 was 172.6s.

If my theory above is correct, then I don't think there's anything more I can do within pyright to work around the issue. Your options include:
Thanks to @jakebailey for suggesting that this might be an issue related to the node version.
@tobiasdiez I'd also recommend using https://github.com/jakebailey/pyright-action to run pyright rather than DIYing it; you'll end up with faster builds thanks to artifact caching, plus it uses Node 16. Per the above, I'll see if I can get it up to Node 17 for the nice perf boosts and bug fixes. I already bumped the action up from Node 12 to Node 16 a few weeks ago, which was a nice change to be able to do.
@erictraut thank you for the thorough investigation of this issue.
I run the server inside a Fedora Toolbox container; it's a 64-bit system, but for some reason it still has a 2GB memory limit even on newer node versions (v12, v14).
This is the option I went with, by setting the NODE_OPTIONS environment variable.
I tried to run with node 17.8; the crash still occurs when RSS reaches 2GB. See traceback:
@m-novikov, it sounds like node thinks it should be able to allocate more memory, but it's being artificially limited to 2GB. You said that you're running this inside a container. Perhaps you need to configure your container runtime settings? Alternately, you could tell node explicitly that it shouldn't grow its heap over 2GB. Try setting "--max-old-space-size=1500". This relies on proper GC behavior, so you would need to use node 17.8.
I don't think this is the reason, as I can set the NODE_OPTIONS environment variable to a bigger heap and it works. If the container had a memory limit, the process would be killed by the OOM killer.

Setting the value to 1500 doesn't help.

To me it seems like a race condition of some kind: when I set the limit to 3072, memory usage comes close but then drops to 700MB or so; this never happens with 2GB, where it just grows until it crashes.
To test what the default limit is, you can try running this oneliner inside of your container:

For me on Node 14 on Windows, I get back ~4000 or so. Running:

gives me back ~1500, so this should at least show that the old-space-size flag is working. (Technically, old space actually controls the space for freed but not-yet-GCd/reused/compacted memory, not the heap limit, but they're similar enough.)

To be clear, is what we're trying to fix the fact that it crashes at any max heap size? That is, should there be some change that ensures it doesn't OOM but dumps caches in time? Or is this just a case of "give it more heap" and making sure that is working?
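A query of this kind can be written against Node's built-in `v8` module; this sketch (an assumption about what the oneliner did, but it answers the same question) prints the configured heap limit in MB:

```typescript
import { getHeapStatistics } from 'v8';

// Report the heap limit v8 is actually operating under. Launching node with
// --max-old-space-size=1500 (directly or via NODE_OPTIONS) should be
// reflected in this number.
const limitMb = Math.round(getHeapStatistics().heap_size_limit / (1024 * 1024));
console.log(limitMb);
```

Running it once with default settings and once with `NODE_OPTIONS="--max-old-space-size=1500"` confirms whether the flag is being picked up.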
For me on nodes 17.8 and 16.3, in both cases inside and outside of the container on 64-bit Linux (Fedora 35), I get the following:

Providing the flag with:

In my opinion it should work with the default heap limit, but my case may be an outlier.
I also tested it with the default memory limit outside the container on the host system (Fedora 35); it crashes with a log similar to the one I provided in an earlier comment.
I'm out of ideas on this one. I haven't been able to repro the problem since my latest round of changes (when running node 17.8 on macOS).

@m-novikov, I think it's somewhat suspicious that you're seeing a default heap limit of 2GB. On macOS and Windows, the 64-bit version of node defaults to 4GB. Maybe the default is different on Linux?

I agree that pyright should work with the default heap limit configuration, but I don't have any further ideas about what pyright could do differently in this case. It is correctly detecting that it's running short on heap space and is eliminating references to its type cache, thus allowing the GC to free up space. Am I correct in assuming that you have a viable workaround by manually configuring a larger heap size (say, 4GB)? Let's see if anyone else reports a similar problem; that may provide additional clues.
@erictraut, would it be possible to try this again on Ubuntu (instead of Mac) on a large project? I saw an OOM exception on Ubuntu 20.04 with node v19.4.0; setting a larger heap size, as suggested in the comments above, helped with the crash. However, after a while completion just stops working (in less than 10 minutes), and there are no errors/warnings in the log. Diagnostics and static checks still work fine; the issue is only with completion. On Mac, completion works just fine no matter how many buffers I open. Please let me know if you need any other information. My current workaround is to mount the project on Mac and run the LSP there. I would really love to see it working on Ubuntu instead of this hack.
This issue was closed many months ago. If you would like to report a new problem, please create a new bug report with detailed repro steps. There is currently a separate open issue related to an out-of-memory crash, so if you have any repro steps to add to that one, please do so. |
Describe the bug
Traceback
pyright-langserver crashes when heap allocation exceeds 2GB.
I think it happens after this commit: c56d750.
It uses `process.memoryUsage().rss` as an upper limit, which probably should be `v8.getHeapStatistics().heap_size_limit` instead. I am not sure; I consulted this SO question.

To Reproduce
A big enough project. You can try forcing this issue using:

```shell
export NODE_OPTIONS="--max-old-space-size=1024"
```
Expected behavior
The crash doesn't occur; the cache is discarded before the critical threshold is reached.
Additional context
I use this language server in neovim, so the startup method probably differs from the vscode extension.