The current out-of-memory support is unreliable #96

Closed
itamarst opened this issue Nov 12, 2020 · 3 comments · Fixed by #110
Labels: enhancement, ux

Comments

@itamarst
Collaborator

Currently we do two things:

  1. On failed malloc(), free a previously-allocated 16MB buffer, which in theory gives us enough memory for step 2 (see the sketch after this list).
  2. Dump current allocations to disk as SVG.
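
To make the pattern concrete, here's a minimal sketch in Rust of that reserve-buffer approach; it's illustrative only, not Fil's actual code, and names like `RESERVE` and `dump_allocations_as_svg` are made up:

```rust
use std::sync::Mutex;

/// Illustrative sketch of the "emergency reserve" pattern; not Fil's real code.
static RESERVE: Mutex<Option<Vec<u8>>> = Mutex::new(None);

const RESERVE_SIZE: usize = 16 * 1024 * 1024;

/// Allocate (and touch) 16MB at startup so the pages are actually committed.
fn init_reserve() {
    // Non-zero fill, so the kernel can't back the buffer with shared zero pages.
    *RESERVE.lock().unwrap() = Some(vec![1u8; RESERVE_SIZE]);
}

/// On a failed malloc(): drop the reserve, then try to write the report.
fn handle_out_of_memory() {
    // Step 1: free the reserve, hoping the released memory is still ours to reuse...
    RESERVE.lock().unwrap().take();
    // Step 2: ...and use that headroom to dump current allocations as SVG.
    dump_allocations_as_svg();
}

fn dump_allocations_as_svg() {
    // Placeholder for the real report-writing code.
}
```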

The problems with this are:

  1. Merely allocating 16MB doesn't guarantee the memory will be available later when it's free()d. If the buffer is all zeros it may never have been meaningfully reserved at all; that can be fixed, but even then, once it's freed some other program might grab the memory.
  2. 16MB might not suffice to generate the SVG.
  3. By the time malloc() fails the computer might be locked up to the point of being unusable due to swapping and the like.

I expect to rip out the 16MB thing in #95, probably, because it's yet another if statement slowing things down, and it's not clear it adds anything.

Some potential solutions:

  1. Use rusage to limit memory available to the program; this way the program will run out of memory while there's still a safety margin. This will mitigate but not prevent the above problems.
  2. Make sure the information necessary to dump current allocations is written to disk (or, more plausibly given performance needs, kept in an mmap()ed file with lazy disk syncing). If memory runs out, the necessary info will still be on disk and a report can be generated after the crash (see the sketch after this list).
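
As an illustration of option 2, here's a rough sketch using the libc crate (the log size, file handling, and function names are invented for the example) of keeping the allocation log in a MAP_SHARED mmap()ed file, so dirty pages can be written back lazily by the kernel and survive a crash:

```rust
use std::fs::OpenOptions;
use std::os::unix::io::AsRawFd;

/// Illustrative fixed-size log; real code would grow or rotate it.
const LOG_SIZE: usize = 64 * 1024 * 1024;

/// Map a file into memory so records written to it survive a process crash.
fn open_allocation_log(path: &str) -> *mut u8 {
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open(path)
        .expect("open allocation log");
    file.set_len(LOG_SIZE as u64).expect("resize allocation log");
    let ptr = unsafe {
        libc::mmap(
            std::ptr::null_mut(),
            LOG_SIZE,
            libc::PROT_READ | libc::PROT_WRITE,
            // MAP_SHARED: writes eventually reach the file, even if we crash.
            libc::MAP_SHARED,
            file.as_raw_fd(),
            0,
        )
    };
    assert_ne!(ptr, libc::MAP_FAILED, "mmap failed");
    // Closing `file` when it goes out of scope does not invalidate the mapping.
    ptr as *mut u8
}

/// Ask the kernel to write dirty pages back without blocking the caller.
fn lazy_sync(log: *mut u8) {
    unsafe {
        libc::msync(log as *mut libc::c_void, LOG_SIZE, libc::MS_ASYNC);
    }
}
```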
@itamarst
Collaborator Author

This is probably going to land in Fil for Pipelines, a paid product, but we'll still need to add some infrastructure here to support hooking that in.

@itamarst
Collaborator Author

itamarst commented Nov 25, 2020

And, after further thought, this can probably also be done reasonably in the open source version, albeit less reliably.

@itamarst
Collaborator Author

itamarst commented Dec 10, 2020

rusage isn't good enough because it won't prevent failed allocations in Rust code, which would blow up everything.

A better mechanism:

  1. Keep a counter of how many bytes have been allocated since the last check.
  2. Right before each allocation, increment the counter by the allocation's size. Frees are not tracked, and that's OK; the goal is just to spread out the checks sufficiently.
  3. Once N bytes have been allocated since the last check (which means memory usage is at most N bytes higher; it might be less if there were frees), check how much memory is free on the computer/cgroup. If less than some threshold (say 100MB) remains, do OOM handling (see the sketch after this list).
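
A rough sketch of that bookkeeping in Rust (the constants and the `query_free_memory` helper are placeholders, not Fil's real implementation):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Bytes allocated since the last free-memory check (frees are not subtracted).
static ALLOCATED_SINCE_CHECK: AtomicUsize = AtomicUsize::new(0);

/// How many bytes may be allocated between checks; illustrative value.
const CHECK_EVERY_BYTES: usize = 10 * 1024 * 1024;

/// If less free memory than this remains, trigger OOM handling; illustrative value.
const FREE_MEMORY_THRESHOLD: usize = 100 * 1024 * 1024;

/// Call right before each allocation of `size` bytes.
fn before_allocation(size: usize) {
    let so_far = ALLOCATED_SINCE_CHECK.fetch_add(size, Ordering::Relaxed) + size;
    if so_far >= CHECK_EVERY_BYTES {
        ALLOCATED_SINCE_CHECK.store(0, Ordering::Relaxed);
        if query_free_memory() < FREE_MEMORY_THRESHOLD {
            handle_out_of_memory();
        }
    }
}

fn query_free_memory() -> usize {
    // Placeholder: e.g. parse /proc/meminfo or the cgroup memory limit.
    usize::MAX
}

fn handle_out_of_memory() {
    // See the two modes sketched below.
}
```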

OOM handling actually has two modes we'd like:

  1. When Fil is loaded but inactive, just exit with a special exit code, so the caller knows to retry with the profiler enabled.
  2. When Fil is actively tracing, dump all allocations. In Jupyter (or other modes where tracing is only partially enabled) this will have to include all non-tracked memory as "Untracked", which doesn't happen now. After the dump, exit with the special exit code (see the sketch after this list).
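
A sketch of how those two modes might be wired up (the exit code and function names are placeholders, not necessarily what Fil ends up using):

```rust
use std::process::exit;

/// Hypothetical exit code telling the caller "this was an out-of-memory exit",
/// so it can e.g. re-run the workload under the profiler.
const OOM_EXIT_CODE: i32 = 53;

enum ProfilerState {
    /// Fil is loaded but not currently tracing allocations.
    Inactive,
    /// Fil is actively tracing.
    Tracing,
}

fn handle_out_of_memory(state: ProfilerState) -> ! {
    match state {
        ProfilerState::Inactive => {
            // Mode 1: nothing to dump; the exit code alone signals what happened.
        }
        ProfilerState::Tracing => {
            // Mode 2: dump everything we know, lumping non-tracked memory
            // together as "Untracked", then exit.
            dump_all_allocations();
        }
    }
    exit(OOM_EXIT_CODE)
}

fn dump_all_allocations() {
    // Placeholder for the report-writing code.
}
```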
