
runtime: use mremap to move "big" stacks when growing on linux #52910

Closed
Jorropo opened this issue May 15, 2022 · 8 comments
Labels
NeedsInvestigation Performance
Milestone

Comments

@Jorropo (Contributor) commented May 15, 2022

mremap allows moving part of the virtual address space while leaving the physical pages unchanged; it can also grow the moved mapping with zero fill.

This makes it a perfect candidate for resizing big stacks, since it is essentially an ~O(1) operation (it's O(n) in the number of pages, but that cost is minimal and heavily dominated by the kernel entry overhead).
It might even let us skip taking a few locks, because we wouldn't have to interact with the shared memory pool (although perhaps we still would, to keep the address space coherent?).

This would be a non-portable optimization, as this syscall is Linux-specific.

@mknyszek mknyszek added Performance NeedsInvestigation labels May 17, 2022
@mknyszek mknyszek added this to the Backlog milestone May 17, 2022
@mknyszek (Contributor) commented May 17, 2022

CC @golang/runtime

@randall77 (Contributor) commented May 17, 2022

I'm not sure exactly how this would help. We wouldn't have to copy the memory, sure, but we would still have to go through the stack and update any pointers into it. I believe that is the much more expensive part, because it requires unwinding the stack, looking up funcdata, traipsing through pointer bitmaps, and all that.

@Jorropo (Contributor, Author) commented May 17, 2022

Ok, thanks, I didn't know about that part (I naively assumed everything that wasn't accessed through a stack-relative pointer was moved to the heap).
This probably wouldn't have a significant impact, then.

@Jorropo Jorropo closed this as completed May 17, 2022
@CAFxX (Contributor) commented May 18, 2022

Just as food for thought, mremap may still be helpful when a large slice containing noscan elements needs to grow during append. This may be a somewhat niche use case (especially as we may need to prove that the original array is dead after the append), but in the cases where it applies it would likely be very effective.

This may warrant a different ticket, though.

(Also worth pointing out that while mremap is Linux-specific, other OSes like Windows should allow achieving the same effect with different APIs.)

@Jorropo (Contributor, Author) commented May 18, 2022

@CAFxX I don't think it can work as things currently stand, because mremap unmaps the old range, so if another reference exists it will SIGSEGV when accessed after the remap.
So code like this:

s1, s2 := getSlices()
s3 := append(s1, s2...) // assume s1's backing array is big and a remap happens
s1[0] = 1 // SIGSEGV, because s1's mapping no longer exists and has been remapped to s3

There are options for configuring that:

  • MREMAP_DONTUNMAP (Linux 5.7) does not unmap the previous range; instead it is left in place and zero-filled on the next fault.
    I don't think that helps much on its own. Ok, it doesn't SIGSEGV, but we can't replace the data with zeroes either.
  • MREMAP_DONTUNMAP + userfaultfd: this would let us install a custom page-fault handler that does a copy-on-write of the slice when it is touched. However, this solution sounds really complex (only the old mapping is trapped, so we would need to set up a trap on the new one too, ...), I don't know the cost of userfaultfd to begin with, and it would only be faster assuming not all of the mapping is written to (so I would rename this optimization "lazy slice copy").
  • A third option would be to add a MREMAP_COW flag to Linux, which would do what you think it does.
    But given release cycles, the speed at which people update their kernels, ... realistically this would only become usable in a few years.

@CAFxX (Contributor) commented May 18, 2022

@Jorropo sure, that's why I mentioned

we may need to prove that the original array is dead after the append

If the array is not dead it becomes much more complicated (and probably not worth the complexity), as you correctly point out.

@Jorropo (Contributor, Author) commented May 18, 2022

@CAFxX

My bad, but if we are able to prove that, I think better optimizations are possible:

  • Removing the growslice to begin with (or coalescing growslices together).
  • Using stack allocations.

There would be counterexamples, such as when too many conditionals are involved (then we couldn't do the optimizations I'm thinking of, but we could still remap). But I doubt this shows up in real code.

If you think this is worthwhile, I think you should open a new issue. :)

@CAFxX (Contributor) commented May 18, 2022

Sure.

Just to address your points: proving the array is dead unfortunately would not automatically mean we can coalesce (e.g. if we're appending data coming from external sources), and stack allocations would definitely not help under my assumption of large slices (both because of the size threshold for stack allocations and, more generally, because we would again run into the same problem mentioned above).

Projects
Status: Done
Development

No branches or pull requests

4 participants