New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stack integrity during EGC #2881
Comments
How sure are you about the complexity of (2)? It would seem, naively, fairly simple to leave __gc-bearing garbage as garbage during a sweep? I think it's likely going to be complicated to explain which C API functions can cause egc passes and so need special handling w.r.t. local C variables. |
The issue is that this involves some changes to the sweep algo which might need us to add an extra pass, and we also need to add extra state to the sweep process. The eLua approach attempted this and missed some -- hence the DiUS issue. The API changes are reasonably small and internal to the Lua components: |
Option 3 would seem to be the "correct" approach, and one which we might get traction with upstream as well(?). |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This addressed architecturally in Lua53 in that finalisers are not called by the GC during an emergency sweep, so it makes sense to close this IMO. |
In that case I think we should apply at least one of the workarounds to 5.1 before we freeze it (if that didn't already happen when I wasn't paying attention?) Here at work we're still using our initial workaround so it won't bother us (as we're hoping to do the switch to 5.3 anyway next time we get to rebase), but it's such a nasty little surprise I don't want anyone else having to debug it unnecessarily! |
The issue
If a new resource is allocated, then the VM adds an extra step in the malloc process. If the malloc fails returning NULL, then the VM carries out a GC sweep to free up garbage memory and retries the malloc before aborting the allocation request with an out of memory error thrown.
Expected behaviour
This process should maintain internal integrity and either succeed or correctly throw an out-of-memory error which can be handled and reported by the application. It should also be economic as practical -- that is avoid throwing an error when GC could make sufficient available.
Actual behaviour
The complication arise here if collectable objects have an associated
_gc
methamethod. This means the the GC can end up calling Lua functions during the GC sweep. Such calls can trigger a Lua stack resize. This occurs very rarely and when it does the Lua VM internally handles all of the housekeeping to update the stack references in thelua_State
, etc. preserving data integrity. Issues can occur if the API code does something like:We have hit and tracked down one example of this in a live DiUS application. The manifestation is that the VM will throw a H/W exception when this happens. This is very occasionally and seemingly randomly, and so is very hard to reproduce systematically and therefore is hard to identify.
A specific example is the use of the
seterrorobj()
function in thelua53/ldo.c
:Here
seterrobj()
can push and error message as a TString to the stack when can trigger a GC and if it does thenoldtop
is still pointing to the old stack and not the new one. If seterrobj used aptrdiff_t
as its third argumentseterrorobj(L, status, oldtop - L->stack)
then this offset reference is stack relative and won't cause an issue if the stack is resized and relocated.So what am I asking here?
Standard Lua doesn't experience this because it is working in a large RAM heap and recreating this issue is almost impossible, IMO. This isn't the case when you've got 10s of Kb RAM available. The eLua patch addressed on of the cases but missed the one which tripped up DiUS. So what our approach to this architectural issue. I think that we have three broad responses:
__gc
method. This is going to hard to implement well, I think.StkId
pointers and use stack-offsets where necessary.My preference is for (3) and to do this as a separate commit pass so that it is more easily reviewable. That means that in implementation (1) and (3) are the same in the first instance.
Comments?
The text was updated successfully, but these errors were encountered: