Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unref default file descriptor while still in use #2818

Merged
merged 1 commit into from
Jul 5, 2019

Conversation

devsaurus
Copy link
Member

  • This PR is for the dev branch rather than for master.
  • This PR is compliant with the other contributing guidelines as well (if not, please describe why).
  • I have thoroughly tested my contribution.

When executing file.close() for the basic model, the default file descriptor object is unreferenced even though it is still in use. This is opens up a potential race condition.

I discovered this issue on esp32 where the race recently resulted in a heap corruption. I never observed this on esp8266 - probably due to strictly sequential execution of events. This PR prevents the race condition in any case.

@marcelstoer marcelstoer added this to the Next release milestone Jul 5, 2019
@marcelstoer marcelstoer merged commit bc7ffb3 into nodemcu:dev Jul 5, 2019
@devsaurus devsaurus deleted the fix_file_close_unref branch July 5, 2019 21:46
@TerryE
Copy link
Collaborator

TerryE commented Jul 7, 2019

@devsaurus Arnim, to be honest this one has me confused.

Userdata are collectable types and so if the GC fires and there are no live references then the variable's data will be collected. Yes, Registry[file_fd_ref] references the UD preventing its GC when not otherwise referenced, but in the case of the old code doing the luaL_unref() before the vfs_close(ud->fd) is perfectly valid as the UD is also currently referenced on the Lua stack at L->base[0] (pos 1) and so it can't be available for collection until after the routine returns at which point the LVM frees the stack call frame, and after any references in Lua variable are also derefrenced.

Whatever is causing your heap corruption on the ESP32, it isn't this valid usage.

BTW, using the stack to reference userdata is a good alternative to alloca() for temporary data where you want to maintain malloc / free integrity in the case where the routine generates a Lua throw. Whilst on the stack the UD won't be collected but once the routine exists (either by normal return or throwing an error) then the Lua GC will collect the area on its next sweep.

Does this explanation make sense?

@devsaurus
Copy link
Member Author

but in the case of the old code doing the luaL_unref() before the vfs_close(ud->fd) is perfectly valid as the UD is also currently referenced on the Lua stack

Generally yes, but in this particular case I intended to balance both scenarios first before calling vfs_close(). That's why the UD is popped from the stack immediately at

lua_pop( L, 1 );

unref'ing UD then finally removes the last reference to it 😞

Alternatively, the UD could have been kept on the stack all the way and it's popped at the end inside another if condition. Is this final pop is even required - i.e. would the return pop it implicitly?

@TerryE
Copy link
Collaborator

TerryE commented Jul 7, 2019

That's why the UD is popped from the stack immediately

Can't get the reasoning here. The return 0 returns control to ldo.c which pops the stack so whatever level the lua_call() requires padding out with extra nils if required. IIRC, returning also triggers a GC sweep in some circumstances. You avoid these edge effects if you just leave GC to the Lua GC engine.

That being said, it took me about 3 years to get my head around the subtleties of the Lua VM and the "right way" 🤣

PS. It's not quite as simple as popping. If you have, say, 5 variables on the current call frame stack and do a return 2 then entries -5 through -3 are deleted leaving -2 and -1. However if the caller only wanted 1 result -1 is also deleted and the -2 slot becomes -1 in the callers stack frame. If the caller wanted 3 results then a nil is pushed so that the top 3 get moved to the callers stack frame. (Well they don't get moved since it as proper stack and the L->base and L->top are just reset on return to the calling routine.)

@devsaurus
Copy link
Member Author

The return 0 returns control to ldo.c which pops the stack so whatever level the lua_call() requires padding out with extra nils if required

Hm, does this mean that the userdata which was pushed onto the stack with lua_rawgeti() can remain there until the return 0? I assumed that a C function is required to avoid adding objects to the stack without cleaning up afterwards. Probably I was too pessimistic about this.

It would simplify things a lot if the Lua VM handles that for me 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants