net_midi/net_hid linked list implementation bug? #272
Comments
OK, so as I understand it, net_midi.c implements a doubly linked list, and the active midi ops in there form a circular structure. The error condition 'error unlinking midi operator' should only ever fire if net_midi_list_remove is called twice on the same midi op. That should surely never happen, which possibly gives another clue to what might be going on... |
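To make that reasoning concrete, here is a minimal sketch of a circular doubly linked list whose remove fails when called twice on the same node. It is not the code from net_midi.c: the names `midi_node`, `midi_list`, `list_push`, `list_remove` and the `linked` flag are all assumptions for illustration; only the two message strings come from this thread.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical list node; the real midi op struct is not shown in this thread. */
typedef struct midi_node {
    struct midi_node *prev;
    struct midi_node *next;
    int linked; /* nonzero while the node is a member of the list */
} midi_node;

typedef struct {
    midi_node *top; /* most recently pushed node, NULL when the list is empty */
    int num;        /* number of nodes currently linked */
} midi_list;

/* Insert n just before top and make it the new top.
   Returns 0 on success, -1 if n is already linked. */
int list_push(midi_list *l, midi_node *n) {
    if (n->linked) {
        fprintf(stderr, "bail on midi op already in list: %p\n", (void *)n);
        return -1;
    }
    if (l->num == 0) {
        n->prev = n; /* a single node points at itself in both directions */
        n->next = n;
    } else {
        n->next = l->top;
        n->prev = l->top->prev;
        l->top->prev->next = n;
        l->top->prev = n;
    }
    l->top = n;
    n->linked = 1;
    l->num++;
    return 0;
}

/* Unlink n. Returns 0 on success, -1 if n is not linked (e.g. double remove). */
int list_remove(midi_list *l, midi_node *n) {
    if (!n->linked) {
        fprintf(stderr, "error unlinking midi operator: %p\n", (void *)n);
        return -1;
    }
    if (l->top == n) {
        /* keep top pointing at a node that is still in the list */
        l->top = (l->num == 1) ? NULL : n->next;
    }
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
    n->linked = 0;
    l->num--;
    return 0;
}
```

Nodes are expected to start zero-initialised (for example `midi_node a = {0};`). In this sketch a second `list_remove` on the same node returns -1 instead of touching the list.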
oy, sorry, I didn't see this earlier. The list management stuff itself is awfully simple. |
Just added two commits with some more debug - now I'm getting a crash like this:
that 'bail on midi op already in list' was silent in the original code. Needs some more debug on entering/leaving net_midi_list_push & net_midi_list_remove... |
yeah.. something funny is happening when
(sorry i know this is obvious, just going through it) |
Very weird! In fact I'm going to try printing the hex address of new midi ops - most obvious possibility would be that the new op pool code is playing up, allocating the same memory chunk twice... It doesn't look like deinit is called twice... |
With the mem address printed on every entry to midi_list_push & midi_list_remove:
If I didn't know better, I'd say there must be a bug in these four lines:
Obviously this conclusion makes no sense, those 4 lines look correct! Gonna write a debug function to print out the entire contents of ml, call it on entry/exit from push/remove midi_op, try & see what's going wrong there... |
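A debug function along those lines could look roughly like the sketch below. The `node` and `list` types and the name `list_dump` are hypothetical stand-ins (the real `ml` layout is not shown here); the idea is to print every element reachable from `top` and check that each link is mirrored by its neighbour.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the list node and the list head. */
typedef struct node {
    struct node *prev;
    struct node *next;
} node;

typedef struct {
    node *top;
    int num;
} list;

/* Print the list and verify the circular doubly linked invariants.
   Returns 0 if the list looks consistent, -1 otherwise. */
int list_dump(list *l, const char *tag) {
    node *n = l->top;
    int i;

    printf("[%s] num=%d top=%p\n", tag, l->num, (void *)l->top);
    if (l->num == 0) {
        return 0; /* nothing to walk */
    }
    if (n == NULL) {
        return -1; /* non-empty list without a top */
    }
    /* walk at most num steps so a corrupted list cannot loop forever */
    for (i = 0; i < l->num; i++) {
        printf("  [%d] %p prev=%p next=%p\n",
               i, (void *)n, (void *)n->prev, (void *)n->next);
        if (n->prev == NULL || n->next == NULL) {
            return -1; /* node has been unlinked but is still reachable */
        }
        if (n->next->prev != n || n->prev->next != n) {
            return -1; /* neighbours do not point back at this node */
        }
        n = n->next;
    }
    /* after exactly num steps a healthy ring is back at top */
    return (n == l->top) ? 0 : -1;
}
```

Calling something like `list_dump(&ml, "push enter")` on entry and exit of push/remove would flag the first call that leaves the list inconsistent.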
Think I figured out the corner case that was causing this bug! https://github.com/rick-monster/aleph/commit/0680f00bac21ca697b052d6e7c635310c5c90c19 The problem with the original code: what happens if you de-init ml.top when the list is not of length 1? The op to be deleted (ml.top) gets 'squashed out' of the doubly linked list, but ml.top is left pointing at the disembodied list element that was just deleted! By my reasoning, this bug must also have been present in the old code, before arbitrary op deletion/insertion (but not dead sure about that). Whatever, the bug now seems to be fixed! This stress test: has been running now for over 5 minutes, whereas before https://github.com/rick-monster/aleph/commit/0680f00bac21ca697b052d6e7c635310c5c90c19 it would crash out after only 30s! |
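The corner case can be reproduced in a few lines. This is only a sketch under assumptions, not the code from the linked commit: `op`, `op_list`, `squash`, `remove_stale_top` and `remove_fixed` are made-up names, and the buggy variant is a guess at the shape of the original logic based on the description above (top handled only when the list has length 1).

```c
#include <stddef.h>

/* Hypothetical node and list head; names are for illustration only. */
typedef struct op {
    struct op *prev;
    struct op *next;
} op;

typedef struct {
    op *top;
    int num;
} op_list;

/* Insert n just before top and make it the new top. */
void op_list_push(op_list *l, op *n) {
    if (l->num == 0) {
        n->prev = n;
        n->next = n;
    } else {
        n->next = l->top;
        n->prev = l->top->prev;
        l->top->prev->next = n;
        l->top->prev = n;
    }
    l->top = n;
    l->num++;
}

/* Join n's neighbours to each other and clear n's own links. */
static void squash(op *n) {
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = NULL;
    n->next = NULL;
}

/* Assumed shape of the bug: top is only reset when the list has length 1,
   so removing top from a longer list leaves top pointing at a dead node. */
void remove_stale_top(op_list *l, op *n) {
    if (l->num == 1) {
        l->top = NULL;
    }
    squash(n);
    l->num--;
}

/* Fixed variant: if n is top, move top to a surviving node before unlinking. */
void remove_fixed(op_list *l, op *n) {
    if (l->top == n) {
        l->top = (l->num == 1) ? NULL : n->next;
    }
    squash(n);
    l->num--;
}
```

With three ops pushed, `remove_stale_top` on the current top leaves `top` pointing at a node whose `next` is NULL, so the next push or remove dereferences a dead element; `remove_fixed` moves `top` to the neighbour first.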
yep, that's a bug alright! sorrrry... |
Well I somehow doubt that's the only remaining corner-case crash in BEES - there are a lot of lines in there! |
Seeing a crash specific to operator insertion/deletion stress test with ops that use net_midi/net_hid (see https://github.com/rick-monster/aleph/issues/21 & #267).
Reading through the code now in net_midi.c - seems like a linked list-ish thing. Also seems possible that algorithm assumes operators are allocated last-in-first-out, and seems plausible to me this may blow up with the observed error message 'error unlinking midi operator' in case the allocation order is broken when de-allocating...
@catfact do you remember writing the code for midi ops? Haven't fully understood the code in net_midi.c yet - does the above explanation hold water?
Seems HID ops also have net_hid.c, similar-ish code - makes sense that both would be blowing up in the same way...