Program gets stuck attempting to swap Subs #612
Comments
jvoigtlaender changed the title from "Can't swap Subs" to "Program gets stuck attempting to swap Subs" on Jun 3, 2016
jvoigtlaender referenced this issue on Jun 26, 2016: Runtime Error: Subscription to Time.every x where x is computed based on the model #1426 (Closed)
OvermindDL1
Aug 4, 2016
Issue https://github.com/elm-lang/html/issues/58 is the same, as are elm/compiler#1426, https://github.com/elm-lang/core/issues/628, and elm-lang/animation-frame#7.
My findings on the JavaScript side: the root cause appears to be tasks/subs/processes being killed while they are still in the workQueue. My overall findings from the mailing list and the other issues are below.
As per the mailing list at: https://groups.google.com/forum/#!topic/elm-discuss/JJaWcxKy6L4
This might be a bug in core or even in the compiler, who knows, but here is where it happened.
I am still trying to reduce it, but so far in my giant application, if I use animation-frame then it randomly dies in Chrome.
Do note, I subscribe and unsubscribe fairly rapidly to AnimationFrame via code kind of like:
    subscriptions model =
        Sub.batch
            [ if List.isEmpty model.onNextFrame then
                Sub.none
              else
                AnimationFrame.times HelpersMsg_Frame
            ]
So I am guessing that something is not being cleaned up, or not in the right order, maybe? I really, really do not want to keep it always subscribed when there are no updates that need to be done...
I tested it by removing the conditional subscription and instead always subscribing (ohcrapmylog) and I've not hit the issue yet. I need to switch it back though as this constant polling is hurting the site performance...
Here is the overall message, lot of hairy Elm internals, do not know enough about it to simplify yet:
The message that contains the one that gets wiped is id: 38, as stated before. It is a "_Task_andThen" whose task will become id: 39.
When id: 38 gets processed it enqueues id: 39. At this time the cancel key on the task that becomes Process id: 39 is null, but walking back up the parent stack of id: 38 shows that it calls the callback on id: 4 while creating the cancel key in the step function, in the if (ctor === '_Task_nativeBinding') { branch.
I traced through the entire id: 38 path. It ends up creating id: 39 when it calls the Native_AnimationFrame native callback on rAF, which then calls the callback given to requestAnimationFrame, which then calls callback(_elm_lang$core$Native_Scheduler.succeed(timeNow));, which causes "_Task_succeed" to be called on id: 38, which then calls the callback on its 'stack' key, which ends up calling sendToSelf with the animation frame time value. id: 38 gets called more times on the work loop because it got queued up a few times earlier, but these later runs do nothing of importance since its stack is null (it breaks out early). Process id: 4 appears to get fed the message to queue up the task for the animation frame callback.
When id: 39 works and no exception occurs in a run, it is filled with the animation frame callback succeed task and then the message.
When id: 39 does not work, i.e. root is null during the 'work' call, root was still non-null when id: 39 was initially created: it is assigned an object, a "_Task_andThen" that will do a "_Task_nativeBinding" to the animation frame 'callback' function.
It gets cleared during the step of id: 4. That step involves a "_Task_succeed", which calls the callback on the stack, which is a "_Task_andThen". That immediately gets called in the step loop again (it did not exit, as it went via the internal loop) and is processed as a "_Task_andThen" this time. It then bumps into a "_Task_succeed", which calls the list 'Cons' operator (empty list and a tuple0) and gives the result back as a "_Task_succeed". That gets called via the id: 4 loop to get wrapped up in a 'Just' and sent to the animation frame native code, which packages up the time information into another "_Task_succeed". That loops again and passes the value into the main application spawnLoop loop function, which stuffs things into a ready-to-handle onMessage.
Then id: 4 loops again to handle the "_Task_andThen" that the loop just put into it, which stuffs the loop callback back onto itself and loops again to handle the "_Task_receive" for the onMessage callbacks (the id: 4 process at this point has 3 things in its mailbox).
The first message is the animation frame callback, which ends up calling down to _elm_lang$animation_frame$AnimationFrame$onEffects. Passed in is a subscription object that holds id: 39 (whoo found it! ow...). onEffects stuffs our id: 39 object into a return object on the request._0 key, which all gets bundled into a "_Task_succeed" and processed via the loop for id: 4, passing that to the main loop callback again, which stuffs a "_Task_receive" onto id: 4 (the elm compiler could really optimize a LOT of this, maybe translate elm to llvm, optimize it, then translate to javascript as a start, like holy heck..., hmm webassembly is not a bad idea for elm at all...). So then id: 4 handles the "_Task_andThen", which then puts on and handles the "_Task_receive".
Then it pops the second of the original 3 messages on id: 4 and gives it to the callback, which calls onMessage with the animation frame time stuff again, which calls back into _elm_lang$animation_frame$AnimationFrame$onEffects again, and OhHeyLook id: 39 Again (>.>). It gets stuffed into the request._0 key on the return object again, so we loop around id: 4 a couple more times while a "_Task_receive" shuffles to the top yet again (really, elm compiler, optimization, maybe llvm to output to both javascript and webassembly for options...).
We then receive the last message in the mailbox, which is all the same stuff through _elm_lang$animation_frame$AnimationFrame$onEffects yet again, except this last time, instead of returning a "_Task_succeed", it goes down the other path, where the callback of a native binding returned by a kill function is called for the process with id: 39 (Oh hey there it is again!). That is pushed on to be called, so yet again id: 4 gets looped around to handle a "_Task_andThen" that then handles the prior "_Task_nativeBinding", which builds the cancel function on the root via the callback function on that same root (which is the callback function returned by the prior kill on id: 39). This function does:
    function kill(process) {
        return nativeBinding(function (callback) {
            var root = process.root;
            if (root.ctor === '_Task_nativeBinding' && root.cancel) {
                root.cancel();
            }
            process.root = null; // cleared unconditionally, even when root is not a nativeBinding
            callback(succeed(_elm_lang$core$Native_Utils.Tuple0));
        });
    }
Where 'process' is the process with id: 39. So you see here that the function (callback) that was passed into nativeBinding (which returned the nativeBinding task message that is 'now' being processed) is being called. First it does var root = process.root, so far so good. It then calls root.cancel if root.ctor === '_Task_nativeBinding', which it is not (it is a "_Task_andThen"), so it skips that if entirely and continues on down to process.root = null, and BOOM, id: 39 was just corrupted. When id: 39 is run through later, its null root causes it to die while it is trying to figure out what to do.
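The failure described above can be reproduced in a toy model. This is only a sketch under assumptions: makeProcess, killNow and stepOnce are hypothetical helpers that imitate the shape of the quoted kill and step code, not the real Elm runtime, and the process object is hand-built to match the dump quoted in this thread.

```javascript
// Toy model of the scheduler pieces quoted in this thread; NOT the real Elm runtime.
function makeProcess(id, root) {
  return { ctor: "_Process", id: id, root: root, stack: null, mailbox: [] };
}

// Same logic as the body of the quoted kill(): cancel only fires when the root
// itself is a nativeBinding, but root is cleared unconditionally.
function killNow(process) {
  var root = process.root;
  if (root.ctor === "_Task_nativeBinding" && root.cancel) {
    root.cancel();
  }
  process.root = null;
}

// Same first statement as the quoted step(): reads root.ctor with no null check.
function stepOnce(process) {
  return process.root.ctor;
}

var toyQueue = [];
var cancelled = false;
var proc39 = makeProcess(39, {
  ctor: "_Task_andThen",
  task: {
    ctor: "_Task_nativeBinding",
    cancel: function () { cancelled = true; }
  }
});

toyQueue.push(proc39); // the process is queued, waiting for the work loop
killNow(proc39);       // the subscription is dropped before the work loop runs

var stepError = null;
try {
  stepOnce(toyQueue.shift());
} catch (e) {
  stepError = e;
}

console.log(stepError instanceof TypeError, cancelled); // true false
```

Note that the nested cancel function is never invoked in this model, because the root is a "_Task_andThen" wrapping the native binding rather than the native binding itself.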
So yes, this is a bug, probably in AnimationFrame (maybe in core, I don't know).
And this hurts... How do I work around this bug for the time being until it is fixed?
On Wednesday, August 3, 2016 at 12:28:47 PM UTC-6, OvermindDL1 wrote:
Sometime between when id: 39 and id: 40 is created during rawSpawn the root key in the object goes null. Still trying to find what code is wiping it...
On Wednesday, August 3, 2016 at 12:21:18 PM UTC-6, OvermindDL1 wrote:
The task object with id: 39 does not have a null root at the time it is put into workQueue. At the time it is put into the workQueue it is: {callback : function(b), ctor: "_Task_andThen", task: {callback: function(callback), cancel: function(), ctor: "_Task_nativeBinding"}}
The root.task.callback function seems to have one interesting closure that has a key/value of navStart:1470247637381, and there is another link in the cancel to the process with id: 38 so it appears to be a continuation of that one. I am not sure of the internal structure of Elm so I am not sure what Native Binding it is called, and I use no Native Bindings in my project (only what comes with Elm core libraries is what exists here at all).
A little further digging and I am seeing _elm_lang$animation_frame$Native_AnimationFrame in the stack, further digging makes it seem (although I am unsure) that the root.task.callback is the same function as defined at: https://github.com/elm-lang/animation-frame/blob/master/src/Native/AnimationFrame.js#L13
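For readers unfamiliar with the shape of such a native binding, here is a rough sketch of a cancellable "next frame" task. Everything in it is an assumption for illustration: makeFrameTask, raf and caf are hypothetical names (raf and caf stand in for requestAnimationFrame and cancelAnimationFrame so the sketch runs outside a browser), and the time arithmetic is simplified. It is not the code at the link above.

```javascript
// Sketch only: a cancellable "next frame" task in the shape described in this thread.
function makeFrameTask(raf, caf, navStart) {
  return {
    ctor: "_Task_nativeBinding",
    cancel: null,
    callback: function (done) {
      var id = raf(function (time) {
        // Assumption: the frame time is relative to navStart.
        done({ ctor: "_Task_succeed", value: navStart + time });
      });
      // The returned function is what a scheduler would store as root.cancel.
      return function () { caf(id); };
    }
  };
}

// Fake frame source, used only for this demonstration.
var pendingFrames = {};
var nextFrameId = 1;
function fakeRaf(fn) { var id = nextFrameId++; pendingFrames[id] = fn; return id; }
function fakeCaf(id) { delete pendingFrames[id]; }
function fireFrame(id, time) {
  var fn = pendingFrames[id];
  delete pendingFrames[id];
  if (fn) { fn(time); }
}

var frameResults = [];
var frameTask = makeFrameTask(fakeRaf, fakeCaf, 1000);

// First request runs to completion.
frameTask.cancel = frameTask.callback(function (t) { frameResults.push(t.value); });
fireFrame(1, 16);

// Second request is cancelled before its frame fires, so nothing is delivered.
frameTask.cancel = frameTask.callback(function (t) { frameResults.push(t.value); });
frameTask.cancel();
fireFrame(2, 32);

console.log(frameResults); // [ 1016 ]
```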
I am not yet seeing how the root key on this "_Process" object is getting cleared before it has a chance to be processed, still tracing...
On Wednesday, August 3, 2016 at 11:59:23 AM UTC-6, OvermindDL1 wrote:
Debugged into it and caught the exception at the point to get the stack values:
    numSteps = 403
    process = Object {ctor: "_Process", id: 39, root: null, stack: null, mailbox: Array[0]}
So root is null. Why would it be trying to access a null value without checking for null?
Any ideas how to work around this in this project so I can at least keep working in chrome?
On Wednesday, August 3, 2016 at 11:52:36 AM UTC-6, OvermindDL1 wrote:
I keep getting this exception thrown from inside Elm, so far only from Chrome Version 51.0.2704.103 m
elm.js:2417 Uncaught TypeError: Cannot read property 'ctor' of null
Where that line and the surrounding context is purely Elm generated code, and is:
    // STEP PROCESSES // Line 2411
    function step(numSteps, process)
    {
        while (numSteps < MAX_STEPS)
        {
            var ctor = process.root.ctor; // Line 2417 -- This is the error: Uncaught TypeError: Cannot read property 'ctor' of null
            if (ctor === '_Task_succeed')
            {
                while (process.stack && process.stack.ctor === '_Task_onError')
                {
                    process.stack = process.stack.rest;
                }
                if (process.stack === null)
The same JavaScript seems to run fine in Firefox, IE, and Edge; this only seems to happen in Chrome and only 'sometimes'. It seems to happen pretty quickly during loading, and if it does not happen at the start then it does not seem to happen at all. I've not been able to whittle the code down to something that still lets my app do anything while still causing this error.
The 'step' function is being called from the 'work' function (shown with context):
    // WORK QUEUE
    var working = false;
    var workQueue = [];
    function enqueue(process) {
        workQueue.push(process);
        if (!working) {
            setTimeout(work, 0);
            working = true;
        }
    }
    function work() {
        var numSteps = 0;
        var process;
        while (numSteps < MAX_STEPS && (process = workQueue.shift())) {
            numSteps = step(numSteps, process); // This is the place in the callstack before step
        }
        if (!process) {
            working = false;
            return;
        }
        setTimeout(work, 0);
    }
Chrome is not reporting anything in the stack below work, so this appears to happen during the setTimeout callback, set a few lines earlier, that calls work.
Any thoughts as to the cause?
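One conceivable stopgap, sketched here purely as an illustration, is a work loop that drops processes whose root was cleared while they sat in the queue. This is a hypothetical variant, not the upstream fix: makeGuardedScheduler and drain are made-up names, the MAX_STEPS value is assumed, and the loop is synchronous (no setTimeout) so it can be run directly.

```javascript
// Hypothetical guarded variant of the quoted work loop; not the upstream fix.
var MAX_STEPS = 10000; // assumed limit, for illustration only

function makeGuardedScheduler(step) {
  var queue = [];

  function enqueue(process) {
    queue.push(process);
  }

  // Synchronous stand-in for work(): returns how many dead processes were skipped.
  function drain() {
    var numSteps = 0;
    var skipped = 0;
    var process;
    while (numSteps < MAX_STEPS && (process = queue.shift())) {
      if (process.root === null) {
        skipped++; // killed while queued: drop it instead of stepping it
        continue;
      }
      numSteps = step(numSteps, process);
    }
    return skipped;
  }

  return { enqueue: enqueue, drain: drain };
}

var stepped = [];
var scheduler = makeGuardedScheduler(function (numSteps, process) {
  stepped.push(process.id);
  return numSteps + 1;
});

scheduler.enqueue({ id: 1, root: { ctor: "_Task_succeed" } });
scheduler.enqueue({ id: 39, root: null }); // simulates the corrupted process
scheduler.enqueue({ id: 2, root: { ctor: "_Task_succeed" } });

var skippedCount = scheduler.drain();
console.log(stepped, skippedCount); // [ 1, 2 ] 1
```

A guard like this only hides the symptom. Whether a killed process should still be in the queue at all is the underlying question.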
OvermindDL1
Aug 4, 2016
Managed to reduce it to a small program. It happens for me quite reliably in Chrome: just click On then Off a few times and you will eventually see
    Main.elm:2417 Uncaught TypeError: Cannot read property 'ctor' of null
    step @ Main.elm:2417
    work @ Main.elm:2533
pop up in your JavaScript console fairly quickly, and the entire Elm app will stop functioning, thus killing the page. If you attach a debugger in Chrome to the JavaScript and have it stop on unhandled exceptions then you will see exactly where.
The simplified code is (add a dependency on elm-lang/animation-frame as it supplies a wonderfully easy way to reproduce this):
    port module Main exposing (..)

    import AnimationFrame
    import Html
    import Html.App
    import Html.Events

    main =
        Html.App.program
            { init = init
            , view = view
            , update = update
            , subscriptions = subscriptions
            }

    init = (0, Cmd.none)

    type Msg
        = On
        | Off
        | Tick Float

    update msg model =
        case msg of
            On -> if model == 0 then (1, Cmd.none) else (model, Cmd.none)
            Off -> if model >= 1 then (0, Cmd.none) else (model, Cmd.none)
            Tick t -> if model >= 1 then (model+1, Cmd.none) else (model, Cmd.none)

    subscriptions model = if model >= 1 then AnimationFrame.times Tick else Sub.none

    view model = Html.div []
        [ Html.text (toString model)
        , Html.button [Html.Events.onClick On] [Html.text "On"]
        , Html.button [Html.Events.onClick Off] [Html.text "Off"]
        ]
OvermindDL1
Aug 4, 2016
elm-lang@9320969 should have fixed this issue too. It did for me. Can anyone else check their issues too, such as @Ryan1729 ?
Better to follow in #628
Ryan1729 commented May 18, 2016
The following program gets stuck displaying True once the mouse is clicked instead of the expected 'True on mouse down, False on mouse up' behaviour. Interestingly enough if you change the False in the init to True it manages to go from True to False then get stuck on True again.