Fix simulation of return from unwind block like in following example:… #8509

dionisiydk · 2021-02-04T23:32:32Z

Fix simulation of return from unwind block like in following example:

[ #some ] ensure: [ ^#unwindReturn ]

… [ #some ] ensure: [ ^#unwindReturn ]

dionisiydk · 2021-02-04T23:32:49Z

Fixes #8443

dionisiydk · 2021-02-05T00:16:59Z

While it fixes original example it is more complex problem.
I have another similar test which also hangs the image.

But if tests are fine PR can be integrated. It's a good step anyway

StevenCostiou · 2021-02-05T17:08:39Z

src/Kernel/Context.class.st

+	(context notNil and: [ context unwindBlock = closureOrNil ]) ifTrue: [
+		"Here the return happens inside the unwind block like in following example: 
+				[ #test ] ensure: [ ^ #returnFromUnwindBlock ]
+		In such cases #findNextUnwindContextUpTo: returns the corresponding #ensure context
+		because it is always a sender of unwind block.
+		Here we detect this case and lookup the next unwind context after the current ensure section (current unwind)"
+		context := context findNextUnwindContextUpTo: newTop ].


If I understand correctly, before your fix the aboutToReturn:through: is sent with the ensure: context as context.
Then the fix makes sure that the returned context is the next context to unwind, which is the return in the ensure block.

Did you identify what causes the hanging in the first case?

The real bug still exists: when simulation is going into #aboutToReturn:through: the image hangs.
My fix just removes the one specific case where simulation should not go this way. But the real problem is still here. And it's easy to reproduce:

[ [ 100 + 200 ] ensure: [^600] ] ensure: [ #done ]

Step inside [^600] block. Then step over it will hang the image.

So the situation is much worser. We can't step over return inside ensure:

[ ^100 ] ensure: [ 10+30]

Step into block [^100]. Then step over it will hang the image.
I checked also Pharo 7 - same issue

So why the image hangs in all those cases? This is very interesting story :)
The short answer: bugs in the debugger lead to the state of target process where it never ends.

The last call of any process is the termination which ends with #suspend.

newProcess ^Process forContext: [self value. Processor terminateActive] asContext priority: Processor activePriority

The block here is actually never returned and it must not return. If you will remove the #terminateActive call then processes will end with BlockCannotReturn error (see cannotReturn:) because VM will not find the sender to continue the process.

And it is what really happens when we got hanging in the debugger:

#terminateActive call does not suspend the current process and simply returns.

VM calls #cannotReturn: which triggers the debugger:

new debugger is requested

debugger tries to #suspend activeProcess but it does nothing (see below) and process continues.

as process continues the method #cannotReturn: returns.

cannotReturn: has no sender and therefore VM calls cannotReturn: again - Infinite recursions began.

Now why #terminateActive and #suspend can be not working.

The problem is how debugger simulates the #activeProcess calls during stepping operations:.

process := Process activeProcess

The StepOver this expression is evaluated by the debugger process (UIProcess) but still the result will be the process under the debugger (not a UIProcess).
To achieve this the debugger wraps all stepping operations using #effectiveProcess trick:

step ^Processor activeProcess evaluate: [suspendedContext := suspendedContext step] onBehalfOf: self

The active process here is UIProcess (which handles StepOver click). It sets the variable #effectiveProcess to the process under the debugger. During the block execution the Processor will always use #effectiveProcess as an active one instead of the real active process which executes the debugger command.
For example you can step over "Processor terminateActive" and it will not terminate the UIProcess while the command is really executed by UIProcess.

Now it should be clear how the normal process completion becomes broken: if #effectiveProcess variable is not reset by the debugger the #terminateActive call will never kill the current process.
And same issue happens with #suspend call. When debugger is requested it uses "Processor activeProcess" to get a process for debugging and to suspend it but it will never return the actual current process. Therefore the #suspend from the debugger will not stop the current process (it will suspend some other process) and the current process will continue execution which will cause the buggy recursion.

Now why effectiveProcess can be not reset? Method #evaluate:onBehalfOf: contains the ensure logic to reset the variable back.
The block here is supposed to do the context/stack manipulations. And some of them are really not trivial. The processing of unwinds (ensure parts) are their responsibility. So it's obvious that there can be some bugs or missing parts. Especially that we don't have good tests for simulation logic.

For this concrete issue with StepOver the problem is in the way how debugger intercepts the return from the callee. The current trap is not enough for the returns inside unwind contexts. And therefore the simulated code is not stopped on the current context and completely finishes the execution up to last #terminateActive in the #newProcess. So #terminateActive is executed with effective process and it hangs the image.

The trap needs to be improved and maybe it's not really easy to do. But what is more important is to find a general fix for described problem.
I hope you understand my explanation. I will push some obvious changes soon: #newProcess should ensure the termination of actual active process despite on any assigned effectiveProcess.

StevenCostiou · 2021-02-05T17:10:07Z

I have another similar test which also hangs the image.
Could you share that other test?

But if tests are fine PR can be integrated. It's a good step anyway
I approved because, as far as I understand, it looks good and all tests are passing.
Perhaps @guillep or @tesonep should have a look?

Ducasse · 2021-02-17T21:35:57Z

Hi denis
Just that you know steven is dead sick at home.
S.

StevenCostiou · 2021-02-26T20:33:52Z

Hi, sorry I am now on holidays even if I keep an eye from time to time.
After I come back on March 8th I will review if this is still open.

isCzech · 2021-11-18T20:29:58Z

Hi, please check PR #10383; I think the PR fixes the issue. Thanks.

guillep · 2021-11-24T09:26:09Z

Hi all, it's been really difficult to review the batch of issues (#8567 , #8845, #9137, #8993, #8509).
We have spent in the last internal sprints several hours taking a look at them, but it is a reality that those changes are difficult to review, some like #8567 are very big, their effects in the system are difficult to foresee independently and even worse in combination.

@tesonep is correctly (!!) proposing to my ear that we make a single PR joining all of them to reduce the merge noise, let the CI test the combination of all issues. I then propose that we eagerly merge this (meta) PR and be ready to quickly apply hot-fixes as soon as possible or revert it in case of fire.

Thoughts? @dionisiydk @Ducasse @isCzech @estebanlm

guillep · 2021-11-24T10:12:53Z

Superseeded by #10429

Fix simulation of return from unwind block like in following example:…

820a6ee

… [ #some ] ensure: [ ^#unwindReturn ]

StevenCostiou added the Status: Tests passed please review! label Feb 5, 2021

StevenCostiou approved these changes Feb 5, 2021

View reviewed changes

StevenCostiou added this to Open PR in Debugger and Debugging via automation Feb 5, 2021

StevenCostiou added the Project: Debugger label Feb 5, 2021

MarcusDenker closed this Feb 17, 2021

Debugger and Debugging automation moved this from Open PR to Closed PR Feb 17, 2021

MarcusDenker reopened this Feb 17, 2021

Debugger and Debugging automation moved this from Closed PR to Open PR Feb 17, 2021

isCzech mentioned this pull request Nov 19, 2021

Fix #8443 Crash on step over #10383

Closed

guillep mentioned this pull request Nov 24, 2021

[Very critical fix] Processor>>#terminateRealActive #8567

Closed

guillep closed this Nov 24, 2021

Debugger and Debugging automation moved this from Open PR to Closed PR Nov 24, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix simulation of return from unwind block like in following example:… #8509

Fix simulation of return from unwind block like in following example:… #8509

dionisiydk commented Feb 4, 2021

dionisiydk commented Feb 4, 2021

dionisiydk commented Feb 5, 2021

StevenCostiou Feb 5, 2021

dionisiydk Feb 6, 2021

dionisiydk Feb 6, 2021

dionisiydk Feb 8, 2021

StevenCostiou commented Feb 5, 2021

Ducasse commented Feb 17, 2021

StevenCostiou commented Feb 26, 2021

isCzech commented Nov 18, 2021

guillep commented Nov 24, 2021

guillep commented Nov 24, 2021

Fix simulation of return from unwind block like in following example:… #8509

Fix simulation of return from unwind block like in following example:… #8509

Conversation

dionisiydk commented Feb 4, 2021

dionisiydk commented Feb 4, 2021

dionisiydk commented Feb 5, 2021

StevenCostiou Feb 5, 2021

Choose a reason for hiding this comment

dionisiydk Feb 6, 2021

Choose a reason for hiding this comment

dionisiydk Feb 6, 2021

Choose a reason for hiding this comment

dionisiydk Feb 8, 2021

Choose a reason for hiding this comment

StevenCostiou commented Feb 5, 2021

Ducasse commented Feb 17, 2021

StevenCostiou commented Feb 26, 2021

isCzech commented Nov 18, 2021

guillep commented Nov 24, 2021

guillep commented Nov 24, 2021