Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix simulation of return from unwind block like in following example:… #8509

Closed

Conversation

dionisiydk
Copy link
Contributor

Fix simulation of return from unwind block like in following example:

[ #some ] ensure: [ ^#unwindReturn ]

@dionisiydk
Copy link
Contributor Author

Fixes #8443

@dionisiydk
Copy link
Contributor Author

While it fixes original example it is more complex problem.
I have another similar test which also hangs the image.

But if tests are fine PR can be integrated. It's a good step anyway

Comment on lines +1723 to +1729
(context notNil and: [ context unwindBlock = closureOrNil ]) ifTrue: [
"Here the return happens inside the unwind block like in following example:
[ #test ] ensure: [ ^ #returnFromUnwindBlock ]
In such cases #findNextUnwindContextUpTo: returns the corresponding #ensure context
because it is always a sender of unwind block.
Here we detect this case and lookup the next unwind context after the current ensure section (current unwind)"
context := context findNextUnwindContextUpTo: newTop ].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, before your fix the aboutToReturn:through: is sent with the ensure: context as context.
Then the fix makes sure that the returned context is the next context to unwind, which is the return in the ensure block.

Did you identify what causes the hanging in the first case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real bug still exists: when simulation is going into #aboutToReturn:through: the image hangs.
My fix just removes the one specific case where simulation should not go this way. But the real problem is still here. And it's easy to reproduce:

[
   [ 100 + 200 ] ensure: [^600]
] ensure: [ #done ]

Step inside [^600] block. Then step over it will hang the image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the situation is much worser. We can't step over return inside ensure:

[ ^100 ] ensure: [ 10+30]

Step into block [^100]. Then step over it will hang the image.
I checked also Pharo 7 - same issue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why the image hangs in all those cases? This is very interesting story :)
The short answer: bugs in the debugger lead to the state of target process where it never ends.

The last call of any process is the termination which ends with #suspend.

newProcess
	^Process
		forContext: 
			[self value.
			Processor terminateActive] asContext
		priority: Processor activePriority

The block here is actually never returned and it must not return. If you will remove the #terminateActive call then processes will end with BlockCannotReturn error (see cannotReturn:) because VM will not find the sender to continue the process.

And it is what really happens when we got hanging in the debugger:

  • #terminateActive call does not suspend the current process and simply returns.
  • VM calls #cannotReturn: which triggers the debugger:
    • new debugger is requested
    • debugger tries to #suspend activeProcess but it does nothing (see below) and process continues.
    • as process continues the method #cannotReturn: returns.
  • cannotReturn: has no sender and therefore VM calls cannotReturn: again - Infinite recursions began.

Now why #terminateActive and #suspend can be not working.

The problem is how debugger simulates the #activeProcess calls during stepping operations:.

process := Process activeProcess

The StepOver this expression is evaluated by the debugger process (UIProcess) but still the result will be the process under the debugger (not a UIProcess).
To achieve this the debugger wraps all stepping operations using #effectiveProcess trick:

step
	^Processor activeProcess
		evaluate: [suspendedContext := suspendedContext step]
		onBehalfOf: self

The active process here is UIProcess (which handles StepOver click). It sets the variable #effectiveProcess to the process under the debugger. During the block execution the Processor will always use #effectiveProcess as an active one instead of the real active process which executes the debugger command.
For example you can step over "Processor terminateActive" and it will not terminate the UIProcess while the command is really executed by UIProcess.

Now it should be clear how the normal process completion becomes broken: if #effectiveProcess variable is not reset by the debugger the #terminateActive call will never kill the current process.
And same issue happens with #suspend call. When debugger is requested it uses "Processor activeProcess" to get a process for debugging and to suspend it but it will never return the actual current process. Therefore the #suspend from the debugger will not stop the current process (it will suspend some other process) and the current process will continue execution which will cause the buggy recursion.

Now why effectiveProcess can be not reset? Method #evaluate:onBehalfOf: contains the ensure logic to reset the variable back.
The block here is supposed to do the context/stack manipulations. And some of them are really not trivial. The processing of unwinds (ensure parts) are their responsibility. So it's obvious that there can be some bugs or missing parts. Especially that we don't have good tests for simulation logic.

For this concrete issue with StepOver the problem is in the way how debugger intercepts the return from the callee. The current trap is not enough for the returns inside unwind contexts. And therefore the simulated code is not stopped on the current context and completely finishes the execution up to last #terminateActive in the #newProcess. So #terminateActive is executed with effective process and it hangs the image.

The trap needs to be improved and maybe it's not really easy to do. But what is more important is to find a general fix for described problem.
I hope you understand my explanation. I will push some obvious changes soon: #newProcess should ensure the termination of actual active process despite on any assigned effectiveProcess.

@StevenCostiou
Copy link
Collaborator

I have another similar test which also hangs the image.
Could you share that other test?

But if tests are fine PR can be integrated. It's a good step anyway
I approved because, as far as I understand, it looks good and all tests are passing.
Perhaps @guillep or @tesonep should have a look?

@StevenCostiou StevenCostiou added this to Open PR in Debugger and Debugging via automation Feb 5, 2021
Debugger and Debugging automation moved this from Open PR to Closed PR Feb 17, 2021
@MarcusDenker MarcusDenker reopened this Feb 17, 2021
Debugger and Debugging automation moved this from Closed PR to Open PR Feb 17, 2021
@Ducasse
Copy link
Member

Ducasse commented Feb 17, 2021

Hi denis
Just that you know steven is dead sick at home.
S.

@StevenCostiou
Copy link
Collaborator

Hi, sorry I am now on holidays even if I keep an eye from time to time.
After I come back on March 8th I will review if this is still open.

@isCzech
Copy link
Contributor

isCzech commented Nov 18, 2021

Hi, please check PR #10383; I think the PR fixes the issue. Thanks.

@guillep
Copy link
Member

guillep commented Nov 24, 2021

Hi all, it's been really difficult to review the batch of issues (#8567 , #8845, #9137, #8993, #8509).
We have spent in the last internal sprints several hours taking a look at them, but it is a reality that those changes are difficult to review, some like #8567 are very big, their effects in the system are difficult to foresee independently and even worse in combination.

@tesonep is correctly (!!) proposing to my ear that we make a single PR joining all of them to reduce the merge noise, let the CI test the combination of all issues. I then propose that we eagerly merge this (meta) PR and be ready to quickly apply hot-fixes as soon as possible or revert it in case of fire.

Thoughts? @dionisiydk @Ducasse @isCzech @estebanlm

@guillep
Copy link
Member

guillep commented Nov 24, 2021

Superseeded by #10429

@guillep guillep closed this Nov 24, 2021
Debugger and Debugging automation moved this from Open PR to Closed PR Nov 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

None yet

6 participants