Skip to content
This repository

Blocker: Qt console broken after "all magics" menu became dynamic #1057

Closed
fperez opened this Issue November 27, 2011 · 17 comments

4 participants

Fernando Perez Thomas Kluyver Matthias Bussonnier Min RK
Fernando Perez
Owner

It turns out that the merge of #956 introduced a serious, major problem with the Qt console. Unfortunately it's intermittent, which is why we didn't notice it during review, but we do need to fix it.

Symptoms: sometimes, perhaps one out of every four or five invocations, the qt console will not display the first prompt. It shows the banner but the input prompt never appears. The only solution is to kill it and try again, hoping for better. From a git bisection I did, it appears the first bad commit was e9402d7. The way to test this is simply to reset to that commit, make sure you remove all .pyc files just in case and then try opening multiple times the qt console. Eventually, you'll see an instance where it comes out with no input prompt.

@Carreau, I hope you'll be able to figure out a solution for this one... Some detective work by @takluyver pinpointed the issue somewhat, here's our IRC log:

<takluyver> with the Qt one, I think it's to do with the order of requests
<fperez__> mh
<takluyver> there are now 3 requests getting made during startup
<takluyver> one for the prompt number, one for the history, and the one that commit added for the list of magics, which is triggered by the first reply on the shell channel.
<fperez__> yes
<fperez__> is the callback triggering the problem?
<takluyver> Not quite sure
<takluyver> when I tried putting print statements in, the problem stopped happening
<takluyver> I also can't get it to happen with --plain or --pure
<takluyver> damn thing's quantum, just observing it affects it ;)
<fperez__> it's some kind of race condition, so the smallest change to the code will make it go away.
<fperez__> yes, they're called heisenbugs
<takluyver> aha
<fperez__> the very worst kind.

In any case, this is a major blocker. If a full solution isn't found, we'll have to revert the dynamic magic menu out altogether, but I hope we can find a fix.

Thomas Kluyver
Collaborator

Using the commit mentioned above, I'm finding the frequency is a bit lower than I seemed to be getting yesterday. It also seems like it happens a bit more often when the memory is full (i.e. after loading a load of big applications), although this could be a red herring. Yesterday when I tried, I couldn't get it to happen when I had both cores running makework.

Fernando Perez fperez referenced this issue from a commit November 27, 2011
Fernando Perez Temporary fix to work around #1057.
Basically it reverts the effect of #956 and goes back to a static list
for the 'all magics' menu.  I tried to mark very clearly the new code
so we can disable it once a proper fix for #1057 is committed, but we
can't have a broken qt console in master.
65546bf
Fernando Perez
Owner

I've just committed to master a temporary fix that effectively disables #956 without actually deleting @Carreau's code. I simply copied back the old static list and commented out the call to the new dynamic function. @Carreau, if you can find a proper fix, start off this and simply delete the code I added in 65546bf, I made it be a minimal amount that only fixed the current problem and nothing else.

With this workaround, if push comes to shove we could release 0.12 as-is and push this issue back to 0.13. I'd prefer to have a proper fix, but now at least we don't have a completely broken qt console in master.

For that reason I'm downgrading the priority to just 'high', as it's not really a blocker anymore. But I'd much prefer to have a real fix than this band-aid I just put in. @Carreau, let us know if we can help in some way.

Matthias Bussonnier
Collaborator

Hum, I'm totally unable to reproduce this bug (on my debian at least, i'll try on mac). even after 25 launch in a row.
I'm ok to try 100, maybe the 25 last will all crash, we never know with randomness.

I've already connect the action to kernel firstResponse to avoid this bug that otherwise appear 100% of the time. So maybe it's a deeper issue.
I'm not familiar with the kerner startup internals. Do you think it can get confuse if it get the request for the list of magic too early ? Does the bug still appear if you change get_ipython().lsmagic() (mainwindow.py L619) to a more simple expression like ["item1","item2"] ?

Do you think just setting a timer after which updating the menu might be enough or do you want something more specific.
like updating after history request...?

I can also propose to leave the "static" menu at startup but put a 'update menu' at the top of it (ie, just don't trigger the auto update as startup).

Fernando Perez
Owner
Matthias Bussonnier
Collaborator

@fperez
I did test without updating to master, and I tempered with the code to try to reproduce it, so i would have noticed your fix.

Actually, If a dirrectly trigger update_magic_menu() even before kernel's first response, and changed the request kernel from get_ipython().lsmagic() to a static list, then the bug is gone. So my guess is that there might be a deeper issue in a 'too early' request for the list of magics.

I'll try to get it not work on Ubuntu at work....

And BTW, this afternoon is durring the night for me :-)

Fernando Perez
Owner
Matthias Bussonnier
Collaborator

@fperez

Note that I did add this afternoon a fix to master that effectively[...]

So yeah, you kind of mentioned it... :-)

Fernando Perez
Owner
Matthias Bussonnier
Collaborator

I know, don't worry, I was just messing with you. Now it's my turn to be awake while most of you will be asleep if I'm right.

Fernando Perez
Owner
Matthias Bussonnier
Collaborator

I think I got it... (the why, not the how to fix it)
in frontendwidget.py I wrote

    self._request_info['execute'] = self._ExecutionRequest(msg_id, 'silent_exec_callback')

Instead of "appending" another execute request. Then the _handle_execute_callback doesn't see the previous one as assigned to it ...But _ExecutionRequest is a named tuple and the current logic seem to not allow(not be written for ?) multiple execution request waiting. (I'm not quite familiar with the whole logic and named tuple, so I may be wrong)

I might not have have time to deeply look at it during my day/your night. I'll try to make a better "Quick-fix" and try to modifie the current code to handle several multiple execution request waiting. There might be something I haven't understand, and I 'll appreciate some advice in the right way to handle that.

Matthias Bussonnier Carreau referenced this issue from a commit November 28, 2011
Commit has since been removed from the repository and is no longer available.
Matthias Bussonnier
Collaborator

@fperez, @takluyver
could you try this branch [ qtconsole-racecondition ] and tell me if it fixes the race condition for you ?
If so I'll clean it up tomorow (for me), and send a PR.
It basically slightly complicate the _request_info['execute'] handeling by making it a list which stores several waiting execution request.
I know there are still print statement in a few places, I'll remove and rebase later, same for changings the lists to mabe more adapted container.

Min RK
Owner

@carreau - you are correct, the qtconsole does not allow multiple outstanding execution requests, and does not protect itself by preventing them. This is fine if there are no background operations, but now that we have some, it obviously isn't.

I think your approach has the right idea, but I would use a dict by msg_id instead of a list. msg_ids are unique, so there is no worry about duplication.

Thomas Kluyver
Collaborator

I checked out the commit after you reverted @fperez's temporary fix, but before your fix, to check that I could still reproduce the bug. Unfortunately, I can't reproduce it after more than 40 tries, so I won't be able to tell if your fix makes a difference. Hopefully @fperez can check it.

The problem and the approach makes sense. I agree with Min that a dictionary is preferable.

Fernando Perez
Owner
Matthias Bussonnier
Collaborator

Won't have much time to ti re-write it today, and i'll do it with more suitable container.
If we start storing message id in dictionary, does it still make sens to have several _handle_xxx_reply ?
couldn't we just have a _handle_reply and switch code branch depending on what is poped from the dict ?

Matthias Bussonnier Carreau referenced this issue from a commit November 29, 2011
Commit has since been removed from the repository and is no longer available.
Matthias Bussonnier Carreau referenced this issue from a commit November 29, 2011
Commit has since been removed from the repository and is no longer available.
Matthias Bussonnier Carreau referenced this issue from a commit November 29, 2011
Commit has since been removed from the repository and is no longer available.
Matthias Bussonnier Carreau referenced this issue from a commit in Carreau/ipython November 28, 2011
Matthias Bussonnier Revert "Temporary fix to work around #1057."
    This reverts commit 65546bf, done to
    temporaly fixed a race condition introduced by #956, next commits should
    fixe this race condition
4573b16
Matthias Bussonnier Carreau referenced this issue from a commit November 29, 2011
Commit has since been removed from the repository and is no longer available.
Matthias Bussonnier Carreau referenced this issue from a commit November 29, 2011
Commit has since been removed from the repository and is no longer available.
Matthias Bussonnier
Collaborator

Ok, fixed, with dictionnary, (rebased, force pushed ) several times because I'm a moron, and also to put space around equal sign...

There is some redundent info as now, _ExecutionRequest namedTuple not only contains msg_id and kind , but are stored in a dictionnary with msg_id as key...
Just get a question on this else:
https://github.com/ipython/ipython/blob/master/IPython/frontend/qt/console/ipython_widget.py#L173
why is it aligned with if info.kind and not if info , like here :
https://github.com/ipython/ipython/blob/master/IPython/frontend/qt/console/frontend_widget.py#L403

Matthias Bussonnier Carreau referenced this issue from a commit November 29, 2011
Commit has since been removed from the repository and is no longer available.
Matthias Bussonnier Carreau referenced this issue from a commit November 29, 2011
Commit has since been removed from the repository and is no longer available.
Matthias Bussonnier Carreau referenced this issue from a commit in Carreau/ipython November 28, 2011
Matthias Bussonnier qtconsole: fix race-cond, handle multiple exec
	fix race condition in qtconsole due to a race condition beween the population
	of the magic menu and the first prompt request. Resolved by upgrading the
	logic of the console to handle several executions request in parallel.

	Some namedTuple might still carry redundent information but the fix is the
	first priority

	fixes #1057 , introduced by #956
8553345
Fernando Perez fperez closed this in a5fd0a3 November 29, 2011
Fernando Perez fperez referenced this issue from a commit January 10, 2012
Commit has since been removed from the repository and is no longer available.
Brian E. Granger ellisonbg referenced this issue from a commit January 10, 2012
Commit has since been removed from the repository and is no longer available.
Brian E. Granger ellisonbg referenced this issue from a commit January 10, 2012
Commit has since been removed from the repository and is no longer available.
Fernando Perez fperez referenced this issue from a commit January 10, 2012
Commit has since been removed from the repository and is no longer available.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.