Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error messages in ipengine/ipcontroller #2473

Closed
jankatins opened this issue Oct 6, 2012 · 9 comments · Fixed by #2818
Closed

Better error messages in ipengine/ipcontroller #2473

jankatins opened this issue Oct 6, 2012 · 9 comments · Fixed by #2818
Labels
Milestone

Comments

@jankatins
Copy link
Contributor

I currently try to debug why my ipengines don't work on one maschine (on another they run fine):

On the ipengine side I get this in the logs (--log-level=DEBUG, This is from a debug run with both engine and controller on the 32bit maschine):

2012-10-06 17:39:39.854 [IPEngineApp] Loading url_file u'C:\\Users\\jschulz\\.ipython\\profile_test-shutup\\security\\ipcontroller-engine.json'
2012-10-06 17:39:39.855 [IPEngineApp] Config changed:
2012-10-06 17:39:39.855 [IPEngineApp] {'RegistrationFactory': {}, 'EngineFactory': {'url': u'tcp://127.0.0.1:52977', 'location': u'192.168.56.1', 'sshserver': u''}, 'Session': {'packer': u'json', 'unpacker': u'json', 'key': 'b2161611-8143-4b71-980e-c35d57aa1eac'}, 'IPEngineApp': {'cluster_id': u'', 'log_to_file': True, 'log_level': u'DEBUG'}, 'ProfileDir': {'location': u'C:\\Users\\jschulz\\.ipython\\profile_test-shutup'}}
2012-10-06 17:39:39.871 [IPEngineApp] Registering with controller at tcp://127.0.0.1:52977
2012-10-06 17:39:40.056 [IPEngineApp] Using existing profile dir: u'C:\\Users\\jschulz\\.ipython\\profile_test-shutup'
2012-10-06 17:39:40.058 [IPEngineApp] Completed registration with id 1
2012-10-06 17:40:50.285 [IPEngineApp] 
*** MESSAGE TYPE:apply_request***
2012-10-06 17:40:50.285 [IPEngineApp]    Content: {}
   --->


On the controller side I see the engine registered successfully and then sending lots of tasks (arrived on x) but no "finished" message. The controller afterwards sends the same tasks out to other engines. There is alsmost no CPU on the engine side.

The engines work on a 64bit system, the non-working engines are on 32-bit Win7. Runing the same code "unclustered" works on both the 32bit and 64bit platform.

Running the whole cluster on the 32 bit win7 is resulting in the same problem (see log above, I get the same log when running against the normal controller).

Is there any way to get a better debug message?

@jankatins
Copy link
Contributor Author

I found a way:

try:
    task.get())
except RemoteError as e:
    e.print_traceback()
    print e
    if e.engine_info:
        print "e-info: " + str(e.engine_info)
    if e.ename:
        print "e-name:" + str(e.ename)

Result in this:

---------------------------------------------------------------------------
ImportError                               
Traceback (most recent call last)C:\portabel\Python27\lib\site-packages\IPython\zmq\serialize.pyc in unpack_apply_message(bufs, g, copy)
    184         for i in range(2):
    185             bufs[i] = bufs[i].bytes
--> 186     f = uncan(pickle.loads(bufs.pop(0)), g)
    187     info = pickle.loads(bufs.pop(0))
    188     arg_bufs, kwarg_bufs = bufs[:info['narg_bufs']], bufs[info['narg_bufs']:]
C:\portabel\Python27\lib\site-packages\IPython\utils\pickleutil.pyc in uncan(obj, g)
    228             break
    229         elif isinstance(obj, cls):
--> 230             return uncanner(obj, g)
    231 
    232     if import_needed:
C:\portabel\Python27\lib\site-packages\IPython\utils\pickleutil.pyc in <lambda>(obj, g)
    270 
    271 uncan_map = {
--> 272     CannedObject : lambda obj, g: obj.get_object(g),
    273 }
    274 
C:\portabel\Python27\lib\site-packages\IPython\utils\pickleutil.pyc in get_object(self, g)
     99         # try to load function back into its module:
    100         if not self.module.startswith('__'):
--> 101             __import__(self.module)
    102             g = sys.modules[self.module].__dict__
    103 
ImportError: No module named functions
ImportError(No module named functions)
e-info: {'engine_id': 8, 'method': 'apply', 'engine_uuid': 'cbe81479-5f86-4712-956e-6bdc560113a5'}
e-name:ImportError

Can the engine or the controller please print that as well? At least everything which has somethin to do with "ImportError"?

@jankatins
Copy link
Contributor Author

My code is separated so that I have a "setup" code in my main.py and some functions in another file "functions.py". Main calls a function from "functions.py" which calls tasks.append(lv.apply(simulate_function, arg1=aerg1, arg2=arg2)), where simulate_function is also in "functions.py" will call other functions from functions.py.

main.py:

from functions import simulate
...
simulate(...)

functions.py

def simulate(...):
    ...
    tasks.append(lv.apply(simulate_function, arg1=aerg1, arg2=arg2))
    ....
def simulate_function(...):
    input = ...
    _add_errors(input)
    ...
def _add_errors(...):
    ...

What I don't understand is why it works on one computer and not on the other (both run the same ipython version from today; one 64bit, one 32bit Win7).

I tried reverting 46ee6cb on the 32bit side, but that didn't help:

�[1;31m---------------------------------------------------------------------------�[0m
�[1;31mNameError�[0m                                 Traceback (most recent call last)�[1;32m<string>�[0m in �[0;36m<module>�[1;34m()�[0m
�[1;32mC:\data\phd\montecarlo-paper\functions.py�[0m in �[0;36msimulate_function�[1;34m(arg1, arg2)�[0m
�[1;31mNameError�[0m: global name '_add_errors' is not defined
NameError(global name '_add_errors' is not defined)
e-info: {'engine_id': 10, 'method': 'apply', 'engine_uuid': 'e4f650d1-e91b-486c-97df-9e0ef352ec95'}
e-name:NameError

@jankatins
Copy link
Contributor Author

My current workaround (and example code...):

from IPython.parallel import Client, RemoteError, AsyncResult
c = Client(profile="laptop-only")  
lv = c.load_balanced_view()  

def do_work(input=None, functs=None):
    try:
        # throws an error on my remote 32bit win7 computer
        _make_happy = make_happy
    except:
        #raise 
        _make_happy = functs["make_happy"]
    return _make_happy(input)

def make_happy(input=None):
    import sys
    return sys.path

functs = {}
functs["make_happy"] = make_happy

tasks = []


for i in range(10):
    tasks.append(lv.apply(do_work, input=i, functs=functs))  


for task in tasks:
    try:
        print(str(task.get()))
        print(str(task.metadata))
    except RemoteError as e:
        e.print_traceback()
        print e
        if e.engine_info:
            print "e-info: " + str(e.engine_info)
        if e.ename:
            print "e-name:" + str(e.ename)

@minrk
Copy link
Member

minrk commented Oct 13, 2012

Result in this gibberish:

That's not gibberish, that's ANSI color escapes. To disable the colors, do:

client[:].execute("%colors nocolor")

or set ZMQInteractiveShell.colors = 'NoColor' configurable in your ipengine_config to change the default behavior.

@jankatins
Copy link
Contributor Author

@minrk It would be nice if that setting could be applied if the underlying system can not read it (i.e. on windows). I'm not sure how to do this: is this acceptable:

            # Hack to not get ansi sequences in stacktraces on windows
            if os.name!='posix':
                self.config.ZMQInteractiveShell.colors = 'NoColor'
            # FIXME: This is a hack until IPKernelApp and IPEngineApp can be fully merged
            app = IPKernelApp(config=self.config, shell=self.kernel.shell, kernel=self.kernel, log=self.log)

Unfortunately the ZMQInteractiveShell.colors is also not in the generated default config :-(

@minrk
Copy link
Member

minrk commented Oct 14, 2012

It would be nice if that setting could be applied if the underlying system can not read it (i.e. on windows).

Not really doable, since the notebook and qtconsole can both draw the colors just fine, and your engines could be on *ix, while your client is on Windows, etc. or vice versa, where that logic would be wrong. The problem is, as I have argued frequently, that colors are hardcoded at the completely wrong time (when the traceback is generated, rather than when/where it is to be drawn). This prevents us from making sensible decisions like you suggest.

Unfortunately the ZMQInteractiveShell.colors is also not in the generated default config :-(

That's really a minor documentation issue for generating the default config (which is nothing but a big comment block). It doesn't mean it can't be used. I will make sure to add the ZMQInteractiveShell to the ipengineapp list.

@jankatins
Copy link
Contributor Author

from my standpoint (and with the ideas in #2489), what is missing here is instead of simple returning the exception to the client, it would be nice if the engine could print the exception to the log:

(Pseudocode...)

try:
    return_val = run_code(...)
except Exception as e:
   if log_level < 30:
       print_to_log(e)
   return_val = e
send_to_client(return_val)

@minrk
Copy link
Member

minrk commented Oct 14, 2012

@minrk
Copy link
Member

minrk commented Jan 20, 2013

can you test #2818, and see if it is satisfactory?

@minrk minrk closed this as completed in 4838344 Jan 23, 2013
minrk added a commit that referenced this issue Mar 5, 2013
so it will appear in default ipengine_config.py, `--help-all`, etc.

as [mentioned](#2473 (comment)) in #2473
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this issue Nov 3, 2014
log user tracebacks in the kernel (INFO-level)

Should make debugging errors a little less painful

closes ipython#2473
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants