Better document how to organize remotely-running code in different functions and modules #2489

Open · janschulz opened this Issue Oct 13, 2012 · 11 comments · 4 participants

Contributor

Currently, organizing code into different modules or different functions results in errors when that code is run remotely (see #2473 (comment) for an example).

It would be nice if the documentation showed how to work around such issues, and maybe if there were a simple way to decorate such functions so that running them remotely is easy.

Owner
minrk commented Oct 13, 2012

In your case, what is the engine's working directory?

Does dv.execute("import functions") work? If not, then I think the right behavior is happening.
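For context, a minimal sketch of that check (rc and dv are placeholder names; this assumes a running cluster and that functions.py lives in the engines' working directory):

from IPython.parallel import Client

rc = Client()
dv = rc[:]  # a DirectView on all engines

# if this succeeds, module functions from functions.py can run remotely as-is
dv.execute("import functions", block=True)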

Owner
minrk commented Oct 13, 2012

It is not unreasonable for module functions to require that their module be present. If you want to explicitly note that a function does not need its module, you can use the @interactive decorator, which tells IPython that the function should be treated as if it were typed interactively, rather than as part of the module in which it is defined.

You are right that the old version of the options pricer will see an ImportError (unless the engines share the client's working dir), and I can confirm that simply applying @interactive to price_options works as expected:

from IPython.parallel import interactive

@interactive
def price_options(...):
   """the rest unchanged"""

Based on that, I am marking this as a docs bug, and pinging @ellisonbg, since he removed the files that were included in the 0.13 docs via literalinclude but are now missing.

Contributor

The engines are on a different computer, and they use the default working dir (whatever is used when running "ipcluster engine" from a command line).

Just to make it clear: my "project" consists of two files, functions.py and main.py. Running main.py with the above code worked before the change from #2395, but not after it.

If that's really the intended change, then how would one split code into different files? Do I really have to put every function into main.py? From my current understanding, neither of these is possible:

  • Splitting the remote function into a different file (this issue) [Update: ok, with @interactive]
  • running a function remotely that calls another function from the same file it is defined in (#2473 (comment))

So splitting up and organizing code is actually not possible at all: your remote function must be a single function (or use code from installed libraries).

Owner
minrk commented Oct 14, 2012

If that's really the intended change, then how would one split code into different files?

Plenty of ways, involving push/pull, @interactive, or possibly sending file contents around. See below.

Do I really have to put every function into main.py?

Absolutely not.

From my current understanding, neither of these is possible:

  • Splitting the remote function into a different file (this issue)
  • running a function remotely that calls another function from the same file it is defined in (#2473)

So splitting up and organizing code is actually not possible at all: your remote function must be a single function (or use code from installed libraries)

I guess my comment above was not clear. There are many ways for you to break up code within a single script, or deal with simple modules only available on the Client:

  • use scripts as scripts, e.g. with view.run('script.py'), to make names available
  • use @interactive to associate module functions with the user namespace (and dissociate them from their module)
  • send functions with push/pull to establish the required namespace (a sketch follows after the send_module example below)
  • send the modules themselves, so they are importable remotely:
def send_module(view, mod):
    """Send a module's source file to the engines, so it is importable there."""
    fname = mod.__file__
    with open(fname) as f:
        data = f.read()

    def _write_module(filename, filedata):
        # runs on the engine: write the module file into the working dir
        import os
        if os.path.exists(filename):
            return False
        with open(filename, 'w') as f:
            f.write(filedata)
        return os.path.abspath(filename)

    return view.apply_async(_write_module, fname, data)

import functions
send_module(view, functions)
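And a minimal sketch of the push/pull item from the list above (assuming dv is a DirectView and func_a is a simple function with no module-only dependencies):

def func_a(x):
    return x * 2

# put the function object directly into each engine's namespace
dv.push({'func_a': func_a}, block=True)

# remote code can now call it by name
dv.execute("result = func_a(21)", block=True)
print(dv.pull('result', block=True))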

The only difference between before and after #2395 is that if you want to use a function from a local module on an engine where that module is not available, you have to decorate that function with @interactive. Other than that, the exact same things will and will not work.

Contributor

use scripts as scripts, e.g. with view.run('script.py') to make names available
use @interactive to associate module functions with the user namespace (and dissociate them from their module)
send functions with push/pull to establish required namespace
send the modules themselves, so they are importable remotely:

This still leaves out the case of a LoadBalancedView, where new engines can appear while many tasks are executing... I will try to build a @remote_runnable(put_in_namespace=[function1, function2]) decorator that carries the required function names with it...

def send_module(view, mod):
[...]

It would be nice if this (and the rest of your suggestions) could be made available in Client. And documented :-) I've renamed the issue accordingly...

Contributor

A first try (this should probably become an extra flag on the load-balanced view or the client, which would then wrap the function automatically instead of requiring it in the calling code):

from IPython.parallel import Client, RemoteError, AsyncResult
c = Client(profile="ipython_test")  
lv = c.load_balanced_view()  

def remote_wrapped(*args_fn, **kwargs_fn):
    # runs on the engine: write out shipped modules, import them, patch the
    # wrapped function's globals, then call it
    po = ""
    try:
        current_func = kwargs_fn.pop("__current_func__")
        local_functs = kwargs_fn.pop("__local_functs__")
        local_modules = kwargs_fn.pop("__local_modules__")
    except KeyError:
        # sync_local_functions did not supply the sync arguments
        raise
    modules = {}
    for filename, filedata, modulename in local_modules:
        orig_name = filename
        if filedata is not None:
            # source was shipped along: write it into the engine's working dir
            with open(filename, 'w') as f:
                f.write(filedata)
            orig_name = filename[:-4]  # strip ".pyc" to get the import name
        import importlib
        po += "\n%s|%s" % (modulename, filename[:-4])
        modules[modulename] = importlib.import_module(orig_name)
        po += "\n" + str(globals())
    po += "\nall: " + str(globals())
    # patch the function's globals so its module-level names resolve remotely
    current_func.func_globals.update(local_functs)
    current_func.func_globals.update(modules)
    kwargs_fn["out"] = po
    return current_func(*args_fn, **kwargs_fn)

def get_locally_defined_functions(fn):
    import os, inspect, sys
    # heuristic: anything living under the interpreter's prefix is "installed"
    base_path, _ = os.path.split(sys.executable)
    local_refs = {}
    # collect all functions visible in the function's module globals
    for _name, _obj in fn.func_globals.items():
        if not inspect.isfunction(_obj):
            continue
        try:
            _file = inspect.getfile(_obj)
            _file_base = os.path.split(_file)
            if not _file_base[0].startswith(base_path):
                local_refs[_name] = _obj
        except TypeError:
            # inspect.getfile fails for builtins
            pass
    return local_refs

def get_locally_defined_modules(fn):
    import os, inspect
    cwd = os.getcwd()
    local_functs = []
    # collect all modules visible in the function's module globals
    for _name, _obj in fn.func_globals.items():
        if not inspect.ismodule(_obj):
            continue
        try:
            _file = inspect.getfile(_obj)
            _file_base = os.path.split(_file)
            if _file_base[0].startswith(cwd):
                # local module: ship (filename, source, bound name)
                with open(_file) as f:  # this may read the .pyc
                    data = f.read()
                local_functs.append((_file_base[1], data, _name))
            else:
                # installed module: ship only the name, it should be importable remotely
                _file_name = _file_base[1]
                if _file_name.startswith("__init__"):
                    _file_name = os.path.basename(_file_base[0])
                local_functs.append((_file_name, None, _name))
        except TypeError:
            # inspect.getfile fails for builtins
            pass
    return local_functs



def sync_local_functions(*args, **kwargs):
    # add all locally defined functions and modules (plus the function itself)
    # to the keyword arguments
    kwargs["__local_functs__"] = get_locally_defined_functions(args[0])
    kwargs["__local_modules__"] = get_locally_defined_modules(args[0])
    kwargs["__current_func__"] = args[0]
    # swap in the wrapper, which unpacks these arguments on the engine
    newargs = (remote_wrapped,) + args[1:]
    return newargs, kwargs


import import_test as blubber
import numpy as np

def do_work2(inputs=None, out=""):
    return "input=" + blubber.make_happy(inputs)

tasks = []

for i in range(2):
    args, kwargs = sync_local_functions(do_work2, inputs=i)
    tasks.append(lv.apply(*args, **kwargs))

# colorama makes ANSI escape sequences (e.g. in remote tracebacks) work on Windows
from colorama import init
init()

for task in tasks:
    try:
        print(str(task.metadata))
        print(str(task.get()))
    except RemoteError as e:
        e.print_traceback()
        print e
        if e.engine_info:
            print "e-info: " + str(e.engine_info)
        if e.ename:
            print "e-name:" + str(e.ename)



What's working:

def func_a():
    pass

import numpy as np
from local_module import func_x
import local_module
import local_module as bar

def simulation():
    func_a(...)
    func_x(...)
    local_module.func(...)
    bar.func(...)
    np.random.normal(1, 1, 1)

What's not working:

def simulation():
    from local_module import func_b
    import local_module  # optionally: import local_module as xxx
    # basically every local import done *only* inside the function won't work

If you have a way to get at the local imports inside the function, then I think this would be great...

Owner
minrk commented Oct 14, 2012

@JanSchulz thanks for the sample code. I'm not sure that I could write a version of this that would belong in library code, but it definitely belongs in docs/examples or the cookbook at least.

As you point out, one case that is not served particularly well by IPython.parallel is using a LoadBalancedView with engines arriving over time, where the engines need some initialization before tasks are assigned to them. We do relatively well when you can use a DirectView to set up your engine namespaces prior to task submission, but the only hook we have for initializing an engine is the regular IPython startup code. I don't have a great model for how to improve this use case.
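For reference, that existing hook is the engine's profile configuration; a sketch of what it looks like (the exact option names are from memory and may vary between IPython versions):

# ipengine_config.py in the engine's profile
c = get_config()
# run a script in the engine's namespace before it starts accepting work
c.IPEngineApp.startup_script = u'/path/to/engine_setup.py'
# or a short command
c.IPEngineApp.startup_command = 'import functions'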

Contributor

@minrk I've updated the code to also work with modules. So basically every import works except local modules/functions imported inside the function (you would need an import local_module beforehand just to push the code through).

This model currently replaces the module-level .pyc every time a task is run, so it works on a LoadBalancedView but is probably not very efficient, especially when the module is big.

To solve that, the lbv would probably need a "setup" task, which the hub would run once on every engine before sending it any other task from that lbv.
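Until such a hook exists, a workaround is to poll for new engines between submissions and initialize them with a DirectView; a sketch (send_module is the helper from earlier in the thread):

seen = set()

def setup_new_engines(rc, mod):
    # a poor man's startup task: push the module only to engines not seen yet
    new_ids = [eid for eid in rc.ids if eid not in seen]
    if new_ids:
        send_module(rc[new_ids], mod)
        seen.update(new_ids)

# call setup_new_engines(rc, functions) before each batch of lv.apply(...) calls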

Owner
minrk commented Oct 14, 2012

If we had a send_module function and a mechanism for clients to register one or more tasks to be run at startup of an engine prior to availability for new tasks, then this use case should be covered, yes? It would look like:

import functions
rc.register_startup_task(send_module, functions)

lv = rc.load_balanced_view()
proceed_unchanged()
...

As opposed to the mechanism that already works today, as long as new engines are not arriving while the queue is processing (and you want your tasks to be assigned to them):

import functions

dv = rc[:]
send_module(dv, functions) # defined above

lv = rc.load_balanced_view()
proceed_unchanged()
...
Contributor

Yep, the above would be nice. Then the docs could simply say: put all your simulation code into a new module, import it, and register it as a startup task, and that would be enough.

I think that would be a very easy way to both explain it in the documentation and to work with it.

@minrk minrk was assigned Jan 20, 2013
@minrk minrk modified the milestone: 4.0, 3.0 Nov 14, 2014
@Carreau Carreau modified the milestone: 4.0, 5.0 Jun 12, 2015

Hi, I am trying to use the send_module function from #2489 to make my module available on remote engines.

First of all, I am suspicious about these lines:

        if os.path.exists(filename):
            return False

because after sending the module once, changing it locally and trying to send it again would not work, am I right?

Another annoying problem I have is that any time I try to send my module, the only thing written to the file on the remote machine is <memory at 0x2aac02ac43e0> or similar.

I would be grateful for any help.
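A plausible workaround, assuming the payload arrives on the engine as a buffer/memoryview after serialization (an assumption; the thread does not confirm this): coerce it to bytes before writing, and drop the os.path.exists guard so re-sending overwrites the stale copy:

def _write_module(filename, filedata):
    import os
    # the serialized payload may arrive as a memoryview rather than a str
    if isinstance(filedata, memoryview):
        filedata = filedata.tobytes()
    with open(filename, 'wb') as f:
        f.write(filedata)
    return os.path.abspath(filename)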

@Carreau Carreau modified the milestone: 5.0, wishlist May 2, 2016