mlc.SuperPool().map() can not recognize global objects in the function #2

Open
zhiruiwang opened this issue Feb 17, 2018 · 2 comments


When I pass a lambda function to fill in additional parameters, map cannot find the function f that is called inside the lambda:

import mlcrate as mlc
pool = mlc.SuperPool() 

def f(x,y):
    return x ** (2/y)

res = pool.map(lambda x: f(x, 2), range(1000))

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\multiprocess\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Anaconda3\lib\site-packages\pathos\helpers\mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "C:\Anaconda3\lib\site-packages\mlcrate\__init__.py", line 125, in func_tracked
    return func(x), i
  File "<ipython-input-3-c53f7e0849d6>", line 8, in <lambda>
NameError: name 'f' is not defined

Likewise, when the function refers to a global variable, map cannot find that variable:

import mlcrate as mlc
pool = mlc.SuperPool() 

y = 2

def f(x):
    return x ** (2/y)

res = pool.map(f, range(1000))

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\multiprocess\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Anaconda3\lib\site-packages\pathos\helpers\mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "C:\Anaconda3\lib\site-packages\mlcrate\__init__.py", line 125, in func_tracked
    return func(x), i
  File "<ipython-input-4-474bcea6b16a>", line 7, in f
NameError: name 'y' is not defined

Is there a way to pass these objects to the process pool to let the pool know which global function and variable we want to use?

@mxbi mxbi added the bug label Mar 18, 2018
mxbi commented Mar 18, 2018

Good catch, I didn't encounter these in my testing. SuperPool is a wrapper around pathos.ProcessPool, and it's not very clear to me how the state of the pool can be updated (by sending variables that were created after the pool was) in pathos - so it wouldn't be easy for me to implement that here.

In both of your cases however, a workaround would be to start the pool after creating the function.
It seems that pathos pickles and sends "one layer" of variables when map() is called, so it'll send the function but not any variables referenced in that function. Likewise, if you call it on a lambda function, it'll pickle that function but not the objects contained within unless they were created before the pool was.
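
To illustrate why a missing global surfaces as a NameError only when the function runs: Python does not store y inside f; it looks y up in the function's globals on every call. The sketch below rebuilds f against an empty namespace as a loose stand-in for a worker process that never received y. It uses only the standard library, f_in_worker is a hypothetical name, and this is an analogy rather than what pathos does internally.

```python
import types

y = 2


def f(x):
    # y is not stored in f; it is looked up in f.__globals__ on each call
    return x ** (2 / y)


# Rebuild f against an empty globals dict, loosely imitating a worker
# whose namespace does not contain y (illustration only, not pathos itself)
f_in_worker = types.FunctionType(f.__code__, {}, "f")

print(f(3))  # works here because y exists in this module

try:
    f_in_worker(3)
except NameError as exc:
    error_message = str(exc)
    print(error_message)
```

The failure mirrors the last line of both tracebacks in the report: the function itself arrives intact, but the name it depends on does not exist where it is executed.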

In general, I'm not happy with the performance & stability of SuperPool, so a rewrite which solves these issues is in order. I just need to figure out a better way of implementing it :)

@thomasjpfan

@zhiruiwang This should work for your use case:

from functools import partial
import mlcrate as mlc
pool = mlc.SuperPool()


def f(x, y):
    return x**(2 / y)

res = pool.map(partial(f, y=2), range(1000))
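
The same idea may also cover the second case in the report, where f reads a module-level y: make y an explicit parameter and bind it with functools.partial, so the value travels inside the callable instead of being looked up as a global in the worker. A minimal sketch follows, with the built-in map standing in for pool.map so that it runs without mlcrate. The names f_explicit and bound are illustrative, and whether SuperPool behaves the same way here is an assumption.

```python
from functools import partial

y = 2  # module-level value that the original f looked up as a global


def f_explicit(x, y):
    # y is now an argument, so no global lookup happens inside the function
    return x ** (2 / y)


# partial stores the function and the bound keyword together in one object
bound = partial(f_explicit, y=y)

# Built-in map stands in for pool.map; with mlcrate this would be
# pool.map(bound, range(1000)), assuming SuperPool accepts partial objects
res = list(map(bound, range(5)))
print(res)
```

Because the bound value is part of the partial object (visible as bound.keywords), it is serialized along with the callable rather than depending on the worker's global namespace.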
