can't import a local module from dask workers #187
If you have a fixed set of workers, then use `Client.upload_file`, as in the sketch below.
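For example, a minimal sketch reusing the `cluster` and `foo.py` from the report quoted below:

```python
from dask.distributed import Client

client = Client(cluster)  # `cluster` as created in the report below

# Ship foo.py to every worker currently connected to the scheduler.
# Note: workers that join the cluster later will not receive the file.
client.upload_file('foo.py')

import foo
f = foo.Foo()
future = client.scatter(f)  # workers can now unpickle foo.Foo
```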
If you are willing to push this module to GitHub, then consider having your workers pip-install it at startup by setting the `EXTRA_PIP_PACKAGES` environment variable in your worker-template (sketch below).
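For example, a sketch of a worker-template; the image, args, and package URL here are illustrative assumptions, not taken from this issue:

```yaml
# worker-template.yaml (sketch)
kind: Pod
spec:
  restartPolicy: Never
  containers:
  - name: dask-worker
    image: daskdev/dask:latest
    args: [dask-worker, --nthreads, '1']
    env:
    # The daskdev/dask images pip-install anything listed here at startup.
    - name: EXTRA_PIP_PACKAGES
      value: git+https://github.com/username/mymodule.git
```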
For changing the behavior of how functions defined in other files serialize, you'll have to take that upstream with cloudpickle/pickle. (The underlying behavior: cloudpickle serializes objects defined in `__main__`, such as notebook cells, by value, but serializes imported objects by reference, so the worker must be able to import the module itself; see the sketch below.)
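To illustrate the serialization difference, a standalone sketch (not from the original thread):

```python
import cloudpickle

# A class defined in __main__ (e.g. a notebook cell) is serialized by value:
# the class definition itself travels with the pickled bytes, so any worker
# can unpickle it without extra setup.
class Foo(object):
    pass

payload = cloudpickle.dumps(Foo())

# A class imported from a local module is serialized by reference ("foo.Foo"),
# so unpickling requires `import foo` to succeed on the worker. If foo.py is
# not on the worker's path, deserialization raises ModuleNotFoundError.
import foo  # the local module from the report below
payload = cloudpickle.dumps(foo.Foo())
```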
On Wed, Mar 28, 2018 at 10:06 PM, Ryan Abernathey wrote:
In my research, I commonly write a module where I dump longer functions /
classes and import it from my notebooks. This reduces clutter in the
notebook. Perhaps these functions are used in multiple notebooks, but I
don't consider them reusable / general / stable enough to actually package,
distribute, etc. I imagine many people work this way.
As a concrete example, I have a file in my examples directory on
pangeo.pydata.org called 'foo.py':
```python
class Foo(object):
    def __init__(self):
        pass
```
I import this from a notebook and create an instance
```python
import foo
f = foo.Foo()
```
Now I create a cluster and try to scatter this object
```python
from dask.distributed import Client
from dask_kubernetes import KubeCluster

cluster = KubeCluster(n_workers=1)
client = Client(cluster)
client.scatter(f)
```
I get a long error, the gist of which is `distributed.core - ERROR - No module named 'foo'`.
What is confusing to me is that this example would work perfectly fine if
I just defined Foo within a cell in the notebook.
We should think about how to support this sort of thing, because I feel
like lots of people work this way.
Obviously related to other customizable environment issues such as #136, #133, #125, #67, etc.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.