Does seems to work; #62

Open
xiaodaigh opened this issue Feb 14, 2022 · 2 comments
Comments

@xiaodaigh

using Distributed
addprocs()

using CSV, DataFrames, FileTrees

CSV.write("c:/tmp/meh/a.csv", DataFrame(a = 1:3))
CSV.write("c:/tmp/meh/b.csv", DataFrame(a = 1:3))

a = FileTrees.FileTree("c:/tmp/meh", lazy=true)


b = FileTrees.load(a, lazy = true) do file
    1
end;

ok = mapvalues(b) do y
    y + 1
end;

ok2 = reducevalues(+, ok)

exec(ok2)

I expect 4 to be returned, but it fails with:

ERROR: LoadError: On worker 2:
KeyError: key Dagger [d58978e5-989f-55fb-8d15-ea34adc7bf54] not found

I am on Julia 1.7.2 and FileTrees 0.3.4

@jpsamaroo
Collaborator

Try @everywhere using FileTrees before doing anything with FileTrees to ensure that FileTrees and Dagger are properly scoped in Main.
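
A minimal sketch of what that reordering could look like, assuming FileTrees is installed in the environment the workers start in. The temporary directory, the empty placeholder files and the worker count of 2 are stand-ins chosen for this sketch (the loader in the issue ignores the file contents anyway), and the tree is built with a plain FileTree(dir) call here.

```julia
using Distributed
addprocs(2)   # worker count picked arbitrarily for this sketch

# Load FileTrees on every process, workers included, before using it anywhere.
@everywhere using FileTrees

# Two empty placeholder files stand in for a.csv and b.csv from the issue.
dir = mktempdir()
touch(joinpath(dir, "a.csv"))
touch(joinpath(dir, "b.csv"))

tree = FileTree(dir)

# Lazily "load" each file as the constant 1, as in the original report.
loaded = FileTrees.load(tree; lazy=true) do file
    1
end

# Add 1 to every value, then sum the values across files.
incremented = mapvalues(y -> y + 1, loaded)
total = reducevalues(+, incremented)

# exec runs the lazy computation, possibly on the workers.
result = exec(total)
println(result)
```

With two files this is the same computation the issue expects to return 4.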

@DrChainsaw
Collaborator

Longer explanation: When you addprocs with Distributed, you spin up new Julia processes which are pretty much independent of the process you call addprocs from. As such, they don't know anything about what modules you have loaded and what variables you have declared. This is also what allows Distributed to run on multiple machines connected over a network, e.g. a compute cluster. The @everywhere macro just means "run this command on all processes".
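
To make the independence concrete, here is a small sketch using only the Distributed standard library. The variable names and the worker count are made up for illustration. It checks whether a worker can see a global defined in the calling process, and then one defined through @everywhere.

```julia
using Distributed

# Start two local worker processes (the count is arbitrary here).
worker_ids = addprocs(2)
w = first(worker_ids)

# Defined only in the process that runs this script.
only_on_main = 42

# Ask the worker whether that name exists in its own Main module.
worker_has_it = remotecall_fetch(() -> isdefined(Main, :only_on_main), w)

# @everywhere evaluates the assignment on every process, workers included.
@everywhere shared_value = 42
worker_has_shared = remotecall_fetch(() -> isdefined(Main, :shared_value), w)

println("worker sees only_on_main: ", worker_has_it)
println("worker sees shared_value: ", worker_has_shared)

rmprocs(worker_ids)
```

The same reasoning applies to names brought in by using: they have to be made available in each worker's Main, which is what @everywhere using FileTrees does.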

Note that if you are not running from the default environment (e.g. you have started Julia with --project or have run Pkg.activate()) you also need to ensure that the added processes are running in the same environment, or else you will get an error similar to the one above. The most failsafe way to do this is to run addprocs(...; exeflags="--project").
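
One way to check which environment the workers actually ended up in is sketched below. It relies on Base.active_project, which is not part of the exported API, so treat it as a debugging aid rather than something to build on. The worker count is again arbitrary.

```julia
using Distributed

# "--project" makes each worker look for a Project.toml starting from its
# working directory, which defaults to the directory addprocs is called from.
worker_ids = addprocs(2; exeflags="--project")

# Ask every worker which project file it is using.
worker_projects = Dict(w => remotecall_fetch(Base.active_project, w) for w in worker_ids)

println("main: ", Base.active_project())
for (w, project) in worker_projects
    println("worker ", w, ": ", project)
end

rmprocs(worker_ids)
```

If the paths printed for the workers differ from the one printed for the main process, packages installed only in the main environment will not be found on the workers.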

Afaik, Dagger also makes seamless use of threads, which don't require jumping through the above hoops to get parallelism since they run inside the same process. There are some subtleties w.r.t. memory allocation which in some cases make running multiple threads slower than multiple processes, so it can be useful to try both.
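
For comparison, a rough sketch of the thread-based route using plain Base threading rather than Dagger's scheduler. The per_file vector is a made-up stand-in for the per-file values in the issue. Because all tasks run inside the same process, no @everywhere is needed. Start Julia with several threads (e.g. julia -t 4) for the tasks to actually run in parallel.

```julia
# Stand-ins for the value loaded from each of the two files.
per_file = [1, 1]

# Spawn one task per value; tasks share the process, so they see the same
# modules and variables as the code that spawned them.
tasks = map(per_file) do v
    Threads.@spawn v + 1
end

# Wait for all tasks and combine their results.
total = sum(fetch.(tasks))

println("threads available: ", Threads.nthreads())
println("total: ", total)
```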
