Add associative keyword for mv and cp #64

Merged (1 commit) on Jun 13, 2022

DrChainsaw
Collaborator

Fixes #60

This adds the associative keyword to mv and cp. When it is set, combine is applied recursively whenever multiple values need to be combined, instead of folding them one at a time. This is the same approach that reducevalues uses, and it enables more parallelism.
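To illustrate the difference, here is a minimal sketch in plain Julia. The helpers fold_sequential and fold_pairwise are hypothetical names made up for this comment; they are not the FileTrees implementation, and the midpoint split is an assumption. The point is only that a pairwise reduction produces independent subproblems, which is why combine has to be associative for the result to be the same.

```julia
# Hypothetical helpers for illustration only; not part of FileTrees.

# Left fold: every call depends on the previous result,
# so the calls form one long chain.
function fold_sequential(combine, xs)
    acc = xs[1]
    for x in xs[2:end]
        acc = combine(acc, x)
    end
    return acc
end

# Pairwise (tree) reduction: the two halves do not depend on each other,
# so a scheduler is free to evaluate them in parallel.
# Assumes xs is non-empty and that combine is associative.
function fold_pairwise(combine, xs)
    length(xs) == 1 && return xs[1]
    mid = length(xs) ÷ 2
    left = fold_pairwise(combine, xs[1:mid])
    right = fold_pairwise(combine, xs[mid+1:end])
    return combine(left, right)
end

# 11 chunks of 10 elements, mirroring the files 'a':'k' in the example.
chunks = [collect(1:10) for _ in 1:11]

# vcat is associative, so both orders give the same result.
@assert fold_sequential(vcat, chunks) == fold_pairwise(vcat, chunks)
```

Both variants call combine the same number of times (one less than the number of values); only the dependency structure differs.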

I chose a different default value compared to reducevalues to avoid breakage, but unless someone objects I'll change the default to be consistent with reducevalues in a subsequent breaking release.

@jpsamaroo: I think this is a universal way to reduce in parallel, but if you have time I'd appreciate a check that it doesn't just accidentally depend on some scheduler implementation detail.

Example:

julia> using FileTrees, Distributed

julia> addprocs(10; exeflags=["--project", "--threads=1"], lazy=false);

julia> @everywhere using FileTrees, Distributed

julia> @everywhere function myvcat(x, y)
                   @info "combine lengths $(length(x)) and $(length(y))"
                   sleep(1) # fake slowness
                   vcat(x,y)
               end

julia> tt = mapvalues(identity, maketree("root" => ["next" => [(name=string(x), value=1:10) for x in 'a':'k']]); lazy=true);

julia> ttm = mv(tt, r"next/[a-z]$", s"next"; combine=myvcat, associative=false);

julia> @time exec(ttm);
      From worker 8:    [ Info: combine lengths 10 and 10
      From worker 8:    [ Info: combine lengths 20 and 10
      From worker 8:    [ Info: combine lengths 30 and 10
      From worker 8:    [ Info: combine lengths 40 and 10
      From worker 8:    [ Info: combine lengths 50 and 10
      From worker 8:    [ Info: combine lengths 60 and 10
      From worker 8:    [ Info: combine lengths 70 and 10
      From worker 8:    [ Info: combine lengths 80 and 10
      From worker 8:    [ Info: combine lengths 90 and 10
      From worker 8:    [ Info: combine lengths 100 and 10
 11.606803 seconds (27.74 k allocations: 1.300 MiB)

julia> ttm_assoc = mv(tt, r"next/[a-z]$", s"next"; combine=myvcat, associative=true);

julia> @time exec(ttm_assoc);
      From worker 2:    [ Info: combine lengths 10 and 10
      From worker 4:    [ Info: combine lengths 10 and 10
      From worker 6:    [ Info: combine lengths 10 and 10
      From worker 2:    [ Info: combine lengths 10 and 10
      From worker 7:    [ Info: combine lengths 10 and 20
      From worker 4:    [ Info: combine lengths 10 and 20
      From worker 2:    [ Info: combine lengths 10 and 20
      From worker 4:    [ Info: combine lengths 20 and 30
      From worker 2:    [ Info: combine lengths 30 and 30
      From worker 2:    [ Info: combine lengths 50 and 60
  4.793218 seconds (43.04 k allocations: 2.158 MiB, 0.90% compilation time)
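
As a rough sanity check on the two timings, the longest chain of dependent combine calls can be estimated as below. This is a back-of-the-envelope sketch that assumes a balanced pairwise split and ignores scheduling and data transfer overhead; the variable names are only for illustration.

```julia
n = 11                            # values to combine: files 'a' through 'k'
seconds_per_combine = 1           # the sleep(1) in myvcat

total_combines = n - 1            # same number of calls in both variants
chain_depth = n - 1               # associative=false: each call waits for the previous one
tree_depth = ceil(Int, log2(n))   # balanced pairwise reduction: number of levels

println("sequential lower bound: ", chain_depth * seconds_per_combine, " s")
println("pairwise lower bound:   ", tree_depth * seconds_per_combine, " s")
```

The bounds (10 s and 4 s) are in line with the measured 11.6 s and 4.8 s above.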

@DrChainsaw DrChainsaw merged commit a46cdc9 into shashi:master Jun 13, 2022
@DrChainsaw DrChainsaw deleted the assoc_mv_cp branch June 13, 2022 09:05
Successfully merging this pull request may close these issues.

mv does not seem to parallelize combine