map should iterate #51

joernhees · 2017-02-04T19:23:20Z

passing huge lists / iterables into map or map_as_completed will first "register" them all for computation and only after it exhausted them all, compute them in parallel.

try running the following with python -m scoop example.py and notice how nothing is printed for a long time:

#from scoop.futures import map as parallel_map
from scoop.futures import map_as_completed as parallel_map


def square(x):
    return x * x

if __name__ == '__main__':
    squares = parallel_map(square, range(1000000))
    for sq in squares:
        print(sq)

I think the reason is in https://github.com/soravux/scoop/blob/master/scoop/futures.py#L94 . In python 3 map returns an iterable. Even for python 2 it would be cool if the internal function would batch the input.

The text was updated successfully, but these errors were encountered:

soravux · 2017-02-18T17:50:54Z

You are right, SCOOP generates futures from every element of the iterable first. The reason is to be able to distribute those tasks to remote workers. If map_as_completed returned an iterable, tasks would be emitted one after another, making the computation serial and not parallel, unless we batch the inputs, as you said.

Batching the input explicitly would require the user to manually emit batches when he desires. I haven't found a way to provide a clean and simple API to perform this.

Batching implicitly (on SCOOP's end) would delay the scheduling of later tasks, which could hinder load balancing. I am not formally against this, as long as the default value does not cause surprise to the power user and performs suboptimally for the beginner.
The emission of tasks should be done in the communication thread, though, it should not wait for a call to SCOOP API from the user's code.
I'm open to pull requests in this direction.

mjmdavis · 2017-03-12T19:16:59Z

Perhaps a solution would be for the user to provide an argument.
'take=1000'
Then the map could take 1000, balance and dispatch then take another 1000 until stop iteration.

mjmdavis · 2017-03-12T19:30:30Z

My system runs out of RAM when the iterator is consumed all at once 🤗🤓

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

map should iterate #51

map should iterate #51

joernhees commented Feb 4, 2017

soravux commented Feb 18, 2017

mjmdavis commented Mar 12, 2017

mjmdavis commented Mar 12, 2017

map should iterate #51

map should iterate #51

Comments

joernhees commented Feb 4, 2017

soravux commented Feb 18, 2017

mjmdavis commented Mar 12, 2017

mjmdavis commented Mar 12, 2017