Parallelize reads #1607

pomadchin · 2016-08-02T14:47:20Z

s3
cassandra
file

fosskers · 2016-08-02T19:57:56Z

@lossyrob Grisha has opted here not to explicitly use Tasks with a custom thread pool. Apparently nondeterminism handles the specifics of that.

He was seeing issues similar to the ones we had before, where Tasks didn't appear to complete and calling pool.shutdown() caused problems. We might want to consider back-porting his solution to our other work.
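For context on the pool.shutdown() symptom mentioned above: one common cause is shutting a dedicated pool down before all submitted work has been awaited. We don't know that this is what happened here. The sketch below uses only the Scala standard library (not scalaz Tasks). It awaits every result first and shuts the pool down in a `finally` block. `readRange` is a hypothetical stand-in for a backend read.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object DedicatedPoolRead {
  // Hypothetical stand-in for reading one key range from a backend.
  def readRange(range: Range): Vector[Int] = range.toVector

  def readAll(ranges: Seq[Range], threads: Int): Vector[Int] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = ranges.map(r => Future(readRange(r)))
      // Block until every read has finished before touching the pool.
      Await.result(Future.sequence(futures), 10.minutes).flatten.toVector
    } finally {
      // Shut down only after all results are in: shutdown() lets queued work
      // finish but rejects new submissions.
      pool.shutdown()
      pool.awaitTermination(1, TimeUnit.MINUTES)
    }
  }
}
```

`Future.sequence` preserves input order, so results come back in the same order as `ranges`.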

echeipesh · 2016-08-04T17:53:11Z

Without a dedicated pool, it will end up using scala.concurrent.ExecutionContext.Implicits.global, which comes with this documentation:

The implicit global ExecutionContext. Import global when you want to provide the global ExecutionContext implicitly.

The default ExecutionContext implementation is backed by a work-stealing thread pool. By default, the thread pool uses a target number of worker threads equal to the number of available processors.

I think that works out all right for Spark tasks.
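Here is a minimal sketch of fanning reads out on the global `ExecutionContext` described in the quoted scaladoc. It uses standard-library Futures as an analogue, not the scalaz-stream code in this PR, and `readRange` is hypothetical. Backend reads are blocking I/O, so each one is wrapped in `scala.concurrent.blocking`. That call is a hint that lets the global work-stealing pool add threads temporarily instead of letting blocked reads starve it.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object GlobalContextRead {
  // Default target parallelism of the global pool, per its scaladoc.
  val cores: Int = Runtime.getRuntime.availableProcessors

  // Hypothetical per-range read; stands in for an S3/Cassandra/file fetch.
  def readRange(range: Range): Vector[Int] = range.map(_ * 2).toVector

  def readAll(ranges: Seq[Range]): Vector[Int] = {
    // One Future per range; the global pool decides how many run at once.
    val all = Future.traverse(ranges)(r => Future(blocking(readRange(r))))
    Await.result(all, 10.minutes).flatten.toVector
  }
}
```

Because of `blocking`, the number of live threads may briefly exceed `cores`. Without it, the global pool stays near `cores` threads, and long I/O waits can idle the whole pool.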

echeipesh · 2016-08-04T17:57:48Z

s3/src/main/scala/geotrellis/spark/io/s3/S3RDDReader.scala


-          tileSeq.flatten
+          nondeterminism.njoin(maxOpen = 8, maxQueued = 8) { range map read }.runFoldMap(identity).unsafePerformSync
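The `njoin(maxOpen = 8, ...)` call in the diff above caps how many inner reads are open at once and then folds their outputs together. The following is a rough standard-library analogue of that idea, not scalaz-stream itself. A `Semaphore` with `maxOpen` permits bounds the reads in flight on whatever `ExecutionContext` is supplied. The names `BoundedJoin` and `boundedJoin` are hypothetical. One difference to note: `njoin` emits results in completion order, while this sketch keeps input order.

```scala
import java.util.concurrent.Semaphore
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedJoin {
  /** Runs `read` over `ranges` with at most `maxOpen` reads in flight,
    * then concatenates the results in input order. */
  def boundedJoin[A](maxOpen: Int)(ranges: Seq[Range])(read: Range => Vector[A])
                    (implicit ec: ExecutionContext): Vector[A] = {
    val permits = new Semaphore(maxOpen)
    val futures = ranges.toVector.map { r =>
      permits.acquire() // the caller blocks here once maxOpen reads are running
      Future(read(r)).andThen { case _ => permits.release() } // release on success or failure
    }
    Await.result(Future.sequence(futures), 10.minutes).flatten
  }
}
```

With this shape, `maxOpen` becomes an ordinary parameter rather than a literal at each call site. That also makes the 8-versus-32 question below easier to benchmark.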


Notice that 8 is a magic number here, while Cassandra and File use 32. Did benchmarking show those values to be best?

It would be best if we could make this configurable somehow, rather than leave magic numbers that can never be un-magicked.

How do you feel about using typesafe config for this? This is probably a system wide property rather than per-action property.
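The following sketch shows a system-wide thread-count setting. To stay dependency-free, it reads a JVM system property instead of Typesafe Config, and the key `geotrellis.s3.threads.rdd.read` is a hypothetical name chosen for illustration. The value `"default"` (or no value) maps to the number of available processors. Any other value must be a positive integer.

```scala
object ReadThreads {
  // Hypothetical key; with Typesafe Config this would live in application.conf.
  val Key = "geotrellis.s3.threads.rdd.read"

  /** "default" (or unset) means one thread per available processor;
    * otherwise the value must parse as a positive integer. */
  def parse(value: Option[String]): Int = value.map(_.trim) match {
    case None | Some("default") => Runtime.getRuntime.availableProcessors
    case Some(n) =>
      val threads = n.toInt // throws NumberFormatException on non-numeric input
      require(threads > 0, s"$Key must be positive, got $threads")
      threads
  }

  // Reads the setting once per call from JVM system properties (-Dkey=value).
  def fromSystemProperties(): Int = parse(sys.props.get(Key))
}
```

Keeping it system-wide matches the suggestion above. A per-call parameter could still default to `ReadThreads.fromSystemProperties()` if some action needs an override.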

I just experimented with which values made the tests run faster on my local machine, so they are still magical. It probably makes sense to make them configurable, since the best values depend on the machine configuration.

pomadchin · 2016-08-07T08:03:28Z

@echeipesh in terms of scalaz, it would use DefaultStrategy, which is a fixed thread pool with as many threads as there are available processors. Thanks for the pointer. I am still curious, though: if we create a batch of tasks using our custom thread pool (as in S3RDDWriter, for example), how would those tasks be scheduled by nondeterminism.njoin, given that it uses DefaultStrategy and DefaultExecutor?

def njoin[A](maxOpen: Int, maxQueued: Int)(source: Process[Task, Process[Task, A]])(implicit S: Strategy): Process[Task, A]
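The signature above takes its `Strategy` as an implicit parameter. A caller can therefore pass one explicitly in that slot, e.g. `njoin(8, 8)(source)(someStrategy)`, instead of relying on the default. Below is a standard-library analogue of the same mechanism, using `ExecutionContext` in place of scalaz's `Strategy`. The helper names (`runAll`, `namedPool`) and the thread-name prefix are hypothetical. The named threads make it visible which pool actually ran the work.

```scala
import java.util.concurrent.{ExecutorService, Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ExplicitContext {
  // Same shape as njoin's `(implicit S: Strategy)`: callers get whatever
  // implicit is in scope unless they pass one explicitly.
  def runAll[A](tasks: Seq[() => A])(implicit ec: ExecutionContext): Seq[A] =
    Await.result(Future.traverse(tasks)(t => Future(t())), 1.minute)

  // Fixed pool whose threads are named "<prefix>-N" and do not block JVM exit.
  def namedPool(prefix: String, threads: Int): ExecutorService = {
    val counter = new AtomicInteger(0)
    val factory = new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, s"$prefix-${counter.incrementAndGet()}")
        t.setDaemon(true)
        t
      }
    }
    Executors.newFixedThreadPool(threads, factory)
  }
}
```

Passing the context explicitly, as in `runAll(tasks)(ExecutionContext.fromExecutorService(pool))`, schedules every task on `pool` rather than on the implicit default.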

fosskers · 2016-08-09T14:37:30Z

This continues to be an informative PR.

pomadchin added 3 commits August 2, 2016 17:46

sim reads for s3 and for cassandra

2daf665

fix s3 rdd reader thread pool close

adc937e

fix sim reads

c737e48

pomadchin changed the title from [WIP] Parallelize reads to Parallelize reads Aug 2, 2016

pomadchin added 2 commits August 2, 2016 23:06

code refactor

1c26b0f

+file rdd reader par

0d29af0

echeipesh reviewed Aug 4, 2016

pomadchin added 3 commits August 11, 2016 17:04

fixed thread pool control in writes and reads

4b9bf7e

reads/writes threads are now confiurable

44d4ad3

add readers / writers thread pool docs

4b7accc

echeipesh merged commit 4772fb1 into locationtech:master Aug 17, 2016

lossyrob added this to the 1.0 milestone Oct 18, 2016
