[hail] memory-efficient scan #6345
Conversation
Looks about right to me.
scanAggsPerPartition.foreach { x =>
  partitionIndices(i) = os.getPos
  i += 1
  val oos = new ObjectOutputStream(os)
can we lift the oos up outside the loop, and flush once at the end?
oh, you need to flush inside to make the position correct. Could you add a comment to that effect?
Yeah, it was pretty weird. I should verify, but I also had problems getting the right positions when re-using the same OOS.
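For context, a minimal sketch of the pattern under discussion, assuming `os` is a Hadoop `FSDataOutputStream` (so `getPos` is available) and `partitionIndices` has one slot per serialized value; this is an illustration with hypothetical names, not the PR's exact code:

```scala
import java.io.ObjectOutputStream
import org.apache.hadoop.fs.FSDataOutputStream

// Hypothetical helper mirroring the diff above: record where each object
// starts, then serialize it with a fresh ObjectOutputStream.
def writeWithPositions(
  os: FSDataOutputStream,
  values: Array[Array[Byte]],
  partitionIndices: Array[Long]
): Unit = {
  var i = 0
  values.foreach { x =>
    partitionIndices(i) = os.getPos // accurate only because the previous OOS was flushed
    i += 1
    val oos = new ObjectOutputStream(os) // fresh stream => a readable header at this offset
    oos.writeObject(x)
    oos.flush() // flush inside the loop so the next getPos reflects every byte written
  }
}
```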
var i = 0
val scanAggsPerPartitionFile = hc.getTemporaryFile()
HailContext.get.sFS.writeFileNoCompression(scanAggsPerPartitionFile) { os =>
  scanAggsPerPartition.foreach { x =>
use zipWithIndex instead of the i += 1? Seems a little clearer.
You also don't need numPartitions + 1 scan intermediates, right? (you don't have to do the last one)
Correct. It adds like three lines of code to save not many bytes, so I removed it. I can use zipWithIndex.
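For reference, the same loop in the zipWithIndex form the review settled on (same hypothetical names as the sketch above):

```scala
// Replaces the mutable counter with the pair (value, index) from zipWithIndex.
values.zipWithIndex.foreach { case (x, i) =>
  partitionIndices(i) = os.getPos
  val oos = new ObjectOutputStream(os)
  oos.writeObject(x)
  oos.flush()
}
```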
  if (codec != null)
    codec.createOutputStream(os)
  else
    os
}

private def createNoCompresion(filename: String): FSDataOutputStream = {
typo
size += a.length
if (size > sizeLimit) {
  val file = hc.getTemporaryFile()
  fs.writeFileNoCompression(file) { os =>
same comment about the object output stream -- lift it if possible; if not, add a comment.
  }
}

class SpillingCollectIterator[T: ClassTag] private (rdd: RDD[T], sizeLimit: Int) extends Iterator[T] {
This needs unit tests -- can use property-based testing to generate random arrays, partitionings, and size limits, then compare array.iterator with SpillingCollectIterator(sc.parallelize(array, nPartitions), sizeLimit).
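A sketch of that property test using ScalaCheck; the generator bounds are illustrative, and a live `SparkContext` named `sc` plus this PR's `SpillingCollectIterator` are assumed:

```scala
import org.scalacheck.{Gen, Prop}
import org.scalacheck.Prop.forAll

// Generate a random array, partition count, and spill threshold.
val genCase = for {
  array       <- Gen.listOf(Gen.choose(Int.MinValue, Int.MaxValue))
  nPartitions <- Gen.choose(1, 16)
  sizeLimit   <- Gen.choose(1, 64)
} yield (array, nPartitions, sizeLimit)

// The collected order must match the original array exactly, no matter
// how often the iterator spilled to disk.
val prop: Prop = forAll(genCase) { case (array, nPartitions, sizeLimit) =>
  array.iterator.sameElements(
    SpillingCollectIterator(sc.parallelize(array, nPartitions), sizeLimit))
}
```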
For posterity, this is what goes wrong if I don't create a fresh OOS for each object:
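(The exact failure output is elided above.) The general failure mode is standard Java serialization behavior: `ObjectOutputStream` writes its stream header once, at construction, and thereafter emits back-references to class descriptors it has already written, so later records are not self-contained. An `ObjectInputStream` constructed at a mid-file offset then fails to find a valid header (typically a `StreamCorruptedException`). The read-side counterpart of the write loop above, again with hypothetical names:

```scala
import java.io.ObjectInputStream
import org.apache.hadoop.fs.FSDataInputStream

// Seek to a recorded offset and deserialize one object. This only works
// because each object was written by its own ObjectOutputStream, leaving a
// valid stream header at every recorded position.
def readAt[T](is: FSDataInputStream, offset: Long): T = {
  is.seek(offset)
  new ObjectInputStream(is).readObject().asInstanceOf[T]
}
```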
e5eeb9d to ee9fa94
@tpoterba all comments addressed
This adds SpillingCollectIterator, which avoids holding more than 1000 aggregation results in memory at one time. We could do something that listens for GC events and spills data when there's high memory pressure, but that seems a bit error-prone and hard.
The number of results kept in memory is a flag on the HailContext. In C++ we can design a system that is aware of its memory usage and adjusts memory allocated to scans accordingly.
Implementation Notes
I had to add two new file operations to FS and HadoopFS because I need seekable file input streams. When we add non-Hadoop FSs we'll need to address the interface issue.

When we overflow our in-memory buffer, we spill to a disk file. We use O(n_partitions / mem_limit) files. We stream through the files with scanLeft to compute the globally valid scan state per partition. The stream writes its results to another file, which must be on a cluster-visible file system (we use HailContext.getTemporaryFile). Finally, each partition reads that file and seeks to its scan state.

A somewhat better solution would be to eagerly scan as results come in. I leave that as future work.
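To make that data flow concrete, here is a condensed sketch under the same assumptions as the review sketches above (hypothetical names, not the PR's actual code): the spilled per-partition results are streamed through scanLeft, and each entering-state is written out with its offset recorded, so executors can later seek directly to their own state.

```scala
import java.io.ObjectOutputStream
import org.apache.hadoop.fs.FSDataOutputStream
import scala.collection.mutable.ArrayBuffer

// `spilledStates` streams the per-partition scan results back from the spill
// files in order; `combine` merges two scan states; `out` is the
// cluster-visible results file.
def resolveScanStates[S <: Serializable](
  spilledStates: Iterator[S],
  zero: S,
  combine: (S, S) => S,
  out: FSDataOutputStream
): Array[Long] = {
  val offsets = ArrayBuffer.empty[Long]
  // scanLeft yields the state *entering* each partition (element i covers
  // partitions 0..i-1); the final cumulative state is the extra
  // "numPartitions + 1" intermediate discussed in the review, which no
  // partition actually needs.
  spilledStates.scanLeft(zero)(combine).foreach { state =>
    offsets += out.getPos
    val oos = new ObjectOutputStream(out) // fresh OOS per state, as in the write loop
    oos.writeObject(state)
    oos.flush()
  }
  offsets.toArray
}
```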
Timings
Master 0.2.14-4da055db5a7b
This branch