Commit
sys/cmd/mapr: fix severe data loss bug when processing >4MB of data
The way mapr divided very large input into roughly 4MB chunks, to be processed by separate awk invocations, failed to account for stream buffering on Unix-like systems. As a result, data was lost in between chunks. Each time an awk invocation exited at the end of a chunk, data from the standard input stream was lost, as it had already been buffered by the awk process that exited. This made mapr fundamentally unusable for large data input.

In this commit, this whole broken chunking notion goes away and is replaced by asynchronous processing. mapr is now a bit slower, but works correctly and also uses much less shell working memory.

lib/modernish/mdl/sys/cmd/mapr.mm:
- Instead of invoking multiple awk processes in a very broken way, invoke just one awk in the background using modernish portable process substitution (sys/cmd/procsubst). Read from this process using a 'while' loop that evals each line read. (On most shells, we could '.' the input stream. But this does not work on bash 3.2, which cannot read dot scripts from a non-regular file. It also does not work on ksh93, which reads the entire dot script into memory before beginning execution, so that mapr wouldn't work with commands writing infinite output.)

lib/modernish/aux/sys/cmd/mapr.awk:
- Remove code that was used by the broken chunking mechanism.
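
The single-awk approach described above can be illustrated with a minimal POSIX sh sketch. This is not the code from mapr.mm: a plain FIFO created with mkfifo stands in for modernish's portable process substitution, the input comes from a file argument rather than standard input, and the names batch_lines and handle, as well as the batch size of 3, are hypothetical and chosen only for illustration.

```shell
# Hypothetical callback: count invocations and the arguments they received.
handle() {
	calls=$((calls + 1))
	total=$((total + $#))
}

# batch_lines FILE
# Start ONE awk in the background that turns every 3 lines of FILE into a
# shell-quoted "handle ..." command line, and eval each generated line as it
# arrives. Because awk is never restarted, none of its buffered input is lost.
batch_lines() {
	fifo=${TMPDIR:-/tmp}/mapr_sketch.$$
	mkfifo "$fifo" || return
	awk -v q="'" '
		# Shell-quote the line and append it to the pending argument list.
		{ gsub(q, q "\\" q q); args = args " " q $0 q }
		NR % 3 == 0 { print "handle" args; args = "" }
		END { if (args != "") print "handle" args }
	' "$1" > "$fifo" &
	# Read generated commands one line at a time instead of sourcing the
	# stream with ".", so execution begins before the producer finishes.
	while IFS= read -r cmdline; do
		eval "$cmdline"
	done < "$fifo"
	wait
	rm -f "$fifo"
}
```

With calls and total initialised to 0, running batch_lines on a seven-line file would invoke handle three times (with 3, 3 and 1 arguments). The loop runs in the current shell, so the callback's variable assignments remain visible afterwards.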
Showing 2 changed files with 35 additions and 35 deletions.