I can read a 160041-line file ~4.9 times
faster using Legacy.input_line...
I think we need many more benchmarks...
The culprits look like BatIO and Enum, which add some wrapping around each call to
@agarwal I think you missed the link in your comment.
I looked at the code, and the source of slowness is probably twofold:
I used File.lines_of, but yes then at some point it calls BatIO.lines_of.
I think I should close this (since I created it): this performance regression compared to the stdlib is too old,
and it was introduced too long ago.
I don't think there is anything wrong with fixing old issues. c-cube looks interested in the performance aspect of BatIO (I personally tend to loathe IO-related stuff, so I should apologize for happily staying away from the discussion). It would help to have your actual benchmark code, though -- I would guess that most of the causes we suspect for this regression will turn out not to matter much in a realistic workflow, with one actual suspect being guilty by a large margin.
Thank you for looking at this!
Yet another argument for putting IO in a separate library, imho: people may want to use batteries in combination with libraries that use the standard input and output channels (and know what happens underneath; IO contains some bloat, like weak sets and whatnot).
Yep. I'm also pretty sure that IO brings in Unix. Separating the core of batteries from Unix is one of the main goals of refactoring.
Implementing a wc -l in OCaml with batteries (File.lines_of) is enough to see the problem.
The same program using (Legacy.open_in, Legacy.input_line, Legacy.close_in) will be faster, I bet.
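For reference, a minimal sketch of the stdlib-only side of that comparison. It assumes that Legacy.open_in, Legacy.input_line and Legacy.close_in are simply the stdlib functions re-exported by batteries, so it uses the unqualified names and needs no library at all. The name `count_lines` is made up for this example, and no timing is claimed here.

```ocaml
(* Count the lines of a file with the plain stdlib channel API.
   [count_lines] is a hypothetical name, not part of batteries. *)
let count_lines (path : string) : int =
  let ic = open_in path in
  let n = ref 0 in
  (try
     (* input_line raises End_of_file once the channel is exhausted *)
     while true do
       ignore (input_line ic);
       incr n
     done
   with End_of_file -> ());
  close_in ic;
  !n

let () =
  (* Only act as a command-line tool when a file name is given. *)
  if Array.length Sys.argv > 1 then
    Printf.printf "%d\n" (count_lines Sys.argv.(1))
```

Unlike wc -l, this also counts a final line that has no trailing newline. Timing this against a File.lines_of version on the same file should be enough to reproduce the gap.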
Here is my version trying to avoid batteries' IO:
module MU = My_utils
let with_in_file fn f =
  let input = Legacy.open_in fn in
  let res = f input in
  (* close the channel before returning the result of f *)
  Legacy.close_in input;
  res
let main () =
let nb_lines = ref 0 in
let _all_lines =
with_in_file Sys.argv.(1) (fun input ->
let res, _eof =
(fun () -> let l = Legacy.input_line input in
printf "init %d lines\n" !nb_lines;
MU.unfold_exc is the new constructor I am pushing for in BatList.
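Since My_utils is not shown, here is a rough sketch of what such a constructor could look like. The signature `(unit -> 'a) -> 'a list * exn` is an assumption inferred from the `let res, _eof = ...` pattern above, not the actual MU code. `read_lines` is a hypothetical helper added only to show usage, and it relies on the stdlib `input_line` rather than the Legacy-qualified name.

```ocaml
(* Hypothetical sketch: call [f ()] repeatedly until it raises, then return
   the results in call order together with the exception that stopped it.
   Needs OCaml >= 4.02 for the [exception] match case. *)
let unfold_exc (f : unit -> 'a) : 'a list * exn =
  let rec loop acc =
    match f () with
    | x -> loop (x :: acc)              (* tail call, so long files are fine *)
    | exception e -> (List.rev acc, e)
  in
  loop []

(* Usage sketch: read every line of a file into a list. *)
let read_lines (path : string) : string list =
  let ic = open_in path in
  (* the loop stops on End_of_file, returned here as [_eof] *)
  let lines, _eof = unfold_exc (fun () -> input_line ic) in
  close_in ic;
  lines
```

Returning the exception instead of swallowing it lets the caller check that the loop really ended on End_of_file and not on some other error.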