Skip to content
Reads a file using Java and prints some statistics
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.

Java Large File / Data Reading & Performance Testing

A reimplementation of paigen11/read-file-java using marschall/line-parser, marschall/charsequences and marschall/mini-csv.

We use the following approach for parsing

  • Use marschall/mini-csv for CSV parsing, which uses marschall/line-parser.
    • This allows us to drastically cut down on string allocations as just a reused CharSequence view is created for every line instead of a full String.
    • Since the file is in ASCII we can safe us the decoding and turn every byte into a char.
  • Use an Eclipse Collections Bag or counting it occurrences of months and first names.
    • This allows us to not to have to hold on to every first name and is more efficient than a HashMap<String, Integer>.
    • Unfortunately this adds about 10 MB.
  • Use YearMonth instead of a formatted String for representing a month.
    • Use Integer.parseInt for parsing the YearMonth instead of DateTimeFormatterBuilder because is drastically cuts down on allocations. This causes a noticeable speed improvement.
time java -Xmx16m -cp target/read-file-java-0.1.0-SNAPSHOT-shaded.jar com.github.marschall.readfilejava.ReadFile /path/to/file
time java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler
time java -Xmx6g -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC
-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:StartFlightRecording:filename=read-file-java.jfr:settings=$HOME/git/read-file-java/read-file-java.jfc -XX:FlightRecorderOptions:stackdepth=128
You can’t perform that action at this time.