External-Memory Sorting in Java

External-Memory Sorting in Java: a library for sorting very large files using multiple cores and an external-memory algorithm.

Versions 0.1.x of the library are compatible with Java 6 and above. Versions 0.2 and above require at least Java 8.

This code is used in Apache Jackrabbit Oak as well as in Apache Beam.

Code sample


import com.google.code.externalsorting.ExternalSort;
import java.io.File;

//... inputfile: input file name
//... outputfile: output file name
// the next command sorts the lines from inputfile into outputfile
ExternalSort.mergeSortedFiles(ExternalSort.sortInBatch(new File(inputfile)), new File(outputfile));
// you can also provide a custom string comparator, see the API documentation
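
To make the two-step call above (sort in batches, then merge the sorted files) easier to picture, here is a minimal sketch of a generic external merge sort that uses only the Java standard library. It is not the library's implementation. The class name ExternalSortSketch, its method names and the maxLinesInMemory parameter are hypothetical. It runs on a single thread, sorts lines in their natural String order, and bounds memory by a line count rather than by bytes.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration class; not part of the library.
final class ExternalSortSketch {

    /** Sorts the lines of input into output, buffering at most maxLinesInMemory lines while splitting. */
    static void sort(Path input, Path output, int maxLinesInMemory) throws IOException {
        List<Path> chunks = splitIntoSortedChunks(input, maxLinesInMemory);
        try {
            mergeChunks(chunks, output);
        } finally {
            for (Path chunk : chunks) {
                Files.deleteIfExists(chunk); // clean up temporary files
            }
        }
    }

    /** Phase 1: read the input in batches, sort each batch in memory, write it to a temp file. */
    static List<Path> splitIntoSortedChunks(Path input, int maxLines) throws IOException {
        List<Path> chunks = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() >= maxLines) {
                    chunks.add(writeSortedChunk(buffer));
                    buffer.clear();
                }
            }
        }
        if (!buffer.isEmpty()) {
            chunks.add(writeSortedChunk(buffer));
        }
        return chunks;
    }

    static Path writeSortedChunk(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path chunk = Files.createTempFile("sort-chunk", ".txt");
        Files.write(chunk, lines, StandardCharsets.UTF_8);
        return chunk;
    }

    /** The current smallest unread line of one chunk, plus the reader it came from. */
    static final class Head {
        final String line;
        final BufferedReader reader;

        Head(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }
    }

    /** Phase 2: k-way merge of the sorted chunks using a min-heap keyed on each chunk's next line. */
    static void mergeChunks(List<Path> chunks, Path output) throws IOException {
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path chunk : chunks) {
                BufferedReader reader = Files.newBufferedReader(chunk, StandardCharsets.UTF_8);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) {
                    heap.add(new Head(first, reader));
                }
            }
            while (!heap.isEmpty()) {
                Head smallest = heap.poll();
                writer.write(smallest.line);
                writer.newLine();
                String next = smallest.reader.readLine();
                if (next != null) {
                    heap.add(new Head(next, smallest.reader));
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

Calling ExternalSortSketch.sort(inputPath, outputPath, 100000) would split the input into temporary files of at most 100000 lines each and merge them into outputPath. Only one line per chunk is held in memory during the merge, which is what lets this approach handle files larger than the available RAM.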

Code sample (CSV)

For sorting CSV files, it might be more convenient to use CsvExternalSort.

import com.google.code.externalsorting.csv.CsvExternalSort;
import org.apache.commons.csv.CSVRecord;
import java.io.File;
import java.nio.charset.Charset;
import java.util.Comparator;
import java.util.List;

// provide a comparator (here: compare records by their first column)
Comparator<CSVRecord> comparator = (op1, op2) -> op1.get(0).compareTo(op2.get(0));
//... inputfile: input file name
//... outputfile: output file name
// the next two lines sort the lines from inputfile into outputfile
List<File> sortInBatch = CsvExternalSort.sortInBatch(inputfile, comparator, CsvExternalSort.DEFAULTMAXTEMPFILES, Charset.defaultCharset(), null, false, 1);
CsvExternalSort.mergeSortedFiles(sortInBatch, outputfile, comparator, Charset.defaultCharset(), false, true);

API Documentation

Maven dependency

You can download the jar files from the Maven central repository:

You can also specify the dependency in the Maven "pom.xml" file:


How to build

  • Get the Java JDK
  • Install Maven 2
  • mvn install - builds the jar (requires signing)
  • mvn test - runs the tests