Lets you sort and merge really, really big text files.
- Make sure you have .NET 8.0 installed: the .NET Runtime is sufficient.
- Download the latest release
- Unzip to a folder of your choice
MassiveSort.exe merge -o sortedFile.txt -i unsortedFile.txt
- Update to use .NET 8.0.
- Tested on Windows and Debian platforms. Other Linux distributions supported by dotnet should also work.
- Increase --max-sort-size to support sorting over 2GB of data in RAM.
  - Physical RAM is the limit to sort size. Tested on a 96GB machine.
  - Increasing --slab-size allows up to 63TB to be sorted in RAM (in theory).
- Maximum sortable line size is now 128kB (131,072 bytes). Longer lines are skipped.
- Null bytes (ASCII NUL or 0x00) are removed by default.
  - Use --keep-nulls to keep null bytes.
- Improved support for files with similar starting lines.
- Splitting phase is limited to 16 iterations to avoid long path failures on Windows.
  - Use --split-count and --force-large-sort to control this behaviour.
- Improved memory usage via dotnet MemoryPool; MassiveSort should not allocate excessive memory.
  - The --aggressive-memory-collection option is always active; command line option removed.
- Improve logging via --debug and --save-stats options.
- Update to latest version of 3rd party libraries.
- Fixed several bugs discovered when implementing all the above!
- Increase --max-sort-size maximum to 2GB
- Change old BitBucket references to GitHub
- Fixed bugs in end of line handling and blank line detection
- Better crash dump support
- Linux support via Mono (minimal testing on Ubuntu 14.04 / Mono 3.2)
- --sort-by option to allow sorting by length / dictionary order.
Merge many files and even whole folders into a single, sorted file:
MassiveSort.exe merge -o sortedAndMergedFile.txt -i unsortedFile.txt anotherFile.txt aDirectory\subFolder
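For readers curious how a sort larger than memory can work in principle, here is a minimal Python sketch of a classic external merge sort: sort chunks that fit in RAM, spill each one to a temporary file, then k-way merge the sorted files. It is an illustration only and not a description of MassiveSort's internals; the names `external_sort` and `chunk_lines` are hypothetical, and it assumes lines are byte strings without embedded newlines.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(lines, chunk_lines=100_000):
    """Yield the byte strings in `lines` in sorted order using bounded memory.

    `lines` is any iterable of bytes (no embedded newlines). `chunk_lines` is a
    hypothetical knob: how many lines are sorted in RAM at a time.
    """
    lines = iter(lines)
    chunk_paths = []
    try:
        # Phase 1: sort fixed-size chunks in memory and spill each to disk.
        while True:
            chunk = sorted(islice(lines, chunk_lines))
            if not chunk:
                break
            fd, path = tempfile.mkstemp(suffix=".chunk")
            with os.fdopen(fd, "wb") as f:
                f.writelines(line + b"\n" for line in chunk)
            chunk_paths.append(path)

        # Phase 2: k-way merge of the sorted chunk files.
        files = [open(p, "rb") for p in chunk_paths]
        try:
            # Every stored line ends with exactly one b"\n", so drop it.
            streams = [(raw[:-1] for raw in f) for f in files]
            yield from heapq.merge(*streams)
        finally:
            for f in files:
                f.close()
    finally:
        # Clean up temporary chunk files even if the caller stops early.
        for p in chunk_paths:
            os.remove(p)
```

For example, `list(external_sort([b"pear", b"apple", b"fig", b"apple"], chunk_lines=2))` sorts two chunks of two lines each and merges them into one ordered sequence. Duplicates are kept here; removing them is a separate pass.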
By default, MassiveSort removes duplicates. Use the --leave-duplicates option to keep them (if you're that attached to them).
MassiveSort can list your duplicates in a separate file with the --save-duplicates option.
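Once lines are sorted, duplicates sit next to each other, so separating them takes a single pass. The Python sketch below shows the idea. The function name `split_duplicates` is hypothetical, and the choice to record every extra copy (rather than one entry per duplicated line) is an assumption, not a statement about what --save-duplicates writes.

```python
from itertools import groupby


def split_duplicates(sorted_lines):
    """Return (unique, duplicates) from an already sorted iterable of lines.

    Each distinct line appears once in `unique`; every extra copy goes into
    `duplicates`. The layout of the duplicates list is an assumption.
    """
    unique, duplicates = [], []
    for line, group in groupby(sorted_lines):
        copies = sum(1 for _ in group)  # how many times this line occurred
        unique.append(line)
        duplicates.extend([line] * (copies - 1))
    return unique, duplicates
```

Because `groupby` only compares neighbouring items, the input must already be sorted for this to find every duplicate.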
MassiveSort can normalise special, non-ascii or non-printable characters into the $HEX[] format with the --convert-to-dollar-hex option.
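The $HEX[] format wraps the hexadecimal form of a line's bytes in `$HEX[` and `]`. A minimal Python sketch of the round trip follows. The helper names are hypothetical, and the rule for which lines get converted (anything outside printable ASCII, 0x20 to 0x7E) is an assumption; MassiveSort's exact criteria may differ.

```python
def to_dollar_hex(line: bytes) -> bytes:
    """Encode `line` as $HEX[...] unless it is plain printable ASCII."""
    # Assumption: "printable" means bytes 0x20..0x7E inclusive.
    if all(0x20 <= b <= 0x7E for b in line):
        return line
    return b"$HEX[" + line.hex().encode("ascii") + b"]"


def from_dollar_hex(line: bytes) -> bytes:
    """Decode a $HEX[...] line back to raw bytes; pass other lines through."""
    if line.startswith(b"$HEX[") and line.endswith(b"]"):
        return bytes.fromhex(line[5:-1].decode("ascii"))
    return line
```

With these helpers, `to_dollar_hex("café".encode("utf-8"))` gives `b"$HEX[636166c3a9]"`, while `to_dollar_hex(b"password")` is returned unchanged.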
MassiveSort can trim or remove whitespace with the --whitespace option (note that this can make destructive changes to the lines you are sorting, particularly if there are non-ascii encodings).
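To see why trimming can be destructive, consider what happens when whitespace is stripped at the byte level from text that is not in an ASCII-compatible encoding. The Python sketch below assumes byte-wise trimming of ASCII whitespace; whether --whitespace works exactly this way is an assumption, and the helper names are hypothetical.

```python
def trim_bytes(line: bytes) -> bytes:
    """Strip leading and trailing ASCII whitespace bytes."""
    return line.strip()


def survives_trim(text: str, encoding: str) -> bool:
    """True if byte-wise trimming gives the same result as trimming the text."""
    trimmed = trim_bytes(text.encode(encoding))
    try:
        return trimmed.decode(encoding) == text.strip()
    except UnicodeDecodeError:
        # Trimming cut a multi-byte code unit in half.
        return False
```

In UTF-8, `survives_trim(" a", "utf-8")` is true. In UTF-16LE the same text is the bytes `20 00 61 00`; stripping the leading `20` leaves `00 61 00`, which no longer decodes, so `survives_trim(" a", "utf-16-le")` is false.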
More details and examples can be found in the Merge Verb
- Merge into a central file, with tags
- Configuration files for source files, so you don't have to remember lots of options to import
- General purpose large scale merge sort on any IEnumerable<T>