Simple Command Line Utility to remove duplicates from stdio or a file.


sblundy/dedup-cli


DeDup CLI

Simple command line utility for removing duplicate lines from input. Like uniq, but it removes duplicates anywhere in the input, not just adjacent ones.

The standard advice for removing duplicates from a pipeline is either sort | uniq or sort -u. That works, but it can be suboptimal. sort has to read its entire input before it can emit any output, so downstream stages sit idle in the meantime. On multi-CPU machines this can keep some of the cores from being utilized. Sorting isn't free either: it imposes an O(n log n) cost.

dedup scans for duplicates incrementally, keeping a hash set of the lines it has seen. Hash lookup and insertion take constant time on average, O(1) per line. And because it processes one line at a time, dedup can output each new line immediately, silently dropping later duplicates as they appear.
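
The approach described above can be sketched in a few lines of Rust. This is a minimal illustration, not the project's actual source: the function name dedup_lines and its signature are assumptions made for the example, and it assumes the input is valid UTF-8.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead, Write};

/// Hypothetical helper (not taken from the dedup source): copies each line
/// of `input` to `output` the first time it appears and skips later repeats.
fn dedup_lines<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    // Every distinct line seen so far. Lookup and insertion are O(1) on average.
    let mut seen: HashSet<String> = HashSet::new();

    for line in input.lines() {
        let line = line?;
        if !seen.contains(&line) {
            // First occurrence: write it out right away, then remember it.
            writeln!(output, "{}", line)?;
            seen.insert(line);
        }
        // Otherwise the line is a duplicate and is silently dropped.
    }
    Ok(())
}
```

Calling dedup_lines(io::stdin().lock(), io::stdout().lock()) from main would give the pipeline behavior. Two properties follow from this design: output keeps the order in which lines first appeared, and memory use grows with the number of distinct lines, since each one is held in the set.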

Installation

Currently, the easiest way to install is via cargo. Check out the project and run cargo install from the project directory.

cargo install --path .

Usage

producer | dedup | consumer

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT
