Simple command line utility for removing duplicate lines from input. Like `uniq`, but not limited to adjacent lines.
The standard advice for removing duplicates from a pipeline is `sort | uniq`, or just `sort -u`. That works, but can be suboptimal: `sort` must read all of its input before it can emit any output, which serializes the pipeline and can prevent a multi-CPU machine from keeping all of its cores busy. Sorting also isn't free, imposing an O(n log n) cost.
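For comparison, the same one-pass, order-preserving filtering can be sketched with the classic awk idiom, which keeps a hash of seen lines instead of sorting (shown only to illustrate the approach, not as a replacement for the tool):

```shell
# Print each line only the first time it appears:
# seen[$0]++ is 0 (falsy) on first sight, so !seen[$0]++ is true once per line.
printf 'b\na\nb\nc\na\n' | awk '!seen[$0]++'
# prints:
# b
# a
# c
```

Note that, unlike `sort -u`, this preserves the original order of first occurrences.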
`dedup` scans for duplicates incrementally, keeping a hash set of the lines it has seen. Hash look-up and insertion are O(1) on average. And because it processes one line at a time, `dedup` can emit each new line immediately, silently dropping subsequent duplicates as they appear.
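A minimal sketch of that approach in Rust (this is illustrative, not the actual `dedup` source; the names here are made up):

```rust
use std::collections::HashSet;
use std::io::{self, BufRead, Write};

/// Drop every line after its first occurrence, preserving input order.
/// (Hypothetical helper for illustration, not dedup's real API.)
fn dedup_lines<'a>(lines: impl IntoIterator<Item = &'a str>) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    // `insert` returns true only when the line was not already present,
    // so the filter keeps exactly the first occurrence of each line.
    lines.into_iter().filter(|line| seen.insert(*line)).collect()
}

fn main() {
    // Streaming variant: emit each previously unseen line as soon as it
    // is read, instead of collecting and sorting the whole input first.
    let stdin = io::stdin();
    let mut out = io::stdout().lock();
    let mut seen: HashSet<String> = HashSet::new();
    for line in stdin.lock().lines() {
        let line = line.expect("read failed");
        if seen.insert(line.clone()) {
            writeln!(out, "{line}").expect("write failed");
        }
    }
}
```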
Currently, the easiest way to install is via cargo. Check out the project and run `cargo install`:

```shell
cargo install --path .
```
```shell
producer | dedup | consumer
```
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.