High-performance automated document redaction. Scans text files for sensitive terms and replaces them with [REDACTED]. Available in C++ (speed) and Rust (speed plus integrity verification). Built for legal teams, compliance departments, and any organization that processes sensitive documents.
- 📄 What It Does
- 🔐 Integrity Checksum Example (Tamper Detection)
- 🔧 Why Two Versions?
- 📊 Performance Benchmarks
- 🎯 How to Use
- 📂 File Structure
- 🏗️ Design Notes
- 📌 Version
Scans text files for these sensitive terms and replaces them with [REDACTED]:
- Plaintiff
- Confidential
- SSN
- Assets
Files are processed line by line, so memory use stays constant as files grow (bounded by the longest line), even for very large files.
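A minimal sketch of the line-by-line approach in Rust, assuming plain case-sensitive substring replacement. The names TERMS, redact_line and redact_stream are hypothetical and not taken from the engine's source. Only the current line and fixed-size I/O buffers are held in memory, which is what keeps memory use flat as files grow.

```rust
use std::io::{self, BufRead, BufReader, BufWriter, Read, Write};

// Hypothetical term list mirroring the README; the real engine's list may differ.
const TERMS: [&str; 4] = ["Plaintiff", "Confidential", "SSN", "Assets"];

/// Replaces every case-sensitive occurrence of each term with [REDACTED].
fn redact_line(line: &str) -> String {
    let mut out = line.to_string();
    for term in TERMS.iter() {
        out = out.replace(*term, "[REDACTED]");
    }
    out
}

/// Streams `input` to `output` one line at a time, so only the current
/// line (plus fixed-size I/O buffers) is held in memory.
/// `lines()` strips "\n" or "\r\n"; the output always uses "\n".
fn redact_stream<R: Read, W: Write>(input: R, output: W) -> io::Result<()> {
    let reader = BufReader::new(input);
    let mut writer = BufWriter::new(output);
    for line in reader.lines() {
        writeln!(writer, "{}", redact_line(&line?))?;
    }
    writer.flush()
}

fn main() -> io::Result<()> {
    // Read from stdin, write the redacted text to stdout.
    let stdin = io::stdin();
    let stdout = io::stdout();
    redact_stream(stdin.lock(), stdout.lock())
}
```

Built as a standalone binary, this sketch reads standard input and writes standard output, so it can be used as `redact_sketch < document.txt > document_redacted.txt` (hypothetical binary name).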
The Rust version performs the same redaction and also prints an integrity checksum for document verification:
```text
# First run
Redaction complete. Output written to: file_redacted_rust.txt
Integrity Checksum: 9543857320846529

# Second run (same file)
Redaction complete. Output written to: file_redacted_rust.txt
Integrity Checksum: 9543857320846529   ← SAME ✅

# (User accidentally edits file_redacted_rust.txt)
Redaction complete. Output written to: file_redacted_rust.txt
Integrity Checksum: 7284019356729481   ← DIFFERENT! ⚠️
```
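Why this matters: the checksum acts as a fingerprint of the redacted content. The same content always produces the same checksum, so a changed checksum means the content no longer matches the version you recorded. Record the checksum when the file is produced and compare it later for audit trails and chain-of-custody verification in litigation.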
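The README does not say which hash the Rust engine uses. The sketch below assumes 64-bit FNV-1a, a simple deterministic hash, and shows both computing a fingerprint and checking content against a recorded value; fnv1a_64 and verify are hypothetical names. A fixed algorithm matters here: std::collections::hash_map::DefaultHasher is documented as not stable across Rust releases, so its values should not be stored for later comparison.

```rust
use std::io::{self, Read};

// FNV-1a 64-bit parameters (assumed algorithm, not confirmed for the engine).
const FNV_OFFSET_BASIS: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// Computes a 64-bit FNV-1a fingerprint of a byte slice.
/// Deterministic across runs and platforms, but NOT cryptographic.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Returns true if `content` still matches a checksum recorded earlier.
fn verify(content: &[u8], recorded: u64) -> bool {
    fnv1a_64(content) == recorded
}

fn main() -> io::Result<()> {
    // Read a redacted document from stdin and print its fingerprint.
    let mut content = Vec::new();
    io::stdin().read_to_end(&mut content)?;
    println!("Integrity Checksum: {}", fnv1a_64(&content));
    Ok(())
}
```

FNV-1a catches accidental edits but is not cryptographic: someone deliberately altering a file could construct content with a matching value. Evidence-grade tamper detection would call for a cryptographic hash such as SHA-256, which in Rust means an external crate.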
| Feature | C++ | Rust |
|---|---|---|
| Speed | ⚡⚡⚡ | ⚡⚡⚡ |
| File Size | Unlimited | Unlimited |
| Dependencies | None | None |
| Integrity Checksum | No | Yes ✓ |
| Memory Safety | STL containers (no compiler bounds checks) | Ownership + borrow checker (compiler-enforced) |
What this means:
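- STL (C++ Standard Library): Uses well-tested containers such as std::string and std::vector that manage their own memory, which removes most manual allocation bugs. The compiler does not check bounds, though: an out-of-range operator[] is undefined behavior, while .at() throws std::out_of_range. Safe for production with normal care.
- Rust Ownership: The compiler enforces ownership and borrowing rules, ruling out dangling pointers and use-after-free in safe code. Indexing into slices and vectors is bounds-checked at runtime and panics instead of overflowing a buffer. Memory-unsafe operations require an explicit unsafe block, so they cannot be written by accident.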
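A small illustration of the runtime bounds checking described above, using hypothetical helpers that are not part of either engine. Rust's get returns None for an out-of-range index, and [] indexing panics rather than reading past the end. The closest C++ analogues are std::vector::at, which throws std::out_of_range, and operator[], which is unchecked.

```rust
use std::panic;

/// Checked lookup: returns None instead of reading past the end.
fn term_at<'a>(terms: &[&'a str], i: usize) -> Option<&'a str> {
    terms.get(i).copied()
}

/// Plain [] indexing is bounds-checked at runtime; an out-of-range index
/// panics (caught here) rather than touching adjacent memory.
/// Note: the default panic hook still prints a message to stderr.
fn index_panics(terms: &[&str], i: usize) -> bool {
    panic::catch_unwind(move || terms[i].len()).is_err()
}

fn main() {
    let terms = ["Plaintiff", "Confidential", "SSN", "Assets"];
    println!("{:?}", term_at(&terms, 1)); // Some("Confidential")
    println!("{:?}", term_at(&terms, 10)); // None
    println!("index 10 panics: {}", index_panics(&terms, 10));
}
```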
C++: Raw speed for high-volume discovery processing.
Rust: Speed + digital fingerprinting to verify files haven't been tampered with.
Real-world test on a 100+ MB file with 2,000,000 lines (legal_case_large.txt):
| Language | Time | Throughput |
|---|---|---|
| C++ | 10 seconds | ~10 MB/sec |
| Rust | 5 seconds | ~20 MB/sec |
Result: Rust processes the same file 2x faster while also computing the integrity checksum.
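Why? This benchmark alone does not pinpoint the cause. Rust's ownership checks happen at compile time, and C++ built with -O3 is generally comparable in speed, so a 2x gap on a line-by-line workload more likely reflects I/O choices (for example, flushing output on every line with std::endl, or iostream overhead) than the language itself. Both engines handle files of any size; in this test, Rust was faster while also computing the checksum.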
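To reproduce a throughput comparison like the one above, a timing sketch in Rust could look like this. It assumes the same four terms, uses synthetic in-memory input so disk speed is excluded, and writes through a buffer without per-line flushing; all names and the input size are hypothetical. The MB/s column in the table is simply file size divided by time (100 MB / 10 s = 10 MB/s).

```rust
use std::io::{BufRead, BufReader, BufWriter, Write};
use std::time::Instant;

// Hypothetical term list mirroring the README.
const TERMS: [&str; 4] = ["Plaintiff", "Confidential", "SSN", "Assets"];

/// Throughput in MB/s (1 MB = 1,000,000 bytes).
fn throughput_mb_per_s(bytes: u64, seconds: f64) -> f64 {
    bytes as f64 / 1_000_000.0 / seconds
}

/// Redacts `input` into an in-memory buffer and returns
/// (bytes read, elapsed seconds).
fn timed_redact(input: &[u8]) -> (u64, f64) {
    let start = Instant::now();
    // BufWriter over a Vec mirrors a buffered file writer.
    let mut sink = BufWriter::new(Vec::with_capacity(input.len()));
    for line in BufReader::new(input).lines() {
        let mut line = line.expect("input must be valid UTF-8");
        for term in TERMS.iter() {
            line = line.replace(*term, "[REDACTED]");
        }
        // One buffered write per line; no per-line flush.
        writeln!(sink, "{}", line).unwrap();
    }
    sink.flush().unwrap();
    (input.len() as u64, start.elapsed().as_secs_f64())
}

fn main() {
    // Synthetic input: 200,000 copies of one line (hypothetical test data).
    let line = "The Plaintiff disclosed Confidential Assets and an SSN.\n";
    let input = line.repeat(200_000);
    let (bytes, secs) = timed_redact(input.as_bytes());
    println!(
        "{} bytes in {:.3} s = {:.1} MB/s",
        bytes,
        secs,
        throughput_mb_per_s(bytes, secs)
    );
}
```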
C++ engine (run from redaction_cpp/):

```bash
# Build
g++ -std=c++17 -O3 main.cpp -o redact-core.exe
# Run
./redact-core.exe document.txt
# Output: document_redacted_cpp.txt
```

Rust engine (run from redaction_rust/):

```bash
# Build
cargo build --release
# Run
./target/release/redaction_rust.exe document.txt
# Output: document_redacted_rust.txt, plus the integrity checksum printed to the console
```

```text
redaction/
├── README.md            # This file
├── legal_test.txt       # Test document
├── redaction_cpp/
│   └── main.cpp         # C++ engine
└── redaction_rust/
    ├── Cargo.toml       # Rust project config
    ├── Cargo.lock       # Dependency lock
    └── src/
        └── main.rs      # Rust engine
```
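The output names in the run commands (document.txt becomes document_redacted_cpp.txt or document_redacted_rust.txt) follow a simple pattern. A sketch of that derivation in Rust, assuming the output always gets a .txt extension and is written next to the input; output_path and the "output" fallback are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Derives "<stem>_redacted_<engine>.txt" in the same directory as the input.
fn output_path(input: &Path, engine: &str) -> PathBuf {
    let stem = input
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_else(|| "output".to_string()); // hypothetical fallback
    input.with_file_name(format!("{}_redacted_{}.txt", stem, engine))
}

fn main() {
    let input = std::env::args().nth(1).expect("usage: redaction_rust <file>");
    println!("{}", output_path(Path::new(&input), "rust").display());
}
```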
- No regex: Simple find-and-replace is faster and more predictable
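- Case-sensitive: "Plaintiff" ≠ "plaintiff" (prevents over-redaction; variants such as "PLAINTIFF" are not redacted either)
- Hardcoded terms: No config file to get wrong; changing the term list means editing the source and rebuilding
- Streaming I/O: Processes files in constant memory (bounded by the longest line)
- No external dependencies: Only the C++ standard library (C++) or the Rust standard library (Rust)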
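The matching rules above can be checked with a short sketch, assuming the engines use plain literal substring replacement with no word-boundary checks (the README does not say either way). Under that assumption, case variants are left alone, while a term embedded in a longer word, such as "Plaintiffs", is still partially redacted; redact is a hypothetical name.

```rust
/// Case-sensitive, literal (non-regex) replacement of each term.
fn redact(text: &str, terms: &[&str]) -> String {
    terms
        .iter()
        .fold(text.to_string(), |acc, term| acc.replace(*term, "[REDACTED]"))
}

fn main() {
    let terms = ["Plaintiff", "Confidential", "SSN", "Assets"];
    // prints "[REDACTED] v. plaintiff v. PLAINTIFF"
    println!("{}", redact("Plaintiff v. plaintiff v. PLAINTIFF", &terms));
    // prints "[REDACTED]s" (substring match inside a longer word)
    println!("{}", redact("Plaintiffs", &terms));
}
```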
v1.0 - Both engines working, checksums functional.

