mmhliton/compoundfile-rust-cpp

CompoundFile C++ Guide

This guide documents the C++ tooling built around the Rust Compound File Binary (CFB) core library. It covers building, running, traversal features, optional hashing, and performance notes for the CompoundFile executable.

Contents

  • Overview
  • Build & Requirements
  • Quick Start
  • Executable Usage (CompoundFile)
  • Recursive Traversal & Stream Preview
  • Human-Readable Sizes
  • Optional Hashing (SHA256 / fallback)
  • Performance Tips
  • Development Workflow
  • Troubleshooting
  • Future Enhancements

Overview

The C++ component wraps the Rust CFB core, exposing a native executable capable of:

  • Opening and traversing large CFB files (GB-scale)
  • Enumerating storages and streams (depth-first)
  • Optionally previewing initial bytes of stream content (hex + ASCII)
  • Summarizing counts, total stream payload size (bytes + IEC units) and execution time
  • Optionally hashing stream contents for integrity checking (SHA256 when OpenSSL headers are available, otherwise a non-cryptographic fallback)

Build & Requirements

Prerequisites (Ubuntu/Debian)

sudo apt update
sudo apt install build-essential cmake pkg-config libssl-dev

If libssl-dev cannot be installed, the build will still succeed using a non-cryptographic placeholder hash routine.

Configure & Build

From the CompoundFile directory:

mkdir -p build
cd build
cmake ..
make -j$(nproc)

Resulting executable: CompoundFile (in build/).

OpenSSL Detection

The build calls find_package(OpenSSL QUIET). If both OPENSSL_CRYPTO_LIBRARY and OPENSSL_SSL_LIBRARY are found, OpenSSL-backed SHA256 hashing is made available (HAS_OPENSSL_SHA=1). Otherwise the fallback hash is used and CMake emits a warning during configuration.

Quick Start

./CompoundFile test.cfb            # Default: previews enabled
./CompoundFile test.cfb false      # Suppress stream previews (faster)
./CompoundFile large_test_1gb.cfb false  # Large file traversal

The exit code is 0 on success and non-zero on fatal errors (failure to open or validate the file).

Executable Usage

./CompoundFile <path_to_compound_file> [print-streams=true|false]
  • print-streams defaults to true if omitted.
  • Accepts case-insensitive true / false.

Recursive Traversal & Stream Preview

During traversal the tool prints each entry:

  • Processing: /Path/To/Entry
  • For storages (non-root directories): emits a Storage (directory) line
  • For streams (files): when previews are enabled, prints:
    • Stream size: <N> bytes
    • A hex preview (up to the first 16 bytes, space-separated) and an ASCII preview (non-printable bytes replaced with .)
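
A preview of this kind could be produced along the following lines. The helper names hex_preview and ascii_preview are hypothetical, and leaving the ASCII preview unbounded is an assumption; the tool may cap its length.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: space-separated lowercase hex of the first `limit` bytes.
std::string hex_preview(const std::string& data, std::size_t limit = 16) {
    std::string out;
    const std::size_t n = data.size() < limit ? data.size() : limit;
    char buf[4];
    for (std::size_t i = 0; i < n; ++i) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned char>(data[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// Hypothetical helper: copy of the data with non-printable bytes replaced by '.'.
// The length is not capped here (an assumption).
std::string ascii_preview(const std::string& data) {
    std::string out;
    out.reserve(data.size());
    for (unsigned char c : data) {
        out += std::isprint(c) ? static_cast<char>(c) : '.';
    }
    return out;
}
```

Casting each byte to unsigned char before formatting or calling std::isprint avoids sign-extension problems with bytes above 0x7f.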

Sample (small test file, previews enabled):

Opening compound file: "test.cfb"
Processing: /
Processing: /RootStream
  Stream size: 43 bytes
    hex: 54 68 69 73 20 69 73 20 61 20 72 6f 6f 74 20 6c
    txt: This is a root level stream with test data.
Processing: /TestStorage
  Storage (directory)
Processing: /TestStorage/TestStream
  Stream size: 58 bytes
    hex: 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 20 54 68
    txt: Hello, World! This is test data in a compound file stream.

C++ Traversal Results (using Rust core library):
Total Storage Count: 1
Total Stream Count: 2
Total Size of Streams: 101 bytes (101 B)
C++ traversal processing took 2 ms.
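
The depth-first walk and the summary counters can be sketched over an in-memory tree. The Entry and Totals types and the walk function are hypothetical stand-ins; the real tool reads entries through the Rust core library, whose interface is not shown in this guide. The root storage is left uncounted so that the totals agree with the sample above (1 storage, 2 streams, 101 bytes).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory model of a CFB entry (illustration only).
struct Entry {
    std::string name;
    bool is_stream = false;
    std::uint64_t size = 0;        // payload bytes (streams only)
    std::vector<Entry> children;   // child entries (storages only)
};

struct Totals {
    std::uint64_t storages = 0;
    std::uint64_t streams = 0;
    std::uint64_t bytes = 0;
};

// Depth-first walk: records the entry's path, then recurses into children.
// The root storage is not counted as a storage.
void walk(const Entry& e, const std::string& path, bool is_root,
          Totals& totals, std::vector<std::string>& visited) {
    visited.push_back(path);
    if (e.is_stream) {
        ++totals.streams;
        totals.bytes += e.size;
        return;
    }
    if (!is_root) ++totals.storages;
    const std::string prefix = is_root ? std::string("/") : path + "/";
    for (const Entry& child : e.children) {
        walk(child, prefix + child.name, false, totals, visited);
    }
}
```

Calling walk(root, "/", true, totals, visited) on a tree shaped like the sample file visits /, /RootStream, /TestStorage and /TestStorage/TestStream in that order.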

Human-Readable Sizes

Final summary prints raw total bytes plus an IEC-formatted unit:

Total Size of Streams: 1073741824 bytes (1.00 GiB)

Units: B, KiB, MiB, GiB, TiB. Values are shown with two decimals, except plain byte counts, which are shown as whole numbers.
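
One way to produce this formatting is shown below. The function name format_iec is hypothetical, and the sketch only assumes the rules stated here: 1024-based units, two decimals, and whole numbers for plain bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a byte count with IEC (1024-based) units.
// Plain bytes print as whole numbers; larger units use two decimals.
std::string format_iec(std::uint64_t bytes) {
    static const char* const units[] = {"B", "KiB", "MiB", "GiB", "TiB"};
    const std::size_t last = sizeof units / sizeof units[0] - 1;
    double value = static_cast<double>(bytes);
    std::size_t unit = 0;
    // Divide by 1024 until the value fits the unit or TiB is reached.
    while (value >= 1024.0 && unit < last) {
        value /= 1024.0;
        ++unit;
    }
    char buf[32];
    if (unit == 0) {
        std::snprintf(buf, sizeof buf, "%llu B", static_cast<unsigned long long>(bytes));
    } else {
        std::snprintf(buf, sizeof buf, "%.2f %s", value, units[unit]);
    }
    return buf;
}
```

Values larger than 1024 TiB stay in TiB because the loop stops at the last unit in the table.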

Optional Hashing (SHA256 / Fallback)

  • Define CHECK_STREAM_DATA at the top of CompoundFile_linux.cpp (uncomment the macro) to enable hashing.
  • SHA256 requires the OpenSSL headers (<openssl/sha.h>). If they are absent, a lightweight non-cryptographic rolling hash is used automatically as a fallback.
  • Hashes are stored in a map keyed by stream path (currently not printed; extend as needed).
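
To illustrate the fallback path and the path-keyed map, here is a sketch that uses 64-bit FNV-1a. FNV-1a is a stand-in chosen for this example and is not the routine in Hash_linux.cpp. The names fnv1a64, StreamHashes and record_hash are hypothetical. Like any non-cryptographic hash, it can detect accidental corruption but not deliberate tampering.

```cpp
#include <cstdint>
#include <map>
#include <string>

// FNV-1a (64-bit): a simple non-cryptographic hash used here as a stand-in.
// It is NOT the project's fallback routine.
std::uint64_t fnv1a64(const std::string& data) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char byte : data) {
        hash ^= byte;
        hash *= 1099511628211ULL;                   // FNV prime
    }
    return hash;
}

// Hypothetical container: one hash per stream, keyed by stream path.
using StreamHashes = std::map<std::string, std::uint64_t>;

// Hypothetical helper: hashes a stream's bytes and stores the result.
void record_hash(StreamHashes& hashes, const std::string& path,
                 const std::string& data) {
    hashes[path] = fnv1a64(data);
}
```

Printing or comparing the stored values after traversal would be one way to build the hash output mode that the guide leaves for later.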

Performance Tips

  • Disable previews (false) for large files to reduce I/O and formatting cost.
  • Pipe output for huge hierarchies:
    ./CompoundFile large_test_1gb.cfb false | less
  • Build in Release mode for optimized code (add -DCMAKE_BUILD_TYPE=Release when configuring CMake).
  • Consider depth limiting or JSON export (future features) for tooling integration.

Development Workflow

  1. Edit source (CompoundFile_linux.cpp, Hash_linux.cpp, io_linux.cpp).
  2. Reconfigure if the CMake configuration changes: cmake .. (from the build directory).
  3. Incremental build: make -j$(nproc).
  4. Run tests manually by invoking the executable on known CFB samples.
  5. Commit:
    git add .
    git commit -m "Add human-readable size formatting to CompoundFile output"
    git push origin master

Troubleshooting

| Issue | Cause | Resolution |
|---|---|---|
| OpenSSL not found warning | Missing dev headers | sudo apt install libssl-dev (optional) |
| Segfault / runtime error | Corrupt or malformed CFB | Try a known-good file; run Rust tests |
| Excessive output | Large number of entries | Run with false flag or pipe to less |
| Slow traversal | Previews + huge streams | Disable previews; build in Release |

Do a clean rebuild if the build artifacts are stale:

rm -rf build
mkdir build
cd build
cmake ..
make -j$(nproc)

Future Enhancements (Roadmap)

  • Depth limiting (--max-depth N)
  • JSON/YAML export of hierarchy
  • Selective stream extraction (--dump PATH)
  • Pattern filtering (--grep PATTERN on stream names)
  • Progress interval reporting (every N entries)
  • Optional hash output / verification mode

License

Consult root project licenses (Rust and C++ repositories) for usage and distribution terms.


For Rust-specific documentation refer back to CFB_PROJECT_GUIDE.md in the Rust repository.
