Parallel Hashing Performance Upgrade by pwgit-create · Pull Request #372 · pwssOrg/File-Integrity-Scanner

pwgit-create · 2026-05-23T22:08:18Z

Parallel Hashing Performance Upgrade, Dependency Update, and Documentation Improvements

Overview

This release introduces a significant performance enhancement to the file integrity scanning engine through the addition of parallel hashing for large files. It also includes dependency updates, scan lifecycle improvements, logging configuration adjustments, and README restructuring.

The primary focus of this change is improving large-file scan performance and reducing redundant I/O operations during multi-algorithm hashing.

Key Changes

Hashing Engine Performance Improvement

Introduced ParallelFileHashHandler to handle large file hashing using parallel execution.
Refactored FileHashComputer to route large file processing through the parallel hashing implementation.
Added explicit lifecycle control:
- initializeParallelHashing() sets up the thread pool and handler before a scan starts
- shutdownParallelHashProcessor() releases those resources after the scan completes
Improved fallback handling:
- On OutOfMemoryError, hashing now switches to the parallel handler instead of single-threaded big-file processing.
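
The routing and fallback described above could look roughly like the sketch below. It is not the code from this PR: the Hasher interface, the HashRoutingSketch class, and the size threshold are hypothetical stand-ins, and the real FileHashComputer may decide differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of size-based routing with an OutOfMemoryError fallback.
final class HashRoutingSketch {

    // Assumed abstraction over "something that hashes a file".
    interface Hasher {
        String hash(Path file) throws IOException;
    }

    private final Hasher smallFileHasher;   // stands in for FileHashHandler
    private final Hasher parallelHasher;    // stands in for ParallelFileHashHandler
    private final long largeFileThreshold;  // assumed cut-off in bytes

    HashRoutingSketch(Hasher smallFileHasher, Hasher parallelHasher, long largeFileThreshold) {
        this.smallFileHasher = smallFileHasher;
        this.parallelHasher = parallelHasher;
        this.largeFileThreshold = largeFileThreshold;
    }

    String compute(Path file) throws IOException {
        // Large files go straight to the parallel handler.
        if (Files.size(file) >= largeFileThreshold) {
            return parallelHasher.hash(file);
        }
        try {
            return smallFileHasher.hash(file);
        } catch (OutOfMemoryError e) {
            // Fallback: retry through the parallel handler, which reads in chunks.
            return parallelHasher.hash(file);
        }
    }
}
```

Catching OutOfMemoryError is generally a last resort; it is shown here only because the PR description names it as the fallback trigger.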

Performance Impact

In local testing, parallel hashing gave a speedup of 2x or more for large file scans.
For files of 1GB and larger, the improvement is especially significant:
- The file is read only once, regardless of the number of hashing algorithms applied.
- Previously, each algorithm required its own read of the file.
- Now the SHA-256, SHA-3, and BLAKE2b computations run in parallel over a single read stream.
Performance gains scale with file size and are increasingly beneficial in environments with many large files, where I/O reduction becomes the dominant factor.
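
To illustrate the single-read idea, here is a minimal sketch that reads a file once in chunks and updates several digests in parallel for each chunk. It is an assumption about the general technique, not the ParallelFileHashHandler implementation. The class name, method name, and 1 MiB chunk size are made up for the example. It uses only JDK MessageDigest algorithms such as SHA-256 and SHA3-256; BLAKE2b is left out because, as far as I know, the stock JDK providers do not ship it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one pass over the file, one digest update task per algorithm per chunk.
final class SingleReadMultiHash {

    static Map<String, String> hashOnce(Path file, List<String> algorithms, ExecutorService pool)
            throws IOException, NoSuchAlgorithmException, InterruptedException, ExecutionException {
        List<MessageDigest> digests = new ArrayList<>();
        for (String name : algorithms) {
            digests.add(MessageDigest.getInstance(name));
        }
        byte[] buffer = new byte[1 << 20]; // assumed chunk size: 1 MiB
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                final int length = read;
                List<Callable<Void>> updates = new ArrayList<>();
                for (MessageDigest digest : digests) {
                    // Each digest is touched by exactly one task per chunk.
                    updates.add(() -> {
                        digest.update(buffer, 0, length);
                        return null;
                    });
                }
                // invokeAll blocks until every update finishes, so the buffer can be reused.
                for (Future<Void> done : pool.invokeAll(updates)) {
                    done.get(); // surfaces any exception from a task
                }
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < digests.size(); i++) {
            result.put(algorithms.get(i), toHex(digests.get(i).digest()));
        }
        return result;
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: SingleReadMultiHash <file>");
            return;
        }
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            hashOnce(Path.of(args[0]), List.of("SHA-256", "SHA3-256"), pool)
                    .forEach((name, hex) -> System.out.println(name + "  " + hex));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the chunks are processed one after another and only the per-algorithm updates run concurrently, each digest still sees the bytes in file order. A production version would likely double-buffer so the next read overlaps with hashing.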

Scan Service Integration

Parallel hashing initialization added at scan start for both:
- scanAllDirectories
- scanSingleDirectory
Thread pool shutdown added during scan finalization to prevent resource leaks.
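
The initialize-then-shut-down pattern can be sketched as follows. Only the two method names come from this PR. Their bodies, the ScanLifecycleSketch class, the scan method, and the 30-second timeout are assumptions for illustration, not the actual ScanServiceImpl code.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of tying a hashing thread pool to the scan lifecycle.
final class ScanLifecycleSketch {

    private ExecutorService pool;

    // Creates the pool if none is active (assumed behavior).
    void initializeParallelHashing() {
        if (pool == null || pool.isShutdown()) {
            pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        }
    }

    // Stops accepting work, waits for running jobs, then forces shutdown on timeout.
    void shutdownParallelHashProcessor() throws InterruptedException {
        if (pool == null) {
            return;
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) { // assumed timeout
            pool.shutdownNow();
        }
    }

    boolean isRunning() {
        return pool != null && !pool.isShutdown();
    }

    // Stand-in for scanAllDirectories / scanSingleDirectory: returns the number of jobs submitted.
    int scan(List<Runnable> hashJobs) throws InterruptedException {
        initializeParallelHashing();
        try {
            int submitted = 0;
            for (Runnable job : hashJobs) {
                pool.submit(job);
                submitted++;
            }
            return submitted;
        } finally {
            // Runs even if submission throws, so worker threads are not leaked.
            shutdownParallelHashProcessor();
        }
    }
}
```

Putting the shutdown in a finally block is one way to ensure the pool is released on both normal completion and failure.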

Dependency Update

Updated algorithm-hash-extraction from 1.2.8 to 1.2.9.

Logging Updates

Added dedicated logging configuration for:
- lib.pwss.hash.file_hash_handler.parallel

Documentation Updates (README)

Changed the project title to: File-Integrity Scanner Backend (FIM Engine)
Improved technical clarity and structure of documentation.
Expanded explanation of cryptographic hashing and integrity verification.
Added Related Repositories section for end-user distribution.
Improved system architecture description and component breakdown.
Updated setup instructions for developers.

Architecture / Behavioral Impact

Large file processing is now optimized for parallel execution.
File I/O is reduced significantly when multiple hashing algorithms are used.
Clear separation of concerns between:
- Small file hashing (FileHashHandler)
- Large file parallel hashing (ParallelFileHashHandler)
Better scalability for file systems with many files or very large files.

Notes

Parallel hashing must be initialized before scan execution and properly shut down after completion to avoid thread pool leaks.
Requires algorithm-hash-extraction:1.2.9.

Testing

Verified scan execution for:
- Single directory scans
- Full directory scans
- Large file handling (1GB+ test cases)
Confirmed:
- Correct parallel execution of hashing algorithms
- Proper resource cleanup after scan completion
- Stable fallback behavior under memory pressure

…h computation
* Upgraded the dependency algorithm-hash-extraction from version 1.2.8 to 1.2.9
* Added ParallelFileHashHandler for parallel hash computation of large files
* Updated FileHashComputer to use parallel processing when computing hashes
* Modified ScanServiceImpl to initialize and shutdown parallel hash processors
* Added debugging log level for lib.pwss.hash.file_hash_handler.parallel package
This change improves performance by utilizing parallel processing for file hash computations, especially for larger files.

* Updated project version from 1.8.5 to 1.9 in pom.xml
* Changed log level for lib.pwss.hash.file_hash_handler.parallel package from DEBUG to ERROR in logback.xml
This change prepares the project for a new release with improved logging configuration.

lilstiffy

I really like the addition to the README :D

pwgit-create and others added 7 commits May 23, 2026 12:53

Remove unnecessary ParallelFileHash import

40431d5

* Removed unused `import lib.pwss.hash.ParallelFileHash;` in FileHashComputer.java This cleanup removes an unnecessary import to keep the codebase tidy and improve maintainability.

Merge pull request #369 from pwssOrg/scan_faster

125f715

Scan speed improvement

docs: restructure backend README

a272a19

Improved README structure and clarity for the backend (FIM Engine). Focused on better architecture explanation, reduced redundancy, and clearer separation of system components. No functional changes.

Add explanation of cryptographic hashes to README

64b1f62

Added a section explaining cryptographic hashes and their importance in file integrity.

Merge pull request #370 from pwssOrg/docs/backend-readme-restructure

a0aade8

Docs/backend readme restructure

pwgit-create requested a review from lilstiffy May 23, 2026 22:08

pwgit-create added the enhancement, Spring, Java, and hash labels May 23, 2026

pwgit-create and others added 2 commits May 24, 2026 00:35

Fix security vulnerabilities reported by Snyk in Tomcat (11.0.21 → 11.0.22)

fded0c3

Merge pull request #373 from pwssOrg/security_fix_1

347a97b

Fix security vulnerabilities reported by Snyk in Tomcat (11.0.21 → 11…

lilstiffy approved these changes May 24, 2026

pwgit-create merged commit 8f5e382 into master May 24, 2026
3 checks passed

pwgit-create merged 9 commits into master from develop