Parallel Hashing Performance Upgrade #372
Merged
…h computation
* Upgraded the dependency algorithm-hash-extraction from version 1.2.8 to 1.2.9
* Added ParallelFileHashHandler for parallel hash computation of large files
* Updated FileHashComputer to use parallel processing when computing hashes
* Modified ScanServiceImpl to initialize and shut down parallel hash processors
* Added debugging log level for lib.pwss.hash.file_hash_handler.parallel package

This change improves performance by utilizing parallel processing for file hash computations, especially for larger files.

* Updated project version from 1.8.5 to 1.9 in pom.xml
* Changed log level for lib.pwss.hash.file_hash_handler.parallel package from DEBUG to ERROR in logback.xml

This change prepares the project for a new release with improved logging configuration.

* Removed unused `import lib.pwss.hash.ParallelFileHash;` in FileHashComputer.java

This cleanup removes an unnecessary import to keep the codebase tidy and improve maintainability.
Scan speed improvement
Improved README structure and clarity for the backend (FIM Engine). Focused on better architecture explanation, reduced redundancy, and clearer separation of system components. No functional changes.
Added a section explaining cryptographic hashes and their importance in file integrity.
Docs/backend readme restructure
Fix security vulnerabilities reported by Snyk in Tomcat (11.0.21 → 11…
lilstiffy (Collaborator) approved these changes on May 24, 2026 and left a comment:
I really like the addition to the README :D
Parallel Hashing Performance Upgrade, Dependency Update, and Documentation Improvements
Overview
This release introduces a significant performance enhancement to the file integrity scanning engine through the addition of parallel hashing for large files. It also includes dependency updates, scan lifecycle improvements, logging configuration adjustments, and README restructuring.
The primary focus of this change is improving large-file scan performance and reducing redundant I/O operations during multi-algorithm hashing.
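The I/O saving described here can be illustrated with a small sketch. It is not the project's code: `SinglePassHasher` and `hashAll` are hypothetical names, and the only library used is the JDK's `java.security.MessageDigest`. The idea shown is that every chunk read from disk is fed to each requested algorithm, so a file hashed with several algorithms is still read once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not from the PR): hashes one stream with several
// algorithms while reading the underlying data only once.
final class SinglePassHasher {

    // Convenience overload for files; the file is opened a single time.
    static Map<String, String> hashAll(Path file, List<String> algorithms) {
        try (InputStream in = Files.newInputStream(file)) {
            return hashAll(in, algorithms);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns algorithm name -> lowercase hex digest, in the order requested.
    static Map<String, String> hashAll(InputStream in, List<String> algorithms) {
        try {
            Map<String, MessageDigest> digests = new LinkedHashMap<>();
            for (String algorithm : algorithms) {
                digests.put(algorithm, MessageDigest.getInstance(algorithm));
            }
            byte[] buffer = new byte[64 * 1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                // One read from the stream, one update per algorithm.
                for (MessageDigest digest : digests.values()) {
                    digest.update(buffer, 0, read);
                }
            }
            Map<String, String> hex = new LinkedHashMap<>();
            for (Map.Entry<String, MessageDigest> entry : digests.entrySet()) {
                hex.put(entry.getKey(), toHex(entry.getValue().digest()));
            }
            return hex;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown hash algorithm", e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

A call such as `SinglePassHasher.hashAll(path, List.of("SHA-256", "MD5"))` would return both digests after a single pass over the file.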
Key Changes
Hashing Engine Performance Improvement
* Introduced `ParallelFileHashHandler` to handle large file hashing using parallel execution.
* Refactored `FileHashComputer` to route large file processing through the parallel hashing implementation.
* Added explicit lifecycle control:
  * `initializeParallelHashing()` to set up thread pool and handler before scan start
  * `shutdownParallelHashProcessor()` to properly release resources after scan completion
* Improved fallback handling:
Performance Impact
Parallel hashing provides a major performance improvement in local testing, with 2x+ speedup observed for large file scans.
For files 1GB and larger, the improvement is especially significant:
Performance gains scale with file size and are increasingly beneficial in environments with many large files, where I/O reduction becomes the dominant factor.
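The PR text does not say how `ParallelFileHashHandler` divides the work, so the following is only one plausible arrangement, written as an assumption-laden sketch. A single `MessageDigest` has to consume its input in order, so this version parallelizes across algorithms instead: each chunk is read once and a thread pool updates one digest per algorithm concurrently. `ParallelDigestSketch`, its method names, and the chunk size are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch, not the project's ParallelFileHashHandler.
final class ParallelDigestSketch {

    // Returns algorithm name -> lowercase hex digest.
    static Map<String, String> hashAll(InputStream in, List<String> algorithms,
                                       ExecutorService pool) {
        try {
            Map<String, MessageDigest> digests = new LinkedHashMap<>();
            for (String algorithm : algorithms) {
                digests.put(algorithm, MessageDigest.getInstance(algorithm));
            }
            byte[] buffer = new byte[1024 * 1024]; // chunk size is an arbitrary choice
            int read;
            while ((read = in.read(buffer)) != -1) {
                final int length = read;
                List<Callable<Void>> updates = new ArrayList<>();
                for (MessageDigest digest : digests.values()) {
                    // Each task touches only its own digest; the buffer is read-only here.
                    updates.add(() -> {
                        digest.update(buffer, 0, length);
                        return null;
                    });
                }
                // invokeAll blocks until every digest has consumed this chunk,
                // so the buffer is safe to overwrite on the next read.
                for (Future<Void> done : pool.invokeAll(updates)) {
                    done.get();
                }
            }
            Map<String, String> hex = new LinkedHashMap<>();
            for (Map.Entry<String, MessageDigest> entry : digests.entrySet()) {
                StringBuilder sb = new StringBuilder();
                for (byte b : entry.getValue().digest()) {
                    sb.append(String.format("%02x", b));
                }
                hex.put(entry.getKey(), sb.toString());
            }
            return hex;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown hash algorithm", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while hashing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Digest update failed", e.getCause());
        }
    }
}
```

With this layout the speedup is bounded by the number of algorithms in use and by disk throughput. Whether that matches the gains reported above depends on how the real handler is built.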
Scan Service Integration
Parallel hashing initialization added at scan start for both:
* `scanAllDirectories`
* `scanSingleDirectory`

Thread pool shutdown added during scan finalization to prevent resource leaks.
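The init-at-start, shutdown-at-finalization pattern can be sketched as follows. Only the method names `initializeParallelHashing`, `shutdownParallelHashProcessor`, and `scanAllDirectories` come from this PR. The class, the pool sizing, the timeout, and the `Runnable` scan body are assumptions made for illustration, and the real `ScanServiceImpl` may be structured quite differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the scan service lifecycle.
final class ScanLifecycleSketch {
    private ExecutorService hashPool;

    void initializeParallelHashing() {
        // Pool size is an assumption; the PR does not state how it is chosen.
        int threads = Math.max(2, Runtime.getRuntime().availableProcessors());
        hashPool = Executors.newFixedThreadPool(threads);
    }

    void shutdownParallelHashProcessor() {
        if (hashPool == null) {
            return;
        }
        hashPool.shutdown(); // stop accepting work, let queued tasks finish
        try {
            if (!hashPool.awaitTermination(30, TimeUnit.SECONDS)) {
                hashPool.shutdownNow(); // give up waiting and cancel
            }
        } catch (InterruptedException e) {
            hashPool.shutdownNow();
            Thread.currentThread().interrupt();
        } finally {
            hashPool = null;
        }
    }

    boolean isParallelHashingActive() {
        return hashPool != null && !hashPool.isShutdown();
    }

    // The scan body is passed in so the sketch stays self-contained.
    void scanAllDirectories(Runnable scanBody) {
        initializeParallelHashing();
        try {
            scanBody.run();
        } finally {
            // Runs even if the scan throws, so no threads are leaked.
            shutdownParallelHashProcessor();
        }
    }
}
```

Putting the shutdown in a `finally` block is the design choice that matters here: a scan that fails midway would otherwise leave non-daemon worker threads alive.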
Dependency Update
* Upgraded `algorithm-hash-extraction` from `1.2.8` to `1.2.9`.
Logging Updates
Added dedicated logging configuration for:
* `lib.pwss.hash.file_hash_handler.parallel`
Documentation Updates (README)
Architecture / Behavioral Impact
Large file processing is now optimized for parallel execution.
File I/O is reduced significantly when multiple hashing algorithms are used.
Clear separation of concerns between:
* `FileHashHandler`
* `ParallelFileHashHandler`

Better scalability for environments with high-volume or large-size file systems.
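The separation between the two handlers can be pictured as a small router. This is a sketch under stated assumptions, not the actual `FileHashComputer`: the size threshold, the `Function`-based handler type, and the fall-back-to-standard behavior on failure are all hypothetical, since the PR does not give those details.

```java
import java.util.function.Function;

// Hypothetical router: small files go to the standard handler, large files
// to the parallel one, with the standard handler as a fallback.
final class HashRouterSketch {
    // Assumed cutoff; the PR does not state what counts as "large".
    static final long LARGE_FILE_THRESHOLD_BYTES = 64L * 1024 * 1024;

    private final Function<String, String> standardHandler;
    private final Function<String, String> parallelHandler;

    HashRouterSketch(Function<String, String> standardHandler,
                     Function<String, String> parallelHandler) {
        this.standardHandler = standardHandler;
        this.parallelHandler = parallelHandler;
    }

    // Takes a file path and its size, returns whatever the chosen handler produces.
    String hash(String path, long sizeInBytes) {
        if (sizeInBytes < LARGE_FILE_THRESHOLD_BYTES) {
            return standardHandler.apply(path);
        }
        try {
            return parallelHandler.apply(path);
        } catch (RuntimeException e) {
            // Assumed fallback: retry on the standard path rather than fail the scan.
            return standardHandler.apply(path);
        }
    }
}
```

Keeping the routing decision in one place means neither handler needs to know the other exists.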
Notes
algorithm-hash-extraction:1.2.9.
Testing
Verified scan execution for:
Confirmed: