Inspired by the SIGMOD 2013 Programming Contest, this system is a high-performance search engine optimized for massive document streams. It manages thousands of concurrent queries (Exact, Edit Distance, and Hamming Distance) using a custom-built parallel execution framework.
The backbone of the system is a custom Job Scheduler that manages a thread-safe task queue with
To prevent data race conditions and maintain cache locality, the scheduler implements Type-Based Batching:
- Group Execution: Jobs are grouped by type (
StartQuery,MatchDocument,EndQuery). - Synchronization Barriers: A barrier-based synchronization mechanism ensures that all query updates (
StartQuery) are fully committed before matching begins, preventing "dirty reads" without the overhead of heavy locking. - Main Thread Sync: Uses condition variables to allow the main thread to wait for batch completion, ensuring results are only gathered once parallel processing is finalized.
- Optimization: Replaced standard SHA1 with a lightweight, thread-optimized hash function to minimize CPU cycles.
- Lock-Free Read Path: By isolating
MatchArraystorage to thread-local heap space, lookup operations are entirely lock-free, significantly reducing contention during high-load document matching.
The system supports Edit and Hamming distance thresholds (1-3) via BK-Trees.
- Pruning Logic: Instead of multiple tree traversals for different thresholds, the lookup function was optimized to recursively prune subtrees using the current threshold as a dynamic bound.
-
Complexity: Search time is reduced to
$O(log N)$ , making fuzzy matching viable for real-time streams.
The system achieves near-linear scaling by minimizing the critical section size. Mutexes are placed only where strictly necessary (e.g., final result aggregation).
| Configuration | Execution Time (ms) | Speedup |
|---|---|---|
| Serial (1 Thread) | 1395ms | 1.0x |
| Parallel (2 Threads) | 771ms | 1.8x |
| Parallel (4 Threads) | 522ms | 2.6x |
Tested on 4-core Linux environment. Performance peaks at 4 threads, matching physical core count, proving minimal synchronization overhead.
- Memory Management: Utilized a centralized Query Object registry with reference-passing to reduce the memory footprint to just 4MB for millions of operations.
- Thread Safety: Replaced
strtok()withstrtok_r()and implementedpthread_barrierfor rigid task-group boundaries. - Data Locality: High-performance
MatchArraydesign ensures each thread writes to localized memory, avoiding "False Sharing" and cache misses.
Developed as part of the National and Kapodistrian University of Athens (UoA).
make # Compile project
make test # Compile unit tests