Performance optimizations: O(n*m) to O(n) lookups, lazy OUI loading, cached FD checks #457
- airodump.py: Replace O(n*m) nested loops with O(1) dict lookups for both
client-to-target mapping and old-target matching (250K+ comparisons → ~1K)
- airodump.py: Sample only first 4KB for chardet encoding detection instead
of reading entire CSV file, reducing I/O per scan cycle
- airodump.py: Suppress repeated null-byte warnings (log once per parse)
- config.py: Lazy-load OUI manufacturer database (1.7MB file) on first access
instead of parsing at startup, improving startup time
- process.py: Cache file descriptor count with 2-second TTL to avoid
os.listdir('/proc/pid/fd/') on every process creation
- process.py: Reduce FD limit sleep from 500ms to 100ms since cleanup is sync
- scanner.py: Use heapq.nlargest() for O(n log k) target capping instead of
O(n log n) full sort
- scanner.py: Remove duplicate pid.poll() system call in main scan loop
https://claude.ai/code/session_012qvM97c3D9M98CrMjmzuQd
Pull request overview
This PR focuses on reducing scan-loop overhead and startup latency by replacing repeated linear scans and eager file parsing with cached/dict-based lookups and lazy loading in the WiFi scanning pipeline.
Changes:
- Optimizes scan-time data processing (dict lookups, heapq.nlargest, fewer redundant system calls).
- Introduces lazy loading for the OUI manufacturer database instead of parsing at startup.
- Adds a TTL cache for /proc/&lt;pid&gt;/fd counting to reduce filesystem scans during frequent process creation.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| wifite/util/scanner.py | Removes redundant pid.poll() check and caps target lists using heapq.nlargest for better asymptotic performance. |
| wifite/util/process.py | Adds FD-count caching with TTL and shortens FD-limit sleep delay. |
| wifite/ui/selector_view.py | Ensures manufacturer DB is loaded before OUI lookup in the selector UI. |
| wifite/ui/scanner_view.py | Ensures manufacturer DB is loaded before OUI lookup in the scanner TUI. |
| wifite/tools/airodump.py | Replaces nested loops with dict lookups and reduces CSV encoding-detection I/O; suppresses repeated null-byte warnings. |
| wifite/model/target.py | Loads manufacturer DB on-demand when rendering manufacturer column in Target.to_str(). |
| wifite/config.py | Moves OUI parsing to a lazy-load method and defers manufacturer DB initialization. |
```python
def load_manufacturers(cls):
    """Lazy-load OUI manufacturer database on first access."""
    if cls._manufacturers_loaded:
        return
    cls._manufacturers_loaded = True
```
Configuration.load_manufacturers() reads cls._manufacturers_loaded, but _manufacturers_loaded is not defined as a class attribute (it’s only set inside initialize). If load_manufacturers() is called before initialize(), this will raise AttributeError. Define _manufacturers_loaded = False at class scope (and consider using getattr(cls, '_manufacturers_loaded', False) for extra safety).
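A minimal sketch of the fix this comment suggests; the class name mirrors the PR, but the loader body is a placeholder, not wifite's actual OUI parser:

```python
# Sketch of the suggested fix: the flag is defined at class scope, so
# load_manufacturers() is safe even if initialize() never ran.
class Configuration:
    _manufacturers_loaded = False  # class-level default, per the review
    manufacturers = None

    @classmethod
    def load_manufacturers(cls):
        """Lazy-load OUI manufacturer database on first access."""
        if getattr(cls, '_manufacturers_loaded', False):  # extra safety
            return
        cls._manufacturers_loaded = True
        cls.manufacturers = {}  # placeholder for parsing the 1.7MB OUI file
```

With the flag at class scope, every call site (`selector_view.py`, `scanner_view.py`, `Target.to_str()`) can call `load_manufacturers()` unconditionally without caring about initialization order.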
```diff
 def get_open_fd_count():
-    """Get current open file descriptor count from /proc/{pid}/fd"""
+    """Get current open file descriptor count from /proc/{pid}/fd (cached with TTL)"""
+    now = time.time()
+    if now - Process._fd_cache_time < Process._FD_CACHE_TTL:
+        return Process._fd_cache_value
     try:
         proc_fd_dir = f'/proc/{os.getpid()}/fd'
         if os.path.exists(proc_fd_dir):
-            return len(os.listdir(proc_fd_dir))
+            Process._fd_cache_value = len(os.listdir(proc_fd_dir))
+            Process._fd_cache_time = now
+            return Process._fd_cache_value
```
get_open_fd_count() caches results for 2s, but check_fd_limit() calls it twice back-to-back (before and immediately after cleanup). With the TTL, the “after cleanup” call will return the cached pre-cleanup value, making freed/new_count wrong and potentially keeping the system in a “high FD usage” state for up to the TTL. Consider adding a force_refresh/use_cache parameter or explicitly invalidating the cache before the post-cleanup re-check.
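One way to implement the suggested `force_refresh` parameter; attribute names mirror the diff above, but the class itself is a stand-in, not wifite's `Process`:

```python
import os
import time

# Sketch of the reviewer's suggestion: a force_refresh flag that bypasses
# the TTL cache so the post-cleanup re-check sees fresh numbers.
class Process:
    _fd_cache_value = 0
    _fd_cache_time = 0.0
    _FD_CACHE_TTL = 2.0  # seconds

    @staticmethod
    def get_open_fd_count(force_refresh=False):
        now = time.time()
        if not force_refresh and now - Process._fd_cache_time < Process._FD_CACHE_TTL:
            return Process._fd_cache_value  # cached value, still fresh
        try:
            proc_fd_dir = f'/proc/{os.getpid()}/fd'
            Process._fd_cache_value = len(os.listdir(proc_fd_dir))
            Process._fd_cache_time = now
        except OSError:
            pass  # /proc unavailable (non-Linux); keep last cached value
        return Process._fd_cache_value
```

`check_fd_limit()` would then pass `force_refresh=True` on the post-cleanup call, so `freed`/`new_count` reflect reality instead of the stale pre-cleanup cache.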
```diff
+# Use dict lookup (O(1)) instead of nested loop (O(n*m))
+old_by_bssid = {t.bssid: t for t in old_targets}
 for new_target in new_targets:
-    just_found = True
-    for old_target in old_targets:
-        # If the new_target is found in old_target copy attributes from old target
-        if old_target == new_target:
-            # Identify decloaked targets
-            if new_target.essid_known and not old_target.essid_known:
-                # We decloaked a target!
-                new_target.decloaked = True
-            old_target.transfer_info(new_target)
-            just_found = False
-            break
-
-    # If the new_target is not in old_targets, check target_archives
-    # and copy attributes from there
-    if just_found and new_target.bssid in target_archives:
+    old_target = old_by_bssid.get(new_target.bssid)
+    if old_target is not None:
+        # Identify decloaked targets
+        if new_target.essid_known and not old_target.essid_known:
+            new_target.decloaked = True
+        old_target.transfer_info(new_target)
+    elif new_target.bssid in target_archives:
```
old_by_bssid = {t.bssid: t for t in old_targets} changes behavior if old_targets contains duplicate BSSIDs (possible because CSV parsing doesn’t dedupe). The previous nested loop copied from the first matching old_target; this dict keeps the last one. If you want to preserve previous semantics, build the dict with “first wins” (e.g., only set the key if it’s not already present).
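The "first wins" construction this comment recommends can be sketched with `dict.setdefault`; `Target` here is a toy stand-in for wifite's model class:

```python
# "First wins" dict build via setdefault: preserves the old nested-loop
# semantics when old_targets holds duplicate BSSIDs.
class Target:
    def __init__(self, bssid, power):
        self.bssid = bssid
        self.power = power

old_targets = [Target('AA:BB:CC:DD:EE:FF', -40),
               Target('AA:BB:CC:DD:EE:FF', -70)]  # duplicate BSSID

old_by_bssid = {}
for t in old_targets:
    old_by_bssid.setdefault(t.bssid, t)  # only sets the key if absent

# The first duplicate wins, matching the previous loop's behavior:
assert old_by_bssid['AA:BB:CC:DD:EE:FF'].power == -40
```

A dict comprehension keeps the *last* duplicate because later assignments overwrite earlier ones; `setdefault` inverts that without giving up the O(1) lookups.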
```diff
-self.targets = self.targets[:self._max_targets]
+# Use heapq.nlargest: O(n log k) vs O(n log n) for full sort
+self.targets = heapq.nlargest(self._max_targets, self.targets, key=lambda x: x.power)
```
heapq.nlargest() is not stable for ties, so when multiple targets share the same power, their relative order may become non-deterministic compared to the previous stable list.sort(). If target ordering is user-visible (and affects selection indices), consider adding a deterministic tie-breaker (e.g., (power, bssid)), or keep the stable sort when order matters.
Suggested change:
```diff
-self.targets = heapq.nlargest(self._max_targets, self.targets, key=lambda x: x.power)
+# Add deterministic tie-breaker (BSSID) to avoid non-deterministic ordering for equal power
+self.targets = heapq.nlargest(
+    self._max_targets,
+    self.targets,
+    key=lambda x: (x.power, getattr(x, 'bssid', ''))
+)
```
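The suggested tuple key can be checked with a toy example; `Target` is a stand-in for wifite's class:

```python
import heapq

# Demonstrates the suggested (power, bssid) key: ties on power are broken
# deterministically by BSSID, regardless of input order.
class Target:
    def __init__(self, bssid, power):
        self.bssid = bssid
        self.power = power

targets = [Target('CC:DD:EE:FF:00:11', -50),
           Target('AA:BB:CC:DD:EE:FF', -50),  # same power as above
           Target('EE:FF:00:11:22:33', -30)]

top = heapq.nlargest(2, targets,
                     key=lambda x: (x.power, getattr(x, 'bssid', '')))

# -30 ranks first; between the two -50 targets, the lexically larger
# BSSID ('CC:DD...') wins every time.
assert [t.bssid for t in top] == ['EE:FF:00:11:22:33', 'CC:DD:EE:FF:00:11']
```

Note the tuple compares `power` first, so the tie-breaker never changes which power levels rank highest, only the ordering within a tie.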