
Stream log files line-by-line instead of readlines() #385

Merged
nitrobass24 merged 5 commits into develop from perf/logs-stream-reading on Apr 20, 2026

Conversation

@nitrobass24 (Owner) commented Apr 20, 2026

Closes #374

Summary

  • Rewrites LogsHandler._read_logs in src/python/web/handler/logs.py to iterate each log file with for line in f: instead of f.readlines().
  • Uses a deque(maxlen=limit) for completed entries since the response only returns the trailing limit entries. A single current_entry is held outside the deque so continuation lines (tracebacks, multi-line messages) still get appended correctly (a sketch of this pattern follows the list).
  • Semantics preserved: rotated-file ordering, search / min_level / before filtering, continuation-line concatenation, 1-based global line index for the before cursor.
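
As an illustration of the pattern above, here is a minimal, self-contained sketch; LOG_PATTERN and read_last_entries are hypothetical stand-ins, not the handler's actual names:

    from collections import deque
    import re

    # Hypothetical header pattern; the real handler's _LOG_PATTERN captures more fields.
    LOG_PATTERN = re.compile(r"^(\S+ \S+) \[(\w+)\] (.*)$")

    def read_last_entries(paths: list[str], limit: int = 500) -> list[dict[str, str]]:
        """Stream files (ordered oldest -> newest), keeping only the trailing `limit` entries."""
        matched: deque[dict[str, str]] = deque(maxlen=limit)  # oldest entries evicted automatically
        for path in paths:
            try:
                f = open(path, encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable rotated file: skip to the next one
            current: dict[str, str] | None = None  # in-progress entry, held outside the deque
            with f:
                for line in f:  # one line in memory at a time, never the whole file
                    m = LOG_PATTERN.match(line)
                    if m:
                        if current is not None:
                            matched.append(current)  # flush the completed entry
                        current = {"timestamp": m.group(1), "level": m.group(2), "message": m.group(3)}
                    elif current is not None:
                        # Continuation line (traceback etc.): extend the open entry.
                        current["message"] += "\n" + line.rstrip()
            if current is not None:
                matched.append(current)  # flush each file's final entry
        return list(matched)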

Impact

  • Peak memory for a /server/logs request with default limit=500 is now O(limit × avg entry size) instead of O(total log bytes).
  • Previously, the default settings (10 MB × 10 rotations) could pull ~110 MB into memory per request; memory is now bounded regardless of total log size.

Test plan

  • cd src/python && ruff check . → clean
  • cd src/python && pyright → 0 errors, 0 warnings
  • New file src/python/tests/integration/test_web/test_handler/test_logs.py adds 7 tests, all pass
  • Memory test: 20 MB synthetic log (255,751 entries, limit=500), measured via tracemalloc.get_traced_memory() (a measurement sketch follows this list):
    • New streaming impl: 0.31 MB peak
    • Old readlines() impl (verified by monkey-patching the old code back in): 185 MB peak — fails the test's < 10 MB assertion as intended
  • Existing tests/integration/test_web/ suite: 82 pass (1 pre-existing timing flake in test_stream_status.py, reproduces on clean develop — unrelated)
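
The memory figures above come from tracemalloc; a measurement along these lines reproduces the bounded-memory check (read_last_entries is the hypothetical sketch from the Summary, standing in for the real handler call):

    import tracemalloc

    tracemalloc.start()
    entries = read_last_entries(log_paths, limit=500)  # stand-in for the handler request
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak) in bytes
    tracemalloc.stop()
    assert peak < 10 * 1024 * 1024, f"peak memory {peak / 1e6:.2f} MB exceeds 10 MB"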

Summary by CodeRabbit

  • Tests

    • Added comprehensive integration tests for the logs endpoint covering ordering across rotated files, multiline/continuation entries, min-level and substring search filters, global "before" cursor semantics, and a bounded-memory validation using a large synthetic log.
  • Performance

    • Improved log reading to stream rotated logs and keep memory usage bounded while preserving correct ordering and filter behavior.

@coderabbitai (Bot) commented Apr 20, 2026

Warning

Rate limit exceeded

@nitrobass24 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 48 minutes and 55 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 48 minutes and 55 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 62b123f9-5449-4819-9368-e70c238bb41d

📥 Commits

Reviewing files that changed from the base of the PR and between e68a3b0 and 757da56.

📒 Files selected for processing (2)
  • src/python/tests/integration/test_web/test_handler/test_logs.py
  • src/python/web/handler/logs.py
📝 Walkthrough

Streams and parses rotated log files line-by-line in the logs handler using a bounded deque to retain only the last N completed entries; adds an integration test suite that validates ordering, multi-line continuations, filters, rotated-file semantics, and bounded-memory behavior with a large synthetic log.

Changes

  • Handler Streaming (src/python/web/handler/logs.py): Replaced full-file readlines() with line-by-line streaming. Builds entries incrementally, uses deque(maxlen=limit), adds a flush() helper, maintains a global completed-entry index for before semantics, preserves multi-line continuation parsing, and returns matched entries without accumulating entire files. Imports deque.
  • Logs Handler Tests (src/python/tests/integration/test_web/test_handler/test_logs.py): Added TestLogsHandler suite (7 tests) and helper _header. Tests redirect the handler logdir to a temp dir, generate rotated/active log files, and verify default limit/order, continuation-line attachment, rotated-file ordering, min_level and search filters, before semantics (global index), and bounded memory using tracemalloc with a ~20 MB synthetic log.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client (HTTP)
    participant Handler as LogsHandler
    participant FS as Filesystem
    participant Deque as BoundedDeque
    Client->>Handler: GET /server/logs?limit=...
    Handler->>FS: open rotated files (oldest→newest)
    loop per line
        FS-->>Handler: yield line
        Handler->>Handler: parse header vs continuation
        alt new header
            Handler->>Deque: flush completed entry (apply before/filter)
            Handler->>Deque: append if matched (deque truncates oldest)
        else continuation
            Handler->>Handler: append to current entry.message
        end
    end
    Handler->>Deque: flush final in-progress entry (apply before/filter)
    Handler->>Client: return JSON list(matched entries)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hop through files, one line at a time,
Stitching traces, keeping order fine.
Deque holds tight to the freshest pick,
Memory light, my paws stay quick.
A carrot for logs—clean, small, sublime.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Stream log files line-by-line instead of readlines()' directly describes the main implementation change in the pull request.
  • Linked Issues Check: ✅ Passed. All coding requirements from issue #374 are met: line-by-line streaming implemented, bounded deque used, semantics preserved, comprehensive tests added, linters clean.
  • Out of Scope Changes Check: ✅ Passed. All changes are directly related to the streaming objective: the handler implementation, test suite, and commit-message context-manager update are all in scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


CI's `ruff format --check` caught formatting drift in the new test_logs.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nitrobass24 (Owner, Author) commented:

@CodeRabbit review

@coderabbitai (Bot) commented Apr 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@nitrobass24 (Owner, Author) commented:

@CodeRabbit review

@coderabbitai (Bot) commented Apr 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai (Bot) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/python/web/handler/logs.py (1)

80-92: ⚠️ Potential issue | 🟠 Major

Process rotated logs from oldest to newest before applying deque(maxlen=limit).

Line 82 adds the active log first, then Line 84 walks .log.1, .log.2, etc. Since active .log is newest, this traversal is newest → oldest; with deque(maxlen=limit), older rotated entries can evict newer active-log entries when the combined logs exceed limit.

🐛 Proposed fix
-        # Gather log file paths: .log, .log.1, .log.2, ... up to backup count
+        # Gather log file paths oldest -> newest: .log.N, ..., .log.1, .log
         log_files: list[str] = []
-        if os.path.isfile(base_path):
-            log_files.append(base_path)
-        for i in range(1, Constants.LOG_BACKUP_COUNT + 1):
+        for i in range(Constants.LOG_BACKUP_COUNT, 0, -1):
             rotated = "{}.{}".format(base_path, i)
             if os.path.isfile(rotated):
                 log_files.append(rotated)
+        if os.path.isfile(base_path):
+            log_files.append(base_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/python/web/handler/logs.py` around lines 80 - 92, The code builds
log_files with the active base_path first then .log.1, .log.2, ... which results
in newest→oldest ordering and causes deque(maxlen=limit) to evict newer active
entries; change the construction so log_files is ordered oldest→newest (e.g.,
iterate rotated indices in reverse or append rotated files first and then
base_path last, or simply reverse log_files before streaming). Update the logic
around base_path, Constants.LOG_BACKUP_COUNT and log_files to ensure matched
(the deque) receives entries oldest to newest.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/python/tests/integration/test_web/test_handler/test_logs.py`:
- Around line 78-94: The test test_rotated_file_ordering_oldest_first has its
assertion reversed: change the expected ordering in the final assert to
["oldest", "middle", "newest"] so the test verifies oldest→newest across rotated
files (you may also update the docstring if desired); locate this in the test
method test_rotated_file_ordering_oldest_first and adjust the expected list used
in the assertion that compares [e["message"] for e in entries].

---

Outside diff comments:
In `@src/python/web/handler/logs.py`:
- Around line 80-92: The code builds log_files with the active base_path first
then .log.1, .log.2, ... which results in newest→oldest ordering and causes
deque(maxlen=limit) to evict newer active entries; change the construction so
log_files is ordered oldest→newest (e.g., iterate rotated indices in reverse or
append rotated files first and then base_path last, or simply reverse log_files
before streaming). Update the logic around base_path, Constants.LOG_BACKUP_COUNT
and log_files to ensure matched (the deque) receives entries oldest to newest.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: aeb3f085-215a-42ae-a1d5-00a81456c666

📥 Commits

Reviewing files that changed from the base of the PR and between 6ca434e and 1386d9a.

📒 Files selected for processing (2)
  • src/python/tests/integration/test_web/test_handler/test_logs.py
  • src/python/web/handler/logs.py

Comment thread: src/python/tests/integration/test_web/test_handler/test_logs.py
The previous implementation appended base_path (.log, newest) first and
then .log.1, .log.2, ..., .log.N (oldest). Under the deque(maxlen=limit),
this retained the OLDEST entries and evicted the newest — opposite of
what a log viewer should return.

RotatingFileHandler rotates by renaming .log -> .log.1 -> .log.2, so
.log.N is the oldest surviving backup and .log is the currently-active
file. Iterate in reverse (N..1) then .log last so the deque naturally
retains the *newest* `limit` entries in chronological oldest->newest
order.

Also fix the test assertion in test_rotated_file_ordering_oldest_first
which had been written to match the (incorrect) old behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
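
For reference, the rotation naming described in this commit message is standard logging.handlers behaviour and can be confirmed with a small demo (file names illustrative):

    import logging
    from logging.handlers import RotatingFileHandler

    logger = logging.getLogger("rotation-demo")
    logger.addHandler(RotatingFileHandler("app.log", maxBytes=200, backupCount=3))
    for i in range(50):
        logger.warning("message %d", i)
    # The directory now holds app.log (active, newest) plus app.log.1 .. app.log.3;
    # each rollover renames app.log -> app.log.1 and shifts existing backups up,
    # so app.log.3 is the oldest surviving file.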
@nitrobass24 (Owner, Author) commented:

@CodeRabbit review

@coderabbitai (Bot) commented Apr 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai (Bot) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/python/web/handler/logs.py`:
- Around line 111-136: Replace the manual open/try/finally close pattern with a
context manager: keep the OSError guard around the open call for log_file, but
after successfully opening use "with open(log_file, encoding='utf-8',
errors='replace') as f:" and move the "for line in f:" loop inside that
with-block, removing the outer try/finally and the explicit f.close(); preserve
all existing logic that uses _LOG_PATTERN, current_entry, and flush so behavior
remains identical.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a9a557f6-c981-496c-a795-684127fe8033

📥 Commits

Reviewing files that changed from the base of the PR and between 1386d9a and 54d11e8.

📒 Files selected for processing (2)
  • src/python/tests/integration/test_web/test_handler/test_logs.py
  • src/python/web/handler/logs.py

Comment thread: src/python/web/handler/logs.py (Outdated)
Replaces the try/finally f.close() pattern with `with f:`. OSError guard
around open() is preserved so an unreadable rotated file still just
skips to the next. Behavior identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nitrobass24 (Owner, Author) commented:

@CodeRabbit review

@coderabbitai (Bot) commented Apr 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@nitrobass24 (Owner, Author) commented:

@CodeRabbit review

@coderabbitai (Bot) commented Apr 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai (Bot) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/python/web/handler/logs.py`:
- Around line 103-139: The log scanning still processes entire files even after
the pagination cursor (`before`) is exhausted; update the loop that iterates
log_files to stop scanning further files once global_entry_idx reaches or
exceeds before (i.e., when before != 0 and global_entry_idx >= before) to avoid
wasted I/O; specifically, add a check using the existing symbols
global_entry_idx, before and the flush() behaviour before opening/processing
each log_file (or inside the per-file loop) and break out of the outer
file-iteration early when the condition is met so no further regex matching or
file reads occur.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 65b08518-d098-400e-b88b-04501b4fd147

📥 Commits

Reviewing files that changed from the base of the PR and between 54d11e8 and e68a3b0.

📒 Files selected for processing (1)
  • src/python/web/handler/logs.py

Comment on lines +103 to +139
-        global_line_idx = 0
+        def flush(entry: dict[str, str]) -> None:
+            nonlocal global_entry_idx
+            global_entry_idx += 1
+            if before != 0 and global_entry_idx > before:
+                return
+            if self._entry_matches(entry, search, min_level):
+                matched.append(entry)
+
         for log_file in log_files:
             try:
-                with open(log_file, encoding="utf-8", errors="replace") as f:
-                    lines = f.readlines()
+                f = open(log_file, encoding="utf-8", errors="replace")
             except OSError:
                 continue
 
-            current_entry = None
-            for line in lines:
-                match = _LOG_PATTERN.match(line)
-                if match:
-                    # Flush previous entry
-                    if current_entry is not None:
-                        global_line_idx += 1
-                        if before == 0 or global_line_idx <= before:
-                            if self._entry_matches(current_entry, search, min_level):
-                                entries.append(current_entry)
-                    current_entry = {
-                        "timestamp": match.group(1),
-                        "level": match.group(2),
-                        "logger": match.group(3),
-                        "process": match.group(4),
-                        "thread": match.group(5),
-                        "message": match.group(6),
-                    }
-                elif current_entry is not None:
-                    # Continuation line (traceback, etc.)
-                    current_entry["message"] += "\n" + line.rstrip()
-
-            # Flush last entry
-            if current_entry is not None:
-                global_line_idx += 1
-                if before == 0 or global_line_idx <= before:
-                    if self._entry_matches(current_entry, search, min_level):
-                        entries.append(current_entry)
-
-        # Return the most recent entries (last N)
-        if len(entries) > limit:
-            entries = entries[-limit:]
+            current_entry: dict[str, str] | None = None
+            with f:
+                for line in f:
+                    match = _LOG_PATTERN.match(line)
+                    if match:
+                        # Header line: flush any previous entry, then start a new one.
+                        if current_entry is not None:
+                            flush(current_entry)
+                        current_entry = {
+                            "timestamp": match.group(1),
+                            "level": match.group(2),
+                            "logger": match.group(3),
+                            "process": match.group(4),
+                            "thread": match.group(5),
+                            "message": match.group(6),
+                        }
+                    elif current_entry is not None:
+                        # Continuation line (traceback, etc.) — append to current entry.
+                        current_entry["message"] += "\n" + line.rstrip()
+
+            # End-of-file flush: preserves old behaviour of flushing each file's
+            # final entry before moving on to the next rotated file.
+            if current_entry is not None:
+                flush(current_entry)

🧹 Nitpick | 🔵 Trivial

Consider early-terminating once the before cursor is exhausted.

When a client paginates with a small before value, flush() becomes a no-op for every subsequent entry but the handler still fully streams and regex-matches the remainder of every rotated file. For large .log + many backups this is wasted I/O per paginated request. Since entries flow oldest→newest and global_entry_idx only grows, you can short-circuit once before != 0 and global_entry_idx >= before (the deque can no longer be affected).

♻️ Proposed refactor
         def flush(entry: dict[str, str]) -> None:
             nonlocal global_entry_idx
             global_entry_idx += 1
             if before != 0 and global_entry_idx > before:
                 return
             if self._entry_matches(entry, search, min_level):
                 matched.append(entry)

         for log_file in log_files:
             try:
                 f = open(log_file, encoding="utf-8", errors="replace")
             except OSError:
                 continue
             current_entry: dict[str, str] | None = None
             with f:
                 for line in f:
                     match = _LOG_PATTERN.match(line)
                     if match:
                         # Header line: flush any previous entry, then start a new one.
                         if current_entry is not None:
                             flush(current_entry)
+                            if before != 0 and global_entry_idx >= before:
+                                current_entry = None
+                                break
                         current_entry = {
                             "timestamp": match.group(1),
                             "level": match.group(2),
                             "logger": match.group(3),
                             "process": match.group(4),
                             "thread": match.group(5),
                             "message": match.group(6),
                         }
                     elif current_entry is not None:
                         # Continuation line (traceback, etc.) — append to current entry.
                         current_entry["message"] += "\n" + line.rstrip()

             # End-of-file flush: preserves old behaviour of flushing each file's
             # final entry before moving on to the next rotated file.
             if current_entry is not None:
                 flush(current_entry)
+            if before != 0 and global_entry_idx >= before:
+                break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/python/web/handler/logs.py` around lines 103 - 139, The log scanning
still processes entire files even after the pagination cursor (`before`) is
exhausted; update the loop that iterates log_files to stop scanning further
files once global_entry_idx reaches or exceeds before (i.e., when before != 0
and global_entry_idx >= before) to avoid wasted I/O; specifically, add a check
using the existing symbols global_entry_idx, before and the flush() behaviour
before opening/processing each log_file (or inside the per-file loop) and break
out of the outer file-iteration early when the condition is met so no further
regex matching or file reads occur.

Once global_entry_idx >= before, no subsequent entry can contribute to
the response (flush() already no-ops beyond that index). The outer
file-iteration now breaks out at that point, skipping open() and regex
scanning of any newer rotated files.

Adds test_files_past_before_cursor_are_not_opened which patches open()
inside the handler module and asserts that only the one file needed to
saturate `before` is read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
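
A test in that spirit might look like the following sketch; the module path, fixtures, and _read_logs signature are assumptions for illustration, not the PR's exact code:

    import builtins

    def test_files_past_before_cursor_are_not_opened(monkeypatch, handler, rotated_logs):
        opened: list[str] = []
        real_open = builtins.open

        def counting_open(path, *args, **kwargs):
            opened.append(str(path))
            return real_open(path, *args, **kwargs)

        # Patch `open` as resolved inside the (assumed) handler module so every read is recorded.
        monkeypatch.setattr("web.handler.logs.open", counting_open, raising=False)
        handler._read_logs(limit=500, before=1)  # cursor saturated by the first (oldest) file
        assert len(opened) == 1  # newer rotated files were never opened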
@nitrobass24 nitrobass24 merged commit fb02be9 into develop Apr 20, 2026
17 checks passed
@nitrobass24 nitrobass24 deleted the perf/logs-stream-reading branch April 20, 2026 20:29


Development

Successfully merging this pull request may close these issues.

Logs handler: stream log files instead of readlines() (up to ~110MB per request)
