
Stream log files line-by-line instead of readlines() #385

Merged
nitrobass24 merged 5 commits into develop from perf/logs-stream-reading on Apr 20, 2026

Conversation

@nitrobass24 (Owner) commented Apr 20, 2026

Closes #374

Summary

  • Rewrites LogsHandler._read_logs in src/python/web/handler/logs.py to iterate each log file with for line in f: instead of f.readlines().
  • Uses a deque(maxlen=limit) for completed entries since the response only returns the trailing limit entries. A single current_entry is held outside the deque so continuation lines (tracebacks, multi-line messages) still get appended correctly (a sketch of this pattern follows the list).
  • Semantics preserved: rotated-file ordering, search / min_level / before filtering, continuation-line concatenation, 1-based global line index for the before cursor.
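
As an illustration of the pattern above, here is a minimal, self-contained sketch; LOG_PATTERN and read_last_entries are hypothetical stand-ins, not the handler's actual names:

    from collections import deque
    import re

    # Hypothetical header pattern; the real handler's _LOG_PATTERN captures more fields.
    LOG_PATTERN = re.compile(r"^(\S+ \S+) \[(\w+)\] (.*)$")

    def read_last_entries(paths: list[str], limit: int = 500) -> list[dict[str, str]]:
        """Stream files (ordered oldest -> newest), keeping only the trailing `limit` entries."""
        matched: deque[dict[str, str]] = deque(maxlen=limit)  # oldest entries evicted automatically
        for path in paths:
            try:
                f = open(path, encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable rotated file: skip to the next one
            current: dict[str, str] | None = None  # in-progress entry, held outside the deque
            with f:
                for line in f:  # one line in memory at a time, never the whole file
                    m = LOG_PATTERN.match(line)
                    if m:
                        if current is not None:
                            matched.append(current)  # flush the completed entry
                        current = {"timestamp": m.group(1), "level": m.group(2), "message": m.group(3)}
                    elif current is not None:
                        # Continuation line (traceback etc.): extend the open entry.
                        current["message"] += "\n" + line.rstrip()
            if current is not None:
                matched.append(current)  # flush each file's final entry
        return list(matched)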

Impact

  • Peak memory for a /server/logs request with default limit=500 is now O(limit × avg entry size) instead of O(total log bytes).
  • Previously, the default settings (10 MB × 10 rotations) could pull ~110 MB into memory per request; memory is now bounded regardless of total log size.

Test plan

  • cd src/python && ruff check . → clean
  • cd src/python && pyright → 0 errors, 0 warnings
  • New file src/python/tests/integration/test_web/test_handler/test_logs.py adds 7 tests, all pass
  • Memory test: 20 MB synthetic log (255,751 entries, limit=500), measured via tracemalloc.get_traced_memory() (a measurement sketch follows this list):
    • New streaming impl: 0.31 MB peak
    • Old readlines() impl (verified by monkey-patching the old code back in): 185 MB peak — fails the test's < 10 MB assertion as intended
  • Existing tests/integration/test_web/ suite: 82 pass (1 pre-existing timing flake in test_stream_status.py, reproduces on clean develop — unrelated)
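
The memory figures above come from tracemalloc; a measurement along these lines reproduces the bounded-memory check (read_last_entries is the hypothetical sketch from the Summary, standing in for the real handler call):

    import tracemalloc

    tracemalloc.start()
    entries = read_last_entries(log_paths, limit=500)  # stand-in for the handler request
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak) in bytes
    tracemalloc.stop()
    assert peak < 10 * 1024 * 1024, f"peak memory {peak / 1e6:.2f} MB exceeds 10 MB"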

Summary by CodeRabbit

  • Tests

    • Added comprehensive integration tests for the logs endpoint covering ordering across rotated files, multiline/continuation entries, min-level and substring search filters, global "before" cursor semantics, and a bounded-memory validation using a large synthetic log.
  • Performance

    • Improved log reading to stream rotated logs and keep memory usage bounded while preserving correct ordering and filter behavior.

@coderabbitai (Bot) commented Apr 20, 2026

Warning

Rate limit exceeded

@nitrobass24 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 48 minutes and 55 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 48 minutes and 55 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 62b123f9-5449-4819-9368-e70c238bb41d

📥 Commits

Reviewing files that changed from the base of the PR and between e68a3b0 and 757da56.

📒 Files selected for processing (2)
  • src/python/tests/integration/test_web/test_handler/test_logs.py
  • src/python/web/handler/logs.py
📝 Walkthrough

Streams and parses rotated log files line-by-line in the logs handler using a bounded deque to retain only the last N completed entries; adds an integration test suite that validates ordering, multi-line continuations, filters, rotated-file semantics, and bounded-memory behavior with a large synthetic log.

Changes

  • Handler Streaming (src/python/web/handler/logs.py): Replaced full-file readlines() with line-by-line streaming. Builds entries incrementally, uses deque(maxlen=limit), adds a flush() helper, maintains a global completed-entry index for before semantics, preserves multi-line continuation parsing, and returns matched entries without accumulating entire files. Imports deque.
  • Logs Handler Tests (src/python/tests/integration/test_web/test_handler/test_logs.py): Added TestLogsHandler suite (7 tests) and helper _header. Tests redirect the handler logdir to a temp dir, generate rotated/active log files, and verify default limit/order, continuation-line attachment, rotated-file ordering, min_level and search filters, before semantics (global index), and bounded memory using tracemalloc with a ~20 MB synthetic log.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client (HTTP)
    participant Handler as LogsHandler
    participant FS as Filesystem
    participant Deque as BoundedDeque
    Client->>Handler: GET /server/logs?limit=...
    Handler->>FS: open rotated files (oldest→newest)
    loop per line
        FS-->>Handler: yield line
        Handler->>Handler: parse header vs continuation
        alt new header
            Handler->>Deque: flush completed entry (apply before/filter)
            Handler->>Deque: append if matched (deque truncates oldest)
        else continuation
            Handler->>Handler: append to current entry.message
        end
    end
    Handler->>Deque: flush final in-progress entry (apply before/filter)
    Handler->>Client: return JSON list(matched entries)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hop through files, one line at a time,
Stitching traces, keeping order fine.
Deque holds tight to the freshest pick,
Memory light, my paws stay quick.
A carrot for logs—clean, small, sublime.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Stream log files line-by-line instead of readlines()' directly describes the main implementation change in the pull request.
  • Linked Issues Check: ✅ Passed. All coding requirements from issue #374 are met: line-by-line streaming implemented, bounded deque used, semantics preserved, comprehensive tests added, linters clean.
  • Out of Scope Changes Check: ✅ Passed. All changes are directly related to the streaming objective: the handler implementation, test suite, and commit-message context-manager update are all in scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


CI's `ruff format --check` caught formatting drift in the new test_logs.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nitrobass24 (Owner, Author) commented:

@CodeRabbit review

@coderabbitai (Bot) commented Apr 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@nitrobass24 (Owner, Author) commented:

@CodeRabbit review

@coderabbitai (Bot) commented Apr 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai (Bot) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/python/web/handler/logs.py (1)

80-92: ⚠️ Potential issue | 🟠 Major

Process rotated logs from oldest to newest before applying deque(maxlen=limit).

Line 82 adds the active log first, then Line 84 walks .log.1, .log.2, etc. Since active .log is newest, this traversal is newest → oldest; with deque(maxlen=limit), older rotated entries can evict newer active-log entries when the combined logs exceed limit.

🐛 Proposed fix
-        # Gather log file paths: .log, .log.1, .log.2, ... up to backup count
+        # Gather log file paths oldest -> newest: .log.N, ..., .log.1, .log
         log_files: list[str] = []
-        if os.path.isfile(base_path):
-            log_files.append(base_path)
-        for i in range(1, Constants.LOG_BACKUP_COUNT + 1):
+        for i in range(Constants.LOG_BACKUP_COUNT, 0, -1):
             rotated = "{}.{}".format(base_path, i)
             if os.path.isfile(rotated):
                 log_files.append(rotated)
+        if os.path.isfile(base_path):
+            log_files.append(base_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/python/web/handler/logs.py` around lines 80 - 92, The code builds
log_files with the active base_path first then .log.1, .log.2, ... which results
in newest→oldest ordering and causes deque(maxlen=limit) to evict newer active
entries; change the construction so log_files is ordered oldest→newest (e.g.,
iterate rotated indices in reverse or append rotated files first and then
base_path last, or simply reverse log_files before streaming). Update the logic
around base_path, Constants.LOG_BACKUP_COUNT and log_files to ensure matched
(the deque) receives entries oldest to newest.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/python/tests/integration/test_web/test_handler/test_logs.py`:
- Around line 78-94: The test test_rotated_file_ordering_oldest_first has its
assertion reversed: change the expected ordering in the final assert to
["oldest", "middle", "newest"] so the test verifies oldest→newest across rotated
files (you may also update the docstring if desired); locate this in the test
method test_rotated_file_ordering_oldest_first and adjust the expected list used
in the assertion that compares [e["message"] for e in entries].

---

Outside diff comments:
In `@src/python/web/handler/logs.py`:
- Around line 80-92: The code builds log_files with the active base_path first
then .log.1, .log.2, ... which results in newest→oldest ordering and causes
deque(maxlen=limit) to evict newer active entries; change the construction so
log_files is ordered oldest→newest (e.g., iterate rotated indices in reverse or
append rotated files first and then base_path last, or simply reverse log_files
before streaming). Update the logic around base_path, Constants.LOG_BACKUP_COUNT
and log_files to ensure matched (the deque) receives entries oldest to newest.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: aeb3f085-215a-42ae-a1d5-00a81456c666

📥 Commits

Reviewing files that changed from the base of the PR and between 6ca434e and 1386d9a.

📒 Files selected for processing (2)
  • src/python/tests/integration/test_web/test_handler/test_logs.py
  • src/python/web/handler/logs.py

Comment thread: src/python/tests/integration/test_web/test_handler/test_logs.py
The previous implementation appended base_path (.log, newest) first and
then .log.1, .log.2, ..., .log.N (oldest). Under the deque(maxlen=limit),
this retained the OLDEST entries and evicted the newest — opposite of
what a log viewer should return.

RotatingFileHandler rotates by renaming .log -> .log.1 -> .log.2, so
.log.N is the oldest surviving backup and .log is the currently-active
file. Iterate in reverse (N..1) then .log last so the deque naturally
retains the *newest* `limit` entries in chronological oldest->newest
order.

Also fix the test assertion in test_rotated_file_ordering_oldest_first
which had been written to match the (incorrect) old behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
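
For reference, the rotation naming described in this commit message is standard logging.handlers behaviour and can be confirmed with a small demo (file names illustrative):

    import logging
    from logging.handlers import RotatingFileHandler

    logger = logging.getLogger("rotation-demo")
    logger.addHandler(RotatingFileHandler("app.log", maxBytes=200, backupCount=3))
    for i in range(50):
        logger.warning("message %d", i)
    # The directory now holds app.log (active, newest) plus app.log.1 .. app.log.3;
    # each rollover renames app.log -> app.log.1 and shifts existing backups up,
    # so app.log.3 is the oldest surviving file.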
@nitrobass24 (Owner, Author) commented:

@CodeRabbit review

@coderabbitai (Bot) commented Apr 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai (Bot) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/python/web/handler/logs.py`:
- Around line 111-136: Replace the manual open/try/finally close pattern with a
context manager: keep the OSError guard around the open call for log_file, but
after successfully opening use "with open(log_file, encoding='utf-8',
errors='replace') as f:" and move the "for line in f:" loop inside that
with-block, removing the outer try/finally and the explicit f.close(); preserve
all existing logic that uses _LOG_PATTERN, current_entry, and flush so behavior
remains identical.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a9a557f6-c981-496c-a795-684127fe8033

📥 Commits

Reviewing files that changed from the base of the PR and between 1386d9a and 54d11e8.

📒 Files selected for processing (2)
  • src/python/tests/integration/test_web/test_handler/test_logs.py
  • src/python/web/handler/logs.py

Comment thread: src/python/web/handler/logs.py (Outdated)
Replaces the try/finally f.close() pattern with `with f:`. OSError guard
around open() is preserved so an unreadable rotated file still just
skips to the next. Behavior identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nitrobass24 (Owner, Author) commented:

@CodeRabbit review

@coderabbitai (Bot) commented Apr 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@nitrobass24 (Owner, Author) commented:

@CodeRabbit review

@coderabbitai (Bot) commented Apr 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai (Bot) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/python/web/handler/logs.py`:
- Around line 103-139: The log scanning still processes entire files even after
the pagination cursor (`before`) is exhausted; update the loop that iterates
log_files to stop scanning further files once global_entry_idx reaches or
exceeds before (i.e., when before != 0 and global_entry_idx >= before) to avoid
wasted I/O; specifically, add a check using the existing symbols
global_entry_idx, before and the flush() behaviour before opening/processing
each log_file (or inside the per-file loop) and break out of the outer
file-iteration early when the condition is met so no further regex matching or
file reads occur.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 65b08518-d098-400e-b88b-04501b4fd147

📥 Commits

Reviewing files that changed from the base of the PR and between 54d11e8 and e68a3b0.

📒 Files selected for processing (1)
  • src/python/web/handler/logs.py

Comment on lines +103 to +139
-        global_line_idx = 0
+        def flush(entry: dict[str, str]) -> None:
+            nonlocal global_entry_idx
+            global_entry_idx += 1
+            if before != 0 and global_entry_idx > before:
+                return
+            if self._entry_matches(entry, search, min_level):
+                matched.append(entry)
+
         for log_file in log_files:
             try:
-                with open(log_file, encoding="utf-8", errors="replace") as f:
-                    lines = f.readlines()
+                f = open(log_file, encoding="utf-8", errors="replace")
             except OSError:
                 continue
 
-            current_entry = None
-            for line in lines:
-                match = _LOG_PATTERN.match(line)
-                if match:
-                    # Flush previous entry
-                    if current_entry is not None:
-                        global_line_idx += 1
-                        if before == 0 or global_line_idx <= before:
-                            if self._entry_matches(current_entry, search, min_level):
-                                entries.append(current_entry)
-                    current_entry = {
-                        "timestamp": match.group(1),
-                        "level": match.group(2),
-                        "logger": match.group(3),
-                        "process": match.group(4),
-                        "thread": match.group(5),
-                        "message": match.group(6),
-                    }
-                elif current_entry is not None:
-                    # Continuation line (traceback, etc.)
-                    current_entry["message"] += "\n" + line.rstrip()
-
-            # Flush last entry
-            if current_entry is not None:
-                global_line_idx += 1
-                if before == 0 or global_line_idx <= before:
-                    if self._entry_matches(current_entry, search, min_level):
-                        entries.append(current_entry)
-
-        # Return the most recent entries (last N)
-        if len(entries) > limit:
-            entries = entries[-limit:]
+            current_entry: dict[str, str] | None = None
+            with f:
+                for line in f:
+                    match = _LOG_PATTERN.match(line)
+                    if match:
+                        # Header line: flush any previous entry, then start a new one.
+                        if current_entry is not None:
+                            flush(current_entry)
+                        current_entry = {
+                            "timestamp": match.group(1),
+                            "level": match.group(2),
+                            "logger": match.group(3),
+                            "process": match.group(4),
+                            "thread": match.group(5),
+                            "message": match.group(6),
+                        }
+                    elif current_entry is not None:
+                        # Continuation line (traceback, etc.) — append to current entry.
+                        current_entry["message"] += "\n" + line.rstrip()
+
+            # End-of-file flush: preserves old behaviour of flushing each file's
+            # final entry before moving on to the next rotated file.
+            if current_entry is not None:
+                flush(current_entry)

🧹 Nitpick | 🔵 Trivial

Consider early-terminating once the before cursor is exhausted.

When a client paginates with a small before value, flush() becomes a no-op for every subsequent entry but the handler still fully streams and regex-matches the remainder of every rotated file. For large .log + many backups this is wasted I/O per paginated request. Since entries flow oldest→newest and global_entry_idx only grows, you can short-circuit once before != 0 and global_entry_idx >= before (the deque can no longer be affected).

♻️ Proposed refactor
         def flush(entry: dict[str, str]) -> None:
             nonlocal global_entry_idx
             global_entry_idx += 1
             if before != 0 and global_entry_idx > before:
                 return
             if self._entry_matches(entry, search, min_level):
                 matched.append(entry)

         for log_file in log_files:
             try:
                 f = open(log_file, encoding="utf-8", errors="replace")
             except OSError:
                 continue
             current_entry: dict[str, str] | None = None
             with f:
                 for line in f:
                     match = _LOG_PATTERN.match(line)
                     if match:
                         # Header line: flush any previous entry, then start a new one.
                         if current_entry is not None:
                             flush(current_entry)
+                            if before != 0 and global_entry_idx >= before:
+                                current_entry = None
+                                break
                         current_entry = {
                             "timestamp": match.group(1),
                             "level": match.group(2),
                             "logger": match.group(3),
                             "process": match.group(4),
                             "thread": match.group(5),
                             "message": match.group(6),
                         }
                     elif current_entry is not None:
                         # Continuation line (traceback, etc.) — append to current entry.
                         current_entry["message"] += "\n" + line.rstrip()

             # End-of-file flush: preserves old behaviour of flushing each file's
             # final entry before moving on to the next rotated file.
             if current_entry is not None:
                 flush(current_entry)
+            if before != 0 and global_entry_idx >= before:
+                break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/python/web/handler/logs.py` around lines 103 - 139, The log scanning
still processes entire files even after the pagination cursor (`before`) is
exhausted; update the loop that iterates log_files to stop scanning further
files once global_entry_idx reaches or exceeds before (i.e., when before != 0
and global_entry_idx >= before) to avoid wasted I/O; specifically, add a check
using the existing symbols global_entry_idx, before and the flush() behaviour
before opening/processing each log_file (or inside the per-file loop) and break
out of the outer file-iteration early when the condition is met so no further
regex matching or file reads occur.

Once global_entry_idx >= before, no subsequent entry can contribute to
the response (flush() already no-ops beyond that index). The outer
file-iteration now breaks out at that point, skipping open() and regex
scanning of any newer rotated files.

Adds test_files_past_before_cursor_are_not_opened which patches open()
inside the handler module and asserts that only the one file needed to
saturate `before` is read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
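
A test in that spirit might look like the following sketch; the module path, fixtures, and _read_logs signature are assumptions for illustration, not the PR's exact code:

    import builtins

    def test_files_past_before_cursor_are_not_opened(monkeypatch, handler, rotated_logs):
        opened: list[str] = []
        real_open = builtins.open

        def counting_open(path, *args, **kwargs):
            opened.append(str(path))
            return real_open(path, *args, **kwargs)

        # Patch `open` as resolved inside the (assumed) handler module so every read is recorded.
        monkeypatch.setattr("web.handler.logs.open", counting_open, raising=False)
        handler._read_logs(limit=500, before=1)  # cursor saturated by the first (oldest) file
        assert len(opened) == 1  # newer rotated files were never opened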
@nitrobass24 nitrobass24 merged commit fb02be9 into develop Apr 20, 2026
17 checks passed
@nitrobass24 nitrobass24 deleted the perf/logs-stream-reading branch April 20, 2026 20:29


Development

Successfully merging this pull request may close these issues.

Logs handler: stream log files instead of readlines() (up to ~110MB per request)
