Stream log files line-by-line instead of readlines() #385
nitrobass24 merged 5 commits into develop
Conversation
📝 Walkthrough

Streams and parses rotated log files line-by-line in the logs handler, using a bounded deque to retain only the last N completed entries; adds an integration test suite that validates ordering, multi-line continuations, filters, rotated-file semantics, and bounded-memory behavior with a large synthetic log.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client (HTTP)
    participant Handler as LogsHandler
    participant FS as Filesystem
    participant Deque as BoundedDeque
    Client->>Handler: GET /server/logs?limit=...
    Handler->>FS: open rotated files (oldest→newest)
    loop per line
        FS-->>Handler: yield line
        Handler->>Handler: parse header vs continuation
        alt new header
            Handler->>Deque: flush completed entry (apply before/filter)
            Handler->>Deque: append if matched (deque truncates oldest)
        else continuation
            Handler->>Handler: append to current entry.message
        end
    end
    Handler->>Deque: flush final in-progress entry (apply before/filter)
    Handler->>Client: return JSON list(matched entries)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
CI's `ruff format --check` caught formatting drift in the new test_logs.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CodeRabbit review
✅ Actions performed: Review triggered.
@CodeRabbit review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/python/web/handler/logs.py (1)
80-92: ⚠️ Potential issue | 🟠 Major

Process rotated logs from oldest to newest before applying `deque(maxlen=limit)`.

Line 82 adds the active log first, then Line 84 walks `.log.1`, `.log.2`, etc. Since the active `.log` is newest, this traversal is newest → oldest; with `deque(maxlen=limit)`, older rotated entries can evict newer active-log entries when the combined logs exceed `limit`.

🐛 Proposed fix

```diff
-        # Gather log file paths: .log, .log.1, .log.2, ... up to backup count
+        # Gather log file paths oldest -> newest: .log.N, ..., .log.1, .log
         log_files: list[str] = []
-        if os.path.isfile(base_path):
-            log_files.append(base_path)
-        for i in range(1, Constants.LOG_BACKUP_COUNT + 1):
+        for i in range(Constants.LOG_BACKUP_COUNT, 0, -1):
             rotated = "{}.{}".format(base_path, i)
             if os.path.isfile(rotated):
                 log_files.append(rotated)
+        if os.path.isfile(base_path):
+            log_files.append(base_path)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/python/web/handler/logs.py` around lines 80 - 92, The code builds log_files with the active base_path first then .log.1, .log.2, ... which results in newest→oldest ordering and causes deque(maxlen=limit) to evict newer active entries; change the construction so log_files is ordered oldest→newest (e.g., iterate rotated indices in reverse or append rotated files first and then base_path last, or simply reverse log_files before streaming). Update the logic around base_path, Constants.LOG_BACKUP_COUNT and log_files to ensure matched (the deque) receives entries oldest to newest.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/python/tests/integration/test_web/test_handler/test_logs.py`:
- Around line 78-94: The test test_rotated_file_ordering_oldest_first has its
assertion reversed: change the expected ordering in the final assert to
["oldest", "middle", "newest"] so the test verifies oldest→newest across rotated
files (you may also update the docstring if desired); locate this in the test
method test_rotated_file_ordering_oldest_first and adjust the expected list used
in the assertion that compares [e["message"] for e in entries].
---
Outside diff comments:
In `@src/python/web/handler/logs.py`:
- Around line 80-92: The code builds log_files with the active base_path first
then .log.1, .log.2, ... which results in newest→oldest ordering and causes
deque(maxlen=limit) to evict newer active entries; change the construction so
log_files is ordered oldest→newest (e.g., iterate rotated indices in reverse or
append rotated files first and then base_path last, or simply reverse log_files
before streaming). Update the logic around base_path, Constants.LOG_BACKUP_COUNT
and log_files to ensure matched (the deque) receives entries oldest to newest.
The previous implementation appended base_path (.log, newest) first and then .log.1, .log.2, ..., .log.N (oldest). Under the deque(maxlen=limit), this retained the OLDEST entries and evicted the newest — opposite of what a log viewer should return. RotatingFileHandler rotates by renaming .log -> .log.1 -> .log.2, so .log.N is the oldest surviving backup and .log is the currently-active file. Iterate in reverse (N..1) then .log last so the deque naturally retains the *newest* `limit` entries in chronological oldest->newest order. Also fix the test assertion in test_rotated_file_ordering_oldest_first which had been written to match the (incorrect) old behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CodeRabbit review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/python/web/handler/logs.py`:
- Around line 111-136: Replace the manual open/try/finally close pattern with a
context manager: keep the OSError guard around the open call for log_file, but
after successfully opening use "with open(log_file, encoding='utf-8',
errors='replace') as f:" and move the "for line in f:" loop inside that
with-block, removing the outer try/finally and the explicit f.close(); preserve
all existing logic that uses _LOG_PATTERN, current_entry, and flush so behavior
remains identical.
Replaces the try/finally f.close() pattern with `with f:`. OSError guard around open() is preserved so an unreadable rotated file still just skips to the next. Behavior identical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CodeRabbit review
✅ Actions performed: Review triggered.
@CodeRabbit review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/python/web/handler/logs.py`:
- Around line 103-139: The log scanning still processes entire files even after
the pagination cursor (`before`) is exhausted; update the loop that iterates
log_files to stop scanning further files once global_entry_idx reaches or
exceeds before (i.e., when before != 0 and global_entry_idx >= before) to avoid
wasted I/O; specifically, add a check using the existing symbols
global_entry_idx, before and the flush() behaviour before opening/processing
each log_file (or inside the per-file loop) and break out of the outer
file-iteration early when the condition is met so no further regex matching or
file reads occur.
src/python/web/handler/logs.py

```diff
+        def flush(entry: dict[str, str]) -> None:
+            nonlocal global_entry_idx
+            global_entry_idx += 1
+            if before != 0 and global_entry_idx > before:
+                return
+            if self._entry_matches(entry, search, min_level):
+                matched.append(entry)
+
-        global_line_idx = 0
         for log_file in log_files:
             try:
-                with open(log_file, encoding="utf-8", errors="replace") as f:
-                    lines = f.readlines()
+                f = open(log_file, encoding="utf-8", errors="replace")
             except OSError:
                 continue

-            current_entry = None
-            for line in lines:
-                match = _LOG_PATTERN.match(line)
-                if match:
-                    # Flush previous entry
-                    if current_entry is not None:
-                        global_line_idx += 1
-                        if before == 0 or global_line_idx <= before:
-                            if self._entry_matches(current_entry, search, min_level):
-                                entries.append(current_entry)
-                    current_entry = {
-                        "timestamp": match.group(1),
-                        "level": match.group(2),
-                        "logger": match.group(3),
-                        "process": match.group(4),
-                        "thread": match.group(5),
-                        "message": match.group(6),
-                    }
-                elif current_entry is not None:
-                    # Continuation line (traceback, etc.)
-                    current_entry["message"] += "\n" + line.rstrip()
-
-            # Flush last entry
+            current_entry: dict[str, str] | None = None
+            with f:
+                for line in f:
+                    match = _LOG_PATTERN.match(line)
+                    if match:
+                        # Header line: flush any previous entry, then start a new one.
+                        if current_entry is not None:
+                            flush(current_entry)
+                        current_entry = {
+                            "timestamp": match.group(1),
+                            "level": match.group(2),
+                            "logger": match.group(3),
+                            "process": match.group(4),
+                            "thread": match.group(5),
+                            "message": match.group(6),
+                        }
+                    elif current_entry is not None:
+                        # Continuation line (traceback, etc.) — append to current entry.
+                        current_entry["message"] += "\n" + line.rstrip()
+
+            # End-of-file flush: preserves old behaviour of flushing each file's
+            # final entry before moving on to the next rotated file.
             if current_entry is not None:
-                global_line_idx += 1
-                if before == 0 or global_line_idx <= before:
-                    if self._entry_matches(current_entry, search, min_level):
-                        entries.append(current_entry)
-
-        # Return the most recent entries (last N)
-        if len(entries) > limit:
-            entries = entries[-limit:]
+                flush(current_entry)
```
🧹 Nitpick | 🔵 Trivial
Consider early-terminating once the before cursor is exhausted.
When a client paginates with a small `before` value, `flush()` becomes a no-op for every subsequent entry, but the handler still fully streams and regex-matches the remainder of every rotated file. For a large `.log` plus many backups, that is wasted I/O on every paginated request. Since entries flow oldest → newest and `global_entry_idx` only grows, you can short-circuit once `before != 0 and global_entry_idx >= before` (the deque can no longer be affected).
♻️ Proposed refactor

```diff
         def flush(entry: dict[str, str]) -> None:
             nonlocal global_entry_idx
             global_entry_idx += 1
             if before != 0 and global_entry_idx > before:
                 return
             if self._entry_matches(entry, search, min_level):
                 matched.append(entry)

         for log_file in log_files:
             try:
                 f = open(log_file, encoding="utf-8", errors="replace")
             except OSError:
                 continue

             current_entry: dict[str, str] | None = None
             with f:
                 for line in f:
                     match = _LOG_PATTERN.match(line)
                     if match:
                         # Header line: flush any previous entry, then start a new one.
                         if current_entry is not None:
                             flush(current_entry)
+                            if before != 0 and global_entry_idx >= before:
+                                current_entry = None
+                                break
                         current_entry = {
                             "timestamp": match.group(1),
                             "level": match.group(2),
                             "logger": match.group(3),
                             "process": match.group(4),
                             "thread": match.group(5),
                             "message": match.group(6),
                         }
                     elif current_entry is not None:
                         # Continuation line (traceback, etc.) — append to current entry.
                         current_entry["message"] += "\n" + line.rstrip()

             # End-of-file flush: preserves old behaviour of flushing each file's
             # final entry before moving on to the next rotated file.
             if current_entry is not None:
                 flush(current_entry)
+            if before != 0 and global_entry_idx >= before:
+                break
```

🤖 Prompt for AI Agents
+ break🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/python/web/handler/logs.py` around lines 103 - 139, The log scanning
still processes entire files even after the pagination cursor (`before`) is
exhausted; update the loop that iterates log_files to stop scanning further
files once global_entry_idx reaches or exceeds before (i.e., when before != 0
and global_entry_idx >= before) to avoid wasted I/O; specifically, add a check
using the existing symbols global_entry_idx, before and the flush() behaviour
before opening/processing each log_file (or inside the per-file loop) and break
out of the outer file-iteration early when the condition is met so no further
regex matching or file reads occur.
Once global_entry_idx >= before, no subsequent entry can contribute to the response (flush() already no-ops beyond that index). The outer file-iteration now breaks out at that point, skipping open() and regex scanning of any newer rotated files. Adds test_files_past_before_cursor_are_not_opened which patches open() inside the handler module and asserts that only the one file needed to saturate `before` is read. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #374
Summary
- Rewrote `LogsHandler._read_logs` in `src/python/web/handler/logs.py` to iterate each log file with `for line in f:` instead of `f.readlines()`.
- Completed entries go into a `deque(maxlen=limit)`, since the response only returns the trailing `limit`. A single `current_entry` is held outside the deque so continuation lines (tracebacks, multi-line messages) still get appended correctly (sketched below).
- Scanning short-circuits once the `before` pagination cursor is exhausted (see the early-termination commit above).
Impact

A `/server/logs` request with the default `limit=500` is now O(limit × avg entry size) instead of O(total log bytes).

Test plan
- `cd src/python && ruff check .` → clean
- `cd src/python && pyright` → 0 errors, 0 warnings
- `src/python/tests/integration/test_web/test_handler/test_logs.py` adds 7 tests, all pass
- Peak memory (default `limit=500`), measured via `tracemalloc.get_traced_memory()`: the old `readlines()` impl (verified by monkey-patching the old code back in) peaks at 185 MB — fails the test's `< 10 MB` assertion as intended (a sketch of the technique follows)
- `tests/integration/test_web/` suite: 82 pass (1 pre-existing timing flake in `test_stream_status.py`, reproduces on clean develop — unrelated)
Summary by CodeRabbit

Tests
Performance