BE: SYNC API logging#1654
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughServer sync POST and client send_data were hardened with detailed logging, defensive JSON parsing, per-endpoint retry/alerting, and sequential plugin log file storage; SECURITY.md wording/formatting and datetime imports were adjusted; filename-parsing tests were added; a minor plugin boolean was reformatted. ChangesSync endpoint and client send flow
Docs, utilities, and minor formatting
Sequence Diagram(s): sequenceDiagram
participant Node as Node.send_data
participant Hub as sync_endpoint.handle_sync_post
participant Storage as PluginLogStorage
Node->>Hub: POST /sync (JSON: data,node_name,plugin) with Bearer auth
Hub->>Hub: log metadata, read raw body, json.loads (400 on failure)
Hub->>Storage: ensure plugin log dir, enumerate encoded/decoded files
Hub->>Storage: compute next filename, write coerced data
Hub-->>Node: HTTP 200 JSON success (or 400/500 error)
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/SECURITY.md`:
- Line 11: The disclaimer repeats the same guidance twice; edit the sentence in
SECURITY.md to remove the redundant clause so it reads concisely (keep one of
the two equivalent phrases). Specifically, remove either "Always test and secure
your setup before exposing it to the outside world." or "Always properly secure
and isolate your deployment before exposing it externally." so the final
paragraph states the responsibility disclaimer once and includes a single clear
admonition about securing and isolating deployments before external exposure.
In `@server/api_server/sync_endpoint.py`:
- Around line 78-81: The code extracts data = body.get(...), node_name =
body.get(...), plugin = body.get(...) without ensuring body is a dict, which
causes AttributeError for JSON values like null/array/string; add a validation
immediately before these extractions (check isinstance(body, dict)) and if it
fails return a 400 response with a clear error message (e.g., "request body must
be a JSON object") so that the handler gracefully rejects non-object JSON
instead of raising an unhandled exception; update the extraction block that
references body, data, node_name, and plugin accordingly.
- Around line 136-141: The except block in sync_endpoint.py currently contains
an inline "import traceback" which violates the project convention; move the
"import traceback" statement to the module top-level imports and remove the
inline import inside the except Exception as e block (the block that calls mylog
with f"[SYNC API] WRITE_FAILED={e}" and traceback.format_exc()). Update the
file's import section to include traceback (and ensure no other inline imports
remain), leaving the except block to simply reference traceback.format_exc() and
log as before.
- Around line 59-72: The raw body is being consumed with
request.get_data(cache=False) which prevents request.get_json() from accessing
the payload; change the read to preserve the data for later JSON parsing by
using request.get_data with caching enabled (e.g. request.get_data(cache=True)
or omit the cache=False) or alternatively parse JSON directly from the
already-read raw bytes (e.g. json.loads(raw)) so that request.get_json() doesn't
operate on an empty stream; update the code around request.get_data, the raw
variable, and the subsequent request.get_json call in sync_endpoint.py
accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: be75a910-0186-47b5-878f-96d346ae631d
📒 Files selected for processing (3)
docs/SECURITY.mdserver/api_server/sync_endpoint.pyserver/utils/datetime_utils.py
| # ---- EXTRACT FIELDS | ||
| data = body.get("data", "") | ||
| node_name = body.get("node_name", "") | ||
| plugin = body.get("plugin", "") |
There was a problem hiding this comment.
Missing validation that parsed JSON is a dictionary.
If the request body is valid JSON but not an object (e.g., null, [], "string"), body won't be a dict and body.get() will raise AttributeError, resulting in an unhandled 500 error instead of a descriptive 400.
+ if not isinstance(body, dict):
+ mylog("none", ["[SYNC API] JSON body is not an object"])
+ return jsonify({"error": "expected JSON object"}), 400
+
data = body.get("data", "")
node_name = body.get("node_name", "")
plugin = body.get("plugin", "")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # ---- EXTRACT FIELDS | |
| data = body.get("data", "") | |
| node_name = body.get("node_name", "") | |
| plugin = body.get("plugin", "") | |
| # ---- EXTRACT FIELDS | |
| if not isinstance(body, dict): | |
| mylog("none", ["[SYNC API] JSON body is not an object"]) | |
| return jsonify({"error": "expected JSON object"}), 400 | |
| data = body.get("data", "") | |
| node_name = body.get("node_name", "") | |
| plugin = body.get("plugin", "") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/api_server/sync_endpoint.py` around lines 78 - 81, The code extracts
data = body.get(...), node_name = body.get(...), plugin = body.get(...) without
ensuring body is a dict, which causes AttributeError for JSON values like
null/array/string; add a validation immediately before these extractions (check
isinstance(body, dict)) and if it fails return a 400 response with a clear error
message (e.g., "request body must be a JSON object") so that the handler
gracefully rejects non-object JSON instead of raising an unhandled exception;
update the extraction block that references body, data, node_name, and plugin
accordingly.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
server/api_server/sync_endpoint.py (2)
100-131:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftMake the sequential filename allocation atomic.
The
listdir()count on Lines 101-109 and theopen(..., "w")on Line 130 form a TOCTOU race. Two sync requests for the same{plugin, node_name}can compute the samefile_count, and the later write silently overwrites the earlier payload. If the sequence number must remain monotonic, reserve it atomically with exclusive creation and retry; otherwise switch to collision-free names such as UUIDs/timestamps.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/api_server/sync_endpoint.py` around lines 100 - 131, The current listdir-based sequence (encoded_files/decoded_files -> file_count) is vulnerable to TOCTOU races causing overwrites; change allocation so the sequence is reserved atomically: attempt exclusive-create retries when writing (use an atomic create flag/primitive such as os.open with O_CREAT|O_EXCL or equivalent) for the target name computed from file_count (referencing file_count and file_path_new), and loop incrementing the sequence on EEXIST until success; alternatively replace the sequence entirely with a collision-free name (UUID or timestamp) when constructing file_path_new to avoid the race.
99-121:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftDon't embed raw identifiers into the dot-delimited filename contract.
front/plugins/sync/sync.pylater recoverspluginandnode_nameby callingfile_name.split('.')and reading fixed indexes. A node name likehub.example.localproduceslast_result.<plugin>.encoded.hub.example.local.<n>.log, so the downstream parser only recovershuband misattributes the payload. Please sanitize/encode these fields before building the filename, or change both sides to use a delimiter/serialization that cannot collide with user-provided values.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/api_server/sync_endpoint.py` around lines 99 - 121, The filename construction uses raw plugin and node_name values which can contain dots and break the downstream parser that splits on '.' (see file_name.split('.') in front/plugins/sync/sync.py); fix by encoding or escaping plugin and node_name before building file_path_new (and the encoded_files/decoded_files glob logic) — e.g., replace or percent-encode dots (or use base64/url-encode) for both plugin and node_name when creating last_result.{plugin}.encoded.{node_name}.{n}.log and update the consumer (front/plugins/sync/sync.py) to decode/restore those values instead of relying on naive split; ensure the same encoding/decoding functions are used in the parts that list files (encoded_files/decoded_files), compute file_count, and when parsing filenames later so identifiers cannot collide with user-provided dots.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@server/api_server/sync_endpoint.py`:
- Around line 100-131: The current listdir-based sequence
(encoded_files/decoded_files -> file_count) is vulnerable to TOCTOU races
causing overwrites; change allocation so the sequence is reserved atomically:
attempt exclusive-create retries when writing (use an atomic create
flag/primitive such as os.open with O_CREAT|O_EXCL or equivalent) for the target
name computed from file_count (referencing file_count and file_path_new), and
loop incrementing the sequence on EEXIST until success; alternatively replace
the sequence entirely with a collision-free name (UUID or timestamp) when
constructing file_path_new to avoid the race.
- Around line 99-121: The filename construction uses raw plugin and node_name
values which can contain dots and break the downstream parser that splits on '.'
(see file_name.split('.') in front/plugins/sync/sync.py); fix by encoding or
escaping plugin and node_name before building file_path_new (and the
encoded_files/decoded_files glob logic) — e.g., replace or percent-encode dots
(or use base64/url-encode) for both plugin and node_name when creating
last_result.{plugin}.encoded.{node_name}.{n}.log and update the consumer
(front/plugins/sync/sync.py) to decode/restore those values instead of relying
on naive split; ensure the same encoding/decoding functions are used in the
parts that list files (encoded_files/decoded_files), compute file_count, and
when parsing filenames later so identifiers cannot collide with user-provided
dots.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a71038f3-42ed-4842-8afc-ebc5435899d8
📒 Files selected for processing (4)
docs/SECURITY.mdfront/plugins/sync/sync.pyserver/api_server/sync_endpoint.pyserver/plugin.py
✅ Files skipped from review due to trivial changes (2)
- server/plugin.py
- docs/SECURITY.md
…iers for PUSH and PULL modes
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@front/plugins/sync/sync.py`:
- Around line 189-194: The current parser uses presence of '.decoded.' or
'.encoded.' in file_name to choose the PUSH branch and extracts syncHubNodeName,
which incorrectly triggers on PULL names like
last_result.office.encoded.lab.log; change the condition to require both the
marker ('.decoded.' or '.encoded.') and that the filename also ends with the
pushed-file shape (i.e., matches the '.<counter>.log' suffix pattern) before
using the marker-based split, otherwise fall back to the existing
last_result.*.log extraction; update the same guard in the test file
test/plugins/test_sync_protocol.py to cover cases like
last_result.office.encoded.lab.log so tests assert the PULL branch is used.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f79d72d6-759c-483f-93eb-17961ef17372
📒 Files selected for processing (2)
front/plugins/sync/sync.pytest/plugins/test_sync_protocol.py
Summary by CodeRabbit
Documentation
Bug Fixes / Reliability
Tests
Chores