-
Notifications
You must be signed in to change notification settings - Fork 1
Problems 1925 3100 3133 3164 3197 3228 3260 3291 3320 3351 3380 3413 3444 3471 #108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problems 1925 3100 3133 3164 3197 3228 3260 3291 3320 3351 3380 3413 3444 3471 #108
Conversation
…er line - Updated format_json_value to format number arrays with multiple numbers per line - Numbers wrap at printWidth (150) to match Prettier style - Matches the formatting style in book-sets.json
Reviewer's GuideUpdates JSON normalization to support configurable line-wrapping for numeric arrays and reapplies formatting to book-sets.json to match the new style. Flow diagram for updated JSON normalization and numeric array wrappingflowchart TD
A["sort_json_by_numeric_keys called with input_file and output_file"] --> B["Read raw JSON from input_file"]
B --> C["Parse JSON into data structure"]
C --> D["Sort top level keys numerically into sorted_data"]
D --> E["Initialize lines list with '{'"]
E --> F["For each key,value in sorted_data"]
subgraph G["Formatting each value"]
direction TB
F --> G1["Call format_json_value(value, indent_level 1, print_width 150)"]
G1 --> H{Value type?}
H --> I["None"]
H --> J["bool"]
H --> K["int or float"]
H --> L["str"]
H --> M["dict"]
H --> N["list"]
M --> M1["For each item in sorted dict: format_json_value(v, indent_level+1, print_width)"]
M1 --> M2["Join formatted items with commas and newlines, wrap in '{ }'"]
N --> N1{"List empty?"}
N1 --> N2["Return '[]'"]
N1 --> N3{"First element is number?"}
N3 --> O["Numeric array formatting with wrapping"]
subgraph P["Numeric array wrapping algorithm"]
direction TB
O --> P1["Compute available_width = print_width - len(next_indent) - 2"]
P1 --> P2["Initialize lines, current_line, current_length = 0"]
P2 --> P3["For each item in list"]
P3 --> P4["Compute item_str and item_length (include comma+space if needed)"]
P4 --> P5{"current_length + item_length > available_width and current_line not empty?"}
P5 --> P6["Append current_line joined by ', ' to lines and start new line with item"]
P5 --> P7["Append item to current_line and update current_length"]
P6 --> P8["After loop, append remaining current_line to lines"]
P7 --> P8
P8 --> P9["Indent each line with next_indent and wrap with '[ ]'"]
end
N3 --> Q["Non numeric list formatting"]
Q --> Q1["formatted_items = format_json_value(item, indent_level+1, print_width) for each item"]
Q1 --> Q2["Compute total_length of formatted_items"]
Q2 --> Q3{"total_length < 100 and len(list) <= 5?"}
Q3 --> Q4["Render on single line"]
Q3 --> Q5["Render one item per line with indentation"]
end
G --> R["Append ' key: formatted_value' to lines"]
R --> S{"More keys?"}
S --> F
S --> T["Append '}' to lines"]
T --> U["Join lines with newlines"]
U --> V["Write normalized JSON to output_file"]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
Caution Review failedThe pull request is closed. WalkthroughA new optional Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Poem
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- In the numeric list branch you switched from
json.dumpstostr(item), which may change how floats, large ints, or non-ASCII values are serialized compared to the rest of the JSON; consider usingjson.dumps(item, ensure_ascii=False)consistently to avoid subtle formatting differences. - The
print_widthis currently hard-coded as 150 insort_json_by_numeric_keys; if different widths might be useful, consider threading this through as a parameter rather than a constant.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In the numeric list branch you switched from `json.dumps` to `str(item)`, which may change how floats, large ints, or non-ASCII values are serialized compared to the rest of the JSON; consider using `json.dumps(item, ensure_ascii=False)` consistently to avoid subtle formatting differences.
- The `print_width` is currently hard-coded as 150 in `sort_json_by_numeric_keys`; if different widths might be useful, consider threading this through as a parameter rather than a constant.
## Individual Comments
### Comment 1
<location> `scripts/normalize_json.py:42-51` </location>
<code_context>
+ for i, item in enumerate(value):
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider using `json.dumps` instead of `str` for numeric items to keep JSON formatting consistent.
In this numeric-list branch you're using `str(item)`/`str(x)` for both measuring and emitting, whereas the previous code used `json.dumps(item, ensure_ascii=False)`. That change can alter JSON formatting and make behavior diverge from other branches. Consider computing `item_str = json.dumps(item, ensure_ascii=False)` once and reusing it for both width calculation and output.
Suggested implementation:
```python
for i, item in enumerate(value):
item_str = json.dumps(item, ensure_ascii=False)
# Add comma and space length (2) if not first item on line
item_length = len(item_str) + (2 if current_line else 0)
if current_length + item_length > available_width and current_line:
# Start a new line
lines.append(", ".join(current_line))
current_line = [item_str]
current_length = len(item_str)
else:
```
1. Ensure `import json` is present at the top of `scripts/normalize_json.py` if it's not already imported.
2. In the `else:` branch (not shown), make sure you append `item_str` to `current_line` (not the raw `item`) and keep `current_length` updated with `len(item_str)` plus the comma/space when applicable.
</issue_to_address>
### Comment 2
<location> `scripts/normalize_json.py:17-20` </location>
<code_context>
-def format_json_value(value, indent_level=0):
+def format_json_value(value, indent_level=0, print_width=150):
"""Format a JSON value with custom formatting following Prettier style."""
indent = " " * indent_level
</code_context>
<issue_to_address>
**suggestion:** The `print_width` parameter isn’t applied consistently, e.g. the non-numeric list single-line threshold is still hardcoded.
`print_width` is used for numeric lists, but other arrays still rely on the hardcoded `total_length < 100` check. Consider basing that threshold on `print_width` (optionally with a margin) so the width configuration is applied consistently across all list types.
Suggested implementation:
```python
def format_json_value(value, indent_level=0, print_width=150):
```
```python
elif isinstance(value, list):
if not value:
```
```python
total_length = len(indent) + 2 + sum(len(item) + 2 for item in items) - 2
max_width_for_list = max(print_width - len(indent), 0)
if total_length <= max_width_for_list:
```
I assumed the non-numeric list single-line decision currently uses `if total_length < 100:` after building `items`. If the condition appears in multiple branches (e.g. for different list element types), apply the same replacement to each occurrence so all non-numeric array formatting respects `print_width`. Also ensure that all recursive calls to `format_json_value` in list handling pass the `print_width` argument through (as is already done in the dict branch shown). If the `total_length` calculation differs, keep it as-is and only adjust the width check to use `print_width` via `max_width_for_list`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| for i, item in enumerate(value): | ||
| item_str = str(item) | ||
| # Add comma and space length (2) if not first item on line | ||
| item_length = len(item_str) + (2 if current_line else 0) | ||
|
|
||
| if current_length + item_length > available_width and current_line: | ||
| # Start a new line | ||
| lines.append(", ".join(str(x) for x in current_line)) | ||
| current_line = [item] | ||
| current_length = len(item_str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Consider using json.dumps instead of str for numeric items to keep JSON formatting consistent.
In this numeric-list branch you're using str(item)/str(x) for both measuring and emitting, whereas the previous code used json.dumps(item, ensure_ascii=False). That change can alter JSON formatting and make behavior diverge from other branches. Consider computing item_str = json.dumps(item, ensure_ascii=False) once and reusing it for both width calculation and output.
Suggested implementation:
for i, item in enumerate(value):
item_str = json.dumps(item, ensure_ascii=False)
# Add comma and space length (2) if not first item on line
item_length = len(item_str) + (2 if current_line else 0)
if current_length + item_length > available_width and current_line:
# Start a new line
lines.append(", ".join(current_line))
current_line = [item_str]
current_length = len(item_str)
else:- Ensure
import jsonis present at the top ofscripts/normalize_json.pyif it's not already imported. - In the
else:branch (not shown), make sure you appenditem_strtocurrent_line(not the rawitem) and keepcurrent_lengthupdated withlen(item_str)plus the comma/space when applicable.
| def format_json_value(value, indent_level=0, print_width=150): | ||
| """Format a JSON value with custom formatting following Prettier style.""" | ||
| indent = " " * indent_level | ||
| next_indent = " " * (indent_level + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: The print_width parameter isn’t applied consistently, e.g. the non-numeric list single-line threshold is still hardcoded.
print_width is used for numeric lists, but other arrays still rely on the hardcoded total_length < 100 check. Consider basing that threshold on print_width (optionally with a margin) so the width configuration is applied consistently across all list types.
Suggested implementation:
def format_json_value(value, indent_level=0, print_width=150): elif isinstance(value, list):
if not value: total_length = len(indent) + 2 + sum(len(item) + 2 for item in items) - 2
max_width_for_list = max(print_width - len(indent), 0)
if total_length <= max_width_for_list:I assumed the non-numeric list single-line decision currently uses if total_length < 100: after building items. If the condition appears in multiple branches (e.g. for different list element types), apply the same replacement to each occurrence so all non-numeric array formatting respects print_width. Also ensure that all recursive calls to format_json_value in list handling pass the print_width argument through (as is already done in the dict branch shown). If the total_length calculation differs, keep it as-is and only adjust the width check to use print_width via max_width_for_list.
Summary
Brief description of the changes in this PR.
Type of Change
Objective
For new features and performance improvements: Clearly describe the objective and rationale for this change.
Testing
Breaking Changes
If this is a breaking change, describe:
Checklist
Related Issues
Closes #[issue number]
Summary by Sourcery
Adjust JSON normalization formatting to support multi-value-per-line numeric arrays with a configurable print width and propagate this setting through JSON value formatting.
Enhancements:
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.