feat: Add JSON output format for machine-readable diffs (#60) #70

Merged: houfu merged 3 commits into main from feat/json-output-format on Oct 12, 2025

@houfu houfu commented Oct 12, 2025

Summary

Adds a JSON output format so that diff results can be parsed easily by AI agents and automation tools.

Closes #60

Changes

New API Method

from redlines import Redlines
import json

r = Redlines(
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox walks past the lazy dog."
)

# Get JSON output
json_output = r.output_json(pretty=True)
data = json.loads(json_output)

# Access changes programmatically
for change in data["changes"]:
    if change["type"] == "replace":
        print(f"Changed '{change['source_text']}' to '{change['test_text']}'")

JSON Output Structure

The JSON includes complete diff information:

  • Source and test texts - Original strings being compared
  • Token arrays - How the text was parsed internally
  • Changes array - All diff operations (equal, insert, delete, replace)
  • Statistics - Change counts and summary

Hybrid Position System

Each change provides both position types for maximum flexibility:

  • Character positions [start, end] - For direct string slicing (e.g., source[20:31])
  • Token positions [start, end] - For understanding internal representation

This design lets users choose the most convenient approach for their use case while keeping full transparency into how Redlines processes the text.
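
As a minimal sketch of how the two position types relate, the snippet below uses values copied from the example in this description. It assumes both ranges are half-open ([start, end), like Python slices), which is consistent with the example values but is not stated explicitly here. No Redlines call is made; the data is hard-coded.

```python
# Values copied from the example in this PR description.
source = "The quick brown fox jumps over the lazy dog."
source_tokens = ["The ", "quick ", "brown ", "fox ", "jumps ", "over ", "the ", "lazy ", "dog."]

# The "replace" change from the example output (source-side fields only).
change = {
    "type": "replace",
    "source_text": "jumps over ",
    "source_position": [20, 31],
    "source_token_position": [4, 6],
}

# Character positions slice the original string directly
# (assumption: the range is half-open, like a Python slice).
char_start, char_end = change["source_position"]
by_chars = source[char_start:char_end]

# Token positions slice the token list; joining those tokens gives the same text.
tok_start, tok_end = change["source_token_position"]
by_tokens = "".join(source_tokens[tok_start:tok_end])

print(repr(by_chars))   # 'jumps over '
print(repr(by_tokens))  # 'jumps over '
```

Character positions are the shorter path when only the changed text is needed; token positions are useful when a consumer wants to reason about whole tokens rather than raw offsets.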

Example Output

  {
    "source": "The quick brown fox jumps over the lazy dog.",
    "test": "The quick brown fox walks past the lazy dog.",
    "source_tokens": ["The ", "quick ", "brown ", "fox ", "jumps ", "over ", "the ", "lazy ", "dog."],
    "test_tokens": ["The ", "quick ", "brown ", "fox ", "walks ", "past ", "the ", "lazy ", "dog."],
    "changes": [
      {
        "type": "equal",
        "text": "The quick brown fox ",
        "source_position": [0, 20],
        "test_position": [0, 20],
        "source_token_position": [0, 4],
        "test_token_position": [0, 4]
      },
      {
        "type": "replace",
        "source_text": "jumps over ",
        "test_text": "walks past ",
        "source_position": [20, 31],
        "test_position": [20, 31],
        "source_token_position": [4, 6],
        "test_token_position": [4, 6]
      },
      {
        "type": "equal",
        "text": "the lazy dog.",
        "source_position": [31, 44],
        "test_position": [31, 44],
        "source_token_position": [6, 9],
        "test_token_position": [6, 9]
      }
    ],
    "stats": {
      "deletions": 0,
      "insertions": 0,
      "replacements": 1,
      "unchanged": 2,
      "total_changes": 1
    }
  }
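
A consumer can sanity-check the stats block against the changes array. The sketch below works on a trimmed copy of the example above. The mapping from change types to stats keys ("delete" to deletions, "insert" to insertions, and total_changes as the sum of everything except "equal") is an assumption inferred from the example, not something confirmed by the schema.

```python
import json
from collections import Counter

# Trimmed copy of the example output (only the fields this sketch reads).
raw = """
{
  "changes": [
    {"type": "equal", "text": "The quick brown fox "},
    {"type": "replace", "source_text": "jumps over ", "test_text": "walks past "},
    {"type": "equal", "text": "the lazy dog."}
  ],
  "stats": {
    "deletions": 0,
    "insertions": 0,
    "replacements": 1,
    "unchanged": 2,
    "total_changes": 1
  }
}
"""
data = json.loads(raw)

# Count how often each change type occurs; missing types count as 0.
counts = Counter(change["type"] for change in data["changes"])

# Assumed mapping from change types to stats keys (inferred from the example).
recomputed = {
    "deletions": counts["delete"],
    "insertions": counts["insert"],
    "replacements": counts["replace"],
    "unchanged": counts["equal"],
}
# Assumption: total_changes counts every operation that is not "equal".
recomputed["total_changes"] = (
    recomputed["deletions"] + recomputed["insertions"] + recomputed["replacements"]
)

print(recomputed == data["stats"])  # True
```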

Design Decisions

  1. Keep "replace" as single operation - Maintains semantic meaning rather than decomposing into delete+insert
  2. Include "equal" operations - Provides complete diff representation, not just changes
  3. Expose token lists - Full transparency into tokenization process
  4. Hybrid positions - Both character (convenience) and token (internal representation) positions
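
Because "replace" is kept as a single operation, a consumer that prefers separate delete and insert steps can split it on their side. The helper below is hypothetical (it is not part of Redlines) and only handles the two change types shown in the example output; the field names of "insert" and "delete" entries are not shown in this description, so the sketch refuses to guess them.

```python
from typing import Any, Iterator

# The changes array from the example output (text fields only).
changes = [
    {"type": "equal", "text": "The quick brown fox "},
    {"type": "replace", "source_text": "jumps over ", "test_text": "walks past "},
    {"type": "equal", "text": "the lazy dog."},
]


def decompose(items: list[dict[str, Any]]) -> Iterator[tuple[str, str]]:
    """Yield (operation, text) pairs, splitting each "replace" into delete + insert.

    Hypothetical consumer-side helper, not a Redlines API.
    """
    for change in items:
        if change["type"] == "replace":
            yield ("delete", change["source_text"])
            yield ("insert", change["test_text"])
        elif change["type"] == "equal":
            yield ("equal", change["text"])
        else:
            # Field names for other change types are not shown in this PR
            # description, so this sketch does not attempt to read them.
            raise NotImplementedError(f"unhandled change type: {change['type']}")


for operation, text in decompose(changes):
    print(operation, repr(text))
```

Keeping the split on the consumer side means the JSON itself stays close to the diff Redlines computed, and tools that want the paired view do not have to re-match deletes with inserts.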

Files Changed

  • redlines/redlines.py - Added output_json() method with pretty-print support
  • redlines/json_schema.json - JSON Schema Draft 7 specification
  • tests/test_json_output.py - Comprehensive test suite (16 test cases)

Testing

  • ✅ All 46 tests passing (16 new tests for JSON output)
  • ✅ 96% code coverage on redlines.py
  • ✅ mypy type checking passes
  • ✅ Handles unicode, special characters, and edge cases
  • ✅ Position accuracy verified for both character and token levels

Backwards Compatibility

This is a purely additive change - no existing functionality is modified. All existing tests continue to pass.

houfu and others added 3 commits October 11, 2025 23:17

Implements issue #60 - JSON output format for AI agents and automation tools.

Features:
- New output_json() method with optional pretty-printing
- Hybrid approach: both character and token positions
- Complete diff representation including equal operations
- Exposed token lists for full transparency
- Comprehensive JSON Schema specification

The JSON output includes:
- Source and test texts
- Token arrays showing text parsing
- Changes array with all diff operations
- Statistics about changes

Each change provides:
- Type (equal/insert/delete/replace)
- Text content
- Character positions for direct string access
- Token positions for internal representation

Testing:
- 16 comprehensive test cases
- All 46 tests pass
- 96% code coverage on redlines.py

Files:
- redlines/redlines.py: Added output_json() method
- redlines/json_schema.json: JSON Schema Draft 7 spec
- tests/test_json_output.py: Complete test suite

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Add explicit type annotations to resolve mypy errors in output_json() method.
The change variable now has explicit type hint dict[str, t.Any] to handle
None values in position fields for delete/insert operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Adds explicit type hint dict[str, int] to resolve mypy strict mode error.
Required for Python 3.10 compatibility with strict type checking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@houfu houfu merged commit bb5739a into main Oct 12, 2025
10 checks passed
@houfu houfu deleted the feat/json-output-format branch October 12, 2025 03:51