feat: Add JSON output format for machine-readable diffs (#60) by houfu · Pull Request #70 · houfu/redlines

houfu · 2025-10-12T03:47:22Z

Summary

Adds a JSON output format so that diff results can be parsed easily by AI agents and automation tools.

Closes #60

Changes

New API Method

from redlines import Redlines
import json

r = Redlines(
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox walks past the lazy dog."
)

# Get JSON output
json_output = r.output_json(pretty=True)
data = json.loads(json_output)

# Access changes programmatically
for change in data["changes"]:
    if change["type"] == "replace":
        print(f"Changed '{change['source_text']}' to '{change['test_text']}'")
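The loop above handles only `replace` entries. A slightly fuller consumer could skip `equal` entries and recover each edit's text on both sides from the character positions. The sketch below runs against a hand-copied, trimmed version of the example output further down instead of calling `output_json()`. The helpers `span_text` and `describe_edits` are hypothetical. Per the commit notes, delete/insert entries can carry `None` in position fields, so the sketch treats a missing or `None` position as "no span on that side".

```python
import json

# Trimmed copy of this PR's example output (only the fields used below).
EXAMPLE = """
{
  "source": "The quick brown fox jumps over the lazy dog.",
  "test": "The quick brown fox walks past the lazy dog.",
  "changes": [
    {"type": "equal", "source_position": [0, 20], "test_position": [0, 20]},
    {"type": "replace", "source_position": [20, 31], "test_position": [20, 31]},
    {"type": "equal", "source_position": [31, 44], "test_position": [31, 44]}
  ]
}
"""


def span_text(text, position):
    """Slice `text` by a [start, end] pair; None means no span on this side."""
    if position is None:
        return None
    start, end = position
    return text[start:end]


def describe_edits(data):
    """Hypothetical helper: one (type, old, new) tuple per non-equal change."""
    edits = []
    for change in data["changes"]:
        if change["type"] == "equal":
            continue  # unchanged context, not an edit
        # Assumption: delete/insert entries may have None for one side's position.
        old = span_text(data["source"], change.get("source_position"))
        new = span_text(data["test"], change.get("test_position"))
        edits.append((change["type"], old, new))
    return edits


data = json.loads(EXAMPLE)
for kind, old, new in describe_edits(data):
    print(f"{kind}: {old!r} -> {new!r}")
```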

JSON Output Structure

The JSON includes complete diff information:

Source and test texts - Original strings being compared
Token arrays - How the text was parsed internally
Changes array - All diff operations (equal, insert, delete, replace)
Statistics - Change counts and summary
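Because the token arrays are exposed next to the raw strings, a consumer can check how the text was split. In the example output, joining each token list reproduces its string exactly, because every token keeps its trailing space. The sketch below checks that property on a hand-copied subset of the example. The helper `tokens_round_trip` is hypothetical, and the PR does not state that the round trip holds for every input, so treat that as an assumption to verify.

```python
import json

# Hand-copied subset of the example output.
EXAMPLE = {
    "source": "The quick brown fox jumps over the lazy dog.",
    "test": "The quick brown fox walks past the lazy dog.",
    "source_tokens": ["The ", "quick ", "brown ", "fox ", "jumps ",
                      "over ", "the ", "lazy ", "dog."],
    "test_tokens": ["The ", "quick ", "brown ", "fox ", "walks ",
                    "past ", "the ", "lazy ", "dog."],
}


def tokens_round_trip(data):
    """True if each token list concatenates back to its original string."""
    return ("".join(data["source_tokens"]) == data["source"]
            and "".join(data["test_tokens"]) == data["test"])


# Serialize and parse once so the check runs on decoded JSON, as a consumer would.
parsed = json.loads(json.dumps(EXAMPLE))
print(tokens_round_trip(parsed))  # True for this example
```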

Hybrid Position System

Each change provides both position types for maximum flexibility:

Character positions [start, end] - For direct string slicing; end is exclusive (e.g., source[20:31])
Token positions [start, end] - Indices into the token arrays, also end-exclusive, for understanding the internal representation

This design lets users choose whichever approach suits their use case, while keeping full transparency into how Redlines processes the text.
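The two position systems are linked through the token lengths. A token span [i, j] covers the characters from the combined length of the first i tokens up to the combined length of the first j. The sketch below uses a hypothetical `token_to_char` helper to convert the example's replace span [4, 6] into the character span [20, 31], then slices the text both ways. It assumes that the tokens concatenate back to the original text, as they do in the example output.

```python
from itertools import accumulate

source = "The quick brown fox jumps over the lazy dog."
source_tokens = ["The ", "quick ", "brown ", "fox ", "jumps ",
                 "over ", "the ", "lazy ", "dog."]


def token_to_char(tokens, token_position):
    """Map an end-exclusive token span to an end-exclusive character span.

    Assumes "".join(tokens) equals the original string.
    """
    # offsets[k] = number of characters in the first k tokens
    offsets = [0, *accumulate(len(t) for t in tokens)]
    start, end = token_position
    return [offsets[start], offsets[end]]


char_span = token_to_char(source_tokens, [4, 6])
print(char_span)                          # [20, 31]
print(repr(source[slice(*char_span)]))    # 'jumps over '
print(repr("".join(source_tokens[4:6])))  # 'jumps over '
```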

Example Output

  {
    "source": "The quick brown fox jumps over the lazy dog.",
    "test": "The quick brown fox walks past the lazy dog.",
    "source_tokens": ["The ", "quick ", "brown ", "fox ", "jumps ", "over ", "the ", "lazy ", "dog."],
    "test_tokens": ["The ", "quick ", "brown ", "fox ", "walks ", "past ", "the ", "lazy ", "dog."],
    "changes": [
      {
        "type": "equal",
        "text": "The quick brown fox ",
        "source_position": [0, 20],
        "test_position": [0, 20],
        "source_token_position": [0, 4],
        "test_token_position": [0, 4]
      },
      {
        "type": "replace",
        "source_text": "jumps over ",
        "test_text": "walks past ",
        "source_position": [20, 31],
        "test_position": [20, 31],
        "source_token_position": [4, 6],
        "test_token_position": [4, 6]
      },
      {
        "type": "equal",
        "text": "the lazy dog.",
        "source_position": [31, 44],
        "test_position": [31, 44],
        "source_token_position": [6, 9],
        "test_token_position": [6, 9]
      }
    ],
    "stats": {
      "deletions": 0,
      "insertions": 0,
      "replacements": 1,
      "unchanged": 2,
      "total_changes": 1
    }
  }
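The `stats` block can be derived from the `changes` array. The mapping below is inferred from the example output, not taken from the implementation: each change type feeds one counter, and `total_changes` counts every operation that is not `equal`. Under that assumption, a consumer could cross-check the two sections with a hypothetical `compute_stats` helper like this:

```python
from collections import Counter

# Change types from the example output (other fields omitted).
changes = [
    {"type": "equal"},
    {"type": "replace"},
    {"type": "equal"},
]

# Assumed mapping from change type to stats key (inferred from the example).
STAT_KEYS = {
    "delete": "deletions",
    "insert": "insertions",
    "replace": "replacements",
    "equal": "unchanged",
}


def compute_stats(changes: list[dict[str, str]]) -> dict[str, int]:
    counts = Counter(change["type"] for change in changes)
    stats = {key: counts.get(kind, 0) for kind, key in STAT_KEYS.items()}
    # Assumption: "total_changes" counts every non-equal operation.
    stats["total_changes"] = (stats["deletions"] + stats["insertions"]
                              + stats["replacements"])
    return stats


print(compute_stats(changes))
```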

Design Decisions

Keep "replace" as single operation - Maintains semantic meaning rather than decomposing into delete+insert
Include "equal" operations - Provides complete diff representation, not just changes
Expose token lists - Full transparency into tokenization process
Hybrid positions - Both character (convenience) and token (internal representation) positions
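Because `replace` stays a single operation, a consumer whose model only has deletions and insertions has to split it. The sketch below does that on the consumer side. The `split_replace` helper and the `text` field on the resulting delete/insert dicts are illustrative choices, not part of the Redlines output format. Following the commit notes, the side with no span gets `None`, and token positions are left out for brevity.

```python
def split_replace(change):
    """Split a "replace" change into a delete followed by an insert.

    Hypothetical consumer-side helper; the output dicts use an illustrative
    shape, not the Redlines schema. Non-replace changes pass through as-is.
    """
    if change["type"] != "replace":
        return [change]
    delete = {
        "type": "delete",
        "text": change["source_text"],
        "source_position": change["source_position"],
        "test_position": None,  # deleted text has no span in the test string
    }
    insert = {
        "type": "insert",
        "text": change["test_text"],
        "source_position": None,  # inserted text has no span in the source
        "test_position": change["test_position"],
    }
    return [delete, insert]


replace = {
    "type": "replace",
    "source_text": "jumps over ",
    "test_text": "walks past ",
    "source_position": [20, 31],
    "test_position": [20, 31],
}
for part in split_replace(replace):
    print(part["type"], repr(part["text"]))
```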

Files Changed

redlines/redlines.py - Added output_json() method with pretty-print support
redlines/json_schema.json - JSON Schema Draft 7 specification
tests/test_json_output.py - Comprehensive test suite (16 test cases)

Testing

✅ All 46 tests passing (16 new tests for JSON output)
✅ 96% code coverage on redlines.py
✅ mypy type checking passes
✅ Handles unicode, special characters, and edge cases
✅ Position accuracy verified for both character and token levels
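A position-accuracy check of the kind listed above could look like the function below. It is a hypothetical sketch, not the contents of tests/test_json_output.py. For every change, it slices each side by character position and by token position, then compares the result with the change's text fields (`text` for equal, `source_text`/`test_text` for replace). A side is skipped when its position is `None` or no matching text field exists.

```python
def check_positions(data):
    """Return (change index, side, level, sliced text) for every mismatch."""
    problems = []
    for i, change in enumerate(data["changes"]):
        for side in ("source", "test"):
            # "equal" stores one shared "text"; "replace" stores per-side text.
            # Insert/delete field names are not shown in the PR; if they differ
            # from these, expected is None and the side is skipped.
            if change["type"] == "equal":
                expected = change.get("text")
            else:
                expected = change.get(f"{side}_text")
            char_pos = change.get(f"{side}_position")
            tok_pos = change.get(f"{side}_token_position")
            if expected is None or char_pos is None:
                continue  # nothing to compare on this side
            by_char = data[side][char_pos[0]:char_pos[1]]
            if by_char != expected:
                problems.append((i, side, "char", by_char))
            if tok_pos is not None:
                by_tok = "".join(data[f"{side}_tokens"][tok_pos[0]:tok_pos[1]])
                if by_tok != expected:
                    problems.append((i, side, "token", by_tok))
    return problems


# Copy of this PR's example output.
data = {
    "source": "The quick brown fox jumps over the lazy dog.",
    "test": "The quick brown fox walks past the lazy dog.",
    "source_tokens": ["The ", "quick ", "brown ", "fox ", "jumps ",
                      "over ", "the ", "lazy ", "dog."],
    "test_tokens": ["The ", "quick ", "brown ", "fox ", "walks ",
                    "past ", "the ", "lazy ", "dog."],
    "changes": [
        {"type": "equal", "text": "The quick brown fox ",
         "source_position": [0, 20], "test_position": [0, 20],
         "source_token_position": [0, 4], "test_token_position": [0, 4]},
        {"type": "replace", "source_text": "jumps over ",
         "test_text": "walks past ",
         "source_position": [20, 31], "test_position": [20, 31],
         "source_token_position": [4, 6], "test_token_position": [4, 6]},
        {"type": "equal", "text": "the lazy dog.",
         "source_position": [31, 44], "test_position": [31, 44],
         "source_token_position": [6, 9], "test_token_position": [6, 9]},
    ],
}

print(check_positions(data))  # []
```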

Backwards Compatibility

This is a purely additive change - no existing functionality is modified. All existing tests continue to pass.

Implements issue #60 - JSON output format for AI agents and automation tools.

Features:
- New output_json() method with optional pretty-printing
- Hybrid approach: both character and token positions
- Complete diff representation including equal operations
- Exposed token lists for full transparency
- Comprehensive JSON Schema specification

The JSON output includes:
- Source and test texts
- Token arrays showing text parsing
- Changes array with all diff operations
- Statistics about changes

Each change provides:
- Type (equal/insert/delete/replace)
- Text content
- Character positions for direct string access
- Token positions for internal representation

Testing:
- 16 comprehensive test cases
- All 46 tests pass
- 96% code coverage on redlines.py

Files:
- redlines/redlines.py: Added output_json() method
- redlines/json_schema.json: JSON Schema Draft 7 spec
- tests/test_json_output.py: Complete test suite

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Add explicit type annotations to resolve mypy errors in output_json() method. The change variable now has explicit type hint dict[str, t.Any] to handle None values in position fields for delete/insert operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Adds explicit type hint dict[str, int] to resolve mypy strict mode error. Required for Python 3.10 compatibility with strict type checking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

houfu and others added 3 commits October 11, 2025 23:17

houfu merged commit bb5739a into main Oct 12, 2025
10 checks passed

houfu deleted the feat/json-output-format branch October 12, 2025 03:51
