@Iamrodos
Contributor

Fixes #133 - Avoid rewriting unchanged JSON files

When backing up labels, milestones, releases, hooks, followers, and following, compare existing file content before writing and skip the write if content is identical, preserving file timestamps.

Added json_dump_if_changed() helper that reads existing files, compares content, and only writes if changed using atomic operations (temp file + rename). If the comparison can't be performed, it conservatively writes anyway.
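For readers who want to see the shape of the compare-then-write approach described above, here is a minimal sketch. It is not the merged implementation: the signature, the boolean return value, and the serialization options (`sort_keys`, `indent`) are assumptions made for illustration, and the real helper in the repository may differ.

```python
import json
import os
import tempfile


def json_dump_if_changed(data, path):
    """Write data as JSON to path only if the content differs.

    Returns True if the file was written, False if the write was skipped.
    (Signature and return value are assumptions for this sketch.)
    """
    new_content = json.dumps(data, ensure_ascii=False, sort_keys=True, indent=4)

    try:
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == new_content:
                # Identical content: leave the file and its mtime untouched.
                return False
    except (OSError, UnicodeDecodeError):
        # Missing or unreadable file: the comparison can't be performed,
        # so fall through and write conservatively.
        pass

    # Atomic write: the temp file lives in the same directory so that
    # os.replace() is a rename on the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(new_content)
        os.replace(tmp_path, path)
    except BaseException:
        # Don't leave a stray temp file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

In this sketch a caller could use the return value to choose between the "Saved ... to disk" and "unchanged, skipped write" log messages. Note that `tempfile.mkstemp()` creates the file with owner-only permissions, which a real implementation may want to adjust.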

Issues/pulls are untouched: they already use incremental logic and always fetch fresh data, so there is no point reading the existing file when we know it will be written anyway.

Example running backup twice on unchanged repository:

First run:

github-backup Iamrodos -t TOKEN --output-directory /tmp/test --labels --milestones --releases
2025-11-29T10:00:00.000: Saved 5 labels to disk
2025-11-29T10:00:00.100: Saved 3 milestones to disk
2025-11-29T10:00:00.200: Saved 2 releases to disk

Second run (nothing changed):

2025-11-29T10:05:00.000: 5 labels unchanged, skipped write
2025-11-29T10:05:00.100: 3 milestones unchanged, skipped write
2025-11-29T10:05:00.200: 2 releases unchanged, skipped write

Added pytest test suite (9 tests) covering the new functionality.

Side note

Made log messages consistent and past tense ("Saved" instead of "Saving"/"Writing"). In my case I save to an s3fuse mount, so reading the existing file carries some overhead. However, testing against large repositories showed these files are small (largest: 47KB for 9,000+ followers; typical: <10KB), so the read cost is negligible compared with the cost of unnecessary writes and lost timestamps.

…, hooks, followers, and following

This change reduces unnecessary writes when backing up metadata that changes
infrequently. The implementation compares existing file content before writing
and skips the write if the content is identical, preserving file timestamps.

Key changes:
- Added json_dump_if_changed() helper that compares content before writing
- Uses atomic writes (temp file + rename) for all metadata files
- NOT applied to issues/pulls (they use incremental_by_files logic)
- Made log messages consistent and past tense ("Saved" instead of "Saving")
- Added informative logging showing skip counts

Fixes josegonzalez#133
@josegonzalez josegonzalez merged commit 83ff0ae into josegonzalez:master Nov 30, 2025
10 checks passed

Development

Successfully merging this pull request may close these issues.

labels.json is always re-downloaded even if it's the same