Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions CHANGES.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,26 @@
Changelog
=========

Unreleased
----------
- Add GitHub Discussions backups via GraphQL, including comments, replies,
optional attachment downloads, and per-repository incremental checkpoints.
- Add pull request review backups with ``--pull-reviews`` and one-time
incremental backfill for existing backups.
- Store incremental ``last_update`` checkpoints per repository resource instead
of using one global checkpoint for the whole output directory. Existing
backups use the legacy global checkpoint as a migration fallback, and the
legacy file is removed once existing issue/pull backups have resource
checkpoints (#62).
- Stop paginating pull requests during incremental backups once the sorted
results are at or older than the active checkpoint.
- Avoid re-fetching discussions and pull requests whose ``updated_at`` exactly
matches the active incremental checkpoint.
- Avoid extra release asset list requests by using asset metadata already
included in GitHub's releases response.
- Add ``--token-from-gh`` to read authentication from ``gh auth token``.


0.61.5 (2026-02-18)
-------------------
------------------------
Expand Down
63 changes: 46 additions & 17 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ github-backup

|PyPI| |Python Versions|

The package can be used to backup an *entire* `Github <https://github.com/>`_ organization, repository or user account, including starred repos, issues and wikis in the most appropriate format (clones for wikis, json files for issues).
The package can be used to backup an *entire* `Github <https://github.com/>`_ organization, repository or user account, including starred repos, issues, discussions and wikis in the most appropriate format (clones for wikis, json files for issues and discussions).

Requirements
============
Expand Down Expand Up @@ -36,16 +36,18 @@ Show the CLI help output::

CLI Help output::

github-backup [-h] [-t TOKEN_CLASSIC] [-f TOKEN_FINE] [-q] [--as-app]
[-o OUTPUT_DIRECTORY] [-l LOG_LEVEL] [-i]
github-backup [-h] [-t TOKEN_CLASSIC] [-f TOKEN_FINE] [--token-from-gh]
[-q] [--as-app] [-o OUTPUT_DIRECTORY] [-l LOG_LEVEL] [-i]
[--incremental-by-files]
[--starred] [--all-starred] [--starred-skip-size-over MB]
[--watched] [--followers] [--following] [--all]
[--issues] [--issue-comments] [--issue-events] [--pulls]
[--pull-comments] [--pull-commits] [--pull-details]
[--pull-comments] [--pull-reviews] [--pull-commits]
[--pull-details]
[--labels] [--hooks] [--milestones] [--security-advisories]
[--repositories] [--bare] [--no-prune] [--lfs] [--wikis]
[--gists] [--starred-gists] [--skip-archived] [--skip-existing]
[--discussions] [--repositories] [--bare] [--no-prune]
[--lfs] [--wikis] [--gists] [--starred-gists]
[--skip-archived] [--skip-existing]
[-L [LANGUAGES ...]] [-N NAME_REGEX] [-H GITHUB_HOST]
[-O] [-R REPOSITORY] [-P] [-F] [--prefer-ssh] [-v]
[--keychain-name OSX_KEYCHAIN_ITEM_NAME]
Expand All @@ -71,6 +73,7 @@ CLI Help output::
-f, --token-fine TOKEN_FINE
fine-grained personal access token (github_pat_....),
or path to token (file://...)
--token-from-gh read token from GitHub CLI (gh auth token)
-q, --quiet supress log messages less severe than warning, e.g.
info
--as-app authenticate as github app instead of as a user.
Expand All @@ -95,6 +98,7 @@ CLI Help output::
--issue-events include issue events in backup
--pulls include pull requests in backup
--pull-comments include pull request review comments in backup
--pull-reviews include pull request reviews in backup
--pull-commits include pull request commits in backup
--pull-details include more pull request details in backup [*]
--labels include labels in backup
Expand All @@ -103,6 +107,7 @@ CLI Help output::
--milestones include milestones in backup
--security-advisories
include security advisories in backup
--discussions include discussions in backup
--repositories include repository clone in backup
--bare clone bare repositories
--no-prune disable prune option for git fetch
Expand Down Expand Up @@ -143,8 +148,8 @@ CLI Help output::
applies if including releases
--skip-assets-on [SKIP_ASSETS_ON ...]
skip asset downloads for these repositories
--attachments download user-attachments from issues and pull
requests
--attachments download user-attachments from issues, pull requests,
and discussions
--throttle-limit THROTTLE_LIMIT
start throttling of GitHub API requests after this
amount of API requests remain
Expand All @@ -171,6 +176,8 @@ The positional argument ``USER`` specifies the user or organization account you

**Classic tokens** (``-t TOKEN``) are `slightly less secure <https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#personal-access-tokens-classic>`_ as they provide very coarse-grained permissions.

If you already authenticate with the `GitHub CLI <https://cli.github.com/>`_, you can use ``--token-from-gh`` to read the token with ``gh auth token`` instead of passing a token directly. This avoids placing the token in shell history or process arguments. When ``--github-host`` is set, the token is read with ``gh auth token --hostname HOST``.


Fine Tokens
~~~~~~~~~~~
Expand All @@ -181,7 +188,7 @@ Customise the permissions for your use case, but for a personal account full bac

**User permissions**: Read access to followers, starring, and watching.

**Repository permissions**: Read access to contents, issues, metadata, pull requests, and webhooks.
**Repository permissions**: Read access to contents, discussions, issues, metadata, pull requests, and webhooks.


GitHub Apps
Expand Down Expand Up @@ -262,9 +269,9 @@ LFS objects are fetched for all refs, not just the current checkout, ensuring a
About Attachments
-----------------

When you use the ``--attachments`` option with ``--issues`` or ``--pulls``, the tool will download user-uploaded attachments (images, videos, documents, etc.) from issue and pull request descriptions and comments. In some circumstances attachments contain valuable data related to the topic, and without their backup important information or context might be lost inadvertently.
When you use the ``--attachments`` option with ``--issues``, ``--pulls`` or ``--discussions``, the tool will download user-uploaded attachments (images, videos, documents, etc.) from issue, pull request and discussion descriptions and comments. In some circumstances attachments contain valuable data related to the topic, and without their backup important information or context might be lost inadvertently.

Attachments are saved to ``issues/attachments/{issue_number}/`` and ``pulls/attachments/{pull_number}/`` directories, where ``{issue_number}`` is the GitHub issue number (e.g., issue #123 saves to ``issues/attachments/123/``). Each attachment directory contains:
Attachments are saved to ``issues/attachments/{issue_number}/``, ``pulls/attachments/{pull_number}/`` and ``discussions/attachments/{discussion_number}/`` directories, where ``{issue_number}`` is the GitHub issue number (e.g., issue #123 saves to ``issues/attachments/123/``). Each attachment directory contains:

- The downloaded attachment files (named by their GitHub identifier with appropriate file extensions)
- If multiple attachments have the same filename, conflicts are resolved with numeric suffixes (e.g., ``report.pdf``, ``report_1.pdf``, ``report_2.pdf``)
Expand All @@ -284,6 +291,16 @@ The tool automatically extracts file extensions from HTTP headers to ensure file
**Fine-grained token limitation:** Due to a GitHub platform limitation, fine-grained personal access tokens (``github_pat_...``) cannot download attachments from private repositories directly. This affects both ``/assets/`` (images) and ``/files/`` (documents) URLs. The tool implements a workaround for image attachments using GitHub's Markdown API, which converts URLs to temporary JWT-signed URLs that can be downloaded. However, this workaround only works for images - document attachments (PDFs, text files, etc.) will fail with 404 errors when using fine-grained tokens on private repos. For full attachment support on private repositories, use a classic token (``-t``) instead of a fine-grained token (``-f``). See `#477 <https://github.com/josegonzalez/python-github-backup/issues/477>`_ for details.


About Discussions
-----------------

GitHub Discussions are backed up with GitHub's GraphQL API because the REST API does not expose discussions. Use ``--discussions`` to save each discussion as JSON under ``repositories/{repo}/discussions/{number}.json``. Discussion backups include the discussion body and metadata, category information, comments, and comment replies.

``--discussions`` is included in ``--all``. Unlike most REST API-backed resources, discussions require authentication because GitHub's GraphQL API requires a token. Fine-grained personal access tokens and GitHub Apps need read access to the repository's Discussions permission.

Incremental backups use a per-repository checkpoint at ``repositories/{repo}/discussions/last_update`` based on discussion ``updatedAt`` timestamps. This is separate from the repository-level ``last_update`` file so discussion activity is not missed if the repository's own update timestamp does not change. If you enable ``--discussions`` on an existing incremental backup, the first run performs a full discussions backup for each repository and creates the discussions checkpoint for future runs.


About security advisories
-------------------------

Expand Down Expand Up @@ -325,12 +342,24 @@ For finer control, avoid using ``--assets`` with starred repos, or use ``--skip-

Alternatively, consider just storing links to starred repos in JSON format with ``--starred``.

About pull request reviews
--------------------------

Use ``--pull-reviews`` with ``--pulls`` to include GitHub pull request review metadata under each pull request's ``review_data`` key. Reviews are separate from review comments: ``--pull-comments`` backs up inline review comments via ``comment_data`` and regular PR conversation comments via ``comment_regular_data``, while ``--pull-reviews`` backs up review state, submitted time, commit ID, and the top-level review body.

``--pull-reviews`` is included in ``--all``. Incremental backups use a per-repository checkpoint at ``repositories/{repo}/pulls/reviews_last_update``. If ``--pull-reviews`` is enabled on an existing incremental backup, the first run performs a one-time backfill for pull request reviews so older PRs are not skipped by the existing pull request checkpoint. Existing ``comment_data``, ``comment_regular_data`` and ``commit_data`` fields are preserved when only review data is being added.


Incremental Backup
------------------

Using (``-i, --incremental``) will only request new data from the API **since the last run (successful or not)**. e.g. only request issues from the API since the last run.
Using (``-i, --incremental``) will only request new data from the API **since the last successful resource backup**. e.g. only request issues from the API since the last issue backup for that repository.

Incremental checkpoints for issue and pull request API backups are stored per resource in that repository's backup directory (for example ``repositories/{repo}/issues/last_update``, ``repositories/{repo}/pulls/last_update`` or ``starred/{owner}/{repo}/pulls/last_update``). Older versions stored a single global ``last_update`` file in the output directory root. During migration, the legacy global checkpoint is used as a fallback only for resource directories that already contain backup data but do not yet have their own checkpoint. New repositories or newly enabled resources with no existing data get a full backup instead of inheriting an unrelated global checkpoint.

After all existing issue and pull request resource directories have per-resource checkpoints, the legacy global ``last_update`` file is removed automatically.

This means any blocking errors on previous runs can cause a large amount of missing data in backups.
This means any blocking errors on previous runs can cause missing data in backups for the affected repository resource.

Using (``--incremental-by-files``) will request new data from the API **based on when the file was modified on filesystem**. e.g. if you modify the file yourself you may miss something.

Expand All @@ -343,7 +372,7 @@ Known blocking errors

Some errors will block the backup run by exiting the script. e.g. receiving a 403 Forbidden error from the Github API.

If the incremental argument is used, this will result in the next backup only requesting API data since the last blocked/failed run. Potentially causing unexpected large amounts of missing data.
If the incremental argument is used, per-resource checkpoints are only advanced after that resource's backup work completes. A blocking error can still abort the overall run, but repositories and resources that were not processed will keep their previous checkpoints.

It's therefore recommended to only use the incremental argument if the output/result is being actively monitored, or complimented with periodic full non-incremental runs, to avoid unexpected missing data in a regular backup runs.

Expand Down Expand Up @@ -416,14 +445,14 @@ Quietly and incrementally backup useful Github user data (public and private rep
export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN
GH_USER=YOUR-GITHUB-USER

github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --security-advisories --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER
github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-reviews --pull-commits --labels --milestones --security-advisories --discussions --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER

Debug an error/block or incomplete backup into a temporary directory. Omit "incremental" to fill a previous incomplete backup. ::

export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN
GH_USER=YOUR-GITHUB-USER

github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER
github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-reviews --pull-commits --labels --milestones --discussions --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER

Pipe a token from stdin to avoid storing it in environment variables or command history (Unix-like systems only)::

Expand All @@ -439,7 +468,7 @@ This tool creates backups only, there is no inbuilt restore command.
cd /tmp/white-house/repositories/petitions/repository
git push --mirror git@github.com:WhiteHouse/petitions.git

**Issues, pull requests, comments, and other metadata** are saved as JSON files for archival purposes. The GitHub API does not support recreating this data faithfully, creating issues via the API has limitations:
**Issues, pull requests, discussions, comments, and other metadata** are saved as JSON files for archival purposes. The GitHub API does not support recreating this data faithfully, creating issues via the API has limitations:

- New issue/PR numbers are assigned (original numbers cannot be set)
- Timestamps reflect creation time (original dates cannot be set)
Expand Down
Loading