Merged
14 changes: 14 additions & 0 deletions .github/workflows/ci.yml
@@ -10,8 +10,22 @@ concurrency:

permissions:
contents: read
pull-requests: write

jobs:
auto-label:
name: Auto-label PR
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: write
steps:
- uses: bcoe/conventional-release-labels@v1
with:
token: ${{ secrets.GITHUB_TOKEN }}
ignored_types: '["chore"]'
type_labels: '{"feat":"feature","fix":"fix","perf":"performance","refactor":"refactor","docs":"documentation","test":"test","build":"build","ci":"ci","breaking_change":"breaking change"}'

lint-and-build:
name: Lint and build
runs-on: ubuntu-latest
4 changes: 2 additions & 2 deletions README.md
@@ -143,7 +143,7 @@ All public symbols are importable directly from `q2google`:
| `GoProToPhotosSync` | Main orchestrator; runs discovery → transfer → create. |
| `GooglePhotosClient` | Resumable upload facade (`upload_file_path`, `create_media_items`). |
| `GooglePhotosOAuth` | Load, refresh, or obtain Google OAuth credentials. |
| `JsonFileBackend` | File-based `SyncStateBackend`; one JSON per session under a root directory. |
| `JsonFileBackend` | File-based `SyncStateBackend`; stores each session as a directory of JSON files (`meta.json`, `items/`, `batches/`). |
| `SessionState` | Full persisted session document (`to_dict` / `from_dict` for custom stores). |
| `SyncStateBackend` | Protocol — implement `load` / `save` to plug in any storage layer. |
| `Q2GoogleSettings` | Pydantic settings; batch sizes, timeouts, and paths with env-var overrides. |
@@ -295,7 +295,7 @@ All public symbols are importable directly from `q2google`:
| `GoProToPhotosSync` | Main orchestrator; runs discovery → transfer → create. |
| `GooglePhotosClient` | Resumable upload facade (`upload_file_path`, `create_media_items`). |
| `GooglePhotosOAuth` | Load, refresh, or obtain Google OAuth credentials. |
| `JsonFileBackend` | File-based `SyncStateBackend`; one JSON per session under a root directory. |
| `JsonFileBackend` | File-based `SyncStateBackend`; stores each session as a directory of JSON files (`meta.json`, `items/`, `batches/`). |
| `SessionState` | Full persisted session document (`to_dict` / `from_dict` for custom stores). |
| `SyncStateBackend` | Protocol — implement `load` / `save` to plug in any storage layer. |
| `Q2GoogleSettings` | Pydantic settings; batch sizes, timeouts, and paths with env-var overrides. |
2 changes: 1 addition & 1 deletion docs/ARCHITECTURE.md
@@ -15,7 +15,7 @@ q2google moves assets from **GoPro cloud** (`gopro-api` / `AsyncGoProClient`) in
| `q2google/photos.py` | `GooglePhotosClient` + `GooglePhotoLibraryPort` — chunk upload and batched `batchCreate`. |
| `q2google/gphotos/` | Low-level Library v1 HTTP (`GooglePhotosAPI`), OAuth (`GooglePhotosOAuth`), Pydantic models. |
| `q2google/state/base.py` | `SessionState`, `ItemState`, `SyncStateBackend` protocol — persistence contract. |
| `q2google/state/local.py` | `JsonFileBackend` — one JSON file per session under a root directory. |
| `q2google/state/local.py` | `JsonFileBackend` — directory-tree backend; each session is a subdirectory containing `meta.json`, `items/*.json`, and `batches/*.json`. Reads legacy flat-file sessions transparently. |
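
The directory layout described in the row above can be read back with a few lines of standard-library code. The sketch below is illustrative only: `read_session` is a hypothetical helper, not part of q2google, and it assumes that each item file carries a `file_name` field, each batch file a `batch_index` field, and that the legacy flat file is consulted only when no session directory exists.

```python
import json
from pathlib import Path
from typing import Any, Optional


def read_session(root: Path, session_id: str) -> Optional[dict[str, Any]]:
    """Reassemble one session document from disk (hypothetical helper)."""
    session_dir = root / session_id
    meta_path = session_dir / "meta.json"
    if meta_path.is_file():
        data: dict[str, Any] = json.loads(meta_path.read_text(encoding="utf-8"))
        # One JSON file per item, keyed by the "file_name" stored inside it.
        data["items"] = {}
        for path in sorted((session_dir / "items").glob("*.json")):
            item = json.loads(path.read_text(encoding="utf-8"))
            data["items"][item["file_name"]] = item
        # One JSON file per batch, keyed by the stringified "batch_index".
        data["batches"] = {}
        for path in sorted((session_dir / "batches").glob("*.json")):
            batch = json.loads(path.read_text(encoding="utf-8"))
            data["batches"][str(batch["batch_index"])] = batch
        return data

    # Assumption: the legacy flat file is only a fallback for sessions
    # that have never been written in the directory format.
    legacy = root / f"{session_id}.json"
    if legacy.is_file():
        return json.loads(legacy.read_text(encoding="utf-8"))
    return None
```

Preferring the directory over the flat file matters on resume: once a legacy session has been saved in the new format, the old file is stale and should no longer win.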

## Sync pipeline (`sync_date_range`)

2 changes: 1 addition & 1 deletion docs/api/index.md
@@ -22,7 +22,7 @@ from q2google import (
| [`GoProToPhotosSync`](sync.md) | `q2google.sync` | Main orchestrator; runs discovery → transfer → create. |
| [`GooglePhotosClient`](photos.md) | `q2google.photos` | Resumable upload facade (`upload_file_path`, `create_media_items`). |
| [`GooglePhotosOAuth`](gphotos.md) | `q2google.gphotos.auth` | Load, refresh, or obtain Google OAuth credentials. |
| [`JsonFileBackend`](state.md) | `q2google.state.local` | File-based `SyncStateBackend`; one JSON per session under a root directory. |
| [`JsonFileBackend`](state.md) | `q2google.state.local` | File-based `SyncStateBackend`; stores each session as a directory of JSON files (`meta.json`, `items/`, `batches/`). |
| [`SessionState`](state.md) | `q2google.state.base` | Full persisted session document (`to_dict` / `from_dict`). |
| [`SyncStateBackend`](state.md) | `q2google.state.base` | Protocol — implement `load` / `save` to plug in any storage layer. |
| [`Q2GoogleSettings`](config.md) | `q2google.config` | Pydantic settings; batch sizes, timeouts, and paths with env-var overrides. |
2 changes: 1 addition & 1 deletion docs/cli.md
@@ -26,7 +26,7 @@ q2google sync [OPTIONS]

| Option | Env var | Default | Description |
|--------|---------|---------|-------------|
| `--state-dir PATH` | `Q2GOOGLE_STATE_DIR` | `.q2google_sessions` | Directory for per-session JSON state files. |
| `--state-dir PATH` | `Q2GOOGLE_STATE_DIR` | `.q2google_sessions` | Root directory for per-session state; each session is stored as a subdirectory containing `meta.json`, `items/`, and `batches/`. |
| `--session-id TEXT` | `Q2GOOGLE_SESSION_ID` | auto-generated | Stable identifier; reuse to resume an interrupted run. |

### Transfer options
2 changes: 1 addition & 1 deletion docs/configuration.md
@@ -19,7 +19,7 @@ All settings are managed by `Q2GoogleSettings` — a [Pydantic `BaseSettings`](h

| Variable | Default | Description |
|----------|---------|-------------|
| `Q2GOOGLE_STATE_DIR` | `.q2google_sessions` | Root directory for per-session JSON state files. |
| `Q2GOOGLE_STATE_DIR` | `.q2google_sessions` | Root directory for per-session state; each session is stored as a subdirectory containing `meta.json`, `items/`, and `batches/`. |
| `Q2GOOGLE_SESSION_ID` | _(auto)_ | Default session identifier when `--session-id` is omitted from the CLI. |

## Transfer tuning
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -88,5 +88,8 @@ markdown_extensions:
format: !!python/name:pymdownx.superfences.fence_code_format
- pymdownx.tabbed:
alternate_style: true
- pymdownx.emoji:
emoji_index: !!python/name:material.extensions.emoji.twemoji
emoji_generator: !!python/name:material.extensions.emoji.to_svg
- toc:
permalink: true
191 changes: 162 additions & 29 deletions q2google/state/local.py
@@ -1,81 +1,214 @@
"""Filesystem-backed :class:`~q2google.state.base.SyncStateBackend`.

Writes UTF-8 JSON per session using a temp file and :func:`os.replace` for atomic publish.
Intended for single-writer use; concurrent writers to the same session path are unsupported.
Stores each session as a directory tree with one JSON file per item and one per batch,
making concurrent writes to different items safe by construction. Each individual file
is published atomically via a temp file and :func:`os.replace`.

Layout::

{root}/
{session_id}/
meta.json # session metadata + stages (no items, no batches)
items/
{safe_file_name}.json # one file per ItemState
batches/
{batch_key}.json # one file per BatchState

Legacy flat-file sessions (``{session_id}.json``) written by older versions of this
module are still readable; ``load`` detects and falls back to that format transparently.
"""

from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Any

from q2google.state.base import SessionState

_METADATA_KEYS: tuple[str, ...] = (
"schema_version",
"session_id",
"created_at",
"updated_at",
"start_date_iso",
"end_date_iso",
"batch_size",
"stages",
)


def _safe_name(name: str) -> str:
"""Sanitize ``name`` for safe use as a filesystem path component.

Args:
name: Raw string such as a session id, GoPro filename, or batch key.

Returns:
Version of ``name`` with path separators and ``..`` replaced by ``_``.
"""
return name.replace(os.sep, "_").replace("..", "_")


class JsonFileBackend:
"""Store each session as ``{root}/{sanitized_session_id}.json``."""
"""Store each session under ``{root}/{session_id}/`` as a directory of JSON files.

Each :class:`~q2google.state.base.ItemState` and
:class:`~q2google.state.base.BatchState` is written to its own file so that
concurrent writers updating different items never conflict. Session-level metadata
(stages, timestamps) lives in ``meta.json`` and is still subject to last-write-wins
semantics, but stage transitions are sequential in the current orchestrator so this
is not a practical concern.
"""

def __init__(self, root: str | Path) -> None:
"""Create the backend and ensure ``root`` exists.

Args:
root: Directory that will contain ``*.json`` session files.
root: Directory that will contain per-session subdirectories.
"""
self._root = Path(root)
self._root.mkdir(parents=True, exist_ok=True)

def _path(self, session_id: str) -> Path:
"""Resolve a safe filename under ``root`` for ``session_id``.
# ------------------------------------------------------------------
# Path helpers
# ------------------------------------------------------------------

def _session_dir(self, session_id: str) -> Path:
"""Return the session directory path for ``session_id``.

Args:
session_id: External session key (path separators and ``..`` neutralized).
session_id: External session key.

Returns:
Absolute path to the JSON file for this session.
``{root}/{safe(session_id)}/``
"""
safe = session_id.replace(os.sep, "_").replace("..", "_")
return self._root / f"{safe}.json"
return self._root / _safe_name(session_id)

def load(self, session_id: str) -> SessionState | None:
"""Load ``SessionState`` from disk when the JSON file exists.
def _meta_path(self, session_id: str) -> Path:
"""Return the path to the session metadata file.

Args:
session_id: Session key used when saving.
session_id: External session key.

Returns:
Parsed state, or ``None`` if the file is missing.
``{session_dir}/meta.json``
"""
return self._session_dir(session_id) / "meta.json"

Raises:
json.JSONDecodeError: If the file contents are not valid JSON.
def _item_path(self, session_id: str, file_name: str) -> Path:
"""Return the path to a single item file.

Args:
session_id: External session key.
file_name: GoPro logical filename used as the item key.

Returns:
``{session_dir}/items/{safe(file_name)}.json``
"""
path = self._path(session_id)
if not path.is_file():
return None
text = path.read_text(encoding="utf-8")
data = json.loads(text)
return SessionState.from_dict(data)
return self._session_dir(session_id) / "items" / f"{_safe_name(file_name)}.json"

def save(self, state: SessionState) -> None:
"""Write ``state`` atomically via temp file + replace.
def _batch_path(self, session_id: str, batch_key: str) -> Path:
"""Return the path to a single batch file.

Args:
state: Document whose ``session_id`` determines the output filename.
session_id: External session key.
batch_key: String batch index used as the batch key.

Returns:
``{session_dir}/batches/{safe(batch_key)}.json``
"""
return self._session_dir(session_id) / "batches" / f"{_safe_name(batch_key)}.json"

# ------------------------------------------------------------------
# I/O primitive
# ------------------------------------------------------------------

def _atomic_write(self, path: Path, data: dict[str, Any]) -> None:
"""Write ``data`` to ``path`` atomically via a temporary file and :func:`os.replace`.

Args:
path: Destination file path; parent directory is created if absent.
data: JSON-serializable mapping to persist.

Raises:
OSError: On failure to write the temp file or replace the destination.
"""
path = self._path(state.session_id)
path.parent.mkdir(parents=True, exist_ok=True)
payload = json.dumps(state.to_dict(), indent=2, ensure_ascii=False)
payload = json.dumps(data, indent=2, ensure_ascii=False)
tmp = path.with_suffix(f".{os.getpid()}.tmp")
try:
tmp.write_text(payload, encoding="utf-8")
os.replace(tmp, path)
except OSError:
if tmp.is_file():
tmp.unlink(missing_ok=True)
tmp.unlink(missing_ok=True)
raise

# ------------------------------------------------------------------
# SyncStateBackend interface
# ------------------------------------------------------------------

def load(self, session_id: str) -> SessionState | None:
"""Load ``SessionState`` for ``session_id`` from disk.

Falls back to the legacy flat-file format (``{session_id}.json``) when the
session directory does not exist, so sessions written by older versions of this
module remain readable without any migration step.

Args:
session_id: Session key used when saving.

Returns:
Parsed state, or ``None`` if neither the directory nor the legacy file exists.

Raises:
json.JSONDecodeError: If any JSON file on disk is malformed.
"""
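meta_path = self._meta_path(session_id)
legacy = self._root / f"{_safe_name(session_id)}.json"
# Prefer the directory format; the legacy flat file is only a fallback, so a
# stale legacy file cannot shadow a session already saved in the new layout.
if not meta_path.is_file() and legacy.is_file():
return SessionState.from_dict(json.loads(legacy.read_text(encoding="utf-8")))

if not meta_path.is_file():
return None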

session_dir = self._session_dir(session_id)
data: dict[str, Any] = json.loads(meta_path.read_text(encoding="utf-8"))

data["items"] = {
d["file_name"]: d
for p in (session_dir / "items").glob("*.json")
for d in (json.loads(p.read_text(encoding="utf-8")),)
}
data["batches"] = {
str(d["batch_index"]): d
for p in (session_dir / "batches").glob("*.json")
for d in (json.loads(p.read_text(encoding="utf-8")),)
}
return SessionState.from_dict(data)

def save(self, state: SessionState) -> None:
"""Persist ``state`` by writing metadata, items, and batches to separate files.

Each file is written atomically. Every call rewrites all item and batch files held
in ``state``, so concurrent writers stay conflict-free only while each writer's copy
of the items it does not own is current.

Args:
state: Complete session document to store.

Raises:
OSError: On failure to write any individual file.
"""
full = state.to_dict()
meta = {k: full[k] for k in _METADATA_KEYS}
self._atomic_write(self._meta_path(state.session_id), meta)

for file_name, item in full["items"].items():
self._atomic_write(self._item_path(state.session_id, file_name), item)

for batch_key, batch in full["batches"].items():
self._atomic_write(self._batch_path(state.session_id, str(batch_key)), batch)


__all__ = ["JsonFileBackend"]
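
For readers who want to try the temp-file-plus-`os.replace` pattern outside the backend, here is a minimal standalone sketch. It is a variant, not the module's code: `atomic_write_json` is a hypothetical name, and it uses `tempfile.mkstemp` for the temporary file instead of the PID-based suffix above, which gives every call a unique temp name even when several threads in one process write the same destination.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(path: Path, data: dict[str, Any]) -> None:
    """Publish ``data`` at ``path`` so readers never observe a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Create the temp file next to the destination so os.replace stays on
    # one filesystem, which is what makes the rename atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=f".{path.name}.", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(payload)
        os.replace(tmp_name, path)
    except OSError:
        # Best-effort cleanup of the temp file before re-raising.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Because the temp name ends in `.tmp`, a leftover file from a crashed writer would not match the `*.json` globs that `load` uses for `items/` and `batches/`.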