Skip to content

Track update detection timestamps on packages#347

Merged
kaste merged 15 commits intomainfrom
update-latency-data
Mar 23, 2026
Merged

Track update detection timestamps on packages#347
kaste merged 15 commits intomainfrom
update-latency-data

Conversation

@kaste
Copy link
Copy Markdown
Collaborator

@kaste kaste commented Mar 23, 2026

This may be used in the status view; first we collect the data.

kaste added 15 commits March 22, 2026 20:41
Add `update_detected` to workspace package entries and set it when a
package's `last_modified` value changes across successful crawls.

- `update_detected` is not set on first discovery
- `update_detected` is unset on the next run
After writing results, collect packages whose `update_detected_at`
matches the run timestamp and print a readable summary line in the crawl
output.

Also move Oxford-list name formatting into `scripts._utils` and reuse it
from both `crawl.py` and `crawl_libraries.py`.
Add an early `NOW_TS` export in the crawl workflow job and reuse that
frozen value when building notes and invoking `scripts.collect_logs`.

This aligns run artifacts and log timestamps to one shared run marker,
and prepares `scripts.crawl` to read the same `NOW_TS` value.
Update `scripts.crawl` to honor `NOW_TS` first to freeze the time during
runtime.

`NOW_TS` is accepted as Unix epoch seconds.
Make `--timestamp` optional for collect_logs.

Timestamp precedence is now:
1) explicit `--timestamp`
2) `NOW_TS` environment variable

If neither is available, keep the previous failure behavior and raise
`collect_logs: missing --timestamp`.

Add tests for NOW_TS fallback, explicit-arg precedence over NOW_TS, and
missing-timestamp failure.
Rename `forced_timestamp` to `runtime_ts` in collect_logs for clearer
naming aligned with run-level timestamp semantics.
Remove `--timestamp` from the collect_logs invocation in crawl.yml.
collect_logs now resolves the run timestamp from `NOW_TS` when no
explicit timestamp argument is provided.
Add an expanded comment block above the NOW_TS export step in crawl.yml.

The comment explains why the timestamp is frozen once per run and calls
out that both scripts/crawl.py and scripts/collect_logs.py consume the
same run-level timestamp.
Extend collect_logs with optional --workspace support so each new log
entry can include structured package update detections for the run.

When a workspace path is provided, collect_logs now derives
`found_updates` by matching package `update_detected` timestamps against
the frozen run timestamp, emits deterministic name ordering, and keeps
`published_at` optional when `last_modified` is missing.

Also wire the crawl workflow to pass --workspace so production logs
include found_updates, and add focused tests for matching, ordering,
empty lists, dedupe behavior, and timestamp precedence.
Make collect_logs.retention_cutoff require a reference datetime instead
of accepting an optional argument.

The helper is only called with an explicit reference at its sole call
site, so the optional branch was unnecessary.
Inline the retention cutoff calculation in update_logs since the helper
only wrapped a single trivial expression.

This removes one indirection while keeping behavior unchanged.
Replace collect_logs' `now_utc()` helper with a `now_ts()` helper that
matches crawl's NOW_TS-aware behavior.

This keeps retention pruning aligned with the same frozen run timestamp
mechanism when NOW_TS is exported.

Also update collect_logs tests to patch `now_ts` instead of `now_utc`.
Add typed structures for collect_logs output entries:
- LogEntry
- FoundUpdateEntry

This makes the logs.json schema explicit in code and clarifies the
optional found_updates payload shape.
Align collect_logs with crawl invariants: if `update_detected` matches
this run, `last_modified` must be present too.

`found_updates.published_at` is hence a required string.
- actions/cache v4 -> v5
- actions/checkout v4 -> v6
- actions/setup-python v5 -> v6
- actions/upload-artifact v4 -> v7
@kaste kaste merged commit 328a617 into main Mar 23, 2026
3 checks passed
@kaste kaste deleted the update-latency-data branch March 23, 2026 19:26
@kaste
Copy link
Copy Markdown
Collaborator Author

kaste commented Mar 23, 2026

"update detection timestamps" that means record the timestamp when we discovered an update.

a) we print out in the release notes if we found any updates which is nice to have
b) we can compute how long it took from package release to crawler detected it which is usually a base metric for a crawler (the lower the better within a given 💵 budget)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant