fix(distill): per-day bucketing, UUID7 filenames, content-based timestamps#212
Conversation
…tamps (sageox#211) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: SageOx <ox@sageox.ai>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: SageOx <ox@sageox.ai>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: SageOx <ox@sageox.ai>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe PR refactors the distillation system to use a plan-driven architecture with per-day UUID-based memory files, high-water inference from existing files, and grouped discussion facts handling to address multi-run and missed-day issues in daily/weekly/monthly summarization. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as runDistill
participant StateLoader as loadDistillStateV2
participant HighWater as High-water Inference
participant FileSystem as Filesystem
participant Planner as determineLayers
participant Distiller as distillDaily/Weekly/Monthly
Client->>StateLoader: Load state (projectRoot, tcPath)
StateLoader->>FileSystem: Check if state file exists
alt State file exists
FileSystem-->>StateLoader: Return state
else State file missing
StateLoader->>HighWater: Infer marks from memory
HighWater->>FileSystem: Read daily/weekly/monthly files
FileSystem-->>HighWater: Return file metadata & dates
HighWater-->>StateLoader: Return inferred high-water times
end
StateLoader-->>Client: Return state
Client->>Planner: determineLayers(state, explicit, now)
Planner-->>Client: Return distillPlan (Daily, Weeks, Months)
alt plan.isEmpty() == false
Client->>FileSystem: Read pending observations & discussions
FileSystem-->>Client: Return data since high-water
opt Daily enabled
Client->>Distiller: distillDaily(plan.Daily data)
Distiller->>FileSystem: Write memory/daily/YYYY-MM-DD-{uuid7}.md
end
opt Weeks defined
Client->>Distiller: distillWeekly(plan.Weeks data)
Distiller->>FileSystem: Write memory/weekly/YYYY-W{wk}.md
end
opt Months defined
Client->>Distiller: distillMonthly(plan.Months data)
Distiller->>FileSystem: Write memory/monthly/YYYY-MM.md
end
Client->>StateLoader: Save updated state with new high-water
StateLoader->>FileSystem: Persist to cache directory
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cmd/ox/distill.go (1)
490-493:⚠️ Potential issue | 🟠 Major
--dry-runstill saves state at the end ofrunDistill.The PR objective states "Prevents
--dry-runfrom saving state as a side effect," and line 465 correctly guards the post-discussion-extraction save. However, line 491 unconditionally saves state after all distillation steps, including dry-run mode.🐛 Proposed fix to skip state save during dry-run
// save updated state + if distillDryRun { + return nil + } if err := saveDistillStateV2(projectRoot, state); err != nil { slog.Warn("failed to save distill state", "error", err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ox/distill.go` around lines 490 - 493, runDistill currently always calls saveDistillStateV2(projectRoot, state) at the end (unconditional on dry-run); change it to skip saving when the dry-run flag is set by wrapping that call in the same dry-run check used earlier (e.g., if dryRun / opts.DryRun / cfg.DryRun is true then do not call saveDistillStateV2). Ensure you reference the same dry-run boolean used at the earlier guard around the post-discussion save so the final save uses the identical condition and no state is persisted during --dry-run.
🧹 Nitpick comments (2)
cmd/ox/distill.go (2)
530-557: Week enumeration logic is correct but dense.The loop correctly identifies completed ISO weeks between
lastTimeandnow. The 13-week (91-day) max lookback for fresh states prevents unbounded processing.Consider adding a brief inline comment explaining that
weekEnd.After(cursor) && !weekEnd.After(now)selects only completed weeks where the Sunday has passed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ox/distill.go` around lines 530 - 557, The enumerateWeeks function is correct but a bit dense; add a short inline comment near the conditional that explains the intent of the check `weekEnd.After(cursor) && !weekEnd.After(now)` so future readers understand we only include ISO weeks whose Sunday (weekEnd from isoWeekRange) has passed relative to cursor and is not after now; place this comment next to the conditional inside the for loop (in the enumerateWeeks function, referencing cursor, weekEnd and isoWeekRange) and keep it concise.
761-761:DailyCountonly tracks observations, not discussion facts.Line 761 increments
state.DailyCountbylen(dayObs)but excludeslen(dayFacts). If this counter is meant to track total items distilled, facts should also be counted. If it's intentionally observation-only (for backward compatibility with v1'sObservationCount), consider adding a comment to clarify.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ox/distill.go` at line 761, state.DailyCount is only incremented by len(dayObs) but excludes len(dayFacts); decide intent and fix accordingly: if DailyCount should represent total distilled items, update the increment at the state.DailyCount line to add both len(dayObs) and len(dayFacts) (i.e., state.DailyCount += len(dayObs) + len(dayFacts)); if it is intentionally observation-only for compatibility, add a clear comment next to the state.DailyCount update documenting that it tracks only observations (and mention v1 ObservationCount) so future readers aren’t confused; refer to the variables state.DailyCount, dayObs and dayFacts to locate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/ai/uat/distill-fix-uat.md`:
- Line 28: The docs claim `last_daily` is "end of latest day" but the code
(distill.go, symbol state.LastDaily) sets it to the current processing time via
now.Format(time.RFC3339); update the documentation (distill-state-v2.json
example and the prose) to state that last_daily is the actual processing
timestamp in RFC3339 format (or alternatively change the code to set
state.LastDaily to the end-of-day timestamp if you prefer code behavior), and
mention the exact format (RFC3339) so the doc matches the implementation.
---
Outside diff comments:
In `@cmd/ox/distill.go`:
- Around line 490-493: runDistill currently always calls
saveDistillStateV2(projectRoot, state) at the end (unconditional on dry-run);
change it to skip saving when the dry-run flag is set by wrapping that call in
the same dry-run check used earlier (e.g., if dryRun / opts.DryRun / cfg.DryRun
is true then do not call saveDistillStateV2). Ensure you reference the same
dry-run boolean used at the earlier guard around the post-discussion save so the
final save uses the identical condition and no state is persisted during
--dry-run.
---
Nitpick comments:
In `@cmd/ox/distill.go`:
- Around line 530-557: The enumerateWeeks function is correct but a bit dense;
add a short inline comment near the conditional that explains the intent of the
check `weekEnd.After(cursor) && !weekEnd.After(now)` so future readers
understand we only include ISO weeks whose Sunday (weekEnd from isoWeekRange)
has passed relative to cursor and is not after now; place this comment next to
the conditional inside the for loop (in the enumerateWeeks function, referencing
cursor, weekEnd and isoWeekRange) and keep it concise.
- Line 761: state.DailyCount is only incremented by len(dayObs) but excludes
len(dayFacts); decide intent and fix accordingly: if DailyCount should represent
total distilled items, update the increment at the state.DailyCount line to add
both len(dayObs) and len(dayFacts) (i.e., state.DailyCount += len(dayObs) +
len(dayFacts)); if it is intentionally observation-only for compatibility, add a
clear comment next to the state.DailyCount update documenting that it tracks
only observations (and mention v1 ObservationCount) so future readers aren’t
confused; refer to the variables state.DailyCount, dayObs and dayFacts to locate
the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a759030d-0d77-49d9-b48e-c8aefea1828d
📒 Files selected for processing (5)
cmd/ox/distill.gocmd/ox/distill_discussions.gocmd/ox/distill_discussions_test.gocmd/ox/distill_test.godocs/ai/uat/distill-fix-uat.md
…ount, docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: SageOx <ox@sageox.ai>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
cmd/ox/distill.go (1)
58-60:inferDailyHighWatercomment does not match the return value.The comment says end-of-day, but the function currently returns start-of-day (
time.Parse("2006-01-02", ...)). Please align the comment with behavior to avoid future regressions.Also applies to: 87-91
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ox/distill.go` around lines 58 - 60, The docstring for inferDailyHighWater is inaccurate: the function returns the start-of-day (parsed via time.Parse("2006-01-02", ...)) but the comment says end-of-day; update the comment to state that inferDailyHighWater returns the start-of-day (midnight) UTC for the latest YYYY-MM-DD prefix, or zero time if no files, and make the equivalent wording change in the similar comment block around the other high-water inference (the block covering lines 87-91 / the other infer*HighWater function).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/ox/distill.go`:
- Around line 517-521: The monthly-rollover check uses a 30-day duration and
calls enumerateMonths from lastMonthly which can both miss month transitions and
re-include the already-processed month; change the logic to compare calendar
month/year (e.g., now.Year()/now.Month() vs
lastMonthly.Year()/lastMonthly.Month()) instead of a 30*24*time.Hour threshold,
and call enumerateMonths starting at the first day of the next month (e.g.,
lastMonthly.AddDate(0,1,0) or adjust enumerateMonths to exclude its start) so
you only generate truly new months; apply the same change to the other identical
block around enumerateMonths (lines ~564-585) and keep references to
lastMonthlyTime(), enumerateMonths, and the explicit variable in the updated
logic.
- Around line 194-199: In groupObservationsByDay, skip any observation whose
RecordedAt is the zero (sentinel) time before bucketing to avoid grouping into
"0001-01-01"; add a guard in the loop checking obs.RecordedAt.IsZero() (or
equivalent) and continue to the next observation so only valid RecordedAt values
are formatted and appended to the groups map.
In `@docs/ai/uat/distill-fix-uat.md`:
- Around line 58-61: The test expectation is stricter than current behavior
because inferDailyHighWater (in cmd/ox/distill.go) seeds the high-water mark to
start-of-day which lets pre-existing same-day observations be re-processed;
change inferDailyHighWater so it seeds the high-water mark to exclude
already-present same-day data (e.g., set the initial high-water to the end of
the previous day or to the
start-of-next-day/just-after-the-last-existing-timestamp) so recovery runs only
surface truly new observations for today; update the logic in
inferDailyHighWater and any callers that rely on its returned boundary to ensure
comparisons use a strict greater-than (>) check for new observations.
---
Nitpick comments:
In `@cmd/ox/distill.go`:
- Around line 58-60: The docstring for inferDailyHighWater is inaccurate: the
function returns the start-of-day (parsed via time.Parse("2006-01-02", ...)) but
the comment says end-of-day; update the comment to state that
inferDailyHighWater returns the start-of-day (midnight) UTC for the latest
YYYY-MM-DD prefix, or zero time if no files, and make the equivalent wording
change in the similar comment block around the other high-water inference (the
block covering lines 87-91 / the other infer*HighWater function).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c25652e3-c265-4006-94fa-4e1fe302aac5
📒 Files selected for processing (2)
cmd/ox/distill.godocs/ai/uat/distill-fix-uat.md
…erate fix Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-Authored-By: SageOx <ox@sageox.ai>
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cmd/ox/distill.go (1)
339-340:⚠️ Potential issue | 🟡 MinorReject invalid
--layervalues explicitly instead of silent no-op.A typo in
--layercurrently falls through toplan.isEmpty()and prints “Nothing to distill”, which is misleading for CLI users.Proposed fix
@@ now := time.Now().UTC() + + switch distillLayer { + case "", "daily", "weekly", "monthly": + default: + return fmt.Errorf("invalid --layer %q (expected: daily, weekly, monthly)", distillLayer) + } // determine which layers to run plan := determineLayers(state, distillLayer, now)Also applies to: 505-530
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ox/distill.go` around lines 339 - 340, Validate the --layer flag early instead of letting invalid values fall through to plan.isEmpty(): after flags are parsed (where distillCmd and distillLayer are defined), check if distillLayer is non-empty and is one of the allowed values ("daily","weekly","monthly"); if not, return/exit with a clear error message stating the invalid value and the allowed options. Apply the same validation in the other distill code path that also reads distillLayer (the second distill handler around the other diff block), and avoid treating an invalid layer as "Nothing to distill" by performing this check before creating or evaluating plan.isEmpty().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/ox/distill.go`:
- Around line 475-490: The loop currently returns on first failure and only
saves state once at the end, causing already-completed periods to be
reprocessed; after each successful call to distillDaily, distillWeekly, and
distillMonthly (inside the plan.Daily block and both plan.Weeks and plan.Months
loops) persist the progress by invoking the existing state-save/commit helper
used later in this file (the same logic currently executed at the end of
runDistill) and handle/return any save errors wrapped similarly, so each
completed period is recorded immediately and a later failure won’t cause
re-distill of finished periods.
- Around line 721-774: The loop currently writes and commits each day's memory
(writeMemoryFile, commitMemoryFile) but only updates state.LastDaily after the
whole loop, which causes retries to reprocess already-successful days; fix by
updating the checkpoint per successful day: after a successful write+commit for
a given day, set state.DailyCount += len(dayObs)+len(dayFacts), set
state.LastDaily = now.Format(time.RFC3339) (or a per-day timestamp you prefer)
and state.LastDailyHash = "" and then persist the state immediately using the
repository's existing state-save routine (the same persistence used elsewhere in
the codebase) so that retries will skip already-distilled days.
---
Outside diff comments:
In `@cmd/ox/distill.go`:
- Around line 339-340: Validate the --layer flag early instead of letting
invalid values fall through to plan.isEmpty(): after flags are parsed (where
distillCmd and distillLayer are defined), check if distillLayer is non-empty and
is one of the allowed values ("daily","weekly","monthly"); if not, return/exit
with a clear error message stating the invalid value and the allowed options.
Apply the same validation in the other distill code path that also reads
distillLayer (the second distill handler around the other diff block), and avoid
treating an invalid layer as "Nothing to distill" by performing this check
before creating or evaluating plan.isEmpty().
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0c9b30c0-7579-49f3-a1c1-325fc05f7dc1
📒 Files selected for processing (2)
cmd/ox/distill.godocs/ai/uat/distill-fix-uat.md
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/ai/uat/distill-fix-uat.md
…act files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-Authored-By: SageOx <ox@sageox.ai>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
cmd/ox/distill.go (1)
57-59: AligninferDailyHighWaterdocs with actual behavior.The comment says it returns end-of-day, but the implementation intentionally returns start-of-day. Keeping the contract precise avoids incorrect reuse by future callers.
Proposed fix
-// inferDailyHighWater scans memory/daily/ for the latest YYYY-MM-DD prefix. -// Returns end-of-day UTC for that date, or zero time if no files. +// inferDailyHighWater scans memory/daily/ for the latest YYYY-MM-DD prefix. +// Returns start-of-day UTC for that date, or zero time if no files.Also applies to: 84-90
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ox/distill.go` around lines 57 - 59, The function comment for inferDailyHighWater is inaccurate: it states the function returns the end-of-day UTC but the implementation returns the start-of-day; update the docstring to state it returns the start-of-day UTC for the latest YYYY-MM-DD prefix (or alternatively change inferDailyHighWater to compute end-of-day if that was intended) so the contract matches the actual behavior; reference the inferDailyHighWater function and ensure the same wording is corrected for the related comment block around lines 84-90.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/ox/distill_discussions.go`:
- Around line 243-249: The footer date match returned by factFooterDateRe
(inside parseFactDate) must be validated and rejected if it is a
sentinel/invalid date (e.g. "0001-01-01") before returning it; update
parseFactDate to parse the matched string (from factFooterDateRe) into a date
and only return it when it represents a valid/non-sentinel day (for example year
>= 2 or after a sensible epoch), otherwise continue to fallback logic
(factFilenameDateRe) or return empty; ensure you reference the
factFooterDateRe/factFilenameDateRe match handling so the invalid footer
timestamp is not used for bucketing.
In `@cmd/ox/distill.go`:
- Around line 246-249: Replace the silent-continue on read errors with a
fail-fast error return so unreadable distilled inputs don't get dropped: locate
the os.ReadFile call that assigns data, err (the line using
filepath.Join(dailyDir, f.name)) and the analogous block around the 306-308
region, and change the error branch to return/wrap the error (e.g. return
fmt.Errorf("reading distilled input %s: %w", filepath.Join(dailyDir, f.name),
err)) or propagate it up through the enclosing function instead of using
continue; ensure any callers handle the returned error appropriately.
---
Nitpick comments:
In `@cmd/ox/distill.go`:
- Around line 57-59: The function comment for inferDailyHighWater is inaccurate:
it states the function returns the end-of-day UTC but the implementation returns
the start-of-day; update the docstring to state it returns the start-of-day UTC
for the latest YYYY-MM-DD prefix (or alternatively change inferDailyHighWater to
compute end-of-day if that was intended) so the contract matches the actual
behavior; reference the inferDailyHighWater function and ensure the same wording
is corrected for the related comment block around lines 84-90.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 14073526-a176-4187-b7b8-ee0a206415a2
📒 Files selected for processing (2)
cmd/ox/distill.gocmd/ox/distill_discussions.go
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-Authored-By: SageOx <ox@sageox.ai>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-Authored-By: SageOx <ox@sageox.ai>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Summary
Fixes #211 —
ox distillhad several logical issues with how observations and discussions are assigned to daily summary files.Problems fixed:
YYYY-MM-DD-{uuid7}.md)RecordedAtdate, producing one LLM call + one file per day(created YYYY-MM-DD)or filename prefixenumerateWeeks/enumerateMonths--dry-runwould write the state file as a side effect, breaking subsequent runsChanges
cmd/ox/distill.goinferDailyHighWater,inferWeeklyHighWater,inferMonthlyHighWater— scan existing memory files to set high-water marks on fresh clonesendOfDay,endOfMonth,isoWeekRange,groupObservationsByDay,readDailyFilesForDateRange,readWeeklyFilesForMonthdistillPlan,isoWeek— structured plan with per-period enumerationdetermineLayersreturnsdistillPlanwith explicit week/month lists instead of[]stringdistillDailyloops per-day with UUID7 filenamesdistillWeeklytakes a specificisoWeek, reads only that week's daily filesdistillMonthlytakes a specific month, reads only overlapping weekly filesloadDistillStateV2acceptstcPathparameter; infers high-water marks when no state existsLastDailyuses actual processing time (not end-of-day) to support intra-day re-runscmd/ox/distill_discussions.godiscussionFactEntrywith parsed datereadPendingDiscussionFactsreturnsmap[string][]discussionFactEntrygrouped by dateFlow diagram
graph TD A[runDistill] --> B[loadDistillStateV2] B -->|v2 exists| C[Use existing state] B -->|v1 exists| D[Migrate v1 → v2] B -->|no state| E[Infer high-water from files] A --> F[determineLayers] F --> G[distillPlan] G -->|Daily=true| H[distillDaily] G -->|Weeks list| I[distillWeekly per week] G -->|Months list| J[distillMonthly per month] H --> K[groupObservationsByDay] K --> L[Per-day loop] L --> M[LLM call per day] M --> N["Write YYYY-MM-DD-{uuid7}.md"] I --> O[readDailyFilesForDateRange] O --> P[LLM call per week] P --> Q[Write YYYY-WXX.md] J --> R[readWeeklyFilesForMonth] R --> S[LLM call per month] S --> T[Write YYYY-MM.md]Test plan
distillPlanreturn type andloadDistillStateV2signaturego vetandgo buildcleancmd/ox/tests passdocs/ai/uat/distill-fix-uat.md🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
Tests