Skip to content

feat(link): auto-group split archives into one package (task 31)#144

Merged
mpiton merged 5 commits intomainfrom
feat/task-31-split-archives
May 3, 2026
Merged

feat(link): auto-group split archives into one package (task 31)#144
mpiton merged 5 commits intomainfrom
feat/task-31-split-archives

Conversation

@mpiton
Copy link
Copy Markdown
Owner

@mpiton mpiton commented May 3, 2026

Summary

Wires split-archive detection and idempotent grouping in Link Grabber resolve flow. Clusters URLs by base name + format into distinct Packages with source_type=SplitArchive. Format-namespaced external_id prevents RAR/ZIP collision.

Why

PRD-v2 §P1.12 requirement. Closes the auto-grouping feature for multi-part archives (mirrors existing playlist grouper pattern).

Changes

  • New SplitArchiveGrouper service + GroupSplitArchivesCommand handler. Detects 5 archive formats (modern/legacy RAR, 7z, ZIP, tar variants) via regex matchers.
  • New SplitArchiveIncomplete domain event emitted when parts are missing.
  • Frontend types SplitArchiveLinkInput/SplitArchiveGroupResult for IPC bridge.
  • Shared group_lock module eliminates duplication between PlaylistGrouper and SplitArchiveGrouper.
  • DoS cap at MAX_LINKS=500 per call.
  • Format-namespaced external_id (split-archive:part-rar:base vs split-archive:zip:base) prevents same-base collision across archive types.

Testing

  • 26 service unit tests (detection fixtures + clustering + missing parts + DoS cap + concurrent reuse).
  • 3 handler integration tests (idempotency + missing repo error).
  • Full suite: 1399 Rust passed, 659 vitest passed.

```bash
cargo test --workspace
npx vitest run
```

Related Issues

Checklist

  • Tests added and passing locally
  • CHANGELOG.md updated
  • No secrets, debug prints, or commented-out code
  • Self-reviewed + /simplify pass
  • CI green

Summary by cubic

Auto-group split-archive URLs into one package per base name and format in Link Grabber. Prevents duplicate packages on re-resolve and surfaces missing parts before extraction. Aligns with PRD-v2 §P1.12 (Linear task 31).

  • New Features

    • Added SplitArchiveGrouper + GroupSplitArchivesCommand to detect modern/legacy RAR (legacy header .rar as part 0), 7z, ZIP, and tar.{gz,bz2,xz}; groups by base + format.
    • Packages use source_type = SplitArchive and external_id = "split-archive:{format_tag}:{base}" to avoid cross-format collisions.
    • Emits SplitArchiveIncomplete → frontend split-archive-incomplete.
    • New IPC link_group_split_archives with SplitArchiveLinkInput and SplitArchiveGroupResult for preview (sorted URLs, missing parts).
    • Caps: MAX_LINKS = 500, MAX_PART_INDEX = 10_000.
  • Refactors

    • Shared group_lock for grouper find-then-save; used by PlaylistGrouper and SplitArchiveGrouper. Lock released before event publish to reduce contention.
    • Hardened grouping: gate on distinct part numbers (not raw link count), reject absurd part indices, and use defensive checked_add in legacy RAR detection.

Written for commit 144c2ba. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Automatic grouping of split-archive downloads into packages (RAR—including legacy terminal .rar as part 0—7z, zip, tar). Batch preview of grouped packages (up to 500 links) with created-flag, ordered URLs, and missing-part indicators; grouping requires a minimum number of distinct parts.
  • Bug Fixes / Reliability

    • Better safeguards for very large part indices and malformed inputs; improved concurrency handling to avoid duplicate packages and release locks before publishing events.
  • Frontend

    • IPC and types added for submitting links and receiving grouping results and "split-archive-incomplete" notifications.

mpiton added 2 commits May 3, 2026 17:53
Cluster Link-Grabber URLs by detected archive base name (`*.partNN.rar`,
`*.rNN`, `*.7z.NNN`, `*.zip.NNN`, `*.tar.{gz,bz2,xz}.NNN`) and create one
`Package` per cluster with `source_type = SplitArchive` and
`external_id = "split-archive:{base}"` for idempotent re-resolve. New
`GroupSplitArchivesCommand` handler + `link_group_split_archives` Tauri
IPC mirror the playlist grouper flow.

Gaps in the part numbering emit a new `DomainEvent::SplitArchiveIncomplete
{ package_id, base_name, missing_parts }` event (forwarded to the
frontend as `split-archive-incomplete`) so the UI can warn the user
before extraction blocks.

Frontend `SplitArchiveLinkInput` / `SplitArchiveGroupResult` types added
in `src/types/media.ts`. 25 service unit tests (matcher fixtures +
grouping integration) + 3 handler tests cover the contract.

Refs PRD-v2 §P1.12, PRD §6.3.
Code-review pass on the task-31 split-archive grouper:

- Format-namespaced `external_id` (`split-archive:{format_tag}:{base}`)
  so a RAR set and a ZIP set sharing a base name produce two distinct
  packages instead of silently collapsing under one row. Replaces the
  previous base-only key documented as "acceptable" in the test that
  has now been rewritten to assert the correct behaviour.
- New `application/services/group_lock` module shared by
  `PlaylistGrouper` and `SplitArchiveGrouper`, removing a duplicated
  OnceLock + poisoned-mutex-recovery pair.
- `group_one_base` now releases the grouper lock before publishing
  `PackageCreated` / `SplitArchiveIncomplete` events so synchronous
  subscribers cannot block other concurrent grouping calls. Same
  refactor also collapses the duplicated reuse-vs-created branching
  into a single tail block.
- `MAX_LINKS = 500` cap on `group_all` mirrors `MAX_URLS` in
  `resolve_links` so a runaway IPC payload cannot allocate unbounded
  cluster state. New `test_group_all_caps_link_count_to_avoid_dos`
  test covers the path.
- `SplitArchiveFormat`, `DetectedPart`, `detect_from_filename`
  demoted to `pub(crate)`: they were exported but only consumed
  inside the module. Reduces the public API surface.
@github-actions github-actions Bot added documentation Improvements or additions to documentation rust frontend labels May 3, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7e775721-7c0f-4d90-92e0-a410f6a4666a

📥 Commits

Reviewing files that changed from the base of the PR and between 8c11130 and 144c2ba.

📒 Files selected for processing (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.md

📝 Walkthrough

Walkthrough

Adds a SplitArchive grouper service, command, and Tauri IPC to cluster resolved archive parts into idempotent Packages, detect missing numbered parts and emit DomainEvent::SplitArchiveIncomplete; adds a shared process-wide grouper lock and narrows lock scope to lookup+save before publishing events.

Changes

Split-Archive Auto-Grouping Feature

Layer / File(s) Summary
Domain Event
src-tauri/src/domain/event.rs
Adds DomainEvent::SplitArchiveIncomplete { package_id, base_name, missing_parts }.
Shared Grouping Lock
src-tauri/src/application/services/group_lock.rs
New module exposing acquire_grouper_lock() via a process-wide OnceLock<Mutex<()>> with poisoned-mutex recovery; lock scope documented to cover lookup+save only.
Service Index & Exports
src-tauri/src/application/services/mod.rs
Adds group_lock and split_archive_grouper modules; re-exports SplitArchiveGrouper, SplitArchiveLink, SplitArchiveGroupResult.
Split-Archive Data, Detection & Limits
src-tauri/src/application/services/split_archive_grouper.rs
Introduces SplitArchiveFormat, DetectedPart, SplitArchiveLink, SplitArchiveGroupResult, MAX_LINKS, MAX_PART_INDEX, detection matchers (modern/legacy RAR, 7z, zip, tarballs), legacy .rar header treated as part 0, and format-specific helpers.
Core Grouper Implementation
src-tauri/src/application/services/split_archive_grouper.rs
Implements SplitArchiveGrouper::new, group_all, group_one_base, compute_missing_parts; enforces MAX_LINKS, trims/skips empty filenames, clusters by (base, format), requires minimum distinct part numbers per cluster, computes missing-part suffixes, persists idempotently under the grouper lock with race-retry, publishes PackageCreated on new create, and publishes SplitArchiveIncomplete when gaps exist.
Grouper Tests
src-tauri/src/application/services/split_archive_grouper.rs
Extensive unit tests for detection, suffix rendering, clustering, distinct-part gating, idempotency, incomplete-event emission, legacy-header behavior, and MAX_LINKS rejection.
Playlist Grouper Integration
src-tauri/src/application/services/playlist_grouper.rs
Replaces local lock helpers with acquire_grouper_lock() and narrows lock scope to lookup+save; releases lock before publishing events.
Command Type & Handler
src-tauri/src/application/commands/mod.rs, src-tauri/src/application/commands/group_split_archives.rs
Adds GroupSplitArchivesCommand and CommandBus::handle_group_split_archives which builds the grouper, reads repo from context (validation error if missing), computes now_ms, invokes group_all, and includes handler unit tests.
Tauri IPC & DTOs
src-tauri/src/adapters/driving/tauri_ipc.rs, src-tauri/src/lib.rs
Adds link_group_split_archives IPC command and DTOs SplitArchiveLinkInputDto / SplitArchiveGroupResultDto; maps DTO↔domain, returns camelCase JSON, maps AppError::Validation messages to string responses, logs other failures, registers and re-exports the command.
Event Bridge & Logging
src-tauri/src/adapters/driven/event/tauri_bridge.rs, src-tauri/src/adapters/driven/logging/download_log_bridge.rs
Tauri bridge maps SplitArchiveIncomplete"split-archive-incomplete" and serializes packageId, baseName, missingParts; download logging bridge treats the event as ignored/no-op.
Frontend Types
src/types/media.ts
Adds SplitArchiveLinkInput and SplitArchiveGroupResult TypeScript interfaces to mirror IPC DTOs.
Changelog
CHANGELOG.md
Documents the auto-grouping feature and the new shared grouper lock and lock scoping behavior.

Sequence Diagram

sequenceDiagram
    actor Frontend
    participant IPC as "Tauri IPC\nlink_group_split_archives"
    participant Bus as "CommandBus\nhandle_group_split_archives"
    participant Grouper as "SplitArchiveGrouper\ngroup_all"
    participant Lock as "Shared Grouper Lock"
    participant Repo as "PackageRepository"
    participant EventBus as "EventBus"

    Frontend->>IPC: invoke link_group_split_archives(links)
    IPC->>Bus: GroupSplitArchivesCommand { links }
    Bus->>Grouper: group_all(links, now_ms)

    Grouper->>Grouper: detect_from_filename per link
    Grouper->>Grouper: cluster by (base, format)

    loop for each cluster
        Grouper->>Lock: acquire_grouper_lock()
        Lock->>Grouper: guard
        Grouper->>Repo: find_by_external_id(external_id)
        alt exists
            Repo-->>Grouper: Some(package)
        else not found
            Grouper->>Repo: save(package)
            alt save succeeds
                Repo-->>Grouper: saved package
            else save race/fails
                Grouper->>Repo: find_by_external_id(retry)
                Repo-->>Grouper: winner package
            end
        end
        Grouper->>Grouper: compute_missing_parts
        alt created
            Grouper->>EventBus: DomainEvent::PackageCreated
        end
        alt missing parts
            Grouper->>EventBus: DomainEvent::SplitArchiveIncomplete
        end
        Grouper->>Lock: release guard
    end

    Grouper-->>IPC: Vec<SplitArchiveGroupResult>
    IPC-->>Frontend: Vec<SplitArchiveGroupResultDto> (camelCase JSON)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Possibly related PRs

"🐰
I sniffed filenames beneath the moon,
I numbered tails and hummed a tune.
I lock my paw, then save with care,
Missing hops I call out fair—
Hop, grouped parts now safe and soon."

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning The PR implements split-archive auto-grouping but issue #31 requires implementing missing keyboard shortcuts (Ctrl+F, Ctrl+N, Ctrl+A, Space, Delete, Ctrl+Shift+P, Escape). No keyboard shortcut changes are present in the changeset. Either rebase this PR to address only split-archive grouping without referencing issue #31, or add the missing keyboard shortcut implementations required by issue #31.
Out of Scope Changes check ⚠️ Warning The PR introduces split-archive grouping functionality, but this is out of scope for issue #31 which requires global keyboard shortcut implementations. The changeset does not match the linked issue's requirements. Remove the reference to issue #31 or ensure the PR contains the keyboard shortcut implementations specified in the issue.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(link): auto-group split archives into one package (task 31)' accurately summarizes the main change: a new split-archive auto-grouping feature.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/task-31-split-archives

Review rate limit: 3/5 reviews remaining, refill in 21 minutes and 40 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b5dc387cd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// Match `name.r00`, `name.r01`, …. The trailing digits are 2+ wide so
// we don't accidentally pick up names that just happen to end in
// `.r1` (which would be an unusual archive convention anyway).
let re = RE.get_or_init(|| Regex::new(r"^(?P<base>.+?)\.r(?P<num>\d{2,})$").unwrap());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Detect legacy RAR header volume when grouping

Legacy multi-volume RAR sets commonly include the first volume as name.rar followed by name.r00, name.r01, etc., but this matcher only accepts \.r\d{2,} names, so the .rar header volume is never counted. In batches like movie.rar + movie.r00, the cluster ends up with a single detected part and is dropped by the MIN_PARTS_TO_GROUP check, so no package is created even though the archive is actually split. This breaks auto-grouping for a valid legacy format and can leave extraction flow ungrouped for users with older RAR naming.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chatgpt-codex-connector

Legacy multi-volume RAR sets commonly include the first volume as name.rar followed by name.r00, name.r01, etc., but this matcher only accepts \.r\d{2,} names, so the .rar header volume is never counted.

Caught it. Fixed in b642345. Added match_legacy_rar_header returning part 0 for terminal .rar (ordered after the more specific matchers in detect_from_filename so name.partNN.rar and name.rNN still win), and extended compute_missing_parts to start at 0 for LegacyRar so a missing header is reported as rar instead of being silently absent. New tests test_group_all_legacy_rar_includes_terminal_header, test_group_all_legacy_rar_reports_missing_header, test_group_all_drops_lone_legacy_rar_header cover the three cases.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src-tauri/src/application/services/playlist_grouper.rs (1)

103-151: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Shared grouper lock is held too long (includes event publishing).

The guard created on Line 103 stays alive until function return, so PackageCreated publish runs under the global lock. This broadens contention across both groupers and can block unrelated grouping calls.

💡 Suggested lock scoping (match split-archive grouper behavior)
-        let _guard = acquire_grouper_lock();
-
-        if let Some(existing) = self.repo.find_by_external_id(trimmed_id)? {
-            return Ok(PlaylistGroupResult {
-                package_id: existing.id().clone(),
-                package_name: existing.name().to_string(),
-                created: false,
-                item_count: group.item_count,
-            });
-        }
+        let (package_id, package_name, created) = {
+            let _guard = acquire_grouper_lock();
+
+            if let Some(existing) = self.repo.find_by_external_id(trimmed_id)? {
+                (
+                    existing.id().clone(),
+                    existing.name().to_string(),
+                    false,
+                )
+            } else {
+                let trimmed_name = group.playlist_name.trim();
+                let name = if trimmed_name.is_empty() {
+                    fallback_name()
+                } else {
+                    trimmed_name.to_string()
+                };
+
+                let package_id = PackageId::new(Uuid::new_v4().to_string());
+                let mut package = Package::new(
+                    package_id.clone(),
+                    name.clone(),
+                    PackageSourceType::Playlist,
+                    created_at_ms,
+                );
+                package.set_external_id(Some(trimmed_id.to_string()));
+
+                if let Err(save_err) = self.repo.save(&package) {
+                    if let Some(existing) = self.repo.find_by_external_id(trimmed_id)? {
+                        (
+                            existing.id().clone(),
+                            existing.name().to_string(),
+                            false,
+                        )
+                    } else {
+                        return Err(save_err.into());
+                    }
+                } else {
+                    (package_id, name, true)
+                }
+            }
+        };
@@
-        self.event_bus.publish(DomainEvent::PackageCreated {
-            id: package_id.clone(),
-            name: name.clone(),
-        });
+        if created {
+            self.event_bus.publish(DomainEvent::PackageCreated {
+                id: package_id.clone(),
+                name: package_name.clone(),
+            });
+        }
 
         Ok(PlaylistGroupResult {
             package_id,
-            package_name: name,
-            created: true,
+            package_name,
+            created,
             item_count: group.item_count,
         })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src-tauri/src/application/services/playlist_grouper.rs` around lines 103 -
151, The function holds the guard from acquire_grouper_lock() across the entire
save-and-return path so self.event_bus.publish(DomainEvent::PackageCreated { ...
}) runs while the global grouper lock is held; to fix, limit the lock scope by
dropping the guard before publishing the event (either wrap the
find/save/re-query logic in its own block or call std::mem::drop(_guard)
immediately after the successful save and re-query checks), ensuring you still
return the correct PlaylistGroupResult when reusing an existing package via
self.repo.find_by_external_id(...) and only publish the PackageCreated event
(via self.event_bus.publish) after the lock is released.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src-tauri/src/application/services/split_archive_grouper.rs`:
- Around line 258-277: The legacy RAR header file (`base.rar`) is never
recognized because match_legacy_rar only captures `.rNN` and
compute_missing_parts iterates parts from 1..=max; update detection so the
terminal `.rar` is treated as part 0 and included in completeness checks. Modify
match_legacy_rar (or add a similar detection path) to accept an optional
trailing `.rar` and return DetectedPart with part_num 0 when the filename ends
with `.rar` (keep existing behavior for `.rNN` mapping to raw_num+1), and ensure
compute_missing_parts (and any other logic referenced at lines ~424-437)
considers the 0..=max range when checking for missing parts so the `.rar` header
can be grouped and reported as missing when appropriate.

---

Outside diff comments:
In `@src-tauri/src/application/services/playlist_grouper.rs`:
- Around line 103-151: The function holds the guard from acquire_grouper_lock()
across the entire save-and-return path so
self.event_bus.publish(DomainEvent::PackageCreated { ... }) runs while the
global grouper lock is held; to fix, limit the lock scope by dropping the guard
before publishing the event (either wrap the find/save/re-query logic in its own
block or call std::mem::drop(_guard) immediately after the successful save and
re-query checks), ensuring you still return the correct PlaylistGroupResult when
reusing an existing package via self.repo.find_by_external_id(...) and only
publish the PackageCreated event (via self.event_bus.publish) after the lock is
released.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ce3630a9-6ee9-44c1-a9a8-19d0c1d0403c

📥 Commits

Reviewing files that changed from the base of the PR and between 9caa21b and 0b5dc38.

📒 Files selected for processing (13)
  • CHANGELOG.md
  • src-tauri/src/adapters/driven/event/tauri_bridge.rs
  • src-tauri/src/adapters/driven/logging/download_log_bridge.rs
  • src-tauri/src/adapters/driving/tauri_ipc.rs
  • src-tauri/src/application/commands/group_split_archives.rs
  • src-tauri/src/application/commands/mod.rs
  • src-tauri/src/application/services/group_lock.rs
  • src-tauri/src/application/services/mod.rs
  • src-tauri/src/application/services/playlist_grouper.rs
  • src-tauri/src/application/services/split_archive_grouper.rs
  • src-tauri/src/domain/event.rs
  • src-tauri/src/lib.rs
  • src/types/media.ts

Comment thread src-tauri/src/application/services/split_archive_grouper.rs
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 13 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src-tauri/src/application/services/split_archive_grouper.rs">

<violation number="1" location="src-tauri/src/application/services/split_archive_grouper.rs:263">
P1: Legacy RAR detection omits the terminal `.rar` header file, so valid sets like `name.rar` + `name.r00` are not grouped.</violation>
</file>

<file name="src-tauri/src/application/services/playlist_grouper.rs">

<violation number="1" location="src-tauri/src/application/services/playlist_grouper.rs:103">
P2: Release the shared grouper lock before publishing `PackageCreated`; it is currently held across `event_bus.publish`, which can block or deadlock re-entrant grouping paths.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread src-tauri/src/application/services/split_archive_grouper.rs Outdated
Comment thread src-tauri/src/application/services/playlist_grouper.rs Outdated
@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq Bot commented May 3, 2026

Merging this PR will improve performance by 19.44%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 4 improved benchmarks
✅ 22 untouched benchmarks

Performance Changes

Benchmark BASE HEAD Efficiency
normalize_max_concurrent 179.2 ns 150 ns +19.44%
complex_with_port_and_path 3.1 µs 2.6 µs +18.01%
with_userinfo 3 µs 2.6 µs +15.02%
reject_invalid 581.1 ns 493.6 ns +17.73%

Comparing feat/task-31-split-archives (144c2ba) with main (9caa21b)

Open in CodSpeed

…blish

Legacy multi-volume RAR sets ship the first volume as `name.rar`
followed by `name.r00`, `name.r01`, … The previous matcher only
captured `.rNN`, so a valid `name.rar + name.r00` set produced a
single-part cluster and was dropped by MIN_PARTS_TO_GROUP. Add
`match_legacy_rar_header` (part 0), order it after the other matchers
so the more specific patterns still win, and extend
`compute_missing_parts` to walk from 0 for LegacyRar so a missing
header is reported as `rar` instead of silently dropped.

PlaylistGrouper used to hold the shared grouper lock across
`event_bus.publish`, which can block re-entrant grouping paths and
matches the bug already fixed in SplitArchiveGrouper. Scope the lock
to the find-then-save window and publish after the guard drops.

Addresses CodeRabbit, Codex, and Cubic review feedback on PR #144.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b642345201

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


let mut out = Vec::new();
for ((base, format), mut parts) in clusters {
if parts.len() < MIN_PARTS_TO_GROUP {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require distinct part numbers before grouping

MIN_PARTS_TO_GROUP is checked against raw link count, not unique part numbers. If a batch contains mirrors of the same file (for example two URLs for movie.part01.rar, or two backup.rar links), parts.len() reaches 2 and this path creates a split-archive package even though only one actual part is present. That violates the singleton guard and can produce misleading “complete” groups (missing_parts stays empty when the duplicated part is also the max). Deduplicate by part_num (or gate on distinct part count) before applying the minimum-parts check.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chatgpt-codex-connector

MIN_PARTS_TO_GROUP is checked against raw link count, not unique part numbers. If a batch contains mirrors of the same file [...] parts.len() reaches 2 and this path creates a split-archive package even though only one actual part is present.

Fixed in 8c11130. The cluster loop in group_all now collects part numbers into a HashSet and gates on distinct_parts.len() < MIN_PARTS_TO_GROUP, so two mirrors of movie.part01.rar no longer satisfy the threshold. Two new tests cover both directions: test_group_all_distinct_parts_required_for_min_threshold (mirrors → no package) and test_group_all_distinct_parts_threshold_groups_two_real_volumes (sanity check on the happy path).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src-tauri/src/application/services/split_archive_grouper.rs`:
- Around line 351-356: The check that gates creating a split-archive package
currently uses parts.len() which counts raw links (so duplicates/mirrors of the
same part_num inflate the count); change the gating in the cluster loop to count
distinct part numbers instead (e.g. compute the number of unique part_num values
from parts before continuing) and use that unique count to compare to
MIN_PARTS_TO_GROUP; ensure you still sort/handle parts for group_one_base as
before and call self.group_one_base(&base, format, &parts, created_at_ms) only
when the unique part_num count meets the threshold.
- Around line 203-206: Introduce a MAX_PART_INDEX constant (e.g. 10_000) and
enforce it when parsing part indices: in match_part_rar and match_legacy_rar
validate the parsed part_num (from caps.name("num")) is <= MAX_PART_INDEX and
reject otherwise; for legacy detection also ensure the regex/parse path still
rejects indices > MAX_PART_INDEX. In compute_missing_parts replace raw_num + 1
with checked_add(1) and bail/return an error or None if the add would overflow
or if any parsed index > MAX_PART_INDEX to avoid huge iteration ranges. Finally,
in group_all change the grouping threshold check to count distinct part numbers
(use a HashSet of part_num) rather than parts.len() so duplicate URLs for the
same part don’t falsely satisfy MIN_PARTS_TO_GROUP.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d13df129-f807-4066-9952-d5cc3dacac51

📥 Commits

Reviewing files that changed from the base of the PR and between 0b5dc38 and b642345.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • src-tauri/src/application/services/playlist_grouper.rs
  • src-tauri/src/application/services/split_archive_grouper.rs
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

Comment thread src-tauri/src/application/services/split_archive_grouper.rs
Comment thread src-tauri/src/application/services/split_archive_grouper.rs
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src-tauri/src/application/services/split_archive_grouper.rs">

<violation number="1" location="src-tauri/src/application/services/split_archive_grouper.rs:195">
P2: Matching all `.rar` files as legacy split headers can misclassify regular single-volume RAR links as split archives when duplicates/mirrors are present.</violation>

<violation number="2" location="src-tauri/src/application/services/split_archive_grouper.rs:195">
P2: Require distinct part numbers before applying the minimum-parts gate; otherwise duplicate mirrors of the same detected part can create a split-archive package with only one actual volume.</violation>

<violation number="3" location="src-tauri/src/application/services/split_archive_grouper.rs:460">
P1: Add an upper bound for accepted part indices before gap scanning; iterating `start..=max` with attacker-controlled large part numbers can cause very large CPU loops and stall grouping.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread src-tauri/src/application/services/split_archive_grouper.rs
if let Some(part) = match_legacy_rar(file_name) {
return Some(part);
}
if let Some(part) = match_legacy_rar_header(file_name) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Matching all .rar files as legacy split headers can misclassify regular single-volume RAR links as split archives when duplicates/mirrors are present.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src-tauri/src/application/services/split_archive_grouper.rs, line 195:

<comment>Matching all `.rar` files as legacy split headers can misclassify regular single-volume RAR links as split archives when duplicates/mirrors are present.</comment>

<file context>
@@ -192,6 +192,9 @@ pub(crate) fn detect_from_filename(file_name: &str) -> Option<DetectedPart> {
     if let Some(part) = match_legacy_rar(file_name) {
         return Some(part);
     }
+    if let Some(part) = match_legacy_rar_header(file_name) {
+        return Some(part);
+    }
</file context>

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cubic-dev-ai

P2: Matching all .rar files as legacy split headers can misclassify regular single-volume RAR links as split archives when duplicates/mirrors are present.

Resolved indirectly in 8c11130. The misclassification path required the singleton-guard bypass (two mirrors of lonely.rar slipping through MIN_PARTS_TO_GROUP), which is now closed by the distinct-part_num gate: two mirrors of lonely.rar produce one distinct part (0) and fall below the threshold. Existing test test_group_all_drops_lone_legacy_rar_header still passes; the new test_group_all_distinct_parts_required_for_min_threshold proves the duplicate path is also closed. A standalone .rar (single link or mirrored) stays out of the package list.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I've saved this as a new learning to improve future reviews.

Comment thread src-tauri/src/application/services/split_archive_grouper.rs Outdated
Three converging review findings on PR #144:

1. `MIN_PARTS_TO_GROUP` was checked against raw link count, so two
   mirrors of `movie.part01.rar` slipped through the singleton guard
   and produced a misleading "complete" group with `missing_parts`
   empty. Replace `parts.len()` with the distinct `part_num` count.

2. The modern RAR (`\d+`) and legacy RAR (`\d{2,}`) regexes accepted
   unbounded suffixes, letting a hostile `name.part1000000000.rar`
   force `compute_missing_parts` into a multi-billion-step iteration.
   Add `MAX_PART_INDEX = 10_000` and reject anything above it inside
   `detect_from_filename` (after the matcher cascade so a rejected
   index does not silently cascade into `match_legacy_rar_header`).

3. Defensive `checked_add` on `raw_num + 1` in `match_legacy_rar` so
   the (now-impossible after the cap) `u32` overflow path is closed.

Addresses CodeRabbit, Codex, and Cubic feedback.
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="CHANGELOG.md">

<violation number="1" location="CHANGELOG.md:12">
P3: The changelog test-count claim is inaccurate: split-archive service tests are 31, not 35.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread CHANGELOG.md Outdated
@mpiton
Copy link
Copy Markdown
Owner Author

mpiton commented May 3, 2026

@cubic-dev-ai on CHANGELOG.md:12

P3: The changelog test-count claim is inaccurate: split-archive service tests are 31, not 35.

Fixed in 144c2ba. Counted the #[test] attributes in split_archive_grouper.rs and confirmed 31, matching cargo test split_archive_grouper output (31 passed). The "35" in the previous entry was an arithmetic slip when summing the new tests on top of the prior count.

@cubic-dev-ai
Copy link
Copy Markdown

cubic-dev-ai Bot commented May 3, 2026

@cubic-dev-ai on CHANGELOG.md:12

P3: The changelog test-count claim is inaccurate: split-archive service tests are 31, not 35.

Fixed in 144c2ba. Counted the #[test] attributes in split_archive_grouper.rs and confirmed 31, matching cargo test split_archive_grouper output (31 passed). The "35" in the previous entry was an arithmetic slip when summing the new tests on top of the prior count.

@mpiton The correction in 144c2ba accurately reflects the state of the codebase. I suggested this change because keeping the test count synchronized ensures the CHANGELOG.md remains a reliable source of truth for the scope of the feature's verification.

Confirmed that src-tauri/src/application/services/split_archive_grouper.rs now contains exactly 31 #[test] definitions, covering the new edge cases added during the hardening pass:

  • MAX_PART_INDEX rejection logic.
  • Legacy RAR header (part 0) detection and missing-part reporting.
  • Distinct part_num gating for mirrors.
  • The MAX_LINKS DoS cap.

This alignment correctly documents the technical contract for the split-archive auto-grouping feature.

@mpiton mpiton merged commit 1c783c0 into main May 3, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation frontend rust

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[QA] MEDIUM: Missing keyboard shortcuts — Ctrl+F, Ctrl+N, Ctrl+A, Space, Delete, Escape

1 participant