feat(iso): concat Blu-ray main feature across clips and discs #599

Merged
javi11 merged 30 commits into main from session/suspicious-torvalds-8598c3
May 30, 2026

@javi11 javi11 commented May 20, 2026

Summary

  • Parse BDMV/PLAYLIST/*.mpls to identify the main feature playlist and its ordered M2TS clip list, replacing the heuristic that kept only the largest M2TS per ISO.
  • Detect multi-disc releases by stripping DISC|CD|PART_<n> suffixes from the ISO 9660 volume label (with a filename fallback), grouping discs that arrive in the same NZB.
  • Emit one Content per group whose NestedSources chain spans every M2TS in disc-then-playlist order, so the player sees a single seekable virtual .m2ts end-to-end via WebDAV / FUSE / Stremio.
  • Share the new logic as archive.ExpandISOContents and remove the duplicated expandISOContents from rar and sevenzip.

Why

Long Blu-ray releases (the trigger here was AVATAR_FIRE_AND_ASH_DISC_1 / _DISC_2) split the main feature across both axes. The old picker dropped clips 2..N from each disc and treated each disc as an unrelated movie. The metadata layer's MetadataVirtualFile.createNestedReader already concatenates NestedSource chains with mixed encrypted/unencrypted members — we only needed to teach the importer to produce that ordered list.

Files

New:

  • internal/importer/archive/iso/mpls.go + _test.go — minimal BDA-spec MPLS parser (clip names, IN/OUT ticks, multi-angle PlayItems skipped via length prefix).
  • internal/importer/archive/iso/volume.go + _test.go — reads the 9660 PVD volume label from sector 16 (hybrid BD ISOs always carry one).
  • internal/importer/archive/iso/bluray.go + _test.go — locates BDMV/PLAYLIST/*.mpls, picks the longest playlist, resolves clip names to ordered BDMV/STREAM/*.M2TS entries.
  • internal/importer/archive/iso_expansion.go + _test.go — shared ExpandISOContents, disc-group regex, main-feature concat assembly.
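As a rough illustration of what a minimal MPLS parser has to do, the sketch below reads the PlayList section offset from the header, then walks the PlayItems using each item's 2-byte length prefix, so trailing per-item data (such as multi-angle entries) is skipped rather than parsed. The names `playItem`, `parseMPLS`, and `totalTicks` are hypothetical and are not the functions in mpls.go; the field offsets follow the commonly documented MPLS layout and should be checked against the real parser.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// playItem is a hypothetical, trimmed-down view of one MPLS PlayItem.
type playItem struct {
	Clip    string // five-character clip name, e.g. "00022"
	In, Out uint32 // presentation times in 45 kHz ticks
}

var errBadMPLS = errors.New("mpls: malformed playlist")

// parseMPLS extracts clip names and IN/OUT times from an MPLS file.
// Assumed layout: "MPLS" magic, 4-byte version, then a big-endian
// 4-byte offset of the PlayList section at byte 8.
func parseMPLS(b []byte) ([]playItem, error) {
	if len(b) < 20 || string(b[:4]) != "MPLS" {
		return nil, errBadMPLS
	}
	start := int(binary.BigEndian.Uint32(b[8:12]))
	// Section header: length(4) + reserved(2) + item count(2) + sub-path count(2).
	if start < 0 || start+10 > len(b) {
		return nil, errBadMPLS
	}
	count := int(binary.BigEndian.Uint16(b[start+6 : start+8]))
	pos := start + 10
	items := make([]playItem, 0, count)
	for i := 0; i < count; i++ {
		if pos+2 > len(b) {
			return nil, errBadMPLS
		}
		n := int(binary.BigEndian.Uint16(b[pos : pos+2])) // item length prefix
		body := pos + 2
		if n < 20 || body+n > len(b) {
			return nil, errBadMPLS
		}
		// Body: clip(5) codec(4) flags(2) stc_id(1) in(4) out(4) ...
		items = append(items, playItem{
			Clip: string(b[body : body+5]),
			In:   binary.BigEndian.Uint32(b[body+12 : body+16]),
			Out:  binary.BigEndian.Uint32(b[body+16 : body+20]),
		})
		pos = body + n // the length prefix skips anything we did not parse
	}
	return items, nil
}

// totalTicks sums OUT-IN over all items; a "longest playlist" picker
// could compare playlists by this value.
func totalTicks(items []playItem) uint64 {
	var sum uint64
	for _, it := range items {
		if it.Out > it.In {
			sum += uint64(it.Out - it.In)
		}
	}
	return sum
}
```

Dividing the tick total by 45000 gives seconds, which is a convenient unit for logging the chosen playlist.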

Modified:

  • internal/importer/archive/iso/types.go — new AnalyzedISO struct (VolumeLabel, Files, MainFeature, DurationTicks).
  • internal/importer/archive/iso/processor.go: AnalyzeISO replaces AnalyzeISOContent; encrypted/unencrypted file build paths factored.
  • internal/importer/archive/rar/aggregator.go and sevenzip/aggregator.go — call archive.ExpandISOContents; deleted the duplicated local implementations.

Behaviour

  • BDMV disc with one M2TS → concat = single clip (same bytes as before, but now via NestedSources).
  • BDMV disc with N clips in the main playlist → concat = N clips end-to-end (new).
  • Two ISOs whose labels share a stripped base name → one merged Content spanning every clip across both discs in disc-then-playlist order (new).
  • DVD VIDEO_TS / software disc / unparseable MPLS → falls back to the legacy "largest file" picker per ISO (no regression).
  • Mixed BDMV + non-BDMV in the same group → falls back to per-disc handling (defensive against false groupings).
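The grouping rule for multi-disc releases can be sketched roughly as follows. This is an illustration only: the pattern `discSuffix` and the helpers `discGroupKey` and `groupByDisc` are hypothetical and are not the regex or functions shipped in iso_expansion.go. The sketch requires a separator before the DISC/CD/PART token, and before a single-letter disc id, to limit false groupings such as a title ending in "PARTY".

```go
package main

import (
	"regexp"
	"strings"
)

// discSuffix is a hypothetical pattern: a separator, then DISC, CD or PART,
// then either a number (separator optional) or a single letter (separator
// required), anchored at the end of the label.
var discSuffix = regexp.MustCompile(`(?i)[ _.-]+(?:DISC|CD|PART)(?:[ _.-]*\d+|[ _.-]+[A-Z])$`)

// discGroupKey strips a trailing disc marker and upper-cases the result so
// labels that differ only in case share a key.
func discGroupKey(label string) string {
	trimmed := strings.TrimSpace(label)
	return strings.ToUpper(discSuffix.ReplaceAllString(trimmed, ""))
}

// groupByDisc buckets labels by stripped base name, keeping input order
// inside each bucket.
func groupByDisc(labels []string) map[string][]string {
	groups := make(map[string][]string)
	for _, l := range labels {
		k := discGroupKey(l)
		groups[k] = append(groups[k], l)
	}
	return groups
}
```

With this sketch, `AVATAR_FIRE_AND_ASH_DISC_1` and `AVATAR_FIRE_AND_ASH_DISC_2` both map to the key `AVATAR_FIRE_AND_ASH`, while a label without a recognised suffix forms its own single-member group.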

Tests

  • go test -race ./... passes across the whole repo.
  • go tool golangci-lint run ./internal/importer/archive/... clean.
  • New unit tests cover: MPLS parsing edge cases (single / 5 PlayItems / multi-angle / wrong magic / truncated / bad offset), PVD volume label reading, main-playlist selection by duration, disc-group regex variants (DISC/CD/PART/letter), encrypted vs unencrypted NestedSource conversion, two-disc concat assembly.

Test plan

  • Import the actual AVATAR_FIRE_AND_ASH two-disc NZB; confirm a single virtual .m2ts appears at the library path with size ≈ sum of all main-feature M2TS across both discs.
  • Scrub through it in VLC / Stremio across the disc boundary; verify seeking around the disc-1 total size lands at the start of disc 2's first clip.
  • Re-import a standard single-disc BDMV release; confirm the main feature now plays end-to-end across clips that previously got dropped.
  • Re-import a non-BDMV ISO (e.g. plain MKV inside an ISO); confirm legacy behaviour unchanged.
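The seek check in the test plan relies on mapping a global byte offset in the virtual file to a member of the concatenated chain. A minimal sketch of that arithmetic, assuming every source has a known positive size, is shown here; `locate` is a hypothetical helper, not the metadata layer's createNestedReader.

```go
package main

import "sort"

// locate maps a global offset in the concatenated file to the index of the
// source that holds it and the offset inside that source. ok is false when
// the offset is negative or at/after the end of the virtual file.
func locate(sizes []int64, off int64) (idx int, local int64, ok bool) {
	if off < 0 {
		return 0, 0, false
	}
	// ends[i] is the exclusive end offset of source i in the virtual file.
	ends := make([]int64, len(sizes))
	var total int64
	for i, s := range sizes {
		total += s
		ends[i] = total
	}
	if off >= total {
		return 0, 0, false
	}
	// First source whose end lies beyond the requested offset.
	i := sort.Search(len(ends), func(i int) bool { return ends[i] > off })
	return i, off - (ends[i] - sizes[i]), true
}
```

If disc 1 contributes sources of 100 and 50 bytes and disc 2 one source of 200 bytes, offset 150 (the disc-1 total) resolves to source index 2 at local offset 0, which is the start of disc 2's first clip.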

Out of scope

  • Cross-NZB linking when disc 1 and disc 2 ship as separate posts (current scope: both discs in one NZB).
  • DVD VIDEO_TS / IFO playlist parsing.
  • M2TS container rewriting for perfectly seamless seeking across clip joins (players typically tolerate this).

javi11 added 30 commits May 20, 2026 19:51
Long Blu-ray releases split the main feature two ways: across multiple
M2TS clips within a disc (joined by BDMV/PLAYLIST/*.mpls), and across
multiple discs in one NZB (e.g. AVATAR_FIRE_AND_ASH_DISC_1 / _DISC_2).
The importer previously kept only the single largest M2TS per ISO,
which both dropped the rest of the movie within a disc and treated each
disc as an unrelated file.

Now ExpandISOContents (shared between rar and sevenzip aggregators)
parses the main MPLS playlist on each ISO, reads the 9660 PVD volume
label, groups ISOs by stripped base name with a DISC|CD|PART suffix
regex, and emits a single Content whose NestedSources chain spans every
M2TS in disc-then-playlist order. The metadata layer's existing nested
multi-reader produces one seamless seekable virtual file. Non-BDMV
discs and unparseable playlists fall back to the legacy largest-file
behaviour so nothing regresses.
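For reference, reading the ISO 9660 volume label amounts to checking the descriptor at sector 16 and trimming the 32-byte volume identifier at byte 40. The sketch below assumes the Primary Volume Descriptor is the first descriptor in the sequence (the usual case); `parsePVDLabel` and `readVolumeLabel` are hypothetical names, not the functions in volume.go.

```go
package main

import (
	"errors"
	"io"
	"strings"
)

const sectorSize = 2048

// parsePVDLabel extracts the volume identifier from a Primary Volume
// Descriptor: type byte 1, standard identifier "CD001", label at bytes 40..71
// padded with spaces.
func parsePVDLabel(sector []byte) (string, error) {
	if len(sector) < 72 || sector[0] != 1 || string(sector[1:6]) != "CD001" {
		return "", errors.New("iso: not a primary volume descriptor")
	}
	return strings.TrimRight(string(sector[40:72]), " \x00"), nil
}

// readVolumeLabel reads sector 16 from the image and parses it as a PVD.
func readVolumeLabel(r io.ReaderAt) (string, error) {
	sector := make([]byte, sectorSize)
	n, err := r.ReadAt(sector, 16*sectorSize)
	if n < len(sector) {
		if err == nil {
			err = io.ErrUnexpectedEOF
		}
		return "", err
	}
	return parsePVDLabel(sector)
}
```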
On a 3D-only Blu-ray release (e.g. AVATAR_FIRE_AND_ASH_3D), the main
feature playlist references clips that exist only as SSIF files in
BDMV/STREAM/SSIF/ — the M2TS directory holds short extras. The previous
resolver indexed only M2TS, so the long 3D playlist failed to resolve
any clips and a short extras playlist won by default, producing a ~177
MB virtual file for a movie whose NZB carries ~88 GB of source data.

Resolve clip names against M2TS first (preserves the smaller, more
compatible 2D version on hybrid 3D releases) and fall back to SSIF when
only it can satisfy the playlist. Two new test cases cover the
3D-only-with-SSIF and hybrid-prefers-M2TS paths.
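One way to read the resolution order described above is sketched here, under the assumption that the fallback is decided per playlist: use M2TS when it can satisfy every clip, otherwise try SSIF for the whole playlist. The actual resolver may decide per clip instead; `lookupAll` and `resolveClips` are hypothetical names.

```go
package main

// lookupAll resolves every clip name against one index (clip name -> path).
// It reports false as soon as one clip is missing.
func lookupAll(clips []string, index map[string]string) ([]string, bool) {
	paths := make([]string, 0, len(clips))
	for _, c := range clips {
		p, ok := index[c]
		if !ok {
			return nil, false
		}
		paths = append(paths, p)
	}
	return paths, true
}

// resolveClips prefers the M2TS index and falls back to SSIF only when M2TS
// cannot satisfy the whole playlist (assumed playlist-level fallback).
func resolveClips(clips []string, m2ts, ssif map[string]string) ([]string, bool) {
	if paths, ok := lookupAll(clips, m2ts); ok {
		return paths, true
	}
	return lookupAll(clips, ssif)
}
```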
A repeated 88GB-NZB run is still producing a 177MB virtual file with
clips=2 — byte-identical to the pre-SSIF-fix output. Three hypotheses
remain: stale binary, 'no actual SSIF in this BDMV' (release uses M2TS
only), or SSIF lives at a non-standard path.

Add one summary log per ISO (total files, playlist count, M2TS and SSIF
clip counts, 12 sample paths) and one log per evaluated MPLS (resolved
clip count, unresolved count, duration ticks, summed stream bytes) plus
one 'picked' line. All prefixed with [DEBUG-isobd] for cheap cleanup
and to confirm the new binary is live (the prefix won't appear in
prior builds).

Real-ISO run shows all 38 playlists with items=1, max duration 80s,
max stream bytes 141MB — yet the NZB carries ~88GB across 2 ISOs.
Either ListISOFiles is dropping huge files (UDF alloc-type 2/3 not
handled) or reading wrong sizes for them. Add to the bdmv-scan log:
- sum of every file size (across all entries)
- sum of M2TS-only and SSIF-only sizes
- the 6 largest files with human-readable sizes

One log line will distinguish 'sizes truncated', 'big files missing',
and 'release is genuinely tiny'.

Real run shows all_files_sum_bytes=1.13 GiB across 295 files, biggest
single file 135 MiB. NZB is 88 GiB across 2 ISOs. Need to know whether
src.Size (claimed ISO bytes from the outer RAR archive) matches the
sum of what ListISOFiles enumerated, or whether the walker is missing
multi-GB files. One [DEBUG-isobd] iso analyse line per ISO now prints
filename, iso_size, listed_files, listed_sum, and coverage_pct so the
discrepancy is impossible to miss.

Root cause of the 'main feature M2TS files invisible' bug. udfReadDirEntries
parsed every File Identifier Descriptor in a directory but only ever read
the FIRST 2048-byte sector of each allocation descriptor's extent — even
when the extent's ad.length claimed it spanned many sectors. A Blu-ray
BDMV/STREAM/ directory with ~2500 FIDs (~30 KiB of FID data) lost every
entry past the first sector, including the multi-GB main-feature clips
00016/00017/00022/00023/00028/00029 and the corresponding SSIF files.

Local repro against AVATAR_FIRE_AND_ASH_3D_DISC_1.iso (37 GiB):
- Before: listed_files=298  sum=1.16 GiB  coverage=3.1%   (no clip >135 MiB)
- After:  listed_files=2523 sum=74 GiB                    (00022.m2ts=17 GiB ✓)

The fix factors out readMetaExtent / readICBExtent helpers that walk every
sector of an extent until ad.length is exhausted. Both fail soft on EOF, so a
malformed image returns partial data rather than aborting the import.

The pre-existing TestUDFReadDirEntriesShortADClampsExtentLength was
pinning the BUGGY behaviour (it asserted the walker would truncate to one
sector); renamed to TestUDFReadDirEntriesTruncatedExtent and now asserts
the new contract: when an extent claims more sectors than the image
contains, the walker returns whatever data it could read without an error.

Adds fs_local_test.go: an ALTMOUNT_LOCAL_ISO=<path> gated integration test
that catches this class of bug instantly against a real ISO. Skipped in CI.

Also strips the [DEBUG-isobd] / [DEBUG-walk] instrumentation added during
the investigation and tones the resolver / processor logs down to one
production-grade INFO line per ISO and per main-feature pick.
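The core of the directory-listing fix, reading every sector of an extent and failing soft on a short image, can be sketched as below. `readExtent` and the in-memory `memImage` reader are hypothetical stand-ins for illustration, not the readMetaExtent / readICBExtent helpers themselves.

```go
package main

import "io"

const sectorSize = 2048

// memImage is a tiny in-memory io.ReaderAt used only for illustration.
type memImage []byte

func (m memImage) ReadAt(p []byte, off int64) (int, error) {
	if off < 0 || off >= int64(len(m)) {
		return 0, io.EOF
	}
	n := copy(p, m[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}

// readExtent reads length bytes starting at sector lba, one sector at a time.
// On a read error or short read it stops and returns whatever was read so far
// instead of returning an error (fail soft).
func readExtent(r io.ReaderAt, lba, length int64) []byte {
	var out []byte
	buf := make([]byte, sectorSize)
	off := lba * sectorSize
	for remaining := length; remaining > 0; {
		want := int64(sectorSize)
		if remaining < want {
			want = remaining
		}
		n, err := r.ReadAt(buf[:want], off)
		out = append(out, buf[:n]...)
		if err != nil || int64(n) < want {
			break // truncated image: keep the partial data
		}
		off += want
		remaining -= want
	}
	return out
}
```

The buggy behaviour described above corresponds to performing only the first iteration of this loop regardless of the extent's declared length.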
The directory-listing fix exposed a second latent bug downstream: the
walker only stored ONE allocation descriptor's LBA per file even though
huge Blu-ray clips are split across hundreds of extents (Avatar's
00022.m2ts: 945, 00023.m2ts: 945, 00028.m2ts: 294, 00016.m2ts: 238).
For every multi-extent file, downstream reads of bytes past the first
extent's length returned wrong sectors (whatever happened to live next
to extent 1 on disc) instead of the file's real data — silent
corruption ~50× the size of the visible bug.

Changes:
- isoFileEntry now carries []isoExtent instead of a single lba field.
- collectFileExtents() walks every inline AD and chases Allocation
  Extent Descriptor (UDF tag 258) chains so files with more ADs than
  fit in the FE sector are fully enumerated. Caps total extent bytes
  at info_length so a malformed FE can't yield more data than the
  file claims.
- ISOFileContent gains a Sources []ISONestedSource slice (one per
  extent) and drops the single-Segments / single-NestedSource fields.
- buildFileContent emits one ISONestedSource per extent: unencrypted
  ISOs pre-slice outer segments to cover each extent; encrypted ISOs
  keep the full outer segments and seek via InnerOffset (AES-CBC IV
  chain still anchors at byte 0 of the outer ISO).
- archive.isoFileContentToNestedSource → isoFileContentToNestedSources
  fans the slice out into one archive.NestedSource per extent.
- buildMainFeatureContent and buildLargestFileContent thread the
  multi-source path so the final concat Content carries every extent
  of every clip in disc-then-playlist order.

Verified against the real Avatar disc 1 ISO via fs_local_test.go:
00022.m2ts: 945 extents, sum-of-extent-lengths == 17 GiB info_length.
TestLocalISO_DiscoverBigFiles asserts >=2 extents and full coverage
for the sentinel big-clip set.
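The info_length cap mentioned in the change list can be illustrated with a short sketch. It assumes extents are described by a start sector and a byte length; `extent` and `capExtents` are hypothetical names, not the isoExtent type or collectFileExtents.

```go
package main

// extent is a hypothetical (start sector, byte length) pair.
type extent struct {
	LBA    int64
	Length int64
}

// capExtents trims the extent list so the summed length never exceeds
// infoLength, the size the file entry claims for the file.
func capExtents(exts []extent, infoLength int64) []extent {
	out := make([]extent, 0, len(exts))
	remaining := infoLength
	for _, e := range exts {
		if remaining <= 0 {
			break
		}
		if e.Length <= 0 {
			continue // skip empty descriptors
		}
		if e.Length > remaining {
			e.Length = remaining // last extent is only partly used
		}
		out = append(out, e)
		remaining -= e.Length
	}
	return out
}
```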
A BD3D SSIF often emits a dozen separate UDF allocation descriptors for
what's a single contiguous run of sectors on disc. After the multi-
extent fix, each AD became its own NestedSource — bloating the proto
metadata, the validation-sample surface, and the per-file open-handle
count for what is logically one extent.

coalesceExtents merges adjacent extents whose physical sectors follow
the previous extent's last sector. Measured against the real Avatar
disc 1 ISO:
- BDMV/STREAM/SSIF/00022.ssif (22 GiB): 23 extents -> 2
- BDMV/STREAM/SSIF/00028.ssif  (7 GiB):  7 extents -> 1
- BDMV/STREAM/SSIF/00016.ssif  (6 GiB):  6 extents -> 1
M2TS files keep their full extent list because BD authoring genuinely
interleaves the M2TS clips with the SSIF dependent-view data on disc.

Note: the recent import failure ("not a valid ISO 9660 or UDF image"
on disc 1, segment "44c89668..." unreachable during validation) is a
Usenet-side issue — disc 2 analysed cleanly in 30 seconds with the
same code path; disc 1 timed out reading its first sectors for 9
minutes before giving up. The coalescing change reduces the surface
where transient flakes can bite but cannot eliminate it.
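A minimal sketch of the merge rule, assuming each extent is a start sector plus a byte length and that only an extent ending exactly on a sector boundary can be followed contiguously. The function here is an illustration written for this note, not the coalesceExtents in the PR.

```go
package main

const sectorSize = 2048

// extent is a hypothetical (start sector, byte length) pair.
type extent struct {
	LBA    int64
	Length int64
}

// coalesceExtents merges each extent into its predecessor when it starts at
// the sector immediately after the predecessor's last sector.
func coalesceExtents(in []extent) []extent {
	out := make([]extent, 0, len(in))
	for _, e := range in {
		if e.Length <= 0 {
			continue
		}
		if n := len(out); n > 0 {
			prev := &out[n-1]
			sectorAligned := prev.Length%sectorSize == 0
			if sectorAligned && prev.LBA+prev.Length/sectorSize == e.LBA {
				prev.Length += e.Length
				continue
			}
		}
		out = append(out, e)
	}
	return out
}
```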
Extract the Content -> FileMetadata mapping body (previously duplicated
in rar.CreateFileMetadataFromRarContent and
sevenzip.CreateFileMetadataFromSevenZipContent) into a shared package-level
function archive.NewFileMetadataFromContent.

Both processor methods now delegate to the shared function so the Processor
interfaces and all existing callers (aggregator.go, test mocks) keep
working unchanged. Behaviour is byte-for-byte preserved: same Status
default, same AES handling, same NestedSegmentSource copy loop.

This prepares Task 3 (ISO expansion) to persist FileMetadata for files
discovered inside bare ISOs without depending on the RAR or 7z packages.

The UDF walker previously had seven sites where it silently dropped a file
from its listing (continue/break with no log), making it impossible to
diagnose missing files like BDMV/STREAM/00022.m2ts on Avatar disc 1.

Thread context.Context through ListISOFiles -> udfWalkAll ->
udfReadDirEntries -> collectFileExtents and emit slog.WarnContext at every
silent drop site with the file path and a distinct reason. Behavior is
unchanged; only diagnostics are added.

A new in-memory test (TestUDFWalk_LogsWhenFileICBHasUnknownTag) drives the
"unexpected tag" branch and asserts a WARN line is emitted with the file
path and bogus tag id.

Today udfWalkAll has no ctx.Err() check between files, so cancellation
only surfaces when the next sector read times out at the NNTP layer.
On a degraded network this can stretch a normal ~16ms/file walk into
minutes per ISO. Same for the AED-chain loop in collectFileExtents.

Add a ctx.Err() check at the top of each loop:

  - udfWalkAll: returns the partial result + the cancellation error
    immediately. iso_expansion.go already treats any error from the
    walk as 'keep ISO as-is', so no caller change needed.

  - collectFileExtents: returns []isoExtent (no error), so emit a
    WARN in the existing 'AED chain truncated' style and break out
    of the chain cleanly with whatever extents we have.

New test TestUDFWalk_StopsWhenContextCanceled builds a 3-FID synthetic
UDF blob, cancels the ctx before calling the walker, and asserts that
udfWalkAll returns context.Canceled within 100ms with an empty result
(i.e. no file ICB was read past the cancel point).
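The loop-top cancellation check follows a common Go pattern, sketched here with hypothetical names (`walkAll`, `walkAfterCancel`); it mirrors the described contract of returning the partial result together with the cancellation error, and is not the udfWalkAll code itself.

```go
package main

import "context"

// walkAll visits paths in order and checks for cancellation before each one,
// returning the paths visited so far plus the context error.
func walkAll(ctx context.Context, paths []string, visit func(path string)) ([]string, error) {
	visited := make([]string, 0, len(paths))
	for _, p := range paths {
		if err := ctx.Err(); err != nil {
			return visited, err // partial result + cancellation error
		}
		visit(p)
		visited = append(visited, p)
	}
	return visited, nil
}

// walkAfterCancel demonstrates the contract: a context cancelled before the
// walk starts yields zero visits and context.Canceled.
func walkAfterCancel() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	visited, err := walkAll(ctx, []string{"a", "b", "c"}, func(string) {})
	return len(visited), err
}
```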
A degraded NNTP provider could stall iso.AnalyzeISO for 9+ minutes per
disc, blocking the whole importer. Wrap each AnalyzeISO call in a hard
context.WithTimeout (default 120s, knob: Import.IsoAnalyzeTimeoutSeconds)
so the existing fallback at iso_expansion.go takes over within a bounded
window instead of waiting indefinitely.
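The bounded-analysis idea can be sketched with `context.WithTimeout` as follows. The helper names (`keepISOAsIs`, `stalledAnalyze`, and the two demo functions) are hypothetical, and the short durations are chosen for illustration only; the commit describes a 120s default.

```go
package main

import (
	"context"
	"time"
)

// keepISOAsIs runs analyze under a hard deadline and reports whether the
// caller should fall back to keeping the ISO unexpanded (any error, including
// a deadline hit, triggers the fallback).
func keepISOAsIs(parent context.Context, timeout time.Duration, analyze func(context.Context) error) bool {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel() // release the timer even when analyze returns early
	return analyze(ctx) != nil
}

// stalledAnalyze simulates a provider that never answers: it blocks until the
// context is done and then returns the context's error.
func stalledAnalyze(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// demoStalledProvider returns true: the deadline fires and the fallback wins.
func demoStalledProvider() bool {
	return keepISOAsIs(context.Background(), 20*time.Millisecond, stalledAnalyze)
}

// demoHealthyProvider returns false: analysis succeeds well within the deadline.
func demoHealthyProvider() bool {
	return keepISOAsIs(context.Background(), time.Second, func(context.Context) error { return nil })
}
```

This only bounds the wait if the analysis code honours its context, which is why the loop-level ctx.Err() checks matter.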
…emux correctly

The streaming remux disabled itself on any unaligned start because it probed
for the packet sync at byte 0/4 of the read offset. ffprobe seeks to a
non-packet-aligned near-EOF offset to estimate duration, so the tail was served
raw and the duration stayed wrong. Derive packet framing from the known clip
byte grid (BDAV-192) instead of probing; pass leading mid-packet payload bytes
through and rewrite full packets from the next boundary. Adds unaligned-start
determinism coverage that reproduced the bug.
ISO analysis (filesystem walk + Blu-ray playlist resolution over NNTP) can
take tens of seconds, during which the queue item's progress bar sat frozen
and, for RAR/7z-wrapped ISOs, was mislabeled as "Analyzing archive".

Thread a progress.Tracker end-to-end through the ISO analysis chain so the
bar advances with an "Analyzing ISO" stage:

- progress.Tracker gains a nil-safe Slice(idx,count) helper that carves a
  child tracker covering one Nth of the parent's range.
- ExpandISOContents/AnalyzeISO/ResolveMainFeature accept a tracker;
  ResolveMainFeature reports per-playlist progress (each .mpls is an NNTP
  round-trip), ExpandISOContents gives each ISO its slice of the band.
- Bare ISOs (processor.go) get a dedicated 10->30 tracker.
- RAR/7z aggregators derive an "Analyzing ISO" tracker from the archive
  tracker via Slice(0,1).WithStage without mutating it; archives with no
  ISO emit no updates, so the common case is unchanged.
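A nil-safe slicing helper of the kind described might look like the sketch below. The `Tracker` type, its fields, and the `Update` method are hypothetical and are not the real progress.Tracker API; only the idea of carving one Nth of the parent's range without mutating the parent is taken from the text.

```go
package main

// Tracker is a hypothetical progress tracker covering the percentage range
// [lo, hi] and forwarding absolute percentages to report.
type Tracker struct {
	lo, hi float64
	report func(pct float64)
}

// Slice returns a child covering the idx-th of count equal parts of the
// parent's range. It is safe to call on a nil receiver and leaves the parent
// unchanged; invalid arguments yield a nil (no-op) tracker.
func (t *Tracker) Slice(idx, count int) *Tracker {
	if t == nil || count <= 0 || idx < 0 || idx >= count {
		return nil
	}
	width := (t.hi - t.lo) / float64(count)
	return &Tracker{
		lo:     t.lo + width*float64(idx),
		hi:     t.lo + width*float64(idx+1),
		report: t.report,
	}
}

// Update reports progress as a fraction (0..1) of this tracker's own range.
// It is a no-op on a nil tracker.
func (t *Tracker) Update(frac float64) {
	if t == nil || t.report == nil {
		return
	}
	if frac < 0 {
		frac = 0
	} else if frac > 1 {
		frac = 1
	}
	t.report(t.lo + (t.hi-t.lo)*frac)
}
```

Because both methods tolerate a nil receiver, callers that have no tracker can pass nil through the whole analysis chain without guards.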
@javi11 javi11 merged commit f24a3dd into main May 30, 2026
2 checks passed
@javi11 javi11 deleted the session/suspicious-torvalds-8598c3 branch May 30, 2026 16:42
javi11 added a commit that referenced this pull request May 31, 2026
* feat(iso): concat Blu-ray main feature across clips and discs (#599)

* perf(iso): coalesce Blu-ray playlist reads into sequential runs

ResolveMainFeature read every .mpls in BDMV/PLAYLIST/ one file at a time
via readISOFile (Seek + ReadFull per extent). The backing DecryptingFile
tears down its NNTP reader on every Seek and rebuilds a fresh UsenetReader
+ download manager on the next Read, with no segment cache. Since playlist
files never end on sector boundaries, each .mpls paid a full reader
teardown + fresh NNTP fetch, re-fetching the same clustered segments once
per file (each fetch 1-5s on Usenet).

Add readPlaylistsCoalesced: flatten all playlist extents, sort by disc
offset, group contiguous neighbours into runs (split on gaps > 4 MiB or
runs > 64 MiB), read each run with a single Seek + ReadFull, then
reconstruct each playlist's bytes from the run buffers — byte-identical to
readISOFile's multi-extent concat. A real PLAYLIST directory collapses to
one sequential read the UsenetReader can prefetch across.

Tests: TestReadPlaylistsCoalesced covers single/multi-run, multi-extent
order, overlaps, zero-extent, read-error isolation, a differential test
proving equivalence to readISOFile, and a seek-count test pinning the
one-Seek-per-run property.
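The run-grouping step can be sketched as a sort followed by a single merge pass. The thresholds come from the commit text; the `span` type and `groupRuns` function are hypothetical, and the sketch treats gaps and run sizes purely in bytes.

```go
package main

import "sort"

const (
	maxGap = 4 << 20  // start a new run when the gap exceeds 4 MiB
	maxRun = 64 << 20 // start a new run when the merged run would exceed 64 MiB
)

// span is a hypothetical byte range on disc.
type span struct {
	Off, Len int64
}

// groupRuns sorts spans by offset and merges neighbours into runs, so each
// run can be fetched with one seek and one sequential read.
func groupRuns(spans []span) []span {
	sorted := append([]span(nil), spans...) // do not reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Off < sorted[j].Off })
	var runs []span
	for _, s := range sorted {
		if s.Len <= 0 {
			continue
		}
		if n := len(runs); n > 0 {
			r := &runs[n-1]
			end := r.Off + r.Len
			newEnd := s.Off + s.Len
			if newEnd < end {
				newEnd = end // span lies fully inside the current run
			}
			if s.Off-end <= maxGap && newEnd-r.Off <= maxRun {
				r.Len = newEnd - r.Off
				continue
			}
		}
		runs = append(runs, s)
	}
	return runs
}
```

Each playlist's bytes can then be sliced back out of the run buffer that covers its extents, using the extent offset minus the run's starting offset.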

* fix(parser): avoid shadowing ctx with WithTimeout in fetchAllFirstSegments

Renamed the derived context variable from ctx to c to prevent shadowing
the parent context, making the timeout scope explicit and avoiding
potential misuse of the already-cancelled context in surrounding code.
yoshitaka420 pushed a commit to yoshitaka420/altmount that referenced this pull request Jun 1, 2026
…11#630)
