Content Validation

rhoopr edited this page Jun 1, 2026 · 4 revisions

Every downloaded file is validated before being renamed from its temp suffix to the final path. Validation catches corrupted downloads, truncated transfers, and CDN error pages served as HTTP 200.

Magic Byte Signatures

Known media types are validated by checking the first bytes of the file against expected signatures. A mismatch produces a warning (not an error) -- the file is still saved to disk. Hard rejection only happens for HTML / JSON error pages from Apple's CDN (see CDN Error-Page Rejection below).

| Extension(s) | Offset | Accepted bytes | ASCII / notes |
|---|---|---|---|
| .jpg, .jpeg | 0 | FF D8 | JPEG SOI marker |
| .png | 0 | 89 50 4E 47 | \x89PNG |
| .gif | 0 | 47 49 46 38 | GIF8 (covers both GIF87a and GIF89a) |
| .tiff, .tif, .dng | 0 | 49 49 2A 00 or 4D 4D 00 2A | Little-endian (II*\0) or big-endian (MM\0*) TIFF |
| .webp | 0 and 8 | 52 49 46 46 at 0, 57 45 42 50 at 8 | RIFF container with WEBP form type; bytes 4-7 hold the RIFF chunk size (file size minus 8) |
| .heic, .heif | 4 | 66 74 79 70 | ISO-BMFF ftyp box (strict) |
| .mp4, .m4v | 4 | 66 74 79 70 | ISO-BMFF ftyp box (strict) |
| .mov | 4 | ftyp, wide, mdat, moov, free, skip, pnot | Any QuickTime top-level atom (see below) |

Files with unknown extensions skip the magic-byte check entirely and are written as-is.
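
The table above can be read as a lookup from extension to one or more (offset, bytes) alternatives. The following is a minimal Python sketch of that idea, not kei's actual implementation; the names `SIGNATURES` and `check_magic` are hypothetical, and the three-way return value (True, False, or None for "check skipped") is an assumption made for illustration.

```python
from pathlib import PurePath

# Each extension maps to a list of alternatives; an alternative is a list of
# (offset, expected_bytes) pairs that must ALL match for it to be accepted.
_TIFF = [[(0, b"II*\x00")], [(0, b"MM\x00*")]]
_FTYP = [[(4, b"ftyp")]]
_MOV = [[(4, atom)] for atom in
        (b"ftyp", b"wide", b"mdat", b"moov", b"free", b"skip", b"pnot")]

SIGNATURES = {
    ".jpg": [[(0, b"\xff\xd8")]],
    ".jpeg": [[(0, b"\xff\xd8")]],
    ".png": [[(0, b"\x89PNG")]],
    ".gif": [[(0, b"GIF8")]],
    ".tiff": _TIFF, ".tif": _TIFF, ".dng": _TIFF,
    ".webp": [[(0, b"RIFF"), (8, b"WEBP")]],
    ".heic": _FTYP, ".heif": _FTYP, ".mp4": _FTYP, ".m4v": _FTYP,
    ".mov": _MOV,
}


def check_magic(filename, head):
    """Return True/False for known extensions, None when the check is skipped.

    `head` is the first few bytes of the downloaded file (12 are enough).
    """
    ext = PurePath(filename).suffix.lower()
    alternatives = SIGNATURES.get(ext)
    if alternatives is None:
        return None  # unknown extension: no magic-byte check at all
    return any(
        all(head[off:off + len(sig)] == sig for off, sig in alt)
        for alt in alternatives
    )
```

In this sketch a False result would correspond to the warning case: the caller logs the mismatch and still keeps the file.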

ISO-BMFF versus classic QuickTime

Modern Apple formats (HEIC, HEIF, MP4, M4V) are defined by the ISO Base Media File Format spec and must start with an ftyp box, whose four-byte type tag sits at offset 4, right after the four-byte box size. A non-ftyp first box on those extensions is genuinely anomalous and emits a warning.

.mov is looser. The QuickTime file format predates ISO-BMFF and allows any top-level atom first. Apple's Photos pipeline routinely serves live-photo companion videos and HEVC captures in classic QuickTime form, where the first atom is padding (wide) rather than ftyp. kei accepts the full set of documented top-level atoms without warning:

| Atom | Purpose |
|---|---|
| ftyp | Modern ISO-BMFF file-type box |
| wide | 8-byte placeholder atom, typically placed before mdat to reserve room for a 64-bit size field |
| mdat | Media data (sample bytes) |
| moov | Movie header / metadata |
| free | Unused space |
| skip | Alias for free |
| pnot | Preview resource (rare, older QuickTime) |
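
To make the strict versus loose distinction concrete, here is a small Python sketch that parses the first box header (a big-endian 32-bit size followed by a four-byte type) and decides whether a warning would be due. The function names and the choice to warn on files shorter than 8 bytes are assumptions for illustration, not kei's code.

```python
import struct

QUICKTIME_ATOMS = {b"ftyp", b"wide", b"mdat", b"moov", b"free", b"skip", b"pnot"}
STRICT_EXTENSIONS = {".heic", ".heif", ".mp4", ".m4v"}


def first_atom(head):
    """Return (size, type) of the first box, or None if fewer than 8 bytes."""
    if len(head) < 8:
        return None
    size, kind = struct.unpack(">I4s", head[:8])
    return size, kind


def should_warn(ext, head):
    """True when the first atom is not acceptable for this extension."""
    atom = first_atom(head)
    if atom is None:
        return True  # assumption: a file too short to hold a box header is suspect
    _, kind = atom
    if ext in STRICT_EXTENSIONS:
        return kind != b"ftyp"              # ISO-BMFF: only ftyp is accepted
    if ext == ".mov":
        return kind not in QUICKTIME_ATOMS  # QuickTime: any documented atom
    return False  # other extensions are not covered by this check
```

For example, a live-photo companion video whose first eight bytes are `00 00 00 08 77 69 64 65` (a `wide` atom) passes as `.mov` but would warn if it carried an `.mp4` extension.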

For files that do start with ftyp, the four-byte major brand at offset 8 identifies the variant:

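| Brand | Format |
|---|---|
| heic | HEIC (HEIF with HEVC) |
| heix | HEIC variant |
| mif1 | HEIF still image |
| msf1 | HEIF image sequence |
| hevc | HEVC video |
| qt | QuickTime MOV (brand is space-padded to four bytes) |
| isom | ISO MP4 base |
| mp42 | MP4 v2 |
| M4V | Apple M4V (brand is space-padded to four bytes) |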

kei does not check the brand; the ftyp box name alone is sufficient to pass validation.
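
Although the brand plays no part in validation, it can be handy when inspecting a file by hand. In an ftyp box the layout is: 4-byte size, the tag `ftyp`, then the 4-byte brand at offset 8 (the major brand in ISO-BMFF terms). The helper below is a hypothetical diagnostic sketch in Python, not something kei provides.

```python
def major_brand(head):
    """Return the 4-byte brand at offset 8 as text, or None if the file
    does not start with an ftyp box (or is too short to contain one)."""
    if len(head) < 12 or head[4:8] != b"ftyp":
        return None
    # Brands are ASCII and may be space-padded, e.g. "qt  ".
    return head[8:12].decode("ascii", errors="replace")
```

Note that the returned string keeps any padding, so a QuickTime file yields `"qt  "` rather than `"qt"`.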

CDN Error-Page Rejection

Apple's CDN occasionally serves error or rate-limit pages as HTTP 200 with an HTML body. HTML content is always rejected as a hard error regardless of file extension. Detection checks the first non-whitespace bytes for:

  • <! (catches <!DOCTYPE html>)
  • <html (case-insensitive)

HTML rejection is retryable -- the download is deleted and re-attempted on the next pass with a fresh CDN URL.

There is also an HTTP-level check: responses with Content-Type: text/html are rejected before any bytes are written to disk.

JSON CDN error responses are rejected the same way. If the response declares Content-Type: application/json, or the first bytes look like an Apple JSON error object, kei treats it as a retryable download failure and the file is never published to its final path. Bytes that are valid media, or of an unknown type, are still saved when the size checks pass.

XML content (e.g., AAE sidecar files which are Apple plist XML) is not rejected. The <?xml prefix does not trigger HTML detection.
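
The HTML body check and the header-level check described above can be sketched in a few lines of Python. This is an illustrative approximation with hypothetical function names; the body heuristic kei uses for Apple JSON error objects is not specified here, so the sketch covers only the HTML prefix test and the two Content-Type values.

```python
REJECTED_TYPES = {"text/html", "application/json"}


def looks_like_html(head):
    """True when the first non-whitespace bytes are '<!' or '<html' (any case)."""
    stripped = head.lstrip()  # bytes.lstrip() with no argument drops ASCII whitespace
    return stripped.startswith(b"<!") or stripped[:5].lower() == b"<html"


def is_rejected_content_type(header_value):
    """True for text/html or application/json, ignoring parameters and case."""
    if not header_value:
        return False
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in REJECTED_TYPES
```

Because the body test looks only for `<!` and `<html`, an AAE sidecar beginning with `<?xml` is not flagged.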

Apple UTI to Extension Mapping

iCloud returns file types as Apple Uniform Type Identifiers (UTIs) in the asset_type field. kei maps these to file extensions when the original filename's extension doesn't match:

| UTI | Extension |
|---|---|
| public.heic | HEIC |
| public.heif | HEIF |
| public.jpeg | JPG |
| public.png | PNG |
| com.apple.quicktime-movie | MOV |
| com.adobe.raw-image | DNG |
| com.canon.cr2-raw-image | CR2 |
| com.canon.crw-raw-image | CRW |
| com.canon.cr3-raw-image | CR3 |
| com.sony.arw-raw-image | ARW |
| com.fuji.raw-image | RAF |
| com.panasonic.rw2-raw-image | RW2 |
| com.nikon.nrw-raw-image | NRF |
| com.nikon.raw-image | NEF |
| com.pentax.raw-image | PEF |
| com.olympus.raw-image | ORF |
| com.olympus.or-raw-image | ORF |
| org.webmproject.webp | WEBP |

Assets with unrecognized UTIs keep their original filename extension unchanged.
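
As a rough illustration of the mapping, the Python sketch below looks up the UTI and swaps the filename suffix when it differs. The function name `resolve_filename` is hypothetical, and two behaviours are assumptions rather than documented facts: the comparison ignores case, and a mismatching suffix is replaced outright (so `photo.jpeg` with `public.jpeg` would become `photo.JPG`).

```python
from pathlib import PurePath

UTI_TO_EXTENSION = {
    "public.heic": "HEIC",
    "public.heif": "HEIF",
    "public.jpeg": "JPG",
    "public.png": "PNG",
    "com.apple.quicktime-movie": "MOV",
    "com.adobe.raw-image": "DNG",
    "com.canon.cr2-raw-image": "CR2",
    "com.canon.crw-raw-image": "CRW",
    "com.canon.cr3-raw-image": "CR3",
    "com.sony.arw-raw-image": "ARW",
    "com.fuji.raw-image": "RAF",
    "com.panasonic.rw2-raw-image": "RW2",
    "com.nikon.nrw-raw-image": "NRF",
    "com.nikon.raw-image": "NEF",
    "com.pentax.raw-image": "PEF",
    "com.olympus.raw-image": "ORF",
    "com.olympus.or-raw-image": "ORF",
    "org.webmproject.webp": "WEBP",
}


def resolve_filename(filename, uti):
    """Return the filename with its suffix aligned to the UTI, if the UTI is known."""
    ext = UTI_TO_EXTENSION.get(uti)
    if ext is None:
        return filename  # unrecognized UTI: keep the original name
    path = PurePath(filename)
    if path.suffix.lstrip(".").upper() == ext:
        return filename  # already matches (case-insensitive, an assumption)
    return str(path.with_suffix("." + ext))
```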

Content-Length Verification

Before checksum comparison, the pipeline verifies that the number of bytes received matches the Content-Length header. This catches truncated downloads early and triggers an automatic retry, avoiding wasted time computing a SHA-256 on incomplete data.

For resumed downloads, kei also validates Content-Range before appending to an existing .kei-tmp file. If the server sends a range that does not match the existing byte count, kei discards the partial file and starts over instead of appending mismatched bytes.
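
Both checks reduce to simple comparisons on header values. The Python sketch below shows one way to express them; the function names are hypothetical, and treating a missing Content-Length as "nothing to verify" is an assumption rather than documented kei behaviour. The Content-Range pattern follows the standard `bytes first-last/total` form.

```python
import re

_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def length_matches(content_length, bytes_received):
    """True when no length was advertised or the received count equals it."""
    if content_length is None:
        return True  # assumption: nothing to compare against
    return int(content_length) == bytes_received


def can_append(content_range, existing_bytes):
    """True when the served range starts exactly where the partial file ends."""
    if not content_range:
        return False
    match = _RANGE.fullmatch(content_range.strip())
    return match is not None and int(match.group(1)) == existing_bytes
```

With a 1000-byte partial file, `can_append("bytes 1000-1999/5000", 1000)` allows the append, while a response of `bytes 0-1999/5000` means the server ignored the resume request and the partial file should be discarded.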

Checksum Note

Apple's fileChecksum field in the CloudKit API is an MMCS (MobileMe Chunked Storage) compound signature used for internal chunk routing; it is not a SHA-1 or SHA-256 content hash, so it cannot be compared against a hash of the downloaded bytes. File integrity after download is instead verified by kei verify --checksums, which recomputes the SHA-256 of the file on disk and compares it with the SHA-256 stored at download time.
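
The comparison itself is a streaming hash followed by a string match. Here is a minimal Python sketch of that step; how and where kei stores the download-time digest is not covered on this page, so `stored_hex` is simply assumed to be a hex string supplied by the caller, and both function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large videos need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, stored_hex):
    """True when the file on disk still matches the digest recorded earlier."""
    return sha256_of(path) == stored_hex.lower()
```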
