# Content Validation
Every downloaded file is validated before being renamed from its temp suffix to the final path. Validation catches corrupted downloads, truncated transfers, and CDN error pages served as HTTP 200.
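The publish step can be sketched as follows. This is a minimal Python illustration, not kei's code. `finalize_download`, `RetryableDownloadError` and the two validator callbacks are hypothetical names. We also assume that the temp file is the final path plus `.kei-tmp`, the suffix mentioned in the resume notes further down. The sketch follows the split described in the next paragraph: an error page is deleted and retried, while a signature mismatch only logs a warning.

```python
import logging
import os
import tempfile
from typing import Callable

log = logging.getLogger("download")

TMP_SUFFIX = ".kei-tmp"  # assumption: appended to the final file name


class RetryableDownloadError(Exception):
    """Hypothetical error type: the caller re-attempts the download on the next pass."""


def finalize_download(
    final_path: str,
    is_error_page: Callable[[bytes], bool],
    magic_ok: Callable[[str, bytes], bool],
) -> str:
    """Validate final_path + TMP_SUFFIX, then rename it to final_path."""
    tmp_path = final_path + TMP_SUFFIX
    with open(tmp_path, "rb") as f:
        head = f.read(64)  # enough leading bytes for every signature check

    if is_error_page(head):
        # Hard rejection: an error page served as HTTP 200 is never published.
        os.remove(tmp_path)
        raise RetryableDownloadError(f"error page instead of media for {final_path}")

    ext = os.path.splitext(final_path)[1].lower()
    if not magic_ok(ext, head):
        # Soft failure: log a warning but still publish the file.
        log.warning("unexpected leading bytes for %s", final_path)

    # os.replace overwrites the target; on POSIX it is an atomic rename
    # when both paths are on the same filesystem.
    os.replace(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "IMG_0001.JPG")
        with open(target + TMP_SUFFIX, "wb") as f:
            f.write(b"\xff\xd8\xff\xe0 rest of jpeg")
        finalize_download(
            target,
            is_error_page=lambda head: head.lstrip().lower().startswith((b"<!", b"<html")),
            magic_ok=lambda ext, head: head.startswith(b"\xff\xd8"),
        )
        print(os.path.exists(target))  # True
```

The rename happens only after validation, so a half-written or rejected file never appears under its final name.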
Known media types are validated by checking the first bytes of the file against expected signatures. A mismatch produces a warning (not an error) -- the file is still saved to disk. Hard rejection only happens for HTML / JSON error pages from Apple's CDN (see CDN Error-Page Rejection below).
| Extension(s) | Offset | Accepted bytes | ASCII / notes |
|---|---|---|---|
| .jpg, .jpeg | 0 | FF D8 | JPEG SOI marker |
| .png | 0 | 89 50 4E 47 | `\x89PNG` |
| .gif | 0 | 47 49 46 38 | GIF8 (covers both GIF87a and GIF89a) |
| .tiff, .tif, .dng | 0 | 49 49 2A 00 or 4D 4D 00 2A | Little-endian (`II*\0`) or big-endian (`MM\0*`) TIFF |
| .webp | 0 and 8 | 52 49 46 46 at 0, 57 45 42 50 at 8 | RIFF container with WEBP form type; bytes 4-7 hold the RIFF chunk size (file size minus 8) and are not checked |
| .heic, .heif | 4 | 66 74 79 70 | ISO-BMFF ftyp box (strict) |
| .mp4, .m4v | 4 | 66 74 79 70 | ISO-BMFF ftyp box (strict) |
| .mov | 4 | ftyp, wide, mdat, moov, free, skip, pnot | Any QuickTime top-level atom (see below) |
Files with unknown extensions skip the magic-byte check entirely and are written as-is.
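A table-driven check along these lines reproduces the rows above. It is a sketch, and `SIGNATURES` and `magic_matches` are hypothetical names. `.mov` is left out on purpose because it follows the looser atom rule described next. A return value of `None` means "unknown extension, check skipped".

```python
from typing import Optional

TIFF = [(0, (b"II*\x00", b"MM\x00*"))]  # little- or big-endian byte order mark
FTYP = [(4, (b"ftyp",))]                # ISO-BMFF: box type follows the 4-byte size

# extension -> list of (offset, accepted byte strings); every pair must match.
# .mov is omitted: it uses the QuickTime top-level atom rule instead.
SIGNATURES = {
    ".jpg": [(0, (b"\xff\xd8",))],
    ".jpeg": [(0, (b"\xff\xd8",))],
    ".png": [(0, (b"\x89PNG",))],
    ".gif": [(0, (b"GIF8",))],
    ".tiff": TIFF,
    ".tif": TIFF,
    ".dng": TIFF,
    ".webp": [(0, (b"RIFF",)), (8, (b"WEBP",))],
    ".heic": FTYP,
    ".heif": FTYP,
    ".mp4": FTYP,
    ".m4v": FTYP,
}


def magic_matches(ext: str, head: bytes) -> Optional[bool]:
    """True/False for known extensions, None when the check is skipped."""
    rules = SIGNATURES.get(ext.lower())
    if rules is None:
        return None  # unknown extension: the file is written as-is
    return all(
        any(head[offset:offset + len(sig)] == sig for sig in accepted)
        for offset, accepted in rules
    )
```

Under the rules above, a `False` result only produces a warning. The file is still saved.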
The modern formats Apple uses (HEIC, HEIF, MP4, M4V) are built on the ISO Base Media File Format (ISO-BMFF) and must begin with an ftyp box. Bytes 0-3 of every box hold its size, so the ftyp type code appears at offset 4. A non-ftyp first box on those extensions is genuinely anomalous and emits a warning.
.mov is looser. The QuickTime file format predates ISO-BMFF and allows any top-level atom first. Apple's Photos pipeline routinely serves Live Photo companion videos and HEVC captures in classic QuickTime form, where the first atom is padding (wide) rather than ftyp. kei accepts the full set of documented top-level atoms without warning:
| Atom | Purpose |
|---|---|
| ftyp | Modern ISO-BMFF file-type box |
| wide | 8-byte padding atom, typically placed before mdat for 64-bit offsets |
| mdat | Media data (sample bytes) |
| moov | Movie header / metadata |
| free | Unused space |
| skip | Alias for free |
| pnot | Preview resource (rare, older QuickTime) |
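A minimal sketch of both rules follows. It reads only the first box header: a big-endian 32-bit size followed by the four-character type. The function names are hypothetical. The special size values 0 ("extends to end of file") and 1 ("64-bit size follows") are returned as-is rather than interpreted.

```python
import struct
from typing import Optional, Tuple

# Top-level atoms from the table above; any of them may come first in a .mov.
QUICKTIME_TOP_LEVEL = {b"ftyp", b"wide", b"mdat", b"moov", b"free", b"skip", b"pnot"}


def first_box(head: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (size, type) of the first box/atom, or None if head is too short."""
    if len(head) < 8:
        return None
    (size,) = struct.unpack(">I", head[:4])  # big-endian 32-bit size
    return size, head[4:8]                   # four-character type at offset 4


def iso_bmff_first_box_ok(head: bytes) -> bool:
    """Strict rule for .heic/.heif/.mp4/.m4v: the first box must be ftyp."""
    box = first_box(head)
    return box is not None and box[1] == b"ftyp"


def mov_first_atom_ok(head: bytes) -> bool:
    """Loose rule for .mov: any documented QuickTime top-level atom may come first."""
    box = first_box(head)
    return box is not None and box[1] in QUICKTIME_TOP_LEVEL
```

The same bytes that are normal for a `.mov` (a leading `wide` atom) fail the strict rule. That is exactly the case the warning is meant to flag for `.mp4` or `.heic`.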
For files that do start with ftyp, the four-byte major brand at offset 8 identifies the variant (a list of compatible brands follows from offset 16):
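| Brand | Format |
|---|---|
| heic | HEIC (HEIF with HEVC) |
| heix | HEIC variant using extended HEVC profiles |
| mif1 | HEIF still image |
| msf1 | HEIF image sequence |
| hevc | HEVC video |
| qt | QuickTime MOV (stored as `qt` followed by two spaces) |
| isom | ISO MP4 base |
| mp42 | MP4 v2 |
| M4V | Apple M4V (stored as `M4V` followed by one space) |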
kei does not check the brand; the ftyp box name alone is sufficient to pass validation.
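kei ignores the brand, but decoding it can still help when you inspect a file that triggered a warning. The helper below is hypothetical and not part of kei. It assumes the whole ftyp box is in `head`, and it does not handle the rare 64-bit box size.

```python
import struct
from typing import List, NamedTuple, Optional


class FileType(NamedTuple):
    major_brand: str
    minor_version: int
    compatible_brands: List[str]


def parse_ftyp(head: bytes) -> Optional[FileType]:
    """Decode an ftyp box at the start of head, or return None if there is none."""
    if len(head) < 16 or head[4:8] != b"ftyp":
        return None
    (size,) = struct.unpack(">I", head[:4])
    end = min(size, len(head))  # stay inside both the box and the buffer
    major = head[8:12].decode("latin-1")           # major brand at offset 8
    (minor,) = struct.unpack(">I", head[12:16])    # minor version at offset 12
    compatible = [                                 # 4-byte brands from offset 16
        head[i:i + 4].decode("latin-1") for i in range(16, end - 3, 4)
    ]
    return FileType(major, minor, compatible)
```

Brands are always exactly four bytes, so a QuickTime file reports its major brand as `"qt  "`, with the trailing spaces included.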
## CDN Error-Page Rejection

Apple's CDN occasionally serves error or rate-limit pages as HTTP 200 with an HTML body. HTML content is always rejected as a hard error regardless of file extension. Detection checks the first non-whitespace bytes for:

- `<!` (catches `<!DOCTYPE html>`)
- `<html` (case-insensitive)
HTML rejection is retryable -- the download is deleted and re-attempted on the next pass with a fresh CDN URL.
There is also an HTTP-level check: responses with Content-Type: text/html are rejected before any bytes are written to disk.
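The header check and the body sniff together might look like the sketch below. The function names are hypothetical. Comparing only the media type and ignoring parameters such as `charset` is our assumption about how the header is matched. An XML prolog (`<?xml`) matches neither prefix, so it passes.

```python
from typing import Optional


def content_type_is_html(content_type: Optional[str]) -> bool:
    """HTTP-level check on the media type only (parameters like charset are ignored)."""
    if not content_type:
        return False
    return content_type.split(";", 1)[0].strip().lower() == "text/html"


def body_looks_like_html(head: bytes) -> bool:
    """Body sniff: the first non-whitespace bytes are '<!' or '<html' (any case)."""
    start = head.lstrip()  # strips ASCII whitespace only
    return start.startswith(b"<!") or start[:5].lower() == b"<html"


def is_html_error_page(content_type: Optional[str], head: bytes) -> bool:
    return content_type_is_html(content_type) or body_looks_like_html(head)
```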
JSON CDN error responses are rejected the same way. If the response carries Content-Type: application/json, or its first bytes look like an Apple JSON error object, kei treats it as a retryable download failure and publishes no file. Unknown or valid media bytes are still saved when the size checks pass.
XML content (e.g., AAE sidecar files, which are Apple plist XML) is not rejected: the `<?xml` prefix does not trigger HTML detection.
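This page does not spell out kei's exact test for "looks like an Apple JSON error object", so the heuristic below is an assumption. It applies the header check first. Otherwise it treats the body as an error if it starts with `{` and parses as a JSON object. `looks_like_json_error` is a hypothetical name.

```python
import json
from typing import Optional


def looks_like_json_error(content_type: Optional[str], body: bytes) -> bool:
    """Hypothetical heuristic for a JSON error body served instead of media."""
    if content_type:
        if content_type.split(";", 1)[0].strip().lower() == "application/json":
            return True
    start = body.lstrip()
    if not start.startswith(b"{"):
        return False  # e.g. JPEG, PNG and XML plists do not begin with '{'
    try:
        parsed = json.loads(start.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    # Assumption: any top-level JSON object is treated as an error response.
    return isinstance(parsed, dict)
```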
iCloud returns file types as Apple Uniform Type Identifiers (UTIs) in the asset_type field. kei maps these to file extensions when the original filename's extension doesn't match:
| UTI | Extension |
|---|---|
| public.heic | HEIC |
| public.heif | HEIF |
| public.jpeg | JPG |
| public.png | PNG |
| com.apple.quicktime-movie | MOV |
| com.adobe.raw-image | DNG |
| com.canon.cr2-raw-image | CR2 |
| com.canon.crw-raw-image | CRW |
| com.canon.cr3-raw-image | CR3 |
| com.sony.arw-raw-image | ARW |
| com.fuji.raw-image | RAF |
| com.panasonic.rw2-raw-image | RW2 |
| com.nikon.nrw-raw-image | NRW |
| com.nikon.raw-image | NEF |
| com.pentax.raw-image | PEF |
| com.olympus.raw-image | ORF |
| com.olympus.or-raw-image | ORF |
| org.webmproject.webp | WEBP |
Assets with unrecognized UTIs keep their original filename extension unchanged.
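The mapping rule can be sketched as below, using a subset of the table. Two details are assumptions, not documented behavior. First, the extension comparison is case-insensitive. Second, a synonym such as `.jpeg` for `public.jpeg` counts as a mismatch and is rewritten to `.JPG`. `filename_for_asset` and `UTI_TO_EXT` are hypothetical names.

```python
import os

# Subset of the UTI table above.
UTI_TO_EXT = {
    "public.heic": "HEIC",
    "public.heif": "HEIF",
    "public.jpeg": "JPG",
    "public.png": "PNG",
    "com.apple.quicktime-movie": "MOV",
    "com.adobe.raw-image": "DNG",
    "org.webmproject.webp": "WEBP",
}


def filename_for_asset(original_name: str, asset_type: str) -> str:
    """Replace the extension only when the UTI is known and disagrees with it."""
    stem, ext = os.path.splitext(original_name)
    mapped = UTI_TO_EXT.get(asset_type)
    if mapped is None:
        return original_name  # unrecognized UTI: keep the original extension
    if ext[1:].upper() == mapped:
        return original_name  # already matches (case-insensitive): unchanged
    return f"{stem}.{mapped}"
```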
Before checksum comparison, the pipeline verifies that the number of bytes received matches the Content-Length header. This catches truncated downloads early and triggers an automatic retry, avoiding wasted time computing a SHA-256 on incomplete data.
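To implement this check, it is enough to count bytes during the streaming copy. In the sketch below, `copy_and_count` and `TruncatedDownload` are hypothetical names. Skipping the check when `content_length` is `None` (no header) is an assumption. The SHA-256 would only be computed after this returns successfully.

```python
import io
from typing import BinaryIO, Optional


class TruncatedDownload(Exception):
    """Hypothetical retryable error: byte count differs from Content-Length."""


def copy_and_count(src: BinaryIO, dst: BinaryIO, content_length: Optional[int],
                   chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst in chunks and verify the total against Content-Length."""
    received = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        received += len(chunk)
    if content_length is not None and received != content_length:
        raise TruncatedDownload(f"expected {content_length} bytes, got {received}")
    return received


if __name__ == "__main__":
    sink = io.BytesIO()
    try:
        copy_and_count(io.BytesIO(b"\xff\xd8" * 10), sink, content_length=40)
    except TruncatedDownload as err:
        print("retry:", err)  # retry: expected 40 bytes, got 20
```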
For resumed downloads, kei also validates Content-Range before appending to an existing .kei-tmp file. If the server sends a range that does not match the existing byte count, kei discards the partial file and starts over instead of appending mismatched bytes.
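Below is a hypothetical `can_append` check. It assumes the header uses the standard `bytes first-last/complete-length` form from RFC 9110. Appending is allowed only when the range starts exactly at the current length of the `.kei-tmp` file. Any other answer means discard and restart. That includes a missing header, which usually means the server ignored the `Range` request and sent the whole file.

```python
import re
from typing import Optional

# Satisfied-range form: "bytes first-last/complete-length" (length may be "*").
_CONTENT_RANGE = re.compile(r"^\s*bytes\s+(\d+)-(\d+)/(\d+|\*)\s*$", re.IGNORECASE)


def can_append(content_range: Optional[str], existing_len: int) -> bool:
    """True if the response continues exactly where the partial file ends."""
    if content_range is None:
        return False
    m = _CONTENT_RANGE.match(content_range)
    if m is None:
        return False  # malformed or unsatisfied range ("bytes */1000")
    first, last = int(m.group(1)), int(m.group(2))
    if first != existing_len or last < first:
        return False
    total = m.group(3)
    return total == "*" or last < int(total)
```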
Apple's fileChecksum field in the CloudKit API is an MMCS (MobileMe Chunked Storage) compound signature used for internal chunk routing; it is not a SHA-1 or SHA-256 content hash. File integrity after download is verified by kei verify --checksums, which compares a locally computed SHA-256 (recorded at download time) against a fresh hash of the file on disk.
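The local SHA-256 comparison can be sketched with `hashlib`. This shows the idea and is not kei's implementation. `sha256_file` and `verify_checksum` are hypothetical names, and this page does not say where kei stores the recorded hash.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large videos are not read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: str, stored_hex: str) -> bool:
    """Compare the hash recorded at download time with the file as it is now."""
    return sha256_file(path) == stored_hex.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "IMG_0001.JPG")
        with open(p, "wb") as f:
            f.write(b"abc")
        stored = sha256_file(p)            # recorded when the download finished
        print(verify_checksum(p, stored))  # True
        with open(p, "ab") as f:
            f.write(b"!")                  # simulate later corruption
        print(verify_checksum(p, stored))  # False
```

This works because the hash is computed locally from the bytes actually written. Apple's fileChecksum cannot serve that role.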