Content Validation
Every downloaded file is validated before being renamed from its temp suffix to the final path. Validation catches corrupted downloads, truncated transfers, and CDN error pages served as HTTP 200.
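The validate-then-rename flow could be sketched roughly as follows. This is a minimal illustration, not kei's actual code: the names `finalize_download` and `ValidationError` are hypothetical, and deleting the temp file on every hard failure is an assumption made for the sketch.

```python
import os


class ValidationError(Exception):
    """Hypothetical hard-error type raised by a validator."""


def finalize_download(temp_path, final_path, validate):
    """Run validation on the temp file, then move it to its final path.

    `validate` is any callable that raises ValidationError on a hard error.
    Warnings (such as a magic byte mismatch) are assumed not to raise.
    """
    try:
        validate(temp_path)
    except ValidationError:
        # Assumption: a rejected download is removed so a later pass can retry.
        os.remove(temp_path)
        raise
    # os.replace is atomic when both paths are on the same filesystem,
    # so the final path never holds a partially written file.
    os.replace(temp_path, final_path)
```

Because the rename happens only after validation succeeds, a file at the final path can be treated as complete.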
Known media types are validated by checking the first bytes of the file against expected signatures. A mismatch produces a warning (not an error) because format variants exist -- for example, classic QuickTime MOV files start with `wide` or `mdat` instead of `ftyp`.
| Extension(s) | Offset | Expected Bytes | Notes |
|---|---|---|---|
| `.jpg`, `.jpeg` | 0 | `FF D8` | JPEG SOI marker |
| `.png` | 0 | `89 50 4E 47` | `\x89PNG` |
| `.heic`, `.heif`, `.mov`, `.mp4`, `.m4v` | 4 | `66 74 79 70` | ISO BMFF `ftyp` atom |
| `.gif` | 0 | `47 49 46 38` | `GIF8` (covers GIF87a/GIF89a) |
| `.tiff`, `.tif` | 0 | `49 49 2A 00` or `4D 4D 00 2A` | Little-endian (`II`) or big-endian (`MM`) TIFF |
| `.webp` | 0 + 8 | `52 49 46 46` at 0, `57 45 42 50` at 8 | RIFF container with WEBP chunk |
Files with unknown extensions skip magic byte checks entirely.
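The signature table above can be expressed as a small lookup. The following is a sketch only; the function name `check_magic` and the three result strings are illustrative assumptions, not kei's API.

```python
from pathlib import PurePath

# Each format lists alternatives; every (offset, bytes) pair in one
# alternative must match for the file to pass.
JPEG = [((0, b"\xff\xd8"),)]
PNG = [((0, b"\x89PNG"),)]
BMFF = [((4, b"ftyp"),)]
GIF = [((0, b"GIF8"),)]
TIFF = [((0, b"II*\x00"),), ((0, b"MM\x00*"),)]
WEBP = [((0, b"RIFF"), (8, b"WEBP"))]

SIGNATURES = {
    ".jpg": JPEG, ".jpeg": JPEG,
    ".png": PNG,
    ".heic": BMFF, ".heif": BMFF, ".mov": BMFF, ".mp4": BMFF, ".m4v": BMFF,
    ".gif": GIF,
    ".tiff": TIFF, ".tif": TIFF,
    ".webp": WEBP,
}


def check_magic(filename, head):
    """Return "ok", "mismatch" (a warning only), or "skipped".

    `head` is the first bytes of the file; 12 bytes cover every entry above.
    """
    alternatives = SIGNATURES.get(PurePath(filename).suffix.lower())
    if alternatives is None:
        return "skipped"  # unknown extension: no magic byte check
    for parts in alternatives:
        if all(head[off:off + len(sig)] == sig for off, sig in parts):
            return "ok"
    return "mismatch"
```

A caller would log a warning on `"mismatch"` and continue, since classic QuickTime files legitimately fail the `ftyp` check.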
The ISO Base Media File Format family (HEIC, HEIF, MOV, MP4, M4V) is identified by the `ftyp` atom at offset 4. The first 4 bytes are the atom size (big-endian uint32), followed by the four-byte atom type. Common brand codes at offset 8:
| Brand | Format |
|---|---|
| `heic` | HEIC (HEIF with HEVC) |
| `mif1` | HEIF |
| `qt` (padded with two trailing spaces) | QuickTime MOV |
| `isom` | ISO MP4 |
| `mp42` | MP4 v2 |
| `M4V` (padded with one trailing space) | Apple M4V |
Classic QuickTime files may start with other atoms (`wide`, `mdat`, `moov`, `free`) instead of `ftyp`. These produce a magic byte mismatch warning but are accepted. This is expected for live photo companion MOV files from older iPhones.
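To make the atom layout concrete, here is a hedged sketch that reads the size, atom type, and major brand from the first 12 bytes. The function name `inspect_bmff_header` and the status strings are hypothetical and chosen only for illustration.

```python
import struct

# Leading atoms seen in classic QuickTime files, as listed above.
CLASSIC_QUICKTIME_ATOMS = {b"wide", b"mdat", b"moov", b"free"}


def inspect_bmff_header(head):
    """Classify the first atom of a file from its first 12 bytes.

    Returns a (status, detail) tuple:
      ("ftyp", brand)               brand is the 4-character code at offset 8
      ("classic_quicktime", atom)   accepted, but a mismatch warning applies
      ("unknown", None)             neither ftyp nor a known classic atom
      ("too_short", None)           not enough bytes to decide
    """
    if len(head) < 8:
        return ("too_short", None)
    # Bytes 0-3: big-endian uint32 atom size; bytes 4-7: atom type.
    _size, atom_type = struct.unpack(">I4s", head[:8])
    if atom_type == b"ftyp":
        if len(head) < 12:
            return ("too_short", None)
        return ("ftyp", head[8:12].decode("ascii", errors="replace"))
    if atom_type in CLASSIC_QUICKTIME_ATOMS:
        return ("classic_quicktime", atom_type.decode("ascii"))
    return ("unknown", None)
```

Brands are always four bytes, so short codes arrive space-padded: a QuickTime file yields `"qt  "` rather than `"qt"`.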
Apple's CDN occasionally serves error or rate-limit pages as HTTP 200 with an HTML body. HTML content is always rejected as a hard error regardless of file extension. Detection checks the first non-whitespace bytes for:
- `<!` (catches `<!DOCTYPE html>`)
- `<html` (case-insensitive)
HTML rejection is retryable -- the download is deleted and re-attempted on the next pass with a fresh CDN URL.
There is also an HTTP-level check: responses with `Content-Type: text/html` are rejected before any bytes are written to disk.
XML content (e.g., AAE sidecar files, which are Apple plist XML) is not rejected. The `<?xml` prefix does not trigger HTML detection.
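Both HTML checks are simple enough to sketch in a few lines. The function names below are hypothetical, and ignoring parameters such as `charset` in the header comparison is an assumption of this sketch.

```python
def looks_like_html(head):
    """Body check: inspect the first non-whitespace bytes of the download."""
    stripped = head.lstrip()  # drops leading ASCII whitespace
    return stripped.startswith(b"<!") or stripped[:5].lower() == b"<html"


def is_html_content_type(header_value):
    """Header check: compare the media type, ignoring any parameters."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "text/html"
```

An AAE sidecar begins with `<?xml`, so neither prefix matches and the file passes, even though a `<!DOCTYPE plist ...>` line follows later in the file.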
iCloud returns file types as Apple Uniform Type Identifiers (UTIs) in the `asset_type` field. kei maps these to file extensions when the original filename's extension doesn't match:
| UTI | Extension |
|---|---|
| `public.heic` | HEIC |
| `public.heif` | HEIF |
| `public.jpeg` | JPG |
| `public.png` | PNG |
| `com.apple.quicktime-movie` | MOV |
| `com.adobe.raw-image` | DNG |
| `com.canon.cr2-raw-image` | CR2 |
| `com.canon.crw-raw-image` | CRW |
| `com.canon.cr3-raw-image` | CR3 |
| `com.sony.arw-raw-image` | ARW |
| `com.fuji.raw-image` | RAF |
| `com.panasonic.rw2-raw-image` | RW2 |
| `com.nikon.nrw-raw-image` | NRF |
| `com.nikon.raw-image` | NEF |
| `com.pentax.raw-image` | PEF |
| `com.olympus.raw-image` | ORF |
| `com.olympus.or-raw-image` | ORF |
| `org.webmproject.webp` | WEBP |
Assets with unrecognized UTIs keep their original filename extension unchanged.
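One way the mapping might be applied is sketched below. The function name `resolve_filename` is hypothetical, and two behaviors are assumptions rather than documented facts: the comparison is case-insensitive, and a mismatched extension is replaced rather than appended to.

```python
from pathlib import PurePath

# Copied from the UTI table above.
UTI_EXTENSIONS = {
    "public.heic": "HEIC",
    "public.heif": "HEIF",
    "public.jpeg": "JPG",
    "public.png": "PNG",
    "com.apple.quicktime-movie": "MOV",
    "com.adobe.raw-image": "DNG",
    "com.canon.cr2-raw-image": "CR2",
    "com.canon.crw-raw-image": "CRW",
    "com.canon.cr3-raw-image": "CR3",
    "com.sony.arw-raw-image": "ARW",
    "com.fuji.raw-image": "RAF",
    "com.panasonic.rw2-raw-image": "RW2",
    "com.nikon.nrw-raw-image": "NRF",
    "com.nikon.raw-image": "NEF",
    "com.pentax.raw-image": "PEF",
    "com.olympus.raw-image": "ORF",
    "com.olympus.or-raw-image": "ORF",
    "org.webmproject.webp": "WEBP",
}


def resolve_filename(filename, asset_type):
    """Return the filename with its extension aligned to the asset's UTI."""
    expected = UTI_EXTENSIONS.get(asset_type)
    if expected is None:
        return filename  # unrecognized UTI: keep the original extension
    path = PurePath(filename)
    # Assumption: extensions are compared case-insensitively.
    if path.suffix.lstrip(".").upper() == expected:
        return filename
    # Assumption: the mismatched extension is replaced, not appended to.
    return str(path.with_suffix("." + expected))
```

For example, under these assumptions `resolve_filename("IMG_0001.JPG", "public.heic")` yields `IMG_0001.HEIC`, while a file whose UTI is not in the table is returned unchanged.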
Before checksum comparison, the pipeline verifies that the number of bytes received matches the `Content-Length` header. This catches truncated downloads early and triggers an automatic retry, avoiding wasted time computing a SHA-256 on incomplete data.
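The length check amounts to a single comparison, sketched here. The names `TruncatedDownload` and `check_length` are hypothetical, and skipping the check when the header is absent is an assumption.

```python
class TruncatedDownload(Exception):
    """Hypothetical retryable error: byte count differs from Content-Length."""


def check_length(bytes_received, content_length_header):
    """Raise TruncatedDownload if the byte count does not match the header.

    `content_length_header` is the raw header value, or None when the
    server did not send one (assumed here to skip the check).
    """
    if content_length_header is None:
        return
    expected = int(content_length_header)
    if bytes_received != expected:
        raise TruncatedDownload(
            f"expected {expected} bytes, received {bytes_received}"
        )
```

Running this before hashing means a short transfer fails fast instead of after a full SHA-256 pass over incomplete data.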
Apple's `fileChecksum` field in the CloudKit API is an MMCS (MobileMe Chunked Storage) compound signature used for internal chunk routing -- it is not a SHA-1 or SHA-256 content hash. File integrity after download is verified by `kei verify --checksums`, which compares the locally computed SHA-256 stored at download time against the file on disk.
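Conceptually, this kind of verification re-hashes the file and compares the result to the stored digest. The sketch below uses Python's standard `hashlib` and does not reflect how `kei verify --checksums` is implemented; the function names and the hex-string storage format are assumptions.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Stream the file through SHA-256 so large videos need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_stored_checksum(path, stored_hex):
    """Compare the file on disk with the digest recorded at download time."""
    return sha256_of_file(path) == stored_hex.lower()
```

A mismatch here indicates the file changed or was corrupted after download, which the MMCS signature from CloudKit cannot be used to detect.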