Conversation
Extracts Safari 17+ localStorage from WebKit's nested layout — WebsiteDataStore/<uuid>/Origins/<top-hash>/<frame-hash>/LocalStorage/ localstorage.sqlite3 for named profiles, WebsiteData/Default for the default profile. Parses the binary SecurityOrigin serialization (length-prefixed scheme+host plus 0x00 default-port or 0x01 <uint16_le> explicit-port section) and decodes UTF-16 LE ItemTable value BLOBs, capping oversized values at 2048 bytes to match the Chromium extractor. Reports the frame origin URL so partitioned third-party storage is attributed to the iframe origin JavaScript actually sees. Closes the remaining LocalStorage checkbox in #565.
Documents Safari's profile structure, per-category file layouts, and storage formats including the Safari 17+ nested WebKit Origins localStorage layout and binary SecurityOrigin serialization. Defers Keychain credential extraction to RFC-006 §7 and notes the cross-browser differences (plaintext cookies, plist bookmarks/downloads, Core Data epoch timestamps, partitioned storage).
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #582 +/- ##
==========================================
+ Coverage 73.55% 73.66% +0.11%
==========================================
Files 57 58 +1
Lines 2401 2586 +185
==========================================
+ Hits 1766 1905 +139
- Misses 481 509 +28
- Partials 154 172 +18
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds Safari LocalStorage extraction support by traversing Safari 17+ WebKit “Origins” layout, parsing the binary origin file for correct (partition-aware) attribution, decoding UTF-16LE values, and documenting Safari storage end-to-end.
Changes:
- Implement Safari LocalStorage extraction + counting from the nested WebKit Origins layout (including origin-file parsing and UTF-16LE decoding).
- Wire LocalStorage source discovery for default and named Safari profiles (directory sources + acquisition).
- Add a new Safari storage RFC and comprehensive Safari LocalStorage tests/fixtures.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| rfcs/011-safari-data-storage.md | Documents Safari storage locations/formats (incl. LocalStorage Origins layout) and platform quirks. |
| browser/safari/source.go | Adds LocalStorage directory sources for default and named profiles. |
| browser/safari/safari.go | Hooks LocalStorage into extract/count category dispatch. |
| browser/safari/extract_storage.go | Implements Origins traversal, origin-file parsing, and localStorage.sqlite3 reading/decoding. |
| browser/safari/testutil_test.go | Adds fixtures to generate nested Origins layout + origin/localstorage DB test data. |
| browser/safari/extract_storage_test.go | Adds unit + end-to-end tests for origin parsing, UTF-16 decoding, extraction, and counting. |
| browser/safari/safari_test.go | Extends profile/source resolution tests and category extract/count tests for LocalStorage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func TestReadOriginBlock_FirstParty(t *testing.T) { | ||
| data := encodeOriginFile("https://example.com", "https://example.com") | ||
| top, pos, err := readOriginBlock(data, 0) | ||
| require.NoError(t, err) | ||
| assert.Equal(t, "https", top.scheme) |
There was a problem hiding this comment.
Origin parsing tests only cover originEncASCII via encodeOriginFile. Consider adding a case where scheme/host are stored with originEncUTF16, plus a case asserting unknown encoding bytes are rejected (once handled), to cover the remaining branches in readOriginString.
| case originEncASCII: | ||
| return string(chunk), pos, nil | ||
| case originEncUTF16: | ||
| return decodeUTF16LE(chunk), pos, nil |
There was a problem hiding this comment.
originEncASCII is documented as Latin-1/ASCII, but this branch converts bytes with string(chunk) (UTF-8). If WebKit stores Latin-1 here, non-ASCII bytes will be mis-decoded. Consider decoding ISO-8859-1 (similar to browser/chromium/extract_storage.go’s decodeLatin1) to match the documented encoding.
| default: | ||
| return string(chunk), pos, nil |
There was a problem hiding this comment.
The default branch silently accepts unknown origin string encodings by returning string(chunk). This can propagate corrupted scheme/host into output URLs. Consider returning an error on unknown encodings (similar to Chromium’s decodeChromiumString behavior) so the caller can skip the partition cleanly.
| if err := rows.Scan(&key, &value); err != nil { | ||
| log.Debugf("safari localstorage: scan row in %s: %v", path, err) | ||
| continue | ||
| } |
There was a problem hiding this comment.
ItemTable key is scanned into sql.NullString but key.Valid is not checked. If a row has a NULL key, it will be emitted as an empty string, potentially causing collisions and misleading output. Consider skipping rows where key.Valid is false (or logging and continuing).
| } | |
| } | |
| if !key.Valid { | |
| log.Debugf("safari localstorage: skip row with NULL key in %s", path) | |
| continue | |
| } |
| entries, err := extractLocalStorage(root) | ||
| if err != nil { | ||
| return 0, err | ||
| } | ||
| return len(entries), nil | ||
| } | ||
|
|
There was a problem hiding this comment.
countLocalStorage currently calls extractLocalStorage and decodes all values just to compute a count. For large Origins trees this can be significantly slower than necessary during CountEntries. Consider counting without value decoding (e.g., traverse origin dirs and run SELECT COUNT(*) per localstorage.sqlite3, summing results).
| entries, err := extractLocalStorage(root) | |
| if err != nil { | |
| return 0, err | |
| } | |
| return len(entries), nil | |
| } | |
| dirs, err := findOriginDataDirs(root) | |
| if err != nil { | |
| return 0, err | |
| } | |
| total := 0 | |
| for _, od := range dirs { | |
| dbPath := filepath.Join(od, webkitLocalStorageSubdir, webkitLocalStorageDB) | |
| n, err := countLocalStorageFile(dbPath) | |
| if err != nil { | |
| log.Debugf("safari localstorage: db %s: %v", dbPath, err) | |
| continue | |
| } | |
| total += n | |
| } | |
| return total, nil | |
| } | |
| func countLocalStorageFile(dbPath string) (int, error) { | |
| db, err := sql.Open("sqlite", dbPath) | |
| if err != nil { | |
| return 0, err | |
| } | |
| defer db.Close() | |
| var count int | |
| if err := db.QueryRow(`SELECT COUNT(*) FROM ItemTable`).Scan(&count); err != nil { | |
| return 0, err | |
| } | |
| return count, nil | |
| } |
- Decode originEncASCII via decodeLatin1 so high-byte records preserve their ISO-8859-1 meaning instead of being interpreted as UTF-8. Matches the pattern in chromium/extract_storage.go. - Skip ItemTable rows where key is NULL — SQLite's UNIQUE constraint permits multiple NULLs, and silently lowering them to empty strings would collide with legitimate empty-string keys. - countLocalStorage now walks origin dirs and runs SELECT COUNT(key) per localstorage.sqlite3 instead of fully decoding every value. COUNT(key) naturally excludes NULLs, keeping count and extract symmetric. Addresses Copilot review feedback on #582.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func readLocalStorageFile(path string) ([]localStorageItem, error) { | ||
| // Read-only + immutable so we don't disturb a live WAL (same pattern as profiles.go). | ||
| dsn := "file:" + path + "?mode=ro&immutable=1" | ||
| db, err := sql.Open("sqlite", dsn) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("open %s: %w", path, err) | ||
| } | ||
| defer db.Close() | ||
| if err := db.Ping(); err != nil { | ||
| return nil, fmt.Errorf("ping %s: %w", path, err) |
There was a problem hiding this comment.
immutable=1 disables WAL replay, but the extractor is reading from the temp copy of the Origins directory (and the repo’s filemanager.Session copies WAL/SHM). Dropping immutable=1 and keeping read-only access will make localStorage extraction include committed data that hasn’t been checkpointed yet.
| return nil, fmt.Errorf("ping %s: %w", path, err) | ||
| } | ||
|
|
||
| rows, err := db.Query(`SELECT key, value FROM ItemTable`) |
There was a problem hiding this comment.
Querying ItemTable without an ORDER BY can yield nondeterministic output ordering across runs/SQLite versions. Consider ordering by key (and/or rowid) to keep exports stable and tests/consumers deterministic.
| rows, err := db.Query(`SELECT key, value FROM ItemTable`) | |
| rows, err := db.Query(`SELECT key, value FROM ItemTable ORDER BY key, rowid`) |
| func readOriginFile(path string) (string, error) { | ||
| data, err := os.ReadFile(path) | ||
| if err != nil { | ||
| return "", err |
There was a problem hiding this comment.
readOriginFile returns the raw os.ReadFile error without path context, which makes debug logs harder to interpret. Wrap the error with the filename (e.g. fmt.Errorf("read origin file %s: %w", path, err)) for consistent diagnostics with the rest of this package.
| return "", err | |
| return "", fmt.Errorf("read origin file %s: %w", path, err) |
| func appendOriginRecord(buf []byte, s string) []byte { | ||
| lenBytes := make([]byte, 4) | ||
| binary.LittleEndian.PutUint32(lenBytes, uint32(len(s))) | ||
| buf = append(buf, lenBytes...) | ||
| buf = append(buf, originEncASCII) | ||
| return append(buf, []byte(s)...) |
There was a problem hiding this comment.
This fixture helper always serializes scheme/host as originEncASCII and uses len(s) (UTF-8 byte length) as the record length. That diverges from WebKit’s origin encoding rules and will produce malformed fixtures if a test ever uses non-ASCII/Latin-1 hosts (or wants to cover UTF-16 origin records). Consider encoding Latin-1 bytes explicitly when possible and falling back to UTF-16 LE with originEncUTF16 (and a byte-length) for other strings.
| func appendOriginRecord(buf []byte, s string) []byte { | |
| lenBytes := make([]byte, 4) | |
| binary.LittleEndian.PutUint32(lenBytes, uint32(len(s))) | |
| buf = append(buf, lenBytes...) | |
| buf = append(buf, originEncASCII) | |
| return append(buf, []byte(s)...) | |
| func encodeOriginRecordString(s string) (byte, []byte) { | |
| latin1 := make([]byte, 0, len(s)) | |
| for _, r := range s { | |
| if r > 0xFF { | |
| u16 := utf16.Encode([]rune(s)) | |
| encoded := make([]byte, 2*len(u16)) | |
| for i, v := range u16 { | |
| binary.LittleEndian.PutUint16(encoded[i*2:], v) | |
| } | |
| return originEncUTF16, encoded | |
| } | |
| latin1 = append(latin1, byte(r)) | |
| } | |
| return originEncASCII, latin1 | |
| } | |
| func appendOriginRecord(buf []byte, s string) []byte { | |
| enc, payload := encodeOriginRecordString(s) | |
| lenBytes := make([]byte, 4) | |
| binary.LittleEndian.PutUint32(lenBytes, uint32(len(payload))) | |
| buf = append(buf, lenBytes...) | |
| buf = append(buf, enc) | |
| return append(buf, payload...) |
| func findOriginDataDirs(root string) ([]string, error) { | ||
| topEntries, err := os.ReadDir(root) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("read origins root: %w", err) |
There was a problem hiding this comment.
The error returned here doesn’t include the root path, which makes it harder to trace which profile/store failed when multiple Origins roots are processed. Including root in the formatted message would improve diagnostics.
| return nil, fmt.Errorf("read origins root: %w", err) | |
| return nil, fmt.Errorf("read origins root %s: %w", root, err) |
| func countLocalStorageFile(path string) (int, error) { | ||
| dsn := "file:" + path + "?mode=ro&immutable=1" | ||
| db, err := sql.Open("sqlite", dsn) | ||
| if err != nil { | ||
| return 0, fmt.Errorf("open %s: %w", path, err) | ||
| } | ||
| defer db.Close() | ||
| if err := db.Ping(); err != nil { | ||
| return 0, fmt.Errorf("ping %s: %w", path, err) | ||
| } |
There was a problem hiding this comment.
Using immutable=1 here prevents SQLite from replaying a copied -wal file. filemanager.Session.Acquire copies WAL/SHM companions, so counting against only the main DB can undercount committed localStorage entries that are still in WAL. Prefer opening the copied DB with read-only mode (e.g. mode=ro without immutable=1) so WAL is applied on the snapshot, or explicitly document/justify intentionally ignoring WAL.
- Drop immutable=1 on temp-copy SQLite opens in readLocalStorageFile / countLocalStorageFile. Session.Acquire copies the -wal / -shm sidecars, so mode=ro alone lets SQLite replay WAL on the ephemeral copy and surface entries Safari committed to WAL but hasn't checkpointed yet. Live-file reads in profiles.go keep immutable=1 as before. - Order ItemTable query by (key, rowid) for deterministic exports across runs and SQLite versions. - Wrap os.ReadFile / os.ReadDir errors with the offending path so multi-origin debug logs stay scannable. - RFC-011 §7 rewritten to explain the live-vs-temp split. - New regression test asserts ORDER BY surfaces rows in key order. Addresses round-2 Copilot review on #582.
Summary
WebsiteDataStore/<uuid>/Origins/<h>/<h>/LocalStorage/localstorage.sqlite3for named profiles,WebsiteData/Default/...for the default), including the binarySecurityOriginfile and UTF-16 LE value decoding. Reports the frame origin so partitioned storage is attributed correctly.Test plan
go test ./browser/safari/... -count=1(21 new LocalStorage tests + existing suite)go test ./... -count=1golangci-lint run ./...gofumpt -l browser/safari/+goimports -l -local github.com/moond4rk/hackbrowserdata browser/safari/GOOS=linux+GOOS=windowscross-compilesqlite3ground-truth cross-check on weather.com (24 rows) matches HBD output; explicit-port origin (http://192.168.50.102:30002) rendered correctly alongside default-port sibling.