Skip to content

feat(safari): localstorage extraction#582

Merged
moonD4rk merged 4 commits intomainfrom
feat/safari-localstorage
Apr 21, 2026
Merged

feat(safari): localstorage extraction#582
moonD4rk merged 4 commits intomainfrom
feat/safari-localstorage

Conversation

@moonD4rk
Copy link
Copy Markdown
Owner

Summary

  • Closes the last unchecked item in feat: add Safari browser support on macOS #565 — Safari localStorage.
  • Handles Safari 17+'s nested WebKit layout (WebsiteDataStore/<uuid>/Origins/<h>/<h>/LocalStorage/localstorage.sqlite3 for named profiles, WebsiteData/Default/... for the default), including the binary SecurityOrigin file and UTF-16 LE value decoding. Reports the frame origin so partitioned storage is attributed correctly.
  • New RFC-011 documents Safari data storage end-to-end; defers Keychain content to RFC-006 §7.

Test plan

  • go test ./browser/safari/... -count=1 (21 new LocalStorage tests + existing suite)
  • go test ./... -count=1
  • golangci-lint run ./...
  • gofumpt -l browser/safari/ + goimports -l -local github.com/moond4rk/hackbrowserdata browser/safari/
  • GOOS=linux + GOOS=windows cross-compile
  • Live run on macOS Safari (2 profiles): 110 entries across 18 origins; sqlite3 ground-truth cross-check on weather.com (24 rows) matches HBD output; explicit-port origin (http://192.168.50.102:30002) rendered correctly alongside default-port sibling.

Extracts Safari 17+ localStorage from WebKit's nested layout —
WebsiteDataStore/<uuid>/Origins/<top-hash>/<frame-hash>/LocalStorage/
localstorage.sqlite3 for named profiles, WebsiteData/Default for the
default profile. Parses the binary SecurityOrigin serialization
(length-prefixed scheme+host plus 0x00 default-port or 0x01 <uint16_le>
explicit-port section) and decodes UTF-16 LE ItemTable value BLOBs,
capping oversized values at 2048 bytes to match the Chromium extractor.
Reports the frame origin URL so partitioned third-party storage is
attributed to the iframe origin JavaScript actually sees.

Closes the remaining LocalStorage checkbox in #565.
Documents Safari's profile structure, per-category file layouts, and
storage formats including the Safari 17+ nested WebKit Origins
localStorage layout and binary SecurityOrigin serialization. Defers
Keychain credential extraction to RFC-006 §7 and notes the cross-browser
differences (plaintext cookies, plist bookmarks/downloads, Core Data
epoch timestamps, partitioned storage).
Copilot AI review requested due to automatic review settings April 21, 2026 11:59
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 76.04167% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.66%. Comparing base (d75738b) to head (3c0ffe0).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
browser/safari/extract_storage.go 74.01% 28 Missing and 18 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #582      +/-   ##
==========================================
+ Coverage   73.55%   73.66%   +0.11%     
==========================================
  Files          57       58       +1     
  Lines        2401     2586     +185     
==========================================
+ Hits         1766     1905     +139     
- Misses        481      509      +28     
- Partials      154      172      +18     
Flag Coverage Δ
unittests 73.66% <76.04%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Safari LocalStorage extraction support by traversing Safari 17+ WebKit “Origins” layout, parsing the binary origin file for correct (partition-aware) attribution, decoding UTF-16LE values, and documenting Safari storage end-to-end.

Changes:

  • Implement Safari LocalStorage extraction + counting from the nested WebKit Origins layout (including origin-file parsing and UTF-16LE decoding).
  • Wire LocalStorage source discovery for default and named Safari profiles (directory sources + acquisition).
  • Add a new Safari storage RFC and comprehensive Safari LocalStorage tests/fixtures.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
rfcs/011-safari-data-storage.md Documents Safari storage locations/formats (incl. LocalStorage Origins layout) and platform quirks.
browser/safari/source.go Adds LocalStorage directory sources for default and named profiles.
browser/safari/safari.go Hooks LocalStorage into extract/count category dispatch.
browser/safari/extract_storage.go Implements Origins traversal, origin-file parsing, and localStorage.sqlite3 reading/decoding.
browser/safari/testutil_test.go Adds fixtures to generate nested Origins layout + origin/localstorage DB test data.
browser/safari/extract_storage_test.go Adds unit + end-to-end tests for origin parsing, UTF-16 decoding, extraction, and counting.
browser/safari/safari_test.go Extends profile/source resolution tests and category extract/count tests for LocalStorage.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +17 to +21
func TestReadOriginBlock_FirstParty(t *testing.T) {
data := encodeOriginFile("https://example.com", "https://example.com")
top, pos, err := readOriginBlock(data, 0)
require.NoError(t, err)
assert.Equal(t, "https", top.scheme)
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Origin parsing tests only cover originEncASCII via encodeOriginFile. Consider adding a case where scheme/host are stored with originEncUTF16, plus a case asserting unknown encoding bytes are rejected (once handled), to cover the remaining branches in readOriginString.

Copilot uses AI. Check for mistakes.
Comment on lines +208 to +211
case originEncASCII:
return string(chunk), pos, nil
case originEncUTF16:
return decodeUTF16LE(chunk), pos, nil
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

originEncASCII is documented as Latin-1/ASCII, but this branch converts bytes with string(chunk) (UTF-8). If WebKit stores Latin-1 here, non-ASCII bytes will be mis-decoded. Consider decoding ISO-8859-1 (similar to browser/chromium/extract_storage.go’s decodeLatin1) to match the documented encoding.

Copilot uses AI. Check for mistakes.
Comment thread browser/safari/extract_storage.go Outdated
Comment on lines +212 to +213
default:
return string(chunk), pos, nil
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default branch silently accepts unknown origin string encodings by returning string(chunk). This can propagate corrupted scheme/host into output URLs. Consider returning an error on unknown encodings (similar to Chromium’s decodeChromiumString behavior) so the caller can skip the partition cleanly.

Copilot uses AI. Check for mistakes.
if err := rows.Scan(&key, &value); err != nil {
log.Debugf("safari localstorage: scan row in %s: %v", path, err)
continue
}
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ItemTable key is scanned into sql.NullString but key.Valid is not checked. If a row has a NULL key, it will be emitted as an empty string, potentially causing collisions and misleading output. Consider skipping rows where key.Valid is false (or logging and continuing).

Suggested change
}
}
if !key.Valid {
log.Debugf("safari localstorage: skip row with NULL key in %s", path)
continue
}

Copilot uses AI. Check for mistakes.
Comment thread browser/safari/extract_storage.go Outdated
Comment on lines +86 to +92
entries, err := extractLocalStorage(root)
if err != nil {
return 0, err
}
return len(entries), nil
}

Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

countLocalStorage currently calls extractLocalStorage and decodes all values just to compute a count. For large Origins trees this can be significantly slower than necessary during CountEntries. Consider counting without value decoding (e.g., traverse origin dirs and run SELECT COUNT(*) per localstorage.sqlite3, summing results).

Suggested change
entries, err := extractLocalStorage(root)
if err != nil {
return 0, err
}
return len(entries), nil
}
dirs, err := findOriginDataDirs(root)
if err != nil {
return 0, err
}
total := 0
for _, od := range dirs {
dbPath := filepath.Join(od, webkitLocalStorageSubdir, webkitLocalStorageDB)
n, err := countLocalStorageFile(dbPath)
if err != nil {
log.Debugf("safari localstorage: db %s: %v", dbPath, err)
continue
}
total += n
}
return total, nil
}
func countLocalStorageFile(dbPath string) (int, error) {
db, err := sql.Open("sqlite", dbPath)
if err != nil {
return 0, err
}
defer db.Close()
var count int
if err := db.QueryRow(`SELECT COUNT(*) FROM ItemTable`).Scan(&count); err != nil {
return 0, err
}
return count, nil
}

Copilot uses AI. Check for mistakes.
- Decode originEncASCII via decodeLatin1 so high-byte records preserve
  their ISO-8859-1 meaning instead of being interpreted as UTF-8.
  Matches the pattern in chromium/extract_storage.go.
- Skip ItemTable rows where key is NULL — SQLite's UNIQUE constraint
  permits multiple NULLs, and silently lowering them to empty strings
  would collide with legitimate empty-string keys.
- countLocalStorage now walks origin dirs and runs SELECT COUNT(key)
  per localstorage.sqlite3 instead of fully decoding every value.
  COUNT(key) naturally excludes NULLs, keeping count and extract
  symmetric.

Addresses Copilot review feedback on #582.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +271 to +280
func readLocalStorageFile(path string) ([]localStorageItem, error) {
// Read-only + immutable so we don't disturb a live WAL (same pattern as profiles.go).
dsn := "file:" + path + "?mode=ro&immutable=1"
db, err := sql.Open("sqlite", dsn)
if err != nil {
return nil, fmt.Errorf("open %s: %w", path, err)
}
defer db.Close()
if err := db.Ping(); err != nil {
return nil, fmt.Errorf("ping %s: %w", path, err)
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

immutable=1 disables WAL replay, but the extractor is reading from the temp copy of the Origins directory (and the repo’s filemanager.Session copies WAL/SHM). Dropping immutable=1 and keeping read-only access will make localStorage extraction include committed data that hasn’t been checkpointed yet.

Copilot uses AI. Check for mistakes.
Comment thread browser/safari/extract_storage.go Outdated
return nil, fmt.Errorf("ping %s: %w", path, err)
}

rows, err := db.Query(`SELECT key, value FROM ItemTable`)
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Querying ItemTable without an ORDER BY can yield nondeterministic output ordering across runs/SQLite versions. Consider ordering by key (and/or rowid) to keep exports stable and tests/consumers deterministic.

Suggested change
rows, err := db.Query(`SELECT key, value FROM ItemTable`)
rows, err := db.Query(`SELECT key, value FROM ItemTable ORDER BY key, rowid`)

Copilot uses AI. Check for mistakes.
Comment thread browser/safari/extract_storage.go Outdated
func readOriginFile(path string) (string, error) {
data, err := os.ReadFile(path)
if err != nil {
return "", err
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readOriginFile returns the raw os.ReadFile error without path context, which makes debug logs harder to interpret. Wrap the error with the filename (e.g. fmt.Errorf("read origin file %s: %w", path, err)) for consistent diagnostics with the rest of this package.

Suggested change
return "", err
return "", fmt.Errorf("read origin file %s: %w", path, err)

Copilot uses AI. Check for mistakes.
Comment on lines +208 to +213
func appendOriginRecord(buf []byte, s string) []byte {
lenBytes := make([]byte, 4)
binary.LittleEndian.PutUint32(lenBytes, uint32(len(s)))
buf = append(buf, lenBytes...)
buf = append(buf, originEncASCII)
return append(buf, []byte(s)...)
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture helper always serializes scheme/host as originEncASCII and uses len(s) (UTF-8 byte length) as the record length. That diverges from WebKit’s origin encoding rules and will produce malformed fixtures if a test ever uses non-ASCII/Latin-1 hosts (or wants to cover UTF-16 origin records). Consider encoding Latin-1 bytes explicitly when possible and falling back to UTF-16 LE with originEncUTF16 (and a byte-length) for other strings.

Suggested change
func appendOriginRecord(buf []byte, s string) []byte {
lenBytes := make([]byte, 4)
binary.LittleEndian.PutUint32(lenBytes, uint32(len(s)))
buf = append(buf, lenBytes...)
buf = append(buf, originEncASCII)
return append(buf, []byte(s)...)
func encodeOriginRecordString(s string) (byte, []byte) {
latin1 := make([]byte, 0, len(s))
for _, r := range s {
if r > 0xFF {
u16 := utf16.Encode([]rune(s))
encoded := make([]byte, 2*len(u16))
for i, v := range u16 {
binary.LittleEndian.PutUint16(encoded[i*2:], v)
}
return originEncUTF16, encoded
}
latin1 = append(latin1, byte(r))
}
return originEncASCII, latin1
}
func appendOriginRecord(buf []byte, s string) []byte {
enc, payload := encodeOriginRecordString(s)
lenBytes := make([]byte, 4)
binary.LittleEndian.PutUint32(lenBytes, uint32(len(payload)))
buf = append(buf, lenBytes...)
buf = append(buf, enc)
return append(buf, payload...)

Copilot uses AI. Check for mistakes.
Comment thread browser/safari/extract_storage.go Outdated
func findOriginDataDirs(root string) ([]string, error) {
topEntries, err := os.ReadDir(root)
if err != nil {
return nil, fmt.Errorf("read origins root: %w", err)
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error returned here doesn’t include the root path, which makes it harder to trace which profile/store failed when multiple Origins roots are processed. Including root in the formatted message would improve diagnostics.

Suggested change
return nil, fmt.Errorf("read origins root: %w", err)
return nil, fmt.Errorf("read origins root %s: %w", root, err)

Copilot uses AI. Check for mistakes.
Comment on lines +107 to +116
func countLocalStorageFile(path string) (int, error) {
dsn := "file:" + path + "?mode=ro&immutable=1"
db, err := sql.Open("sqlite", dsn)
if err != nil {
return 0, fmt.Errorf("open %s: %w", path, err)
}
defer db.Close()
if err := db.Ping(); err != nil {
return 0, fmt.Errorf("ping %s: %w", path, err)
}
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using immutable=1 here prevents SQLite from replaying a copied -wal file. filemanager.Session.Acquire copies WAL/SHM companions, so counting against only the main DB can undercount committed localStorage entries that are still in WAL. Prefer opening the copied DB with read-only mode (e.g. mode=ro without immutable=1) so WAL is applied on the snapshot, or explicitly document/justify intentionally ignoring WAL.

Copilot uses AI. Check for mistakes.
- Drop immutable=1 on temp-copy SQLite opens in readLocalStorageFile /
  countLocalStorageFile. Session.Acquire copies the -wal / -shm sidecars,
  so mode=ro alone lets SQLite replay WAL on the ephemeral copy and
  surface entries Safari committed to WAL but hasn't checkpointed yet.
  Live-file reads in profiles.go keep immutable=1 as before.
- Order ItemTable query by (key, rowid) for deterministic exports across
  runs and SQLite versions.
- Wrap os.ReadFile / os.ReadDir errors with the offending path so
  multi-origin debug logs stay scannable.
- RFC-011 §7 rewritten to explain the live-vs-temp split.
- New regression test asserts ORDER BY surfaces rows in key order.

Addresses round-2 Copilot review on #582.
@moonD4rk moonD4rk merged commit 7a5db25 into main Apr 21, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants