Add Freighter-mobile best practices LLM reference docs #810

Merged
leofelix077 merged 26 commits into main from lf-add-freighter-mobile-best-practices-skill
May 5, 2026

@leofelix077 leofelix077 commented Apr 9, 2026

Skill Eval: freighter-mobile-best-practices — Benchmark Report

Date: 2026-04-09
Iterations: 1
Repo: stellar/freighter-mobile
Model: Claude Opus 4.6

Summary

On further passes, both configurations get close to 100%, as Claude automatically picks up the rules and the entry points.

| Metric | With Skill | Without Skill | Delta |
|---|---|---|---|
| Pass rate | 87.1% | 67.2% | +19.9pp |

Assertion Results (aggregated across 1 iteration)

architecture-screen

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | All components use arrow function expressions (not function declarations) | 1/1 (100%) | 1/1 (100%) |
| 2 | All user-facing text uses t() via useAppTranslation hook | 1/1 (100%) | 1/1 (100%) |
| 3 | Creates proper screen directory: screens/ with index.tsx, components/, hooks/ su... | 0/1 (0%) | 0/1 (0%) |
| 4 | Creates typed route param list for the staking navigator | 1/1 (100%) | 1/1 (100%) |
| 5 | Provides both en and pt translations | 0/1 (0%) | 0/1 (0%) |
| 6 | Screen files use default export; hooks/helpers use named exports | 1/1 (100%) | 1/1 (100%) |
| 7 | Uses absolute imports from src/ root (no relative paths) | 1/1 (100%) | 0/1 (0%) |
| 8 | Uses route enum constants, not raw strings | 1/1 (100%) | 1/1 (100%) |

architecture-zustand

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | Does not show generic error without context about what operation failed | 1/1 (100%) | 1/1 (100%) |
| 2 | Error handling uses normalizeError() from config/logger | 1/1 (100%) | 0/1 (0%) |
| 3 | Follows Zustand async pattern: set({ isLoading: true, error: null }) -> try/catc... | 1/1 (100%) | 1/1 (100%) |
| 4 | No direct store mutations (uses set() to create new state) | 1/1 (100%) | 1/1 (100%) |
| 5 | No empty catch blocks | 1/1 (100%) | 1/1 (100%) |
| 6 | Reports unexpected errors to Sentry (not expected validation errors) | 1/1 (100%) | 1/1 (100%) |
| 7 | Shows user-facing errors via Toast, NOT Alert.alert() | 0/1 (0%) | 0/1 (0%) |
| 8 | Uses absolute imports | 1/1 (100%) | 1/1 (100%) |
| 9 | Uses validateTransactionParams() before building transaction | 1/1 (100%) | 1/1 (100%) |
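
The async pattern named in assertion 3 can be sketched without the real library. The block below is a minimal illustration only: `createStore` is a hand-rolled stand-in for Zustand's `create()`, `normalizeError` is a hypothetical helper (the project's actual `normalizeError()` in config/logger may behave differently), and `BalanceState` and `fetchBalance` are invented names.

```typescript
// Minimal stand-in for a Zustand-style store (NOT the real zustand API).
type SetState<T> = (partial: Partial<T>) => void;

function createStore<T extends object>(init: (set: SetState<T>) => T) {
  let state: T;
  // set() always builds a new state object instead of mutating the old one.
  const set: SetState<T> = (partial) => {
    state = { ...state, ...partial };
  };
  state = init(set);
  return { getState: (): T => state };
}

// Hypothetical normalizer: turns any thrown value into an Error.
function normalizeError(err: unknown): Error {
  return err instanceof Error ? err : new Error(String(err));
}

interface BalanceState {
  balance: string | null;
  isLoading: boolean;
  error: string | null;
  fetchBalance: (load: () => Promise<string>) => Promise<void>;
}

const balanceStore = createStore<BalanceState>((set) => ({
  balance: null,
  isLoading: false,
  error: null,
  fetchBalance: async (load) => {
    // 1. Flag loading and clear any stale error before starting.
    set({ isLoading: true, error: null });
    try {
      // 2. On success, store the result and clear the loading flag.
      const balance = await load();
      set({ balance, isLoading: false });
    } catch (err) {
      // 3. On failure, store a message that names the operation that failed.
      const { message } = normalizeError(err);
      set({ error: `Failed to fetch balance: ${message}`, isLoading: false });
    }
  },
}));
```

The catch branch stores an error that names the failed operation, which is also what assertion 1 checks for; how that error reaches the user (Toast, per assertion 7) is left out of the sketch.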

code-style-hook

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | Error handling uses normalizeError() | 1/1 (100%) | 0/1 (0%) |
| 2 | Hook file uses named export (not default) | 1/1 (100%) | 1/1 (100%) |
| 3 | Hook name starts with use prefix: useTokenPrices | 1/1 (100%) | 1/1 (100%) |
| 4 | JSDoc comment present on the hook function | 1/1 (100%) | 1/1 (100%) |
| 5 | Return value is memoized (wrapped in useMemo) | 1/1 (100%) | 0/1 (0%) |
| 6 | Uses ?? (nullish coalescing) instead of \|\| for fallback values | | |
| 7 | Uses absolute imports (not relative paths) | 1/1 (100%) | 1/1 (100%) |
| 8 | Uses arrow function expression (not function declaration) | 1/1 (100%) | 1/1 (100%) |

code-style-naming

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | Changes to arrow function expression (not function declaration) | 1/1 (100%) | 1/1 (100%) |
| 2 | Changes to default export (it's a screen) | 1/1 (100%) | 1/1 (100%) |
| 3 | Fixes imports to absolute paths (not relative ../../) | 1/1 (100%) | 1/1 (100%) |
| 4 | List item wrapped in React.memo() | 1/1 (100%) | 0/1 (0%) |
| 5 | No hardcoded colors | 1/1 (100%) | 0/1 (0%) |
| 6 | Replaces Alert.alert with Toast for error display | 1/1 (100%) | 0/1 (0%) |
| 7 | Replaces Image with FastImage for remote URLs | 1/1 (100%) | 0/1 (0%) |
| 8 | Replaces ScrollView+map with FlatList for virtualization | 1/1 (100%) | 1/1 (100%) |
| 9 | Replaces StyleSheet.create with NativeWind className | 1/1 (100%) | 0/1 (0%) |
| 10 | Replaces hardcoded strings with t() calls | 1/1 (100%) | 1/1 (100%) |
| 11 | Uses stable key (not array index) | 1/1 (100%) | 1/1 (100%) |
| 12 | Uses useAppTranslation() instead of raw useTranslation() | 1/1 (100%) | 1/1 (100%) |
| 13 | Uses useShallow for multi-field Zustand selectors | 1/1 (100%) | 0/1 (0%) |
| 14 | Zustand stores accessed via selectors, not full destructuring | 1/1 (100%) | 0/1 (0%) |
| 15 | totalValue computed in useMemo | 1/1 (100%) | 1/1 (100%) |

err-handling-retry

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | Does not show generic 'Something went wrong' without context | 1/1 (100%) | 1/1 (100%) |
| 2 | Implements retry with exponential backoff (1s, 2s, 4s, 8s, 16s) for HTTP 504 | 0/1 (0%) | 1/1 (100%) |
| 3 | Maps Horizon error codes to translated user-facing messages using t() | 1/1 (100%) | 1/1 (100%) |
| 4 | Maximum 5 retry attempts | 1/1 (100%) | 1/1 (100%) |
| 5 | Reports unexpected errors to Sentry | 1/1 (100%) | 1/1 (100%) |
| 6 | Shows user-facing errors via Toast, NOT Alert.alert() | 1/1 (100%) | 1/1 (100%) |
| 7 | Uses absolute imports (not relative paths) | 1/1 (100%) | 0/1 (0%) |
| 8 | Uses normalizeError() for error normalization | 1/1 (100%) | 0/1 (0%) |
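
For reference, the backoff schedule in assertion 2 (1s, 2s, 4s, 8s, 16s) doubles the delay on each retry. The sketch below is one possible shape, not the project's implementation: `retryOn504`, `backoffDelayMs`, and `HttpResult` are hypothetical names, and "maximum 5 retry attempts" is read here as five retries after the initial request.

```typescript
// Delay before retry number `attempt` (1-based): 1s, 2s, 4s, 8s, 16s.
const backoffDelayMs = (attempt: number, baseMs = 1000): number =>
  baseMs * 2 ** (attempt - 1);

const MAX_RETRIES = 5;

// Hypothetical response shape; only the HTTP status matters here.
interface HttpResult {
  status: number;
}

// Re-issues `request` while it returns HTTP 504, up to MAX_RETRIES retries.
// `sleep` is injectable so callers (and tests) can control the waiting.
async function retryOn504<T extends HttpResult>(
  request: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let result = await request();
  for (let attempt = 1; attempt <= MAX_RETRIES && result.status === 504; attempt++) {
    await sleep(backoffDelayMs(attempt));
    result = await request();
  }
  // Either a non-504 response, or the last 504 once retries are exhausted.
  return result;
}
```

Returning the final 504 instead of throwing leaves it to the caller to map the failure to a translated message, as assertion 3 expects.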

err-handling-zustand

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | Async actions follow set({ isLoading: true, error: null }) -> try/catch -> set r... | 1/1 (100%) | 0/1 (0%) |
| 2 | Clears state on account switch (not just on unmount) | 1/1 (100%) | 1/1 (100%) |
| 3 | Does NOT use console.error for error reporting - uses Sentry | 1/1 (100%) | 1/1 (100%) |
| 4 | Error handling uses normalizeError() from config/logger | 1/1 (100%) | 0/1 (0%) |
| 5 | No direct store mutations (no get().array.push()) | 1/1 (100%) | 1/1 (100%) |
| 6 | Store interface defines both state fields and action functions | 1/1 (100%) | 1/1 (100%) |
| 7 | Uses create() from Zustand with typed interface | 1/1 (100%) | 1/1 (100%) |
| 8 | Uses named export for the store hook (useTransactionHistoryStore) | 1/1 (100%) | 1/1 (100%) |

i18n-settings

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | All user-facing strings come from t() calls | 1/1 (100%) | 0/1 (0%) |
| 2 | Component name has Screen suffix | 1/1 (100%) | 1/1 (100%) |
| 3 | Component uses arrow function expression | 1/1 (100%) | 1/1 (100%) |
| 4 | Provides both English (en) and Portuguese (pt) translation entries | 1/1 (100%) | 1/1 (100%) |
| 5 | Screen component uses default export | 1/1 (100%) | 1/1 (100%) |
| 6 | Translation keys use nested dot notation | 1/1 (100%) | 1/1 (100%) |
| 7 | Uses NativeWind className for styling (not StyleSheet.create) | 1/1 (100%) | 1/1 (100%) |
| 8 | Uses absolute imports | 1/1 (100%) | 1/1 (100%) |
| 9 | Uses useAppTranslation() hook (not raw useTranslation) | 1/1 (100%) | 1/1 (100%) |

nav-typed-routes

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | All user-facing text uses t() via useAppTranslation | 1/1 (100%) | 1/1 (100%) |
| 2 | Creates route enum with named constants for each screen | 1/1 (100%) | 1/1 (100%) |
| 3 | Creates typed param list type for the navigator | 1/1 (100%) | 1/1 (100%) |
| 4 | Deep link config uses correct scheme (freighterdev:// for dev) | 1/1 (100%) | 1/1 (100%) |
| 5 | Navigation uses enum constants, never raw strings | 1/1 (100%) | 1/1 (100%) |
| 6 | Optional params marked with ? in param list type | 1/1 (100%) | 1/1 (100%) |
| 7 | Screen components use arrow functions and default exports | 1/1 (100%) | 1/1 (100%) |
| 8 | navigation.navigate() calls are fully typed | 1/1 (100%) | 0/1 (0%) |
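
Assertions 2, 3, 6, and 8 fit together as route constants, a param list keyed by those constants, and a `navigate` call typed against that list. The sketch below is an assumption-laden illustration: the route names and the `navigate` function are hypothetical stand-ins (the real app would use its navigation library's hook), and an `as const` object is used where a TypeScript `enum` would serve the same purpose.

```typescript
// Route names as constants, so call sites never pass raw strings.
const STAKING_ROUTES = {
  STAKING_HOME: "StakingHome",
  STAKING_DETAILS: "StakingDetails",
} as const;

// Param list: `undefined` means the screen takes no params;
// `?` marks an optional param.
type StakingStackParamList = {
  [STAKING_ROUTES.STAKING_HOME]: undefined;
  [STAKING_ROUTES.STAKING_DETAILS]: { poolId: string; highlight?: boolean };
};

// Hypothetical typed navigate: the params type is looked up from the route.
function navigate<R extends keyof StakingStackParamList>(
  route: R,
  params: StakingStackParamList[R],
): { route: R; params: StakingStackParamList[R] } {
  return { route, params };
}

// Type-checks: poolId is required, highlight may be omitted.
const target = navigate(STAKING_ROUTES.STAKING_DETAILS, { poolId: "pool-1" });

// Would not compile, because poolId is missing:
// navigate(STAKING_ROUTES.STAKING_DETAILS, {});
```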

performance-flatlist

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | FlatList has keyExtractor with stable ID (not array index) | 1/1 (100%) | 1/1 (100%) |
| 2 | FlatList has maxToRenderPerBatch prop | 1/1 (100%) | 1/1 (100%) |
| 3 | FlatList has removeClippedSubviews prop | 1/1 (100%) | 1/1 (100%) |
| 4 | FlatList has windowSize prop | 1/1 (100%) | 1/1 (100%) |
| 5 | List item component wrapped in React.memo() | 1/1 (100%) | 1/1 (100%) |
| 6 | No inline arrow functions in JSX (onPress, etc.) | 1/1 (100%) | 1/1 (100%) |
| 7 | Uses FastImage (not React Native Image) for remote token icons | 1/1 (100%) | 0/1 (0%) |
| 8 | Uses NativeWind className for styling (not StyleSheet.create) | 0/1 (0%) | 0/1 (0%) |
| 9 | Uses useShallow for multi-field Zustand selectors | 1/1 (100%) | 0/1 (0%) |
| 10 | Zustand store accessed via selectors, not full store destructuring | 1/1 (100%) | 1/1 (100%) |
| 11 | renderItem callback wrapped in useCallback | 1/1 (100%) | 1/1 (100%) |

performance-selectors

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | All user-facing text uses t() via useAppTranslation | 1/1 (100%) | 0/1 (0%) |
| 2 | Derived values computed in useMemo | 1/1 (100%) | 1/1 (100%) |
| 3 | Does NOT create inline objects/arrays as props to child components | 0/1 (0%) | 1/1 (100%) |
| 4 | Uses NativeWind className for styling | 1/1 (100%) | 0/1 (0%) |
| 5 | Uses absolute imports | 1/1 (100%) | 1/1 (100%) |
| 6 | Uses arrow function expression for component | 1/1 (100%) | 1/1 (100%) |
| 7 | Uses useShallow from Zustand for selecting multiple fields | 1/1 (100%) | 0/1 (0%) |
| 8 | Zustand stores accessed via specific selectors, NOT destructuring entire store | 1/1 (100%) | 1/1 (100%) |
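
Assertion 7 exists because a selector that returns several fields builds a new object on every call, so a reference comparison always reports a change. The sketch below illustrates the idea behind a shallow comparison; it is not Zustand's implementation, and `WalletState`, `selectHeader`, and `shallowEqual` are hypothetical names.

```typescript
// Compares two plain objects key by key using Object.is.
function shallowEqual<T extends Record<string, unknown>>(a: T, b: T): boolean {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && Object.is(a[key], b[key]),
  );
}

interface WalletState {
  balance: string;
  network: string;
  lastSync: number;
}

// A multi-field selector returns a fresh object on every call.
const selectHeader = (s: WalletState) => ({ balance: s.balance, network: s.network });

const before: WalletState = { balance: "10", network: "testnet", lastSync: 1 };
// Only a field the selector does not read changes.
const after: WalletState = { ...before, lastSync: 2 };

// Reference comparison sees a change even though the selected fields are equal.
const referenceChanged = !Object.is(selectHeader(before), selectHeader(after)); // true
// Shallow comparison sees no change, so a subscriber could skip the re-render.
const contentChanged = !shallowEqual(selectHeader(before), selectHeader(after)); // false
```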

security-storage

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | Does NOT use AsyncStorage directly for keys, seeds, or passwords | 1/1 (100%) | 1/1 (100%) |
| 2 | Does not use hardcoded test keys - references environment variables for test dat... | 1/1 (100%) | 0/1 (0%) |
| 3 | Error handling uses normalizeError() + Sentry | 1/1 (100%) | 0/1 (0%) |
| 4 | Never logs key material even in DEV mode | 1/1 (100%) | 1/1 (100%) |
| 5 | Uses absolute imports throughout | 1/1 (100%) | 0/1 (0%) |
| 6 | Uses dataStorage (AsyncStorage) ONLY for non-sensitive metadata (e.g., lastBacku... | 1/1 (100%) | 1/1 (100%) |
| 7 | Uses secureDataStorage (keychain/keystore) for storing encrypted seed data | 1/1 (100%) | 1/1 (100%) |

security-walletconnect

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | Checks and sets hasRespondedRef before responding | 0/1 (0%) | 0/0 (0%) |
| 2 | Error responses use JSON-RPC format { code: 5000, message: '...' } | 0/1 (0%) | 0/0 (0%) |
| 3 | Implements Blockaid scanning: malicious=auto-reject, suspicious/scan-failed=warn... | 0/1 (0%) | 0/0 (0%) |
| 4 | Never trusts dApp display names/icons for security decisions | 1/1 (100%) | 0/0 (0%) |
| 5 | User-facing strings wrapped in t() via useAppTranslation | 1/1 (100%) | 0/0 (0%) |
| 6 | Uses hasRespondedRef (React ref) to prevent duplicate responses | 1/1 (100%) | 0/0 (0%) |
| 7 | Uses validation functions from walletKitValidation.ts | 1/1 (100%) | 0/0 (0%) |
| 8 | Validates chain matches active Stellar network (stellar:pubnet or stellar:testne... | 1/1 (100%) | 0/0 (0%) |
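
Assertions 1, 2, and 6 describe a check-then-set guard around the response plus a fixed error shape. A minimal sketch of that guard follows, assuming a plain `{ current }` object in place of a React ref; `respondOnce`, `rejection`, and `Responder` are hypothetical names, not functions from the repo.

```typescript
// Same shape as a React ref object; in a component this would come from useRef(false).
interface Ref<T> {
  current: T;
}

// Error payload in the JSON-RPC shape named in assertion 2.
interface JsonRpcError {
  code: number;
  message: string;
}

type Payload = { result: string } | { error: JsonRpcError };
type Responder = (payload: Payload) => void;

// Sends at most one response per request: check the flag, set it, then respond.
function respondOnce(hasRespondedRef: Ref<boolean>, respond: Responder, payload: Payload): boolean {
  if (hasRespondedRef.current) return false; // already answered, ignore the duplicate
  hasRespondedRef.current = true;
  respond(payload);
  return true;
}

// Builds an error response with the code used in the assertion text.
const rejection = (message: string): Payload => ({ error: { code: 5000, message } });

// Usage: the second call is ignored because the ref is already set.
const sent: Payload[] = [];
const hasRespondedRef: Ref<boolean> = { current: false };
const firstSent = respondOnce(hasRespondedRef, (p) => sent.push(p), rejection("User rejected the request"));
const secondSent = respondOnce(hasRespondedRef, (p) => sent.push(p), { result: "signed-xdr" });
```

Setting the flag before calling `respond` means a re-entrant or repeated call cannot slip a second response through.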

styling-card

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | All user-facing text uses t() | 1/1 (100%) | 0/1 (0%) |
| 2 | Checks/uses SDS components from src/components/sds/ where applicable | 1/1 (100%) | 1/1 (100%) |
| 3 | Component wrapped in React.memo() | 0/1 (0%) | 0/1 (0%) |
| 4 | Does NOT use StyleSheet.create | 1/1 (100%) | 1/1 (100%) |
| 5 | Uses FastImage for the remote token icon, with resizeMode specified | 0/1 (0%) | 0/1 (0%) |
| 6 | Uses NativeWind className as primary styling approach | 0/1 (0%) | 0/1 (0%) |
| 7 | Uses absolute imports | 1/1 (100%) | 1/1 (100%) |
| 8 | Uses arrow function expression | 1/1 (100%) | 1/1 (100%) |
| 9 | Uses named export (component, not screen) | 1/1 (100%) | 0/1 (0%) |

testing-zustand

| # | Assertion | With Skill | Without Skill |
|---|---|---|---|
| 1 | Mocks use absolute paths matching import convention | 1/1 (100%) | 1/1 (100%) |
| 2 | Test file in tests/ directory mirroring src/ structure | 0/1 (0%) | 0/1 (0%) |
| 3 | Test file uses .test.ts extension | 1/1 (100%) | 1/1 (100%) |
| 4 | Tests network failure path (reports unexpected errors to Sentry) | 0/1 (0%) | 0/1 (0%) |
| 5 | Tests success path with proper state assertions | 1/1 (100%) | 1/1 (100%) |
| 6 | Tests validation error path (does NOT report expected validation errors to Sentr... | 1/1 (100%) | 1/1 (100%) |
| 7 | Uses renderHook and act from testing utilities | 0/1 (0%) | 0/1 (0%) |
| 8 | Uses useMyStore.setState() for setting up store state | 1/1 (100%) | 1/1 (100%) |

Copilot AI review requested due to automatic review settings April 9, 2026 16:15
@leofelix077 leofelix077 self-assigned this Apr 9, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds a “freighter mobile best practices” documentation skill and supporting context files to help AI agents and contributors navigate Freighter Mobile’s architecture, tooling, and development conventions.

Changes:

  • Introduces llms.txt and CLAUDE.md as entry-point context/reference docs for the repo.
  • Adds docs/skills/freighter-mobile-best-practices/ with a skill definition and focused reference guides (architecture, code style, security, WalletConnect, testing, etc.).

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 16 comments.

| File | Description |
|---|---|
| llms.txt | High-level repo index for docs, dev, testing, and key concepts |
| CLAUDE.md | Consolidated AI agent / contributor context (tooling, structure, conventions) |
| docs/skills/freighter-mobile-best-practices/SKILL.md | Skill definition + reference index for best-practices topics |
| docs/skills/freighter-mobile-best-practices/references/architecture.md | Architecture + layering + duck/store patterns reference |
| docs/skills/freighter-mobile-best-practices/references/anti-patterns.md | Common mistakes/anti-pattern guidance |
| docs/skills/freighter-mobile-best-practices/references/code-style.md | Formatting, ESLint/Prettier rules, naming conventions |
| docs/skills/freighter-mobile-best-practices/references/dependencies.md | Dependency management + native dependency workflow |
| docs/skills/freighter-mobile-best-practices/references/error-handling.md | Error normalization + store async patterns + WC error responses |
| docs/skills/freighter-mobile-best-practices/references/git-workflow.md | Branching/commit/PR/release process guidance |
| docs/skills/freighter-mobile-best-practices/references/i18n.md | i18n framework usage + key structure + lint enforcement notes |
| docs/skills/freighter-mobile-best-practices/references/navigation.md | Navigator hierarchy + typing + deep links conventions |
| docs/skills/freighter-mobile-best-practices/references/performance.md | Performance rules/checklist and optimization guidance |
| docs/skills/freighter-mobile-best-practices/references/security.md | Storage tiers + auth/security-sensitive areas overview |
| docs/skills/freighter-mobile-best-practices/references/styling.md | NativeWind/SDS/bottom-sheet/modal styling guidance |
| docs/skills/freighter-mobile-best-practices/references/testing.md | Jest + Maestro structure, commands, and e2e guidance |
| docs/skills/freighter-mobile-best-practices/references/walletconnect.md | WalletConnect architecture, request handling, and validations |

Comment thread llms.txt Outdated
Comment thread docs/skills/freighter-mobile-best-practices/SKILL.md Outdated
Comment thread docs/best-practices/walletconnect.md
Comment thread docs/skills/freighter-mobile-best-practices/references/testing.md Outdated
Comment thread docs/best-practices/security.md
Comment thread docs/skills/freighter-mobile-best-practices/references/performance.md Outdated
Comment thread docs/skills/freighter-mobile-best-practices/references/performance.md Outdated
Comment thread docs/skills/freighter-mobile-best-practices/references/code-style.md Outdated
Comment thread docs/best-practices/architecture.md
Comment thread docs/best-practices/testing.md
Rewrite AGENTS.md as the single AI agent entry point:
- Add glossary section with domain-specific terminology
- Add documentation link index (replaces llms.txt)
- Remove sections that duplicate best-practices reference files
  (code style details, branch conventions, PR instructions)
- Keep unique context: repo map, architecture orientation (ducks/nav/WC),
  security alert list, known complexity/gotchas, pre-submission checklist
- Delete llms.txt (content absorbed into AGENTS.md)
- Delete CLAUDE.md (content absorbed into AGENTS.md)
@aristidesstaffieri
Contributor

Code review

Found 1 issue:

  1. performance.md claims "No FastImage adoption despite availability" and lists "Adopt FastImage for remote images" as a P1 action item. However, FastImage (@d11/react-native-fast-image) is already adopted and actively used in 4 files: src/components/sds/Token/index.tsx, src/helpers/validateIconUrl.ts, src/ducks/tokenIcons.ts, and src/components/analytics/DebugBottomSheet.tsx. The "Image Optimization Score: 4/10" and the P1 recommendation are stale and will give incorrect guidance.

## Image Optimization -- Score: 4/10
No FastImage adoption despite availability. React Native's default Image has no
HTTP caching.
**RULE: Use FastImage for ALL remote images (token icons, NFTs, profile
images).**

| **P0** | Add FlatList optimization props to all lists | Improves scroll perf on 9 list components | 6/10 → 9/10 |
| **P1** | Adopt FastImage for remote images | Adds HTTP caching for all images | 4/10 → 8/10 |
| **P1** | Extract 123 inline handlers to useCallback | Stabilizes reference equality | 6/10 → 8/10 |

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

Comment thread AGENTS.md Outdated
Comment thread AGENTS.md Outdated
Comment thread AGENTS.md
Comment thread AGENTS.md
Comment thread AGENTS.md Outdated
Comment thread AGENTS.md Outdated
Comment thread AGENTS.md Outdated
Comment thread docs/skills/freighter-mobile-best-practices/references/navigation.md Outdated
Comment thread docs/skills/freighter-mobile-best-practices/references/anti-patterns.md Outdated
Comment thread docs/skills/freighter-mobile-best-practices/references/architecture.md Outdated
Comment thread docs/skills/freighter-mobile-best-practices/references/dependencies.md Outdated
Comment thread docs/skills/freighter-mobile-best-practices/references/dependencies.md Outdated
leofelix077 and others added 2 commits April 15, 2026 09:30
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread docs/skills/freighter-mobile-best-practices/references/error-handling.md Outdated
Contributor

@CassioMG CassioMG left a comment

Another finding — the Reanimated rule is too absolute given existing legacy Animated usage.

Comment thread docs/skills/freighter-mobile-best-practices/references/performance.md Outdated
Contributor

@CassioMG CassioMG left a comment

Another finding — the Hook Return Memoization example has multiple TypeScript errors.

Comment thread docs/best-practices/performance.md
Contributor

@CassioMG CassioMG left a comment

Another finding — the getItemLayout claim doesn't match codebase reality.

Comment thread docs/skills/freighter-mobile-best-practices/references/performance.md Outdated
Contributor

@CassioMG CassioMG left a comment

Another finding — the Provider Layer list mentions a non-existent ThemeProvider.

Comment thread docs/skills/freighter-mobile-best-practices/references/architecture.md Outdated
Contributor

@CassioMG CassioMG left a comment

Another finding — enableFreeze is presented as if used but isn't.

Comment thread docs/best-practices/performance.md
@CassioMG CassioMG mentioned this pull request Apr 25, 2026
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.


Comment thread docs/best-practices/code-style.md
Contributor

@CassioMG CassioMG left a comment

Verifying the latest changes — 13 of my outstanding comments addressed in 24d64b3, 7 outstanding comments still open. One new red flag introduced by the SDS examples added in this commit.

Comment thread docs/skills/freighter-mobile-best-practices/references/styling.md Outdated
leofelix077 and others added 2 commits April 29, 2026 09:56
….com:stellar/freighter-mobile into lf-add-freighter-mobile-best-practices-skill

Co-authored-by: Copilot <copilot@github.com>
Contributor

@CassioMG CassioMG left a comment

One small inconsistency in error-handling.md spotted

Comment thread docs/best-practices/error-handling.md Outdated

`normalizeError()` feeds directly into Sentry for crash reporting. Always
normalize errors before sending to Sentry to ensure consistent, actionable
reports.
Contributor

[Comment 22 — Suggestion] Sentry Integration section contradicts the earlier guidance

This says:

"normalizeError() feeds directly into Sentry for crash reporting. Always normalize errors before sending to Sentry to ensure consistent, actionable reports."

But the earlier "Error Normalization" section (lines 34-35) says:

"Use logger.error() to report errors — it normalizes and forwards to Sentry internally. Do not call Sentry.captureException() directly."

And the Rules section at line 159:

"Never call Sentry.captureException() directly — go through logger.error()..."

The phrasing in the Sentry Integration section implies the agent should "send to Sentry" themselves after normalizing — which contradicts the rule against calling Sentry directly. An agent reading the bottom section in isolation might still try to call Sentry.captureException(normalizedError).

Suggest collapsing the redundancy:

-## Sentry Integration
-
-`normalizeError()` feeds directly into Sentry for crash reporting. Always
-normalize errors before sending to Sentry to ensure consistent, actionable
-reports.
+## Sentry Integration
+
+Sentry receives normalized errors automatically when you call `logger.error()` —
+the logger normalizes via `normalizeError()` and forwards internally. You don't
+need to call Sentry yourself.

Or delete the section entirely — the same info is already covered in "Error Normalization" and the Rules section.

* fix(troubleshooting): add emulator recreation tip for persistent memory issues

Android Studio sometimes doesn't apply RAM changes to existing AVDs reliably;
recreating the emulator is the definitive fix.

* cleanup runtime information and clipboard not pasting

* Update code review troubleshooting comments

* cleanup references naming

* clean up troubleshooting guide for stable xcode and IDE config

* breakdown commands on troubleshooting guide for better readability

* remove xcode 26 regression issue
Contributor

@CassioMG CassioMG left a comment

Just one tiny reminder before merging.

Comment thread docs/troubleshooting-guide.md Outdated
@@ -0,0 +1,442 @@
# Troubleshooting Guide: Freighter Mobile

_Last updated: 2026-04-08_
Contributor

@CassioMG CassioMG Apr 29, 2026

[Comment 23 — Nit] Update the "Last updated" date before merging

This says 2026-04-08 but the guide has been iterated through 2026-04-29. Worth bumping to today's date so the staleness signal is accurate.

@leofelix077
Collaborator Author

Benchmark Report — 4-Config Comparison

Structured after stellar/freighter#2687 comment: how much does each layer of guidance add over a cold-start agent with no project context?

4 configurations, 2 independent agents per new config, same 4 eval tasks,
same 41 binary assertions (15 + 8 + 11 + 7).

Baseline and Minimal refs are new runs (May 2026).
Full refs and With skill reuse the Refs-only/With-skill run from the
prior benchmark (same assertions, same eval tasks).

## What each config gets

Config A — Baseline (no skill, no refs):
  Task prompt only. Agent may read existing source code to understand
  patterns but receives no explicit best-practices guidance. No SKILL.md,
  no reference files, no Quick Rules.

Config B — Minimal refs (Quick Rules only, ~30 lines):
  Task prompt + the 13-rule Quick Rules section from SKILL.md. No full
  reference files. Represents the minimum targeted guidance covering the
  most commonly missed patterns.

Config C — Full refs (all 13 reference files):
  Task prompt + AGENTS.md routing directly to all 13 reference files (no
  SKILL.md, no Quick Rules primer). Simulates the state after deleting the
  skill mechanism and moving docs/skills/.../references/ → docs/best-
  practices/ as proposed in the extension PR comment.

Config D — With skill:
  Task prompt + AGENTS.md as-is → SKILL.md (Quick Rules + routing table) →
  relevant reference files. The full skill mechanism.

All agents had access to the same source code on disk.

## Results

  Config              Eval1  Eval2  Eval3  Eval4    Total
  Baseline (A)        10/15   7/8    9/11   4/7    30/41 (73%)
  Minimal refs (B)    13/15   8/8    8/11   6/7    35/41 (85%)
  Full refs (C)       14/15   8/8    8/11   6/7    36/41 (88%)
  With skill (D)      14/15   8/8    8/11   6/7    36/41 (88%)

  A→B delta: +5 assertions (+12pp)  — Quick Rules alone
  B→C delta: +1 assertion  (+3pp)   — full reference docs over Quick Rules
  C→D delta: 0 assertions  (0pp)    — skill mechanism over full refs
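
The deltas above follow directly from the raw counts. As a rough sketch (the names below are illustrative and not part of the benchmark tooling), the percentages and percentage-point deltas can be reproduced like this:

```typescript
const TOTAL_ASSERTIONS = 41;

// Passed-assertion counts per config, taken from the results table.
const passedByConfig = { A: 30, B: 35, C: 36, D: 36 };

// Pass rate as a whole-number percentage.
const pct = (passed: number): number => Math.round((passed / TOTAL_ASSERTIONS) * 100);

// Delta between two configs: assertion count and percentage points
// (computed from the rounded rates, as the table does).
const delta = (from: number, to: number) => ({
  assertions: to - from,
  pp: pct(to) - pct(from),
});

const aToB = delta(passedByConfig.A, passedByConfig.B); // { assertions: 5, pp: 12 }
const bToC = delta(passedByConfig.B, passedByConfig.C); // { assertions: 1, pp: 3 }
const cToD = delta(passedByConfig.C, passedByConfig.D); // { assertions: 0, pp: 0 }
```

Because the rates are rounded before subtracting, the B→C step shows +3pp even though one assertion out of 41 is about 2.4pp unrounded.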

## What the Quick Rules uniquely fixed (A→B)

  Assertion                          A    B    Root cause
  Toast (not Alert.alert)            ❌   ✅   Explicit rule: "never Alert.alert"
  FastImage (not Image)              ❌   ✅   Explicit rule: "FastImage for all remote images"
  useShallow for multi-field selectors ❌   ✅   Explicit rule: "Multi-field selectors require useShallow"
  Zustand via selectors              ❌   ✅   Explicit rule: "via selectors, not destructuring"
  normalizeError in catch            ❌   ✅   Explicit rule: "normalizeError() for error message"
  hasRespondedRef guard check        ❌   ✅   Explicit rule: "always check and set hasRespondedRef"
  Blockaid user-decides              ❌   ✅   Explicit rule: "malicious → warning, user decides"

  7 assertions gained; 2 regressions offset the gain slightly:
  - Minimal refs (B) failed StyleSheet.create (used for fixed-size image — not
    dynamic — but the agent internalized the rule narrowly). Baseline didn't
    trigger this because it used RN Image with className.
  - usdTotal not in useMemo: failed in both A and B (rule not in Quick Rules).

## What both A and B failed — still failing in C and D

  Assertion                       A    B    C    D    Root cause
  usdTotal in useMemo             ❌   ❌   ❌   ❌   Rule not in Quick Rules.
                                                       Only Eval 1 references
                                                       it but no config
                                                       reliably applied it.
  Default export for screen       ✅   ✅   ❌   ❌   Hallucination: C and D
                                                       verbally acknowledge
                                                       export default but write
                                                       export const. A and B
                                                       correctly exported it
                                                       this run (N=1 variance).
  FlatList windowSize             ❌†  ❌†  ❌†  ❌†  B/C/D used FlashList for
  FlatList maxToRenderPerBatch    ❌†  ❌†  ❌†  ❌†  50-200 items (correct per
  FlatList removeClippedSubviews  ❌†  ❌†  ❌†  ❌†  docs). A used FlatList
                                                       (correct for its range)
                                                       and passed. Assertions
                                                       penalise the right answer
                                                       for B/C/D.
  walletKitValidation.ts          ❌   ❌   ❌   ❌   All configs ignored it.
                                                       Quick Rules say "validate
                                                       with walletKitValidation.ts"
                                                       but neither the rule nor
                                                       the reference file give a
                                                       concrete function example.

## The reference files and the skill add little over Quick Rules

  Layer                   Cumulative score   Delta over previous
  Baseline                30/41  (73%)       —
  + Quick Rules (~30 ln)  35/41  (85%)       +12pp
  + Full ref files        36/41  (88%)       +3pp
  + Skill mechanism       36/41  (88%)       +0pp

  The Quick Rules provide the largest single lift. The 13 reference files add
  only 1 assertion (+3pp) on top — specifically the default export rule for
  screens, which the reference files encode but the Quick Rules primer also
  explicitly state (the difference here is likely N=1 variance rather than a
  structural advantage of the reference files).

  The skill mechanism adds zero measurable value over full refs at N=1.
  This matches the extension PR prediction and the prior Refs-only vs
  With-skill run (88% vs 88%, 0pp).

## Persistent gap: walletKitValidation.ts

  Every config (baseline, minimal refs, full refs, with skill) failed to call
  walletKitValidation.ts functions for the new handler. The Quick Rules say
  "Validate all request parameters with functions from walletKitValidation.ts
  before processing" but neither that rule nor the walletconnect.md reference
  give a concrete example of which specific function to call. Without an
  example, agents default to inline validation or delegation to the existing
  approveSessionRequest.

## Persistent gap: useMemo for derived values

  usdTotal / derived computed values are not in useMemo across all configs.
  The Quick Rules have no explicit rule for this; the reference files mention
  it in context but it does not reliably transfer.

## Proposed path forward

1. The Quick Rules are the highest-value layer: 30 lines, +12pp lift over
   baseline. They should be kept and applied even if the full skill mechanism
   is removed.

2. The full reference files add marginal value (+3pp) over Quick Rules at N=1.
   At N=1 the 95% CI is ~±15pp, so this 3pp difference is within noise.
   Multiple runs would establish whether the delta is real.

3. The skill mechanism (routing layer + SKILL.md) adds zero measurable value
   over just having reference files accessible directly. The extension PR
   restructure (delete SKILL.md, move refs to docs/best-practices/, update
   AGENTS.md routing) is supported by this data — but the Quick Rules should
   be preserved somewhere in the routing chain.

4. Update Eval 3 assertions to be FlashList-aware: if the agent used FlashList,
   check for estimatedItemSize instead of windowSize/maxToRenderPerBatch.
   Current assertions penalise the architecturally correct decision for any
   config that follows the ">100 items → FlashList" guidance.

5. Add a concrete before/after example to the walletKitValidation.ts rule
   (one function call showing which validator to use for XDR). The existing
   mandatory language in walletconnect.md is not enough — agents skip it
   without an example to pattern-match.

6. Add a useMemo rule to the Quick Rules: "Derived values (totals, filters,
   transformations) computed from store data → wrap in useMemo."
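
Item 2's noise estimate can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes a normal-approximation (Wald) interval and treats the 41 assertions as independent trials; both are assumptions of this example, and the "~±15pp" figure above may have been derived differently.

```typescript
// Half-width of a 95% normal-approximation interval for a pass rate,
// in percentage points. Assumes the assertions are independent trials.
function ci95HalfWidthPp(passed: number, total: number): number {
  const p = passed / total;
  return 1.96 * Math.sqrt((p * (1 - p)) / total) * 100;
}

const baselineHalfWidth = ci95HalfWidthPp(30, 41); // Baseline (A)
const minimalHalfWidth = ci95HalfWidthPp(35, 41); // Minimal refs (B)
const fullRefsHalfWidth = ci95HalfWidthPp(36, 41); // Full refs (C)

// The +3pp step from B to C is smaller than either interval's half-width.
const withinNoise = 3 < Math.min(minimalHalfWidth, fullRefsHalfWidth);
```

Under these assumptions the half-widths come out at roughly 10 to 14 percentage points, the same order as the ~±15pp quoted in item 2, so a single-run difference of 3pp cannot be separated from noise.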

Per-Eval Breakdown

Eval 1 — code-style-naming

| Assertion | Baseline | Minimal refs | Full refs | With skill |
|---|---|---|---|---|
| Arrow fn expression | | | | |
| Default export (it's a screen) | | | | |
| Absolute imports | | | | |
| List item in React.memo() | | | | |
| No hardcoded colors | | | | |
| Replaces Alert.alert with Toast | | | | |
| Replaces Image with FastImage | | | | |
| Replaces ScrollView+map with FlatList | | | | |
| StyleSheet.create → NativeWind className | | | | |
| Hardcoded strings → t() | | | | |
| Stable key (not index) | | | | |
| useAppTranslation, not raw useTranslation | | | | |
| useShallow for multi-field selectors | | | | |
| Zustand via selectors, not destructuring | | | | |
| usdTotal in useMemo | | | | |
| Score | 10/15 | 13/15 | 14/15 | 14/15 |

Default export: A and B correctly exported this run; C and D produced the export-default hallucination (verbally correct, code wrong). N=1 variance.
usdTotal in useMemo: failed in all 4 configs — not in Quick Rules.

### Eval 2: architecture-zustand

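| Assertion | Baseline | Minimal refs | Full refs | With skill |
|---|---|---|---|---|
| { isLoading: true, error: null } before try | ✅ | ✅ | ✅ | ✅ |
| normalizeError() from config/logger | ❌ | ✅ | ✅ | ✅ |
| logger.error(), not console.error | ✅ | ✅ | ✅ | ✅ |
| Error via Toast pattern (store sets error, component calls showToast) | ✅ | ✅ | ✅ | ✅ |
| No direct store mutations | ✅ | ✅ | ✅ | ✅ |
| create<StoreState>() with typed interface | ✅ | ✅ | ✅ | ✅ |
| Named export for store hook | ✅ | ✅ | ✅ | ✅ |
| Absolute imports | ✅ | ✅ | ✅ | ✅ |
| Score | 7/8 | 8/8 | 8/8 | 8/8 |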

Baseline missed normalizeError — wrote manual err.message extraction instead. All other configs used it correctly.
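
For reference, the async pattern these assertions check can be sketched as follows. This is a minimal stand-in, not the repo's code: `createMiniStore` replaces Zustand's `create` so the example runs on its own, `normalizeError` is a hypothetical local version (the real one lives in config/logger and its signature is not shown in this report), and `StakingState` / `fetchPositions` are invented names.

```typescript
type StakingState = {
  isLoading: boolean;
  error: string | null;
  positions: string[];
};

// Hypothetical local stand-in; the real normalizeError lives in config/logger.
const normalizeError = (err: unknown): Error =>
  err instanceof Error ? err : new Error(String(err));

// Minimal stand-in for a Zustand store: set() merges a partial into a NEW
// state object, so nothing is mutated in place.
const createMiniStore = <S extends object>(initial: S) => {
  let state = initial;
  return {
    getState: (): S => state,
    set: (partial: Partial<S>): void => {
      state = { ...state, ...partial };
    },
  };
};

const store = createMiniStore<StakingState>({
  isLoading: false,
  error: null,
  positions: [],
});

const fetchPositions = async (load: () => Promise<string[]>): Promise<void> => {
  // Reset loading and error state before entering the try block.
  store.set({ isLoading: true, error: null });
  try {
    const positions = await load();
    store.set({ positions, isLoading: false });
  } catch (err) {
    // Normalize instead of reading err.message by hand; a real store would
    // also call logger.error here. The component, not the store, shows the toast.
    const error = normalizeError(err);
    store.set({ error: error.message, isLoading: false });
  }
};
```

The Baseline miss corresponds to the catch block: manual `err.message` extraction breaks when something other than an `Error` is thrown, which is the case normalization covers.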

### Eval 3: performance-flatlist

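| Assertion | Baseline | Minimal refs | Full refs | With skill |
|---|---|---|---|---|
| keyExtractor stable ID | ✅ | ✅ | ✅ | ✅ |
| maxToRenderPerBatch | ✅ | ❌ † | ❌ † | ❌ † |
| removeClippedSubviews | ✅ | ❌ † | ❌ † | ❌ † |
| windowSize | ✅ | ❌ † | ❌ † | ❌ † |
| List item in React.memo() | ✅ | ✅ | ✅ | ✅ |
| No inline arrow functions in JSX | ✅ | ✅ | ✅ | ✅ |
| FastImage for remote icons | ✅ | ✅ | ✅ | ✅ |
| NativeWind className | ✅ | ✅ | ✅ | ✅ |
| useShallow for multi-field selectors | ❌ | ✅ | ✅ | ✅ |
| Zustand via selectors | ❌ | ✅ | ✅ | ✅ |
| renderItem in useCallback | ✅ | ✅ | ✅ | ✅ |
| Score | 9/11 | 8/11 | 8/11 | 8/11 |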

† B/C/D used FlashList for 50–200 items — the correct decision per the ">100 items → FlashList" guidance. windowSize, maxToRenderPerBatch, and removeClippedSubviews are FlatList-only props; the assertions penalise the correct architectural decision for any config that follows the guidance.

Baseline (A) used FlatList (no guidance, defaulted to the familiar component) and happened to include all three performance props — scoring higher on Eval 3 than the guided configs while missing useShallow and Zustand selectors.
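
A FlashList-aware version of these three assertions could look like the sketch below. It assumes the eval grades by inspecting the generated source text, which this report does not confirm; all function names are hypothetical.

```typescript
type ListComponent = "FlatList" | "FlashList";

// Which performance props to require depends on the list component chosen.
const requiredPerfProps = (component: ListComponent): string[] =>
  component === "FlashList"
    ? ["estimatedItemSize"]
    : ["windowSize", "maxToRenderPerBatch", "removeClippedSubviews"];

// Naive detection: treat any <FlashList ...> usage as a FlashList answer.
const detectListComponent = (source: string): ListComponent =>
  /<FlashList\b/.test(source) ? "FlashList" : "FlatList";

// Returns the required props that never appear as a JSX attribute.
const missingPerfProps = (source: string): string[] =>
  requiredPerfProps(detectListComponent(source)).filter(
    (prop) => !new RegExp(`\\b${prop}\\s*=`).test(source),
  );

const flashListAnswer = "<FlashList data={items} estimatedItemSize={64} />";
const flatListAnswer = "<FlatList data={items} windowSize={5} />";

const flashListMissing = missingPerfProps(flashListAnswer);
const flatListMissing = missingPerfProps(flatListAnswer);
```

With this shape, a config that follows the ">100 items → FlashList" guidance is graded on `estimatedItemSize` only, while a FlatList answer is still held to the three FlatList props.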

### Eval 4: security-walletconnect

| Assertion | Baseline | Minimal refs | Full refs | With skill |
|---|---|---|---|---|
| Checks and sets hasRespondedRef before responding | | | | |
| Blockaid: malicious/suspicious/scan-failed → warning, user decides; benign → proceed | | | | |
| Never trusts dApp display names for security | | | | |
| User-facing strings via t() / useAppTranslation | | | | |
| hasRespondedRef is a React useRef | | | | |
| Uses validation functions from walletKitValidation.ts | | | | |
| Validates chain matches active Stellar network | | | | |
| Score | 4/7 | 6/7 | 6/7 | 6/7 |

Baseline auto-rejected malicious Blockaid results rather than letting the user decide. The Quick Rules state "user decides" explicitly; all guided configs handled this correctly.
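
The "user decides" rule reduces to a small mapping. The sketch below uses hypothetical type and function names; the four outcomes are the ones named in the assertion, not Blockaid's actual API values.

```typescript
type ScanOutcome = "benign" | "suspicious" | "malicious" | "scan-failed";
type NextStep = "proceed" | "warn-and-let-user-decide";

// Only a benign result proceeds silently. Every other outcome, including a
// failed scan, surfaces a warning and leaves the decision to the user.
// Nothing is auto-rejected.
const nextStepForScan = (outcome: ScanOutcome): NextStep =>
  outcome === "benign" ? "proceed" : "warn-and-let-user-decide";

const maliciousStep = nextStepForScan("malicious");
const benignStep = nextStepForScan("benign");
```

The Baseline behaviour would correspond to a third "reject" step for malicious results, which this mapping deliberately has no value for.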

walletKitValidation.ts: failed in all 4 configs. The rule exists in both Quick Rules and walletconnect.md but no concrete function example is given. Agents defaulted to inline XDR string validation or delegation to approveSessionRequest.


## Post-benchmark doc updates (commit 0d9541b)

Three changes were committed after the Refs-only/With-skill run, addressing
findings from the report and the Blockaid doc conflict discovered during setup:

| Change | Addresses |
|---|---|
| SKILL.md: Quick Rules added | Formalises the 13 primer rules into SKILL.md. |
| walletconnect.md: mandatory walletKitValidation.ts language | walletKitValidation.ts rule now bolded and required. No concrete per-function example yet. |
| error-handling.md: store/toast pattern example added | Adds code example for the store-sets-error / component-calls-showToast pattern. |
| SKILL.md: Blockaid fix | Corrects Quick Rules to match code: malicious → user decides (not auto-reject). |

Outstanding from proposed path forward:

  • Items 1–3: not yet acted on (restructure, more iterations, FlashList evals)
  • Item 5 partially done: mandatory language added; no concrete function example
  • Items 4 and 6: new items added from the 4-config run above

@leofelix077
Collaborator Author

@CassioMG unlike the extension, mobile doesn't seem to show much difference between the skill and bare refs, so I moved them out of the skills folder and left only the AGENTS.md file

Contributor

@CassioMG CassioMG left a comment

One small consistency finding from the final pass.

Comment thread docs/best-practices/walletconnect.md Outdated
```tsx
if (hasRespondedRef.current) return;
hasRespondedRef.current = true;
await walletKit.respondSessionRequest({ ... });
```

[Comment 24 — Suggestion] Use the rejectSessionRequest helper here for consistency with error-handling.md

The hasRespondedRef anti-replay example here still uses the raw walletKit.respondSessionRequest({ ... }), but error-handling.md:165-168 (updated in commit 06ffd5cb) uses the rejectSessionRequest helper for the exact same pattern:

// error-handling.md
if (hasRespondedRef.current) return;
hasRespondedRef.current = true;
await rejectSessionRequest({ sessionRequest, message });

The helper was added in Comment 15 to keep callers from re-implementing the JSON-RPC error structure. Worth aligning walletconnect.md to the same pattern so both files show consistent guidance.

Suggested:

+import { rejectSessionRequest } from "helpers/walletKitUtil";
+
 if (hasRespondedRef.current) return;
 hasRespondedRef.current = true;
-await walletKit.respondSessionRequest({ ... });
+await rejectSessionRequest({ sessionRequest, message });
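
The ordering in this guard (check, set, then await) is what makes it replay-safe. A minimal sketch in plain TypeScript, with a `Ref` type mirroring the shape of a React `useRef` result and a hypothetical `respondOnce` wrapper that is not part of the repo:

```typescript
// Same shape as the object returned by React's useRef.
type Ref<T> = { current: T };

// Hypothetical wrapper: runs `respond` at most once per ref.
const respondOnce = async (
  hasRespondedRef: Ref<boolean>,
  respond: () => Promise<void>,
): Promise<boolean> => {
  if (hasRespondedRef.current) return false;
  // Set the flag BEFORE awaiting so a second call made while the first is
  // still in flight sees it and bails out.
  hasRespondedRef.current = true;
  await respond();
  return true;
};

let respondCalls = 0;
const hasRespondedRef: Ref<boolean> = { current: false };
const fakeRespond = async (): Promise<void> => {
  respondCalls += 1;
};

// Two back-to-back calls, e.g. a double tap on "Reject".
const firstAttempt = respondOnce(hasRespondedRef, fakeRespond);
const secondAttempt = respondOnce(hasRespondedRef, fakeRespond);
```

If the flag were set after the `await`, both calls would pass the check and the session request would be answered twice.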

@CassioMG
Contributor

CassioMG commented May 4, 2026

> @CassioMG unlike the extension, mobile doesn't seem to show much difference between the skill and bare refs, so I moved them out of the skills folder and left only the AGENTS.md file

@leofelix077 removing the SKILL sounds good to me, thanks for running the benchmark.

I think the PR should be good to merge as soon as the 3 open comments are resolved (here, here and here).

Could you please update the PR title and description to reflect the current state of the PR now that it's not adding a SKILL? Thanks

@leofelix077 leofelix077 changed the title add v1 of freighter mobile best practices skill Add Freighter-mobile best practices LLM reference docs May 5, 2026
@leofelix077
Collaborator Author

@CassioMG made the adjustments. Will merge it then.

@leofelix077 leofelix077 merged commit 8c82122 into main May 5, 2026
28 of 29 checks passed
@leofelix077 leofelix077 deleted the lf-add-freighter-mobile-best-practices-skill branch May 5, 2026 17:27