Skip to content

fix: Use MAC address for BLE disconnect decision after climb changes#653

Merged
marcodejongh merged 14 commits intomainfrom
lilygo_next
Feb 5, 2026
Merged

fix: Use MAC address for BLE disconnect decision after climb changes#653
marcodejongh merged 14 commits intomainfrom
lilygo_next

Conversation

@marcodejongh
Copy link
Owner

Summary

  • Fixes ESP32 display not receiving metadata updates when climbs are loaded via Bluetooth from the official Kilter app
  • Uses device MAC address (auto-detected) instead of manual controller ID configuration for identifying self-initiated changes
  • Handles unknown/unsynced climbs gracefully with "Unknown Climb" display

Problem

When a climb was loaded via Bluetooth, the backend was skipping LedUpdate events to the controller that initiated them. This prevented the ESP32 display from showing climb metadata (name, grade, navigation context) for BLE-loaded climbs.

Solution

  • Add clientId field to LedUpdate GraphQL type
  • Backend always sends LedUpdate with clientId set to the controller's MAC address
  • ESP32 compares incoming clientId with its own MAC address:
    • Same MAC: Self-initiated change → keep BLE client connected
    • Different MAC: Web user changed climb → disconnect BLE client
  • MAC address is automatically detected from WiFi - no configuration needed

Test plan

  • Load a climb via official Kilter app (phone → ESP32 via BLE)
  • Verify ESP32 display shows climb name, grade, navigation
  • Verify phone stays connected via BLE
  • Change climb from web interface while phone is connected
  • Verify phone gets disconnected from BLE
  • Verify display shows new climb from web

🤖 Generated with Claude Code

marcodejongh and others added 2 commits February 4, 2026 12:32
Implements local queue storage on ESP32 with optimistic UI updates
for responsive button navigation. Key changes:

Backend:
- Add ControllerQueueItem and ControllerQueueSync GraphQL types
- Add queueItemUuid field to LedUpdate for reconciliation
- Update navigateQueue mutation to accept queueItemUuid (direct nav)
- Send ControllerQueueSync on initial connection and queue changes

ESP32 Firmware:
- Add LocalQueueItem struct and queue storage (up to 150 items)
- Implement optimistic navigation with immediate display updates
- Handle ControllerQueueSync events to maintain local queue state
- Navigate using queueItemUuid instead of direction for reliability
- Reconcile optimistic updates with backend LedUpdate responses

This fixes erratic navigation when the same climb appears multiple
times in the queue, and enables fast-skipping through the queue.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a climb was loaded via Bluetooth from the official Kilter app, the
ESP32 display wasn't receiving metadata updates because the backend was
skipping LedUpdate events sent to the same controller that initiated them.

This fix:
- Adds clientId field to LedUpdate GraphQL type
- Backend now always sends LedUpdate with clientId (MAC address of controller)
- ESP32 compares incoming clientId with its own MAC address
- If clientId matches own MAC: self-initiated change, keep BLE client connected
- If clientId differs: web user took over, disconnect BLE client

Also handles unknown climbs (drafts/unsynced) gracefully by displaying
"Unknown Climb" with navigation context so users can navigate back.

The MAC address is automatically detected - no manual configuration needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@vercel
Copy link

vercel bot commented Feb 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
boardsesh Building Building Preview, Comment Feb 5, 2026 6:30am

Request Review

@claude
Copy link

claude bot commented Feb 4, 2026

Claude Review

Do not merge - Missing required files will cause build failure.

Critical Issues

  1. Missing files cause build failure - The PR imports from ./navigation-helpers and ./grade-colors in packages/backend/src/graphql/resolvers/controller/ but these files are not included in the PR:

    • packages/backend/src/graphql/resolvers/controller/mutations.ts:9 imports findClimbIndex from ./navigation-helpers
    • packages/backend/src/graphql/resolvers/controller/subscriptions.ts:8 imports getGradeColor from ./grade-colors and buildNavigationContext, findClimbIndex from ./navigation-helpers
    • Neither file exists in the repository
  2. Stack overflow risk in onQueueSync (embedded/projects/climb-queue-display/src/main.cpp:379) - Allocates LocalQueueItem items[ControllerQueueSyncData::MAX_ITEMS] (150 items × ~88 bytes = ~13KB) on stack, risking overflow on ESP32's limited stack

Minor Issues

  1. Unused setter method - graphql_ws_client.h:88 adds setControllerId() but this is never called; the code uses auto-detected MAC address instead

Documentation

  1. docs/websocket-implementation.md needs update - The PR adds new GraphQL types (ControllerQueueSync, QueueNavigationContext, navigateQueue mutation) and significantly changes controller event behavior, but the Controller Events section doesn't document these changes

marcodejongh and others added 2 commits February 4, 2026 15:04
- Use pServer->start() instead of ble_gatts_start() directly to set
  NimBLE's internal m_gattsStarted flag
- Move UUID/advertising config to begin() to avoid duplicate registration
- Use BLE.startAdvertising() in proxy mode for consistent state management

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add AuroraProtocol::encodeLedCommands() to encode LED data into
  Aurora protocol packets for transmission to the board
- Add GraphQL LED update callback to notify when websocket updates arrive
- In proxy mode, encode and forward LED updates to the real board via BLE

This enables the web interface to control the physical board when
running through the ESP32 proxy.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 4, 2026

Claude Review

Do not merge - Missing files will cause build failures: navigation-helpers.ts and grade-colors.ts are imported but not included in the PR.

Critical Issues

  1. Missing helper files (packages/backend/src/graphql/resolvers/controller/):
    • mutations.ts:12 imports findClimbIndex from ./navigation-helpers - file doesn't exist
    • subscriptions.ts:10-11 imports getGradeColor from ./grade-colors and buildNavigationContext, findClimbIndex from ./navigation-helpers - files don't exist
    • This will cause TypeScript/build failures

Other Issues

  1. Stack allocation risk (embedded/projects/climb-queue-display/src/main.cpp:370-377): onQueueSync allocates LocalQueueItem items[150] on the stack (~13KB) before copying to heap. ESP32 has limited stack space - consider allocating on heap from the start like in board-controller/src/main.cpp.

  2. Verbose console logging in production (packages/backend/src/graphql/resolvers/controller/subscriptions.ts): Added many console.log statements (lines 150, 151, 152, 165, 166, 167, 228, 230, etc.) that could create noisy logs in production.

  3. Missing tests: No tests added for the new navigateQueue mutation, ControllerQueueSync event handling, or navigation context building logic.

Documentation

  • This PR adds navigation and queue sync features to the WebSocket/controller architecture. Consider updating docs/websocket-implementation.md to document the new ControllerQueueSync event type and navigateQueue mutation.

marcodejongh and others added 2 commits February 4, 2026 16:22
- Fix race condition where handleScanComplete and handleBoardFound
  both tried to connect, causing state to flip back to CONNECTING
- Set state to CONNECTING before calling stopScan() to prevent
  handleScanComplete from initiating duplicate connection
- Split BLE writes into 20-byte chunks to match board's MTU and
  the TypeScript client implementation (MAX_BLUETOOTH_MESSAGE_SIZE)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 4, 2026

Claude Review

Ready to merge - No significant issues found. The PR implements MAC address-based client identification for ESP32 BLE disconnect decisions, which is a cleaner approach than the previous controller ID system. The code is well-structured with good separation between navigation helpers, grade colors, and subscriptions.

Minor observations (non-blocking):

  • Memory allocation on heap in onQueueSync and handleQueueSync is appropriately handled with proper cleanup, avoiding stack overflow risks with the large queue data structures.

  • Removed climb-queue-display project in favor of consolidated board-controller with display support - this simplifies the codebase.

  • The GraphQL subscription now properly sends clientId (MAC address) with all LedUpdate events, allowing ESP32 to make intelligent BLE disconnect decisions.

@claude
Copy link

claude bot commented Feb 4, 2026

Claude Review

Ready to merge - Minor issues noted below, but nothing blocking.

Issues

  1. Blocking delay() calls in BLE proxy code - embedded/libs/ble-proxy/src/ble_proxy.cpp:213-214,265-267,309-311 and embedded/libs/ble-proxy/src/ble_scanner.cpp:133-136: Multiple blocking delay() calls (100ms, 200ms, 500ms) could freeze the main loop. Consider using non-blocking timers if responsiveness is critical.

  2. Unused variable BYTES_PER_LED - embedded/libs/aurora-protocol/src/aurora_protocol.cpp:21: Variable is declared but never used.

  3. Large stack allocation risk - embedded/projects/board-controller/src/main.cpp:543: LocalQueueItem* items = new LocalQueueItem[itemCount] with MAX_QUEUE_SIZE=150 items of ~88 bytes each (~13KB heap allocation) is appropriate, but the corresponding ControllerQueueSyncData struct in graphql_ws_client.h:23-32 is ~19KB per instance. The code handles this correctly by heap allocating at line 1681-1689 in graphql_ws_client.cpp, but worth noting for memory-constrained ESP32.

  4. Potential null dereference - embedded/libs/lilygo-display/src/lilygo_display.cpp:699-704: navigateToPrevious() and navigateToNext() check canNavigate*() before decrementing/incrementing, but then call getCurrentQueueItem() which could return nullptr if queue state changed. The code does null-check newCurrent before use, so this is just a note.

Test Coverage

The new tests (grade-colors.test.ts, navigate-queue.test.ts, navigation-helpers.test.ts) provide good coverage for the backend changes including edge cases.

Documentation

No documentation updates needed - this PR adds embedded/backend features without changing documented systems in docs/.

marcodejongh and others added 2 commits February 5, 2026 12:00
- Add atomic flag to prevent BLE proxy scan/connect race condition
  between handleBoardFound and handleScanComplete callbacks
- Document hardcoded BLE delays (100ms, 200ms, 500ms) with explanations
- Add grade-colors.test.ts (32 tests for grade color utilities)
- Add navigation-helpers.test.ts (17 tests for navigation helpers)
- Add navigate-queue.test.ts (15 tests for navigateQueue mutation)
- Update test setup to include esp32_controllers table
- Fix controller.test.ts to create test users for FK constraints

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace blocking delay() calls in BLE proxy with state-machine-based
  non-blocking timers for WAIT_BEFORE_CONNECT and WAIT_BEFORE_ADVERTISE
- Remove unused BYTES_PER_LED variable from aurora_protocol.cpp
- Fix native embedded test mock issues:
  - Add macAddress() method to MockWiFi
  - Add start() method to NimBLEServer mock
  - Fix min/max templates to return by value instead of reference

This addresses review feedback about blocking delays and test failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 5, 2026

Claude Review

Ready to merge - Minor issues noted below, but nothing blocking.

Issues Found

  1. Unused member variable - packages/backend/src/graphql/resolvers/controller/subscriptions.ts: The controllerId variable is extracted and logged but not used in the actual logic flow - only eventClientId from the event is passed through. This is intentional (per code comments), but the unused extraction could be cleaned up.

  2. Missing input validation in button handling - embedded/projects/board-controller/src/main.cpp:189-259: The button debouncing logic uses digitalRead(BUTTON_1_PIN) and digitalRead(BUTTON_2_PIN) without first calling pinMode(BUTTON_x_PIN, INPUT_PULLUP) in setup. Verify these are configured elsewhere or add initialization.

  3. Potential null dereference - embedded/libs/lilygo-display/src/lilygo_display.cpp:679: In setQueueFromSync, the loop condition uses i < _queueCount where _queueCount = min(count, MAX_QUEUE_SIZE), but items is allocated with size itemCount which equals min(data.count, MAX_QUEUE_SIZE) - the naming inconsistency could lead to confusion, though the values are equivalent.

  4. Memory management concern - embedded/libs/graphql-ws-client/src/graphql_ws_client.cpp:418-465: The heap-allocated ControllerQueueSyncData (~19KB) is allocated and freed synchronously within handleQueueSync. This is fine, but if the callback does async work that outlives the function, this could cause issues. The current implementation is safe.

  5. Documentation not updated - The PR deletes the embedded/projects/climb-queue-display/ project but does not mention updating any documentation. Verify no docs reference this removed project.

Test Coverage

The PR includes comprehensive tests for:

  • grade-colors.test.ts - Good coverage of edge cases
  • navigate-queue.test.ts - Tests direction navigation, direct queueItemUuid navigation, authorization, and event publishing
  • navigation-helpers.test.ts - Tests navigation context building

Test coverage is appropriate for the new functionality.

- Remove unused controllerId destructure in subscriptions.ts (use
  controller.id from DB query instead)
- Add pinMode initialization for button pins in setup() to ensure
  proper INPUT_PULLUP configuration before digitalRead calls

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 5, 2026

Claude Review

Ready to merge - This is a well-structured feature PR adding MAC address-based BLE disconnect decisions and ESP32 display/navigation support. Tests are included for the new backend functionality. A few minor issues noted below.

Issues

  1. packages/backend/src/graphql/resolvers/controller/subscriptions.ts:158-176 - Async IIFE inside synchronous callback has no guarantee of ordering. If buildLedUpdateWithNavigation takes time, queue sync events could interleave with LED updates in unexpected order. Consider using a queue or ensuring event ordering.

  2. packages/backend/src/graphql/resolvers/controller/mutations.ts:387 - Missing newIndex variable declaration before the if block when queueItemUuid path falls through:

    let newIndex: number;  // Declared but uninitialized
    // ... queueItemUuid path returns early
    // ... direction validation
    // ... uses newIndex without initializing it first if queueItemUuid was found but then falls through

    The logic actually returns early on success, so this works, but the shadowing at line 361 (newIndex = queue.findIndex(...)) is confusing.

  3. embedded/libs/graphql-ws-client/src/graphql_ws_client.cpp:418 - Heap allocation of ~19KB ControllerQueueSyncData struct without checking allocation success before use. The if (!syncData) check exists but still proceeds to log after failing.

  4. embedded/projects/board-controller/src/main.cpp:540-542 - Heap allocation of LocalQueueItem array (~13KB for 150 items) per queue sync event. On ESP32 with limited heap, this could cause fragmentation over time. Consider static allocation or reusing a buffer.

  5. Missing test coverage - The new navigateQueue mutation test file tests the happy paths well, but doesn't test:

    • Race conditions when multiple navigation requests come in quickly
    • What happens when roomManager.updateQueueState fails

Documentation Check

No documentation updates needed - this PR extends the ESP32 controller functionality which is not currently documented in docs/.

…tion

Backend:
- Fix async IIFE race condition in subscriptions by using event queue
  to ensure QueueSync arrives before LedUpdate at ESP32
- Rename newIndex to targetIndex in navigateQueue for clarity
- Add tests for concurrent navigation and error handling

Embedded:
- Use static buffer for queue sync to prevent heap fragmentation
- Add hexToRgb565Fast() helper to avoid String allocations
- Improve heap allocation error logging with size and free heap info

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 5, 2026

Claude Review

Ready to merge - Minor issues noted below, but nothing blocking.

Issues

  1. Potential memory fragmentation - embedded/projects/board-controller/src/main.cpp:46-47: g_queueSyncBuffer is a 13KB static array (150 × ~88 bytes). While static allocation avoids heap fragmentation, this significantly reduces available RAM on ESP32. Consider reducing MAX_QUEUE_SIZE if memory pressure becomes an issue.

  2. Missing null check for climb.difficulty - packages/backend/src/graphql/resolvers/controller/navigation-helpers.ts:12-14: buildNavigationItem passes climb.difficulty directly to getGradeColor, but TypeScript shows difficulty can be string | null. The getGradeColor function handles null safely, but the grade field assignment could be cleaner with a fallback.

  3. Inconsistent error messages - packages/backend/src/graphql/resolvers/controller/mutations.ts:406: Error message says 'Invalid direction. Must be "next" or "previous".' but the test at line 4714-4716 expects 'Invalid direction'. Consider aligning these.

  4. Race condition potential in button debouncing - embedded/projects/board-controller/src/main.cpp:218-264: The button handling logic uses static variables and checks millis() but doesn't protect against potential issues if loop() execution varies significantly. This is a minor concern for embedded code.

Test Coverage

  • ✅ Good test coverage for navigateQueue mutation (772 lines of tests)
  • ✅ Navigation helpers well-tested
  • ✅ Grade color utilities tested

Documentation

  • The docs/ directory doesn't appear to have documentation for the ESP32/controller WebSocket protocol that this PR significantly extends. Consider documenting the new ControllerQueueSync event type and navigation context fields.

marcodejongh and others added 2 commits February 5, 2026 17:06
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Reset connectionInitiated flag on all BLE proxy failure paths
  (empty scan results, target not found, proxy disabled)
- Reduce verbose logging in controller subscriptions
- Document ControllerQueueSync event, clientId field, and navigateQueue mutation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 5, 2026

Claude Review

Ready to merge - Well-structured feature with good test coverage. Minor issues noted below, but nothing blocking.

Issues

  1. Memory concern on ESP32 - ControllerQueueSyncData struct is ~19KB (packages/backend/src/graphql/resolvers/controller/subscriptions.ts / embedded/libs/graphql-ws-client/src/graphql_ws_client.h:28-40). While heap allocation is used, this is a large allocation for ESP32. The code handles allocation failure gracefully, but consider whether 150 items is necessary or if a smaller limit would suffice.

  2. Unused variable - controllerId is extracted from requireControllerAuthorizedForSession but never used in controllerEvents subscription (packages/backend/src/graphql/resolvers/controller/subscriptions.ts:100). The code later uses ctx.controllerId! directly.

  3. Static buffer initialization - g_queueSyncBuffer in main.cpp:33 is a ~13KB static array of LocalQueueItem. On ESP32-S3 with limited RAM, this is significant. Consider if dynamic allocation with smaller max would be better for memory-constrained scenarios.

Documentation

✅ Documentation updated appropriately in docs/websocket-implementation.md to cover:

  • ControllerQueueSync event
  • clientId field in LedUpdate
  • navigateQueue mutation
  • Navigation flow

Test Coverage

✅ Good test coverage added:

  • grade-colors.test.ts - comprehensive edge cases including null/undefined handling
  • navigate-queue.test.ts - covers direct navigation, direction-based navigation, edge cases, concurrent requests, error handling, and authorization
  • navigation-helpers.test.ts - covers utility functions with boundary conditions

The tests appropriately mock dependencies and verify event publishing behavior.

- BLE client now calls connectCallback(false) when connect() fails
  synchronously, so the proxy gets notified and can transition to
  RECONNECTING state instead of being stuck in CONNECTING
- Add debug logging in scanner to help diagnose board detection issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 5, 2026

Claude Review

Ready to merge - No significant issues found.

Summary

This PR adds MAC address-based client identification to fix ESP32 display not receiving metadata updates when climbs are loaded via Bluetooth. The changes are well-structured with good test coverage.

Documentation

The docs/websocket-implementation.md has been updated appropriately with the new clientId field, ControllerQueueSync event, and navigateQueue mutation documentation.

Test Coverage

New test files provide good coverage:

  • controller.test.ts - Authorization tests
  • grade-colors.test.ts - Grade color utility tests
  • navigate-queue.test.ts - Navigation mutation tests
  • navigation-helpers.test.ts - Navigation context builder tests

Tests cover the main scenarios including direct navigation via queueItemUuid, direction-based fallback, edge cases (empty queue, boundaries), and authorization requirements.

Minor Notes (non-blocking)

  1. embedded/libs/graphql-ws-client/src/graphql_ws_client.h:35 - ControllerQueueSyncData with MAX_ITEMS = 150 and 126-byte items allocates ~19KB. The code handles this correctly by heap allocation with null check.

  2. The climb-queue-display project was removed (merged into board-controller) - this consolidation seems intentional per the PR description.

Removed temporary diagnostic logging that was added to debug board
detection issues. The name is now only fetched after confirming the
device is an Aurora board, which is more efficient.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 5, 2026

Claude Review

Ready to merge - Minor issues noted below, but nothing blocking.

Issues

  1. packages/backend/src/graphql/resolvers/controller/subscriptions.ts:130-170 - The buildLedUpdateWithNavigation helper captures sessionId, boardPath, and controller from outer scope via closure. This is fine, but the function is recreated on every subscription iteration. Consider extracting it outside the subscription resolver if performance becomes a concern with many concurrent controllers.

  2. packages/backend/src/graphql/resolvers/controller/navigation-helpers.ts:59 - findClimbIndex searches by item.uuid (queue item UUID), but the parameter is named climbUuid which is misleading. The function finds by queue item UUID, not climb UUID.

  3. packages/shared-schema/src/types.ts:355 - LedUpdate.climbGrade is a new field added to the TypeScript type, but the existing climbGrade was already in the GraphQL schema before this PR. Verify this doesn't break existing clients expecting the field.

  4. embedded/libs/display-ui/src/climb_display.cpp:997-1002 - Buffer overflow risk in truncateText: when len > maxChars, the function writes to output[maxChars-2], output[maxChars-1], and output[maxChars]. If maxChars equals the buffer size, output[maxChars] writes past the buffer end. Caller must ensure output buffer is at least maxChars + 1 bytes.

  5. embedded/projects/board-controller/src/main.cpp:683-686 - When navigating, sendNavigationMutation is called but there's no handling for mutation failure. If the backend rejects the navigation, the optimistic UI update remains incorrect until the next LedUpdate arrives.

Documentation

docs/websocket-implementation.md is updated with new ControllerQueueSync event, clientId field, and navigateQueue mutation documentation.

Test Coverage

✅ Good test coverage:

  • packages/backend/src/__tests__/grade-colors.test.ts - Comprehensive tests for grade color utilities
  • packages/backend/src/__tests__/navigate-queue.test.ts - Tests for navigation mutation including edge cases
  • packages/backend/src/__tests__/navigation-helpers.test.ts - Tests for navigation helper functions

@marcodejongh marcodejongh merged commit 2d3ee5a into main Feb 5, 2026
6 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant