💡 On-device voice transcription for offline zero-tap inspections #346

2026-06-26T10:19:19Z

github-actions[bot]
Bot Jun 26, 2026

Summary

Integrate an on-device speech-to-text (STT) engine (e.g., Whisper.cpp via whisper.rn, or expo-speech-recognition with on-device models) so the zero-tap voice inspection workflow works without cellular connectivity. The current architecture routes all voice processing through Gemini STT via the Vertex AI API. That cloud dependency breaks Broodly's core differentiator in exactly the rural field conditions where beekeepers operate. A hybrid approach (on-device transcription, plus cloud NLU when connected) preserves offline resilience while still using cloud intelligence for structured data extraction.
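
A minimal sketch of the hybrid split, assuming a hypothetical `OnDeviceStt` interface that wraps whichever engine the spike selects (whisper.rn or expo-speech-recognition) and a hypothetical `CloudNlu` client for the Gemini extraction step. Neither is an existing Broodly or library API. The on-device engine always produces raw text first, so a note is never lost offline; cloud NLU runs immediately only when the device is connected.

```typescript
// Hypothetical abstraction over an on-device STT engine (e.g. whisper.rn or
// expo-speech-recognition). Not a real library API.
interface OnDeviceStt {
  transcribe(audioUri: string): Promise<string>;
}

// Hypothetical client for cloud NLU (structured extraction via Gemini).
interface CloudNlu {
  extract(transcript: string): Promise<Record<string, unknown>>;
}

type InspectionNote =
  | { status: "structured"; transcript: string; fields: Record<string, unknown> }
  | { status: "pending_nlu"; transcript: string };

// Transcribe locally first; run cloud NLU only when connectivity is available.
async function processVoiceNote(
  audioUri: string,
  stt: OnDeviceStt,
  nlu: CloudNlu,
  isOnline: boolean,
): Promise<InspectionNote> {
  const transcript = await stt.transcribe(audioUri);
  if (!isOnline) {
    // Offline: keep the raw transcript and defer structured extraction.
    return { status: "pending_nlu", transcript };
  }
  try {
    const fields = await nlu.extract(transcript);
    return { status: "structured", transcript, fields };
  } catch {
    // A failed cloud call degrades to the offline path instead of losing data.
    return { status: "pending_nlu", transcript };
  }
}
```

The design choice is that the cloud step is strictly additive: whatever happens to connectivity, the inspection record already holds the transcript.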

Market Signal

HiveSense already ships on-device Whisper AI for offline voice transcription, with BLE sensor integration. HIVESOUND offers continuous multi-colony voice sessions, which compete directly with Broodly's claimed core differentiator. BeeKeeperVoice and APiLOG both offer voice-driven inspection logging with AI-powered data extraction. With at least four competitors shipping it, voice-first beekeeping is now table stakes, and offline voice processing is the new competitive frontier. Broader 2026 agritech UX trends also point to voice-enabled, hands-free field interfaces as a baseline expectation.

User Signal

The PRD's Technical Constraints explicitly require that "Field workflows must tolerate low-connectivity conditions" and that "Logging must be fast and multimodal, with voice-first entry." The architecture's cloud-only STT (Gemini via the Vertex AI API) directly contradicts these requirements. Beeyards are frequently in rural areas with limited or no cellular coverage, which are exactly the conditions where Broodly's zero-tap promise matters most.

Technical Opportunity

The community expo-speech-recognition module can be used in Expo projects for on-device STT, and Whisper.cpp has React Native bindings (whisper.rn). The architecture already plans local media staging (audio captured locally, uploaded when connected), so adding local transcription is a natural extension of this pattern. On-device models handle raw transcription, while cloud Gemini handles higher-order NLU (entity extraction, structured observation mapping, recommendation triggering) once connectivity returns. The async worker service architecture (Cloud Run triggered by Pub/Sub) already supports this deferred-processing pattern.
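
The deferred-processing step can be sketched as a local queue of raw transcripts that is flushed when the device reconnects. This is an illustration under stated assumptions: `TranscriptMessage` and its fields are hypothetical, and `publish` stands in for whatever call actually reaches the Pub/Sub-triggered worker (for example, a backend endpoint that forwards to the topic). Because Pub/Sub delivers messages at least once by default, each message carries a `noteId` the worker can use to drop duplicates. A real client would persist the queue on disk; this version is in-memory only.

```typescript
// Hypothetical payload sent once connectivity returns; field names are
// illustrative, not an existing Broodly schema.
interface TranscriptMessage {
  noteId: string;     // idempotency key so the worker can drop redeliveries
  hiveId: string;
  recordedAt: string; // ISO 8601 timestamp from the device
  transcript: string; // raw on-device text; NLU happens server-side
}

// Hypothetical transport callback; rejects when the send fails.
type Publish = (msg: TranscriptMessage) => Promise<void>;

// In-memory sketch only: a production queue must survive app restarts.
class PendingTranscriptQueue {
  private pending: TranscriptMessage[] = [];

  enqueue(msg: TranscriptMessage): void {
    this.pending.push(msg);
  }

  get size(): number {
    return this.pending.length;
  }

  // Send in recording order; stop at the first failure and keep the rest
  // for the next reconnect. Returns how many messages were sent.
  async flush(publish: Publish): Promise<number> {
    let sent = 0;
    for (const msg of [...this.pending]) {
      try {
        await publish(msg);
      } catch {
        break;
      }
      this.pending.shift(); // drop the message only after a successful send
      sent++;
    }
    return sent;
  }
}
```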

Assessment

Dimension	Score	Rationale
Feasibility	med	On-device STT libraries exist for React Native, but Expo managed workflow compatibility needs validation. Model size (40-150 MB) requires a progressive download strategy.
Impact	high	Directly enables Broodly's core differentiator (zero-tap) in the environments where it matters most. Without this, the product promise is broken for a significant portion of the target market.
Urgency	high	HiveSense already ships this capability. Every month of delay is a month where a competitor offers what Broodly promises but can't deliver. Architecture decisions made now (cloud-only STT) will be expensive to retrofit.

Adversarial Review

Strongest objection: On-device Whisper models add 40-150MB to app size, React Native/Expo support for on-device ML is still maturing, and maintaining two STT paths (on-device + cloud) increases system complexity and testing surface.

Rebuttal: HiveSense already ships on-device Whisper in a React Native app, which indicates the approach is viable in production. Model download can be optional and progressive (download on first use, not at install). The alternative, a "zero-tap" app that requires cellular connectivity to function, is a product-breaking contradiction that no amount of marketing can paper over. Beeyards ARE the low-connectivity environment. The dual-path complexity is manageable because the on-device path only needs to produce raw transcription text; all structured interpretation happens in the existing cloud pipeline.
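
The "download on first use, not at install" policy could be expressed as a small decision function. This is a sketch, not a specified design: `ModelTier`, `DeviceState`, and the 500 MB storage headroom are hypothetical, and tier sizes are inputs because the real numbers should come from the spike. Here the rule is to download only after the user opts in and while on an unmetered network, then pick the largest tier that fits.

```typescript
// Hypothetical downloadable model tier; sizes are supplied by the caller.
interface ModelTier {
  name: string; // e.g. "tiny", "base", "small"
  sizeBytes: number;
}

// Hypothetical snapshot of device conditions at first use of offline voice.
interface DeviceState {
  freeStorageBytes: number;
  onUnmeteredNetwork: boolean; // e.g. Wi-Fi; avoid spending cellular data
  userOptedIn: boolean;        // offline voice is opt-in, not part of install
}

// Returns the tier to download now, or null to defer the download.
function chooseModelToDownload(
  tiers: readonly ModelTier[],
  device: DeviceState,
  headroomBytes = 500 * 1024 * 1024, // hypothetical free-space margin to keep
): ModelTier | null {
  if (!device.userOptedIn || !device.onUnmeteredNetwork) return null;
  const budget = device.freeStorageBytes - headroomBytes;
  // filter() returns a new array, so sorting does not mutate the caller's list.
  const fitting = tiers
    .filter((t) => t.sizeBytes <= budget)
    .sort((a, b) => b.sizeBytes - a.sizeBytes);
  return fitting[0] ?? null;
}
```

"Largest tier that fits" favors accuracy; if the spike shows the bigger models are too slow on mid-range devices, the same function can rank tiers by measured latency instead.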

Suggested Next Step

Spike: evaluate the expo-speech-recognition on-device mode and the whisper.rn library for Expo managed workflow compatibility. Measure model size options (tiny/base/small), transcription accuracy on beekeeping-specific vocabulary, and cold-start latency on mid-range iOS and Android devices.
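
One common way to score transcription accuracy is word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below is a standard dynamic-programming implementation; the normalization (lowercasing, ASCII-only punctuation stripping) is an assumption and should be tuned to the evaluation set.

```typescript
// Lowercase, replace anything other than ASCII letters, digits, whitespace,
// and apostrophes with spaces, then split into words. This normalization is
// an assumption, not a standard.
function normalize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s']/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / reference word count.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = normalize(reference);
  const hyp = normalize(hypothesis);
  if (ref.length === 0) throw new Error("WER is undefined for an empty reference");
  // prev[j] = edit distance between the ref prefix processed so far and hyp[0..j)
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion of a reference word
        curr[j - 1] + 1,    // insertion of an extra hypothesis word
        prev[j - 1] + cost, // substitution, or a match when cost is 0
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

Running this over recorded inspection phrases that contain domain terms (for example "varroa" or "supersedure") for each model tier would show how much beekeeping vocabulary degrades accuracy relative to general speech.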

Generated by weekly feature ideation workflow — 2026-06-26
