Skip to content

Conversation

@TomTasche
Copy link
Member

https://play.google.com/console/u/1/developers/6896129017819210714/app/4976319305580244614/vitals/crashes/9935fddcf820135a21dd37aeca47efaa/details?days=28&versionCode=196&isUserPerceived=true
https://play.google.com/console/u/1/developers/6896129017819210714/app/4976319305580244614/vitals/crashes/8fe43e04d3fc37780aa3c19118a69b8f/details?days=28&versionCode=196&isUserPerceived=true
https://play.google.com/console/u/1/developers/6896129017819210714/app/4976319305580244614/vitals/crashes/b44641c271c987dd91db9b46b0fcd799/details?days=28&versionCode=196&isUserPerceived=true

This PR is not meant to be merged, but could serve as inspiration for real fixes... Since we can't reproduce the crashes ourselves, it's hard to verify how good these changes are. According to Claude, all crashes are related to shutdown scenarios - meaning they wouldn't actually be that bad for users?

Summary

Fixes multiple SIGSEGV crashes in the httplib HTTP server on Android, affecting ~950+ users. All crashes were related to use-after-free and race conditions during server shutdown.

Crash Reports Fixed

  1. SIGSEGV in httplib::Server::write_response_core - Crash during HTTP response writing
  2. SIGSEGV in httplib::Server::process_request (__assign_multi) - Crash during header map operations
  3. SIGSEGV in __construct_node - Crash during map node construction

All crashes occurred in httplib's thread pool workers on various Android devices (Android 10-16, Samsung, Motorola, etc.).

Root Causes & Fixes

1. Chunked Transfer Encoding Issues (331d9fc)

Problem: Using ContentProviderWithoutLength for streaming responses caused crashes when:

  • Client disconnects mid-transfer
  • Exceptions thrown during content generation
  • Server stops while requests in-flight

Fix: Buffer content to std::ostringstream first, then use res.set_content() for Content-Length based responses.

2. Missing Shutdown Synchronization (b919561)

Problem: No proper lifecycle management - destructor didn't stop server, exceptions weren't caught.

Fix:

  • Added destructor that ensures server is stopped
  • Added std::atomic<bool> m_stopping flag - handlers return 503 during shutdown
  • Set up httplib::set_exception_handler() to catch internal exceptions
  • Changed lambda captures from [&] to [this] for clarity

3. C++ Member Destruction Order (11f49f0) — Root Cause

Problem: Members were declared in wrong order:

httplib::Server m_server;           // destroyed 4th
std::unordered_map<...> m_content;  // destroyed 1st ← CRASH!

C++ destroys members in reverse declaration order. m_content was destroyed before m_server, but m_server's destructor is what joins thread pool threads. Threads were accessing freed m_content.

Fix:

  • Changed m_server to std::unique_ptrhttplib::Server for explicit destruction control
  • Destructor explicitly destroys server via reset() before other members
  • Reordered member declarations: m_server declared last (destroyed first)

This fixes a crash that was affecting ~950 users with the following
root causes:

1. Chunked transfer encoding issues: Replace ContentProviderWithoutLength
   with buffered content using set_content(). The streaming content
   provider could crash when:
   - Client disconnects during transfer
   - Exceptions thrown during content generation
   - Server stopped while requests in-flight

2. Unhandled exceptions: Add try-catch blocks around request handling
   to gracefully handle exceptions instead of letting them propagate
   into httplib's internals where they caused SIGSEGV.

3. Race condition in stop(): Reorder operations to call m_server.stop()
   before clear() so that in-flight requests complete before resources
   are invalidated.

The crash was reported as SIGSEGV in
httplib::Server::write_response_core on various Android devices and
versions (Android 10-16, Samsung, Motorola devices).
Additional fixes for Android crashes in httplib HTTP server:

1. Add destructor to Impl class: Ensures the server is properly stopped
   before the Impl object is destroyed. This prevents worker threads
   from accessing freed memory (use-after-free crash).

2. Add atomic stopping flag: Handlers check this flag and return 503
   Service Unavailable during shutdown, preventing new work from
   starting while resources are being freed.

3. Set up httplib exception handler: Catches any internal httplib
   exceptions and returns HTTP 500 instead of crashing.

4. Change lambda captures from [&] to [this]: More explicit about what's
   captured, making the code's intent clearer.

5. Delete copy constructor/assignment: Prevents unsafe copying since
   lambdas capture 'this' pointer.

The second crash was occurring in std::__tree::__assign_multi during
httplib::Server::process_request, caused by accessing freed memory when
the Impl object was destroyed while worker threads were still running.
Root cause: C++ member destruction order was causing use-after-free.
The m_content map was being destroyed BEFORE m_server, but m_server's
destructor is what joins the thread pool threads. This meant worker
threads could still be accessing m_content after it was destroyed.

Crashes were occurring in httplib's internal map operations
(__construct_node, __assign_multi) during process_request because
threads were accessing freed memory.

Fix:
1. Changed m_server to unique_ptr<httplib::Server> to enable explicit
   destruction timing control.

2. In destructor, explicitly destroy server via m_server.reset() BEFORE
   other members are destroyed. This ensures thread pool threads are
   fully joined first.

3. In stop(), also destroy the server to ensure threads are joined
   before clearing content.

4. Reordered member declarations: m_server is now declared LAST so if
   we miss any explicit destruction, the natural destruction order will
   still destroy it first (reverse of declaration order).

This is a critical fix for the SIGSEGV crashes in httplib's internal
map operations during request processing on Android.
@andiwand
Copy link
Member

@TomTasche looks reasonable to me. potentially I would cleanup the LLM bloat a bit we could merge it as is. I saw multiple lifetime issues with that part of the code so I can imagine there are more to fix

@andiwand andiwand marked this pull request as ready for review November 30, 2025 14:06
@andiwand andiwand merged commit 71cb002 into main Nov 30, 2025
16 checks passed
@andiwand andiwand deleted the claude/fix-android-cpp-crashes-015CGZwhKSrNPARVHoGJFT7L branch November 30, 2025 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants