Android crash strlen+16 signal 11 (SIGSEGV), code 1 (SEGV_MAPERR) #3267

Closed
tri-bao opened this issue Apr 17, 2022 · 14 comments
Labels
legacy:face mesh Issues related to Face Mesh platform:android Issues with Android as Platform type:bug Bug in the Source Code of MediaPipe Solution

Comments

@tri-bao

tri-bao commented Apr 17, 2022

This is the same as #2547, but I couldn't reopen that issue for discussion, so I created a new one.
It is also the same as #2746.

Our Android app is based on the iris_tracking_gpu.pbtxt graph. After publishing it to the Play Store, we received the following crash on a Samsung Galaxy Note5 running Android 7.0 (SDK 24).

backtrace:
  #00  pc 000000000001b5f8  /system/lib64/libc.so (strlen+16)
  #00  pc 0000000000aa0afc  /data/app/xyz-1/split_config.arm64_v8a.apk!lib/arm64-v8a/libiris_jni.so (offset 0x1000) (mediapipe::PathToResourceAsFile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&))
  #00  pc 000000000054468c  /data/app/xyz-1/split_config.arm64_v8a.apk!lib/arm64-v8a/libiris_jni.so (offset 0x1000) (mediapipe::LocalFileContentsCalculator::Open(mediapipe::CalculatorContext*))
  #00  pc 0000000000b1126c  /data/app/xyz-1/split_config.arm64_v8a.apk!lib/arm64-v8a/libiris_jni.so (offset 0x1000) (mediapipe::CalculatorNode::OpenNode())
  #00  pc 0000000000afb450  /data/app/xyz-1/split_config.arm64_v8a.apk!lib/arm64-v8a/libiris_jni.so (offset 0x1000) (mediapipe::internal::SchedulerQueue::OpenCalculatorNode(mediapipe::CalculatorNode*))
  #00  pc 0000000000afb1ec  /data/app/xyz-1/split_config.arm64_v8a.apk!lib/arm64-v8a/libiris_jni.so (offset 0x1000) (mediapipe::internal::SchedulerQueue::RunNextTask())
  #00  pc 0000000000b31fd0  /data/app/xyz-1/split_config.arm64_v8a.apk!lib/arm64-v8a/libiris_jni.so (offset 0x1000) (mediapipe::ThreadPool::RunWorker())
  #00  pc 0000000000b31c20  /data/app/xyz-1/split_config.arm64_v8a.apk!lib/arm64-v8a/libiris_jni.so (offset 0x1000) (mediapipe::ThreadPool::WorkerThread::ThreadBody(void*))
  #00  pc 0000000000068258  /system/lib64/libc.so (__pthread_start(void*)+196)
  #00  pc 000000000001dc00  /system/lib64/libc.so (__start_thread+16)

We also used that phone during development and never saw this crash. It does not crash every time: after it crashed, I closed the app and reopened it, and then it worked!

According to https://source.android.com/devices/tech/debug/native-crash#nullpointer, this is just a pure null pointer dereference.
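For context, a fault inside strlen with SEGV_MAPERR is what a null (or otherwise unmapped) `const char*` typically produces, since calling strlen on a null pointer is undefined behavior. A minimal standalone sketch of a guard follows. `SafeLength` and `ToStringOrEmpty` are hypothetical helper names used only for illustration; they are not MediaPipe functions, and nothing in this thread shows where the bad pointer actually originates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns the length of a C string, treating a null
// pointer as an empty string instead of passing it to strlen.
std::size_t SafeLength(const char* s) {
  return s == nullptr ? 0 : std::strlen(s);
}

// Hypothetical helper: builds a std::string from a possibly-null C string.
// Constructing std::string directly from a null pointer is also undefined.
std::string ToStringOrEmpty(const char* s) {
  return s == nullptr ? std::string() : std::string(s);
}
```

With guards like these at the boundary where a raw pointer enters the code, a missing value would surface as an empty string that can be reported, rather than as a SIGSEGV on a worker thread.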

Tracing the source code and all the involved graph files, I suspect the argument path in the following method of the file resource_util_default.cc:

absl::StatusOr<std::string> PathToResourceAsFile(const std::string& path) {
  return JoinPath(absl::GetFlag(FLAGS_resource_root_dir), path);
}
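If a bad path really does reach this function, validating it before the join would turn the failure into an error that can be logged. The sketch below is standalone and uses only the standard library; `PathResult` and `JoinResourcePath` are hypothetical names, not MediaPipe APIs. One caveat: the `c_str()` of a valid `std::string` is never null, so an empty path alone would not explain a fault in strlen. This only shows the kind of check that could narrow the cause down.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical result type: holds a joined path on success, or an error
// message when the input was rejected.
struct PathResult {
  std::optional<std::string> path;
  std::string error;
  bool ok() const { return path.has_value(); }
};

// Hypothetical helper (not part of MediaPipe): joins a root directory and a
// relative path with exactly one '/', rejecting an empty relative path.
PathResult JoinResourcePath(const std::string& root, const std::string& rel) {
  if (rel.empty()) {
    return {std::nullopt, "resource path is empty"};
  }
  if (root.empty()) {
    return {rel, ""};
  }
  const bool root_has_slash = root.back() == '/';
  const bool rel_has_slash = rel.front() == '/';
  if (root_has_slash && rel_has_slash) {
    return {root + rel.substr(1), ""};  // drop the duplicate separator
  }
  if (root_has_slash || rel_has_slash) {
    return {root + rel, ""};
  }
  return {root + "/" + rel, ""};
}
```

A caller would check `ok()` and log `error` before handing the path to any file or asset API.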

That path is provided via an input side packet declared in the graph face_landmarks_model_loader.pbtxt:

node {
  calculator: "SwitchContainer"
  input_side_packet: "ENABLE:with_attention"
  output_side_packet: "PACKET:model_path"
  options: {
    [mediapipe.SwitchContainerOptions.ext] {
      contained_node: {
        calculator: "ConstantSidePacketCalculator"
        options: {
          [mediapipe.ConstantSidePacketCalculatorOptions.ext]: {
            packet {
              string_value: "mediapipe/modules/face_landmark/face_landmark.tflite"
            }
          }
        }
      }
      contained_node: {
        calculator: "ConstantSidePacketCalculator"
        options: {
          [mediapipe.ConstantSidePacketCalculatorOptions.ext]: {
            packet {
              string_value: "mediapipe/modules/face_landmark/face_landmark_with_attention.tflite"
            }
          }
        }
      }
    }
  }
}

# Loads the file in the specified path into a blob.
node {
  calculator: "LocalFileContentsCalculator"
  input_side_packet: "FILE_PATH:model_path"
  output_side_packet: "CONTENTS:model_blob"
}

Is it possible that, under some special condition, the SwitchContainer emits a null path on output_side_packet: "PACKET:model_path", which then reaches input_side_packet: "FILE_PATH:model_path" of LocalFileContentsCalculator? As I said earlier, it doesn't always happen: on the same phone that crashed, relaunching the app did not crash.
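To make the hypothesis concrete, here is a standalone sketch of the selection step with an explicit "no value" state and a consumer that checks for it. It does not use MediaPipe: `ModelPaths`, `SelectModelPath`, and `OpenModel` are hypothetical names, and the mapping of ENABLE false to the first contained node and true to the second is an assumption about how SwitchContainer behaves.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the two constant paths in the loader graph.
const std::vector<std::string>& ModelPaths() {
  static const std::vector<std::string> paths = {
      "mediapipe/modules/face_landmark/face_landmark.tflite",
      "mediapipe/modules/face_landmark/face_landmark_with_attention.tflite",
  };
  return paths;
}

// Hypothetical selector. Assumption: false picks the first path and true
// picks the second. An unset flag yields std::nullopt, not an empty string.
std::optional<std::string> SelectModelPath(std::optional<bool> with_attention) {
  if (!with_attention.has_value()) {
    return std::nullopt;
  }
  return ModelPaths()[*with_attention ? 1 : 0];
}

// Hypothetical consumer: refuses to proceed when the path is missing or
// empty, and reports why through `error`.
bool OpenModel(const std::optional<std::string>& path, std::string* error) {
  if (!path.has_value() || path->empty()) {
    *error = "model path side packet is missing or empty";
    return false;
  }
  return true;
}
```

If the real consumer failed in this controlled way instead of dereferencing the value, a rare unset side packet would show up as an error status in the logs, which would confirm or rule out this theory.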

@tri-bao tri-bao added the type:bug Bug in the Source Code of MediaPipe Solution label Apr 17, 2022
@sureshdagooglecom sureshdagooglecom added platform:android Issues with Android as Platform legacy:face mesh Issues related to Face Mesh labels Apr 18, 2022
@sureshdagooglecom

Hi @tri-bao,
Could you share the steps to reproduce this issue?

@sureshdagooglecom sureshdagooglecom added the stat:awaiting response Waiting for user response label Apr 18, 2022
@tri-bao
Author

tri-bao commented Apr 18, 2022

Hi @sureshdagooglecom,

When it happened, my app had been uploaded to the Google Play Store. The app is based on an AAR that was built from the graph iris_tracking_gpu.pbtxt with the following bazel options:

          --compilation_mode=opt \
          --spawn_strategy=local \
          --strip=ALWAYS \
          --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
          --fat_apk_cpu=armeabi-v7a,arm64-v8a

As I mentioned earlier, this crash does not always occur. After it crashed, I sent the crash log to the Play Store (as you can see in my original comment), killed the app, and reopened it; it did not crash again. As with the person in #2547, it happens once and not again on the same phone.

I wonder whether there is any chance that, at some very particular moment, the SwitchContainer emits the model_path side packet out of sync, so that it is null when the LocalFileContentsCalculator retrieves it.

For my app, since we never use the face_landmark_with_attention.tflite model (mainly due to issue #2935), I decided, to be safer in production, to replace the whole face_landmarks_model_loader subgraph with a fixed model path, something like the following in face_landmark_gpu.pbtxt:

node {
  calculator: "InferenceCalculator"
  input_stream: "TENSORS:input_tensors"

#  input_side_packet: "MODEL:model"

  input_side_packet: "CUSTOM_OP_RESOLVER:op_resolver"
  output_stream: "TENSORS:output_tensors"
  options: {
    [mediapipe.InferenceCalculatorOptions.ext] {
      # Do not remove. Used for generation of XNNPACK/NNAPI graphs.

      model_path: "mediapipe/modules/face_landmark/face_landmark.tflite"
    }
  }
}

@sureshdagooglecom

Hi @tri-bao,
It does look like a sync problem. Could you please share the complete error logs?

@sureshdagooglecom sureshdagooglecom added stat:awaiting response Waiting for user response and removed stat:awaiting response Waiting for user response labels Apr 22, 2022
@tri-bao
Author

tri-bao commented Apr 23, 2022

Hi @sureshdagooglecom,
I only have the backtrace that I already posted in the initial comment of this ticket. That is what I got from the Google Play Console. If you mean the whole console log from Android Studio, I don't have it, as we have never seen this error during development.

@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

@tri-bao
Author

tri-bao commented Apr 30, 2022

please don't close, bot :)

@google-ml-butler google-ml-butler bot removed stale stat:awaiting response Waiting for user response labels Apr 30, 2022
@sureshdagooglecom sureshdagooglecom added the stat:awaiting googler Waiting for Google Engineer's Response label May 20, 2022
@sureshdagooglecom

Hi @tri-bao,
Please replace the SwitchContainer and ConstantSidePacketCalculator with a simple fixed file path and see whether the crash still happens.
Suggestion: accept the "FILE_PATH" side packet as an input side packet into the graph.

@sureshdagooglecom sureshdagooglecom added stat:awaiting response Waiting for user response and removed stat:awaiting googler Waiting for Google Engineer's Response labels Jun 6, 2022
@tri-bao
Author

tri-bao commented Jun 6, 2022

As I mentioned in my previous comment, that is what I did: #3267 (comment)

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Waiting for user response label Jun 6, 2022
@sureshdagooglecom sureshdagooglecom added the stat:awaiting googler Waiting for Google Engineer's Response label Jun 13, 2022
@blackchen20004

I also hit this issue. The reproduction rate is less than 1 in 100.

pid: 3345, tid: 3621, name: mediapipe/3621 >>> com.blake.aircmd <<<
uid: 10326
x0 0000006fe1b38f40 x1 0000000000000080 x2 0000000000000000 x3 0000000000000000
x4 feff303135322e64 x5 feff303135322e64 x6 feff303135322e64 x7 7f7f7f7f7f7f7f7f
x8 0000000000000062 x9 0000000000000000 x10 0000000000000001 x11 0000000000000001
x12 0000006eb7048de8 x13 0000000000003039 x14 0000000000000000 x15 0000000000000030
x16 0000006eb72f3700 x17 0000006fcb55dd50 x18 0000006ea5a7a000 x19 0000000000000000
x20 0000006fe1b38f40 x21 0000006eb7048edb x22 0000006eb7048ee5 x23 0000000000000001
x24 0000000000000881 x25 0000000000000000 x26 0000006ea5c77ff8 x27 00000000000fc000
x28 0000006ea5b7f000 x29 0000006ea5c77c50
lr 0000006eb6e87c20 sp 0000006ea5c779a0 pc 0000006fcb55dd70 pst 0000000060001000

backtrace:
#00 pc 0000000000086d70 /apex/com.android.runtime/lib64/bionic/libc.so (syscall+32) (BuildId: 33005d19716b95e4cd346b0d7c61d055)
#1 pc 0000000000be1c1c /data/app/~~UXyRPyHfeakBKcnCBG0vbg==/com.blake.aircmd-nCy-jAivafNv7ii9EotlOQ==/base.apk (absl::lts_20210324::synchronization_internal::FutexImpl::WaitUntil(std::__ndk1::atomic, int, absl::lts_20210324::synchronization_internal::KernelTimeout)+108)
#2 pc 0000000000be1b48 /data/app/~~UXyRPyHfeakBKcnCBG0vbg==/com.blake.aircmd-nCy-jAivafNv7ii9EotlOQ==/base.apk (absl::lts_20210324::synchronization_internal::Waiter::Wait(absl::lts_20210324::synchronization_internal::KernelTimeout)+156)
#3 pc 0000000000be1a6c /data/app/~~UXyRPyHfeakBKcnCBG0vbg==/com.blake.aircmd-nCy-jAivafNv7ii9EotlOQ==/base.apk (AbslInternalPerThreadSemWait_lts_20210324+72)
#4 pc 0000000000be1040 /data/app/~~UXyRPyHfeakBKcnCBG0vbg==/com.blake.aircmd-nCy-jAivafNv7ii9EotlOQ==/base.apk (absl::lts_20210324::CondVar::WaitCommon(absl::lts_20210324::Mutex, absl::lts_20210324::synchronization_internal::KernelTimeout)+172)
#5 pc 0000000000b7e7e0 /data/app/~~UXyRPyHfeakBKcnCBG0vbg==/com.blake.aircmd-nCy-jAivafNv7ii9EotlOQ==/base.apk (mediapipe::ThreadPool::RunWorker()+164)
#6 pc 0000000000b7e518 /data/app/~~UXyRPyHfeakBKcnCBG0vbg==/com.blake.aircmd-nCy-jAivafNv7ii9EotlOQ==/base.apk (mediapipe::ThreadPool::WorkerThread::ThreadBody(void*)+796)
#7 pc 00000000000efbf4 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+264) (BuildId: 33005d19716b95e4cd346b0d7c61d055)
#8 pc 000000000008c4d0 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68) (BuildId: 33005d19716b95e4cd346b0d7c61d055)

@kboyarshinov

I've also encountered this issue. I can confirm that it is fixed by providing the model path directly to InferenceCalculator, like this:

node {
  calculator: "InferenceCalculator"
  input_stream: "TENSORS:input_tensors"
  output_stream: "TENSORS:output_tensors"
  options: {
    [mediapipe.InferenceCalculatorOptions.ext] {
      model_path: "mediapipe/modules/pose_landmark/pose_landmark_lite.tflite"
    }
  }
}

From that, it looks like the root cause of the problem is in either SwitchContainer, ConstantSidePacketCalculator, or LocalFileContentsCalculator.

@kuaashish kuaashish assigned kuaashish and unassigned jiuqiant Apr 26, 2023
@kuaashish kuaashish removed the stat:awaiting googler Waiting for Google Engineer's Response label Apr 26, 2023
@kuaashish
Collaborator

Hello @tri-bao,
We are upgrading the MediaPipe Legacy Solutions to new MediaPipe solutions. However, the libraries, documentation, and source code for all the MediaPipe Legacy Solutions will continue to be available in our GitHub repository and through library distribution services, such as Maven and NPM.

You can continue to use those legacy solutions in your applications if you choose. However, we would ask you to check out the new MediaPipe solutions, which can help you more easily build and customize ML solutions for your applications. These new solutions will provide a superset of the capabilities available in the legacy solutions. Thank you.

@kuaashish kuaashish added the stat:awaiting response Waiting for user response label Apr 26, 2023
@github-actions

github-actions bot commented May 4, 2023

This issue has been marked stale because it has no recent activity since 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label May 4, 2023
@github-actions

This issue was closed due to lack of activity after being marked stale for past 7 days.


@kuaashish kuaashish removed stat:awaiting response Waiting for user response stale labels May 11, 2023

6 participants