Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GRPC_CALL_OK crash #133

Closed
douglaswsilva opened this issue Jun 24, 2019 · 17 comments · Fixed by mavlink/MAVSDK#785
Closed

GRPC_CALL_OK crash #133

douglaswsilva opened this issue Jun 24, 2019 · 17 comments · Fixed by mavlink/MAVSDK#785

Comments

@douglaswsilva
Copy link
Collaborator

douglaswsilva commented Jun 24, 2019

Tested with:

  • Yuneec H520-G and ST10C remote controller
  • Swift 4 Xcode 10.1
  • MAVSDK Example app
  • Carftile.resolved:
binary "https://code.videolan.org/videolan/VLCKit/raw/master/Packaging/MobileVLCKit.json" "3.1.4"
binary "https://s3.eu-central-1.amazonaws.com/dronecode-sdk/backend.json" "0.17.0"
github "Dronecode/DronecodeSDK-Swift" "0.5.0"
github "ReactiveX/RxSwift" "4.5.0"
github "grpc/grpc-swift" "0.8.1"

Reproduce:

  • Connect iPad to drone running HITL via wifi
  • Run Example app
  • After connecting to the drone and start subscribing to telemetry, app will crash.

Crash message:

E0624 15:20:24.714529000 6164901888 call_op_set.h:943] assertion failed: GRPC_CALL_OK == g_core_codegen_interface->grpc_call_start_batch( call_.call(), ops, nops, core_cq_tag(), nullptr)

I noticed that the crash will happen after subscribing to some telemetry observables. I'd receive some results, for instance position would return location for a few seconds and then crash after a while.

I saw crashes in observePosition(), observeGroundSpeed(), observeGPS(), but suspect it can happen in any place we have some observable callback, specially in telemetry.

Crash Stack:

#3        0x00000001044129ac in grpc::CoreCodegen::assert_fail(char const*, char const*, int) ()
#4        0x0000000104423dbc in grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallNoOp<3>, grpc::internal::CallNoOp<4>, grpc::internal::CallNoOp<5>, grpc::internal::CallNoOp<6> >::ContinueFillOpsAfterInterception() ()
#5        0x00000001042df4ec in grpc::ServerWriter<dronecode_sdk::rpc::telemetry::PositionResponse>::Write(dronecode_sdk::rpc::telemetry::PositionResponse const&, grpc::WriteOptions) ()
#6        0x0000000104155440 in std::__1::__function::__func<dronecode_sdk::backend::TelemetryServiceImpl<dronecode_sdk::Telemetry>::SubscribePosition(grpc::ServerContext*, dronecode_sdk::rpc::telemetry::SubscribePositionRequest const*, grpc::ServerWriter<dronecode_sdk::rpc::telemetry::PositionResponse>*)::'lambda'(dronecode_sdk::Telemetry::Position), std::__1::allocator<dronecode_sdk::backend::TelemetryServiceImpl<dronecode_sdk::Telemetry>::SubscribePosition(grpc::ServerContext*, dronecode_sdk::rpc::telemetry::SubscribePositionRequest const*, grpc::ServerWriter<dronecode_sdk::rpc::telemetry::PositionResponse>*)::'lambda'(dronecode_sdk::Telemetry::Position)>, void (dronecode_sdk::Telemetry::Position)>::operator()(dronecode_sdk::Telemetry::Position&&) ()
#7        0x0000000104409158 in std::__1::__function::__func<dronecode_sdk::TelemetryImpl::process_global_position_int(__mavlink_message const&)::$_1, std::__1::allocator<dronecode_sdk::TelemetryImpl::process_global_position_int(__mavlink_message const&)::$_1>, void ()>::operator()() ()
#8        0x00000001043ccbc4 in dronecode_sdk::ThreadPool::worker() ()
#9        0x00000001043cd1c8 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (dronecode_sdk::ThreadPool::*)(), dronecode_sdk::ThreadPool*> >(void*) ()
@douglaswsilva
Copy link
Collaborator Author

douglaswsilva commented Jun 25, 2019

Note: I tested with Gazebo and didn't get the crash. I tried starting SITL before and after starting the app and it worked in both cases. We just have to fix this for the actual drone.

@unipheas
Copy link
Contributor

unipheas commented Jun 26, 2019

@douglaswsilva Just wanted to confirm, but when you pulled the latest version of the example app you are manually updating it to 0.5.0 from 0.4.1, right? I'm changing the Cartfile to github "Dronecode/DronecodeSDK-Swift" == 0.5.0 now and will test everything.

That said, I tried the current version of the example app (with 0.4.1) and it wasn't establishing a connection with my ESP8266 WiFi Module, but QGC was able to.

update: Not able to get it to connect with 0.5.0 either.

@JonasVautherin
Copy link
Collaborator

@unipheas: Are you getting a crash, or is it just not connecting? Does the backend output anything at all (at least the version number and a few log lines)?

@JonasVautherin
Copy link
Collaborator

@douglaswsilva: I'm trying to reproduce, and also testing with latest mavsdk_server 0.18.0.

@unipheas
Copy link
Contributor

@unipheas: Are you getting a crash, or is it just not connecting? Does the backend output anything at all (at least the version number and a few log lines)?

It's not crashing, just not connecting. Here is the log

2019-06-26 19:44:58.875767+0800 DronecodeSDK_Swift_Example[65517:4136020] [DYMTLInitPlatform] platform initialization successful
2019-06-26 19:44:59.109322+0800 DronecodeSDK_Swift_Example[65517:4135848] creating player instance using shared library
Running backend in background (MAVLink port: 14540
[07:44:59|Info ] DronecodeSDK version: 0.15.0 (dronecode_sdk_impl.cpp:25)
[07:44:59|Debug] New: System ID: 0 Comp ID: 0 (dronecode_sdk_impl.cpp:285)
[07:44:59|Info ] Server set to listen on 0.0.0.0:50051 (grpc_server.cpp:43)
[07:44:59|Info ] Server started (grpc_server.cpp:27)
[07:44:59|Info ] Waiting to discover system... (connection_initiator.h:57)

@JonasVautherin
Copy link
Collaborator

@unipheas: I believe this is a different issue. Let's continue it on #134.

@JonasVautherin
Copy link
Collaborator

@douglaswsilva Just wanted to confirm, but when you pulled the latest version of the example app you are manually updating it to 0.5.0 from 0.4.1, right?

@unipheas: I believe they don't, they are still on Swift 4. But I'm now wondering if that could be related somehow (e.g. grpc-swift 0.8.1 has issues with the newer version of grpc we use on the server side. Remember we had issues with grpc-swift 0.8.2 already?).

@unipheas
Copy link
Contributor

@JonasVautherin I do remember.

@unipheas
Copy link
Contributor

unipheas commented Jul 3, 2019

@douglaswsilva can you run the example app through Xcode while your phone is plugged in and tell us what the response is in the console? If it's crashing it should be producing some sort of error.

@douglaswsilva
Copy link
Collaborator Author

douglaswsilva commented Jul 3, 2019

@unipheas Yeah, here it is:

2019-07-03 09:14:56.750499-0700 DronecodeSDK_Swift_Example[9440:2545104] [DYMTLInitPlatform] platform initialization successful
2019-07-03 09:14:56.900792-0700 DronecodeSDK_Swift_Example[9440:2544927] creating player instance using shared library
Running backend in background (MAVLink port: udp://:14540
[09:14:56|Info ] MAVSDK version: 0.18.1 (mavsdk_impl.cpp:25)
[09:14:56|Debug] New: System ID: 0 Comp ID: 0 (mavsdk_impl.cpp:333)
[09:14:56|Info ] Server set to listen on 0.0.0.0:50051 (grpc_server.cpp:44)
[09:14:56|Info ] Server started (grpc_server.cpp:28)
[09:14:56|Info ] Waiting to discover system... (connection_initiator.h:58)
[09:15:45|Info ] New system on: 192.168.42.1:38499 (udp_connection.cpp:228)
[09:15:45|Debug] Component Camera 1 (100) added. (system_impl.cpp:400)
[09:15:45|Debug] request camera info (camera_impl.cpp:141)
[09:15:45|Info ] Using cached file for Yuneec E90. (camera_impl.cpp:890)
[09:15:45|Debug] Ignoring string params. (camera_definition.cpp:121)
[09:15:45|Debug] Ignoring custom params. (camera_definition.cpp:126)
[09:15:45|Debug] Ignoring custom params. (camera_definition.cpp:126)
[09:15:45|Debug] Component Autopilot (1) added. (system_impl.cpp:400)
[09:15:45|Debug] Component Gimbal (154) added. (system_impl.cpp:400)
[09:15:46|Debug] Discovered 3 component(s) (UUID: 3762002188609060911) (system_impl.cpp:563)
[09:15:46|Info ] System discovered [UUID: 3762002188609060911] (connection_initiator.h:62)
Drone connected with UUID : 3762002188609060911
E0703 09:15:46.774386000 6100955136 call_op_set.h:943]                 assertion failed: GRPC_CALL_OK == g_core_codegen_interface->grpc_call_start_batch( call_.call(), ops, nops, core_cq_tag(), nullptr)

@douglaswsilva
Copy link
Collaborator Author

Screen Shot 2019-07-03 at 9 17 41 AM

@douglaswsilva
Copy link
Collaborator Author

douglaswsilva commented Jul 3, 2019

This is my current cartfile.resolved, since I'm testing with Swift 5 now:

binary "https://code.videolan.org/videolan/VLCKit/raw/master/Packaging/MobileVLCKit.json" "3.1.4"
binary "https://s3.eu-central-1.amazonaws.com/dronecode-sdk/backend.json" "0.18.1"
github "Dronecode/DronecodeSDK-Swift" "af1b3cf64f30c6407542391b72645e2f0a41df05"
github "ReactiveX/RxSwift" "5.0.1"
github "jonasvautherin/grpc-swift" "820730e24b9cf035b2889b30d2fe87411e041720"

Let me know if you need anything else. :)

@JonasVautherin
Copy link
Collaborator

Note: they are building from the move-to-mavsdk-server branch, which is my work in progress following the renaming to MAVSDK 😇.

@JonasVautherin
Copy link
Collaborator

I could reproduce on Pixhawk 4 and on an old H520, and not on SITL, as @douglaswsilva observed. I reproduced running the Swift example app, both on the iOS simulator and on an iPad.

I could not reproduce in a (somewhat simpler) example in Python (running a mission while printing all the position, ground_speed_ned and gps_info messages), which suggests it may not come from mavsdk_server but rather from the Swift side.

Because I don't remember changing anything in the Swift code, I would guess it may come from the new version of grpc-swift (from 0.9 on). Maybe something changed and we should handle the streams a bit differently 🤔.

First thing that comes to my mind is this change that I needed when moving to Swift 5. I don't believe it is the cause because I think @douglaswsilva was experiencing this bug before I started working on that branch.

@douglaswsilva or @byuarus: could you comment on that change (here), just to make sure it is not obviously wrong?

In the meantime I'll try to downgrade grpc-swift and see where I go.

@JonasVautherin
Copy link
Collaborator

JonasVautherin commented Jul 4, 2019

You know what? I just ran another test:

And I reproduced it there! It took a few missions, but it did crash:

[10:30:59|Debug] Send mission item 0 (mission_impl.cpp:1118)
[10:30:59|Debug] Send mission item 1 (mission_impl.cpp:1118)
[10:30:59|Debug] Send mission item 2 (mission_impl.cpp:1118)
[10:30:59|Debug] Send mission item 3 (mission_impl.cpp:1118)
[10:31:00|Warn ] Retrying send mission count... (mission_impl.cpp:1339)
[10:31:00|Debug] Send mission item 3 (mission_impl.cpp:1118)
E0704 22:31:00.110097137   26875 call_op_set.h:943]          assertion failed: GRPC_CALL_OK == g_core_codegen_interface->grpc_call_start_batch( call_.call(), ops, nops, core_cq_tag(), nullptr)
zsh: abort (core dumped)  ./mavsdk/bin/mavsdk_server

@JonasVautherin
Copy link
Collaborator

Well, I'm actually not sure if it is the same issue. It sounds like this assertion shows some kind of race condition. It crashes super quickly when running mavsdk_server on iOS, but not on linux or macOS.

Not sure why, because you don't have this crash on the production version, do you @douglaswsilva?

@JonasVautherin
Copy link
Collaborator

Ok I think that's a race in mavsdk_server. What I can't understand is why it appears now, because it has been like this for months.

Working on a fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants