Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZUPT] Problems with zero velocity update #89

Closed
zgxsin opened this issue Aug 7, 2020 · 15 comments
Closed

[ZUPT] Problems with zero velocity update #89

zgxsin opened this issue Aug 7, 2020 · 15 comments
Labels
debugging Might be a bug, looking into the issue

Comments

@zgxsin
Copy link

zgxsin commented Aug 7, 2020

I can run openvins without enabling the zero velocity update feature. Once i enable it, the program seems to get stuck when the program remains in zero velocity state for a long time (< 2 mins). And sometimes the program just dies (see below). And the program does not publish the pose of IMU when it is in zero velocity state. Is this expected or not? I think IMU pose should still be published with a different timestamp when the device is not moving.

[ZUPT]: accepted zero velocity |v_IinG| = 0.025 (chi2 0.547 < 126.574)
[ZUPT]: chi2_check over the residual limit - 4614
[ZUPT]: rejected zero velocity |v_IinG| = 0.025 (chi2 26.072 > 0.000)
[TIME]: 34.2291 seconds for tracking
[TIME]: 0.0053 seconds for propagation
[TIME]: 0.0096 seconds for MSCKF update (7 features)
[TIME]: 0.0012 seconds for SLAM update (9 feats)
[TIME]: 0.0000 seconds for SLAM delayed init (0 feats)
[TIME]: 0.0029 seconds for marginalization (15 clones in state)
[TIME]: 34.2481 seconds for total
q_GtoI = 0.002,0.002,0.677,0.736 | p_IinG = 2.164,-1.111,-0.190 | dist = 3.39 (meters)
bg = 0.0014,0.0015,0.0095 | ba = 0.1782,-0.0449,0.0023
camera-imu timeoffset = -0.01607
cam0 intrinsics = 550.765,573.616,715.611,548.810 | 0.410,0.076,-0.274,0.292
cam1 intrinsics = 548.473,552.193,723.017,546.064 | 0.421,0.081,-0.297,0.334
cam0 extrinsics = -0.001,-0.712,0.702,0.005 | -0.008,-0.065,-0.219
cam1 extrinsics = -0.716,0.003,-0.002,0.698 | -0.036,-0.098,-0.749
[ZUPT]: accepted zero velocity |v_IinG| = 0.023 (chi2 22.560 < 437.047)
[ZUPT]: accepted zero velocity |v_IinG| = 0.021 (chi2 1.863 < 126.574)
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

```
@zgxsin
Copy link
Author

zgxsin commented Aug 7, 2020

I checked that the res dimension is too large, causing a very intensive computation. Something might be wrong with handling the timestamps of IMU measurements.

res.rows102
[ZUPT]: accepted zero velocity |v_IinG| = 0.012 (chi2 0.544 < 79.697)
res.rows774
[ZUPT]: accepted zero velocity |v_IinG| = 0.012 (chi2 4.462 < 710.441)
res.rows198
[ZUPT]: accepted zero velocity |v_IinG| = 0.012 (chi2 0.912 < 166.444)
res.rows3270
[ZUPT]: accepted zero velocity |v_IinG| = 0.012 (chi2 18.781 < 3138.127)
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 10004)


@goldbattle
Copy link
Member

goldbattle commented Aug 7, 2020 via email

@goldbattle goldbattle added the debugging Might be a bug, looking into the issue label Aug 8, 2020
@goldbattle
Copy link
Member

Do have a dataset this is failing on?
It seems #90 is running into the same issue on the KAIST urban 28 dataset.

@zgxsin
Copy link
Author

zgxsin commented Aug 10, 2020

GDB back-tracing:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f146fa2a8b1 in __GI_abort () at abort.c:79
#2  0x00007f147041d957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f1470423ae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f1470423b21 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f1470423d54 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f147041f85d in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f1471e38d65 in std::vector<ov_msckf::Propagator::IMUDATA, std::allocator<ov_msckf::Propagator::IMUDATA> >::_M_range_check (this=0x55f06f1772a0, this=0x55f06f1772a0, __n=18446744073709551615) at /usr/include/c++/6/bits/stl_vector.h:804
#8  std::vector<ov_msckf::Propagator::IMUDATA, std::allocator<ov_msckf::Propagator::IMUDATA> >::at (__n=18446744073709551615, this=0x55f06f1772a0) at /usr/include/c++/6/bits/stl_vector.h:843
#9  ov_msckf::Propagator::select_imu_readings (imu_data=std::vector of length 209, capacity 32768 = {...}, time0=<optimized out>, time1=time1@entry=1596716829.6416836) at /home/guzhou/catkin_ws/src/ov_msckf/src/state/Propagator.cpp:233
#10 0x00007f1471eb4a24 in ov_msckf::UpdaterZeroVelocity::try_update (this=0x55f06f1771f0, state=0x55f06f15cd30, timestamp=timestamp@entry=1596716829.6570051) at /home/guzhou/catkin_ws/src/ov_msckf/src/update/UpdaterZeroVelocity.cpp:51
#11 0x00007f1471e590a6 in ov_msckf::VioManager::feed_measurement_monocular (this=0x55f06f164b40, timestamp=1596716829.6570051, img0=..., cam_id=0) at /home/guzhou/catkin_ws/src/ov_msckf/src/core/VioManager.cpp:175
#12 0x000055f06cea7595 in callback_monocular (msg0=...) at /home/guzhou/catkin_ws/src/ov_msckf/src/ros_subscribe_msckf.cpp:165
#13 0x000055f06cebaffe in boost::function1<void, boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const> const&>::operator() (a0=..., this=<optimized out>) at /usr/include/boost/function/function_template.hpp:760
#14 boost::detail::function::void_function_obj_invoker1<boost::function<void (boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const> const&)>, void, boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const> >::invoke(boost::detail::function::function_buffer&, boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const>) (function_obj_ptr=..., a0=...) at /usr/include/boost/function/function_template.hpp:159
#15 0x000055f06ceda67d in boost::function1<void, boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const> >::operator() (a0=..., this=0x55f06f22bac8) at /usr/include/boost/function/function_template.hpp:760
#16 ros::SubscriptionCallbackHelperT<boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const> const&, void>::call (this=0x55f06f22bac0, params=...) at /opt/ros/melodic/include/ros/subscription_callback_helper.h:144
#17 0x00007f1471517422 in ros::SubscriptionQueue::call() () from /opt/ros/melodic/lib/libroscpp.so
#18 0x00007f14714c1b9c in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) () from /opt/ros/melodic/lib/libroscpp.so
#19 0x00007f14714c2f8b in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/melodic/lib/libroscpp.so
#20 0x00007f147151aff9 in ros::SingleThreadedSpinner::spin(ros::CallbackQueue*) () from /opt/ros/melodic/lib/libroscpp.so
#21 0x00007f147150388b in ros::spin() () from /opt/ros/melodic/lib/libroscpp.so
#22 0x000055f06cea5497 in main (argc=<optimized out>, argv=<optimized out>) at /home/guzhou/catkin_ws/src/ov_msckf/src/ros_subscribe_msckf.cpp:112

@zgxsin
Copy link
Author

zgxsin commented Aug 10, 2020

Do have a dataset this is failing on?
It seems #90 is running into the same issue on the KAIST urban 28 dataset.

Sorry, i cannot provide the datasets to you now. But you might reproduce the issue on the KAIST urban 28 dataset #90

@zgxsin
Copy link
Author

zgxsin commented Aug 10, 2020

Further investigation:
I guess this issue is due to the limited number of threads to handle the callbacks. When doing zero velocity update, it takes time to do update. If the time is too long and we might skip the images instead of triggering the callback. When the callback is finally triggered by the image, the timestamp between the image and current system state might have a large gap, causing increase of the number of selected IMU measurements for ZUPT, which further increase the computational burden. This is a Vicious Cycle, which will make the program die because of the unbound computation burden.

I increase the threads by changing

 `ros:spin()` 

to

   ros::MultiThreadedSpinner spinner(3);
   spinner.spin();

The above mention problem is gone. But i encountered the following very big chi2-distance issue (trajectory flies away):

[ZUPT]: accepted zero velocity |v_IinG| = 0.027 (chi2 63.808 < 126.574)
StateHelper::EKFUpdate() - diagonal at 9 is -0.00
StateHelper::EKFUpdate() - diagonal at 10 is -0.00
StateHelper::EKFUpdate() - diagonal at 11 is -0.00
uncorrected timestamps: state->_timestamp, timestamp1596716860.654639 to 1596716860.694836
time offsets: last_prop_time_offset, t_off_new-0.015265 to -0.015265
Propagator::select_imu_readings1596716860.639373 to 1596716860.679570
imu_recent.size()18
res.rows102
[ZUPT]: rejected zero velocity |v_IinG| = 0.027 (chi2 359854873886298961979245592576.000 > 126.574)
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 175 is -0.00
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 175 is -0.00
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 175 is -0.00
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 175 is -0.00
[TIME]: 0.0069 seconds for tracking
[TIME]: 0.0003 seconds for propagation
[TIME]: 0.0001 seconds for MSCKF update (0 features)
[TIME]: 0.0003 seconds for SLAM update (0 feats)
[TIME]: 0.0001 seconds for SLAM delayed init (0 feats)
[TIME]: 0.0014 seconds for marginalization (15 clones in state)
[TIME]: 0.0091 seconds for total
q_GtoI = 0.005,-0.001,0.678,0.735 | p_IinG = 1.871,-1.919,0.064 | dist = 3.25 (meters)
bg = -0.0012,0.0063,0.0780 | ba = 0.0764,-0.0513,0.0079
camera-imu timeoffset = -0.01527
cam0 intrinsics = 545.539,564.960,708.694,538.362 | 0.413,0.094,-0.254,0.279
cam0 extrinsics = -0.001,-0.718,0.697,0.002 | -0.021,-0.058,-0.264
uncorrected timestamps: state->_timestamp, timestamp1596716860.694836 to 1596716860.734752
time offsets: last_prop_time_offset, t_off_new-0.015265 to -0.015265
Propagator::select_imu_readings1596716860.679570 to 1596716860.719486
imu_recent.size()18
res.rows102
[ZUPT]: rejected zero velocity |v_IinG| = 0.027 (chi2 426833296315304007907072278528.000 > 126.574)

@zgxsin
Copy link
Author

zgxsin commented Aug 10, 2020

Update: We could get NAN value for chi2 calculation, which needs to be considered.

[ZUPT]: accepted zero velocity |v_IinG| = 0.027 (chi2 0.466 < 126.574)
uncorrected timestamps: state->_timestamp, timestamp1596716860.694836 to 1596716860.734752
time offsets: last_prop_time_offset, t_off_new-0.015369 to -0.015369
Propagator::select_imu_readings1596716860.679467 to 1596716860.719383
imu_recent.size()17
[ZUPT]: rejected zero velocity, detect NAN chi2!

@bobododosjl
Copy link

Further investigation:
I guess this issue is due to the limited number of threads to handle the callbacks. When doing zero velocity update, it takes time to do update. If the time is too long and we might skip the images instead of triggering the callback. When the callback is finally triggered by the image, the timestamp between the image and current system state might have a large gap, causing increase of the number of selected IMU measurements for ZUPT, which further increase the computational burden. This is a Vicious Cycle, which will make the program die because of the unbound computation burden.

I increase the threads by changing

 `ros:spin()` 

to

   ros::MultiThreadedSpinner spinner(3);
   spinner.spin();

The above mention problem is gone. But i encountered the following very big chi2-distance issue (trajectory flies away):

[ZUPT]: accepted zero velocity |v_IinG| = 0.027 (chi2 63.808 < 126.574)
StateHelper::EKFUpdate() - diagonal at 9 is -0.00
StateHelper::EKFUpdate() - diagonal at 10 is -0.00
StateHelper::EKFUpdate() - diagonal at 11 is -0.00
uncorrected timestamps: state->_timestamp, timestamp1596716860.654639 to 1596716860.694836
time offsets: last_prop_time_offset, t_off_new-0.015265 to -0.015265
Propagator::select_imu_readings1596716860.639373 to 1596716860.679570
imu_recent.size()18
res.rows102
[ZUPT]: rejected zero velocity |v_IinG| = 0.027 (chi2 359854873886298961979245592576.000 > 126.574)
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 175 is -0.00
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 175 is -0.00
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 175 is -0.00
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 175 is -0.00
[TIME]: 0.0069 seconds for tracking
[TIME]: 0.0003 seconds for propagation
[TIME]: 0.0001 seconds for MSCKF update (0 features)
[TIME]: 0.0003 seconds for SLAM update (0 feats)
[TIME]: 0.0001 seconds for SLAM delayed init (0 feats)
[TIME]: 0.0014 seconds for marginalization (15 clones in state)
[TIME]: 0.0091 seconds for total
q_GtoI = 0.005,-0.001,0.678,0.735 | p_IinG = 1.871,-1.919,0.064 | dist = 3.25 (meters)
bg = -0.0012,0.0063,0.0780 | ba = 0.0764,-0.0513,0.0079
camera-imu timeoffset = -0.01527
cam0 intrinsics = 545.539,564.960,708.694,538.362 | 0.413,0.094,-0.254,0.279
cam0 extrinsics = -0.001,-0.718,0.697,0.002 | -0.021,-0.058,-0.264
uncorrected timestamps: state->_timestamp, timestamp1596716860.694836 to 1596716860.734752
time offsets: last_prop_time_offset, t_off_new-0.015265 to -0.015265
Propagator::select_imu_readings1596716860.679570 to 1596716860.719486
imu_recent.size()18
res.rows102
[ZUPT]: rejected zero velocity |v_IinG| = 0.027 (chi2 426833296315304007907072278528.

Further investigation:
I guess this issue is due to the limited number of threads to handle the callbacks. When doing zero velocity update, it takes time to do update. If the time is too long and we might skip the images instead of triggering the callback. When the callback is finally triggered by the image, the timestamp between the image and current system state might have a large gap, causing increase of the number of selected IMU measurements for ZUPT, which further increase the computational burden. This is a Vicious Cycle, which will make the program die because of the unbound computation burden.

I increase the threads by changing

 `ros:spin()` 

to

   ros::MultiThreadedSpinner spinner(3);
   spinner.spin();

The above mention problem is gone. But i encountered the following very big chi2-distance issue (trajectory flies away):

[ZUPT]: accepted zero velocity |v_IinG| = 0.027 (chi2 63.808 < 126.574)
StateHelper::EKFUpdate() - diagonal at 9 is -0.00
StateHelper::EKFUpdate() - diagonal at 10 is -0.00
StateHelper::EKFUpdate() - diagonal at 11 is -0.00
uncorrected timros:spin()` ros:spin()` estamps: state->_timestamp, timestamp1596716860.654639 to 1596716860.694836
time offsets: last_prop_time_offset, t_off_new-0.015265 to -0.015265
Propagator::select_imu_readings1596716860.639373 to 1596716860.679570
imu_recent.size()18
res.rows102
[ZUPT]: rejected zero velocity |v_IinG| = 0.027 (chi2 359854873886298961979245592576.000 > 126.574)
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 175 is -0.00
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 175 is -0.00
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 175 is -0.00
StateHelper::EKFPropagation() - diagonal at 9 is -0.00
StateHelper::EKFPropagation() - diagonal at 10 is -0.00
StateHelper::EKFPropagation() - diagonal at 11 is -0.00
StateHelper::EKFPropagation() - diagonal at 175 is -0.00
[TIME]: 0.0069 seconds for tracking
[TIME]: 0.0003 seconds for propagation
[TIME]: 0.0001 seconds for MSCKF update (0 features)
[TIME]: 0.0003 seconds for SLAM update (0 feats)
[TIME]: 0.0001 seconds for SLAM delayed init (0 feats)
[TIME]: 0.0014 seconds for marginalization (15 clones in state)
[TIME]: 0.0091 seconds for total
q_GtoI = 0.005,-0.001,0.678,0.735 | p_IinG = 1.871,-1.919,0.064 | dist = 3.25 (meters)
bg = -0.0012,0.0063,0.0780 | ba = 0.0764,-0.0513,0.0079
camera-imu timeoffset = -0.01527
cam0 intrinsics = 545.539,564.960,708.694,538.362 | 0.413,0.094,-0.254,0.279
cam0 extrinsics = -0.001,-0.718,0.697,0.002 | -0.021,-0.058,-0.264
uncorrected timestamps: state->_timestamp, timestamp1596716860.694836 to 1596716860.734752
time offsets: last_prop_time_offset, t_off_new-0.015265 to -0.015265
Propagator::select_imu_readings1596716860.679570 to 1596716860.719486
imu_recent.size()18
res.rows102
[ZUPT]: rejected zero velocity |v_IinG| = 0.027 (chi2 426833296315304007907072278528.000 > 126.574)

it dosen"t work by changing ros::spin()

@zgxsin
Copy link
Author

zgxsin commented Aug 13, 2020

Did you get the same error as before after you change ros::spin()?

@goldbattle
Copy link
Member

This GDB hints at that the IMU is not providing good timestamps, or the time offset calibration seems to be freaking out. Can you disable time offset calibration, and then also take a look at the timestamps. The line that is probably hurting you is this one which assumes that you have IMU measurements to propagate with. If you don't have any measurements, then you can't propagate.

Of course I would make sure that you can get the system to work without zero velocity updates, and since you are using your own datasets, make sure you can run the system on one of the supported ones.

@bobododosjl
Copy link

Did you get the same error as before after you change ros::spin()?

yes

@bobododosjl
Copy link

Did you get the same error as before after you change ros::spin()?

can u show me your pgeneva_ros_kaist.launch ?

@zgxsin
Copy link
Author

zgxsin commented Aug 17, 2020

This GDB hints at that the IMU is not providing good timestamps, or the time offset calibration seems to be freaking out. Can you disable time offset calibration, and then also take a look at the timestamps. The line that is probably hurting you is this one which assumes that you have IMU measurements to propagate with. If you don't have any measurements, then you can't propagate.

Of course I would make sure that you can get the system to work without zero velocity updates, and since you are using your own datasets, make sure you can run the system on one of the supported ones.

Hi @goldbattle , thanks for the reply. I checked the timestamps of IMU and cameras. They are synchronized and there is no message loss. I think the problem is due to that we select too many IMU measurements in this function: ov_msckf::Propagator::select_imu_readings, causing the unbound computational burden. There might be something wrong with the timestamp handling logic.
see here

res.rows102
[ZUPT]: accepted zero velocity |v_IinG| = 0.012 (chi2 0.544 < 79.697)
res.rows774
[ZUPT]: accepted zero velocity |v_IinG| = 0.012 (chi2 4.462 < 710.441)
res.rows198
[ZUPT]: accepted zero velocity |v_IinG| = 0.012 (chi2 0.912 < 166.444)
res.rows3270
[ZUPT]: accepted zero velocity |v_IinG| = 0.012 (chi2 18.781 < 3138.127)
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 10004)

Of course I would make sure that you can get the system to work without zero velocity updates

If i run the program without enabling the ZUPT, the trajectory will drift a little bit, which is not what we want.

@zgxsin
Copy link
Author

zgxsin commented Aug 17, 2020

Did you get the same error as before after you change ros::spin()?

can u show me your pgeneva_ros_kaist.launch ?

Sorry, I ran Kasit dataset using the default one. I cannot tell whether there is issue with that or not since i did not run the full trajectory.

@goldbattle
Copy link
Member

This is due to zero velocity update IMU selection logic, will track this in Issue #90.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
debugging Might be a bug, looking into the issue
Projects
None yet
Development

No branches or pull requests

3 participants