[BUG] ROS Noetic PoE connectivity issues #352
Comments
Something interesting happened while I was working on collecting the logs. I am not sure if this was the reason for the disconnect issue, but now I cannot even get data from ROS topics. When I run a query like
The full log of the session is attached. |
Hello again, I was able to capture the issues from the debug log. First one is the error I mentioned above and the second one is
Looks like there is a very similar bug report on #293, the only difference is that I am running One thing I could also share is that the OAK-D Pro PoE camera I have is running (and will be running) on a busy but isolated network with several other PoE cameras along with some other network devices. I have one OAK-D Pro camera on that network at this time. The switch is good enough to power many PoE cameras, so I don't think power would be an issue here. |
I'd like to share another log stream. In this case, the OAK-D Pro camera failed after running approximately 4 hours.
|
I'd like to share some updates after testing the OAK-D Pro PoE cameras for 4 days. As suggested in the documentation, I set I have two OAK-D Pro PoE cameras, and I've tested The good thing is that both cameras are running fine with the reduced FPS and they haven't failed with the CPU error explained previously. This also makes sense, as running under 100% CPU load never looks good. I haven't tested if I could get 60 FPS data from ROS or the actual difference between setting |
I've restarted the ROS driver with a new configuration today (it looked stable with the default config and reduced FPS) and it has started failing again. The only log entry regarding the error I can get is the following. This happens when I subscribe to
The configuration file changes are
As I mentioned, I have two cameras. I was able to run this config on one of them, but the other one fails. Additional observations:
|
Hi, thanks for the detailed information. Can you try running stereo in low bandwidth mode? And see if that changes things? |
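For context on why a low bandwidth mode could matter on a shared PoE network, here is a rough back-of-the-envelope sketch of the uncompressed bandwidth of a single image stream. The resolution, pixel size, and frame rate below are hypothetical values chosen for illustration, not the settings used in this issue, and `raw_stream_mbps` is a made-up helper name.

```python
def raw_stream_mbps(width, height, bytes_per_pixel, fps):
    """Uncompressed bandwidth of one image stream in megabits per second."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * fps / 1e6


# Hypothetical example: a 1280x720 stream with 2 bytes per pixel at 30 FPS.
depth_mbps = raw_stream_mbps(1280, 720, 2, 30)
print(f"{depth_mbps:.1f} Mbit/s")  # prints "442.4 Mbit/s"
```

Under these assumed numbers, one raw stream already takes a large share of a gigabit link, so several uncompressed streams from one camera can plausibly saturate it. Whether that is what happened here is not established in this thread.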
@Serafadam I just received a programming board by mail and reflashed the firmware of both cameras as a last resort. I am testing them now, also playing with the stereo configuration. They haven't failed in the last hour, but it seems like these errors occur randomly. I'll update this ticket with more information when I have it. If it still fails with the new firmware, I'll test the low bandwidth mode for the stereo only. |
After a couple of restarts with some different config values, mainly changing the spatial filter parameters and bilateral sigma value, it started the crash loop again. At that time, I turned low bandwidth on with 50%, but it didn't solve the issue. After waiting ~1 hr, the camera started responding again. When it is in the crash loop (i.e. continuously crashing no matter what), changing to the default config or any config variation fails. I am wondering if there is a way to run a hard reset before starting the ROS node, to make sure that the camera config is cleared from the device. I am sure it is designed to delete the config in case of a restart, but it would be great to test if a hard reset on the device would recover from the crash. |
This is a long shot, but could you try checking if setting parameter |
I am not using IMU (at least, for the current application I am working on) and I could even turn it off, if possible (edit: I see In any case, I'll add/change the parameter you mentioned and leave it on during the weekend to see if it helps. |
@Serafadam unfortunately, I set up the config variable on Friday, restarted the camera with a variety of configs (as discussed previously, the config variables I am mainly focused on are the ones starting with I restarted the container today (Thursday) and after that, subscribing to any image topic started crashing the ROS node. Both cameras were working fine before I restarted the containers. In summary, restarting the containers after running the cameras continuously for a couple of days (e.g. 4 or more) ends up with a camera/ROS node crash loop. This loop continues for hours; usually the cameras start working again (so, the crash loop ends) very early in the morning. The CPU load stayed at around 50% for both cameras throughout the 4-day testing period. Previously, during the crash loop, I was able to connect and get images from the cameras when I used the DepthAI Python SDK directly. That was never an issue, but the ROS node always fails during the loop. |
I'll try to setup some testing scenario next week, but could you additionally try running those nodes with debugger enabled? |
@Serafadam I am not sure how to enable debugger without editing the files in the container (I am guessing you are talking about It looks like I have failed to mention something important in my bug report. On my setup, I have a separate Considering this information, let me share the
The last line on the log seemed very interesting and here are some issues that I found about "broken bond":
I am not sure if the "broken bond" I see in the OAK-D logs is the same nodelet issue as mentioned in the links. Since this broken bond issue happens whenever I subscribe to the topic, it feels like it might be related. It is still a guess, for sure. One other thing to mention is that "separate roscore instance" works fine with my other ROS docker containers, but I don't think any of them implements nodelets. My observation is that there are two main issues mentioned in this ticket:
Do these make any sense to you? Edit: Also tested |
|
Looks like it might be an issue with Nodelets and ROS1, after all. When The separate roscore issue seems to be unrelated to this repo. Note to myself: Maybe it is time to start using ROS2 :) Thanks for the help! |
No problem, glad to help 🙂 |
Describe the bug
I am using
luxonis/depthai-ros:v2.7.4-noetic
Docker image with an OAK-D Pro PoE camera. The camera is plugged into one of the PoE++ ports of a Ubiquiti USW Pro 24 PoE switch. After some time, the ROS driver stops publishing to the
rgb
and
stereo
topics. This can happen in as little as 2 hours, or after something like 12 hours, from launching the ROS driver. I was never able to get an image from the ROS topic after leaving the camera on overnight.
While the ROS driver is running, I can get the images from the topics without any issues. When I encounter the issue, I can still see the ROS topics but I cannot get any data from them.
I also tried to roll back to
luxonis/depthai-ros:v2.7.2-noetic
Docker image, but that didn't change anything.
I tried to change the switch, and used a Ubiquiti USW Lite 8 PoE. On this switch, I used one of the PoE+ ports. No change observed.
I've tried the Python scripts provided as examples; they seem to work fine. I can't see any log entries regarding the disconnection in the Docker image logs, so I am not sure what is happening.
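Since the driver logs show nothing at the moment the topics go quiet, one way to pin down when publishing stops is a small staleness watchdog. The following is a minimal sketch, not part of depthai-ros: `TopicWatchdog`, `on_message`, and `is_stalled` are hypothetical names, and the ROS wiring is left out so the logic can be tested on its own.

```python
import time


class TopicWatchdog:
    """Tracks when a topic last delivered a message and reports stalls."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock      # injectable clock, useful for testing
        self._last_msg = None    # time of the last message, None if none yet
        self.count = 0           # number of messages seen so far

    def on_message(self, _msg=None):
        # Meant to be used as a subscriber callback; the payload is ignored.
        self._last_msg = self._clock()
        self.count += 1

    def is_stalled(self):
        # Stalled means: at least one message arrived, then silence
        # for longer than the timeout.
        if self._last_msg is None:
            return False
        return self._clock() - self._last_msg > self.timeout_s
```

In a ROS Noetic node, `on_message` could be passed as the callback to `rospy.Subscriber` for one of the image topics, with a `rospy.Timer` periodically calling `is_stalled()` and logging the time of the first stall. The topic name and timeout would depend on the setup.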
Minimal Reproducible Example
What I do is pretty much running the following command
configs/camera.yaml
is the one provided here
Expected behavior
It should publish to the ROS topics continuously.
Pipeline Graph
I can provide this, but since I haven't customized the configuration file, it would be the default one if I understand it correctly. I was unable to save the pipeline via
rosservice call /oak/save_pipeline
command; it just didn't create the
/tmp/pipeline.json
file in the container.
Attach system log
luxonis/depthai-ros:v2.7.4-noetic
DEPTHAI_DEBUG=1
mode, I'll check and share the logs after I encounter the issue.