despam logs on camera failures #158

Open
scottlamb opened this issue Aug 31, 2021 · 4 comments
Labels
enhancement, rust (Rust backend work required), usability (Usability / user interface improvements)

Comments

@scottlamb
Owner

scottlamb commented Aug 31, 2021

When Moonfire NVR repeatedly has trouble connecting to a camera, the logs can get very spammy. (Example: #144) Currently it only waits 1 second between tries, and in some cases the tries can be very short. It logs at least a couple lines per try; many lines with RUST_BACKTRACE=1 as in the recommended configuration. They're annoying to look through. If you don't have log file rotation set up properly, you can even quickly run out of root filesystem space, and then things get worse as database writes start to fail.

I'd like to improve this but haven't decided how. Some ideas:

  • longer retry delay / backoff. (Eg exponential backoff: sleep 1 second before retrying the first time, then 2 seconds, then 4, ..., on up to some ceiling.) The usual reason for backoff in a distributed system is that retrying too frequently adds too much load to the server over normal traffic, preventing recovery after a problem. I don't think that applies here, given that connection attempts should be less expensive for both client and server than keeping the connection open. But backoff would have the benefit of reducing how much gets logged, and of reducing the maximum rate at which recording rows are added to the database. (Also, in the specific case of old Reolink cameras, a retry delay of more than 65 seconds might actually solve the problem by allowing stale sessions to get cleaned up. Details at scottlamb/retina#17, "support live555 servers older than 2017.06.04 (eg some Reolink models), which have buggy RTP/TCP".)
  • suppress stack traces on "straightforward" server problems. Eg designated errors from Retina like TCP-level errors.
  • don't individually log every failure. syslog says things like "last message repeated 4 times". That exact mechanism wouldn't work: Retina's error messages are very detailed, with timestamps, port numbers, and offsets, so I wouldn't expect two similar errors to be bytewise identical. But maybe we can do some grouping, or the easy thing: just log one message per minute and assume it's representative.
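
A minimal sketch of the exponential backoff idea from the first bullet, not Moonfire NVR's actual code. The names `backoff_delay` and `Backoff` are hypothetical, and the base delay and ceiling are left as parameters because the issue has not settled on values.

```rust
use std::time::Duration;

/// Delay before the retry that follows `failures` earlier consecutive
/// failures: `base * 2^failures`, capped at `max`.
fn backoff_delay(failures: u32, base: Duration, max: Duration) -> Duration {
    // The checked operations avoid overflow when failures pile up for days.
    let factor = 2u32.checked_pow(failures).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Tracks consecutive failures for one camera stream (hypothetical helper).
struct Backoff {
    base: Duration,
    max: Duration,
    failures: u32,
}

impl Backoff {
    fn new(base: Duration, max: Duration) -> Self {
        Backoff { base, max, failures: 0 }
    }

    /// Call after a failed attempt; returns how long to sleep before retrying.
    fn next_delay(&mut self) -> Duration {
        let delay = backoff_delay(self.failures, self.base, self.max);
        self.failures = self.failures.saturating_add(1);
        delay
    }

    /// Call once a connection succeeds, so the next failure starts at `base`.
    fn reset(&mut self) {
        self.failures = 0;
    }
}
```

With a base of 1 second, successive calls to `next_delay` yield 1, 2, 4, ... seconds until the ceiling is reached. For the old Reolink case mentioned above, the ceiling would presumably need to exceed 65 seconds for stale sessions to expire between attempts.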

@jlpoolen might have opinions.
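
The "one message per minute" option could look roughly like the following. This is a sketch under assumptions: `LogThrottle` is a hypothetical helper, not an existing Moonfire NVR or Retina type, and the caller is assumed to pass in the current time and do the actual logging. It also counts suppressed messages, so the one line that does get logged can say how many were skipped.

```rust
use std::time::{Duration, Instant};

/// Lets at most one message through per `interval` and counts the rest.
struct LogThrottle {
    interval: Duration,
    last_logged: Option<Instant>,
    suppressed: u64,
}

impl LogThrottle {
    fn new(interval: Duration) -> Self {
        LogThrottle { interval, last_logged: None, suppressed: 0 }
    }

    /// Returns `Some(n)` if the caller should log now, where `n` is the
    /// number of messages suppressed since the last logged one; returns
    /// `None` if this message should be dropped.
    fn should_log(&mut self, now: Instant) -> Option<u64> {
        match self.last_logged {
            Some(last) if now.saturating_duration_since(last) < self.interval => {
                self.suppressed += 1;
                None
            }
            _ => {
                self.last_logged = Some(now);
                // Hand back the count and reset it to zero.
                Some(std::mem::take(&mut self.suppressed))
            }
        }
    }
}
```

A caller would keep one `LogThrottle` per stream and, on each failure, log the error only when `should_log(Instant::now())` returns `Some(n)`, appending something like "(n similar errors suppressed)" when `n` is nonzero.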

@scottlamb scottlamb added enhancement rust Rust backend work required usability Usability / user interface improvements labels Aug 31, 2021
@jlpoolen
Contributor

My philosophy on logs is that the more, the better. What is needed is a filter or something that extracts what may be important to the matter at hand. I think having statistics on failures is extremely helpful and could help in intelligently (human, not artificial!) determining parameters such as timeouts.

Manipulation and extraction are what the wise old programming language Perl excels at. I do not have a public-facing server at this time, but setting one up is on my task list. I could try to make that a task for the upcoming holiday weekend, and it could be something where a large log is submitted and then parsed to extract what is desired and/or provide statistics. Of course, there are privacy issues.

Yes, a goal I have in mind is determining what the time-out for the Live555 server running in a Reolink camera is. Of course the easiest way is to formally ask Reolink. Scott had indicated he had made some inquiries, possibly related to something else about their Live555 server still not meeting standards.

If I had a better understanding of what is relevant in the log, I could craft a Perl script and add it to my fork, and then it could be run wherever against whatever. (I confess, pushing a single file, but not others, back to my fork in GitHub has gated me... hmmm... can I alternatively use Subversion??)

@scottlamb
Owner Author

> Yes, a goal I have in mind is determining what the time-out for the Live555 server running in a Reolink camera is.

That part's easy. It's 65 seconds. They tell us in the SETUP response.
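
For context, RTSP (RFC 2326) lets a server advertise its session timeout as a `timeout` parameter on the `Session` header, which is presumably what the SETUP response here carries. Below is a minimal sketch of pulling that value out of a header value. The function name `session_timeout_secs` is hypothetical, the session id in the usage note is made up, and this is not how Retina itself parses the header.

```rust
/// Extracts the `timeout` parameter (in seconds) from the value of an RTSP
/// `Session` header such as `<session-id>;timeout=<seconds>`.
/// Returns `None` if the parameter is absent or not a number.
fn session_timeout_secs(session_header_value: &str) -> Option<u64> {
    session_header_value
        .split(';')
        .skip(1) // the first piece is the session id itself
        .find_map(|param| {
            let (name, value) = param.trim().split_once('=')?;
            if name.trim().eq_ignore_ascii_case("timeout") {
                value.trim().parse::<u64>().ok()
            } else {
                None
            }
        })
}
```

For example, `session_timeout_secs("12345678;timeout=65")` yields `Some(65)`. When the parameter is missing, RFC 2326 specifies a default of 60 seconds, which a caller could apply with `unwrap_or(60)`.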

Last I heard from Reolink: "According to our senior engineer, the verison of our RTSP indeed has not been updated. And he will analyze your log in detail, whick take some times. We will update you as soon as possible."

@jlpoolen
Contributor

re: 65 seconds
That's a lifetime when catching someone purloining something. I'm repeating myself, but a 27 second video documented a Felony II theft where a car pulled up, a guy hopped out and grabbed a Stihl brand concrete saw from my contractor's truck and hopped back in the vehicle for a quick getaway. I saw another posting where someone's gardening tools worth several thousands of dollars were taken in about 1 minute -- and that included actually breaking a door to a closed-up trailer. While it may be simple to accept a 65 second pause in recording, and that could be a short-term works-for-me, it would be more desirable to try and salvage what comes across the wire and save it. You can go for days with nothing, and suddenly that 1 minute time span is everything. That's what makes this project so much more desirable than the software that comes with Reolink's cameras and its hit-or-miss activity feature.

@scottlamb
Owner Author

Yeah, that level of delay is undesirable for sure. But we can discuss that more at scottlamb/retina#17; I'd like to keep this issue for the backoff.
