Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Watchdog #50

Closed
subogero opened this issue May 5, 2016 · 79 comments
Closed

Watchdog #50

subogero opened this issue May 5, 2016 · 79 comments

Comments

@subogero
Copy link
Owner

subogero commented May 5, 2016

Run a periodic watchdog to detect and kill stale omxplayer instances that have ignored commands on STDIN.

@OMID-313
Copy link

OMID-313 commented May 5, 2016

Thanks @subogero for opening a new issue for watchdog problem.
I look hopefully forward to the solution to this video playing abnormalities caused mainly by omxplayer.

@OMID-313
Copy link

OMID-313 commented May 5, 2016

Useful info here:
Screenly/Anthias#21
They seem to have the same problem with omxplayer :D

@OMID-313
Copy link

OMID-313 commented May 9, 2016

Hi @subogero ,
I understand that you're too busy.
I just logged in to say keep up the good work. This omxd is great. Never give up.
Thanks man.

@OMID-313
Copy link

New news on this watch-dog method?
@subogero

@OMID-313
Copy link

OMID-313 commented Jun 5, 2016

I was hoping that the freeze problem of omxplayer had been solved via the raspberry-pi framework upgrade. But unfortunately, today when I was testing the system, the freeze problem happened again between the videos. Also, it caused the tv to show No HDMI Signal notification.

I had to use Next (omxd n) twice to un-freeze the display.

I would really really appreciate if you design this watchdog method to avoid these omxplayer crazy anomalies.

@subogero
Copy link
Owner Author

subogero commented Jun 5, 2016

I'm currently working on the watchdog.
On the other hand, the newest few omxplayer versions are totally crap. I've installed an old version manually. As you can see it's one year old!

apt-get update
apt-get install --reinstall omxplayer=0.3.6~git20150627~843744e

@OMID-313
Copy link

OMID-313 commented Jun 5, 2016

Thanks @subogero for keeping the good work.

You mean the older version works better and has less bugs (white flicker between videos, video freeze, etc.) !?

@subogero
Copy link
Owner Author

subogero commented Jun 5, 2016

The old version does not freeze at the end.

@OMID-313
Copy link

OMID-313 commented Jun 5, 2016

OK.
I'll try the older version.
Do I have to update the repository or the list before installing the older version?

@subogero
Copy link
Owner Author

subogero commented Jun 5, 2016

The bloody instructions are 6 lines above. Come on!!!

@OMID-313
Copy link

OMID-313 commented Jun 5, 2016

😋
I meant editing the repository files before running those commands.

Anyway, the omxplayer downdated (!!) to the older version.
Thanks.

Waiting enthusiastically to enable the watchdog method, so as to get rid of the omxplayer freeze problem.

@subogero
Copy link
Owner Author

subogero commented Jun 7, 2016

It seems that a newer version of omxplayer 0.3.6~git20160102~f544084 also does not freeze at the end of videos. But the newest one is bad.

@OMID-313
Copy link

OMID-313 commented Jun 7, 2016

Thanks @subogero for your info. I'll give it a try.

By the way, what do you mean by does not freeze at the end of videos ?
The omxplayer freezes the videos randomly whenever it likes! Beginning, middle, end, ...
Am I wrong?

@subogero
Copy link
Owner Author

subogero commented Jun 7, 2016

I only saw omxplayer 3.7-xxx hanging at the end of the video.

subogero added a commit that referenced this issue Jun 12, 2016
subogero added a commit that referenced this issue Jun 12, 2016
subogero added a commit that referenced this issue Jun 12, 2016
subogero added a commit that referenced this issue Jun 12, 2016
subogero added a commit that referenced this issue Jun 14, 2016
subogero added a commit that referenced this issue Jun 14, 2016
If current track has length (unlike internet radios), and play time
has exceeded it, kill its omxplayer.bin via omxwd script.

omxwd kills all omxplayer.bin processes in its own process group,
or just the child of an omxplayer PID supplied on command line.
@subogero
Copy link
Owner Author

Please test wd branch.

git fetch
git checkout wd
make
sudo make stop
sudo make install
sudo make start

@OMID-313
Copy link

Thanks @subogero for this omxwd .
I'll try it.

Just one question:
If omxplayer freezes, what does this wd do exactly? Does it switch to the next video?
(when it freezes, I manually do omxd n twice, so it solves the freeze and switched to the next video.)

@subogero
Copy link
Owner Author

Just click the last commit above and read the commit message.

subogero added a commit that referenced this issue Jun 14, 2016
Segfault both in client omxd S, and server when watchdog running.
@subogero
Copy link
Owner Author

Please use newest commit, it fixes client/server segfault.

git pull origin wd
make
...

@OMID-313
Copy link

Are the following process OK?

git fetch
git checkout wd
git pull origin wd
make
sudo make stop
sudo make install
sudo make start

@OMID-313
Copy link

OMID-313 commented Jun 15, 2016

I did the following:

git fetch
git pull origin wd
make
sudo make stop
sudo make install
sudo make start

reboot

And then I tested the system.

In order to reproduce the freeze conditions, I do omxd n several times very fast!
It freezes, then No HDMI Input comes to the screen, then after 10-20 seconds, a video comes to the screen from its middle frames (not its beginning).

I think the watchdog would be more applicable and useful if running in the background every 1 or 2 seconds.

subogero added a commit that referenced this issue Jun 15, 2016
Returning with -1 track length without closing the omxplayer.log.xxx
file was not a good idea.
While playing internet radios (length -1), the watchdog() called this
every 10s, opening more and more fds to the same log file, until the
entire system collapsed.
Fix: set ret val and break instead of return.
@subogero
Copy link
Owner Author

I've started testing with videos instead of mp3s, it's much more sensitive, I guess the videos cause much higher processor load.

I've reproduced the flickering screen when more videos are played at once. But the last commit improved it in a big way. The watchdog retries killing the target omxplayer.bin if not found the first time.

On the other hand, I could not reproduce the SIGFTE.

@OMID-313
Copy link

I updated to the latest commit via:

git fetch
git pull origin wd
make
sudo make stop
sudo make install
sudo make start

reboot

After reboot, one of the videos (KK.mp4) which is the last(!!!) video in the playlist directory begins to play, then a white flicker, then the screen gets black, and it doesn't play any other videos.

The following are the status of omxd service, both during playing that video, and also after finishing that video and getting to the black screen:

root@raspberrypi:~# service omxd status

● omxd.service - omxd media player daemon
   Loaded: loaded (/usr/share/doc/omxd/omxd.service; enabled)
   Active: active (running) since Wed 2016-06-22 10:57:10 BST; 26s ago
  Process: 401 ExecStart=/usr/bin/omxd (code=exited, status=0/SUCCESS)
 Main PID: 402 (omxd)
   CGroup: /system.slice/omxd.service
           ├─402 /usr/bin/omxd
           ├─664 /bin/bash /usr/bin/omxplayer -I --vol 0 --blank --no-osd -olocal /usermedias/KK.mp4
           └─674 /usr/bin/omxplayer.bin -I --vol 0 --blank --no-osd -olocal /usermedias/KK.mp4

Jun 22 10:57:10 raspberrypi omxd[401]: omxd daemon started, PID 402

and

root@raspberrypi:~# service omxd status

● omxd.service - omxd media player daemon
   Loaded: loaded (/usr/share/doc/omxd/omxd.service; enabled)
   Active: failed (Result: signal) since Wed 2016-06-22 10:59:08 BST; 4s ago
  Process: 401 ExecStart=/usr/bin/omxd (code=exited, status=0/SUCCESS)
 Main PID: 402 (code=killed, signal=FPE)

Jun 22 10:57:10 raspberrypi omxd[401]: omxd daemon started, PID 402
Jun 22 10:59:08 raspberrypi systemd[1]: omxd.service: main process exited, code=killed, status=8/FPE
Jun 22 10:59:08 raspberrypi systemd[1]: Unit omxd.service entered failed state.

and the following are the last lines of /var/log/omxlog :

...
...
1466589433 main: A /usermedias/JJ.mp4
1466589433 player_off: PID 622
1466589433 omxwd: kill omxplayer.bin 639 of omxplayer 622
1466589433 player_quit: PID=600 with 137
1466589433 player_new: PID=643 /usermedias/JJ.mp4
1466589433 main: A /usermedias/KK.mp4
1466589433 player_off: PID 643
1466589433 omxwd: kill omxplayer.bin 660 of omxplayer 643
1466589433 player_quit: PID=622 with 137
1466589433 player_new: PID=664 /usermedias/KK.mp4
1466589433 player_quit: PID=643 with 137
1466589548 player_quit: PID=664 with 0

@OMID-313
Copy link

The same scenario happens when I run the script manually:

. /path/to/myscript.sh

@subogero
Copy link
Owner Author

Please post the log from the last daemonize line.

@OMID-313
Copy link

The following are the logs added to /var/log/omxlog after reboot:

1466590933 daemonize: omxd started, SID 334
1466590937 daemonize: omxd started, SID 400
1466590937 main: X
1466590937 main: j
1466590937 main: O --blank
1466590937 main: .
1466590937 main: O --no-osd
1466590937 main: .
1466590937 main: A /usermedias/AA.mp4
1466590937 player_new: PID=408 /usermedias/AA.mp4
1466590937 main: A /usermedias/AAB.mp4
1466590937 player_off: PID 408
1466590937 omxwd: kill omxplayer.bin 426 of omxplayer 408
1466590938 player_new: PID=430 /usermedias/AAB.mp4
1466590938 main: A /usermedias/AAC.mp4
1466590938 player_off: PID 430
1466590938 omxwd: kill omxplayer.bin 447 of omxplayer 430
1466590938 player_quit: PID=408 with 137
1466590938 player_new: PID=451 /usermedias/AAC.mp4
1466590938 main: A /usermedias/AAD.mp4
1466590938 player_off: PID 451
1466590938 omxwd: kill omxplayer.bin 468 of omxplayer 451
1466590938 player_quit: PID=430 with 137
1466590938 player_new: PID=472 /usermedias/AAD.mp4
1466590938 main: A /usermedias/BB.mp4
1466590938 player_off: PID 472
1466590938 omxwd: kill omxplayer.bin 489 of omxplayer 472
1466590938 player_quit: PID=451 with 137
1466590938 player_new: PID=493 /usermedias/BB.mp4
1466590938 main: A /usermedias/DD.mp4
1466590938 player_off: PID 493
1466590938 omxwd: kill omxplayer.bin 510 of omxplayer 493
1466590938 player_quit: PID=472 with 137
1466590938 player_new: PID=514 /usermedias/DD.mp4
1466590938 main: A /usermedias/EE.mp4
1466590938 player_off: PID 514
1466590938 omxwd: kill omxplayer.bin 531 of omxplayer 514
1466590939 player_quit: PID=493 with 137
1466590939 player_new: PID=535 /usermedias/EE.mp4
1466590939 main: A /usermedias/FF.mp4
1466590939 player_off: PID 535
1466590939 omxwd: kill omxplayer.bin 552 of omxplayer 535
1466590939 player_quit: PID=514 with 137
1466590939 player_new: PID=556 /usermedias/FF.mp4
1466590939 main: A /usermedias/GG.mp4
1466590939 player_off: PID 556
1466590939 omxwd: kill omxplayer.bin 573 of omxplayer 556
1466590939 player_quit: PID=535 with 137
1466590939 player_new: PID=577 /usermedias/GG.mp4
1466590939 main: A /usermedias/HH.mp4
1466590939 player_off: PID 577
1466590939 omxwd: kill omxplayer.bin 594 of omxplayer 577
1466590939 player_quit: PID=556 with 137
1466590939 player_new: PID=598 /usermedias/HH.mp4
1466590939 main: A /usermedias/II.mp4
1466590939 player_off: PID 598
1466590939 omxwd: kill omxplayer.bin 616 of omxplayer 598
1466590939 player_quit: PID=577 with 137
1466590939 player_new: PID=620 /usermedias/II.mp4
1466590939 main: A /usermedias/JJ.mp4
1466590939 player_off: PID 620
1466590939 omxwd: kill omxplayer.bin 637 of omxplayer 620
1466590940 player_quit: PID=598 with 137
1466590940 player_new: PID=641 /usermedias/JJ.mp4
1466590940 main: A /usermedias/KK.mp4
1466590940 player_off: PID 641
1466590940 omxwd: kill omxplayer.bin 658 of omxplayer 641
1466590940 player_quit: PID=620 with 137
1466590940 player_new: PID=662 /usermedias/KK.mp4
1466590940 player_quit: PID=641 with 137
1466657982 player_quit: PID=662 with 0

@subogero
Copy link
Owner Author

Your omxd A works differently than mine. Yours always destroys the current track as if it was doing an implicit n. But it should not. The very first A should start the first track, the second one should start the next track paused, and from then on they should just append to the playlist and leave the omxplayers alone. This is what mine does.

Are you sure you're running the same code? What are the outputs of the following three commands?

make ps
git status
git rev-parse HEAD

@OMID-313
Copy link

root@raspberrypi:~# cd omxd

root@raspberrypi:~/omxd# make ps

pstree -pu | grep omx
bash: pstree: command not found
Makefile:57: recipe for target 'ps' failed
make: *** [ps] Error 1

root@raspberrypi:~/omxd# git status

On branch wd
Your branch is up-to-date with 'origin/wd'.
nothing to commit, working directory clean

root@raspberrypi:~/omxd# git rev-parse HEAD

496cbec88982e1ab4276a9702f602c0f9f9a9529

I think I'd better re-write my microSD card with my stable backup, then update omxd to the latest commit.

@OMID-313
Copy link

I wrote the latest backup to microSD, and then updated the omxd to the latest commit as the following:

root@raspberrypi:~# cd omxd

root@raspberrypi:~/omxd# ls

client.c  init       Makefile  omxd.1  omxd_help.h  postrm   README         utils.c
COPYING   INSTALL    m_list.c  omxd.c  player.c     preinst  test_m_list.c  version.h
HACKING   logrotate  omxd      omxd.h  postinst     prerm    test_utils.c   version.txt

root@raspberrypi:~/omxd# git fetch

remote: Counting objects: 72, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 72 (delta 15), reused 15 (delta 15), pack-reused 54
Unpacking objects: 100% (72/72), done.
From https://github.com/subogero/omxd
   420ea4f..a32d806  master     -> origin/master
 * [new branch]      wd         -> origin/wd
 * [new tag]         1.7        -> 1.7

root@raspberrypi:~/omxd# git pull origin wd

From https://github.com/subogero/omxd
 * branch            wd         -> FETCH_HEAD
Updating 420ea4f..496cbec
Fast-forward
 Makefile     |   6 +++++-
 client.c     |  73 +++++++++++++++++++++++++++++++++++++++++++++++----------------------
 m_list.c     |  10 ++++++----
 omxd.c       |  27 +++++++++++++++++---------
 omxd.h       |   2 ++
 omxd.service |  11 +++++++++++
 omxwd        |  45 +++++++++++++++++++++++++++++++++++++++++++
 player.c     | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------
 postinst     |  15 +++++++++++----
 postrm       |  13 ++++++++++---
 preinst      |   2 ++
 version.txt  |   2 +-
 12 files changed, 242 insertions(+), 74 deletions(-)
 create mode 100644 omxd.service
 create mode 100755 omxwd

root@raspberrypi:~/omxd# make

sed -rn '1,/^\.$/ s/^(.*)$/"\1\\n"/p' README >omxd_help.h
gcc -g -o omxd omxd.c client.c player.c utils.c m_list.c

root@raspberrypi:~/omxd# make stop

./prerm
./postrm
Running omxd postrm
572
Synchronizing state for omxd.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d omxd defaults
insserv: warning: current start runlevel(s) (empty) of script `omxd' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `omxd' overrides LSB defaults (0 1 6).
Executing /usr/sbin/update-rc.d omxd disable
insserv: warning: current start runlevel(s) (empty) of script `omxd' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `omxd' overrides LSB defaults (0 1 6).
Removed symlink /etc/systemd/system/omxd.service.
killall omxd
bash: killall: command not found
Makefile:41: recipe for target 'stop' failed
make: [stop] Error 127 (ignored)
killall omxplayer.bin
bash: killall: command not found
Makefile:41: recipe for target 'stop' failed
make: [stop] Error 127 (ignored)

root@raspberrypi:~/omxd# make install

./preinst
Running omxd preinst
Unknown operation 'omxd'.
cp omxd /usr/bin
cp omxd.1 /usr/share/man/man1/
mkdir -p /usr/share/doc/omxd
cp init /usr/share/doc/omxd/
cp omxd.service /usr/share/doc/omxd/
cp logrotate /usr/share/doc/omxd/
cp omxwd /usr/bin
perl -lne 'print unless /^omxd$/' -i /etc/rc.local # Auto migrate from rc.local

root@raspberrypi:~/omxd# make start

./postinst
Running omxd postinst
572
Created symlink from /etc/systemd/system/multi-user.target.wants/omxd.service to /usr/share/doc/omxd/omxd.service.
Created symlink from /etc/systemd/system/omxd.service to /usr/share/doc/omxd/omxd.service.

The last command (make start) took almost 5 minutes!!
Anyway, are these warnings and errors normal during the make process?

@OMID-313
Copy link

OMID-313 commented Jun 23, 2016

OK. Now it works.

Here are the logs from /var/log/omxlog :

1466670503 daemonize: omxd started, SID 335
1466670507 daemonize: omxd started, SID 403
1466670507 main: X
1466670507 main: j
1466670507 main: O --blank
1466670507 main: .
1466670507 main: O --no-osd
1466670507 main: .
1466670507 main: A /usermedias/AA.mp4
1466670507 player_new: PID=414 /usermedias/AA.mp4
1466670507 main: A /usermedias/BB.mp4
1466670507 player_new: PID=415 /usermedias/BB.mp4
1466670507 main: A /usermedias/CC.mp4
1466670507 main: A /usermedias/DD.mp4
1466670507 main: A /usermedias/EE.mp4
1466670507 main: A /usermedias/FF.mp4
1466670507 main: A /usermedias/GG.mp4
1466670507 main: A /usermedias/HH.mp4
1466670507 main: A /usermedias/II.mp4
1466670507 main: A /usermedias/JJ.mp4
1466670507 main: A /usermedias/KK.mp4
1466670507 main: A /usermedias/LL.mp4
1466670507 main: A /usermedias/MM.mp4
1466670507 main: A /usermedias/NN.mp4
1466670507 main: A /usermedias/OO.mp4
1466670507 main: A /usermedias/PP.mp4
1466670523 player_quit: PID=414 with 0
1466670523 player_cmd: Send p to omxplayer PID/fd 415/4
1466670523 player_new: PID=546 /usermedias/CC.mp4
1466670551 player_quit: PID=415 with 0
1466670551 player_cmd: Send p to omxplayer PID/fd 546/1
1466670551 player_new: PID=586 /usermedias/DD.mp4
1466670585 player_quit: PID=546 with 0
1466670585 player_cmd: Send p to omxplayer PID/fd 586/3
1466670585 player_new: PID=608 /usermedias/EE.mp4
1466670625 player_quit: PID=586 with 0
1466670625 player_cmd: Send p to omxplayer PID/fd 608/1
1466670625 player_new: PID=626 /usermedias/FF.mp4
1466670662 player_quit: PID=608 with 0
1466670662 player_cmd: Send p to omxplayer PID/fd 626/3
1466670662 player_new: PID=644 /usermedias/GG.mp4

So, back to the huge-breakthrough commit.
We can continue from there!

@OMID-313
Copy link

I tested it with both omxd n and omxd N commands.
They both work perfectly now.

Now, how can I see what will happen if omxplayer freezes?
How can I reproduce the freeze situation?

subogero added a commit that referenced this issue Jun 23, 2016
subogero added a commit that referenced this issue Jun 23, 2016
player.c:
1st watchdog start to init(), which is called at all main entry points.
Simplify omxplayer track/play time extraction.

omxwd: also log attempts to kill specific PIDs.
@subogero
Copy link
Owner Author

I found another bug with the periodic watchdog, when there were options added before the first playlist tracks. See last commit.

I tested it by editing the /usr/bin/omxplayer script and adding a delay before the start of omxplayer.bin, around line 68 of the file:

sleep 10
LD_LIBRARY_PATH="$OMXPLAYER_LIBS${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" $OMXPLAYER_BIN "$@"

@subogero
Copy link
Owner Author

This way, each track plays 10s longer than the track length, so the watchdog will shut down the track a bit before the end. Test only works with the 1st track of the playlist. Don't forget to restore /usr/bin/omxplayer after the test.

@subogero
Copy link
Owner Author

Another new commit on branch wd related to issue #56. Stange behaviour for command A fixed.

@OMID-313
Copy link

Can I test the new commits via the following?

git fetch
git pull origin wd
make
sudo make stop
sudo make install
sudo make start

reboot

@subogero
Copy link
Owner Author

git pull always does an implicit fetch, and there is make restart for all three above. And why do you need reboot? Just run your script.

git pull origin wd
make
sudo make restart

@OMID-313
Copy link

I tested adding sleep 10 to /usr/bin/omxplayer (around line 68) .

It affected the behavior of 1st video in the playlist. Sometimes it wold freeze, and then started the 2nd video from middle. Sometimes it would stop 1st video after 2 seconds, and started 2nd video.

@OMID-313
Copy link

I restored /usr/bin/omxplayer to the default.

@OMID-313
Copy link

After all these commits, the omxd n and omxd N both work perfectly, even on fast hits. This is really great.
Although, I haven't found a way to reproduce the freeze situation, so as to see how will this watchdog act in random freeze situations (which happen when e.g. the system is running for a few hours).

@subogero
Copy link
Owner Author

When you added sleep 10 to omxplayer, that was exactly the expected behaviour. It also demonstrates that the watchdog works, because it shuts down a e.g. 10s long video a bit after 10s, even if omxplayer added 10s to its runtime, increasing its length to 20s.

The reason this test works only for the 1st video is that all other videos are started earlier and paused, so the sleep 10 does not affect them.

So things look good. If no other issue comes up, I'll merge soon and release version 1.8

@OMID-313
Copy link

Noticed a new minor bug:
On the first boot after updating the playlist videos (deleting the old ones and copying new ones), some strange things happen to 1st or 2nd video of the playlist:
Sometimes the 2nd video stops after 2-3 seconds, and the 3rd video starts to play.
Or
Sometimes, the 1st video stops on 3-4 seconds remaining to the end, and the 2nd video starts to play.

These things happen only on the first boot after updating the playlist. And as I said, it's minor, and easy to ignore; but the info might help solve other bugs.

Thanks for your support.

@subogero
Copy link
Owner Author

Is it possible that it takes longer to start videos after the reboot? Because the watchdog does not consider that. It has no mercy! :-)

@OMID-313
Copy link

I tested it by running the script manually. I didn't see any problem.
I don't understand!!
How is this problem related to running the script automatically on boot !!?

@subogero
Copy link
Owner Author

Because there is much more processor load during boot. Additionally, video related subsystems might not be fully initialized.

@OMID-313
Copy link

OMID-313 commented Jul 2, 2016

How much is the optimum GPU memory to set for playing videos with RPi 3?

@subogero
Copy link
Owner Author

subogero commented Jul 2, 2016

I don't have the faintest clue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants