Raspberry Pi freezes after roughly 1 day and 15 hours (on average) when playing audio files (*.wav) #247

Closed
hid3nax opened this Issue Jan 14, 2014 · 25 comments


@hid3nax
hid3nax commented Jan 14, 2014

Raspberry Pi (running the latest Raspbian with all updates) freezes after roughly 1 day and 15 hours (on average) when playing audio files (*.wav).

The system is running headless.
The system is running off a 3.5" HDD (powered separately).
No devices are connected except the USB HDD and the Ethernet cable.
The voltage between TP1 and TP2 is around 4.89 V.
The analogue output is used, routed to an amplifier.
The 'ifplugd' package was purged to save some interrupts.
The system is fully up to date (apt-get update, apt-get upgrade). The kernel is the latest version at the time of writing:
Linux safira 3.10.25+ #622 PREEMPT Fri Jan 3 18:41:00 GMT 2014 armv6l GNU/Linux

mpd is used to play the *.wav files.
Music is played only between 9 AM and 8 PM. The Pi hangs only while playing music, on the second day of uptime.

What has been tried to eliminate the issue (none of the following helped):
mpd is stopped and started nightly by a cron script.
The music files were played from the USB HDD. I moved them to the SD card, but that didn't help.
Swap space was increased to 200 MB.
vm.min_free_kbytes in sysctl was increased to 16 MB.
The following sysctl values were set:
kernel.panic = 4
kernel.panic_on_oops = 3
Unfortunately, the Pi does not reboot after it hangs.

The only way to recover the Raspberry Pi again is to unplug and then plug the power.

After the reboot, none of the logs (syslog, messages, kern.log, auth.log or mpd.log) report anything.

The system is graphed in Cacti at 5-minute intervals. The last values before the hang show nothing unusual (memory, CPU and interrupts are all in the normal range).
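Because Cacti polls only every 5 minutes, it can place the hang only within a 5-minute window. A finer-grained local heartbeat could narrow that down. The following is a minimal sketch, assuming Python 3.6+ on the Pi. The log path and the 10-second interval are hypothetical choices. Each line is fsynced so that it should survive the hard power cycle, and the last line written approximates when userland stopped.

```python
import os
import time


def heartbeat_line(now: float, load1: float) -> str:
    """Format one heartbeat record: UTC timestamp plus 1-minute load average."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    return f"{stamp} load1={load1:.2f}\n"


def write_heartbeat(path: str, line: str) -> None:
    """Append one line and force it to disk so it survives a hard power cut."""
    with open(path, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())


def run(path: str = "/var/log/heartbeat.log", interval_s: float = 10.0) -> None:
    """Write a heartbeat forever. Path and interval are hypothetical defaults."""
    while True:
        load1, _, _ = os.getloadavg()
        write_heartbeat(path, heartbeat_line(time.time(), load1))
        time.sleep(interval_s)
```

One way to use it is to start run() at boot, for example from a cron @reboot entry. After the next hang, the last lines of the heartbeat file show roughly when the system froze.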

If any further information or output is needed, please reply.

@popcornmix
Contributor

I'd be interested if you can test with different wav files.
If you repeatedly play a 1 second wav file and it fails much more quickly, then it might suggest that something is being leaked on each play and we die when that is exhausted.
Tracking the number of files played before it dies may be interesting (if it was a number like 512, then it may give a hint as to what is being exhausted).
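The repeated-playback test suggested above could be scripted roughly as follows. This is a minimal sketch assuming Python 3.6+ and ALSA's aplay, which is used here instead of mpd for simplicity. The file names are hypothetical. The script writes a 1-second silent 16-bit wav with the standard wave module and plays it in a loop. After every completed play it rewrites a counter file and fsyncs it, so the count should survive a hang.

```python
import os
import subprocess
import wave
from typing import Optional


def write_silent_wav(path: str, seconds: float = 1.0, rate: int = 44100,
                     channels: int = 2) -> int:
    """Write a silent signed 16-bit PCM wav file; return the frame count."""
    nframes = int(seconds * rate)
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 2 bytes per sample = signed 16-bit
        w.setframerate(rate)
        # All-zero bytes are silence for signed 16-bit samples.
        w.writeframes(b"\x00" * (nframes * channels * 2))
    return nframes


def record_count(path: str, count: int) -> None:
    """Overwrite the counter file and fsync it so the value survives a hang."""
    with open(path, "w") as f:
        f.write(f"{count}\n")
        f.flush()
        os.fsync(f.fileno())


def play_forever(wav: str, counter: str, device: Optional[str] = None) -> None:
    """Play `wav` with aplay in an endless loop, counting completed plays."""
    cmd = ["aplay", "-q"]
    if device is not None:
        cmd += ["-D", device]  # e.g. "plughw:0,0"
    cmd.append(wav)
    count = 0
    while True:
        subprocess.run(cmd, check=True)
        count += 1
        record_count(counter, count)
```

With device=None, aplay uses ALSA's default device. Passing a name such as "plughw:0,0" selects that device explicitly, as aplay's -D option does. After a hang, the counter file holds the number of plays that completed.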

@hid3nax
hid3nax commented Jan 14, 2014

During today's hang, the wav files were on flash (the SD card) while the system itself was running off the USB HDD. The Pi crashed today around midday.

This evening I drove to where the Pi is. Surprisingly, it was still playing music, but the system was inaccessible (no response to ping, etc.).

A cron job was supposed to stop the music at a certain time (it had never failed to do so before), so I decided to wait until then. As expected, the cron job didn't stop the music and it kept playing. I guess this means the whole system had crashed/frozen, while the mpd daemon (which resides in RAM) was still alive and playing wav files from flash, which apparently was also still accessible.

I guess this is related to USB or Ethernet. Any hints on how to troubleshoot further?

FWIW, I'll also try to count the files played, and to play a 1-second wav file in an endless loop.

Thanks.

@popcornmix
Contributor

Can you measure the voltage when the system is under load (perhaps when copying files on USB)?
Any errors in dmesg after it's gone bad?
Can you narrow it down? E.g. unplug the USB drive and run from the SD card?

@dos1
dos1 commented Jan 15, 2014

I had similar experiences with the rootfs on a USB HDD a few months ago, without playing any music. When it happened while I was logged in via ssh with htop running, htop showed massive load values until the system hung completely.

I didn't notice any pattern; it just happened around once per day. Sometimes it worked the whole day, but then I found it broken when I woke up the next day. I managed to get the watchdog to restart the device when it hung, but after that it couldn't mount the rootfs until the power was unplugged and plugged back in, so after an automatic reboot it was stuck in a reboot loop.

Eventually I got tired of that, so my Pi has been off for a while now. I'll try to get some more details soon: I remember that there were some kernel messages on screen when I connected a monitor. Unfortunately, I didn't write them down.

@hid3nax
hid3nax commented Jan 15, 2014

Thanks, I will measure the voltage when under load and report back.

Unfortunately, the system is headless and I can't see anything, and SSH is unreachable after the hang. Once, though, I happened to run dmesg just 5 minutes before a hang. It showed no errors; the last message was from the boot process.

dos1, it would be great if you could dig out your Pi and test it. It's possible we're having the same issue.

@hid3nax
hid3nax commented Jan 16, 2014

4.84 V under load.
A 1-second wav file played continuously for 1 day and 3 hours without a hang.
I had to power off the Pi to swap in a 5 V 1 A PSU from an iPad.

@popcornmix
Contributor

We need the dmesg log after the failure, not before.
I think you'll need to attach a screen or UART.

@hid3nax
hid3nax commented Jan 27, 2014

Unfortunately, I was unable to capture the dmesg output after the failure. The display did not respond after the RPi crashed.
I have downgraded the kernel to this one from github:
Linux safira 3.6.11+ #462 PREEMPT Mon Jun 3 22:15:00 BST 2013 armv6l GNU/Linux

It has now been 3 days with no crashes, with the same PSU and the same configuration throughout. It looks like the issue is with the latest firmware.

@hid3nax
hid3nax commented Jan 31, 2014

Confirming again: this looks related to the firmware/kernel rather than the power supply or the hardware itself. It has just passed 7 days and 20 hours of uptime. Hopefully it won't crash anymore.

@popcornmix
Contributor

We can't do anything without a dmesg log.
Can you check /var/log/messages? I believe it will contain dmesg logs from previous boots.
If it goes back far enough, it may have information on your last crash.
Also /var/log/dmesg.X should contain dmesg logs from the last few boots.
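One way to look for traces of the previous crash in /var/log/messages is to print the lines logged just before each boot. The following is a minimal sketch. It assumes that each boot is marked by the kernel's "Linux version ..." banner line. That holds for standard kernels, but syslog formatting varies, so the marker is a parameter. The 20-line context is an arbitrary choice. The last few lines of each chunk may already belong to the new boot, for example syslog daemon start-up messages.

```python
from typing import List

BOOT_MARKER = "Linux version"  # assumed kernel boot banner text


def lines_before_boots(lines: List[str], context: int = 20,
                       marker: str = BOOT_MARKER) -> List[List[str]]:
    """For each boot banner that is not the first line, return up to
    `context` lines logged immediately before it."""
    chunks = []
    for i, line in enumerate(lines):
        if marker in line and i > 0:
            chunks.append(lines[max(0, i - context):i])
    return chunks


def scan(path: str = "/var/log/messages", context: int = 20) -> None:
    """Print the tail of the log preceding every boot found in `path`."""
    with open(path, errors="replace") as f:
        lines = f.read().splitlines()
    for n, chunk in enumerate(lines_before_boots(lines, context), start=1):
        print(f"--- {len(chunk)} lines before boot #{n} ---")
        print("\n".join(chunk))
```

Running scan() on each rotated copy (messages.1 and so on) covers older boots as well.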

If not, then updating to latest kernel, and looking in /var/log after the crash may shed some light.

@trasferetti

I'm having the same problem on a system with a pair of relays, a temperature sensor and a USB Wi-Fi adapter, besides the SD card, running the latest kernel.
It seems to be a problem with the Broadcom BCM2835 SoC hanging due to an occasional lack of power.
I'm testing this out:
http://blog.ricardoarturocabral.com/2013/01/auto-reboot-hung-raspberry-pi-using-on.html

@hid3nax
hid3nax commented Apr 4, 2014

If you're interested, I solved the problem completely by downgrading the kernel/firmware to "3.6.11+ #462 PREEMPT Mon Jun 3 22:15:00 BST 2013 armv6l GNU/Linux". No more crashes, no more headaches, no more driving miles from home every 2 days to reboot the Pi.

I was NOT able to capture any error messages on the original/community kernel.

FWIW, this is definitely NOT a power-related problem. I strongly suspect it's related to the USB core/modules/drivers/etc.

@popcornmix
Contributor

I'm glad you've found a workaround.
We can't do anything to help without any logs showing how it crashed.
I'll close this issue. If anyone has a related problem and can supply dmesg logs when it crashes, then please open a new issue.

@popcornmix popcornmix closed this Apr 4, 2014
@braincrash

@hid3nax, can you give me the string from rpi-firmware so I can test it on mine?
Thanks

@J-e-f-f-A

I've built a phone screening system that either locks up after a few days or loses its USB accessories (audio, Wi-Fi and USB modem, all on a powered USB hub), at which point my program stops working because its interfaces to the 'real world' have gone away...

I can also consistently get it to lock up just by running 'aplay' on a wav file without forcing direct hardware access (i.e. without the "-D plughw:0,0" option, it locks up most of the time).

My only preventive measure so far has been to reboot it every couple of days to keep the system 'clean' and 'stable'.

I'm running Raspbian on a Model B with the latest Raspbian updates and the latest firmware (well, as of a week ago now).

This has been happening all along since I started this project a few months ago, and I just haven't had much time to troubleshoot it further. I had planned on trying a different Linux distribution, but haven't gotten around to that yet.

I'll try to force the issue (via aplay) and open a new ticket with my logs for analysis...

And perhaps after I create that ticket, I'll fall back to the kernel version mentioned by hid3nax so my system will be stable and reliable...

Jeff

@hid3nax
hid3nax commented Jun 21, 2014

@braincrash, what do you mean by a string from the firmware?
At the moment I am running this firmware and it seems to be very stable (uptime reached 95 days at one point):
uname -a
Linux safira 3.6.11+ #462 PREEMPT Mon Jun 3 22:15:00 BST 2013 armv6l GNU/Linux
cat /boot/.firmware_revision
a1a99df049176671fdfd5b0f6629fc52e7c71d31

@hid3nax
hid3nax commented Jun 21, 2014

@J-e-f-f-A yes, this seems VERY related to USB. I once ran an experiment with the latest firmware: I played wav files from the SD card while the system was running from the USB drive. Surprisingly, the system crashed (no ping, no ssh, nothing) but the music continued to play.

My suggestion would be to try downgrading to this firmware revision (I pasted the hash on pastie.org because GitHub converts it to a URL: http://pastie.org/private/ylyjiw2nb3ir17j8sedvmg )

The command to do that is 'rpi-update <the_code_above>'
Then reboot.
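The revision shown in /boot/.firmware_revision above is a full 40-character git commit hash. The following is a small sketch built on that observation. The helper names are hypothetical. It rejects a truncated or mistyped hash before building the rpi-update command, and it reads the currently installed revision so the two can be compared. Requiring the full 40-character form is a conservative assumption; rpi-update may accept other forms.

```python
import re
from typing import List

SHA1_RE = re.compile(r"[0-9a-f]{40}")


def is_full_commit_hash(s: str) -> bool:
    """True if s looks like a full lowercase 40-hex-digit git commit hash."""
    return SHA1_RE.fullmatch(s.strip()) is not None


def installed_revision(path: str = "/boot/.firmware_revision") -> str:
    """Read the firmware revision file shown earlier in this thread."""
    with open(path) as f:
        return f.read().strip()


def rpi_update_argv(commit: str) -> List[str]:
    """Build the command line for pinning the firmware to one commit."""
    if not is_full_commit_hash(commit):
        raise ValueError(f"not a full commit hash: {commit!r}")
    return ["sudo", "rpi-update", commit.strip()]
```

One possible workflow is to run subprocess.run(rpi_update_argv(h), check=True), reboot, and then check whether installed_revision() equals h. The check assumes rpi-update records the requested hash in that file.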

Please report back on whether it runs stably for you! Thanks!

@braincrash

@hid3nax, it's the commit hash; I think you posted it above. Will check it :)
Thanks.

@trasferetti

Found some workarounds here:

http://iqjar.com/jar/raspberry-pi-rebooting-itself-when-it-becomes-unreachable-from-outside-networks/

@hid3nax
hid3nax commented Jul 29, 2014

That probably won't work, as the Pi's userland freezes completely: e.g. cron jobs don't execute, and you cannot access any running services on it (web, ssh, etc.).

@J-e-f-f-A

My workaround is to hot-plug my USB devices (modem, wireless and sound card) every day, then restart my program so it reconnects to the modem after it has been hot-plugged.
Note: hot-plugging the USB hub they are on does NOT work for some reason; I have to hot-plug each device individually.
Also note that I haven't done further testing on the 'hard' lockups I get when playing audio without specifying the hardware device name... This is a 'live' phone screening system, so I don't want to break it, lol.
Jeff
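As an alternative to physically hot-plugging each device, Linux lets root detach and reattach a USB device in software. This is done by writing its sysfs name to /sys/bus/usb/drivers/usb/unbind and then to .../bind. The following is a minimal sketch, untested on this hardware. The device names are hypothetical examples and must be looked up under /sys/bus/usb/devices/. Whether a software rebind recovers these devices the way a physical replug does is an assumption. Like any userland script, it cannot help once userland has frozen.

```python
import os
import time
from typing import Iterable

USB_DRIVER_DIR = "/sys/bus/usb/drivers/usb"  # sysfs directory of the usb driver


def _write(path: str, value: str) -> None:
    with open(path, "w") as f:
        f.write(value)


def soft_replug(device: str, settle_s: float = 2.0,
                driver_dir: str = USB_DRIVER_DIR) -> None:
    """Unbind and rebind one USB device (e.g. the hypothetical "1-1.2").
    Requires root on a real system; driver_dir is a parameter for testing."""
    _write(os.path.join(driver_dir, "unbind"), device)
    time.sleep(settle_s)  # let the device disappear before rebinding it
    _write(os.path.join(driver_dir, "bind"), device)


def replug_all(devices: Iterable[str], settle_s: float = 2.0,
               driver_dir: str = USB_DRIVER_DIR) -> None:
    """Replug devices one at a time rather than the hub they sit on."""
    for dev in devices:
        soft_replug(dev, settle_s, driver_dir)
```

Replugging devices one at a time follows the observation that replugging the hub did not help. The program that uses the modem would still need restarting afterwards.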

@kprkpr
kprkpr commented May 6, 2015

For everyone's information, here is the latest commit of Hexxeh's rpi-firmware with the 3.6.11+ kernel; it took me a lot of time to find it.
Maybe someone is searching the web for this ;P
https://github.com/Hexxeh/rpi-firmware/tree/8234d5148aded657760e9ecd622f324d140ae891

sudo rpi-update 8234d5148aded657760e9ecd622f324d140ae891

uname -a
Linux raspberrypi 3.6.11+ #557 PREEMPT Wed Oct 2 18:49:09 BST 2013 armv6l GNU/Linux

@pelwell
Contributor
pelwell commented May 6, 2015

What are you saying about that commit? Is it good or bad?

@kprkpr
kprkpr commented May 6, 2015

Well... I'm sharing this because the thread above says 3.6.11+ doesn't have this bug, and I thought it would be easier to install with the commit hash and command at hand.

I tried the pastebin link above earlier, but it didn't work for me.

I'm trying it right now. I'm not sure it works, but if this is a bug in the 3.10.y kernels, this may help (I'm testing with mpd and streaming, and I have the same bug).
Edit: Wi-Fi still keeps shutting down and the Pi stops... ignore it.
