After activated RTSP/TCP the satip box freeze every day! #43

Closed
Imerwin opened this issue Feb 2, 2016 · 27 comments

@Imerwin

Imerwin commented Feb 2, 2016

Hi,
I have a problem with the Tvheadend RTSP/TCP function.
If I activate this function, my satip box crashes at least once a day.
When the box crashes, the network is unreachable, but the LED showing the last tuner state stays lit.

If I use RTSP over UDP, I can watch 2 HD streams without a problem. After I start a third stream I get a lot of continuity errors, but the box doesn't crash.

With RTSP/TCP I can watch 4 HD streams or more without continuity errors.

I use:
HTS Tvheadend 4.1-1242~ge14c134
Inverto IDL-400s with satip-axe-201508061051-10

My summary:
a dedicated VLAN for just the satip box and Tvheadend, or a new network cable: same problem

I cannot describe exactly when the box goes down.
I can watch 3 or 4 HD streams for hours with no problem, but it's also possible to watch 1 stream and have the box crash after 2 to 6 hours.

Maybe someone can help me.
Thanks!

@CvH

CvH commented Feb 2, 2016

Look at #34 to see whether this is similar.
My gss.box is defective again (hardware fault: it dies after 1-2 minutes of operation). The symptoms were regular dropouts of the tuners (all okay after a reboot) and increasing continuity errors.

@perexg
Owner

perexg commented Feb 2, 2016

If it's a software (kernel in this case) fault, the serial console might show something.

Otherwise, it's very difficult to determine the fault like this.

I have my two boxes open with fan and all major chips (CPU/network/demodulators/tuners) have heatsinks, too. But saying that, I won't force anyone to break the warranty.

http://postimg.org/image/aaya8pnb5/ - the circles mark added / modified heatsinks.

@Imerwin
Author

Imerwin commented Feb 8, 2016

Thanks for the quick responses.

Yesterday I connected the serial cable to a server through a USB adapter.
This is the output without a crash:

(unreadable output: one long line of garbled bytes mixed with repeated cursor-position escape sequences such as "[10;13H" and "[11;117H")

Is this output normal, or do I need to configure something?
Thanks!

@satip

satip commented Feb 8, 2016

Hi

Same problem here (same Inverto box).
I changed from the original firmware to satip-axe to get better data quality.
With the original firmware: many continuity errors.
With satip-axe and the "RTSP/TCP" feature enabled: marginal continuity errors, but the satip device crashes.

I had to monitor my satip device with a simple fping check and Zabbix to get an info mail on each crash (see the up/down times in the Zabbix picture):
http://postimg.org/image/pg3v1007x/
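
A reachability check of the kind described above can be approximated with a small shell loop when no monitoring system is at hand. This is only a sketch: `probe_state`, `log_transition`, the `PROBE` variable and the host name `satip-box.example` are illustrative assumptions, not part of the setup used in this thread.

```shell
# Sketch only: all names and defaults here are illustrative assumptions.
# PROBE is the command used to test reachability; it can be overridden.
PROBE=${PROBE:-ping -c 1 -W 2}

# Print "up" if a single probe of host $1 succeeds, "down" otherwise.
probe_state() {
    if $PROBE "$1" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# Print a timestamped line only when the state has changed.
# $1 = host, $2 = previous state, $3 = current state
log_transition() {
    if [ "$2" != "$3" ]; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') $1: $2 -> $3"
    fi
}

# Example loop (commented out so that sourcing this file has no side effects):
# prev=up
# while sleep 60; do
#     cur=$(probe_state satip-box.example)
#     log_transition satip-box.example "$prev" "$cur"
#     prev=$cur
# done
```

Logging only the transitions keeps the output short and gives the up/down times directly.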

I don't think it is a temperature problem. My satip device is installed in the loft near my multiswitch (Spaun) with short cables (0.5 m) (see the temperature picture):
http://postimg.org/image/equg1l0rb/

I am also supporting a friend with the same configuration. His satip box crashes as well (same Inverto box):
http://postimg.org/image/6yjy8r40j/

Running some stress tests does not crash the box (mux scans, recording four full-HD streams, triggering EIT EPG).

But in my opinion, the satip box is more stable after cleaning unnecessary muxes, services and channels out of Tvheadend. After the cleanup, my box was up and running for 7 days (but yesterday it crashed again).

After reading the post from Imerwin, I bought a USB-to-TTL converter module to get serial output from the satip box (but I still have to install it).
I will post more info once the serial output is running and the box crashes again.

Using:
tvheadend (daily build: 4.1-1477)
Inverto IDL-400s with: Welcome to SATIP-AXE server 201508061051-10

@perexg
Owner

perexg commented Feb 8, 2016

Is this output normal, or do I need to configure something?

Configure your serial device properly. See the top-level README - https://github.com/perexg/satip-axe/blob/master/README.md
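
Garbled console output usually points to wrong line settings on the host side. The sketch below sets the adapter to 115200 baud, 8N1 and timestamps each received line so that a lockup can later be matched against other logs. The device path `/dev/ttyUSB0`, the log file name and the helper `stamp_lines` are assumptions for illustration; the 115200 value is taken from the `console=ttyAS0,115200` boot arguments quoted elsewhere in this thread, and the README remains the authority.

```shell
# Sketch only: /dev/ttyUSB0 and 115200 baud are assumptions; check the README.
# Prefix every line read from stdin with the current date and time.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Example (commented out; it needs the real serial adapter):
# stty -F /dev/ttyUSB0 115200 cs8 -cstopb -parenb raw -echo
# stamp_lines < /dev/ttyUSB0 >> satip-serial.log
```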

@perexg
Owner

perexg commented Feb 8, 2016

But in my opinion, the satip box is more stable after cleaning unnecessary muxes, services and channels out of Tvheadend. After the cleanup, my box was up and running for 7 days (but yesterday it crashed again).

If the bug is in the Inverto drivers, it would be very very difficult to fix this. But a captured crash from the serial console might help to identify the lock issue.

@jacotec

jacotec commented Feb 8, 2016

Hi,

I can confirm this issue. After running my Digibit with satip-axe for weeks without any issue, I changed to RTSP/TCP yesterday to get rid of some continuity errors (which still appear even though everything runs on a switched gigabit network with high-quality D-Link managed switches).

I had a complete crash 5 minutes ago; the box was not even responding to ping.

Switched back to UDP for now ...

@Imerwin
Author

Imerwin commented Feb 9, 2016

Hi,
With another serial adapter I can now connect to the satip box.
My next problem is that I can't start the debug firmware:

idl4k>
idl4k> set bootargs console=ttyAS0,115200
idl4k> set debugfw "debugfw=usb start;usb start;fatload usb 0 0x84000000 uimage.gz;set bootargs console=ttyAS0,115200 bigphysarea=20000;bootm 0x84000000"
idl4k> save
Saving Environment to SPI Serial Flash...
.
idl4k> run debugfw
ERROR: There is a global environment variable with the same name.
Unknown command 'start' - try 'help'
(Re)start USB...
USB: scanning bus for devices... 2 USB Device(s) found
scanning bus for storage devices... 1 Storage Device(s) found
reading uimage.gz

** Unable to read "uimage.gz" from usb 0:1 **

Booting image at 84000000 ...

Bad Magic Number
idl4k>

The USB stick is connected (upper port) and holds "satip-axe-201508061051-10.fw" and "satip-axe-201508061051-10.usb" renamed to idl4k.scr.

Is it possible to start the debug mode from flash memory (without USB boot)?

Thanks!

@perexg
Owner

perexg commented Feb 9, 2016

@Imerwin : Don't do anything with u-boot (don't press anything). Just keep it running - the box will boot to satip-axe as before, but you can see all kernel outputs.

The 'debugfw' command set is from Inverto guys - I don't use it for the satip-axe firmware.

@Imerwin
Author

Imerwin commented Feb 10, 2016

Hi,
Now I have serial output after a crash.

Please press Enter to activate this console. ntpd: bad address '0.pool.ntp.org'
@@@@@@@@@@@@ fe_open 0
@@@@@@@@@@@@ fe_open 1
@@@@@@@@@@@@ fe_open 2
@@@@@@@@@@@@ fe_open 3
@@@@@@@@@@@@ fe_open 0
@@@@@@@@@@@@ fe_open 1
@@@@@@@@@@@@ fe_open 2
@@@@@@@@@@@@ fe_open 3
PHY: 1:01 - Link is Up - 1000/Full
ntpd: setting time to 2016-02-09 17:22:09.344555 (offset +34924912.466852s)
@@@@@@@@@@@@ fe_open 0
@@@@@@@@@@@@ fe_open 1
@@@@@@@@@@@@ fe_open 2
@@@@@@@@@@@@ fe_open 3
@@@@@@@@@@@@ fe_open 0
@@@@@@@@@@@@ fe_open 1
@@@@@@@@@@@@ fe_open 2
@@@@@@@@@@@@ fe_open 3
@@@@@@@@@@@@ fe_open 0
@@@@@@@@@@@@ fe_open 1
@@@@@@@@@@@@ fe_open 2
@@@@@@@@@@@@ fe_open 3
@@@@@@@@@@@@ fe_open 0
@@@@@@@@@@@@ fe_open 1
@@@@@@@@@@@@ fe_open 2
@@@@@@@@@@@@ fe_open 3
ntpd: reply from 80.64.132.152: delay 0.271055 is too high, ignoring
_the box crashed here_
ntpd: timed out waiting for 80.64.132.152, reach 0xfe, next query in 129s
ntpd: timed out waiting for 80.64.132.152, reach 0xfc, next query in 135s
ntpd: timed out waiting for 80.64.132.152, reach 0xf8, next query in 259s
ntpd: timed out waiting for 80.64.132.152, reach 0xf0, next query in 264s

After the crash the network is unreachable, but the serial console still works.

Pinging the box itself (127.0.0.1 and 10.10.10.115) works, but pinging the gateway (10.10.10.1) does not.

Here is the log file with what I have tried:
http://imerwin.com/tmp/satip_serial_output-09.02.2016.log

@satip

satip commented Feb 10, 2016

Hi,

If it is a problem with the network card/driver, maybe we can get more information by modifying driver parameters in the kernel build:
https://www.kernel.org/doc/Documentation/networking/stmmac.txt

My box is running at the moment. I am now also watching some network parameters/statistics:
cat /sys/devices/platform/stmmaceth.1/net/eth0/carrier
1
cat /sys/devices/platform/stmmaceth.1/net/eth0/statistics/tx_carrier_errors
4
cat /sys/devices/platform/stmmaceth.1/net/eth0/statistics/tx_errors
4
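
Rather than reading the counters one at a time, all non-zero entries of the statistics directory can be listed in one pass. This is a minimal sketch: the default path is the one quoted above, while the function name `nonzero_counters` and the `STATS_DIR` variable are illustrative assumptions.

```shell
# Sketch only: the function and variable names are illustrative assumptions.
STATS_DIR=${STATS_DIR:-/sys/devices/platform/stmmaceth.1/net/eth0/statistics}

# Print every non-zero counter in $STATS_DIR as "name value", one per line.
nonzero_counters() {
    for f in "$STATS_DIR"/*; do
        [ -f "$f" ] || continue
        v=$(cat "$f")
        [ "$v" != "0" ] && echo "${f##*/} $v"
    done
    return 0
}
```

Saving the output before and after a lockup makes it easy to see which error counters moved.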

@perexg
Owner

perexg commented Feb 10, 2016

I agree, it looks like a network driver/hw issue.

You may try the 'ifconfig eth0 down ; ifconfig eth0 up' sequence to restart the network driver. Also, you may try to force 100M speed through ethtool (just for tests, or you may put these settings in the shell script /etc/sysconfig/localboot). I'll see whether I can backport some ethernet driver changes from the stlinux kernel.
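
The two suggestions above could look roughly like this. It is a sketch under assumptions: the function names and the `RUN` dry-run switch are made up for illustration, and the thread does not state which exact ethtool invocation was used to limit the speed. Restricting the advertised modes to 100baseT/Full (ethtool mask 0x008) keeps autonegotiation on, which avoids a duplex mismatch with the switch.

```shell
# Sketch only: function names and the RUN switch are illustrative assumptions.
# Set RUN=echo to print the commands instead of executing them.
RUN=${RUN:-}

# Restart network interface $1. Run this from the serial console, because a
# network login is cut off while the interface is down.
restart_iface() {
    $RUN ifconfig "$1" down && $RUN ifconfig "$1" up
}

# Limit interface $1 to 100 Mbit/s full duplex by advertising only
# 100baseT/Full (0x008) during autonegotiation.
force_100m() {
    $RUN ethtool -s "$1" autoneg on advertise 0x008
}
```

With `RUN=echo`, `force_100m eth0` only prints the command, which is a safe way to check it before adding it to a boot script.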

@Imerwin
Author

Imerwin commented Feb 11, 2016

Hi,
After the crash today I see more output:

NETDEV WATCHDOG: eth0 (stmmaceth): transmit queue 0 timed out
------------[ cut here ]------------
Badness at 812a5818 [verbose debug info unavailable]

Pid : 0, Comm: swapper
CPU : 0 Tainted: P (2.6.32.42_stm24_0208-idl4k_7108 #43)

PC is at dev_watchdog+0x2d8/0x320
PR is at dev_watchdog+0x2d8/0x320
PC : 812a5818 SP : 8141be64 SR : 40008001 TEA : c1a8d120
R0 : 00000044 R1 : 8141d420 R2 : 00000000 R3 : 00000000
R4 : 81421374 R5 : 00003fd3 R6 : ffffffff R7 : 82805b54
R8 : 9d1e9800 R9 : 8141be78 R10 : 8141d420 R11 : 814222e8
R12 : 82914ee8 R13 : 0000017c R14 : ffffffff
MACH: 005c5a40 MACL: 00000000 GBR : 29704470 PR : 812a5818

Call trace:
[<81345680>] preempt_schedule+0x0/0x60
[<81010068>] idl4k_switch_standby_led+0x28/0x40
[<81011fc6>] try_to_wake_up+0x166/0x1a0
[<8102254e>] mod_timer+0x2e/0x12c
[<8102261c>] mod_timer+0xfc/0x12c
[<810221dc>] run_timer_softirq+0xbc/0x1fc
[<812a5540>] dev_watchdog+0x0/0x320
[<8101d554>] __do_softirq+0x74/0x120
[<8101d64e>] do_softirq+0x4e/0x80
[<811cee20>] strlen+0x0/0x58
[<81002240>] default_idle+0x0/0xe0
[<8101da54>] irq_exit+0x34/0x60
[<81002b18>] do_IRQ+0x38/0x60
[<81007766>] ret_from_irq+0x0/0x1a
[<81002ae0>] do_IRQ+0x0/0x60
[<81078820>] quicklist_trim+0x0/0x150
[<81344f80>] schedule+0x0/0x5a0
[<81002240>] default_idle+0x0/0xe0
[<811cee20>] strlen+0x0/0x58
[<81002258>] default_idle+0x18/0xe0
[<8100236c>] cpu_idle+0x4c/0xa0
[<8145b7fc>] start_kernel+0x448/0x4e4
[<8145b154>] unknown_bootoption+0x0/0x1f0
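
When the serial console is captured to a file, the transmit-queue timeout reported at the top of this trace can be counted afterwards. A minimal sketch, assuming a captured log file; the function name `count_tx_timeouts` is hypothetical.

```shell
# Sketch only: the function name is an illustrative assumption.
# Print the number of transmit-queue watchdog timeouts for interface $2
# (default eth0) found in the captured console log $1.
# Note: like grep -c, this returns a non-zero status when the count is 0.
count_tx_timeouts() {
    grep -c "NETDEV WATCHDOG: ${2:-eth0} .*transmit queue .* timed out" "$1"
}
```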

If I manually disconnect and reconnect the network cable, this message appears:

PHY: 1:01 - Link is Down
PHY: 1:01 - Link is Up - 1000/Full

ifconfig eth0 down and then up doesn't help either.

When I change the network speed I see these messages in the dmesg output:

PHY: 1:01 - Link is Down
PHY: 1:01 - Link is Up - 100/Full
ILC: spurious interrupt demux 64
ILC: inputs status enabled used
ILC: 0- 31: 04000016 a8800000 a8800000
ILC: 32- 63: 00004000 f8080844 f8080844
ILC: 64- 95: 00000000 00002001 00002001
ILC: 96-127: 00000000 00000000 00000000
ILC: 128-159: 00000000 0ffffffe 0ffffffe
ILC: 160-191: 00000400 00000000 00000000
PHY: 1:01 - Link is Down
PHY: 1:01 - Link is Up - 1000/Full
ILC: spurious interrupt demux 64
ILC: inputs status enabled used
ILC: 0- 31: 04000016 a8800000 a8800000
ILC: 32- 63: 00004000 f8080844 f8080844
ILC: 64- 95: 00000000 00002001 00002001
ILC: 96-127: 00000000 00000000 00000000
ILC: 128-159: 00000000 0ffffffe 0ffffffe
ILC: 160-191: 00000400 00000000 00000000
/ #

I have now changed the network speed to 100M. Let's see how long the box keeps working.

http://imerwin.com/tmp/satip/satip_serial_output_11.02.2016.log

@satip

satip commented Feb 11, 2016

Hi,

Is there any possibility to compare limits/settings between the original firmware and satip-axe?
For example, by comparing "sysctl -a" and "cat /sys/module/stmmac/parameters/*".

I rebooted my box and changed some parameters to see what happens:
echo 512 > /sys/module/stmmac/parameters/dma_rxsize
echo 512 > /sys/module/stmmac/parameters/dma_txsize
echo 2048 > /sys/module/stmmac/parameters/buf_sz

I left the speed at 1 Gbit/s.

@perexg
Owner

perexg commented Feb 11, 2016

@satip: You should probably do an interface down / up sequence when you change these. Such parameters are usually read only when the network interface is initialized.
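
Writing the parameters and re-initializing the interface can be combined into one step. This is a sketch only: `apply_stmmac_params`, `PARAM_DIR` and the `RUN` dry-run switch are illustrative assumptions, and the parameter names are the ones quoted in this thread.

```shell
# Sketch only: function and variable names are illustrative assumptions.
PARAM_DIR=${PARAM_DIR:-/sys/module/stmmac/parameters}
# Set RUN=echo to print the ifconfig commands instead of executing them.
RUN=${RUN:-}

# Write each name=value pair to $PARAM_DIR, then cycle interface $1 so the
# driver picks the values up when it re-initializes.
# $1 = interface, remaining arguments = name=value pairs
apply_stmmac_params() {
    iface=$1
    shift
    for kv in "$@"; do
        echo "${kv#*=}" > "$PARAM_DIR/${kv%%=*}" || return 1
    done
    $RUN ifconfig "$iface" down && $RUN ifconfig "$iface" up
}

# Example (commented out; run it from the serial console on the box):
# apply_stmmac_params eth0 dma_rxsize=512 dma_txsize=512 buf_sz=2048
```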

@satip

satip commented Feb 11, 2016

thx

ifconfig eth0 down && ifconfig eth0 up
dmesg tells me:
stmmac: Rx Checksum Offload Engine supported
TX Checksum insertion supported
ADDRCONF(NETDEV_UP): eth0: link is not ready
PHY: 1:01 - Link is Up - 1000/Full
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

checking parameters again:
cat /sys/module/stmmac/parameters/dma_rxsize
512
cat /sys/module/stmmac/parameters/dma_txsize
512
cat /sys/module/stmmac/parameters/buf_sz
2048

@perexg
Owner

perexg commented Feb 11, 2016

Firmware with the updated stmmac driver from the latest stlinux kernel:
http://s000.tinyupload.com/index.php?file_id=00057549675274050513

Please, give me feedback.

Note: The updated driver has /sys/module/stmmac/parameters/wol_plus_en file (it does not exist in the old driver).

Note2: There's debugfs support:

mount -t debugfs debugfs /sys/kernel/debug
cat /sys/kernel/debug/stmmaceth/descriptors_status

@perexg
Owner

perexg commented Feb 12, 2016

I did some more hacking and here's a new package (beta for release 11):

http://s000.tinyupload.com/?file_id=78443875428972064634

List of changes:

https://github.com/perexg/satip-axe/blob/master/dist/README (upcoming)

Please, give me feedback.

I enabled the Realtek PHY initialization code in this kernel too (it probably won't affect anything, but who knows).

@perexg perexg added the bug label Feb 12, 2016
@perexg
Owner

perexg commented Feb 13, 2016

Unfortunately, the updated driver doesn't solve the locking issue. Trying the setup from @satip (added to the /etc/sysconfig/config file, see the echo lines):

#
# IPV4 config
#
#IPV4="dhcp"
# or
IPV4="192.168.1.200"
IPV4_NETMASK="255.255.255.0"
IPV4_BROADCAST="192.168.100.255"
IPV4_GATEWAY="192.168.100.1"
echo "512" > /sys/module/stmmac/parameters/dma_rxsize
echo "512" > /sys/module/stmmac/parameters/dma_txsize
echo "8192" > /sys/module/stmmac/parameters/buf_sz

@perexg
Owner

perexg commented Feb 13, 2016

Another lockup here... Trying:

ethtool -K eth0 sg off

@satip

satip commented Feb 15, 2016

Some information from my side:

Friend's box (SATIP-AXE server 201602121045-10):
~ # uname -av
Linux satip-axe 2.6.32.42_stm24_0208-idl4k_7108 #56 PREEMPT Fri Feb 12 10:45:32 CET 2016 sh4 GNU/Linux
~ # cat /sys/module/stmmac/parameters/dma_rxsize
256
~ # cat /sys/module/stmmac/parameters/dma_txsize
256
~ # cat /sys/module/stmmac/parameters/buf_sz
2048
~ # uptime
07:32:19 up 1 day, 6:57, 1 users, load average: 6.00, 6.00, 6.00

My box (SATIP-AXE server 201508061051-10):
~ # uname -av
Linux satip-axe 2.6.32.42_stm24_0208-idl4k_7108 #43 PREEMPT Thu Aug 6 10:51:35 CEST 2015 sh4 GNU/Linux
~ # cat /sys/module/stmmac/parameters/dma_rxsize
2048
~ # cat /sys/module/stmmac/parameters/dma_txsize
2048
~ # cat /sys/module/stmmac/parameters/buf_sz
2048
~ # uptime
07:27:35 up 3 days, 12:50, 1 users, load average: 6.99, 6.97, 6.81

I attached a serial console to both boxes, running "picocom" in a Linux "screen" session.
Still watching...

@Imerwin
Author

Imerwin commented Feb 15, 2016

Hi,
Yes, with the new driver my box crashed 4 or 5 times in one day, more than with the old driver.
Yesterday morning I added the new parameters, and the box has now been running for 24 hours without a crash.
With 100 Mbit the box doesn't crash either (24-hour test).

@perexg
Owner

perexg commented Feb 15, 2016

My testing: with scatter-gather off (ethtool -K eth0 sg off) I wasn't able to lock up the ethernet with the new fw in two days (two stress tests plus standard operation). I will make 'sg off' the default in the driver and also increase the DMA descriptor sizes to 512.
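
On firmware where scatter-gather is not yet off by default, the workaround can be applied and checked by hand. A minimal sketch: the function names and the `RUN` dry-run switch are assumptions, and `sg_state` assumes that `ethtool -k` prints a line starting with "scatter-gather:".

```shell
# Sketch only: function names and the RUN switch are illustrative assumptions.
# Set RUN=echo to print the command instead of executing it.
RUN=${RUN:-}

# Turn scatter-gather off on interface $1 (the command used in this thread).
disable_sg() {
    $RUN ethtool -K "$1" sg off
}

# Read `ethtool -k <iface>` output on stdin and print the scatter-gather
# state ("on" or "off") taken from the line starting with "scatter-gather:".
sg_state() {
    sed -n 's/^scatter-gather: *\([a-z]*\).*/\1/p'
}

# Example (commented out; needs the real interface):
# disable_sg eth0 && ethtool -k eth0 | sg_state
```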

@perexg
Owner

perexg commented Feb 16, 2016

For more recent pre-releases see issue #45. Scatter-gather is off by default in the kernel.

@CvH

CvH commented Feb 21, 2016

Even if it is not directly connected to this issue: with the new official fw 1.24.0.156 I get continuity errors as if I had disabled TCP on the axe (Raspberry Pi 2). It seems they changed something. The semi-official V1.0.0.146 works as intended.

@satip

satip commented Feb 22, 2016

Hi,

For me, you can close this bug.
Changing the network card parameters did the trick.
The satip box is up and running, with marginal data errors (see the pictures of recordings).
My box: 10 days uptime with the old firmware (201508061051-10)
http://s23.postimg.org/yn73nexej/data_errors_my_box.gif
(260 recordings in 10 days)

Friend's box: 5 days uptime with the new firmware (201602121045-10)
http://s7.postimg.org/5gzlsauuz/data_errors_friend.gif
(200 recordings in 5 days)

THX for support

@perexg
Owner

perexg commented Feb 22, 2016

Ok, closing - resolved in build 11 (issue #45).

@perexg perexg closed this as completed Feb 22, 2016