Provide adjustment of TX/RX delay on sun50i*/sun8iw11p1 through boot script? #546

Closed
ThomasKaiser opened this Issue Nov 18, 2016 · 22 comments

@ThomasKaiser
Collaborator
ThomasKaiser commented Nov 18, 2016 edited

Based on current testing, it seems different Pine64+ hardware revisions might work better with different TX/RX delay settings. Since we're able to set custom values via the fdt set command from the boot script, I thought I'd create this issue to provide a script for testing boards and to collect results.

Based on results collected we could then allow individual settings in /boot/armbianEnv.txt or maybe auto detect board revisions and set up things accordingly.

The script as follows:

#!/bin/bash
#
# Script intended to step through all TX/RX delay combinations. Should be called
# from e.g. /etc/rc.local with a short delay to ensure the network is already up:
# sleep 5 && /usr/local/bin/test-tx-rx.sh

TestPartner=192.168.83.146			# here 3 x 'iperf3 -s' must be running
TimeToTest=20						# how long should iperf3 run each
TX_File=/var/log/tx-value			# file containing actual tx value
RX_File=/var/log/rx-value			# file containing actual rx value
LogFile=/var/log/tx-rx.log			# result log
SourceDTS=/boot/new.dts				# source, must contain rx/tx set to 0!
TargetDTB=/boot/pine64-plus.dtb 	# target .dtb

Main() {
	CheckPrerequisits
	read TX <"${TX_File}"
	read RX <"${RX_File}"
	TX_Result=$(timeout -k $(( ${TimeToTest} + 2 )) $(( ${TimeToTest} + 1 )) ${TestScript} | awk -F" " '/sender$/ {printf ("%0.1f",$7/1000); print "\t"$9}' | sed 's/sender/0/')
	RX_Result=$(timeout -k $(( ${TimeToTest} + 2 )) $(( ${TimeToTest} + 1 )) ${TestScript} -R | awk -F" " '/sender$/ {printf ("%0.1f",$7/1000); print "\t"$9}' | sed 's/sender/0/')
	LoadAverage=$(cut -d' ' -f1 /proc/loadavg)	# 1-minute load average, independent of uptime's output format
	echo -e "$(printf "%2s" ${TX})/$(printf "%2s" ${RX}):\t${TX_Result}\t${RX_Result}\t${LoadAverage}" >>"${LogFile}"
	IncrementAndReboot
} # Main

CheckPrerequisits() {
	export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
	echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
	echo 1-2 >/proc/irq/$(awk -F":" "/1c30000.eth/ {print \$1}" </proc/interrupts | sed 's/\ //g')/smp_affinity_list
	echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus
	echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
	echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
	which iperf3 >/dev/null 2>&1 || apt-get -f -qq -y install iperf3
	which ethtool >/dev/null 2>&1 || apt-get -f -qq -y install ethtool
	which dtc >/dev/null 2>&1 || apt-get -f -qq -y install device-tree-compiler
	[[ -f "${TX_File}" ]] || echo -n 0 >"${TX_File}"
	[[ -f "${RX_File}" ]] || echo -n 0 >"${RX_File}"
	[[ -f "${LogFile}" ]] || echo -e "TX/RX\tTX\t\tRX\t\tload\ndelay:\tMb/s\tretry\tMb/s\tretry\t avg" >"${LogFile}"
	TestScript=$(mktemp /tmp/${0##*/}.XXXXXX)
	echo '#!/bin/bash' >${TestScript}
	echo -e "taskset 8 iperf3 -f k \$1 -c ${TestPartner} -t ${TimeToTest}" >>${TestScript}
	chmod 755 ${TestScript}
} # CheckPrerequisits

IncrementAndReboot() {
	let RX++
	if [ ${RX} -eq 8 ]; then
		RX=0
		let TX++
	fi
	if [ ${TX} -lt 8 ]; then
		# write stuff to files, patch .dtb, reboot
		echo -n ${TX} >"${TX_File}"
		echo -n ${RX} >"${RX_File}"
		HexTX="$(printf '<0x%x>;' ${TX})"
		HexRX="$(printf '<0x%x>;' ${RX})"
		sed -e "s/tx-delay = <0x0>;/tx-delay = ${HexTX}/" \
			-e "s/rx-delay = <0x0>;/rx-delay = ${HexRX}/" \
			<"${SourceDTS}" | dtc -I dts -O dtb -o "${TargetDTB}"
		sync
		reboot
	fi
} # IncrementAndReboot

Main "$@"

It requires another GbE-capable host, preferably on the same switch, running iperf3 -s; its IP address or name has to be set in the $TestPartner variable. This other host must exceed 930 Mbits/sec when tested with iperf3, so the first test should always be between two other GbE-capable devices to ensure $TestPartner is able to saturate a GbE link.

Both the $SourceDTS and $TargetDTB variables also have to be adjusted, and both delay values in $SourceDTS have to be set to these defaults (otherwise the replacement through sed will fail):

tx-delay = <0x0>;
rx-delay = <0x0>;
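
A quick pre-flight check can catch a source file that sed would silently leave unpatched. This is only a sketch: the function name dts_has_zero_delays is made up for illustration, and it assumes the source was obtained by decompiling the shipped blob with something like dtc -I dtb -O dts -o /boot/new.dts /boot/pine64-plus.dtb and then zeroing both delays by hand. Depending on the dtc version, the decompiled cells may be formatted differently (for example <0x00>), which is exactly the case this check is meant to flag.

```shell
# Return 0 only if the DTS file contains both zeroed delay properties
# exactly as the sed expressions in the test script expect them.
# dts_has_zero_delays is a hypothetical helper, not part of the script above.
dts_has_zero_delays() {
    grep -qF 'tx-delay = <0x0>;' "$1" && grep -qF 'rx-delay = <0x0>;' "$1"
}
```

In the test script this could be called early in CheckPrerequisits, e.g. dts_has_zero_delays "${SourceDTS}" || exit 1.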
@ThomasKaiser
Collaborator
ThomasKaiser commented Nov 18, 2016 edited

These are the results of a normal 2GB Pine64+ and a pre-release 1GB dev sample against a MacBook Pro connected to the same switch:

2 GB Pine64+ 'A64-DB-2GRev B 2016-03-21'

TX/RX   TX              RX
delay:  Mb/s    retry   Mb/s    retry
 0/ 0:  876     233     941     0
 0/ 1:  900     150     941     0
 0/ 2:  388     1346    148     0
 0/ 3:  312     1701    967     0
 0/ 4:  898     108     55.9    0
 0/ 5:
 0/ 6:
 0/ 7:
 1/ 0:  915     0       941     0
 1/ 1:  916     0       941     0
 1/ 2:  917     0       154     0
 1/ 3:  915     0       647     0
 1/ 4:  917     0       64.7    0
 1/ 5:
 1/ 6:
 1/ 7:
 2/ 0:  914     0       941     0
 2/ 1:  916     0       940     0
 2/ 2:  916     0       321     0
 2/ 3:  917     0       2.04    0
 2/ 4:  917     0       52.4    0
 2/ 5:  725     580     58.6    0
 2/ 4:
 2/ 5:
 2/ 6:
 2/ 7:
 3/ 0:  507     1104    931     0
 3/ 1:  477     1052    930     0
 3/ 2:  455     1125    252     0
 3/ 3:  484     1209    2.11    0
 3/ 4:  416     954     47.2    0
 3/ 5:          0.00    0
 3/ 6:
 3/ 7:
 4/ 0:  450     1318    937     0
 4/ 1:  437     1248    937     0
 4/ 2:  449     1234    348     0
 4/ 3:  444     1298    409     0
 4/ 4:  494     1346    65.4    0
 4/ 5:  23.3    106     52.4    0
 4/ 6:
 4/ 7:
 5/ 0:  344     1710    937     0
 5/ 3:  305     2083    4.06    0
 5/ 4:  413     1318    62.5    0
 5/ 5:  2.42    26      34.7    0
 5/ 6:
 5/ 7:
 6/ 0:  24.9    2072    908     0
 6/ 1:  2.32    555     936     0
 6/ 2:  12.5    1395    202     0
 6/ 3:  59.8    2707    6.36    0
 6/ 4:  11.0    704     61.7    0
 6/ 5:
 6/ 6:
 6/ 7:
 7/ 0:  353     1673    772     0
 7/ 1:  334     1554    934     0
 7/ 2:  359     1591    112     0
 7/ 3:  391     1765    2.34    0
 7/ 4:  398     1655    57.0    0
 7/ 5:
 7/ 6:
 7/ 7:


1 GB Pine64+ developer sample 'A64-DB-Rev B 2015-12-17'

TX/RX   TX              RX
delay:  Mb/s    retry   Mb/s    retry
 0/ 0:  710     642     935     0
 0/ 1:  746     566     937     0
 0/ 2:  692     645     937     0
 0/ 3:  655     710     413     0
 0/ 4:  705     633     3.57    0
 0/ 5:  256     297     52.4    0
 0/ 6:  6.95    4       52.4    0
 0/ 7:
 1/ 0:  704     605     933     0
 1/ 1:  707     633     938     0
 1/ 2:  770     517     938     0
 1/ 3:  705     621     938     0
 1/ 4:  676     628     42.2    0
 1/ 5:  723     618     116     0
 1/ 6:  5.66    15      42.7    0
 1/ 7:
 2/ 0:  697     634     937     0
 2/ 1:  683     598     934     0
 2/ 2:  721     591     938     0
 2/ 3:  719     601     938     0
 2/ 4:  710     550     3.47    0
 2/ 5:  725     580     58.6    0
 2/ 5:  894     150     86.9    0
 2/ 6:  45.5    19      52.4    0
 2/ 7:
 3/ 0:  878     231     938     0
 3/ 1:  867     274     938     0
 3/ 2:  863     276     937     0
 3/ 3:  881     272     608     0
 3/ 4:  864     273     59.1    0
 3/ 5:  882     279     112     0
 3/ 6:          52.4    0
 3/ 7:
 4/ 0:  906     37      939     0
 4/ 1:  907     41      938     0
 4/ 2:  916     49      931     0
 4/ 3:  916     38      792     0
 4/ 4:  902     32      5.89    0
 4/ 5:  913     37      155     0
 4/ 6:  411     152     52.4    0
 4/ 7:
 5/ 0:  851     312     937     0
 5/ 1:  873     296     936     0
 5/ 2:  853     322     936     0
 5/ 3:  853     337     296     0
 5/ 4:  875     271     4.27    0
 5/ 5:  692     221     86.3    0
 5/ 6:  407     147     52.4    0
 5/ 7:
 6/ 0:  881     257     938     0
 6/ 1:  886     274     937     0
 6/ 2:  880     264     937     0
 6/ 3:  853     311     392     0
 6/ 4:  872     292     8.56    0
 6/ 5:  871     278     152     0
 6/ 6:  79.1    59      52.4    0
 6/ 7:
 7/ 0:  247     2410    937     0
 7/ 1:  206     2985    937     0
 7/ 2:  211     3138    938     0
 7/ 3:  208     2862    909     0
 7/ 4:  250     2428    18.9    0
 7/ 5:  97.7    1080    61.8    0
 7/ 6:  6.48    102     52.4    0
 7/ 7:
@ThomasKaiser
Collaborator
ThomasKaiser commented Nov 18, 2016 edited

Next round of tests. This is exactly the same setup (cables, PSU, switch, SD card) as before, but now testing against a virtualized Ubuntu Xenial running in ESXi on a Mac Mini connected to the same switch. Only difference: the script now always correctly reports throughput in Mb/s and also logs the load average, in the hope of getting a clue whether unsuitable settings also influence CPU utilization (looking at the results, the approach sucks/failed):

2 GB Pine64+ 'A64-DB-2GRev B 2016-03-21'

TX/RX   TX              RX              load
delay:  Mb/s    retry   Mb/s    retry    avg
 0/ 0:  240.9   1310    903.4   6783    1.17
 0/ 1:  228.3   1275    906.0   6597    0.95
 0/ 2:  244.0   1284    723.7   14830   1.14
 0/ 3:  574.9   653     0.5     273     0.85
 0/ 4:  164.3   365     0.0     22      0.84
 0/ 5:                  1.69
 0/ 6:                  1.40
 0/ 7:                  0.83
 1/ 0:  471.4   840     910.0   2624    0.91
 1/ 1:  440.8   729     901.2   2757    0.70
 1/ 2:  448.4   748     750.8   12076   1.42
 1/ 3:  353.3   647     0.0     0       0.79
 1/ 4:  222.8   398     0.0     7       0.79
 1/ 5:  0.0     1       0.0     0       1.10
 1/ 6:                  1.39
 1/ 7:                  1.42
 2/ 0:  469.4   869     912.5   2559    1.28
 2/ 1:  361.2   695     909.9   2360    0.79
 2/ 2:  487.7   792     805.2   9164    0.80
 2/ 3:  427.9   823     7.1     2014    0.60
 2/ 4:  272.3   528     0.0     16      1.42
 2/ 5:                  0.73
 2/ 6:                  1.34
 2/ 7:                  0.66
 3/ 0:  380.2   705     907.1   2695    0.85
 3/ 1:  436.7   945     911.7   3060    1.31
 3/ 2:  423.3   952     798.9   10618   1.39
 3/ 3:  328.9   788     28.4    3558    0.75
 3/ 4:  125.2   364     0.0     0       0.81
 3/ 5:                  1.21
 3/ 6:                  1.57
 3/ 7:                  1.09
 4/ 0:  417.8   875     912.5   3365    0.93
 4/ 1:  438.5   1013    910.2   3183    1.14
 4/ 2:  380.3   532     795.7   10510   0.88
 4/ 3:  392.9   921     26.4    6253    1.08
 4/ 4:  82.7    231     0.0     7       0.74
 4/ 5:  0.1     4               0.81
 4/ 6:                  1.58
 5/ 1:  415.6   892     907.8   3743    1.34
 5/ 2:  394.2   848     776.4   11914   0.91
 5/ 3:  399.5   873     17.1    4259    0.87
 5/ 4:  81.3    232     0.0     9       0.74
 5/ 5:          0.0     0       1.22
 5/ 6:                  1.49
 5/ 7:                  1.20
 6/ 0:  391.0   876     905.5   3456    1.33
 6/ 1:  385.2   970     910.5   3636    1.47
 6/ 2:  396.8   894     746.2   12133   1.55
 6/ 3:  270.7   647     0.7     336     0.75
 6/ 4:  69.9    205     0.0     7       0.72
 6/ 5:                  1.37
 6/ 6:                  0.98
 6/ 7:                  1.63
 7/ 0:  404.6   942     921.9   3662    0.89
 7/ 1:  415.3   1022    925.1   3710    0.52
 7/ 2:  397.1   1027    890.4   4332    1.05
 7/ 3:  398.1   942     20.1    4991    0.69
 7/ 4:  201.9   534     0.0     0       0.85
 7/ 5:          0.0     0       0.67
 7/ 6:                  1.33
 7/ 7:                  1.62

1 GB Pine64+ developer sample 'A64-DB-Rev B 2015-12-17'

TX/RX   TX              RX              load
delay:  Mb/s    retry   Mb/s    retry    avg
 0/ 0:  894.0   162     937.7   775     0.98
 0/ 1:  769.4   287     934.7   1021    1.37
 0/ 2:  812.3   280     931.2   1103    1.59
 0/ 3:  781.2   332     853.5   4732    1.40
 0/ 4:  823.4   245     443.3   23842   0.86
 0/ 5:  872.0   265     0.1     91      1.26
 0/ 6:          0.0     0       0.48
 0/ 7:                  1.29
 1/ 0:  803.0   332     921.1   1363    0.92
 1/ 1:  730.1   361     917.2   1321    1.42
 1/ 2:  664.3   286     915.4   1259    1.69
 1/ 3:  802.9   310     899.2   1945    1.02
 1/ 4:  787.9   328     27.8    5721    0.57
 1/ 5:  780.6   395     0.1     101     1.00
 1/ 6:  16.0    19      0.0     0       1.00
 1/ 7:                  1.01
 2/ 0:  732.4   326     926.9   1042    0.61
 2/ 1:  794.4   316     917.8   1382    1.28
 2/ 2:  703.7   314     920.7   1450    1.09
 2/ 3:  698.4   324     849.0   4807    0.71
 2/ 4:  713.2   323     136.9   16056   1.30
 2/ 5:  384.7   227     0.0     6       0.97
 2/ 6:  0.2     4       0.0     0       1.32
 2/ 7:                  0.66
 3/ 0:  737.8   369     910.9   1589    1.55
 3/ 1:  711.5   395     914.4   1706    1.18
 3/ 2:  641.4   327     913.9   1618    1.08
 3/ 3:  670.7   353     905.6   1918    1.25
 3/ 4:  793.3   394     35.8    7256    1.06
 3/ 5:  607.7   330     0.1     107     0.68
 3/ 6:  13.4    32      0.0     0       0.91
 3/ 7:                  1.78
 4/ 0:  772.9   327     916.5   1664    1.64
 4/ 1:  754.8   345     917.2   1694    1.68
 4/ 2:  798.9   339     912.6   1860    1.27
 4/ 3:  756.2   401     912.8   1829    1.21
 4/ 4:  699.9   409     5.0     1545    1.34
 4/ 5:  405.2   190     0.0     27      1.07
 4/ 6:  0.2     2       0.0     0       0.77
 5/ 1:  774.3   339     918.3   1298    1.42
 5/ 2:  849.1   237     925.2   1409    1.63
 5/ 3:  832.8   262     912.7   1710    1.10
 5/ 4:  816.1   304     29.8    6107    1.12
 5/ 5:  802.6   315     0.1     83      1.24
 5/ 6:  42.2    62      0.0     0       0.74
 5/ 7:                  1.65
 6/ 0:  798.0   375     917.5   1793    1.23
 6/ 1:  768.9   347     907.7   1739    1.75
 6/ 2:  776.8   419     924.3   1443    1.48
 6/ 3:  784.7   332     908.2   1914    1.67
 6/ 4:  669.3   330     44.1    8266    1.03
 6/ 5:  754.8   402     0.1     124     0.88
 6/ 6:  38.6    32              0.62
 6/ 7:                  1.36
 7/ 0:  806.8   312     907.4   1626    1.15
 7/ 1:  674.9   374     920.6   1661    1.52
 7/ 2:  812.1   316     916.4   1580    1.06
 7/ 3:  774.7   343     856.3   4099    1.12
 7/ 4:  705.9   401     40.9    7523    0.95
 7/ 5:  78.0    30      0.0     9       1.12
 7/ 6:  37.2    32      0.0     0       0.53
 7/ 7:                  1.56

I fail to interpret the results (partially better 'performance' and also higher count of re-transmits). Needs more testing.
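
One way to make tables like these easier to compare is to boil every complete row down to its weaker direction and the total retransmit count, then sort. A rough sketch, assuming the log layout the script writes (delay pair, TX Mb/s, TX retries, RX Mb/s, RX retries, optional load average). rank_delays is a hypothetical helper name, and ranking by the slower direction is just one possible criterion, not a verdict on which setting is best.

```shell
# Rank complete result rows by the slower of the two directions.
# Output columns (tab separated): TX/RX pair, min(TX, RX) in Mb/s, total retries.
# Header lines and rows with missing measurements are skipped.
rank_delays() {
    LC_ALL=C awk 'NF >= 6 && $3 ~ /^[0-9.]+$/ && $5 ~ /^[0-9.]+$/ {
        pair = $1 $2                 # " 3/ 0:" splits into "3/" and "0:"
        sub(/:$/, "", pair)
        worst = ($3 + 0 < $5 + 0) ? $3 : $5
        printf "%s\t%.1f\t%d\n", pair, worst, $4 + $6
    }' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k3,3n
}

# Example: rank_delays /var/log/tx-rx.log | head -n 5
```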

@ThomasKaiser
Collaborator
ThomasKaiser commented Nov 18, 2016 edited

Almost forgot: the main 'consumer' of this stuff might be the Banana Pi M2 Ultra, since it also uses the 3.10 Allwinner BSP kernel and will most probably ship with broken settings.

@ThomasKaiser changed the title from "Provide adjustment of TX/RX delay on sun50i* through boot script?" to "Provide adjustment of TX/RX delay on sun50i*/sun8iw11p1 through boot script?" on Nov 19, 2016
@pfeerick
pfeerick commented Nov 19, 2016 edited

Did you have any issues with the pine64 hitting the 'call trace' errors that are documented in #502 when running this script? Could it be related to the jessie build rather than the xenial? (I had intended to run the xenial build but it looks like my nice new evo SD is a fake and is knackered, so fell back to a reliable sandisk w/ jessie). I keep hitting them every couple of reboots, and a manual reset seems to kick it along again... Oh, and good thing you never claimed this would improve performance... I don't like the "[ 5] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes" type results :-P

Battery + clean 5v input, no serial or peripherals connected, pine64+ 1GB

@ThomasKaiser
Collaborator
ThomasKaiser commented Nov 19, 2016 edited

Did you have any issues with the pine64 hitting the 'call trace' errors that are documented in #502 when running this script?

Nope. While testing the two Pine64+ rebooted a few hundred times but never ran into this issue. I used a Xenial build from 2 days ago but I don't think this is distro specific. Might sound like a lame excuse but when our support for a board leaves beta state most problems from then on are related to SD card or insufficient power supply.

This whole testing is more about getting an idea how to support new GbE boards but it might also help with Pine64+ since the defaults we now use (3/0 TX/RX) might not be the best for all board revisions. But for this more numbers are needed.

@zador-blood-stained
Collaborator

@pfeerick
Are you using the kernel from main Armbian repository or one from beta.armbian.com? This lockup issue should be solved with the newest kernel.

@pfeerick
pfeerick commented Nov 19, 2016 edited

@ThomasKaiser : No, not lame as it's the most common reason... but I don't believe it to be the case this time - I was using good power and a good microSD, so I was wondering if there was something that could be distro specific; otherwise I'd think it's a glitch with the board or something I'd done with the jessie image :-/

@zador-blood-stained I'd run an apt-get update && apt-get upgrade just before running this, so as far as I knew I was running the latest 'release' kernel. I'll try the beta one to see if that makes any difference.

@pfeerick
pfeerick commented Nov 20, 2016 edited

@zador-blood-stained Well, I moved to the beta repo and let that update the kernel and a couple other packages, and only had one manual reset intervention... so I can also report that the newer kernel seems much less hang-prone-at-boot.

@ThomasKaiser Here is some of that lovely data you like ;) The only variation I made from your testing schedule was to increase the delay before the script ran... the Ethernet link did not always appear to be up when the script was run.

1 GB Pine64+  'A64-DB-Rev B 2016-02-24' 
running Armbian Jessie 5.24 Beta with a GbE modem/router 
and GbE desktop running Ubuntu 16.04

TX/RX   TX              RX              load
delay:  Mb/s    retry   Mb/s    retry    avg
 0/ 0:  918.0   0       941.7   1       0.39
 0/ 1:  917.8   0       941.0   0       0.45
 0/ 2:  918.4   0       941.0   5       0.25
 0/ 3:  918.7   0       9.4     2189    0.68
 0/ 4:  917.8   0       1.0     451     0.19
 0/ 5:  917.6   0       0.0     0       0.25
 0/ 6:                  0.17
 0/ 7:                  0.26
 1/ 0:  9.8     0       939.8   0       0.36
 1/ 1:  917.4   0       939.9   0       0.50
 1/ 2:  917.4   0       940.2   41      0.55
 1/ 3:  239.1   0       871.8   2171    0.33
 1/ 4:  918.0   0       0.2     169     0.20
 1/ 5:  918.2   0       0.0     9       0.20
 1/ 6:                  0.42
 1/ 7:                  0.16
 2/ 0:  917.2   0       941.0   0       0.52
 2/ 1:  917.3   0       941.5   0       0.52
 2/ 2:  917.7   0       939.8   4       0.61
 2/ 3:  917.9   0       13.8    2926    0.29
 2/ 4:  918.0   0       0.1     106     0.30
 2/ 5:  917.5   43      0.0     7       0.33
 2/ 6:                  0.06
 2/ 7:                  0.11
 3/ 0:  917.9   0       941.5   2       0.41
 3/ 1:  917.3   1       940.8   0       0.46
 3/ 2:  918.0   0       938.1   962     0.40
 3/ 3:  73.8    0       851.4   3381    0.33
 3/ 4:  918.4   0       0.2     140     0.41
 3/ 5:  439.9   0       0.0     7       0.20
 3/ 6:                  0.16
 3/ 7:                  0.16
 4/ 0:  908.1   49      941.8   0       0.77
 4/ 1:  917.2   33      941.5   0       0.46
 4/ 2:  914.3   6       940.8   23      0.38
 4/ 3:  918.9   71      8.9     2085    0.24
 4/ 4:  914.3   2       0.1     148     0.28
 4/ 5:  0.1     10      0.0     0       0.20
 4/ 6:                  0.26
 4/ 7:                  0.16
 5/ 0:  852.2   724     940.6   0       0.40
 5/ 1:  427.3   1906    940.4   0       0.30
 5/ 2:  817.9   722     939.9   18      0.38
 5/ 3:  598.1   1435    59.0    7221    0.17
 5/ 4:  830.6   652     0.4     240     0.30
 5/ 5:                  0.24
 5/ 6:                  0.16
 5/ 7:                  0.21
 6/ 0:  40.7    2008    941.2   0       0.32
 6/ 1:  70.4    3045    941.1   1       0.39
 6/ 2:  25.4    1370    940.7   2       0.45
 6/ 3:  54.9    1980    634.6   6958    0.48
 6/ 4:  24.9    431     0.4     210     0.17
 6/ 5:  2.8     160     0.0     21      0.91
 6/ 6:                  0.21
 6/ 7:                  0.16
 7/ 0:  4.6     465     940.1   0       0.42
 7/ 1:  2.0     302     938.5   2       0.35
 7/ 2:  0.5     162     937.0   92      0.27
 7/ 3:  0.0     12      0.0     0       0.33
 7/ 4:  0.2     55      0.5     220     0.15
 7/ 5:  0.0     9       0.0     0       0.20
 7/ 6:                  0.21
 7/ 7:                  0.16
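
Regarding the longer start-up delay mentioned above: instead of a fixed sleep, the script could wait until the link is actually up. A minimal sketch, assuming the interface is eth0 and that polling the carrier attribute in sysfs once per second is good enough. wait_for_link and its optional third argument are hypothetical and only exist for this illustration.

```shell
# Wait until an interface reports carrier, or give up after a timeout.
# $1: interface name, $2: timeout in seconds (default 30),
# $3: sysfs net directory (default /sys/class/net, overridable for testing)
wait_for_link() {
    iface=$1
    remaining=${2:-30}
    carrier="${3:-/sys/class/net}/${iface}/carrier"
    while [ "$remaining" -gt 0 ]; do
        # carrier reads 1 once the link is up; the read fails while the
        # interface is still administratively down, hence 2>/dev/null
        if [ "$(cat "$carrier" 2>/dev/null)" = "1" ]; then
            return 0
        fi
        sleep 1
        remaining=$(( remaining - 1 ))
    done
    return 1
}
```

From /etc/rc.local this might look like: wait_for_link eth0 30 && /usr/local/bin/test-tx-rx.sh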
@ThomasKaiser
Collaborator

Just a quick note: with latest beta image I ran today already two times into this 'call trace' issue and booting failed. And old OS image is overwritten :(

@ThomasKaiser
Collaborator
ThomasKaiser commented Nov 20, 2016 edited

Some results with the 1GB dev sample against the virtualized Linux (compared with these results before) and Pine Inc's gmactxonly 'fix':

TX/RX   TX              RX              load
delay:  Mb/s    retry   Mb/s    retry    avg
 0/ 0:  15.5    634     797.5   27469   1.04
 0/ 1:  35.3    1250    817.3   28292   1.55
 0/ 2:  30.8    1091    796.7   27242   1.61
 0/ 3:  33.5    1117    804.8   27795   1.28
 0/ 4:  3.5     0               2.09
 0/ 5:  5.6     266     0.1     115     0.96
 0/ 6:  0.1     4               0.94
 0/ 7:  3.1     0               1.51
 1/ 0:  19.9    796     804.0   27570   1.13
 1/ 1:  35.8    1201    806.6   27484   1.54
 1/ 2:  29.0    1120    799.7   27333   1.76
 1/ 3:                  1.26
 1/ 4:  35.4    1214    474.2   24700   0.62
 1/ 5:  32.2    1140    4.1     1272    0.66
 1/ 6:  21.6    763     0.0     0       0.64
 1/ 7:                  0.83
 2/ 0:  19.5    692     801.0   26917   0.67
 2/ 1:  33.5    1168    799.8   26842   0.87
 2/ 2:                  1.38
 2/ 3:                  1.12
 2/ 4:                  1.26
 2/ 5:  26.7    925     0.1     81      0.62
 2/ 6:                  1.13
 2/ 7:                  0.60
 3/ 0:  25.0    889     800.2   26962   1.06
 3/ 1:  26.1    994     807.0   27375   0.70
 3/ 2:  24.7    945     819.5   27687   0.50
 3/ 3:  23.1    898     823.7   27662   1.00
 3/ 4:                  0.76
 3/ 5:  26.6    1001    0.3     184     0.40
 3/ 6:                  0.87
 3/ 7:                  1.03
 4/ 0:  18.1    629     793.1   26823   0.71
 4/ 1:  25.6    939     800.4   27154   0.84
 4/ 2:  21.7    810     785.9   26567   0.76
 4/ 3:  35.0    1193    794.4   27010   0.68
 4/ 4:  17.1    667     715.2   25268   0.60
 4/ 5:                  1.26
 4/ 6:  7.2     259     0.0     0       0.83
 4/ 7:                  0.82
 5/ 0:  31.9    1096    796.7   27109   0.88
 5/ 1:  33.0    1137    798.0   27099   0.86
 5/ 2:  27.2    925     813.8   27388   0.80
 5/ 3:  25.3    904     779.6   25801   0.78
 5/ 4:  28.2    1010    745.7   26535   0.85
 5/ 5:  26.2    847     13.7    2914    0.68
 5/ 6:  2.5     79      0.0     7       0.46
 5/ 7:                  0.60
 6/ 0:                  1.51
 6/ 1:  24.4    920     789.7   26476   0.93
 6/ 2:  6.2     356     802.6   27213   0.79
 6/ 3:  30.0    1052    794.1   26625   0.68
 6/ 4:  23.1    939     400.2   22736   0.63
 6/ 5:  25.9    913     12.2    2589    0.69
 6/ 6:  18.2    705     0.0     19      0.56
 6/ 7:                  0.95
 7/ 0:  26.2    960     789.3   26752   0.79
 7/ 1:  26.1    1003    796.7   27197   0.62
 7/ 2:  33.1    1187    803.2   27413   0.76
 7/ 3:  30.2    1149    805.6   27449   0.51
 7/ 4:  30.0    1038    739.5   26736   0.93
 7/ 5:  17.8    797     13.8    2855    0.79
 7/ 6:                  1.07
 7/ 7:                  0.71

So this 'fix' is not worth considering, since performance gets horribly low compared to the numbers above. Maybe only for those defective boards. But I still consider defective hardware defective: the remedy should be a replacement or a refund, not a software workaround.

@ThomasKaiser
Collaborator
ThomasKaiser commented Nov 23, 2016 edited

If this here has any meaning then Olimex' A64 board will use 6/0 TX/RX delay. OTOH the dmesg file in the same dir shows only Fast Ethernet negotiation.

@zador-blood-stained
Collaborator

So we may add 2 environment variables to pine64 boot script, so if gmac-tx-delay and/or gmac-rx-delay is set in /boot/armbianEnv.txt, these values will be patched in DT.

This will leave default values in DT, but will give end users a relatively easy way to adjust things.
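
For illustration, a userspace sanity check of such settings could look like the sketch below. It assumes /boot/armbianEnv.txt holds plain key=value lines (e.g. gmac-tx-delay=3 and gmac-rx-delay=0) and takes the upper bounds from the register field widths quoted later in this thread (TX in bits [12:10], so 0 to 7; RX in bits [9:5], so 0 to 31). get_delay is a hypothetical helper and says nothing about how the boot script itself handles these variables.

```shell
# Print the value of a key from a key=value file if it is an integer
# within [0, max]; print nothing and return 1 otherwise.
# $1: key, $2: maximum allowed value, $3: env file
get_delay() {
    value=$(sed -n "s/^$1=//p" "$3" | tail -n 1)   # last assignment wins
    case $value in
        ''|*[!0-9]*) return 1 ;;                   # unset or not a plain number
    esac
    [ "$value" -le "$2" ] || return 1
    echo "$value"
}

# Example: tx=$(get_delay gmac-tx-delay 7 /boot/armbianEnv.txt) || echo "TX delay unset or invalid"
```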

@ThomasKaiser
Collaborator

@zador-blood-stained yes, this would be great. If I understood @apritzel correctly then same parameters also help improving GbE performance/stability with mainline kernel.

Finding optimal default values can be postponed for now and maybe this tweak will help here too.

@zador-blood-stained
Collaborator
zador-blood-stained commented Nov 29, 2016 edited

If I understood @apritzel correctly then same parameters also help improving GbE performance/stability with mainline kernel.

Once we have final or close to final Ethernet driver and DT bindings for it we will be able to handle it. For now let's deal with legacy stuff that won't be changed anytime soon.

@zador-blood-stained
Collaborator

U-boot part was implemented in 99d50c6, and if you think that this is enough (I mean no extra userspace scripts or armbianmonitor parameters to edit armbianEnv.txt) then this issue can be closed.

@ThomasKaiser
Collaborator

Thank you very much! Closed since igorpecovnik/lib.docs@287affb

@apritzel
apritzel commented Dec 1, 2016

Btw, if you need to do more tests with the RX/TX delay, I can provide you with some magic runes to change the delays at runtime from Linux userland, so without the need to tweak DT & reboot. Just let me know and I can work out the commands.

@ThomasKaiser
Collaborator

This would be great! Since I would assume it's the same for H5 please ping also ErwinH in IRC and point to the discussion/commands. :)

@apritzel
apritzel commented Dec 2, 2016 edited

So I hacked my "peekpoke" tool yesterday and it really worked that way:
# ./peekpoke -b 0x1c00000 B[12:10].l 0x30 1 B[9:5].l 0x30 2
sets TX delay to 1 and RX delay to 2. Don't do this when connected via ssh ;-)
An iperf3 started immediately after that shows the expected behaviour.
Since this tool pokes into system memory, I need to double check that it's doing the right thing, so I will publish something later tonight.

@apritzel
apritzel commented Dec 4, 2016

Alright, I created https://github.com/apritzel/peekpoke, which is my version of devmem2. I found the latter broken for 64-bit, also I wanted to issue multiple requests in one go (to access devices), so I came up with this. No README so far, but calling it with -h gives some info.
For the TX/RX delay setting use case, the command line is:
# ./peekpoke -b 0x1c00000 B[12:10].l 0x30 1 B[9:5].l 0x30 2 r.l 0x30
This should be valid for all H3/A64/H5 SoCs. Compare section 3.5.3.2 (EMAC clock register) from the A64 user manual for a reference to the magic bits.
In this example, the "1" is TX delay, the "2" is RX delay.

DISCLAIMER: Please use at your own risk, as this is open-heart surgery, poking in physical memory and accessing devices directly. Watch for typos, they could be fatal. Make sure you saved all your precious data.

The EMAC driver's open() routine resets the delay values to the ones from the DT; I believe ifconfig down/up would trigger this.
Just use ./peekpoke r.l 0x1c00030 to read the value and compare this with the tweaked output from above to make sure your settings are actually in use and haven't been overwritten.

@tkaiser: if you are really sick, you can even tweak the CPU clock with this tool, by adapting the magic runes I gave you once for U-Boot.

@ThomasKaiser referenced this issue in OrangePiLibra/OrangePi_H5SDK on Dec 5, 2016: Gigabit Ethernet performance issue #4 (Open)

@ThomasKaiser
Collaborator

@apritzel Thanks for this, did iperf3 tests with mainline kernel on 2 Pine64+ (1 GB pre-production dev sample and 2 GB normal version). Best tx-delay/rx-delay values are still 2/0 or 3/0 so I would propose to add 3/0 to DT and send it upstream.

Small note on measurements: I didn't go through all possible values but only tested worst case (rx-delay=7 -- immediately lost network connectivity) and then tx-delay from 0 to 6 while rx-delay was set to 0. With 2/0 or 3/0 iperf3 numbers are 900 Mbits/sec TX (~30 retransmits per second) and 940 Mbits/sec RX. That's perfect given that I just use these 'tunables':

/boot/boot.cmd:

mw.l 0x1c2005c 1
mw.l 0x1c20000 0x80001110

/etc/rc.local:

echo 2-3 >/proc/irq/$(awk -F":" "/1c30000.eth/ {print \$1}" </proc/interrupts | sed 's/\ //g')/smp_affinity_list
echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
/usr/local/bin/peekpoke -b 0x1c00000 B[12:10].l 0x30 2 B[9:5].l 0x30 0 r.l 0x30

Every iperf3 call was prefixed with taskset 8 to ensure that Ethernet IRQs and iperf3 were running on different CPU cores (using taskset 2 together with the above settings for example leads to a drop of more than 250 Mbits/sec)
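
Since taskset takes a hexadecimal CPU mask while smp_affinity_list takes CPU numbers, it can help to translate one into the other when checking which cores are involved. A small sketch; mask_to_cpus is a made-up helper name, and it only covers masks that fit into the shell's integer arithmetic.

```shell
# Convert a taskset-style hexadecimal CPU mask (with or without a 0x prefix)
# into a comma separated list of CPU numbers.
mask_to_cpus() {
    hex=${1#0x}
    mask=$(( 0x$hex ))
    cpu=0
    list=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            list="${list:+$list,}$cpu"   # append, inserting a comma when needed
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$list"
}

mask_to_cpus 8    # prints 3 (the core used for iperf3 above)
mask_to_cpus 2    # prints 1
```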

@ThomasKaiser
Collaborator

Just use ./peekpoke r.l 0x1c00030 to read the value

Just to save others some time. It should read instead (and works as expected!):

root@pine64:~# peekpoke -b 0x1c00000 r.l 0x30
0x50806
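
To check that such a readback matches what was written, the two delay fields can be pulled out of the register value with plain shell arithmetic. This is a sketch that relies only on the bit positions from the peekpoke command line in this thread (TX delay in bits [12:10], RX delay in bits [9:5]); decode_delays is a hypothetical helper name.

```shell
# Decode the delay fields from the EMAC clock register value.
# $1: register value, hexadecimal (0x...) or decimal
decode_delays() {
    reg=$(( $1 ))
    tx=$(( (reg >> 10) & 0x7 ))     # 3-bit field, bits [12:10]
    rx=$(( (reg >> 5) & 0x1f ))     # 5-bit field, bits [9:5]
    echo "tx-delay=${tx} rx-delay=${rx}"
}

decode_delays 0x50806    # prints: tx-delay=2 rx-delay=0
```

For the value shown above this gives TX 2 and RX 0, which matches the 2/0 setting written from /etc/rc.local.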