Pi 5 HAT: Radxa Penta SATA HAT #615

Open
geerlingguy opened this issue Mar 25, 2024 · 59 comments

@geerlingguy
Owner

Radxa sells an updated version of their Penta SATA HAT for $45. It includes four SATA drive connectors plus one edge connector for a 5th drive, 12V power inputs (Molex or barrel jack) to power both the drives and the Pi 5 via GPIO, a cable for the 5th drive, an FFC cable to connect the HAT to the Pi 5, and screws for mounting.

radxa-penta-hat

It looks like the SATA controller is a JMB585 PCIe Gen 3x2 SATA controller, so it could benefit from running the Pi 5's PCIe lane at Gen 3.0 speeds (setting dtparam=pciex1_gen=3 in /boot/firmware/config.txt). Radxa sent me a unit for testing.

@geerlingguy
Owner Author

It's on the site now: https://pipci.jeffgeerling.com/hats/radxa-penta-sata-hat.html

I'll be testing and benchmarking soon!

@geerlingguy
Owner Author

Some usage notes:

  • I had to add dtparam=pciex1 to /boot/firmware/config.txt to get the HAT to be recognized
  • I could also run it at PCIe Gen 3.0 speeds with dtparam=pciex1_gen=3 (see the config snippet after this list)
  • To get the HAT to fit on top of the Pi 5 with an active cooler, I had to use needle-nose pliers to break off the tops of the three heat sink fins in the corner closest to the Pi's USB-C port. Otherwise the barrel jack would hit the tops of those heat sink fins and the HAT would not make full contact with the GPIO pins
  • I could get 800+ MB/sec at Gen 3.0 speeds with an array of four Samsung 8TB QVO SSDs
  • I could get 74 MB/sec writing to a RAIDZ1 array over Samba (using OMV)
  • I could get 97 MB/sec writing to a RAID 0 array over Samba (using bare Linux)
  • I could get 122 MB/sec reading from either array over Samba on the Pi's built-in 1 Gbps network interface
  • I could get 240 MB/sec reading from either array over Samba on a HatNET! 2.5G adapter from Pineberry Pi (this was plugged into a HatBRICK! Commander PCIe Gen 2.0 switch, which had one port to the 2.5G HAT, and one to the Radxa Penta SATA HAT)
  • Idle power consumption for the setup with just the Penta SATA HAT was 6W
  • Idle power consumption for the setup including the PCIe switch and 2.5G NIC was 8W
  • Power consumption during disk read/write operations over the network was between 8-16W
  • Peak power consumption while ZFS was doing some sort of cleanup operation or compression was 24W
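
For reference, here's roughly what those two config lines look like in /boot/firmware/config.txt (a minimal sketch; everything else in the file is left at the Pi OS defaults):

# enable the external PCIe connector, and force the link to Gen 3.0 speeds
dtparam=pciex1
dtparam=pciex1_gen=3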

@geerlingguy
Owner Author

One concern could be heating—the JMB585 SATA controller chip hit peaks of 60+°C in my testing:

radxa-penta-sata-hat-22

There is an official fan/OLED board, and that seems like it would be a wise choice for this build. It seems to also require the matching case, which has been announced but isn't available anywhere right now. See: https://forum.radxa.com/t/penta-sata-hat-is-now-available/20378

@geerlingguy
Owner Author

And here's an illustration of the three heatsink fins I had to break off to get the HAT to fit:

radxa-penta-sata-hat-13

@geerlingguy
Owner Author

geerlingguy commented Apr 3, 2024

I've also been monitoring IRQs and CPU affinity while doing network copies—the writes, specifically—and nothing really jumps out and suggests a bottleneck there (I'm reminded of this old Raspberry Pi linux issue):

Screenshot 2024-04-03 at 10 42 28 AM

This was in the middle of a 50 GB folder copy to a ZFS array. It is averaging 70 MB/sec or so, which is a fair bit less than line speed over the gigabit connection :(

@ThomasKaiser had suggested over in the Radxa forum there could be some affinity issues with networking on the Pi 5, but I don't see that via atop at least...

@geerlingguy
Owner Author

geerlingguy commented Apr 3, 2024

Monitoring the CPU frequency with vcgencmd measure_clock arm, I do see it dipping down now and then, but mostly staying stable at 2.4 GHz (frequency(0)=2400033792). I will try the performance governor and see if that gives any more consistent write speeds over the network.

I rebooted with force_turbo=1 in /boot/firmware/config.txt, and performed another copy (confirming the frequency was pegged at 2.4 GHz the whole time)... no difference. Still averaging around 70 MB/sec.
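
For anyone reproducing this, a minimal way to keep an eye on the governor and clock during a copy (an assumption worth stating: the watch utility is installed, and on the Pi 5 all four cores should share the single policy0 cpufreq policy):

# switch to the performance governor (reverts on reboot)
echo performance | sudo tee /sys/devices/system/cpu/cpufreq/policy0/scaling_governor

# poll the ARM core clock once per second while the copy runs
watch -n 1 vcgencmd measure_clock arm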

Here's htop as well:

Screenshot 2024-04-03 at 10 58 04 AM

And btop since it's pretty and shows similar data to atop in a more pleasing layout:

Screenshot 2024-04-03 at 11 01 54 AM

@geerlingguy
Owner Author

geerlingguy commented Apr 3, 2024

I also tried NFS instead of Samba, by enabling it, creating a share 'shared', and connecting via Finder at nfs://10.0.2.214/export/shared (I had to check on the Pi what the exports from OMV were, using showmount -e localhost).

The copy was more stable around 82 MB/sec, but still no sign of a clear bottleneck in atop.

Screenshot 2024-04-03 at 11 21 47 AM

Unlike Samba, it looked like the nfsd process was pinned to CPU0, and atop showed the IRQ affinity was all on core 0, which still seemed to have plenty of headroom: IRQ % never topped 20%, and CPU core0 usage stayed under 25% as well (the full CPU never went above 50% during the copy):

Screenshot 2024-04-03 at 11 20 05 AM

I'm also going to try enabling compression in the ZFS pool, since it seems like I have plenty of CPU on the Pi to handle it, and that can actually speed up writing through to the disk (though I don't think that's the bottleneck at all... just something that's easy and quick to test).
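
For anyone following along, toggling and checking compression is a one-liner each; a minimal sketch, assuming the tank/shared dataset used elsewhere in this issue:

# enable lz4 compression for new writes on the dataset
sudo zfs set compression=lz4 tank/shared

# confirm the property and the observed compression ratio
zfs get compression,compressratio tank/shared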

Result: ZFS compression seems to make no difference. There's some more ZFS process CPU consumption, but overall the speed averages around the same 70 MB/sec...

Screenshot 2024-04-03 at 11 40 55 AM

@ThomasKaiser

What about

echo 1 >/sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
echo default > /sys/module/pcie_aspm/parameters/policy

@geerlingguy
Owner Author

pi@pi-nas:~ $ sudo su
root@pi-nas:/home/pi# echo 1 >/sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
root@pi-nas:/home/pi# echo default > /sys/module/pcie_aspm/parameters/policy
root@pi-nas:/home/pi# cat /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
1
root@pi-nas:/home/pi# cat /sys/module/pcie_aspm/parameters/policy
[default] performance powersave powersupersave 

Still seeing the same sporadic performance:

Screenshot 2024-04-03 at 1 08 01 PM

(force_turbo is still set on, and clocks are still measuring at 2.4 GHz.)

@geerlingguy
Owner Author

geerlingguy commented Apr 3, 2024

More stats on the share from the macOS client:

$ smbutil statshares -a

==================================================================================================
SHARE                         ATTRIBUTE TYPE                VALUE
==================================================================================================
--------------------------------------------------------------------------------------------------
shared                        
                              SERVER_NAME                   10.0.2.214
                              USER_ID                       501
                              SMB_NEGOTIATE                 SMBV_NEG_SMB1_ENABLED
                              SMB_NEGOTIATE                 SMBV_NEG_SMB2_ENABLED
                              SMB_NEGOTIATE                 SMBV_NEG_SMB3_ENABLED
                              SMB_VERSION                   SMB_3.1.1
                              SMB_ENCRYPT_ALGORITHMS        AES_128_CCM_ENABLED
                              SMB_ENCRYPT_ALGORITHMS        AES_128_GCM_ENABLED
                              SMB_ENCRYPT_ALGORITHMS        AES_256_CCM_ENABLED
                              SMB_ENCRYPT_ALGORITHMS        AES_256_GCM_ENABLED
                              SMB_CURR_ENCRYPT_ALGORITHM    OFF
                              SMB_SIGN_ALGORITHMS           AES_128_CMAC_ENABLED
                              SMB_SIGN_ALGORITHMS           AES_128_GMAC_ENABLED
                              SMB_CURR_SIGN_ALGORITHM       AES_128_GMAC
                              SMB_SHARE_TYPE                DISK
                              SIGNING_SUPPORTED             TRUE
                              EXTENDED_SECURITY_SUPPORTED   TRUE
                              LARGE_FILE_SUPPORTED          TRUE
                              FILE_IDS_SUPPORTED            TRUE
                              DFS_SUPPORTED                 TRUE
                              FILE_LEASING_SUPPORTED        TRUE
                              MULTI_CREDIT_SUPPORTED        TRUE
                              MULTI_CHANNEL_SUPPORTED       TRUE
                              SESSION_RECONNECT_TIME        2024-04-03 13:09:10
                              SESSION_RECONNECT_COUNT       1

And Samba version on the Pi:

root@pi-nas:/home/pi# /usr/sbin/smbd --version
Version 4.17.12-Debian

@ThomasKaiser

Do you only do Finder copies (AKA 'network + storage combined' plus various unknown 'optimization strategies'), or have you already tested network and storage individually? A quick iperf3 / iperf3 -R run between the Mac and the RPi, plus iozone on your array, should be enough.
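
For the network half, a minimal sketch of what I mean, assuming iperf3 is installed on both ends and the Pi is still at 10.0.2.214:

# on the Pi
iperf3 -s

# on the Mac: Mac -> Pi, then Pi -> Mac with -R/--reverse
iperf3 -c 10.0.2.214
iperf3 -c 10.0.2.214 -R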

And for Samba performance I came up with these settings when I wrote the generic 'OMV on SBC' install routine over half a decade ago: https://github.com/armbian/build/blob/e83d1a0eabcc11815945453d58e1b9f4e201de43/config/templates/customize-image.sh.template#L122

@geerlingguy
Owner Author

geerlingguy commented Apr 3, 2024

I've tested iperf3 on this setup a few times. On the 1 Gbps port, I get 940 Mbps up and 940 Mbps down (using --reverse), and iozone with a 10 GB file size at 1M block size gets me 1.5 GB/sec random read and 1.5 GB/sec random write (obviously inflated by ZFS caching up front).

I set it to 50 GB to try to bypass more of the cached speed (since this is an 8 GB RAM Pi 5):

	Iozone: Performance Test of File I/O
	        Version $Revision: 3.492 $
		Compiled for 64 bit mode.
		Build: linux-arm 

	Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
	             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
	             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
	             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
	             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
	             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
	             Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
	             Vangel Bojaxhi, Ben England, Vikentsi Lapa,
	             Alexey Skidanov, Sudhir Kumar.

	Run began: Wed Apr  3 13:27:54 2024

	Include fsync in write timing
	O_DIRECT feature enabled
	Auto Mode
	File size set to 51200000 kB
	Record Size 1024 kB
	Command line used: ./iozone -e -I -a -s 50000M -r 1024k -i 0 -i 2 -f /tank/shared/iozone
	Output is in kBytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 1024 kBytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
                                                              random    random     bkwd    record    stride                                    
              kB  reclen    write  rewrite    read    reread    read     write     read   rewrite      read   fwrite frewrite    fread  freread
        51200000    1024  1667845  1719853                    1492992  1479166                                                                

@geerlingguy
Owner Author

geerlingguy commented Apr 3, 2024

Testing from Windows 11 on the same network, reads maxed out at 110 MB/sec, just like on the Mac.

smb copy down 110mb-ps

Writes... are getting a consistent 108 MB/sec. (It did get a little more up-and-down around the halfway point, where the below screenshot was taken, but still averages above 105 MB/sec.)

smb write 108 mb-ps

Screenshot 2024-04-03 at 1 45 29 PM

Now I'm shaking my fist strongly at my Mac. Why does Apple have to hate GPLv3 so much!? I'll try to find a way to see what's going on with macOS Finder. I've heard from @jrasamba that macOS might try using packet signing (see article), which could definitely result in different performance characteristics. Not sure about Windows 11's defaults.

Maybe I have to ditch using my Mac as the 'real world performance' test bed... with other networking stuff it's not an issue. And I know Finder's terrible... I just didn't think it was that terrible. :P

(@jrasamba also suggested watching this video on io_uring with some good general performance tips.)

@ThomasKaiser

I've heard from @jrasamba that macOS might try using packet signing

You can check with smbstatus on the RPi (should be SMB3_11 and partial(AES-128-GMAC) as protocol revision and signing status with recent macOS versions) or on macOS with smbutil statshares -m /path/to/volume.

As for GPL or not, IIRC Apple always used an SMB client derived from *BSD. License issues only came into play for the SMB server component, when Apple replaced Samba with their own smbx. But they had another good reason: starting with 10.8 or 10.9, we could transfer Mac files flawlessly between Macs via SMB, since all the HFS+ attributes were properly mapped over SMB, unlike with Samba.

Tomorrow I'm about to set up a quick-and-dirty local Netatalk instance on an RPi 5 to restore a TM backup onto a colleague's MacBook before it ships to her. But since this sounds like fun, I might try the exercise with Samba instead and see whether the Samba tunables developed over half a decade ago are still important or not. No idea whether spare time allows...

@geerlingguy
Owner Author

@ThomasKaiser - see above (#615 (comment)) — SMB_CURR_SIGN_ALGORITHM AES_128_GMAC does that mean it's enabled?

@ThomasKaiser

SMB_CURR_SIGN_ALGORITHM AES_128_GMAC does that mean it's enabled?

Yes. Sorry, I hadn't seen the whole comment.

@geerlingguy
Owner Author

geerlingguy commented Apr 3, 2024

I tried adding server signing = No to the smb.conf on the Pi (and restarting it), and also tried disabling signing on the macOS side:

printf "[default]\nsigning_required=no\n" | sudo tee /etc/nsmb.conf >/dev/null
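
For completeness, the Samba-side change was just one line in the [global] section of smb.conf, followed by a restart (a minimal sketch; under OMV this kind of override would normally go through the UI's extra options rather than a direct edit):

[global]
    server signing = No

sudo systemctl restart smbd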

It doesn't seem to make a difference either in the file copy speed or in the smbutil statshares -a output... not sure if it's supposed to disable the signing, or if that output is even an accurate report.

I also tried setting delayed_ack=0 on the Mac as suggested here:

$ sudo sysctl -w net.inet.tcp.delayed_ack=0
Password:
net.inet.tcp.delayed_ack: 3 -> 0

I unmounted and re-mounted the share, and I'm still seeing the same performance. (So I set it back to 3.)

I'm going to reboot the Pi and Mac entirely and try again. (I had just unmounted the share, restarted smbd on the Pi, and re-mounted the share.)

@geerlingguy
Owner Author

geerlingguy commented Apr 3, 2024

Reboot changed nothing, so I had a gander at the SMB config documentation, and found the client signing variable might need to be disabled?

client signing = disabled

I re-mounted the share, but it's still showing AES_128_GMAC as the current algorithm... however, I saw in this Reddit thread that the key indicator may be SIGNING_ON, which is not present here. That seems to indicate signing isn't actually enabled and isn't the issue at all.

@geerlingguy
Owner Author

One last little nugget: I was debugging SMB via debug logging (easy enough to enable via OMV's UI), and I noticed there are actually two log files being written to when I'm working from my Mac:

-rw-r--r--  1 root root 229K Apr  3 14:51 log.10.0.2.15
...
-rw-r--r--  1 root root 431K Apr  3 14:54 log.mac-studio
-rw-r--r--  1 root root 1.1M Apr  3 14:50 log.mac-studio.old

I wonder if there's any possibility of SMB doing some kind of internal thrashing when it sees my Mac as both IP 10.0.2.15 and local hostname mac-studio?

@ThomasKaiser

The net.inet.tcp.delayed_ack reference is the 'Internet at work': outdated stuff being copy&pasted over and over again :)

Signing could really be the culprit (I just searched through my OMV 'career'). Since I just replaced an M1 Pro MBP with an M3 Air, I checked the defaults (or what I believe the defaults are):

tk@mac-tk ~ % cat /etc/nsmb.conf 
[default]
signing_required=no

Not required doesn't mean disabled. Unfortunately I've only been on macOS 14 for a couple of days (I'm the lazy guy trying to skip every other macOS release) and am not into all the details yet...

@geerlingguy
Owner Author

geerlingguy commented Apr 3, 2024

Note that on my Mac, I didn't have anything in place in /etc/nsmb.conf (I had to create the file). And regarding delayed_ack, once I get through anything that makes sense, I enjoy throwing things at the wall and seeing what sticks. And if it doesn't, I can quickly revert ;)

Even with client signing = disabled, nothing changed in the mount, and I don't see SIGNING_ON TRUE, so I would assume it's not on (searching around, it looks like if signing is enabled for a share, it will show up like that, and not just as SIGNING_SUPPORTED TRUE).

@ThomasKaiser

I was debugging SMB via debug logging

This can and will harm SMB performance (bitten by this several times). But I guess you also tried it with log settings set to info and 'performance' was the same?

@geerlingguy
Owner Author

geerlingguy commented Apr 3, 2024

This can and will harm SMB performance (bitten by this several times). But I guess you also tried it with log settings set to info and 'performance' was the same?

I only had it set to debug for about 3 minutes while I was replaying the copy, to get a snapshot of the log. Then set it right back to 'None' (which is the default in OMV). None of the performance data in this issue that I've posted was taken at any time when any smbd logging was enabled.

@geerlingguy
Owner Author

Shakes fist at Apple:

Screenshot 2024-04-03 at 3 18 16 PM

If I just use Transmit to do an SFTP transfer (file transfer via SSH), I get a solid 115 MB/sec write speed. Going to test a file copy via Terminal straight to the SMB share next, to verify it's not some idiotic issue with Finder itself... stranger things have happened.
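
For reference, the Terminal-side test would look something like this (a sketch; the Pi's address and the 'shared' share name come from earlier in this issue, while the pi username, the ~/smbtest mount point, and bigfile.bin are made up for illustration):

mkdir -p ~/smbtest
mount_smbfs //pi@10.0.2.214/shared ~/smbtest

# time a straight copy of one large file through the SMB mount, bypassing Finder
time cp ~/Downloads/bigfile.bin ~/smbtest/

umount ~/smbtest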

@ThomasKaiser

to verify it's not some idiotic issue with Finder itself

Maybe Apple's most idiotic software piece ever :)

Back in the days when network/storage testing was a huge part of my day job, I always used Helios LanTest: being limited in some ways (explained here), it clearly shows the performance differences/increases you're aiming for when debugging settings, while Windows Explorer and Finder do a lot under the hood (parallelism and automagically tuned settings like block sizes) that masks basic network setting mismatches.

@geerlingguy
Owner Author

Ah, because I have OMV installed, I had to run sudo omv-firstaid and enable the interface. Somehow it grabbed eth0 from the Pi's internal port, which seems to be why the networking stack was all confused.

I ran through the firstaid wizard, and now I'm getting an IP address and connection on the Plugable USB 2.5G adapter.

pi@pi-nas:~ $ ethtool eth0
Settings for eth0:
	Supported ports: [ TP	 MII ]
	Supported link modes:   10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Half 1000baseT/Full
	                        2500baseT/Full
	Supported pause frame use: No
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Full
	                        2500baseT/Full
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Link partner advertised link modes:  10baseT/Half 10baseT/Full
	                                     100baseT/Half 100baseT/Full
	                                     1000baseT/Full
	                                     2500baseT/Full
	Link partner advertised pause frame use: Symmetric Receive-only
	Link partner advertised auto-negotiation: Yes
	Link partner advertised FEC modes: Not reported
	Speed: 2500Mb/s
	Duplex: Full
	Auto-negotiation: on
	Port: MII
	PHYAD: 32
	Transceiver: internal
netlink error: Operation not permitted
        Current message level: 0x00007fff (32767)
                               drv probe link timer ifdown ifup rx_err tx_err tx_queued intr tx_done rx_status pktdata hw wol
	Link detected: yes

Testing with iperf3:

pi@pi-nas:~ $ iperf3 --bidir -c 10.0.2.15
Connecting to host 10.0.2.15, port 5201
[  5] local 10.0.2.218 port 55412 connected to 10.0.2.15 port 5201
[  7] local 10.0.2.218 port 55414 connected to 10.0.2.15 port 5201
[ ID][Role] Interval           Transfer     Bitrate         Retr  Cwnd
[  5][TX-C]   0.00-1.00   sec   218 MBytes  1.83 Gbits/sec    0    587 KBytes       
[  7][RX-C]   0.00-1.00   sec  45.8 MBytes   384 Mbits/sec                  
[  5][TX-C]   1.00-2.00   sec   214 MBytes  1.80 Gbits/sec    0    587 KBytes       
[  7][RX-C]   1.00-2.00   sec  31.5 MBytes   264 Mbits/sec                  
[  5][TX-C]   2.00-3.00   sec   215 MBytes  1.80 Gbits/sec    0    587 KBytes       
[  7][RX-C]   2.00-3.00   sec  35.9 MBytes   301 Mbits/sec                  
[  5][TX-C]   3.00-4.00   sec   215 MBytes  1.80 Gbits/sec    0    587 KBytes       
[  7][RX-C]   3.00-4.00   sec  34.2 MBytes   287 Mbits/sec                  
[  5][TX-C]   4.00-5.00   sec   213 MBytes  1.79 Gbits/sec    0    587 KBytes       
[  7][RX-C]   4.00-5.00   sec  30.4 MBytes   255 Mbits/sec                  
[  5][TX-C]   5.00-6.00   sec   215 MBytes  1.80 Gbits/sec    0    587 KBytes       
[  7][RX-C]   5.00-6.00   sec  33.4 MBytes   280 Mbits/sec                  
[  5][TX-C]   6.00-7.00   sec   214 MBytes  1.80 Gbits/sec    0    587 KBytes       
[  7][RX-C]   6.00-7.00   sec  32.5 MBytes   272 Mbits/sec                  
[  5][TX-C]   7.00-8.00   sec   215 MBytes  1.81 Gbits/sec    0    587 KBytes       
[  7][RX-C]   7.00-8.00   sec  35.3 MBytes   296 Mbits/sec                  
[  5][TX-C]   8.00-9.00   sec   214 MBytes  1.80 Gbits/sec    0    587 KBytes       
[  7][RX-C]   8.00-9.00   sec  32.5 MBytes   272 Mbits/sec                  
[  5][TX-C]   9.00-10.00  sec   215 MBytes  1.80 Gbits/sec    0    587 KBytes       
[  7][RX-C]   9.00-10.00  sec  38.0 MBytes   319 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-10.00  sec  2.10 GBytes  1.80 Gbits/sec    0             sender
[  5][TX-C]   0.00-10.00  sec  2.10 GBytes  1.80 Gbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec   349 MBytes   293 Mbits/sec                  sender
[  7][RX-C]   0.00-10.00  sec   349 MBytes   293 Mbits/sec                  receiver

iperf Done.

And here's a file copy to the Pi 5 over the 2.5G connection from Windows 11 (average of 270 MB/sec):

2 5g copy 270 mb-ps write wow

And here's a file copy from the Pi 5 over the 2.5G connection to Windows 11 (average of 200 MB/sec):

2 5g copy 200mb-ps read wow

It seemed like the write was not really bottlenecked, but the read was bottlenecked on disk IO, of all things, according to atop. For some reason the drives were pegged at 115% utilization, and I saw ksoftirqd rising up in the task list. Maybe an issue where all the IO is going through one CPU core? I noticed the IRQs were pegged to core0.
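
For anyone who wants to check the same thing, the per-IRQ counters and affinity masks live under /proc (a minimal sketch; the IRQ number 38 is purely illustrative):

# see which CPU cores are fielding the NIC and PCIe/SATA interrupts
cat /proc/interrupts

# inspect and (as root) change the allowed CPUs for a given IRQ
cat /proc/irq/38/smp_affinity_list
echo 2 > /proc/irq/38/smp_affinity_list   # pin that IRQ to CPU 2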

@ThomasKaiser

ThomasKaiser commented Apr 4, 2024

Maybe an issue where all the IO is going through one CPU core? I noticed the IRQs were pegged to core0.

Hopefully soon to be resolved: raspberrypi/linux#6077

And then someone needs to take the time/effort to develop sane IRQ affinity settings (like I mostly did for Armbian ages ago).

@justinclift

For some reason the drives were pegged and at 115% utilization

Maybe some kind of parity calculation?

@belag

belag commented Apr 5, 2024

It seems to be out of stock everywhere? Any ideas who might have them for sale?

@geerlingguy
Owner Author

I was told Arace had limited stock and is out, hopefully they will get a new shipment in soon...

@geerlingguy
Owner Author

Someone had suggested also trying one of the video editing benchmark tools, in this case AJA System Test Lite (running it with 5120x2700 5K RED, 16 GB, 10-bit RGB uses a 52.75 MB IO size), to see if that will max out the write speed. I have also tested with Blackmagic Disk Speed Test in the past. It could show whether different media/copy types (besides a straight macOS Finder copy) behave differently.

@ThomasKaiser

Could see if different media/copy types (besides a straight macOS Finder copy) behave differently.

The 'problem' with Finder is that it implements hidden optimization strategies (third time posting the same link in this issue: https://www.helios.de/web/EN/support/TI/157.html). As such, testing with LanTest set to 'Backbone networks, e.g. 40 Gigabit Ethernet' gives more reliable numbers.

@geerlingguy
Owner Author

geerlingguy commented Apr 5, 2024

@ThomasKaiser - I understand that, but that doc seems to indicate Finder should be more optimized for the types of copies I'm performing, whereas my experience suggests something is seriously wrong with the SMB implementation on the latest macOS releases... or with some server/client negotiation. It's crazy (to me) that iperf, SFTP, and other more direct methods hit line speed with no issue, but Finder copies and cp/rsync/etc. through the SMB mount are so shaky and slower.

In this case, I'm actually less interested in the theoretical, and more interested in what's causing the inconsistency with real world use cases (I do a lot of project copying, and faster total copy time for 30-90 GB folders is better for me).

@justinclift

To investigate the Finder problem from another angle, maybe something like macOS's equivalent to strace?

https://www.shogan.co.uk/devops/dtrace-dtruss-an-alternative-to-strace-on-macos/

@belag

belag commented Apr 6, 2024

I was told Arace had limited stock and is out, hopefully they will get a new shipment in soon...

Thank you!

@teodoryantcheff

I was told Arace had limited stock and is out, hopefully they will get a new shipment in soon...

Thank you, @geerlingguy !

@jessepeterson

This is exciting. It looks like the JMB585 supports SATA port multipliers. Do you have any to test with? 20 drive arrays seem possible.

I could imagine a mini-ITX NAS case with 5.25" drive cages/backplanes all wired up. You'd just need a power supply with a physical switch and molex+SATA power connectors. And some sort of mini-ITX Pi mounting bracket, of course.

@dragonfly-net

This is exciting. It looks like the JMB585 supports SATA port multipliers. Do you have any to test with? 20 drive arrays seem possible.

I could imagine a mini-ITX NAS case with 5.25" drive cages/backplanes all wired up. You'd just need a power supply with a physical switch and molex+SATA power connectors. And some sort of mini-ITX Pi mounting bracket, of course.

Maybe check with an eSATA port? But I think the speed will be low... maybe OK for magnetic HDDs, but not SSDs.

@Ramog

Ramog commented Apr 8, 2024

I was told Arace had limited stock and is out, hopefully they will get a new shipment in soon...

Sad. I hope this happens soon; this seems like the perfect project for me. I've already got a Pi 5, so the Penta SATA HAT would be next. Even just as a device to copy and move files between disks and whatnot, this would be incredibly useful.

@teodoryantcheff

Arace have it in stock now. I just ordered two from https://arace.tech/products/radxa-penta-sata-hat-up-to-5x-sata-disks-hat-for-raspberry-pi-5

@robson-paproski

Question: is this HAT compatible with the OrangePi 5?

@Riverside96

@geerlingguy have you experimented with hibernation at all?
I have ordered one regardless, but I don't see any mention of SATA 3.3 hibernation support in the documentation.
My use case would be 3 HDDs with a small mirrored dir and periodic backups.
Keeping them spinning would not be an option for me. I can't seem to ascertain this on the Discord server and would like to order the drives.

@ThomasKaiser

@geerlingguy since we were talking about Finder weirdness and I'm currently testing SMB multichannel between a MacBook and Rock 5 ITX... just did a quick test with three files 2.3 GB in size on a Samba 4.13 share with server multi channel support = yes:

Samba -> Mac: a constant 500+ MB/s:

multichannel-read

Mac -> Samba: very flaky numbers, short bursts at 450+ MB/s but mostly nothing, with the Finder waiting for who knows what:

multichannel-write

But note that the network setup is somewhat broken anyway in the direction of the Rock 5 ITX, so my Finder investigations need to be revisited once that is resolved.

@geerlingguy
Owner Author

@ThomasKaiser - Thanks for posting that, and that is definitely my experience (though usually not that much of a blip where there's no writing at all). Definitely something weird, and watching the Console log on the Mac is almost useless :P

@ThomasKaiser

ThomasKaiser commented Apr 17, 2024

And one last word about macOS Finder: I seem to have resolved the storage problems by creating a FrankenRAID (mdraid-0 out of 4 SATA SSDs and one really crappy NVMe SSD), and with SMB Multichannel I'm now getting a rather consistent 600 MB/s in Finder in both directions:

rock5-itx-finder-copy-multichannel

Full story: https://github.com/ThomasKaiser/Knowledge/blob/master/articles/Quick_Preview_of_ROCK_5_ITX.md#smb-multichannel

So in case you're revisiting the issues you ran into, I would strongly recommend leaving an iostat 10 running in the background to check %iowait. Yesterday, when writing to the RK3588 thingy with only the crappy NVMe SSD as the storage device, %iowait in the TX direction went up to 10% or even more. With the FrankenRAID everything is fine.
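
A minimal sketch of that, assuming the sysstat package is installed (the -x flag adds per-device utilization on top of the %iowait column):

# print CPU %iowait plus extended per-device stats every 10 seconds
iostat -x 10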

@cmonty14

Some usage notes: […]

Hi Jeff,
could you please share some information about the tool you're using for IO benchmarking?
I think I have identified fio, but I assume you have used some kind of 'wrapper script' to run a series of qualified benchmarks.

Regards
Thomas

@ThomasKaiser

ThomasKaiser commented Apr 21, 2024

could you please share some information of the tool you're using for IO benchmark?

He's using https://raw.githubusercontent.com/geerlingguy/pi-cluster/master/benchmarks/disk-benchmark.sh, called as explained in any of his sbc-review issues, e.g. this one.

There are at least three problems with this script, one being a major one:

  • for whatever reason the script determines sequential read performance with fio and 4 jobs in parallel, while sequential write performance is measured with iozone in a different fashion, so the two numbers don't match (just tested: when fio reports 85 MB/s sequential reads with 4 concurrent jobs, iozone will measure just ~75 MB/s with a single job). fio does allow for non-destructive write testing, also with 4 concurrent jobs (which BTW is a synthetic benchmark scenario not matching real-world situations of SBC users); no idea why Jeff doesn't switch either to fio for writes (creating another unrealistic number) or to iozone for both numbers (all that's needed is another -i 1 added to the command line; see the example after this list)
  • 1M block size and only 100M data size for the iozone tests will not show real performance on many devices
  • but most importantly: disk-benchmark.sh is in reality disk-settings-benchmark.sh since it trusts in whatever (stupid) settings the OS image is running with.
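
For illustration, here is the command line from the iozone run earlier in this issue with that extra -i 1 added, so sequential reads and writes come from the same tool and the same single-job pattern (a sketch, keeping Jeff's existing flags):

./iozone -e -I -a -s 50000M -r 1024k -i 0 -i 1 -i 2 -f /tank/shared/iozone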

To talk about disk performance, a switch to the performance governor would be needed prior to execution [1].

Quick test on a Rock 5 ITX with an 256 GB EVO Plus A2 SD card comparing three different settings:

performance (this represents 'storage performance w/o settings involved'):

READ: bw=87.2MiB/s (91.4MB/s), 87.2MiB/s-87.2MiB/s (91.4MB/s-91.4MB/s), io=999MiB (1048MB), run=11459-11459msec
                                                          random    random
          kB  reclen    write  rewrite    read    reread    read     write
      102400       4     2848     2924                      12238     2971
      102400    1024    62283    62087                      77176    61358

In contrast, Radxa's defaults since 2022 and Armbian's defaults until 2024: ondemand with io_is_busy=1:

READ: bw=81.4MiB/s (85.4MB/s), 81.4MiB/s-81.4MiB/s (85.4MB/s-85.4MB/s), io=935MiB (980MB), run=11482-11482msec
                                                          random    random
          kB  reclen    write  rewrite    read    reread    read     write
      102400       4     2838     2940                      11663     2921 
      102400    1024    60790    62549                      77639    60492                                                                

We see small drops in performance everywhere, and also a bit of result variation, since 2940 KB/s with ondemand compared to 2924 KB/s with performance can't be the result of settings (no other governor can 'outperform' performance):

Retesting with schedutil, since it's the new Armbian default from 2024 on, and also what many SBC vendors might be using, given that for their OS images they usually don't think for a single second about kernel config but just ship the Android kernel the SoC vendor has thrown at them:

READ: bw=85.1MiB/s (89.3MB/s), 85.1MiB/s-85.1MiB/s (89.3MB/s-89.3MB/s), io=978MiB (1026MB), run=11490-11490msec
                                                          random    random
          kB  reclen    write  rewrite    read    reread    read     write
      102400       4     2062     2193                       8973     2165
      102400    1024    54671    53655                      61013    54159 

Compared to ondemand with the respective tweaks, the important 4K performance dropped by 25%; with larger block sizes it's not that drastic, and the fio test with the unrealistic 4 concurrent read jobs even improves (but since we haven't measured at least 3 times, we have no idea whether these different numbers are due to different settings or, more probably, 'result variation'. Running a benchmark only once is almost always wrong: it has to be repeated at least three times, then the standard deviation has to be calculated, and if it's too high either more measurements are taken or the results go in the trash).

But here's what these synthetic benchmarks don't tell anyway: real-world storage performance is easily halved by the switch to schedutil, since unlike benchmarks with continuous storage access, where the cpufreq driver has a chance to ramp up clockspeeds, in real-world situations the clockspeeds will remain low when only short I/O accesses happen. That's what you get when you switch a central setting without any evaluation and obviously 'just for fun' :)

At least it should be obvious that disk-benchmark.sh, in its current form, is not able to report on disk performance, but only on 'disk performance tampered with by some default settings'.

One might argue that using 'OS defaults' would be the right thing, since that's what vendors ship and users have to live with, but as someone who only does 'active reviews' (not just reporting numbers but improving them) I can't disagree more: the best idea is to run the test in both modes, OS image defaults vs. performance, then point the OS image makers at the difference and hint at how to fix it (this has worked every time, just not with the Banana Pi and Armbian guys).

[1] for Cluster in /sys/devices/system/cpu/cpufreq/policy* ; do [[ -e "${Cluster}" ]] || break; echo performance >"${Cluster}/scaling_governor"; done

@geerlingguy
Owner Author

geerlingguy commented Apr 21, 2024

@ThomasKaiser - To properly benchmark storage solutions, you need to do a lot more than I think either of us do in a typical review cycle for a storage device.

In my case, when it actually matters, I will test across different OSes with 100+ GB files, with folders with 5,000+ small files, and average the eventual total time for the copy back and forth.

The disk-benchmark.sh script is a quick way to get a 'with the default OS image, in ideal circumstances, with smaller files, here's the kind of performance one can expect' number. There are huge differences depending on whether you use ext3/ext4, ZFS, Btrfs, Debian, Ubuntu, a board vendor's custom distro, or the performance vs. ondemand governors (behavior can change depending on the distro/image you're using). It's a fool's game making definitive statements based on any single benchmark, which is why I only use the disk-benchmark.sh script for a quick "here's what this board does" look.

And I do think it's useful to not sit there tweaking and tuning the default distro image for best performance, because I want my tests to reflect what most users will see. If they buy a Raspberry Pi, they will go to the docs and see they should flash Pi OS to the card using Imager.

The docs don't mention setting the performance governor, so I don't do that in my "official" benchmarks. I follow the vendor guides as closely as possible, and if their own images are poorly optimized, that's not a 'me' problem. And I'm happy to re-run the entire gauntlet if a vendor reaches out, like Turing Pi did with the RK1.

@ThomasKaiser

@geerlingguy that doesn't change anything w.r.t. the different testing methodology for sequential reads and writes. In case you accept geerlingguy/pi-cluster#12, this will become obvious with future testing, and then you might decide to adjust your reporting or not :)

@geerlingguy
Owner Author

True; honestly my main concern is to have a few different tests since I know many people just throw hdparm at it and call it a day. I like fio and iozone a lot better, though I have yet to find a way to test all aspects of ZFS filesystems in a way that ZFS caching doesn't interfere (I wish there were a way to tell ZFS 'fill all caches, then run the test', instead of having to copy across tens of GB of files before starting to get more useful data).
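
One thing that might be worth trying (purely an assumption, not something tested in this thread): telling ZFS to cache only metadata for the dataset under test, so benchmark reads have to come from disk, then restoring the default afterwards:

sudo zfs set primarycache=metadata tank/shared   # ARC caches only metadata for this dataset
# ... run the benchmark ...
sudo zfs set primarycache=all tank/shared        # back to the default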

@ThomasKaiser

True; honestly my main concern is to have a few different tests since I know many people just throw hdparm at it and call it a day.

Correct, that's the garbage the majority of 'Linux tech youtubers' rely on, ignoring (or not knowing) that hdparm uses a 128K block size, which was huge when it was hardcoded (last century) but is a joke today.

I like fio and iozone a lot better

Both are storage performance tests, unlike hdparm (whose benchmarking capabilities were a tool for kernel developers 30 years ago, when only spinning rust existed, attached via dog-slow interfaces).

though I have yet to find a way to test all aspects of ZFS filesystems in a way that ZFS caching doesn't interfere

Simple solution: avoid ZFS for benchmarks and try to educate your target audience about the ZFS benefits (spoiler alert: they don't want this content ;) )

@geerlingguy
Owner Author

re: ZFS: Avoiding it is impossible if you want to show people what kind of performance you get on a modern NAS, since it seems like half the homelab world is focused on ZFS, and the other half is split between old school RAID (mdadm), Btrfs, and all the weird unholy things proprietary vendors cobble together (like Unraid).

Also, if you don't mention ZFS when talking about storage, you end up with so many random comments about 'why not ZFS', it's the modern homelab equivalent to 'btw I use Arch' or 'why don't you use [vim|emacs|nano]?' :D

Unavoidable, unfortunately!

Anyway, I plan on deploying this HAT as a replica target for my main ZFS array... we'll see how that works out! Still looking to find a case for it. Too lazy to CAD my own heh

@axiopaladin

If you still have this on-hand and would be willing to make a few measurements... How thick of a 2.5" drive can be mounted directly to the hat? Modern 2.5" SSDs are typically 7mm thick, while (high-capacity) 2.5" HDDs (lower speed but much cheaper per-TB) usually come in at 15mm thick. Are those too fat to stack all 4 slots?

@teodoryantcheff

@axiopaladin
image

@teodoryantcheff

I know this is not the correct place for this question, but... is anyone willing to sell me an FPC cable for the Pi 5?

This one:
image

I thought Arace had them in stock, but my order has been on back-order for almost a month now, and they cannot confirm when a new batch will be available.
That's why I ordered a HAT with a cable for one of their ROCK clones, and now the only thing I need to use it with a Pi is the FPC.
And since, as far as I understand it, Radxa sends 2 FPCs, I was thinking that one of you beautiful humans may be willing to sell me one of yours.
I'll cover shipping (to Europe) and pay for the cable (PayPal, Revolut...).

Please help me out 🙏
