Help with speed difference between SBCs #127

Closed
mattblovell opened this issue Oct 23, 2020 · 15 comments

@mattblovell
Collaborator

mattblovell commented Oct 23, 2020

To be clear, this is not any actual problem with luma.lcd. I'm filing it as a means of asking for help in how to debug a speed difference I've observed between two SBCs. Pointers or suggestions would be appreciated, after which this issue can just be closed.

Note that one of the marketing points for the C4 is improved SPI frequency.

Also recall, from one of my earlier filings, that RPi SPI frequency appears to be constrained to an even frequency, rather than a frequency that is a power-of-2. Opening that up in luma.core's serial.py would permit a wider set of usable frequencies. I have locally hacked on the installed serial.py for the experiment below.

Type of SBCs

Raspberry Pi 3 Model B v1.2
Odroid C4

Linux Kernel version

RPi: Linux raspberrypi 5.4.51-v7+ #1333 SMP Mon Aug 10 16:45:19 BST 2020 armv7l GNU/Linux
C4: Linux C4 4.9.113 #1 SMP PREEMPT Fri Aug 14 20:30:41 UTC 2020 aarch64 GNU/Linux

Other versions

luma.core: 1.17.1 on both
luma.lcd: 2.5.0 on both
luma.examples: commit ceb01960 on both (from Wed Oct 21 16:53:13)
spidev: 3.4 on both
Pillow: 7.2.0 on both
RPi.GPIO: 0.7.0 on RPi, 0.6.3post1 (the RPi.GPIO-Odroid version) on C4

Both displays are ILI9341, connected as the sole device on each SBC's spi0 outputs.

Expected behaviour

On both SBCs I am trying the following demo from luma.examples:

python bounce.py \
  --interface spi --display ili9341 \
  --width 320 --height 240 \
  --backlight-active high \
  --gpio-reset-hold-time 0.2 --gpio-reset-release-time 0.2 \
  --spi-bus-speed 48000000

Actual behaviour

On RPi, the bounce demonstration noticeably increases in FPS as the SPI frequency is increased. The attached photo shows the results when running with a 50 MHz spi-bus-speed, getting almost 19 FPS. (It varies over time, with the photo showing 18.9. Most of the time it is running just a bit over 19.)

The second photo shows the C4, running with 48 MHz. Its FPS is behaving identically to when the bounce demo is invoked with an 8 MHz speed.

RPi: [photo: rpi_bounce_speed]

C4: [photo: odroidC4_bounce_speed]

The command line option is certainly getting parsed on the C4, since specifying spi-bus-speeds lower than 8 MHz has a definite effect. I can get it to go far slower!

I'm trying to understand where the "bottleneck" is, given the lack of any error responses from spidev when speeds >8 MHz are specified. Any suggestions?

As I mentioned at the start, this issue is just a request for assistance. If we can't dream up any debug ideas, I'll just close it out.
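As a rough sanity check on the FPS figures, the SPI clock sets an upper bound on how many full frames can be pushed per second. The sketch below computes that bound. `max_full_frame_fps` is a hypothetical helper, and `bytes_per_pixel=2` is an assumption (the actual value depends on the pixel format the driver selects). It ignores command bytes and driver overhead, and the bounce demo may redraw less than a full frame, so measured numbers will not match it exactly.

```python
def max_full_frame_fps(bus_speed_hz, width=320, height=240, bytes_per_pixel=2):
    """Upper bound on full-frame refreshes per second over SPI.

    Counts only pixel data on the wire. bytes_per_pixel is an assumption;
    it depends on the pixel format the display driver configures.
    """
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bus_speed_hz / bits_per_frame


if __name__ == "__main__":
    for mhz in (8, 16, 32, 48):
        fps = max_full_frame_fps(mhz * 1_000_000)
        print(f"{mhz:>2} MHz -> at most {fps:.1f} full frames/s")
```

If the measured FPS stops rising while this bound keeps growing, the limit is probably somewhere other than the requested clock rate.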

@mattblovell
Collaborator Author

mattblovell commented Oct 24, 2020

The Odroid wiki has an application note covering the C4 (as well as other boards):

https://wiki.odroid.com/odroid-c4/application_note/gpio/spi

Trying the tiny spidev utility program from that page, with a jumper cable installed from MOSI to MISO, I cannot get it to fail, even with ludicrous frequencies!

My suspicion is that the ioctl() call for SPI_IOC_WR_MAX_SPEED_HZ can just fail silently. If that's the case, I'm not sure how to proceed.
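One way to test that suspicion without a scope is to time raw writes and compare the effective bit rate against the requested one. The following is a minimal sketch, not a definitive tool: `check_spi_speed` and `run_on_hardware` are hypothetical names, and the value read back from `max_speed_hz` may simply echo the request, so the timing is the more telling number.

```python
import time


def check_spi_speed(spi, requested_hz, payload_len=4096, repeats=50):
    """Request a bus speed, read it back, and time raw writes.

    `spi` is any object with a `max_speed_hz` attribute and a
    `writebytes(list_of_ints)` method, such as spidev.SpiDev.
    Returns (reported_hz, effective_hz).
    """
    spi.max_speed_hz = requested_hz
    reported_hz = spi.max_speed_hz  # may just echo the requested value

    payload = [0x55] * payload_len
    start = time.perf_counter()
    for _ in range(repeats):
        spi.writebytes(payload)
    elapsed = time.perf_counter() - start

    total_bits = payload_len * repeats * 8
    effective_hz = total_bits / elapsed if elapsed > 0 else float("inf")
    return reported_hz, effective_hz


def run_on_hardware(bus=0, device=0, speeds_mhz=(4, 8, 16, 32, 48)):
    """Run the check on a real SPI device (requires the spidev package)."""
    import spidev  # imported here so the helper above works without it

    spi = spidev.SpiDev()
    spi.open(bus, device)
    try:
        for mhz in speeds_mhz:
            reported, effective = check_spi_speed(spi, mhz * 1_000_000)
            print(f"requested {mhz} MHz, reported {reported} Hz, "
                  f"effective {effective / 1e6:.2f} Mbit/s")
    finally:
        spi.close()
```

Calling `run_on_hardware()` on each board and comparing the effective rates would show whether throughput flattens above 8 MHz on the C4. The effective rate includes Python and driver overhead, so it will sit below the clock frequency even when the clock is set correctly.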

Since CoreELEC maintains their own version of the Linux kernel, I've asked on that forum whether they know of any artificially low bound on SPI frequency for the C4.

@rm-hull
Owner

rm-hull commented Oct 24, 2020

Hi @mattblovell
First off, rm-hull/luma.core#194 may be mildly relevant (take a look at the latest comment)

Secondly, the SPI bus speed power of 2 thing (https://github.com/rm-hull/luma.core/blob/master/luma/core/interface/serial.py#L297 ) probably hasn’t kept up to date with reality, so if you have experimented with higher speeds and other numbers that seem to work, please submit a PR 🙏

Third, I’m contemplating trying to improve the performance of the diff-to-previous algorithm so that it finds multiple areas of change rather than just one bounding box. This would massively improve the performance of the bouncing ball demo for example.
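To illustrate why a single bounding box can be costly, here is a pure-Python sketch of the idea on plain lists of pixel values. It is not luma.core's actual implementation; `changed_bbox` and `bbox_area` are hypothetical names. Two changed pixels in opposite corners force almost the whole frame to be resent.

```python
def changed_bbox(prev, curr):
    """Return (left, top, right, bottom) enclosing all differing pixels.

    prev and curr are equal-sized lists of rows of pixel values.
    right and bottom are exclusive. Returns None if nothing changed.
    """
    xs, ys = [], []
    for y, (row_a, row_b) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_a, row_b)):
            if a != b:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def bbox_area(bbox):
    """Number of pixels covered by a (left, top, right, bottom) box."""
    left, top, right, bottom = bbox
    return (right - left) * (bottom - top)


if __name__ == "__main__":
    w, h = 320, 240
    prev = [[0] * w for _ in range(h)]
    curr = [row[:] for row in prev]
    curr[5][5] = 1        # one changed pixel near the top-left
    curr[230][310] = 1    # another near the bottom-right
    box = changed_bbox(prev, curr)
    print(box, bbox_area(box), "pixels resent for 2 changed pixels")
```

Tracking several smaller regions instead of one enclosing box would shrink the amount of data sent in cases like this.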

Lastly, it would be interesting to run general system benchmarks on the Odroid vs. the RPi, just to rule out overall system differences. I did some benchmark experiments with the Orange Pi Zero a few years back when it first came out, and it was pretty terrible compared to the RPi B2 at the time, although it looked good on paper.

@mattblovell
Collaborator Author

Hi @rm-hull,

You raise an excellent point regarding benchmarking. I'll have to see if there's something I can get running on both SBCs. There are benchmark comparisons against the RPi4 for CPU, GPU, and DRAM on Hardkernel's C4 page (scroll down just a bit), but it would be nice to confirm them.

Even armed with that information, though, SPI performance could differ in appreciable ways. The C4 makes use of the spi-meson-spicc device driver, and a comment block at its start speaks of only having a PIO mode implemented.

My change to serial.py is pretty much a one-liner. Is there a way to make a "small" Pull Request, something short of forking luma.core? :)

Thanks,
Matt

@rm-hull
Owner

rm-hull commented Oct 24, 2020

If it’s a one liner change here (or elsewhere for that matter) then just let me know and I will push it out in the next release.

@thijstriemstra
Collaborator

@mattblovell I noticed you're using an old version of Pillow. Pillow 8 was released recently and it seems to produce much better results, but I wonder if performance also improved?

@mattblovell
Collaborator Author

mattblovell commented Oct 24, 2020

@rm-hull, the change to serial.py is indeed quite a small one:

@@ -294,7 +292,7 @@ class spi(bitbang):
                  bus_speed_hz=8000000, transfer_size=4096,
                  gpio_DC=24, gpio_RST=25, spi_mode=None,
                  reset_hold_time=0, reset_release_time=0, **kwargs):
-        assert(bus_speed_hz in [mhz * 1000000 for mhz in [0.5, 1, 2, 4, 8, 16, 32]])
+        assert(bus_speed_hz in [mhz * 1000000 for mhz in [0.5, 1, 2, 4, 8, 16, 20, 24, 28, 32, 36, 40, 44, 48, 50, 52]])

All of those (well, the higher ones, I didn't try going extremely slow) worked with the ILI9341 display I have. I tested 60 MHz as well, and it did work! Of course, bounce.py performance does not keep increasing with increasing SPI frequency (maxing out in the 18.5 to 20 FPS range).
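The same allow-list can also be written as an explicit check that raises an error. This is only a sketch of one alternative, not what luma.core does; `validate_bus_speed` and `ALLOWED_MHZ` are hypothetical names. Unlike `assert`, a raised `ValueError` still fires when Python runs with `-O`.

```python
# Frequencies (in MHz) taken from the patched assert shown in this thread.
ALLOWED_MHZ = (0.5, 1, 2, 4, 8, 16, 20, 24, 28, 32, 36, 40, 44, 48, 50, 52)


def validate_bus_speed(bus_speed_hz, allowed_mhz=ALLOWED_MHZ):
    """Return bus_speed_hz if it is in the allowed set, else raise ValueError."""
    allowed_hz = {int(mhz * 1_000_000) for mhz in allowed_mhz}
    if bus_speed_hz not in allowed_hz:
        raise ValueError(f"unsupported SPI bus speed: {bus_speed_hz} Hz")
    return bus_speed_hz
```

With this list, `validate_bus_speed(60_000_000)` raises, so the list would need extending before a speed such as 60 MHz could be passed through.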

@thijstriemstra, I'll see about upgrading Pillow. On RPi, I've just been using what pip provides. On the Odroid, I used the package that entware provides.

Thanks,
Matt

Update: I just tried the 500 kHz speed. It works as well, if you want to call ~0.7 FPS working. :)

@thijstriemstra
Collaborator

On RPi, I've just been using what pip provides.

A clean install has pulled in Pillow 8 for about a week now. I'd recommend upgrading so that issues cannot be blamed on older versions of Pillow; we have switched our tests to it as well.

@mattblovell
Collaborator Author

mattblovell commented Oct 24, 2020

I just checked. I think the forced re-install of luma.lcd and related modules yesterday picked up pillow 8.0.1.

All of the luma.example demos are still happy. (The "object has no attribute '_pwm'" error still occurs.)

rm-hull added a commit to rm-hull/luma.core that referenced this issue Oct 25, 2020
See rm-hull/luma.lcd#127 (comment)

Co-authored-by: Matthew Lovell <notifications@github.com>
@mattblovell
Collaborator Author

On the Odroid C4 user forum, scope captures were presented that definitely show the C4's SPI clock capable of hitting 100 MHz (though the signal is clearly degraded somewhat at that frequency). So, the cause of the lower bounce.py performance on the C4 (compared to the RPi3) must reside elsewhere (e.g., limited SPI transfer size, PIO communication model, etc.).
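A simple model shows how a fixed cost per SPI transfer could cap the frame rate even when the clock is fast. This is a sketch under stated assumptions: `modeled_frame_time` is a hypothetical helper, the 1 ms overhead in the demo is an illustrative guess rather than a measurement, and 2 bytes per pixel is assumed. The 4096-byte transfer size matches the `transfer_size` default in the serial.py signature quoted earlier.

```python
import math


def modeled_frame_time(frame_bytes, bus_speed_hz, transfer_size=4096,
                       per_transfer_overhead_s=0.0):
    """Estimated seconds to send one frame in fixed-size chunks.

    Time on the wire plus a fixed overhead per chunk. The overhead value
    is a modelling assumption, not something measured on either board.
    """
    n_transfers = math.ceil(frame_bytes / transfer_size)
    wire_time = frame_bytes * 8 / bus_speed_hz
    return wire_time + n_transfers * per_transfer_overhead_s


if __name__ == "__main__":
    frame_bytes = 320 * 240 * 2  # assumes 2 bytes per pixel
    for mhz in (8, 16, 32, 48):
        t = modeled_frame_time(frame_bytes, mhz * 1_000_000,
                               per_transfer_overhead_s=0.001)
        print(f"{mhz:>2} MHz -> {1 / t:.1f} modeled frames/s")
```

In this model the overhead term does not shrink as the clock rises, so gains flatten out at higher frequencies.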

They had the suggestion of trying Hardkernel's latest Ubuntu image, rather than the CoreELEC OS installation I'm currently running on the C4. I'll give that a try.

I suspect that I'll just end up closing this issue, since (given the identical code base installed on the C4 and the RPi3) there's no blame to be placed on luma.core or luma.lcd.

@rm-hull
Owner

rm-hull commented Oct 26, 2020

FYI, I’m bubbling up a new PR which promises some big performance improvements; see #129. I am seeing somewhere around 90 FPS on the bounce demo on ILI9341 with the new version on RPi4.

@mattblovell
Collaborator Author

I am seeing somewhere around 90FPS on the bounce demo on ILI9341 with the new version on RPi4

That's impressive!

@mattblovell
Collaborator Author

@rm-hull, you may be interested in this Odroid C4 forum post:

https://forum.odroid.com/viewtopic.php?p=309888#p309888

We have found the root cause [for the slow performance] and have made a patch for fixing that.

Finally, we can see 31~32 fps for the bounce.py example with 72MHz SPI bus speed. Please see the below video.

@rm-hull
Owner

rm-hull commented Oct 30, 2020

That's fantastic. Reading that whole thread, they investigated and patched the kernel in a few days ... good effort!

@mattblovell
Collaborator Author

I think this issue can be closed out, between Hardkernel's changes for the Odroid C4 and in anticipation of @rm-hull's in-progress performance update in #129.

@mattblovell
Collaborator Author

Here's the device driver change for Odroid:

hardkernel/linux@98774b1
