Capture performance discussion from issue 918 #971
Add performance tips and reasonable limits to expect.
@hzeller do you feel this would be useful to include, so that people have a better idea of how far they can push all this now that panels that let us hit those limits are readily available?
The discussion here, including the rules of thumb, is very specific to your anecdotal experience with 64-pixel-high displays (what an expected maximum refresh is, etc.). However, the range of panels and refresh rates is vast, and it is hard to suggest what a good expectation should be. So keep it a bit more generic: don't describe what to expect, just include tips on which parameters can influence the refresh rate and what should be considered.
A lot of these things are already described in the sections for the relevant options, so your paragraph should emphasize a good systematic approach to working on the refresh rate that goes beyond the flag descriptions (switch on --led-show-refresh, which options to twiddle, include --led-limit-refresh to counter flicker, etc.).
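One way to make that approach systematic is to generate the test runs up front and compare the refresh rate each one reports. The following is only a sketch: the `./demo` binary path and the swept values are placeholders, `tuning_command` and `sweep` are hypothetical helper names, and the panel geometry flags for your own setup still need to be appended.

```python
from itertools import product


def tuning_command(binary, slowdown, pwm_bits, lsb_ns, limit_hz=None):
    """Build one command line for a refresh-rate test run.

    Only flags named in the library's option descriptions are used;
    the binary path is a placeholder for whatever program you run.
    """
    cmd = [
        binary,
        "--led-show-refresh",                     # print the measured refresh rate
        f"--led-slowdown-gpio={slowdown}",        # GPIO clocking speed
        f"--led-pwm-bits={pwm_bits}",             # number of bit planes sent
        f"--led-pwm-lsb-nanoseconds={lsb_ns}",    # on-time of the lowest plane
    ]
    if limit_hz is not None:
        # Cap the refresh rate once a stable value is known, to counter flicker.
        cmd.append(f"--led-limit-refresh={limit_hz}")
    return cmd


def sweep(binary="./demo", slowdowns=(1, 2, 3, 4), pwm_bits=(11, 9, 7), lsb_ns=(130,)):
    """Return one command per combination of the swept parameters."""
    return [
        tuning_command(binary, s, b, n)
        for s, b, n in product(slowdowns, pwm_bits, lsb_ns)
    ]


if __name__ == "__main__":
    for cmd in sweep():
        print(" ".join(cmd))
```

Run each command under the heaviest load the system is expected to see, note the lowest refresh rate reported, and then pass a value at or below it as `limit_hz`.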
Thanks for the review. Will rework this accordingly.
@hzeller I had another shot.
@hzeller gentle ping :)
Hi @hzeller, any time to review this?
Sorry, I am currently involved in a ton of Covid-19 response projects; will get back to this once that load lightens.
@hzeller Just found this abandoned tab in one of my browsers :) Not sure if you're still working on response projects?
I am (part of https://www.covidshieldnexus.org/), but now the 3D printers are mostly humming away on their own.
@hzeller Maybe now is a better time? :)
Hi @hzeller how goes? :)
Cool, thanks!
Hi, I am currently in the process of manufacturing RGB LED boards. I have made 16x48 matrices, both with straight scan and with interleaving at 48.
The refresh rate depends on the number of scan rows and not on the type of multiplexing.
The speed is set with the GPIO slowdown, and with a current Pi 4 we need to slow things down a lot so that the panels are clocked in the 20 MHz range. The color output is of course not done with PWM (which would be strongly limited by the clock speed) but with binary code modulation at 11-bit resolution. The clock speed of the panels is still a limiting factor for the few lower bits. The 11-bit linear resolution is needed to get enough detail to satisfactorily convert an 8-bit value with an exponential luminance curve. So from a marketing perspective, we could say 33-bit color resolution ... but of course that translates to regular 24-bit color. There are so many parameters that influence each other (e.g. the length of the chain determines how much we're waiting for the clocking vs. the Output Enable; as we clock things in in parallel, we're only limited by the clocking in the lower bits) that I usually suggest doing a dry run on a Pi with
Yes, the 710 fps figure is right if you don't take OE time into account. But OE is the dominant part of the time spent for the higher bits. The clock rate needs to be tweaked for the panels at hand: some are faster, some are slower, and it also depends on how short the cables are, etc. The raw color palette is (2^11)^3.
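To see why an 8-bit input with a perceptual luminance curve needs roughly 11 linear bits, here is a small sketch. It assumes the common CIE 1931 lightness-to-luminance formula and truncation to an integer; the function names are illustrative and the library's actual lookup table may differ in detail.

```python
def cie1931_to_linear(value, bits=11, brightness=100):
    """Map an 8-bit value to a linear on-time count with `bits` bits.

    Assumed form of the CIE 1931 curve: lightness L* in 0..100 is
    converted to relative luminance Y in 0..1, then scaled to full range.
    """
    full_scale = (1 << bits) - 1
    lightness = value * brightness / 255.0
    if lightness <= 8:
        luminance = lightness / 902.3               # linear segment near black
    else:
        luminance = ((lightness + 16) / 116.0) ** 3  # cubic segment
    return int(full_scale * luminance)


def distinct_levels(bits):
    """Count how many of the 256 input values keep a distinct output."""
    return len({cie1931_to_linear(c, bits) for c in range(256)})


if __name__ == "__main__":
    for bits in (8, 9, 10, 11):
        print(bits, "bits ->", distinct_levels(bits), "distinct levels out of 256")
    # Three channels of 11 bits each give the raw palette size quoted above.
    print("raw palette:", (2 ** 11) ** 3, "= 2^33")
```

With fewer linear bits, many of the dark input values collapse onto the same output step, which is where shadow detail is lost first.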
No, the 11 bits are bit planes. Each bit plane is shifted out and left on with output enable for a different time:
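As a rough sketch of that idea, the on-time of each plane doubles with its bit position. The base time of 130 ns is an assumed value for --led-pwm-lsb-nanoseconds, the function name is illustrative, and the model assumes that running with fewer bits keeps the most significant planes; dithering and latch overhead are ignored.

```python
def bitplane_on_times(pwm_bits=11, lsb_ns=130, total_planes=11):
    """Return the output-enable on-time in ns for each plane sent, lowest first.

    Plane b is lit for lsb_ns * 2**b. With fewer than `total_planes` bits,
    this sketch assumes the most significant planes are the ones kept.
    """
    first_plane = total_planes - pwm_bits
    return [lsb_ns << b for b in range(first_plane, total_planes)]


if __name__ == "__main__":
    times = bitplane_on_times()
    for plane, ns in enumerate(times):
        print(f"plane {plane:2d}: {ns:7d} ns")
    print("total on-time per row:", sum(times), "ns")
```

The total on-time per row is the sum over all planes, so the top plane alone accounts for about half of it.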
Adding to this conversation, it should be noted that the MSB is always sent along with its time multiple, which is currently 1024 times lsb-nanoseconds. If you only enable 3 bits of PWM, the on time is 1024 + 512 + 256 times lsb-nanoseconds. The off time, however, is the amount of time it takes to shift out three times. For long chains this will be longer and thus decrease the display brightness, which will lower the average power. However, the peak power will still be unchanged. Your refresh may be constrained by on time or by shift time.

The Linux scheduler does come into play, but less so on multicore with core reservation. However, there are other cases which can impact performance, such as memory operations and other processes using GPIO. These are arbitrated and can cause massive refresh drops randomly if not factored in. To factor these in, load the system in the worst-case configuration, then edit the settings to the desired minimum. The refresh rate will spike when unloaded and drop when loaded. Note that perfection is not likely possible here: there are high-priority events which cannot be blocked or filtered out.

GPIO speed has a significant impact on shift time. This should be kept around 15 MHz, maybe up to 21 MHz. The speed achievable depends on the version of Pi used. Higher refresh is possible with FPGAs using something like S-PWM without losing as much quality. I have put up a pull request which makes an attempt to add it here, though note that it is not recommended in all cases.

BCM is used here to reduce the amount of memory and processing cycles required to divide the LED current to give color shades. CIE1931 is used to map color shades into current division steps. To get 256 color shades per color you need just over 11 bits of information. However, on some panels this may not be completely possible. You can still get decent color depth with fewer bits. For long chains there is not enough time for the lower bit planes to really appear without collapsing the refresh, so their brightness is reduced. At a certain point you might as well not send them. You can configure the library to send them, but they are just overhead.

Fancy LED drivers with built-in PWM do exist. However, these are more expensive and are not supported. They require continuous IO updates, which, as mentioned with BCM, can be problematic. Supporting them is technically possible, but they are completely different. You get large amounts of bits for gamma correction, but the color depth may not improve significantly despite the increase in overall shift efficiency.

With BCM, the timer created by PinPulser serves another very important function: it prevents memory bus arbitration. Concurrency is achieved by shifting while the previous row is being shown. This allows for a larger window of time per bit to shift a bit plane and reduces the amount of memory operations required, so if any instability exists it will not cause a significant problem. The hardware PinPulser is recommended because it shuts off the display instantly by driving the OE signal directly from the PWM timer, enabling much smoother operation and color consistency. This keeps a lower bit plane from outshining a higher bit plane. The software version is only for compatibility with different pin mappers.

The Linux scheduler is slow and cannot be trusted to multiplex the display. By default the Pi uses a 250 Hz tick rate, which is meant for blocking operation on a single core. That is not enough here, although this is possible on some RTOS systems. Kernel threads will not likely fix this, and moving to RT threads will likely cause compatibility issues. The library uses a background thread to avoid certain things which may be viewed as hackish, most notably memory bus overhead, which can promote refresh turbulence. Any IO processing, apparently including ping, is capable of causing issues.

An FPGA is capable of using PWM instead of BCM due to its processing efficiency. It is also capable of doing S-PWM. However, complexity is a little harder to manage with an FPGA, depending on skill set. An FPGA is also more expensive and would likely need its own memory interface, further increasing the cost and complexity.

DMA is slow for a few reasons. DMA is not likely pipelined or optimized, and generally requires something like 5 cycles per single shot. Burst transfers are better but are unsupported by GPIO. Memory locality does not exist for DMA. The main memory device is slow burst DDR, not SRAM. CPU and DMA are clocked from different clock domains. Memory and GPIO likely occupy two different buses in terms of clock and size. These are just guesses, but the L1 cache speeds up the CPU quite a bit.

Basically, BCM is a time hack. This implementation uses lsb-nanoseconds as the time window, and it starts from the most significant bit and works down. This is required by CIE1931, which can be disabled via the direct mapping option. For higher quality you may need to adjust this. For better performance on longer chains you may need to adjust this. Overall, however, don't.

Any support for additional LED drivers will likely be constrained not by the serial clock but by the memory operations, which, it would appear, are protected by worst-case arbitration and should be more than enough, depending on the Pi version. PWM on OE is likely possible for GCLK. However, there is more to it than that, and this library would need to be completely refactored for that to even be possible.
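The interplay of on time and shift time described above can be put into a rough back-of-the-envelope model. This is a sketch under stated assumptions, not how the library measures anything: it assumes the next bit plane is shifted in while the current one is lit, so each plane costs whichever of the two is longer, and it ignores latch overhead, dithering, scheduler jitter and bus arbitration. The function name, the 130 ns base time and the 15 MHz clock are illustrative defaults.

```python
def estimate_refresh_hz(chain_columns, scan_rows, clock_mhz=15.0,
                        pwm_bits=11, lsb_ns=130, total_planes=11):
    """Estimate an upper bound on refresh rate in Hz (simplified model).

    chain_columns: pixels clocked in per row across the whole chain
    scan_rows:     number of multiplexed row addresses (e.g. 32 for 1:32 scan)
    """
    # Time to clock one row of one bit plane into the shift registers.
    shift_ns = chain_columns / clock_mhz * 1000.0
    # Planes actually sent; assumed to be the most significant ones.
    planes = range(total_planes - pwm_bits, total_planes)
    # Each plane takes its on-time or the shift time, whichever is longer.
    row_ns = sum(max(lsb_ns << b, shift_ns) for b in planes)
    frame_ns = scan_rows * row_ns
    return 1e9 / frame_ns


if __name__ == "__main__":
    for columns in (64, 256, 512):
        for bits in (11, 9, 7):
            hz = estimate_refresh_hz(columns, scan_rows=32, pwm_bits=bits)
            print(f"{columns:4d} columns, {bits:2d} bits: ~{hz:6.1f} Hz")
```

In this model a short chain is bound by on time, so dropping bit planes helps little, while a long chain is bound by shift time in the lower planes, which is where reducing the number of bits or raising lsb-nanoseconds pays off.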
Note that aggressive use of set pixel may cause memory arbitration. However, this is decoupled from the background thread when properly configured. Timing instability is guarded against via time windows, again when properly configured. Note that CIE1931 is used, which causes massive drops in average power for brightness reductions. Peak power is still unchanged, but that may not matter in certain cases.
#918