Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Volunteer: Framebuffer.get_rect_raw(x,y,w,h) #7750

Open
volkerjaenisch opened this issue Sep 5, 2021 · 11 comments
Open

Volunteer: Framebuffer.get_rect_raw(x,y,w,h) #7750

volkerjaenisch opened this issue Sep 5, 2021 · 11 comments

Comments

@volkerjaenisch
Copy link

Hi Micropython People! You are doing a great job!

I am currently developing a MicroPython Eink-Driver for the popular DollaTek T5 Microcontrollers.

My code heavily depends on the framebuffer lib of micropython.

Eink-Displays are quite slow. So the usual way to deal with them is to make partial updates of the display from the framebuffer: Copying a rectangle of the framebuffer to an rectangle in the display.

So I crave for a framebuffer method

get_rect_raw(x,y,w,h)

which returns the raw pixel information of the given area as a byte stream.

For BW this is easy, but with color/intensity things become complicated, since there are quite different pixel representations.

But I think that the functionality to extract a rectangle of the framebuffer as byte-stream is a nice feature to boost performance. Even if the byte-stream has to be postprocessed for the pixel encoding on the python side it is an order of magnitude faster than to query each pixel independently.

I volunteer to come up with a BW pull request.

I am aware that E-Ink displays are slow and that the framebuffer even queried from python by pixel will never be the bottleneck. But why not make the code better?

Cheers,
Volker

@mcauser
Copy link
Contributor

mcauser commented Sep 5, 2021

This could benefit all displays, not just eink.

When extracting the “window” rectangle, you may need to align the x or y in steps that match the byte orientation of the display.
eg. if it’s a “horizontal” display, the x would need to be multiples of 8.
Otherwise you’d need to read pixel data from the display and merge in the window data before writing back, and not all displays support reading the pixel data back out.

Only an issue for mono or greyscale displays where 1x framebuf byte is more than 1x pixel on the display.

@volkerjaenisch
Copy link
Author

volkerjaenisch commented Sep 6, 2021

@mcauser
Thank you for the warm welcome. I will give it a try. Please be gentle with me since C is not the language I really like.

My first goal is to come up with a function for BW display coding. Also I have only the esp32 facility to test the code.

Any comments appreciated

Volker

@peterhinch
Copy link
Contributor

While I see the merits of this it would be good to get some input from the maintainers. Any enhancements to framebuf inevitably add bytes, so there is a tradeoff between functionality and size. I'd be interested in adding graphics primitives (circle (ellipse?), fill_circle, triangle, fill_triangle) but haven't submitted a PR because of doubts whether the increased size would be acceptable.

It would be good to see a plan for framebuf prioritising features to add. Possibly the maintainers consider it complete.

@volkerjaenisch
Copy link
Author

Dear @peterhinch
You are correct that more code adds more bytes. So any bit of code has to be reviewed for being needed and its quality in terms of performance (RAM/CPU).
On the other hand framebuf is a buffer. And buffer have exactly one reason for there existence : They are made for fast input output operations. In case of framebuf the fast partial rectangle output is just not implemented.

A second thought: Why is framebuf in the micropython core at all? Not all systems utilizing micropython have a graphical display. It makes sense to put things like the machine lib in the core as well as the RTC and other hardware drivers. But frambuf is an 'abstract' dataclass not bound to hardware at all.
And a last thought: Your excellent Writer class for fonts is also not in the micropython core.

So I hope that we can agree that the best option would be to separate framebuf out of the core. In that case the core becomes smaller and the framebuf users get more functionality (e.g. circles) without bloat in the core.
But this is a big change to micropython.

A second option may be to write a mpy-module which adds the missing output-functionality to a given buffer instance. But I do not like this since IMHO this functionality is a crucial part of the framebuffer. [Imagine a desktop manager which can only write to the display and is not capable to read rectangle regions efficiently back.]

I will come up with a PR for implementing the missing functionality in the core. Then it is the decision of the community if the PR will find its way eventually into the core.

Cheers,
Volker

@rkompass
Copy link

rkompass commented Sep 8, 2021

How about:

  • ellipse with circle as special case, and width optional argument which has fill_ellipse and fill_circle as specializations
  • triangle with the same options
  • rounded rect with the same options?

I just finished micropython functions for these, triangle width option is missing yet but WIP.
I attach the code for circle rounded rect and ellipse. It's not many bytes :-)

  • line with width > 1 seems desirable too, because it would allow for drawing polygons.
  • functions for conversion of different color formats like @peterhinch s
    _lcopy(dest:ptr8, source:ptr8, length:int) which converts 4-bit (8-bit) colours to 12 or 16 bits
    would also be nice. Screen update times could be cut down to half probably (50 ms -> 25ms on st7735 lcd with blackpill).
    There are many combinations of colors, and to limit code size only such conversions which are needed for screen updates imho should be considered to have in C.

I do not fully understand the original intent behind @volkerjaenisch s request for get_rect_raw(x,y,w,h) . Is it about streaming the bytes out? For something like SPI with DMA? Refreshing the screens in small chunks automatically in the background?
Or is it about changing byte alignments?
Anyway I would love to have all this in the core. Or could a framebuffer module in C be created as a loadable module?

Nice regards, Raul
framebuf_circle_rounded_rect.py.zip
framebuf_ellipse.py.zip

@volkerjaenisch
Copy link
Author

@rkompass

. Is it about streaming the bytes out? For something like SPI with DMA? Refreshing the screens in small chunks automatically in the background?

Correctly!

E-Ink displays are quite slow - a complete refresh can take a good portion of a second. So to use them interactively or with rapid updates one needs partial updates, which are well supported by these displays.

The art is to stream the partial update as a byte stream via SPI.
Also there is (as you already mentioned) the problem with different orientations which have to be calculated on the fly. Doing this in Python is possible (have a look at my driver MicroPython Eink-Driver ) but at least two orders of magnitude slower than in C.

And even if the performance in the ESP32 class of controller is available to do it in Python on smaller controllers it may be not. And in any case I like to build software as

  • readable as possible
  • memory efficient as possible
  • CPU efficient as possible

The architecture of the framebuffer code is good. For each display low level byte orientation there is a special driver. This is cool since to refresh your display you only have to copy the bytes of the whole buffer and bang the display is refreshed.

But on partial updates you have border problems which lead to bit banging which is not really effective to solve in Python.

Also the rotation of bit fields is better done in C than in python.

Cheers,
Volker

@rkompass
Copy link

rkompass commented Sep 8, 2021

Hi Volker,

thank you for the explanation.

@volkerjaenisch
Copy link
Author

You are welcome

@volkerjaenisch
Copy link
Author

volkerjaenisch commented Sep 11, 2021

Dear @peterhinch

I am a bit rusted (25 years no contact) in C. So I looked for the implementation of fill_rect which is in a way the inverse function to my proposed get_rect_raw function.

The most simple case is the horizontal aligned buffer you implemented in 2017.

https://github.com/micropython/micropython/commit/231cfc84a7cae31f93208c334fc33b08278040eb#diff-dd78120930e2f8d9def0f45fad4386ab52d978b1106a1066ac09fee6a181cdde

Reviewing your code I noticed:

  • your code is really small

  • there is not a single line of comments [Why is there not a single line of comments in the micropython core? Did I miss something?]

  • understanding your code (eight lines of code) took me half a day

  • your code is not as performant as it can be. [Why program in C if not for performance?]

In my fork

https://github.com/Inqbus/micropython/blob/master/extmod/modframebuf.c

I optimized your code by a factor of five in runtime, at the cost of 160 additional bytes at a total of 1521456 Bytes.

My test routine instantiates two buffers. One with your method and a second with my method.

Then 100 random fill_rect operations will be made to the two buffers. I think that this is a fair test. The controller is under stress and not in a particulary nice condition to be measured.

import framebuf
from urandom import seed, randint
import utime


seed(123)
               
width = 256
height = 128
rounds = 100


buffer = bytearray(height * width)

frame_buffer = framebuf.FrameBuffer(buffer, width, height, framebuf.MONO_HLSB)

buffer_fast = bytearray(height * width)

frame_buffer_fast = framebuf.FrameBuffer(buffer_fast, width, height, framebuf.MONO_HLSB_FAST)

buffers = (frame_buffer, frame_buffer_fast)

stat = [[],[]]

for i in range(rounds):
    buf_id = randint(0,1)
    t = utime.ticks_us()  
    buffers[buf_id].fill_rect(randint(0,width),randint(0,height),randint(0,width),randint(0,height),randint(0,1))
    delta = utime.ticks_diff(utime.ticks_us(), t)
    stat[buf_id].append(delta)
   

def mean_std(stat):
    sum = 0
    sum2 = 0
    for msec in stat:
        sum += msec
       
    mean = sum/rounds
   
    for msec in stat:
        sum += (msec - mean)**2
   
    std = (sum/rounds)**0.5
    return mean, std


print('normal', mean_std(stat[0]))
print('fast', mean_std(stat[1]))

Results:

>>> %Run -c $EDITOR_CONTENT
[[403, 1176, 467, 410, 556, 1069, 196, 308, 496, 481, 584, 826, 329, 385, 250, 525, 108, 501, 320, 603, 827, 488, 1245, 677, 862, 1161, 893, 489, 1226, 1343, 138, 118, 345, 904, 487, 252, 1764, 726, 118, 320, 367, 193, 2624, 208, 918, 923, 195, 1349, 139, 356, 752, 897],

[173, 115, 175, 155, 122, 195, 150, 144, 112, 207, 131, 121, 148, 191, 156, 124, 164, 120, 197, 134, 120, 119, 116, 190, 112, 188, 144, 118, 111, 117, 154, 170, 131, 116, 122, 188, 186, 155, 141, 137, 114, 111, 114, 110, 160, 115, 149, 117]]

Mean/standard-deviation

normal (335.13, 407.4322)

fast (70.56, 57.20432)

Normal is your code, fast is mine.You may like to notice that the standard deviation in your code is above the mean. In my code it is a bit below. This states for robustness of my algorithm.

My optimization is not done. Theoretically a factor of eight can be gained against your implementation but I only achieved 4.7. As I said, I am not familiar with C.

Coming back to your point of code bytes.

  • If you have a RAM restricted environment the 1.5 MB of MicroPython will kill you in any case.
  • The only way to limit the RAM consumption of MicroPython is to modularize it.
  • So your initial claim "to save code bytes" falls short. Or did you really implemented a bad algorithm to save bytes of code?

Don't get me wrong. I am solely interested in clear documented code on which I base my fortune.

Cheers,
Volker

@peterhinch
Copy link
Contributor

your code is really small

That was the idea. In general MP is optimised for size rather than performance.

there is not a single line of comments

When submitting a PR I follow the project guidelines. Comments are used for code which is not readily understood by an experienced C programmer. Comments that explain C are considered to be noise.

your code is not as performant as it can be. [Why program in C if not for performance?]

The aim is to be much faster than Python in the smallest number of bytes.

Congratulations on optimising for performance. By all means submit a PR: it's up to the maintainers whether to prefer fast or small.

I will not get involved in debate on the merits of these MP conventions. I am not a maintainer.

@volkerjaenisch
Copy link
Author

volkerjaenisch commented Sep 13, 2021

Dear @peterhinch

Sorry for the whole disturbance. It is my fault. I did not read the manual.
My personal objective is to optimize code where ever I go for performance.

Sorry for the criticism of your code. Your code is perfect for a Code-ROM constricted environment.

there is not a single line of comments

When submitting a PR I follow the project guidelines. Comments are used for code which is not readily understood by an experienced C programmer. Comments that explain C are considered to be noise.

IMHO these Guidelines are Bullshit. Any code in any language has to be documented, thoroungly.
I agree with you that code comments do not have to explain the language itself.

But your famous 8 lines of code express an algorithm. This algorithm goes straight against all normal tackling of the problem. You roll up the problem in columns while the data structure is organized in rows. This is no straight forward code. This has to be explained.

And IMHO any algorithm has to be documented. And in most cases the paper where the algorithm has come from has to be cited.

Just look at the C-Python code for a positive example.

Sorry for exposing you here. My critic holds for the whole micropython core.

Cheers
Volker

tannewt added a commit to tannewt/circuitpython that referenced this issue Mar 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants