Fast access of bitmap buffer with numpy #45

jiong3 · 2017-02-05T14:20:12Z

Hi,

Currently the bitmap buffer can be accessed using freetype.Bitmap.buffer, which returns a Python list of all the bytes. I can then use np.fromiter to get a numpy array, but because of the Python loop over all the bytes this is really slow.

Is there a way to access the memory that the buffer points to directly with numpy? Anything I have to consider if I try to do that?

rougier · 2017-02-05T17:16:14Z

Good point. I think numpy.frombuffer might be useful in such a case, but I've never really experimented with it. Still, I think it might be a good starting point.

jiong3 · 2017-02-07T20:15:45Z

So I had a look around on the internet and found different ways to do that:

@staticmethod
def get_np_array0(bitmap, num_bytes):
    # 34.625 / 36.874
    return np.fromiter(bitmap.buffer, dtype=np.uint8)

@staticmethod
def get_np_array1(bitmap, num_bytes):
    # 19.933 / 21.485
    return np.fromiter(bitmap._FT_Bitmap.buffer, dtype=np.uint8, count=num_bytes)

@staticmethod
def get_np_array2(bitmap, num_bytes):
    # 0.037 / 1.158, int_asbuffer is not documented
    return np.core.multiarray.int_asbuffer(ctypes.addressof(bitmap._FT_Bitmap.buffer.contents), num_bytes)

@staticmethod
def get_np_array3(bitmap, num_bytes):
    # 0.418 / 1.540, potential memory leak according to github issue 6511
    return np.ctypeslib.as_array(bitmap._FT_Bitmap.buffer, (num_bytes,))

@staticmethod
def get_np_array4(bitmap, num_bytes):
    # 0.072 / 1.242
    bfm = ctypes.pythonapi.PyBuffer_FromMemory
    bfm.restype = ctypes.py_object
    buffer = bfm(bitmap._FT_Bitmap.buffer, num_bytes)
    return np.frombuffer(buffer, dtype=np.uint8)

@staticmethod
def get_np_array5(bitmap, num_bytes):
    # 0.079 / 1.145
    buffer = ctypes.cast(bitmap._FT_Bitmap.buffer, ctypes.POINTER(ctypes.c_ubyte * num_bytes))
    return np.frombuffer(buffer.contents, dtype=np.uint8)

The numbers in the comments are from cProfile (cumtime of get_np_arrayX) / (cumtime of main function), just to get an idea of the performance. I rendered 10000 characters.
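For reference, a minimal Python 3 sketch of the approach used in method 5 above. It is an illustration only, not freetype-py API: `pointer_to_array` is a hypothetical helper name, and a small ctypes array stands in for `bitmap._FT_Bitmap.buffer` so that the snippet runs without FreeType. Note that `PyBuffer_FromMemory` (method 4) belongs to the Python 2 C API, so that variant does not carry over to Python 3 unchanged.

```python
import ctypes

import numpy as np


def pointer_to_array(ptr, num_bytes):
    """Wrap num_bytes starting at a ctypes pointer in a uint8 array, without copying."""
    array_type = ctypes.c_ubyte * num_bytes
    block = ctypes.cast(ptr, ctypes.POINTER(array_type)).contents
    return np.frombuffer(block, dtype=np.uint8)


# Stand-in for bitmap._FT_Bitmap.buffer: a POINTER(c_ubyte) into memory we own.
backing = (ctypes.c_ubyte * 6)(0, 50, 100, 150, 200, 250)
ptr = ctypes.cast(backing, ctypes.POINTER(ctypes.c_ubyte))

view = pointer_to_array(ptr, 6)
print(view.tolist())  # [0, 50, 100, 150, 200, 250]
```

The returned array is a view on memory it does not own, so it is only valid for as long as the underlying buffer stays alive.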

Two things I am not sure about and that might be relevant:
When is the memory of the buffer freed?
When is bitmap.pitch different from bitmap.width, and when is it negative?

rougier · 2017-02-07T21:05:26Z

Nice! But your last question reminds me that we may have a problem with the width/pitch difference.

The explanation can be found here:
https://www.freetype.org/freetype2/docs/reference/ft2-basic_types.html#FT_Bitmap

I'm not quite sure I understand it correctly.

jiong3 · 2017-02-08T07:17:55Z

Here's another explanation of the pitch:
https://www.freetype.org/freetype2/docs/glyphs/glyphs-7.html

The way I understand it, for just reading the buffer into a numpy array, num_bytes = rows * abs(pitch) should work correctly in all cases. If the pitch is negative, the order of the rows has to be reversed (easy to do in numpy). The pitch is the number of bytes per row and the width is the number of pixels, so for a normal grayscale bitmap (1 pixel = 1 byte) both are the same. For a black-and-white bitmap (1 pixel = 1 bit), however, you have to unpack the pixels, which should be equally easy on a numpy array.
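The reading above can be sketched with numpy alone. This is a hedged illustration of that interpretation, not library code: `rows_from_buffer` is a hypothetical helper, the input is assumed to be a flat uint8 array of at least rows * abs(pitch) bytes, and monochrome rows are assumed to be packed most significant bit first.

```python
import numpy as np


def rows_from_buffer(flat, rows, width, pitch, mono=False):
    """Turn a flat uint8 buffer of rows * abs(pitch) bytes into a (rows, width) array."""
    stride = abs(pitch)
    image = flat[: rows * stride].reshape(rows, stride)
    if mono:
        # 1 bit per pixel, MSB first: unpack to one byte (0 or 1) per pixel.
        image = np.unpackbits(image, axis=1)
    image = image[:, :width]  # drop the padding at the end of each row
    if pitch < 0:
        image = image[::-1]   # negative pitch: reverse the row order
    return image


gray = np.array([1, 2, 3, 0, 4, 5, 6, 0], dtype=np.uint8)
print(rows_from_buffer(gray, rows=2, width=3, pitch=4).tolist())   # [[1, 2, 3], [4, 5, 6]]
print(rows_from_buffer(gray, rows=2, width=3, pitch=-4).tolist())  # [[4, 5, 6], [1, 2, 3]]

bits = np.array([0b10100000, 0b11000000], dtype=np.uint8)
print(rows_from_buffer(bits, rows=1, width=10, pitch=2, mono=True).tolist())
# [[1, 0, 1, 0, 0, 0, 0, 0, 1, 1]]
```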

I think it would make sense to include something in the library that can be used directly with np.frombuffer, maybe method number 4 or 5.

The remaining question is whether the user should immediately create a copy of the array, since I am not sure how and when the memory of the buffer will be freed.

rougier · 2017-02-08T09:03:54Z

We can also directly return a copy (just in case). I think freetype can free the glyph anytime so it might be safer to return a copy.
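A small sketch of why the copy matters. A ctypes array stands in for memory owned by FreeType (an assumption made so that the snippet runs on its own), and overwriting it simulates the buffer being reused for another glyph.

```python
import ctypes

import numpy as np

backing = (ctypes.c_ubyte * 4)(1, 2, 3, 4)      # stand-in for FreeType-owned memory
view = np.frombuffer(backing, dtype=np.uint8)   # shares memory with backing
snapshot = view.copy()                          # owns its own memory

backing[0] = 99  # simulate the buffer being reused for the next glyph
print(view[0], snapshot[0])  # 99 1
```

The view silently follows the change, while the copy keeps the pixels that were read at the time.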

rougier · 2017-03-18T17:55:14Z

@StephewZ Can you open a new issue for this problem ?

HinTak · 2017-04-18T01:43:44Z

Sigh. You guys don't understand what 'pitch' is. It is not the same as width, nor number of pixels in gray. It is a memory offset. It is the same concept as what is called 'stride' in numpy lingo. (see https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ctypes.html , or whatever else is available on numpy).

The idea is that computers are a lot more efficient when dealing with, say, 4-byte or 8-byte chunks. So when you want to fast-forward or backward in memory, you want to do so in such units, instead of bytes.
For bits, it is obvious that pitch is AT LEAST (the number of bits rounded up to a multiple of 8)/8, since you can't fast-forward by half a byte. But for grays, you might have the stride being the width rounded up to a multiple of 4 or 8, depending on whether you are on a 32-bit or a 64-bit platform.

Pitch is the distance between the two memory locations of the beginning of row1 and row2, etc. It is always larger than (bit-depth * pixel width) /8 , because memory locations like to be aligned to multiple of 4 or 8, depends on platform. i.e. if you have 17 pixels of gray per row, it is possible that stride can be 20 or 24.

It is called pitch by some, but called stride in numpy's multi-dimensional array type's documentation.
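To make the pitch/stride analogy concrete, here is a small numpy-only sketch. The buffer contents and the sizes (17 gray pixels per row, a pitch of 20 bytes) are made up for illustration; the point is only that numpy can be told the distance between rows directly through the `strides` argument of `np.ndarray`.

```python
import numpy as np

rows, width, pitch = 3, 17, 20  # 17 gray pixels per row, rows 20 bytes apart
flat = np.arange(rows * pitch, dtype=np.uint8)  # stand-in for the raw buffer

# View the padded buffer as a (rows, width) image without copying.
image = np.ndarray(shape=(rows, width), dtype=np.uint8,
                   buffer=flat, strides=(pitch, 1))

print(image.strides)  # (20, 1)
print(image[1, 0])    # 20: row 1 starts pitch bytes after row 0
```

The three padding bytes at the end of each row are simply never addressed by the view.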

jiong3 · 2017-04-19T19:03:57Z

"Sigh. You guys don't understand what 'pitch' is."

?

As I wrote above, the pitch is the number of bytes per row. According to the documentation, "FreeType functions normally align to the smallest possible integer value". So for grayscale bitmaps width and pitch are likely equal, unless the alignment is changed. In the common case of accessing the buffer as a whole an alignment of the rows to 2 or 4 bytes wouldn't be faster anyway.

HinTak · 2017-04-20T22:56:54Z

No, pitch is not the number of bytes per row. It is the distance between two rows in bytes. Can you not read?

In cairo lingo, it is also called stride. Cairo even has a special function for converting/calculating stride from width. This tells you stride is not the same as width.
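As a rough illustration of how a stride can be derived from a width, the helper below rounds each row up to a multiple of an alignment. `padded_stride` is a hypothetical name, and the 4-byte default alignment is an assumption chosen for the example; for real cairo surfaces the library's own function (`cairo_format_stride_for_width` in the C API) should be asked instead of re-deriving the value.

```python
def padded_stride(width, bits_per_pixel, alignment=4):
    """Bytes per row after rounding up to a multiple of `alignment` bytes."""
    row_bytes = (width * bits_per_pixel + 7) // 8  # whole bytes needed for the pixels
    return (row_bytes + alignment - 1) // alignment * alignment


print(padded_stride(17, 8))  # 20: 17 gray pixels padded to 20 bytes
print(padded_stride(17, 1))  # 4: 17 bits need 3 bytes, padded to 4
print(padded_stride(16, 8))  # 16: already aligned, stride equals width
```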

I am concerned that you are proposing fast but wrong code. Code that is wrong, is wrong, whatever the speed.

HinTak · 2017-04-21T00:31:01Z

You also do not seem to be able to read documentation - "normally" means "most of the time" . It is meaningless to quote that sentence in this context.

jiong3 · 2017-04-21T04:43:22Z

It says so in the documentation:

"The pitch's absolute value is the number of bytes taken by one bitmap row [...]"

I never suggested not to test for pitch != width, but since they are equal in the most common case this is what should be optimized for.

"It is always larger than (bit-depth * pixel width) /8, [...]"

That's wrong.

jiong3 · 2017-04-21T05:40:02Z

In general, how should the buffer be handed to the user? As a raw buffer, numpy (dependency) or python array, with or without padding, bits unpacked to bytes?

rougier · 2017-04-21T07:54:32Z

Going back to the numpy handling, I think it would be good to return a copy by default. We could provide an option to not make a copy, but we have no real control over when the buffer will be freed.

HinTak · 2017-04-21T14:11:11Z

I think numpy itself is the problem. Images are not numbers. The problem is that you insist on thinking of images as arrays of numbers. Performance could be much better by moving to a toolkit that explicitly caters for imaging and drawing concepts, such as cairo (and the various Python bindings of cairo). The compositing code in the wordle example would be a lot simpler and also a hell of a lot faster if rewritten as cairo image surface compositing. You let cairo handle the semi-transparency, instead of Python looping by hand over the pixels as numbers, numpy style.

rougier · 2017-04-21T15:31:03Z

The reason for using numpy in the wordle example was mostly to have an easy way to test for collision. It does not pretend to do anything else. I agree cairo (or the antigrain library) would be a better solution for manipulating/compositing images and drawing, but that's a separate problem. Examples are really and only illustrations on how to use the library.
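For readers wondering what a numpy collision test can look like, here is a hedged sketch. It is not the wordle example's actual code: `collides` is a hypothetical helper, the canvas and glyph are assumed to be 8-bit coverage arrays, and the glyph is assumed to fit inside the canvas at the given position.

```python
import numpy as np


def collides(canvas, glyph, x, y):
    """True if any inked glyph pixel would land on an already inked canvas pixel."""
    h, w = glyph.shape
    region = canvas[y:y + h, x:x + w]
    return bool(np.any((region > 0) & (glyph > 0)))


canvas = np.zeros((4, 6), dtype=np.uint8)
glyph = np.array([[0, 255], [255, 255]], dtype=np.uint8)
canvas[1:3, 1:3] = glyph  # place a first glyph at x=1, y=1

print(collides(canvas, glyph, 1, 1))  # True: same position
print(collides(canvas, glyph, 3, 1))  # False: empty area
print(collides(canvas, glyph, 2, 2))  # False: boxes overlap, inked pixels do not
```

Because the test is per pixel, two glyphs whose bounding boxes overlap can still be placed next to each other as long as their inked pixels do not touch.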

HinTak · 2017-04-21T16:18:55Z

"Examples are really and only illustrations on how to use the library." - well, that's what I think about comments on the speed and memory usage of the examples. If you want speed (or memory efficiency), you write your code entirely differently... and you are not even using any of the vector maths operations offered by numpy, which is another problem with using numpy: you are not using numpy properly for its main strength.

All the examples of http://github.com/ldo/python_freetype (in http://github.com/ldo/python_freetype_examples) use cairo. And they are a lot faster than any of the ones here too! A pity that (1) they use another new custom cairo Python binding instead of pycairo (very much a "not invented here" symptom), (2) it is Python 3 only, and (3) the coding style is terrible, besides the one-big-file-as-source-code code organization. I am tempted to extract the freetype bitmap to cairo surface code from that as a stand-alone routine.

The comment about gray being the most common also seems out of place. The most common imaging case is really 24-bit colour, followed by bitmap (i.e. black/white). 8-bit gray is really the least common usage of freetype.

rougier · 2017-04-21T20:09:40Z

A stand-alone cairo example would be a nice addition.

HinTak · 2017-04-23T05:52:06Z

So much for trying to extract the cairo surface code from the other freetype binding - it is simply wrong :
ldo/python_freetype#1
ldo/python_freetype_examples#1

That said, my corrected version is a hell of a lot faster than the numpy versions... Yes, I am already timing my standalone cairo example. I think numpy is just slow.

HinTak · 2017-04-25T15:33:15Z

I have rewritten 6 of the samples with pycairo: glyph-{monochrome,alpha,color}, hello-world, example1, and wordle. The last one is the most difficult one - I needed to use a feature newly added in pycairo 1.11 (released two weeks ago), and cannot pack as tightly as the original. OTOH, cairo can paint partly off-buffer, so you can see the difference.

And it is a hell of a lot faster too...

HinTak · 2017-04-25T15:34:55Z

The cairo based wordle drawing. I cannot pack as tight, but can draw partly off screen.

HinTak · 2017-04-25T15:35:47Z

cairo-based glyph-alpha

HinTak · 2017-04-25T15:36:45Z

cairo-based glyph mono-chrome

HinTak · 2017-04-25T15:38:00Z

glyph-color

HinTak · 2017-04-25T15:39:28Z

The boring example1, no visual difference other than it being a lot faster.

HinTak · 2017-04-25T15:40:35Z

The hello world example.

HinTak · 2017-04-25T15:53:42Z

Since they are proper drawings rather than plots, there are no axes or padding around the figure, nor any grid lines.

glyph-outline.py is essentially half of glyph-color so I'm not going to do it; glyph-vector-2.py has grid lines. I can't really do glyph-lcd. So the above covers all the numpy-based plot examples. (The gl example also uses numpy, but I'll let you figure that out...)

Once I have cleaned up the samples and added some comments on limitations, etc., I'll issue a pull request.

rougier · 2017-04-25T17:00:27Z

@HinTak Thanks, nice results. For the PR, it would make sense to add all of them with the "-cairo.py" extension and to keep the old ones (or to have a dedicated cairo subdir), because it requires an extra dependency. For the wordle example, I think the difference comes from the collision test. Cairo probably uses bounding boxes, and this prevents one text from being drawn over another one even if the glyphs do not collide.

@jiong3 Do you think you're ready to make a PR from your tests and our discussion?

HinTak · 2017-04-25T17:27:21Z

Yes, that's what I have been doing - 5 *-cairo.py, and an extra bitmap_to_surface.py which consists of extracted, adjusted and bug-fixed routines from the other freetype binding. There is at the moment no separate glyph-monochrome vs glyph-alpha - they differ by only one line (TARGET_MONO/TARGET_NORMAL), so I just comment/uncomment the alternatives at the moment.

HinTak · 2017-04-25T17:35:57Z

Also I found some of the numpy examples doing y-direction flips - wordle does it at least twice :-(. And also the arrays having width and height in Fortran indexing style... haven't seen them in a while...

rougier · 2017-04-25T19:17:15Z

The Y-flip is an error; matplotlib can actually take care of that. The numpy arrays are C-order, but indexing is row (= y) / column (= x).

HinTak · 2017-04-25T23:31:08Z

Having the displayed and the saved images differ is a bit painful. The original wordle draws things upside down and displays them upside down, then saves them the correct way up. That numpy/matplotlib can cope isn't quite the point. Anyway, the cairo-based ones all have things drawn the same way up as they are saved. Actually I don't display with any of cairo's display backends, but just save to a file and then launch Python Pillow's image viewer.

HinTak · 2017-04-26T20:33:11Z

I have decided to add a cairo version of glyph-outline anyway, quite trivial since it is just half of glyph-color.

The pull is at
#55

HinTak · 2017-04-26T20:34:47Z

BTW, the outline example has a transparent background, whereas I paint most of the others' backgrounds grey first. PIL displays transparent as black; another viewer I have displays it as white. Gimp shows a checkerboard pattern for transparent pixels.

I have also changed my mind about editing to change between mono or alpha modes of the combined mono+alpha example. It defaults to alpha but if you put any argument to it, it draws mono. Explained in the comments at its top.

jiong3 · 2017-04-27T05:40:30Z

@rougier No, but if anyone wants to make a PR I would suggest option 4 or 5, or maybe something using a python array (which I haven't tested so far).
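For the untested Python-array idea, a possible numpy-free sketch is shown below. It is an assumption about how such a helper could look, not something from the library: `pointer_to_pyarray` is a hypothetical name, and a ctypes array again stands in for the FreeType buffer. `ctypes.string_at` copies the bytes, so the result does not depend on the lifetime of the original memory.

```python
import array
import ctypes


def pointer_to_pyarray(ptr, num_bytes):
    """Copy num_bytes from a ctypes pointer into an array.array of unsigned bytes."""
    raw = ctypes.string_at(ptr, num_bytes)  # bytes object, already a copy
    return array.array("B", raw)


backing = (ctypes.c_ubyte * 4)(10, 20, 30, 40)
ptr = ctypes.cast(backing, ctypes.POINTER(ctypes.c_ubyte))

pixels = pointer_to_pyarray(ptr, 4)
print(pixels.tolist())  # [10, 20, 30, 40]
```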

HinTak · 2017-04-27T18:54:35Z

Here is an example of when I got the stride/pitch wrong - notice how some of the tiles collide? (Only a few.)
The corrected code/figures are at #55 (comment) and #55 (comment) (two, depending on whether one's pycairo is the latest). Drawing partly over the edge requires the latest pycairo.

HinTak · 2017-04-29T17:49:33Z

I thought I couldn't do the LCD example in cairo - but it gets better as I get more familiar. So I have added the LCD_V case side-by-side too:
#55 (comment)

The cairo LCD example is about 4 times faster than the old one; with two panels, it probably means 8x.

As I get more familiar with pycairo, I feel like I could probably rewrite glyph-vector-2 also. It is a vector drawing on top of a bitmap. After that, there is only one file which uses the slow data.extend(bitmap.buffer... idiom: texture_font.py, which is used by the gl example.

HinTak · 2017-04-29T17:59:09Z

To answer an earlier question: I think you can get a negative pitch if you use a reflecting transform, i.e. if you call FT_Set_Transform with a matrix which has a negative determinant. I haven't tested this, but one could e.g. set up example_1 to use matrix = (-1, 0, 0, 1) or (1, 0, 0, -1) instead.

Only two examples do FT_Set_Transform at the moment. So, example_1 and wordle would break if they ever get extended to use FT_Set_Transform that way.

HinTak · 2017-04-30T20:40:53Z

I am done with converting/rewriting all the examples from the slow numpy/matplotlib drawing over to cairo:
#55

A side effect is that none of my code uses the stupid data.extend(... idiom; the *-cairo.py versions are all a lot faster. There is only one data.extend(... left, in texture_font, which is used by subpixel-positioning; that one uses opengl for drawing, so I did not touch it.

So I am going to look at the perl-binding of freetype now. It should be obvious by now that I know freetype well and am just looking to use it with a language other than C.

rougier mentioned this issue Apr 19, 2017

opengl.py and wordle.py can get broken by auto-clean up of FT_Bitmap #53

Open

HinTak mentioned this issue May 4, 2017

pycairo based rewrite of the drawing examples. #55

Merged
