
Paint cursors update#2300

Merged
ghaerr merged 2 commits into ghaerr:master from Vutshi:master
Apr 16, 2025

Conversation

@Vutshi
Contributor

Vutshi commented Apr 15, 2025

  • Introduced new cursor designs, with support for both XOR and non-XOR rendering modes.
  • Reduced the thickness of the active mode button outline for a cleaner appearance.

Screenshot 2025-04-15 at 23 18 07 Screenshot 2025-04-15 at 23 17 55

XOR:
Screenshot 2025-04-15 at 23 19 11 Screenshot 2025-04-15 at 23 19 19

Denis V and others added 2 commits April 15, 2025 23:31
@Vutshi
Contributor Author

Vutshi commented Apr 15, 2025

@ghaerr, we switch XOR on and off on every call of hidecursor and showcursor. Can't we just leave it on all the time except when doing brush painting and finalising a shape?

@ghaerr ghaerr merged commit 545f3f5 into ghaerr:master Apr 16, 2025
@ghaerr
Owner

ghaerr commented Apr 16, 2025

Thanks for the cursors @Vutshi!

I can see we might need to start a cursor collection though? IMO, the XOR cursors look pretty slick, but for outlined (normal) cursors, the small cursor seems pretty small. The large cursor looks good, but it's got a pretty short tail. Did you design that one in order to be faster than the original 16x16 cursor for display on slow 8088 systems?

> we switch XOR on and off on every call of hidecursor and showcursor. Can't we just leave it on all the time except when doing brush painting and finalising a shape?

Actually, I don't recommend that. The reason being that we're then making an assumption before every draw routine that XOR would be ON, which could get problematic. The nice thing about the current design is that lower level routines don't make any assumption about the OP_SET/OP_XOR state, which speeds them up. Microwindows, for example, has to set the OP mode before every routine, which effectively slows things down, although the reason for that is the drivers have to work with any application-level graphics context.

The other reason is that changing the OP mode when moving the cursor is very fast (only 4 machine instructions!) compared with actually drawing the pixels. Drawing is ~50x slower, since the XOR mode only has to change once per mouse event, whereas the drawing work is done once per pixel.
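The pattern described here can be sketched as follows. This is only an illustration, not the actual paint or vgalib code: the byte-per-pixel buffer, the `xor_cursor` helper, the cross shape, and the function signatures are all assumptions made for the example. It shows the mode being switched once per cursor event and restored afterwards, so other drawing code never has to guess the current mode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real routines; names and signatures are assumed. */
enum draw_op { OP_SET, OP_XOR };

#define FB_W 32
#define FB_H 16

static uint8_t fb[FB_H][FB_W];          /* one byte per pixel, for simplicity */
static enum draw_op cur_op = OP_SET;    /* global drawing mode, SET by default */

static void set_op(enum draw_op op)
{
    assert(op == OP_SET || op == OP_XOR);
    cur_op = op;
}

/* Low-level routine: uses whatever mode is current, never changes it. */
static void drawpixel(int x, int y, uint8_t color)
{
    if (x < 0 || x >= FB_W || y < 0 || y >= FB_H)
        return;
    if (cur_op == OP_XOR)
        fb[y][x] ^= color;
    else
        fb[y][x] = color;
}

/* Draw a small cross; under XOR, a second identical call erases it. */
static void xor_cursor(int cx, int cy)
{
    set_op(OP_XOR);                     /* switched once per mouse event */
    for (int d = -2; d <= 2; d++) {
        drawpixel(cx + d, cy, 0x0F);
        if (d != 0)                     /* avoid XORing the centre twice */
            drawpixel(cx, cy + d, 0x0F);
    }
    set_op(OP_SET);                     /* restored, so callers assume nothing */
}

static void showcursor(int x, int y) { xor_cursor(x, y); }
static void hidecursor(int x, int y) { xor_cursor(x, y); }
```

Because the mode is back to OP_SET when `showcursor` returns, a later `drawpixel` call from unrelated code overwrites pixels as usual.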

> except when doing brush painting and finalising a shape?

Other reasons are, for instance, calls to draw_bmp would have to be modified, even for the latest "load bmp file" request, since it would then need to know that the upper level had decided that XOR was standard. If XOR was the standard drawing mode for the low-level drawing routines themselves, it would make more sense IMO.

@ghaerr
Owner

ghaerr commented Apr 16, 2025

On the subject of 8088 optimization and elapsed time for say, XOR mode on/off versus drawing a pixel, I use the following table of oft-used or very slow instructions to give an idea of where time is being taken that really matters:

8088 optimization:
        **use reg,reg if possible, not call/ret or push/pop

        mov reg,reg     2
        mov reg,imm     4
        mov reg,mem     13
        and reg,reg     3
        and reg,imm     4
        add reg,reg     3
        add reg,imm     4
        inc reg         3
        shl reg,1       3
        shl reg,cl      8+4*n
        push reg        15
        pop reg         12
        in/out dx,ax    12
        call            23
        callf           36
        ret             20
        iret            44
        mul reg8        70-77
        mul reg16       118-133
        div reg8        80-90
        div reg16       144-162

        call/ret        43
        push/pop bp,sp  29
        total           72+

        mov dx,imm      4
        mul dx          118

        6 shl ax,1      18
        2 mov/add r,r   5
            23 vs 122 = 5x faster
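As a rough illustration of the last comparison in the table, the sketch below replaces a multiply with six single-bit shifts, one register move, and one add. The table does not say which constant is being multiplied; 80 (the bytes per scanline in a 640-pixel planar mode) is an assumption chosen because it decomposes into exactly that instruction mix. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* y * 80 computed as (y << 4) + (y << 6). The constant 80 is an assumption. */
static uint16_t mul80_shift(uint16_t y)
{
    assert(y <= 819);                   /* keep the result within 16 bits */
    uint16_t t = (uint16_t)(y << 4);    /* 4 x shl ax,1        -> y * 16 */
    uint16_t u = (uint16_t)(t << 2);    /* mov, then 2 x shl   -> y * 64 */
    return (uint16_t)(t + u);           /* add                 -> y * 80 */
}

/* Cycle estimates taken from the table values. */
enum {
    CYCLES_SHIFT_ADD = 6 * 3 + 2 + 3,   /* six shl, one mov, one add: 23 */
    CYCLES_MUL       = 4 + 118          /* mov dx,imm + mul dx: 122 */
};
```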

Note that the 4-instruction set_op macro (see vgalib.h) uses ~(4+2+2+12)=20 cycles total, whereas a PUSH/POP pair is 27, and a function call with standard overhead is 72+ cycles before doing anything.

That means that just calling the drawpixel routine uses 72+ cycles, even though the drawpixel routine itself is written in ASM. I haven't yet bothered to count the drawpixel ASM cycles, but it's a lot more than 20 cycles, and it gets called ~30 times for the XOR small cursor. So we're probably talking 4500+ cycles just for the drawpixels, versus 40 to turn XOR on then off again.

The show cursor routine itself is ~320 instructions long, even with a (very low) average of 4 cycles/instruction, =1280 cycles, plus 4500+ cycles.
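The arithmetic behind these estimates can be written out as below. Every constant comes from the figures quoted in this comment; the derived values (the per-call budget implied by the 4500 estimate, and the ratio to the XOR toggle) are just consequences of those rough numbers, not measurements. The identifier names are made up for the example.

```c
#include <assert.h>

enum {
    SET_OP_CYCLES     = 4 + 2 + 2 + 12,     /* 4-instruction set_op: 20 */
    XOR_TOGGLE_CYCLES = 2 * SET_OP_CYCLES,  /* XOR on, then off: 40 */
    CALL_OVERHEAD     = 43 + 29,            /* call/ret + push/pop bp,sp: 72 */
    CURSOR_PIXELS     = 30,                 /* drawpixel calls, small XOR cursor */
    DRAWPIXELS_TOTAL  = 4500,               /* rough estimate from the comment */
    SHOWCURSOR_BODY   = 320 * 4             /* 320 instructions at 4 cycles: 1280 */
};

/* Cycles per drawpixel call implied by the 4500 estimate. */
static int per_pixel_budget(void)
{
    assert(CURSOR_PIXELS > 0);
    return DRAWPIXELS_TOTAL / CURSOR_PIXELS;
}

/* Call overhead alone, before any drawpixel body executes. */
static int call_overhead_total(void) { return CURSOR_PIXELS * CALL_OVERHEAD; }

/* Estimated cost of showing the cursor once. */
static int cursor_total(void) { return SHOWCURSOR_BODY + DRAWPIXELS_TOTAL; }

/* How many times larger that is than toggling XOR on and off. */
static int toggle_ratio(void) { return cursor_total() / XOR_TOGGLE_CYCLES; }
```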

But drawing pixel by pixel is fairly fast considering that most of the bits in the MWIMAGEBITS mask aren't drawn.

> I thought that a small cursor line is just one byte per plane (maybe two if the cursor position isn't 8-pixel aligned), so what could be faster than writing a byte directly to memory?

This is potentially a very good idea, although it'd only work for XOR cursors on EGA/VGA (that is, not portable!). The tricky part will be perfecting rotating the cursor bits to match the X byte alignment of the cursor and display, and then doing that for each Y line quickly. But looking at drawpixel ASM, it spends a third of the routine just figuring out the memory address of the X,Y pixel before doing anything. So it seems that a fast "mini-blit" where an aligned monochrome bitmask (in other words, an XOR cursor) could be transferred quickly to memory in a single function call, would really speed things up.
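A minimal sketch of such a "mini-blit" is shown below, operating on an ordinary memory buffer that stands in for a single bit plane. It is not EGA/VGA driver code: real hardware would also need the plane and latch handling that is omitted here, and the 80-byte pitch, the 16-pixel mask width, and the name `xor_blit16` are assumptions for illustration. The point is that the byte address and bit shift are computed once, then each row costs three XORs and one add.

```c
#include <assert.h>
#include <stdint.h>

#define PLANE_PITCH 80                  /* bytes per scanline (assumed: 640 px, 1 bit/px) */
#define PLANE_ROWS  32                  /* small buffer standing in for video memory */

static uint8_t plane[PLANE_ROWS * PLANE_PITCH];

/*
 * XOR an h-row, 16-pixel-wide monochrome mask into the plane at (x, y).
 * Bit 15 of each mask word is the leftmost pixel. No clipping is done:
 * the caller must keep the cursor inside the buffer.
 */
static void xor_blit16(int x, int y, const uint16_t *mask, int h)
{
    assert(x >= 0 && (x >> 3) + 2 < PLANE_PITCH);
    assert(y >= 0 && h >= 0 && y + h <= PLANE_ROWS);

    int shift = x & 7;                                   /* computed once */
    uint8_t *dst = plane + y * PLANE_PITCH + (x >> 3);   /* computed once */

    for (int row = 0; row < h; row++) {
        /* Put the 16 mask bits at the top of a 24-bit window, then shift right. */
        uint32_t bits = ((uint32_t)mask[row] << 8) >> shift;
        dst[0] ^= (uint8_t)(bits >> 16);
        dst[1] ^= (uint8_t)(bits >> 8);
        dst[2] ^= (uint8_t)bits;        /* only non-zero when x is unaligned */
        dst += PLANE_PITCH;             /* next scanline: an add, no multiply */
    }
}
```

Calling the function a second time with the same arguments restores the original contents, which is what makes it usable for both showing and hiding an XOR cursor.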

Another idea, as I'm looking at the source, would be a drawpixel routine that takes a memory address rather than an X,Y location. That would allow the show/hidecursor routines to calculate an address and mask once, then very quickly use a macro to draw the pixel, or a group of pixels within the single address being passed.
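One way this address-based idea might look is sketched below, again against a plain memory buffer rather than real video memory. The macro `XOR_PIXELS_AT`, the helper `xor_vline`, and the 80-byte pitch are hypothetical names and values chosen for the example; nothing like them is confirmed to exist in the source tree.

```c
#include <assert.h>
#include <stdint.h>

#define PITCH 80                        /* bytes per scanline (assumed) */
#define ROWS  16

static uint8_t vram[ROWS * PITCH];      /* stand-in for one bit plane */

/* Hypothetical macro: XOR the pixels selected by mask at a precomputed address. */
#define XOR_PIXELS_AT(addr, mask)  (*(addr) ^= (uint8_t)(mask))

/* XOR a vertical run of h pixels at column x, starting at row y. */
static void xor_vline(int x, int y, int h)
{
    assert(x >= 0 && x < PITCH * 8);
    assert(y >= 0 && h >= 0 && y + h <= ROWS);

    uint8_t *addr = vram + y * PITCH + (x >> 3);    /* address computed once */
    uint8_t mask = (uint8_t)(0x80 >> (x & 7));      /* bit mask computed once */

    while (h-- > 0) {
        XOR_PIXELS_AT(addr, mask);      /* no per-pixel X,Y to address math */
        addr += PITCH;                  /* step down one scanline */
    }
}
```

The same macro accepts a multi-bit mask, so a horizontal group of pixels inside one byte can be flipped with a single call.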

Overall, a lot more could be done for speed, especially by using a few well-thought-out macros. I'll think more about it.

@Vutshi
Contributor Author

Vutshi commented Apr 16, 2025

Thanks for the very useful table. I am surprised that mul and div are almost equally slow on 8088. Are they both realized on a software level in microcode?
