Skip to content

Paint save/read BMP improvement#2323

Merged
ghaerr merged 1 commit intoghaerr:masterfrom
Vutshi:fixsavebmp
Apr 25, 2025
Merged

Paint save/read BMP improvement#2323
ghaerr merged 1 commit intoghaerr:masterfrom
Vutshi:fixsavebmp

Conversation

@Vutshi
Copy link
Copy Markdown
Contributor

@Vutshi Vutshi commented Apr 25, 2025

  • Fix imagesize and filesize in BMP header
  • Implement 4bpp BMP file support for drawing and saving

Implement 4bpp BMP file support for drawing and saving
@ghaerr
Copy link
Copy Markdown
Owner

ghaerr commented Apr 25, 2025

This is great @Vutshi, thanks! Are you now seeing half the size for export.bmp using 4BPP, in the ~115k range or so?
Nice improvement.

BTW, there are a couple cases where your use of long casts generates more code than is needed. Would you like me to point them out so you can take a look and possibly remove them for smaller code size?

Thank you!

@ghaerr ghaerr merged commit 8b6f96b into ghaerr:master Apr 25, 2025
@Vutshi Vutshi deleted the fixsavebmp branch April 26, 2025 05:18
@Vutshi
Copy link
Copy Markdown
Contributor Author

Vutshi commented Apr 26, 2025

Are you now seeing half the size for export.bmp using 4BPP, in the ~115k range or so?

yes, i did it because 8BPP didn’t fit into my fdd1440. RLE is next.

BTW, there are a couple cases where your use of long casts generates more code than is needed. Would you like me to point them out so you can take a look and possibly remove them for smaller code size?

yes, please. I’m doing things mostly blindly, iterating until compiler is happy and it works :)

@ghaerr
Copy link
Copy Markdown
Owner

ghaerr commented Apr 27, 2025

Hello @Vutshi,

a couple cases where your use of long casts generates more code than is needed.

Well, after looking more closely, it seems both the IA16 and OWC compilers optimized out my concerns. However, the C86 compiler doesn't, for what that's worth.

The couple of issues I noticed were:

hdrsize = (long)sizeof(bmpf) + (long)sizeof(bmpi) + (long)sizecolortable;

In this case, there's no need to explicitly cast the three operands, since there's no chance of an overflow using 16-bit addition here to later store into long hdrsize. While both IA16 and OWC recognized this and didn't generate long add code, C86 won't and lots of extra code is generated. There isn't an issue with loading a 16 bit result into a long (hdrsize). In this case, the IA16 compiler was smart enough that it realized that hdrsize itself doesn't need to be declared long, and doesn't allocate space for it. It doesn't need to be long, just because bmpf.fBoffBits is DWORD. (The previous draw_bmp routine declared hdrsize DWORD (unsigned long) because it executes an fread into it for parsing the BMP file).

The other case was:

imagesize = (long)padded_row_bytes * (long)cy;

Here, there is the possibility of 16-bit multiply overflow, so one of the operands does need to be cast (long). In C, when one operand is a higher bit-width than the other, the remaining operand is automatically up-converted to the higher bit-width. There are sometimes cases where the compiler can generate better code multiplying 16x32 instead of 32x32 if the second (long) were removed, but this isn't one of them.

padded_row_bytes = ((bpp * cx + 31)/32) * 4; /* row must be 4 bytes aligned */

Both IA16 and OWC generate right shifts for the /32 and left shifts for * 4 here, which is fast. I haven't checked yet what C86 does if/when optimization constant DIVs and MULs. (I can't yet easily checkout C86 output since I have not yet completed our disassembler for the C86 assembler AS86's .o output object format yet - coming). So sometimes it is better to use left/right shifts instead of multiply/divide. (This gets complicated, as we found recently that IA16 was converting two left shifts back to MUL * 80 with -Os specified, which is why we changed here to -O2). If a right shift and left shift were coded here, it might be possible to change that combo to a single right shift, with an appropriate comment; although it is possible the IA16 and OWC (but not C86) compilers are already optimizing that. Phew! :)

Anyways, your code looks great. In general, my recommendation is to use casts only when needed, as explicit casts can cause problems later or with other compilers. The compilers can figure out what to do with multiple operand sizes and/or assignments, but don't always detect overflow. In general, it looks like our IA16 and OWC compilers are pretty smart, while C86 not as much. I just happen to be compiling Paint with C86 to keep any eye on C86 code generation, so am being a bit more watchful.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants