Skip to content

feat: modify 1-bit data encoding#21

Merged
EscapedGibbon merged 8 commits intomainfrom
20-modify-1-bit-encoder
Mar 25, 2025
Merged

feat: modify 1-bit data encoding#21
EscapedGibbon merged 8 commits intomainfrom
20-modify-1-bit-encoder

Conversation

@EscapedGibbon
Copy link
Contributor

close: #20

@EscapedGibbon EscapedGibbon linked an issue Mar 20, 2025 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Mar 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.88%. Comparing base (0607b6b) to head (3495ad2).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #21      +/-   ##
==========================================
- Coverage   99.13%   98.88%   -0.25%     
==========================================
  Files           5        5              
  Lines         116       90      -26     
  Branches       17       12       -5     
==========================================
- Hits          115       89      -26     
  Misses          1        1              
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@EscapedGibbon EscapedGibbon marked this pull request as draft March 20, 2025 09:43
@EscapedGibbon EscapedGibbon force-pushed the 20-modify-1-bit-encoder branch from 8aacca0 to 1dc91eb Compare March 20, 2025 10:00
@EscapedGibbon
Copy link
Contributor Author

There were some changes in the way the encoding was tested, because buffer from fs.readFileSync() was not giving the same result, when it came to partially filled data bytes. In my implementation the non-used bits were zeroed, while readFileSync version had random values. After discussion with @stropitek we arrived to the conclusion that those bits probably kept their previous value in memory. I haven't been able to find any citations about it, but it seems that there isn't any particular standard on how these bits must be encoded, since they have no effect on the encoded data. I did the bmp encoding in python, and its pixel data encoding was the same as mine(all 0s). Therefore, now the testHeaderDataEncode checks the data before pixel encoding, and pixel data is checked through snapshots.

@EscapedGibbon
Copy link
Contributor Author

I also looked at the images after my encoding and they seem to be the same. Since images are small and only have 2 colors, the differences would be noticeable visually if encoding did not work

@EscapedGibbon EscapedGibbon marked this pull request as ready for review March 20, 2025 10:08
@stropitek
Copy link
Contributor

If you regenerate an image with python, do you get exactly the same encoded bmp (header + pixels) as with this library?

In that case I would prefer that you regenerate all the images with python and not do snapshots.

@EscapedGibbon
Copy link
Contributor Author

If you regenerate an image with python, do you get exactly the same encoded bmp (header + pixels) as with this library?

In that case I would prefer that you regenerate all the images with python and not do snapshots.

I don't get the same images with python, it has smaller header size. I think it might be because the python library that i used, uses v3 bitmap headers, when we use v5. I will look a bit more if there is a liabrary that supports v5 encoding

@targos
Copy link
Member

targos commented Mar 21, 2025

readFileSync version had random values

This seems highly unlikely. I would be surprised if file systems store data in blocks that are not multiples of 8bits

@stropitek
Copy link
Contributor

This seems highly unlikely. I would be surprised if file systems store data in blocks that are not multiples of 8bits

I think what he means is that when those bmp were generated (unfortunately I don't remember what program I used to generate them), they were initialized in memory with random bytes (not initialized to 0), so the bytes that don't encode anything were random when they were written to the disk.

@EscapedGibbon
Copy link
Contributor Author

This seems highly unlikely. I would be surprised if file systems store data in blocks that are not multiples of 8bits

I think what he means is that when those bmp were generated (unfortunately I don't remember what program I used to generate them), they were initialized in memory with random bytes (not initialized to 0), so the bytes that don't encode anything were random when they were written to the disk.

Yes, exactly

@EscapedGibbon
Copy link
Contributor Author

EscapedGibbon commented Mar 21, 2025

Gimp does export 1 bit images with v5 headers and its pixel data aligns with our implementation. The header's data, however is slightly different. Judging by wikipedia specs and indexes of the bytes that were different I suspect that Gimp version also adds "horizonal and vertical pixel per metre" data.

CleanShot 2025-03-21 at 12 01 16

@stropitek
Copy link
Contributor

Those resolution values seem useful, if we can we should decode and encode them.

@targos
Copy link
Member

targos commented Mar 21, 2025

I'd open an issue about this and specifically override these values before running the test assertion so this PR can be merged.

@EscapedGibbon
Copy link
Contributor Author

EscapedGibbon commented Mar 21, 2025

After discussion with @stropitek I regenerated images with GIMP. I also added the data on pixelsPerMeter and modified some header data that was not coherent with GIMP buffer. For now, changed values are not really important for 1 bit encoding/decoding so they were changed for alignment with GIMP version. Tests now pass by simply matching buffer from fs.writeFileSync with the buffer from our implementation. So snapshots are completely removed.

@EscapedGibbon EscapedGibbon force-pushed the 20-modify-1-bit-encoder branch from d72a763 to a9d50c9 Compare March 21, 2025 16:07
Copy link
Contributor

@stropitek stropitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Show us the commit message for this PR.

I would use feat: with the new resolution data which is encoded / decoded.

With the list of breaking changes regarding the codec format and the changes in the encoded data.

@EscapedGibbon
Copy link
Contributor Author

EscapedGibbon commented Mar 21, 2025

Show us the commit message for this PR.

I would use feat: with the new resolution data which is encoded / decoded.

With the list of breaking changes regarding the codec format and the changes in the encoded data.

Something like this. I am not sure if last two are a breaking change though. Added them just in case.

feat!: modify 1-bit image encoder

BREAKING CHANGES:

  • In BMPEncoder's data each pixel is now stored as 1 byte per pixel instead of 1 bit per pixel.
  • Optional fields xPixelsPerMeter and yPixelsPerMeter have been added to ImageCodec. Their default value is 2835.
  • Decode() function now also returns xPixelsPerMeter and yPixelsPerMeter in the ImageCodec.
  • Class BMPEncoder doesn't inherit IOBuffer anymore.
  • Class BMPEncoder now has data field to store Uint8Array data of the image.

@stropitek
Copy link
Contributor

stropitek commented Mar 24, 2025

You should write the commit message with the changelog it will generate in mind (this is what the users of the library will read if they upgrade).

What is in the main feat: message will be in the changelog under the "Features" category.

What is in breaking-changes: will be in the footer of the changelog.

My suggestion was to put everything related to the new encoded / decoded property in the feat main message, and also document the breaking changes related to the change of data structure in the breaking changes.

  • In BMPEncoder's data each pixel is now stored as 1 byte per pixel instead of 1 bit per pixel.

Isn't it more relevant to refer to the encoder's input and decoder's output? It's not obvious to me that the encoder's data is its input and not its output.

  • Optional fields xPixelsPerMeter and yPixelsPerMeter have been added to ImageCodec. Their default value is 2835

As @targos mentioned, it's a good idea to say it corresponds to 72 dpi which is a legacy resolution. Otherwise it just seems to be a random number. Also this is part of the added feature and is not a breaking change. Should be in the main commit message.

  • Class BMPEncoder now has data field to store Uint8Array data of the image.

Not a breaking change. But ok to mention it there if it is worth having it in the changelog.

@targos
Copy link
Member

targos commented Mar 24, 2025

Class BMPEncoder now has data field to store Uint8Array data of the image.

Not a breaking change. But ok to mention it there if it is worth having it in the changelog.

The BMPEncoder class is not exposed publicly so this shouldn't be mentioned in the changelog.

@EscapedGibbon
Copy link
Contributor Author

EscapedGibbon commented Mar 24, 2025

feat: encode and decode resolution fields xPixelsPerMeter and yPixelsPerMeter

Encode / decode the image's horizontal and vertical number of pixels per meter.

BREAKING CHANGE: In decoder's output data and encoder's input data, each pixel is now stored as 1 byte per pixel instead of 1 bit per pixel.

Copy link
Contributor

@stropitek stropitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can merge when the last comment is resolved.

@EscapedGibbon EscapedGibbon merged commit f239678 into main Mar 25, 2025
10 checks passed
@EscapedGibbon EscapedGibbon deleted the 20-modify-1-bit-encoder branch March 25, 2025 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Modify 1 bit encoder

3 participants

Comments