Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bmp format #22

Merged
merged 4 commits into from Mar 10, 2019

Conversation

Projects
None yet
3 participants
@rymdhund
Copy link
Contributor

commented Mar 9, 2019

Here is some code for parsing bmp files. I would love to get some feedback if it looks wierd in some way. It mostly uses the result type instead of exceptions behind the scenes.

It should work for most modern versions of bmp >= BitmapInfoHeader (Windows NT, 3.1x or later) but it does not yet support negative heights (top to bottom).

Add a bmp parser
Does not yet support negative heights or versions < BitmapInfoHeader
@rlepigre

This comment has been minimized.

Copy link
Owner

commented Mar 9, 2019

Cool, thanks for the contribution! Looks great so far.

I don't mind the use of result types in the internals as you do, but I like to have a high-level interface of the full library which uses exceptions. Although it would actually be possible to have two different styles of interfaces that people could choose from.

A couple of questions:

  1. Are you planning on adding the missing features? Is the negative height support the only thing missing for the version you target?
  2. What about previous versions of the format? Would it be difficult to support them? And in any case, would it be useful?
  3. How much testing have you done?

I'd be happy to merge your PR at some point if you can convince me that the version you have working is sufficient support for the format. What I mean by that is: I don't want users of the library to lose the possibility of reading (reasonable version if some are obsolete) BMP files that were previously handled fine by our conversion mechanism (based on Imagemagick).

Thanks again! And I hope we can merge soon!

@rymdhund

This comment has been minimized.

Copy link
Contributor Author

commented Mar 9, 2019

Thaks!

I think all features that are not supported atm is:

None of the standard features should be hard to support but I haven't been able to find any examples of images using them. I think most bmps these days are using the BITMAPINFOHEADER or above, but I don't really know how to verify that claim :). I didn't really consider potential regressions compared to imagemagick so maybe it would be nice to fix the remaining issues before merging. I will see what I can do!

I've mostly tested it against a set of images I've found. And I put in some protection against out of memory attacks so that we have some limit on how big images we allocate, but I don't really know how big images to support? It would for sure be nice with a proper test suite but ¯_(ツ)_/¯

@cfcs
Copy link
Collaborator

left a comment

I think this looks good!

Show resolved Hide resolved src/imageBMP.ml Outdated
Show resolved Hide resolved src/imageBMP.ml Outdated
Show resolved Hide resolved src/imageUtil.ml
*)
let int_of_str2_le s =
(int_of_char s.[0]) lor ((int_of_char s.[1]) lsl 8)

(*
* Converts a string of size 4 into an integer WITHOUT taking care of

This comment has been minimized.

Copy link
@cfcs

cfcs Mar 9, 2019

Collaborator

@rlepigre should we add a comment here detailing that it's a big-endian decoder?
Also, how is this intended to work on 32bit platforms?

This comment has been minimized.

Copy link
@rlepigre

rlepigre Mar 9, 2019

Owner

@cfcs: Well, this part will probably need to use int32 at some point, but this is going to be annoying. I wish OCaml did not need to have 31 (or 63) bits integers...

@rlepigre

This comment has been minimized.

Copy link
Owner

commented Mar 9, 2019

None of the standard features should be hard to support but I haven't been able to find any examples of images using them. I think most bmps these days are using the BITMAPINFOHEADER or above, but I don't really know how to verify that claim :). I didn't really consider potential regressions compared to imagemagick so maybe it would be nice to fix the remaining issues before merging. I will see what I can do!

In a way, regressions are a good way to force us to fix and extend the library. But we should still try to prevent them as much as possible, as long as this only requires a reasonable effort. What would be acceptable I guess is to trigger a specific error if some required feature is missing, so that we can fallback on conversion with imagemagick. We could do that, for example, for the (seemingly) obsolete versions of the format.

What do you guys think about that option?

My impression is that this could potentially ease the integration of new implementations of formats, while preserving a fully functioning library (at least when imagemagick can be used). Alternatively, we could use flags to control the default for some formats (imagemagick or native parser).

@cfcs

This comment has been minimized.

Copy link
Collaborator

commented Mar 9, 2019

@rlepigre If what you're suggesting is to have more detailed exceptions, I'm not very opposed to that, but I think that the cleaner approach would be to split the imagemagick stuff out into a separate module that people can use if they don't want the pure implementation.

imagemagick is currently preventing MirageOS builds due to the load-time OS dependencies that Unix.create_process introduced when we got rid of Sys.command (Sys.command of course also fails on Mirage if you call it, but it doesn't crash the program at load-time).

Moving it to a ImageLib_unix module will be a breaking change, but after that we will not really need to touch the imagemagick code much, and we will have less risk of breaking that while merging features in the pure code.

I just discovered now that I forgot to push the pull request, but I have this branch from November:
https://github.com/cfcs/ocaml-imagelib/tree/mirage_compat

I plan to rebase it on master once this BMP code is merged so @rymdhund doesn't have to deal with it, and will open a PR then.

@rlepigre

This comment has been minimized.

Copy link
Owner

commented Mar 9, 2019

@cfcs OK cool, I'll have a look to what you did when you send the PR. This is probably not the best place to discuss all that, but whatever. What I have in mind for the library, that would probably suit everyone, is the following:

  • Have a library imagelib.common defining the internal representation of images, the high-level abstractions for inputs and outputs (platform independent), and some useful utilities (checksums, byte-manipulation functions, etc.).
  • Provide core libraries for every image format that we (partially) implement, that does not depend on Unix or anything like that at all. So we would have sub-libraries imagelib.png, imagelib.ppm, imagelib.pbm, etc. Of course, all depending on imagelib.common. Of course, each of these libraries should provide at least the bare minimum so that they can be handled uniformly in the higher-level layers (this is what we do with the list of extensions and the related stuff), but they should probably provide format-specific features. For instance, the gif format would need support for animations.
  • Have a library imagelib.pure_ocaml that provides a high-level interface for the library, with functions for reading / printing images based on the extension, but that only relies on what is implemented in the library (no Unix, no imagemagick).
  • Have a library imagelib.with_fallback that does the same thing, with the same interface, but relies on Imagemagick and everything so that the library is 100% reliable, even when we have missing features.
  • Optionally, the two high-level libraries should provide two kinds of interfaces: with exceptions / with result types.

We will not avoid incompatibility issues with previous versions of imagelib, but whatever. Hopefully that will be the last time that we need to do such a big change.

What do you think?

(@rymdhund of course, all that is completely independent from the PR, but feel free to give comment, or throw us out of here!)

@rymdhund

This comment has been minimized.

Copy link
Contributor Author

commented Mar 9, 2019

In a way, regressions are a good way to force us to fix and extend the library. But we should still try to prevent them as much as possible, as long as this only requires a reasonable effort. What would be acceptable I guess is to trigger a specific error if some required feature is missing, so that we can fallback on conversion with imagemagick. We could do that, for example, for the (seemingly) obsolete versions of the format.

What do you guys think about that option?

I think that could be an idea. I think it makes sense to have different error cases for corrupt images and unimplemented features. We already have a Not_yet_implemented exception, maybe we could use that.

@cfcs

This comment has been minimized.

Copy link
Collaborator

commented Mar 9, 2019

@rlepigre That sounds good to me too!

@rymdhund Yes, please use Not_yet_implemented for unimplemented features!

@rlepigre We currently have in the library:

exception Corrupted_image of string
exception Invalid_argument of string
exception Not_yet_implemented
exception PNG_Zlib_error of string

I think the PNG_Zlib_error should probably be replaced by Corrupted_image.
The strategy of catch Not_yet_implemented and retrying with imagemagick when using the imagemagick-friendly library sounds fine.

@cfcs

This comment has been minimized.

Copy link
Collaborator

commented Mar 9, 2019

@rlepigre should we merge this as is now, and just wait with next release until we have the retrying in place?

@rlepigre

This comment has been minimized.

Copy link
Owner

commented Mar 9, 2019

@cfcs OK let's do that. We will clean up everything, including the exceptions before the next release. I thing that there should really just be exactly one exported exception in the high level interfaces (maybe just called Error?), and similarly exactly one exported exception in the specific image format libraries. I'll make a proposal for interfaces when I have a moment, and we can discuss that at this point.

@rymdhund before I merge, can you just edit the CONTRIBUTIONS file? I like to keep track of contributions.

@rymdhund

This comment has been minimized.

Copy link
Contributor Author

commented Mar 10, 2019

@rlepigre I updated the docs!

@rlepigre

This comment has been minimized.

Copy link
Owner

commented Mar 10, 2019

@rymdhund Great, let's merge then. Thank you!

@rlepigre rlepigre merged commit 82ec85e into rlepigre:master Mar 10, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@cfcs

This comment has been minimized.

Copy link
Collaborator

commented Mar 10, 2019

@rlepigre Cool, I'll have a look at the unix split tomorrow, was busy hacking on GIF87a support today. :)

@rlepigre

This comment has been minimized.

Copy link
Owner

commented Mar 11, 2019

@cfcs I started to work on some refactoring as well (https://github.com/rlepigre/ocaml-imagelib/tree/new_inteface), following what was said earlier. I took the opportunity to write some documentation, and also simplified a couple of things. I'll open a PR shortly so that we can discuss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.