Document image format expectations for Bumblebee.Vision.ImageClassification #103

Closed
kipcole9 opened this issue Dec 12, 2022 · 3 comments · Fixed by #189
Labels
kind:documentation Improvements or additions to documentation

Comments

@kipcole9

I would like to contribute some documentation that clarifies the expected image format to Bumblebee.Vision.image_classification. The type t:Bumblebee.Vision.image says:

@type image() :: Nx.Container.t()
A term representing an image.
Either Nx.Tensor in HWC order or a struct implementing Nx.Container and
resolving to such tensor.

However it does not clarify:

  • Whether the image should first be resized to the size used to train the model (224 x 224 for the resnet models?)
  • Whether the image data should be {:u, 8} or some other type (some models suggest data should be in the range [0.0..1.0])
  • Whether the image can have an alpha layer (reading the code suggests yes, but perhaps that is model dependent)
  • Whether the image should be preprocessed (this Stack Overflow article suggests it should be)

If I can get some guidance I'll write a doc PR.

@jonatanklosko
Member

Hey Kip! The image doesn't need to be normalized in any particular way, because it first goes through a featurizer. In other words, it's not the direct model input, but a plain image as pixels. In fact, the type is Nx.Container.t(), because it may also be a struct that implements Nx.Container, which we already do for StbImage (ref).

A featurizer usually casts to float, resizes, and scales into [0.0, 1.0]. Whether an alpha layer is used is usually up to the model configuration. So I think the only generally applicable expectation is that the image values are 0..255.

A PR improving the docs is welcome!
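
As a rough illustration of the featurizer step described above, here is a minimal sketch in plain Nx. It is not Bumblebee's actual featurizer code: the tiny 2x2 image is made up, resizing and any per-channel normalization are left out, and only the cast-to-float and scale-to-[0.0, 1.0] steps are shown.

```elixir
Mix.install([{:nx, "~> 0.7"}])

# A made-up 2x2 RGB image in HWC order holding raw 0..255 pixel values.
image =
  Nx.tensor(
    [
      [[255, 0, 0], [0, 255, 0]],
      [[0, 0, 255], [255, 255, 255]]
    ],
    type: {:u, 8}
  )

# Assumption: this mimics only two of the featurizer steps mentioned
# in the thread (cast to float, scale into [0.0, 1.0]). Resizing is omitted.
scaled =
  image
  |> Nx.as_type({:f, 32})
  |> Nx.divide(255.0)

IO.inspect(Nx.shape(scaled), label: "shape")
IO.inspect(Nx.type(scaled), label: "type")
```

The point is that the caller passes the raw pixels and leaves this conversion to the featurizer.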

@kipcole9
Author

kipcole9 commented Dec 13, 2022

@jonatanklosko, TIL what a featurizer is! I suppose the assumption is also that the channel order is RGB (not BGR). I'll work on a doc PR this weekend. For validation, then, the input image has the following assumed characteristics:

  • HWC order
  • RGB color (channel order, not CMYK or some other color space)
  • Alpha channel support is model specific
  • {:u, 8} or {:f, 32} or {:f, 64} data type

Thanks for the continuing education and the great library.
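
If an image arrives in BGR channel order (as some image libraries produce), one way to get the RGB order assumed above is to reverse the channel axis. This is a small sketch with made-up pixel values, assuming an HWC tensor whose last axis holds exactly three color channels and no alpha.

```elixir
Mix.install([{:nx, "~> 0.7"}])

# A made-up 1x2 image in HWC order with channels stored as BGR.
bgr = Nx.tensor([[[255, 0, 0], [0, 128, 64]]], type: {:u, 8})

# Reversing axis 2 (the channel axis) turns B,G,R into R,G,B.
# Assumption: no alpha channel is present; with BGRA this would
# also move alpha to the front, which is not what you want.
rgb = Nx.reverse(bgr, axes: [2])

IO.inspect(Nx.to_flat_list(rgb), label: "rgb")
```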

@jonatanklosko
Member

The type is not as strict; pretty much any :u or :s type would do. Other than that, sounds good!
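
The characteristics agreed on in this thread could be checked with something like the sketch below. `ImageCheck` is a hypothetical helper, not part of Bumblebee. Two assumptions are baked in: it accepts only 3 or 4 channels (alpha support being model specific), and it accepts float types alongside :u and :s because the list above included them and the reply loosened rather than narrowed it. It does not check that the values fall in 0..255.

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule ImageCheck do
  # Hypothetical helper for illustration only, not a Bumblebee API.
  # Checks rank (HWC), channel count, and the kind of the element type.
  def check(image) do
    shape = Nx.shape(image)
    {kind, _bits} = Nx.type(image)

    cond do
      tuple_size(shape) != 3 -> {:error, :not_hwc}
      # Assumption: RGB (3) or RGBA (4); whether alpha is used is model specific.
      elem(shape, 2) not in [3, 4] -> {:error, :unexpected_channels}
      # Assumption: unsigned, signed, and float types are all tolerated.
      kind not in [:u, :s, :f] -> {:error, :unsupported_type}
      true -> :ok
    end
  end
end

# A blank 4x4 RGB image of unsigned 8-bit pixels passes the check.
rgb_image = Nx.broadcast(Nx.tensor(0, type: {:u, 8}), {4, 4, 3})
IO.inspect(ImageCheck.check(rgb_image))

# A rank-2 tensor has no channel axis and is rejected.
IO.inspect(ImageCheck.check(Nx.iota({4, 4})))
```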

@jonatanklosko jonatanklosko added the kind:documentation Improvements or additions to documentation label Jan 3, 2023