Image Captioning Example #15

Closed
shiffman opened this issue Nov 3, 2017 · 9 comments

@shiffman
Member

shiffman commented Nov 3, 2017

There is interest in my A2Z class re: image captioning. I thought I would try to create a simple example modeled after the deeplearn.js imagenet that provides captions for a p5.Image as well as real-time captions of a recorded or live video. @cvalenzuela, just wanted to check whether I'd be duplicating anything you have already done? Will leave this thread open to discuss / plan.

shiffman changed the title from "ImageNet Example" to "Image Captioning Example" on Nov 3, 2017
@cvalenzuela
Member

Nice! I haven't started anything with images yet.
Maybe porting https://github.com/tensorflow/models/tree/master/research/im2txt would be interesting too!

@cvalenzuela
Member

cvalenzuela commented Nov 9, 2017

This is now implemented using SqueezeNet

Here is a simple example
And here is another one with a webcam

Should we use different models?

@shiffman
Member Author

shiffman commented Nov 9, 2017

Yes, definitely! As with the LSTM examples, it might also be nice to offer a README on how to train your own model, although ultimately, if students want to train models for image classification, some transfer learning process will likely be more workable in creative coding scenarios.

shiffman added a commit that referenced this issue Nov 11, 2017
@shiffman
Member Author

shiffman commented Nov 11, 2017

I've been working on the ImageNet/SqueezeNet examples. I've changed the following:

  • predict() returns an array of classes sorted by probability
  • predict() takes an optional 3rd argument for number of classes to return
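
To make the two changes above concrete, here is a minimal sketch of ranking class probabilities and trimming the result. The helper name `topCategories`, the parallel `probabilities`/`labels` arrays, and the default of 10 are assumptions for illustration, not the library's actual code.

```javascript
// Hypothetical helper: pair each probability with its label, sort, keep the top k.
// Assumes `probabilities` and `labels` are parallel arrays (or typed arrays).
function topCategories(probabilities, labels, k = 10) {
  return Array.from(probabilities, (probability, index) => ({
    label: labels[index],
    probability,
  }))
    .sort((a, b) => b.probability - a.probability) // highest probability first
    .slice(0, k); // optional limit on how many classes to return
}

// Usage: ask for the two most likely classes.
const ranked = topCategories([0.1, 0.7, 0.2], ['cat', 'dog', 'bird'], 2);
console.log(ranked.map((r) => r.label));
```

Building a new array of objects leaves the raw probability array untouched, which may matter if the caller wants to reuse it.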

The main thing I'd like to work through now is how to make this simpler to use with p5.Element and/or p5.Image. As of now, the example code requires createImg() and setting the width and height attributes. @cvalenzuela, does it only work with square, odd-numbered resolutions? I think it would be nice for the library to handle any image resizing internally and to accept a variety of image inputs directly from p5.

  • predict() should accept p5.Image
  • predict() should accept p5.Element that is an <img>.
  • predict() should internally do any image resizing and let the user keep the image sized as it was for displaying.
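
One way the internal resizing could work is to copy the pixels into a separate, fixed-size buffer so the displayed image keeps its own dimensions. The sketch below uses nearest-neighbor sampling on a flat RGBA array. The function name `resizeRGBA` and the `INPUT_SIZE` of 227 are assumptions; the thread does not state the size the model expects.

```javascript
// Assumed model input size; treat 227 as a placeholder, not a confirmed value.
const INPUT_SIZE = 227;

// Nearest-neighbor resize of a flat RGBA pixel array (4 values per pixel).
// Returns a new buffer, so the source pixels stay as they were for display.
function resizeRGBA(src, srcW, srcH, dstW = INPUT_SIZE, dstH = INPUT_SIZE) {
  const dst = new Uint8ClampedArray(dstW * dstH * 4);
  for (let y = 0; y < dstH; y++) {
    const sy = Math.min(srcH - 1, Math.floor((y * srcH) / dstH));
    for (let x = 0; x < dstW; x++) {
      const sx = Math.min(srcW - 1, Math.floor((x * srcW) / dstW));
      const s = (sy * srcW + sx) * 4; // offset of the source pixel
      const d = (y * dstW + x) * 4;   // offset of the destination pixel
      for (let c = 0; c < 4; c++) dst[d + c] = src[s + c];
    }
  }
  return dst;
}

// Usage: a 2x1 image (red, blue) stretched to 4x1.
const tiny = new Uint8ClampedArray([255, 0, 0, 255, 0, 0, 255, 255]);
const wide = resizeRGBA(tiny, 2, 1, 4, 1);
console.log(Array.from(wide));
```

In a browser, drawing the source into an offscreen canvas with the scaling form of drawImage() would likely be simpler; the pure-array version is shown only because it runs without a DOM.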

Also, I think the setTimeout() used when SqueezeNet isn't ready might be problematic. Probably best to return a "not ready yet" message or something?

  • rethink what predict() does when model not ready.
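
As one possible alternative to retrying with setTimeout(), predict() could report the not-ready state straight away through an error-first callback. The sketch below is a rough outline: `GuardedClassifier`, `setModel()`, and the stand-in model's `classify()` method are all hypothetical names, not the library's API.

```javascript
// Hypothetical wrapper that refuses to predict until a model has been attached.
class GuardedClassifier {
  constructor() {
    this.model = null; // stays null until loading finishes
  }

  // Called once by whatever loading code is in use (assumed, not shown).
  setModel(model) {
    this.model = model;
  }

  // Error-first callback: callback(error, results).
  predict(input, callback) {
    if (this.model === null) {
      callback(new Error('Model not ready yet'), null);
      return;
    }
    callback(null, this.model.classify(input));
  }
}

// Usage: the first call fails fast, the second succeeds after the model arrives.
const guarded = new GuardedClassifier();
guarded.predict('image', (err) => console.log(err.message));
guarded.setModel({ classify: () => ['dog'] }); // stand-in model
guarded.predict('image', (err, results) => console.log(results));
```

Failing fast leaves the retry policy to the caller, which avoids hidden timers inside the library.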

Also, the webcam example is broken; I will fix it.

  • fix webcam example.

@cvalenzuela feel free to fix up anything in 85c7311 if it's not conforming to the style you adopted.

Also, what's a better word than "label" for a class prediction? I was afraid to use "class", given that it's a JS reserved word.

Thoughts?

@cvalenzuela
Member

  • I haven't tried inputting images of different sizes. My guess is that should work too.
    Are you thinking of handling the resize/scale of images with p5 or plain js?

  • Yes, the setTimeout() is an issue. 85c7311 looks good!

  • Maybe category? That's what the Google API calls it.

Let me know if you need help with any of this

@shiffman
Member Author

shiffman commented Nov 12, 2017

  • category is great. I can update that.
  • I wasn't able to get it to work with different image resolutions, but I can investigate more. I didn't try for long and might have made silly mistakes.
  • re: p5 or plain js, I don't know! Ideally I think the predict function could take any of the following:
    • p5.Image
    • p5.Element (either video or image or canvas)
    • p5.Renderer (i.e., the result of createGraphics())
    • native JS DOM element <img> or <video> or <canvas>
    • array of pixels?
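
A rough sketch of how predict() might normalize the input kinds listed above into one shape before classifying. It relies on duck typing: p5.Element wrappers expose the underlying DOM node as `.elt`, and p5.Image is assumed here to expose its backing canvas as `.canvas`. The function name `resolveInput` and the returned `{ kind, source }` shape are hypothetical.

```javascript
// Hypothetical normalizer: reduce every accepted input to a drawable element
// or a raw pixel array. Order matters: wrappers are unwrapped first.
function resolveInput(input) {
  if (input == null) {
    throw new TypeError('predict() needs an image input');
  }
  if (input.elt) {
    return { kind: 'element', source: input.elt }; // p5.Element (img, video, canvas, graphics)
  }
  if (input.canvas) {
    return { kind: 'element', source: input.canvas }; // p5.Image (assumed property)
  }
  if (typeof input.nodeName === 'string') {
    return { kind: 'element', source: input }; // native <img>, <video>, or <canvas>
  }
  if (Array.isArray(input) || ArrayBuffer.isView(input)) {
    return { kind: 'pixels', source: input }; // plain or typed pixel array
  }
  throw new TypeError('Unsupported input type for predict()');
}

// Usage with stand-in objects (no DOM needed to try it out).
const fakeNode = { nodeName: 'IMG' };
console.log(resolveInput({ elt: fakeNode }).kind);
console.log(resolveInput(new Uint8ClampedArray(4)).kind);
```

A raw pixel array carries no width or height, so that branch would need the dimensions passed separately; the sketch leaves that question open, as the thread does.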

@shiffman
Member Author

Adding:

  • The constructor should take an optional second argument: a callback for when the model is ready (if not using preload()).
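
A minimal sketch of what an optional ready callback on the constructor might look like. To keep it runnable without a network, the first argument here is a stand-in loader function rather than whatever the real constructor takes; `CallbackClassifier`, `loadModel`, and `onReady` are hypothetical names.

```javascript
// Hypothetical constructor with an optional second "ready" callback.
// `loadModel(done)` is a stand-in for the real asynchronous model loading.
class CallbackClassifier {
  constructor(loadModel, onReady) {
    this.ready = false;
    this.model = null;
    loadModel((model) => {
      this.model = model;
      this.ready = true;
      // The callback is optional, so only call it when one was supplied.
      if (typeof onReady === 'function') onReady(this);
    });
  }
}

// Usage: a fake loader that finishes immediately.
const fakeLoader = (done) => done({ name: 'stand-in model' });
const withCallback = new CallbackClassifier(fakeLoader, (c) => console.log('ready:', c.ready));
const withoutCallback = new CallbackClassifier(fakeLoader); // also fine
```

Code that loads the model in preload() could simply omit the second argument.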

shiffman self-assigned this on Nov 14, 2017
@shiffman
Member Author

Noting here that passing in the optional third argument for number of categories is broken at the moment.

@cvalenzuela
Member

This seems resolved. Follow-up work continues in separate threads: #92 and #27.
