Alt text generation using AI #350

Open
Illyism opened this issue Feb 8, 2024 · 7 comments

Comments

@Illyism

Illyism commented Feb 8, 2024

No description provided.

@erikyo

erikyo commented Feb 8, 2024

I'm curious: in what way? It's a good idea, but how would you implement it?

@erikyo

erikyo commented Feb 10, 2024

I tried MobileNet, and I must say that although the result is impressive for how little the model weighs, it is unfortunately far from what this feature needs.

Maybe it would also need some kind of training on the images already in the WordPress media library, but that assumes the images are there and are (well) tagged.
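
To illustrate the gap: a classifier like MobileNet returns ranked class labels with probabilities, not a sentence. Below is a minimal Python sketch of the most you can do with that output alone, which is to join the confident labels. The function name, the thresholds, and the sample predictions are all made up for illustration; this is not code from the plugin.

```python
from typing import List, Tuple


def labels_to_alt_text(
    predictions: List[Tuple[str, float]],
    min_prob: float = 0.2,   # assumed cut-off, chosen arbitrarily
    max_labels: int = 3,     # assumed cap on how many labels to mention
) -> str:
    """Join the most confident classifier labels into a crude alt text."""
    # Sort by probability, highest first, then keep only confident labels.
    ranked = sorted(predictions, key=lambda item: item[1], reverse=True)
    confident = [label for label, prob in ranked if prob >= min_prob]
    confident = confident[:max_labels]
    if not confident:
        return ""  # nothing confident enough to describe the image
    return "Image of " + ", ".join(confident)


# Hypothetical classifier output: (label, probability) pairs.
sample_predictions = [
    ("golden retriever", 0.71),
    ("tennis ball", 0.22),
    ("park bench", 0.04),
]
print(labels_to_alt_text(sample_predictions))
```

The result is a list of objects rather than a description of what is happening in the picture, which is why label output alone falls short of usable alt text.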

@Illyism
Author

Illyism commented Feb 10, 2024

Fair. I think it's fairly difficult without an external or larger API.

@erikyo

erikyo commented Feb 10, 2024

To tell you the truth, I was amazed that MobileNet, at just 2-3 MB, can do this:
https://storage.googleapis.com/tfjs-examples/mobilenet/dist/index.html

Maybe with a custom model and some "homemade" training it's not so impossible.

@swissspidy
Owner

Automatic alt text generation has been on my to-do list for a while; it's even in the readme:

https://github.com/swissspidy/media-experiments/tree/29c6d473d149a5cb08c379ac94cb779a96ce13e9#alt-text-generation

I think it would be really cool to have a simple on-device implementation for such a feature, even just to demonstrate that it's possible.

As for custom models, the WP photo directory or Openverse would make for excellent sources for training data.
As @erikyo mentioned, sites could also train models on their own media library, which would also be really cool from a privacy perspective, as the model would never leave their site.

If someone wants something more powerful, they can always use an external service. The same goes for the video captioning and the like.

@swissspidy swissspidy added the enhancement (New feature or request) label Feb 13, 2024
@swissspidy swissspidy changed the title from "Alt text ai generator" to "Alt text generation using AI" Feb 13, 2024
@erikyo

erikyo commented Feb 13, 2024

I've done some research in this direction. Generating "interesting" captions for photos through a homebrew method is not impossible, but it can be highly resource-consuming and may give lower quality than other models. This is especially true if the desired outcome is a diverse set of descriptions for the images in a media library (which we care about mainly for SEO).

I came across an interesting approach that I'd like to mention. To address the challenge of generating varied image descriptions, you can create an API using the blip2-opt-2.7b model. I successfully implemented this by following a guide found here, and I made some additional modifications to meet specific requirements, such as the ability to add a prompt. The result of this implementation can be accessed at the following link:

blip-api by erikyo

Put the URL of an image in the first input and press submit. It takes about 60 seconds and then returns a description of the image. The results seem generally very good to me but, as @swissspidy pointed out, this method requires something external, because the model weighs 15 GB and needs a lot of computing power to run.
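
For anyone who wants to call a service like this from a script, here is a rough Python sketch of a client. The endpoint URL, the JSON field names (`image_url`, `prompt`, `caption`), and the helper names are assumptions for illustration only; they are not the actual interface of the blip-api linked above. The one detail taken from the description is the long timeout, since a response can take about 60 seconds.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint: the real URL and request schema are not given here.
ENDPOINT = "https://example.com/caption"


def build_caption_request(
    image_url: str,
    prompt: Optional[str] = None,
    endpoint: str = ENDPOINT,
) -> urllib.request.Request:
    """Build a JSON POST request asking the service to caption one image."""
    payload = {"image_url": image_url}  # assumed field name
    if prompt:
        payload["prompt"] = prompt      # assumed field name for the optional prompt
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_caption(request: urllib.request.Request, timeout_seconds: float = 120.0) -> str:
    """Send the request; the generous timeout allows for ~60 s of inference."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["caption"]  # assumed response field


# Building the request does not touch the network; only fetch_caption does.
request = build_caption_request("https://example.com/photo.jpg", prompt="a photo of")
print(request.get_method(), json.loads(request.data))
```

Calling `fetch_caption(request)` would perform the actual round trip; it is left out of the example because the endpoint above is a placeholder.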

@swissspidy
Owner

https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/ looks veeery promising


@swissspidy swissspidy added the feature and p3 labels and removed the enhancement (New feature or request) label Jun 6, 2024