Alt text generation using AI #350
Comments
I'm curious: in what way? It's a good idea, but how would you implement it?
I tried MobileNet. Although the result is impressive for such a small model, it is unfortunately far from what is needed. We might also need some kind of training on the images already in the WordPress media library, but that assumes they are there and are (well) tagged.
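For context, MobileNet as usually shipped is an ImageNet classifier: it returns class labels with scores, not sentences. Below is a minimal sketch of the naive "labels to alt text" step, assuming predictions shaped like the `(class_id, class_name, score)` tuples that `tf.keras.applications.mobilenet_v2.decode_predictions` returns. The helper name `alt_text_from_labels` and its thresholds are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# One prediction as returned by tf.keras.applications.mobilenet_v2.decode_predictions:
# (ImageNet class id, class name, score).
Prediction = Tuple[str, str, float]


def alt_text_from_labels(
    predictions: List[Prediction], min_score: float = 0.2, max_labels: int = 3
) -> str:
    """Turn MobileNet's ImageNet labels into a naive alt text string.

    Hypothetical helper for illustration; the thresholds are arbitrary.
    """
    # Drop low-confidence guesses and keep the best few, highest score first.
    confident = sorted(
        (p for p in predictions if p[2] >= min_score),
        key=lambda p: p[2],
        reverse=True,
    )[:max_labels]
    if not confident:
        return ""  # nothing confident enough; leave it for manual review
    # ImageNet class names use underscores, e.g. "golden_retriever".
    names = [name.replace("_", " ") for _, name, _ in confident]
    return "Image that may show: " + ", ".join(names)


if __name__ == "__main__":
    # Example values shaped like decode_predictions(preds, top=3)[0].
    sample = [
        ("n02099601", "golden_retriever", 0.81),
        ("n04409515", "tennis_ball", 0.12),
        ("n02099712", "Labrador_retriever", 0.05),
    ]
    print(alt_text_from_labels(sample))  # Image that may show: golden retriever
    print(alt_text_from_labels(sample, min_score=0.1))  # Image that may show: golden retriever, tennis ball
```

Even with good scores, the output is a list of nouns from a fixed set of 1,000 ImageNet classes rather than a description of the scene, which is roughly the gap between "impressive for its size" and "what alt text actually needs".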
Fair. I think it's quite difficult without an external or bigger API.
To tell you the truth, I was amazed at what MobileNet can do in just 2-3 MB. Maybe with a custom model and some "homemade" training it's not so impossible.
Automatic alt text generation has been on my to-do list for a while; it's even in the README. I think it would be really cool to have a simple on-device implementation of such a feature, even just to demonstrate that it's possible. As for custom models, the WP photo directory or Openverse would make excellent sources of training data. If someone wants something more powerful, they can always use an external service. The same goes for video captioning and the like.
I've done some research in this direction. Generating "interesting" captions for photos with a homebrew method is not impossible, but it can be very resource-intensive and may give lower-quality results than existing models. This is especially true if the goal is a diverse set of descriptions for the images in a media library (which we care about mainly for SEO). I came across an interesting approach worth mentioning: to produce varied image descriptions, you can build an API around the blip2-opt-2.7b model. I implemented this successfully by following a guide found here, with some additional modifications for specific requirements, such as the ability to add a prompt. The result of this implementation can be accessed at the following link: In the first input you have to put the
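The guide and the deployed link are not preserved in this thread, so here is a minimal sketch of the core model call only, not the setup described above. It assumes the Hugging Face `transformers` BLIP-2 classes, PyTorch and Pillow, and the `Salesforce/blip2-opt-2.7b` checkpoint. The optional prompt uses the `"Question: ... Answer:"` format from the Hugging Face BLIP-2 OPT examples. The function names are hypothetical, and wrapping this in an HTTP endpoint is left out.

```python
from functools import lru_cache
from typing import Optional

MODEL_ID = "Salesforce/blip2-opt-2.7b"


def build_prompt(question: Optional[str]) -> Optional[str]:
    """Wrap an optional user prompt in the "Question: ... Answer:" format used
    in the Hugging Face BLIP-2 OPT examples; None means plain captioning."""
    if question is None or not question.strip():
        return None
    return f"Question: {question.strip()} Answer:"


@lru_cache(maxsize=1)
def _load_model(device: str):
    # Heavy imports are deferred so build_prompt() works without them installed.
    import torch
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    dtype = torch.float16 if device == "cuda" else torch.float32
    processor = Blip2Processor.from_pretrained(MODEL_ID)
    model = Blip2ForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype=dtype)
    return processor, model.to(device), dtype


def caption_image(image_path: str, question: Optional[str] = None, max_new_tokens: int = 30) -> str:
    """Generate a caption (or a prompted answer) for one image with BLIP-2."""
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor, model, dtype = _load_model(device)  # loaded once, then cached
    image = Image.open(image_path).convert("RGB")
    # BatchFeature.to(device, dtype) casts only floating-point tensors (the pixel values).
    inputs = processor(images=image, text=build_prompt(question), return_tensors="pt").to(device, dtype)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

Caching the loaded model matters because the checkpoint takes several gigabytes of memory and is slow to load. A web endpoint would call `caption_image()` per request and reuse the same model rather than reloading it each time.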
https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/ looks veeery promising.