Alt text generation using AI #350

Illyism · 2024-02-08T19:08:21Z

No description provided.

erikyo · 2024-02-08T20:48:11Z

I'm curious: in what way? It's a good idea, but how would you implement it?

erikyo · 2024-02-10T12:29:33Z

I tried MobileNet, and although the result is impressive for how little the model weighs, it is unfortunately far from what is needed here.

It might also need some kind of training on the images already in the WordPress media library, but that assumes they are there and are (well) tagged.

Illyism · 2024-02-10T14:23:09Z

Fair - I think it's fairly difficult without an external or bigger API

erikyo · 2024-02-10T15:14:02Z

To tell the truth, I was amazed that MobileNet can do this at just 2-3 MB:
https://storage.googleapis.com/tfjs-examples/mobilenet/dist/index.html

Maybe with a custom model and some "homemade" training it's not impossible.
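
To make the gap between classification and alt text concrete, here is a minimal sketch that turns top-k classifier labels into a rough description. It assumes predictions arrive as (label, probability) pairs, similar in spirit to what a MobileNet demo shows. The function name, threshold, and sample values are illustrative assumptions, not part of any library.

```python
from typing import List, Tuple

# Assumed output shape of an image classifier: (label, probability).
Prediction = Tuple[str, float]


def labels_to_alt_text(
    predictions: List[Prediction],
    min_probability: float = 0.2,  # illustrative threshold, not tuned
    max_labels: int = 2,
) -> str:
    """Build a rough alt text from labels, or "" if nothing is confident."""
    confident = sorted(
        (p for p in predictions if p[1] >= min_probability),
        key=lambda p: p[1],
        reverse=True,
    )[:max_labels]
    # ImageNet-style labels often list synonyms ("tabby, tabby cat");
    # keep only the first one.
    names = [label.split(",")[0].strip() for label, _ in confident]
    if not names:
        return ""
    return "Image of " + " and ".join(names)


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = [("tabby, tabby cat", 0.62), ("Egyptian cat", 0.21), ("lynx", 0.03)]
    print(labels_to_alt_text(sample))
```

The output is a list of nouns rather than a sentence about what is happening in the picture, which is why a captioning model (or extra training) would still be needed for useful alt text.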

swissspidy · 2024-02-10T15:37:55Z

Automatic alt text generation has been on my to-do list for a while; it's even in the readme:

https://github.com/swissspidy/media-experiments/tree/29c6d473d149a5cb08c379ac94cb779a96ce13e9#alt-text-generation

I think it would be really cool to have a simple on-device implementation for such a feature, even just to demonstrate that it's possible.

As for custom models, the WP photo directory or Openverse would make for excellent sources for training data.
As @erikyo mentioned, sites could also train models on their own media library, which would also be really cool from a privacy perspective, as the model would never leave their site.
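
As a rough sketch of what "training on their own media library" would need as input: the WordPress REST API media endpoint returns items with `alt_text`, `source_url`, and `media_type` fields, so existing alt texts could be collected as training pairs. The helper names below are hypothetical, and the sample data is made up.

```python
from typing import Dict, Iterable, List, Tuple


def training_pairs(media_items: Iterable[Dict]) -> List[Tuple[str, str]]:
    """Return (image URL, alt text) pairs for images that already have alt text."""
    pairs = []
    for item in media_items:
        alt = (item.get("alt_text") or "").strip()
        url = item.get("source_url") or ""
        if item.get("media_type") == "image" and alt and url:
            pairs.append((url, alt))
    return pairs


def tagged_fraction(media_items: Iterable[Dict]) -> float:
    """Fraction of images usable for training: a quick check of how well tagged a library is."""
    images = [m for m in media_items if m.get("media_type") == "image"]
    if not images:
        return 0.0
    return len(training_pairs(images)) / len(images)


if __name__ == "__main__":
    # Made-up items shaped like REST API media objects (only the fields used here).
    library = [
        {"media_type": "image", "source_url": "https://example.com/a.jpg", "alt_text": "A red bicycle"},
        {"media_type": "image", "source_url": "https://example.com/b.jpg", "alt_text": ""},
        {"media_type": "file", "source_url": "https://example.com/c.pdf", "alt_text": "Report"},
    ]
    print(training_pairs(library))
    print(tagged_fraction(library))
```

If the tagged fraction is low, there is little to train on, which is where the WP photo directory or Openverse would come in.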

If someone wants something more powerful, they can always use an external service. The same goes for the video captioning and the like.

erikyo · 2024-02-13T13:59:40Z

I've done some research in this direction. Generating "interesting" captions for photos with a homebrew method is not impossible, but it can be highly resource-consuming and may give lower quality than other models. This is especially true if the desired outcome is a diverse set of descriptions for the images in a media library (which we care about mainly for SEO).

I came across an interesting approach that I'd like to mention. To address the challenge of generating varied image descriptions, you can create an API using the blip2-opt-2.7b model. I successfully implemented this by following a guide found here, and I made some additional modifications to meet specific requirements, such as the ability to add a prompt. The result of this implementation can be accessed at the following link:

blip-api by erikyo

Enter the URL of an image in the first input and press submit. It takes about 60 seconds and then returns a description of the image. The results seem generally very good to me but, as @swissspidy pointed out, this method requires something external, because the model weighs 15 GB and needs a lot of computing power to run.
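
For anyone who wants to try the same model locally rather than through a hosted API, here is a minimal sketch using the Hugging Face `transformers` BLIP-2 classes. This is not the code behind blip-api; the function names, the `max_new_tokens` value, and the prompt handling are assumptions. Depending on the library version, the decoded text may or may not echo the prompt, so a small helper strips it if present.

```python
def strip_prompt(text: str, prompt: str) -> str:
    """Remove an echoed prompt from the start of generated text, if present."""
    text = text.strip()
    prompt = prompt.strip()
    if prompt and text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


def caption_image(image_path: str, prompt: str = "") -> str:
    """Caption a local image with BLIP-2 (downloads the model weights on first use)."""
    # Imported lazily so strip_prompt works without these heavy packages.
    import torch
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    model_id = "Salesforce/blip2-opt-2.7b"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    processor = Blip2Processor.from_pretrained(model_id)
    model = Blip2ForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=dtype
    ).to(device)

    image = Image.open(image_path).convert("RGB")
    if prompt:
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    else:
        inputs = processor(images=image, return_tensors="pt")
    inputs = inputs.to(device, dtype)

    with torch.no_grad():
        # 40 new tokens is an arbitrary cap for a short caption.
        output_ids = model.generate(**inputs, max_new_tokens=40)
    text = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
    return strip_prompt(text, prompt)
```

Calling something like `caption_image("photo.jpg", prompt="a photo of")` (hypothetical file name) would return the generated caption. On a CPU this is likely to be slow, which matches the point above about needing a lot of computing power.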

swissspidy · 2024-05-15T12:47:31Z

https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/ looks veeery promising

swissspidy added the enhancement New feature or request label Feb 13, 2024

swissspidy changed the title ~~Alt text ai generator~~ Alt text generation using AI Feb 13, 2024

swissspidy added feature p3 and removed enhancement New feature or request labels Jun 6, 2024
