# Alt text generation for images
On serlo.org we have a lot of images without alternative texts that would help visually impaired people using screen readers. 
Let's have a look at how a machine learning model could help us automatically generate such alternative texts!

A popular choice for such tasks seems to be the BLIP image captioning models from Salesforce. Two options to try them are:
## Use a Hugging Face inference endpoint 
This means the model runs on a Hugging Face server to which we send API requests. You can find the code for that on the [model page](https://huggingface.co/Salesforce/blip-image-captioning-large) (or [base](https://huggingface.co/Salesforce/blip-image-captioning-base)) by clicking the '🚀 Deploy' button and then '⚡ Inference API'. To use it, you have to create a Hugging Face account and [create an access token](https://huggingface.co/settings/tokens).
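As a rough illustration of this option, the sketch below sends the image bytes to a hosted model using only the Python standard library. The URL pattern in `API_URL_TEMPLATE`, the response shape `[{"generated_text": "..."}]`, and the helper names `build_request`, `extract_alt_text` and `generate_alt_text_remote` are assumptions made for this example. The snippet shown on the model page is the authoritative version and may differ.

```python
import json
import urllib.request

# Assumption: URL pattern of the hosted inference endpoint; check the model page snippet.
API_URL_TEMPLATE = "https://api-inference.huggingface.co/models/{model}"


def build_request(image_bytes: bytes, model: str, token: str) -> urllib.request.Request:
    """Build a POST request that sends the raw image bytes to the hosted model."""
    return urllib.request.Request(
        API_URL_TEMPLATE.format(model=model),
        data=image_bytes,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def extract_alt_text(response_body: bytes) -> str:
    """Read the caption from a response assumed to look like [{"generated_text": "..."}]."""
    return json.loads(response_body)[0]["generated_text"]


def generate_alt_text_remote(image_url: str, model: str, token: str) -> str:
    """Download the image, send it to the endpoint, and return the generated caption."""
    with urllib.request.urlopen(image_url) as image_response:
        image_bytes = image_response.read()
    with urllib.request.urlopen(build_request(image_bytes, model, token)) as api_response:
        return extract_alt_text(api_response.read())
```

Calling `generate_alt_text_remote(image_url, "Salesforce/blip-image-captioning-base", token)` with your own access token would then perform the actual request. Keep the token out of the notebook itself, for example by reading it from an environment variable.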
## Run the model locally
Download the model from Hugging Face and run it where you are running this notebook, i.e. your computer or Google Colab.
Running the BLIP captioning models seems to work on a normal computer, so we take this route and do not need a Hugging Face access token for the inference endpoint. 
### Function to run the model
Now let's get to it and implement a function that takes an image URL and a model name as input and returns the generated alternative text:

In [1]:
!pip install transformers
!pip install torch



In [2]:
from transformers import pipeline

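def generate_alt_text(image_url: str, model: str) -> str:
    # Load the image-to-text pipeline for the given model (weights are downloaded on first use).
    captioner = pipeline("image-to-text", model=model)
    # The pipeline accepts an image URL and returns a list like [{"generated_text": "..."}].
    output: list[dict[str, str]] = captioner(image_url)
    return output[0]["generated_text"]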
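
The function above builds a new pipeline on every call, which is fine for a single test image. For captioning many images, one possible variation is to load the pipeline once and reuse it. The following is only a sketch: `load_captioner` and `generate_alt_texts` are hypothetical helper names introduced here, not part of `transformers`.

```python
from functools import lru_cache
from typing import Callable


@lru_cache(maxsize=None)
def load_captioner(model: str) -> Callable:
    """Create the image-to-text pipeline once per model name and cache it."""
    # Imported lazily so that this cell can be defined before transformers is needed.
    from transformers import pipeline
    return pipeline("image-to-text", model=model)


def generate_alt_texts(image_urls: list[str], captioner: Callable) -> dict[str, str]:
    """Caption every image URL with the same, already loaded captioner."""
    return {url: captioner(url)[0]["generated_text"] for url in image_urls}
```

With this, `generate_alt_texts(urls, load_captioner("Salesforce/blip-image-captioning-base"))` loads the model only once, no matter how many URLs are passed.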

### Comparing base and large model

Set an image URL from serlo.org to test, and use the function defined above to see what alternative text each of the two models generates:

In [3]:
import IPython.display

image_url: str = "https://assets.serlo.org/c48ee040-0aaa-11ee-95fa-c71a5f058ba5/radiusunddurchmesserolaf.jpg"
display(IPython.display.Image(url=image_url))

print("generated alt text with blip-image-captioning-base:  " + generate_alt_text(image_url, "Salesforce/blip-image-captioning-base"))
print("generated alt text with blip-image-captioning-large: " + generate_alt_text(image_url, "Salesforce/blip-image-captioning-large"))



generated alt text with blip-image-captioning-base:  a circle with a line of radius and a line of radius
generated alt text with blip-image-captioning-large: a diagram of a circle with a line that is in the middle
