# 🏷️ Image Classification with Pretrained Models

In this section, we use a pretrained image classification model to predict what a given image shows.

We'll use the `image-classification` pipeline from Hugging Face's `transformers` library.

In [1]:
from transformers import pipeline
from PIL import Image

# Load an image from the sample-images folder
image = Image.open("sample-images/red-fox.jpg")

# Load the image classification pipeline (no model is named, so the library falls back to its default)
classifier = pipeline("image-classification")

# Run classification
predictions = classifier(image)

# Show top prediction(s)
for pred in predictions:
    print(f"Label: {pred['label']}, Confidence: {pred['score']:.4f}")


No model was supplied, defaulted to google/vit-base-patch16-224 and revision 3f49326 (https://huggingface.co/google/vit-base-patch16-224).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


Label: red fox, Vulpes vulpes, Confidence: 0.9257
Label: grey fox, gray fox, Urocyon cinereoargenteus, Confidence: 0.0345
Label: kit fox, Vulpes macrotis, Confidence: 0.0307
Label: dhole, Cuon alpinus, Confidence: 0.0022
Label: Arctic fox, white fox, Alopex lagopus, Confidence: 0.0016
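
The log output above warns against relying on the default model. As a minimal sketch of one way to address that, the helper below names the model explicitly, using the ID reported in the log. The names `MODEL_ID`, `build_classifier`, `short_label`, and `summarize` are hypothetical and introduced here only for illustration. The sample data is copied from the predictions printed above, so the post-processing part runs without downloading anything.

```python
# Model ID copied from the log output above; naming it explicitly avoids
# depending on whatever default the library happens to choose.
MODEL_ID = "google/vit-base-patch16-224"


def build_classifier(model_id=MODEL_ID):
    """Create an image-classification pipeline for an explicitly named model."""
    # Imported lazily so the helpers below also work without transformers installed.
    from transformers import pipeline

    return pipeline("image-classification", model=model_id)


def short_label(label):
    """Keep only the first synonym of a comma-separated label string."""
    return label.split(",")[0].strip()


def summarize(predictions, top_k=3):
    """Return the top_k (short label, score) pairs, highest score first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(short_label(p["label"]), p["score"]) for p in ranked[:top_k]]


# Sample predictions copied from the cell output above.
sample = [
    {"label": "red fox, Vulpes vulpes", "score": 0.9257},
    {"label": "grey fox, gray fox, Urocyon cinereoargenteus", "score": 0.0345},
    {"label": "kit fox, Vulpes macrotis", "score": 0.0307},
]

for name, score in summarize(sample, top_k=2):
    print(f"{name}: {score:.4f}")
```

With a real model, the same helper could be applied as `summarize(build_classifier()(image))`. The pipeline call itself also accepts a `top_k` argument if only the number of returned labels needs to change.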


# 🖼️ Image Captioning with Pretrained Models

In this section, we use a pretrained image captioning model to generate full-sentence descriptions of input images.

We'll use the `image-to-text` pipeline from Hugging Face's `transformers` library.

In [2]:
from transformers import pipeline
from PIL import Image

# Load an image from the sample-images folder
image = Image.open("sample-images/Penguin.jpg")

# Load the image-to-text (captioning) pipeline
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

# Generate caption
caption = captioner(image)

# Display caption
print("Generated Caption:", caption[0]['generated_text'])

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Device set to use cpu


Generated Caption: a penguin standing on a snowy surface
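
The cell above reads the caption with `caption[0]['generated_text']`, which works for a single image. As a hedged sketch, the hypothetical helper `extract_captions` below collects caption strings from that output shape. It also handles a list of such lists, which is assumed here to be the shape returned when several images are passed at once. The second caption in the batched example is a made-up placeholder, not model output.

```python
def extract_captions(outputs):
    """Flatten image-to-text pipeline output into a list of caption strings.

    Handles the single-image shape used above (a list of dicts) and, as an
    assumption, the batched shape (a list of such lists, one per image).
    """
    captions = []
    for item in outputs:
        if isinstance(item, dict):
            # Single-image shape: each item is {"generated_text": ...}
            captions.append(item["generated_text"].strip())
        else:
            # Assumed batched shape: each item is a list of such dicts
            captions.extend(d["generated_text"].strip() for d in item)
    return captions


# Single-image output, copied from the result printed above.
single = [{"generated_text": "a penguin standing on a snowy surface"}]

# Assumed batched output; the second caption is a hypothetical placeholder.
batched = [single, [{"generated_text": "placeholder caption"}]]

print(extract_captions(single))
print(extract_captions(batched))
```

In the notebook, this would be called as `extract_captions(captioner(image))`.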
