# Object detection fine tuning on a custom dataset, deployment in Spaces and Gradio API communication

_Authored by: [Sergio Paniego](https://github.com/sergiopaniego)_

In this cookbook, we will fine-tune an object detection model (DETR) on a custom dataset. After that, we will deploy it as a Gradio Space on Hugging Face and see how to use the Gradio API to communicate directly with the deployed Space.

![DETR architecture](https://github.com/facebookresearch/detr/raw/main/.github/DETR.png)



[Link to DETR HF docs]

[More relevant references]

* https://huggingface.co/docs/transformers/tasks/object_detection


## Install dependencies

Let's install the libraries needed!

In [None]:
!pip install -U -q datasets transformers[torch] timm wandb torchmetrics albumentations

## Load dataset

[Image of the dataset]

The dataset that we will use is [Fashionpedia](https://huggingface.co/datasets/detection-datasets/fashionpedia).

This dataset comes from the paper [Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset](https://arxiv.org/abs/2004.12276). The authors describe it in the following terms:

````
Fashionpedia is a dataset which consists of two parts: (1) an ontology built by fashion experts containing 27 main apparel categories, 19 apparel parts, 294 fine-grained attributes and their relationships; (2) a dataset with 48k everyday and celebrity event fashion images annotated with segmentation masks and their associated per-mask fine-grained attributes, built upon the Fashionpedia ontology.
````

It contains:

* 46781 images
* 342182 bounding-boxes

It is available on the Hugging Face Hub: https://huggingface.co/datasets/detection-datasets/fashionpedia

In [None]:
from datasets import load_dataset

dataset = load_dataset('detection-datasets/fashionpedia')
dataset

In [None]:
dataset["train"][0]

## Get splits of the dataset for training and testing

The dataset contains two data splits: `train` and `val`. We will use the first one for training and the second one for validation.

In [None]:
train_dataset = dataset['train']
test_dataset = dataset['val']

**Optional**

In the following two cells, we take a 1% sample of each split. This speeds up training, since the dataset contains a large number of examples. The cells are wrapped in triple quotes so they do not run by default; remove the quotes to enable them.
For the best results, we recommend training on the full dataset.

In [None]:
'''
original_size = len(train_dataset)  # Store the size before sampling
sample_size = int(0.01 * original_size)

train_dataset = train_dataset.shuffle(seed=42).select(range(sample_size))

print(f"Original size: {original_size}")
print(f"Sample size: {len(train_dataset)}")
'''

In [None]:
'''
original_size = len(test_dataset)  # Store the size before sampling
sample_size = int(0.01 * original_size)

test_dataset = test_dataset.shuffle(seed=42).select(range(sample_size))

print(f"Original size: {original_size}")
print(f"Sample size: {len(test_dataset)}")
'''

## Visualize one example from the dataset with its objects

Now that we've loaded the dataset, let's visualize some examples.

### Generate id2label and label2id

These variables contain the mapping between the ids and the actual labels for the objects.

In [None]:
import numpy as np
from PIL import Image, ImageDraw


id2label = {
    0: 'shirt, blouse', 1: 'top, t-shirt, sweatshirt', 2: 'sweater', 3: 'cardigan',
    4: 'jacket', 5: 'vest', 6: 'pants', 7: 'shorts', 8: 'skirt', 9: 'coat',
    10: 'dress', 11: 'jumpsuit', 12: 'cape', 13: 'glasses', 14: 'hat',
    15: 'headband, head covering, hair accessory', 16: 'tie', 17: 'glove',
    18: 'watch', 19: 'belt', 20: 'leg warmer', 21: 'tights, stockings',
    22: 'sock', 23: 'shoe', 24: 'bag, wallet', 25: 'scarf', 26: 'umbrella',
    27: 'hood', 28: 'collar', 29: 'lapel', 30: 'epaulette', 31: 'sleeve',
    32: 'pocket', 33: 'neckline', 34: 'buckle', 35: 'zipper', 36: 'applique',
    37: 'bead', 38: 'bow', 39: 'flower', 40: 'fringe', 41: 'ribbon',
    42: 'rivet', 43: 'ruffle', 44: 'sequin', 45: 'tassel'
}


label2id = {v: k for k, v in id2label.items()}

### Let's draw one image!

In [None]:
def draw_image_from_idx(dataset, idx):
    sample = dataset[idx]
    image = sample["image"]
    annotations = sample["objects"]
    draw = ImageDraw.Draw(image)
    width, height = sample["width"], sample["height"]

    print(annotations)

    for i in range(len(annotations["bbox_id"])):
        box = annotations["bbox"][i]
        x1, y1, x2, y2 = tuple(box)

        # Normalize coordinates if necessary
        if max(box) <= 1.0:
            x1, y1 = int(x1 * width), int(y1 * height)
            x2, y2 = int(x2 * width), int(y2 * height)
        else:
            x1, y1 = int(x1), int(y1)
            x2, y2 = int(x2), int(y2)

        draw.rectangle((x1, y1, x2, y2), outline="red", width=3)
        draw.text((x1, y1), id2label[annotations["category"][i]], fill="green")

    return image


draw_image_from_idx(dataset=train_dataset, idx=10) # You can test changing this id

### Let's visualize some more images

In [None]:
import matplotlib.pyplot as plt

def plot_images(dataset, indices):
    """
    Plot images and their annotations.
    """
    num_cols = 3
    num_rows = int(np.ceil(len(indices) / num_cols))
    fig, axes = plt.subplots(num_rows, num_cols, figsize=(15, 10))

    for i, idx in enumerate(indices):
        row = i // num_cols
        col = i % num_cols

        image = draw_image_from_idx(dataset, idx)

        axes[row, col].imshow(image)
        axes[row, col].axis("off")

    for j in range(i + 1, num_rows * num_cols):
        fig.delaxes(axes.flatten()[j])

    plt.tight_layout()
    plt.show()

plot_images(train_dataset, range(9))

## Filter invalid bboxes

To start preprocessing the dataset, we filter out the invalid bboxes it contains.
After reviewing the dataset, we found that some bboxes do not have a valid structure (their minimum coordinate is not smaller than their maximum), so we discard them.

In [None]:
from datasets import Dataset

def filter_invalid_bboxes(example):
    valid_bboxes = []
    valid_bbox_ids = []
    valid_categories = []
    valid_areas = []

    for i, bbox in enumerate(example['objects']['bbox']):
        x_min, y_min, x_max, y_max = bbox[:4]
        if x_min < x_max and y_min < y_max:
            valid_bboxes.append(bbox)
            valid_bbox_ids.append(example['objects']['bbox_id'][i])
            valid_categories.append(example['objects']['category'][i])
            valid_areas.append(example['objects']['area'][i])
        else:
            print(f"Image with invalid bbox: {example['image_id']} Invalid bbox detected and discarded: {bbox} - bbox_id: {example['objects']['bbox_id'][i]} - category: {example['objects']['category'][i]}")


    example['objects']['bbox'] = valid_bboxes
    example['objects']['bbox_id'] = valid_bbox_ids
    example['objects']['category'] = valid_categories
    example['objects']['area'] = valid_areas

    return example

train_dataset = train_dataset.map(filter_invalid_bboxes)
test_dataset = test_dataset.map(filter_invalid_bboxes)

print(train_dataset)
print(test_dataset)
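
The validity rule used by `filter_invalid_bboxes` can be checked in isolation. The following is a minimal sketch on hand-written boxes: the sample values are made up for illustration and are not taken from Fashionpedia, and `is_valid_bbox` is a hypothetical helper name.

```python
def is_valid_bbox(bbox):
    """Return True when a (x_min, y_min, x_max, y_max) box has positive width and height."""
    x_min, y_min, x_max, y_max = bbox[:4]
    return x_min < x_max and y_min < y_max


# Made-up boxes for illustration only
sample_boxes = [
    [10.0, 20.0, 110.0, 220.0],  # valid
    [50.0, 50.0, 50.0, 120.0],   # zero width -> invalid
    [80.0, 90.0, 40.0, 130.0],   # x_min > x_max -> invalid
]

kept = [box for box in sample_boxes if is_valid_bbox(box)]
print(f"Kept {len(kept)} of {len(sample_boxes)} boxes")
```

Degenerate boxes like the last two would otherwise break the augmentation and format-conversion steps that assume a positive width and height.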

## Visualize each class occurrences

Let's get a better understanding of what the dataset contains. In this case, we plot the number of occurrences of each class to spot potential biases.

In [None]:
print(train_dataset)

id_list = []
category_examples = {}
for example in train_dataset:
  id_list += example['objects']['bbox_id']
  for category in example['objects']['category']:
    if id2label[category] not in category_examples:
      category_examples[id2label[category]] = 1
    else:
      category_examples[id2label[category]] += 1

id_list.sort()
print(id_list)
print(len(id_list))
print(category_examples)

In [None]:
import matplotlib.pyplot as plt

# Separate the keys and values
categories = list(category_examples.keys())
values = list(category_examples.values())

# Create the bar chart
plt.bar(categories, values, color='skyblue')

# Add titles and labels
plt.xlabel('Categories')
plt.ylabel('Number of Occurrences')
plt.title('Number of Occurrences by Category')
plt.xticks(rotation=90)

# Display the chart
plt.show()


We can see that some classes, such as shoe or sleeve, are overrepresented.
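
To put numbers on the imbalance, the counts can be turned into relative frequencies. This is a small sketch using `collections.Counter`; the `toy_counts` values are invented for illustration, and in the notebook you would pass `category_examples` instead. `class_shares` is a hypothetical helper name.

```python
from collections import Counter


def class_shares(counts):
    """Return (label, share) pairs sorted from most to least frequent."""
    total = sum(counts.values())
    return [(label, n / total) for label, n in Counter(counts).most_common()]


# Invented counts for illustration; replace with `category_examples` in the notebook
toy_counts = {"shoe": 50, "sleeve": 30, "umbrella": 20}

for label, share in class_shares(toy_counts):
    print(f"{label:<10} {share:.1%}")
```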

## Add data augmentation to the dataset

Data augmentation is key for performance in this type of problem. In this case, we use the Albumentations library, which transforms the images and their bounding boxes together.

[Albumentations image]

In [None]:
import albumentations as A


train_transform = A.Compose(
    [
        A.LongestMaxSize(500),
        A.PadIfNeeded(500, 500, border_mode=0, value=(0, 0, 0)),

        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
        A.HueSaturationValue(p=0.5),
        A.Rotate(limit=10, p=0.5),
        A.RandomScale(scale_limit=0.2, p=0.5),
        A.GaussianBlur(p=0.5),
        A.GaussNoise(p=0.5),
    ],
    bbox_params=A.BboxParams(
        format="pascal_voc",
        label_fields=["category"]
    ),
)

val_transform = A.Compose(
    [
        A.LongestMaxSize(500),
        A.PadIfNeeded(500, 500, border_mode=0, value=(0, 0, 0)),
    ],
    bbox_params=A.BboxParams(
        format="pascal_voc",
        label_fields=["category"]
    ),
)
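
Geometric augmentations have to move the bounding boxes along with the pixels, which is why the transforms above are given `bbox_params`. As a conceptual sketch only (this is not Albumentations' implementation), here is what a horizontal flip does to a Pascal VOC box; `hflip_bbox` is a hypothetical helper and the box values are made up.

```python
def hflip_bbox(bbox, image_width):
    """Mirror a (xmin, ymin, xmax, ymax) box around the vertical axis of the image."""
    xmin, ymin, xmax, ymax = bbox
    # The old right edge becomes the new left edge, and vice versa
    return [image_width - xmax, ymin, image_width - xmin, ymax]


box = [10, 20, 110, 220]
flipped = hflip_bbox(box, image_width=500)
print(flipped)  # [390, 20, 490, 220]

# Flipping twice returns the original box
assert hflip_bbox(flipped, image_width=500) == box
```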

## Init image processor from model checkpoint

We instantiate the image processor from the pretrained checkpoint. In this case, we will be using the `facebook/detr-resnet-50-dc5` model.

In [None]:
from transformers import AutoImageProcessor

checkpoint = "facebook/detr-resnet-50-dc5"
image_processor = AutoImageProcessor.from_pretrained(checkpoint)

### We add some methods to process the dataset

In [None]:
def formatted_anns(image_id, category, area, bbox):
    annotations = []
    for i in range(0, len(category)):
        new_ann = {
            "image_id": image_id,
            "category_id": category[i],
            "iscrowd": 0,
            "area": area[i],
            "bbox": list(bbox[i]),
        }
        annotations.append(new_ann)

    return annotations

def convert_voc_to_coco(bbox):
    xmin, ymin, xmax, ymax = bbox
    width = xmax - xmin
    height = ymax - ymin
    return [xmin, ymin, width, height]

def transform_aug_ann(examples, transform):
    image_ids = examples["image_id"]
    images, bboxes, area, categories = [], [], [], []
    for image, objects in zip(examples["image"], examples["objects"]):
        image = np.array(image.convert("RGB"))[:, :, ::-1]
        out = transform(image=image, bboxes=objects["bbox"], category=objects["category"])

        area.append(objects["area"])
        images.append(out["image"])

        # Convert to COCO format
        converted_bboxes = [convert_voc_to_coco(bbox) for bbox in out["bboxes"]]
        bboxes.append(converted_bboxes)

        categories.append(out["category"])

    targets = [
        {"image_id": id_, "annotations": formatted_anns(id_, cat_, ar_, box_)}
        for id_, cat_, ar_, box_ in zip(image_ids, categories, area, bboxes)
    ]

    return image_processor(images=images, annotations=targets, return_tensors="pt")

def transform_train(examples):
    return transform_aug_ann(examples, transform=train_transform)

def transform_val(examples):
    return transform_aug_ann(examples, transform=val_transform)


train_dataset_transformed = train_dataset.with_transform(transform_train)
test_dataset_transformed = test_dataset.with_transform(transform_val)
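
As a quick sanity check of the format conversion above: Pascal VOC boxes are `(xmin, ymin, xmax, ymax)`, while COCO boxes are `(xmin, ymin, width, height)`. The sketch below repeats `convert_voc_to_coco` and adds an inverse, `convert_coco_to_voc`, which is a hypothetical helper not used elsewhere in this notebook. The box values are made up.

```python
def convert_voc_to_coco(bbox):
    """(xmin, ymin, xmax, ymax) -> (xmin, ymin, width, height)."""
    xmin, ymin, xmax, ymax = bbox
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def convert_coco_to_voc(bbox):
    """(xmin, ymin, width, height) -> (xmin, ymin, xmax, ymax)."""
    xmin, ymin, width, height = bbox
    return [xmin, ymin, xmin + width, ymin + height]


voc_box = [10, 20, 110, 220]
coco_box = convert_voc_to_coco(voc_box)
print(coco_box)  # [10, 20, 100, 200]

# The two conversions are inverses of each other
assert convert_coco_to_voc(coco_box) == voc_box
```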

In [None]:
def collate_fn(batch):
    pixel_values = [item["pixel_values"] for item in batch]
    encoding = image_processor.pad(pixel_values, return_tensors="pt")
    labels = [item["labels"] for item in batch]

    batch = {}
    batch["pixel_values"] = encoding["pixel_values"]  # Do not move to GPU here
    batch["pixel_mask"] = encoding["pixel_mask"]      # Do not move to GPU here
    batch["labels"] = labels

    return batch
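
Images in a batch can have different sizes, so the collate function pads them to a common shape and returns a `pixel_mask` that marks real pixels with 1 and padding with 0. The following pure-Python sketch illustrates that idea on tiny single-channel "images" stored as nested lists. It is a simplified illustration, not the image processor's code, and `pad_batch` is a hypothetical helper.

```python
def pad_batch(images):
    """Zero-pad 2D lists (H x W) on the bottom and right to the largest H and W in the batch."""
    max_h = max(len(img) for img in images)
    max_w = max(len(img[0]) for img in images)
    padded, masks = [], []
    for img in images:
        h, w = len(img), len(img[0])
        extra_rows = [[0] * max_w for _ in range(max_h - h)]
        padded.append([row + [0] * (max_w - w) for row in img] + extra_rows)
        # 1 marks a real pixel, 0 marks padding
        masks.append([[1] * w + [0] * (max_w - w) for _ in range(h)] + extra_rows)
    return padded, masks


a = [[1, 2], [3, 4]]  # 2 x 2 image
b = [[5, 6, 7]]       # 1 x 3 image
padded, masks = pad_batch([a, b])

print(padded)
print(masks)
```

The mask lets the model ignore the padded area, so padding does not influence the predictions.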

## Plot augmented examples

We are close to training our model! But first, let's visualize some samples after augmentation, so we can double-check that the augmentations are suitable for the training procedure.

In [None]:
# Updated draw function to accept an optional transform
def draw_augmented_image_from_idx(dataset, idx, transform=None):
    sample = dataset[idx]
    image = sample["image"]
    annotations = sample["objects"]

    # Convert image to RGB and NumPy array
    image = np.array(image.convert("RGB"))[:, :, ::-1]

    if transform:
        augmented = transform(image=image, bboxes=annotations["bbox"], category=annotations["category"])
        image = augmented["image"]
        annotations["bbox"] = augmented["bboxes"]
        annotations["category"] = augmented["category"]

    image = Image.fromarray(image[:, :, ::-1])  # Convert back to PIL Image
    draw = ImageDraw.Draw(image)
    width, height = image.size  # Use the size of the (possibly resized) augmented image

    # Iterate over the augmented boxes: augmentation may drop boxes, so bbox_id can be longer
    for box, category in zip(annotations["bbox"], annotations["category"]):
        x1, y1, x2, y2 = tuple(box)

        # Scale normalized coordinates to pixels if necessary
        if max(box) <= 1.0:
            x1, y1 = int(x1 * width), int(y1 * height)
            x2, y2 = int(x2 * width), int(y2 * height)
        else:
            x1, y1 = int(x1), int(y1)
            x2, y2 = int(x2), int(y2)

        draw.rectangle((x1, y1, x2, y2), outline="red", width=3)
        draw.text((x1, y1), id2label[category], fill="green")

    return image

# Updated plot function to include augmentation
def plot_augmented_images(dataset, indices, transform=None):
    """
    Plot images and their annotations with optional augmentation.
    """
    num_cols = 3
    num_rows = int(np.ceil(len(indices) / num_cols))  # Round up so every image gets a subplot
    fig, axes = plt.subplots(num_rows, num_cols, figsize=(15, 10))

    for i, idx in enumerate(indices):
        row = i // num_cols
        col = i % num_cols

        # Draw augmented image
        image = draw_augmented_image_from_idx(dataset, idx, transform=transform)

        # Display image on the corresponding subplot
        axes[row, col].imshow(image)
        axes[row, col].axis("off")

    plt.tight_layout()
    plt.show()

# Now use the function to plot augmented images
plot_augmented_images(train_dataset, range(9), transform=train_transform)

## Init model from checkpoint

We initialize the model from the same checkpoint as the image processor. We load a pretrained model that we will fine-tune on this particular dataset.

In [None]:
from transformers import AutoModelForObjectDetection

model = AutoModelForObjectDetection.from_pretrained(
    checkpoint,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)

In [None]:
output_dir = "detr-resnet-50-fashionpedia-finetuned-test" # change this

## Connect to HF Hub to upload fine tuned model

In [None]:
from huggingface_hub import notebook_login

notebook_login()

## Set training arguments, connect to W&B and train! <img src="https://wandb.ai/logo.png" alt="W&B logo" width="3%">


In [None]:
from transformers import TrainingArguments
from transformers import Trainer

import torch

# Define the training arguments

training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    max_steps=10000,  # Takes precedence over num_train_epochs when set to a positive value
    fp16=True,
    save_steps=10,
    logging_steps=1,
    learning_rate=1e-5,
    weight_decay=1e-4,
    save_total_limit=2,
    remove_unused_columns=False,
    eval_steps=50,
    eval_strategy="steps",  # Replaces the deprecated evaluation_strategy argument
    report_to="wandb",
    push_to_hub=True,
    batch_eval_metrics=True
)
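
When both `num_train_epochs` and `max_steps` are set, `max_steps` takes precedence, so it helps to know how many passes over the data a given step budget corresponds to. Below is a rough sketch of that arithmetic, assuming a single device and no gradient accumulation. The dataset size is a placeholder (use `len(train_dataset)` in practice), and `steps_per_epoch` is a hypothetical helper.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Optimizer steps needed for one pass over the data (last partial batch included)."""
    return math.ceil(num_examples / batch_size)


num_examples = 1_000  # Placeholder value; use len(train_dataset) in the notebook
batch_size = 4        # Matches per_device_train_batch_size above
max_steps = 10_000    # Matches max_steps above

spe = steps_per_epoch(num_examples, batch_size)
print(f"{spe} steps per epoch, so {max_steps} steps is about {max_steps / spe:.1f} epochs")
```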

### Connect to W&B to track the training

In [None]:
import wandb

wandb.init(
    project="detr-resnet-50-fashionpedia-finetuned-test", # change this
    name="detr-resnet-50-fashionpedia-finetuned-test", # change this
    config=training_args
)

### Let's train the model!

In [None]:
from torchmetrics.detection.mean_ap import MeanAveragePrecision
from torch.nn.functional import softmax

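def denormalize_boxes(boxes, width, height):
    # DETR boxes are normalized (center_x, center_y, width, height); scale them to pixels
    boxes = boxes.clone()
    boxes[:, 0] *= width  # center x
    boxes[:, 1] *= height  # center y
    boxes[:, 2] *= width  # box width
    boxes[:, 3] *= height  # box height
    return boxes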

batch_metrics = []
def compute_metrics(eval_pred, compute_result):
    global batch_metrics

    (loss_dict, scores, pred_boxes, last_hidden_state, encoder_last_hidden_state), labels = eval_pred

    image_sizes = []
    target = []
    for label in labels:

        image_sizes.append(label['orig_size'])
        height, width = label['orig_size']  # orig_size is stored as (height, width)
        denormalized_boxes = denormalize_boxes(label["boxes"], width, height)
        target.append(
            {
                "boxes": denormalized_boxes,
                "labels": label["class_labels"],
            }
        )
    predictions = []
    for score, box, target_sizes in zip(scores, pred_boxes, image_sizes):
        # Extract the bounding boxes, labels, and scores from the model's output
        # Softmax over all classes first, then drop the no-object class
        pred_scores = softmax(score, dim=-1)[:, :-1]
        height, width = target_sizes  # orig_size is stored as (height, width)
        pred_boxes = denormalize_boxes(box, width, height)
        pred_labels = torch.argmax(pred_scores, dim=-1)

        # Get the scores corresponding to the predicted labels
        pred_scores_for_labels = torch.gather(pred_scores, 1, pred_labels.unsqueeze(-1)).squeeze(-1)
        predictions.append(
            {
                "boxes": pred_boxes,
                "scores": pred_scores_for_labels,
                "labels": pred_labels,
            }
        )


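    # DETR boxes are (center_x, center_y, width, height), which torchmetrics calls 'cxcywh'
    metric = MeanAveragePrecision(box_format='cxcywh', class_metrics=True)

    # Accumulate this batch, including the final one that arrives with compute_result=True
    batch_metrics.append({"preds": predictions, "target": target})

    if not compute_result:
        return {}
    else: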
        # Compute final aggregated metrics
        # Aggregate batch-level metrics (this should be done based on your metric library's needs)
        all_preds = []
        all_targets = []
        for batch in batch_metrics:
            all_preds.extend(batch["preds"])
            all_targets.extend(batch["target"])

        # Update metric with all accumulated predictions and targets
        metric.update(preds=all_preds, target=all_targets)
        metrics = metric.compute()

        # Convert and format metrics as needed
        classes = metrics.pop("classes")
        map_per_class = metrics.pop("map_per_class")
        mar_100_per_class = metrics.pop("mar_100_per_class")

        for class_id, class_map, class_mar in zip(classes, map_per_class, mar_100_per_class):
            class_name = id2label[class_id.item()] if id2label is not None else class_id.item()
            metrics[f"map_{class_name}"] = class_map
            metrics[f"mar_100_{class_name}"] = class_mar

        # Round metrics for cleaner output
        metrics = {k: round(v.item(), 4) for k, v in metrics.items()}

        # Clear batch metrics for next evaluation
        batch_metrics = []

        return metrics
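
Mean average precision is built on the intersection over union (IoU) between predicted and ground-truth boxes. As a minimal, library-free sketch of that building block: DETR represents boxes as `(center_x, center_y, width, height)`, so the example first converts them to corner format and then computes the IoU. The helper names `cxcywh_to_xyxy` and `box_iou` and the box values are illustrative assumptions, not part of torchmetrics.

```python
def cxcywh_to_xyxy(box):
    """(center_x, center_y, width, height) -> (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


def box_iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Made-up boxes in pixel units
pred = cxcywh_to_xyxy([50.0, 50.0, 40.0, 40.0])    # [30.0, 30.0, 70.0, 70.0]
target = cxcywh_to_xyxy([60.0, 50.0, 40.0, 40.0])  # [40.0, 30.0, 80.0, 70.0]

print(f"IoU: {box_iou(pred, target):.3f}")  # IoU: 0.600
```

A prediction counts as a match for a ground-truth box only when their IoU exceeds a threshold, and mAP averages precision over several such thresholds and over classes.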

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    train_dataset=train_dataset_transformed,
    eval_dataset=test_dataset_transformed,
    tokenizer=image_processor,
    compute_metrics=compute_metrics  # Add this line to compute custom metrics
)

In [None]:
trainer.train()

In [None]:
trainer.push_to_hub()

## Test how the model behaves on a test image

Now that the model is trained and pushed to the Hub, we can easily check its capabilities.
As you can see in the following cell, making a prediction for a new image is straightforward.

_You can replace the image URL below, or upload your own image to Google Colab, to test on any new image._

In [None]:
import requests
from transformers import pipeline
import numpy as np
from PIL import Image, ImageDraw

url = "https://images.pexels.com/photos/27980131/pexels-photo-27980131/free-photo-of-mar-moda-hombre-pareja.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=2"

image = Image.open(requests.get(url, stream=True).raw)

obj_detector = pipeline(
    "object-detection", model="sergiopaniego/fashionpedia-finetuned_albumentations_coco" # Change with your model name
)


results = obj_detector(image)
print(results)

### Now, we show the results

In [None]:
from PIL import Image, ImageDraw
import numpy as np

def plot_results(image, results, threshold=0.5):
    image = Image.fromarray(np.uint8(image))
    draw = ImageDraw.Draw(image)
    width, height = image.size

    for result in results:
        score = result['score']
        label = result['label']
        box = list(result['box'].values())

        if score > threshold:
            x1, y1, x2, y2 = tuple(box)

            # Normalize coordinates if necessary
            if max(box) <= 1.0:
                x1, y1 = int(x1 * width), int(y1 * height)
                x2, y2 = int(x2 * width), int(y2 * height)
            else:
                x1, y1 = int(x1), int(y1)
                x2, y2 = int(x2), int(y2)

            draw.rectangle((x1, y1, x2, y2), outline="red", width=3)
            draw.text((x1 + 5, y1 - 10), label, fill="white")
            draw.text((x1 + 5, y1 + 10), f'{score:.2f}', fill='green' if score > 0.7 else 'red')

    return image

In [None]:
plot_results(image, results)
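
The `threshold` argument of `plot_results` simply drops low-confidence detections before drawing. Here is a small sketch of that filtering step on a hand-written list shaped like the pipeline output. The entries are invented, not real model predictions, and `filter_detections` is a hypothetical helper.

```python
def filter_detections(results, threshold=0.5):
    """Keep detections above the score threshold, most confident first."""
    kept = [r for r in results if r["score"] > threshold]
    return sorted(kept, key=lambda r: r["score"], reverse=True)


# Invented detections for illustration only
fake_results = [
    {"score": 0.42, "label": "sleeve", "box": {"xmin": 5, "ymin": 5, "xmax": 50, "ymax": 80}},
    {"score": 0.91, "label": "shoe", "box": {"xmin": 60, "ymin": 200, "xmax": 120, "ymax": 240}},
    {"score": 0.67, "label": "pants", "box": {"xmin": 40, "ymin": 100, "xmax": 130, "ymax": 210}},
]

for det in filter_detections(fake_results):
    print(f"{det['label']}: {det['score']:.2f}")
```

Raising the threshold trades recall for precision: fewer boxes are drawn, but the remaining ones are more likely to be correct.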

## Evaluation of the model in the test set

After training the model and visualizing the results for a test image, we compute the metrics on the whole test dataset.

In [None]:
outputs = trainer.predict(test_dataset_transformed)
print(outputs.metrics)

## Evaluate model over test set

In [None]:
metrics = trainer.evaluate(test_dataset_transformed)
print(metrics)

## Deploy the model in a HF Space  <img src="https://seeklogo.com/images/G/gradio-icon-logo-908AE1836C-seeklogo.com.png" alt="Gradio logo" width="5%">

<img src="https://huggingface.co/front/thumbnails/spaces.png" alt="HF Spaces logo" width="20%">

Our model is now available on the Hugging Face Hub. HF offers free Spaces for small applications, so we can create an application where we upload a test image through a web interface and test the model's capabilities.

I've created an example application here: https://huggingface.co/spaces/sergiopaniego/DETR_object_detection_fashionpedia-finetuned

### Create a new Space

### Create the application with the following code

You can copy and paste this code into a new `app.py` file.

In [None]:
# app.py

import gradio as gr
import spaces
import torch

from PIL import Image
import requests
from transformers import DetrImageProcessor
from transformers import DetrForObjectDetection
import matplotlib.pyplot as plt
import io


processor = DetrImageProcessor.from_pretrained("sergiopaniego/fashionpedia-finetuned_albumentations_coco") # Change with your model
model = DetrForObjectDetection.from_pretrained("sergiopaniego/fashionpedia-finetuned_albumentations_coco") # Change with your model


COLORS = [[0.000, 0.447, 0.741], [0.850, 0.325, 0.098], [0.929, 0.694, 0.125],
          [0.494, 0.184, 0.556], [0.466, 0.674, 0.188], [0.301, 0.745, 0.933]]

def get_output_figure(pil_img, scores, labels, boxes):
    plt.figure(figsize=(16, 10))
    plt.imshow(pil_img)
    ax = plt.gca()
    colors = COLORS * 100
    for score, label, (xmin, ymin, xmax, ymax), c in zip(scores.tolist(), labels.tolist(), boxes.tolist(), colors):
        ax.add_patch(plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, fill=False, color=c, linewidth=3))
        text = f'{model.config.id2label[label]}: {score:0.2f}'
        ax.text(xmin, ymin, text, fontsize=15,
                bbox=dict(facecolor='yellow', alpha=0.5))
    plt.axis('off')

    return plt.gcf()

@spaces.GPU
def detect(image):
    encoding = processor(image, return_tensors='pt')
    print(encoding.keys())

    with torch.no_grad():
        outputs = model(**encoding)

    width, height = image.size
    postprocessed_outputs = processor.post_process_object_detection(outputs, target_sizes=[(height, width)], threshold=0.5)
    results = postprocessed_outputs[0]


    output_figure = get_output_figure(image, results['scores'], results['labels'], results['boxes'])

    buf = io.BytesIO()
    output_figure.savefig(buf, bbox_inches='tight')
    buf.seek(0)
    output_pil_img = Image.open(buf)

    return output_pil_img

with gr.Blocks() as demo:
    gr.Markdown("# Object detection with DETR fine tuned on detection-datasets/fashionpedia")
    gr.Markdown(
        """
        This application uses a fine tuned DETR (DEtection TRansformers) to detect objects on images.
        This version was trained using detection-datasets/fashionpedia dataset.
        You can load an image and see the predictions for the objects detected.
        """
    )

    gr.Interface(
        fn=detect,
        inputs=gr.Image(label="Input image", type="pil"),
        outputs=[
            gr.Image(label="Output prediction", type="pil")
        ]
    )

demo.launch(show_error=True)

### Remember to set up the requirements.txt!

In [None]:
# requirements.txt

transformers
timm
torch

## Access the Space as an API

One interesting thing about Spaces is that they expose an API that can be called from outside, for example to build a new application on top of them.
We will see how easy it is to call the application as an API and obtain the results. You could call the Space from a JS application, a Python app, and so on. Imagine the possibilities!

You can find more info in: https://huggingface.co/learn/cookbook/enterprise_cookbook_gradio

In [None]:
!pip install gradio_client

In [None]:
from gradio_client import Client, handle_file

url = "https://images.pexels.com/photos/27980131/pexels-photo-27980131/free-photo-of-mar-moda-hombre-pareja.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=2"

client = Client("sergiopaniego/DETR_object_detection_fashionpedia-finetuned") # change this with your Space
result = client.predict(
    image=handle_file(url),  # Reuse the URL defined above
    api_name="/predict"
)

In [None]:
from PIL import Image

img = Image.open(result).convert('RGB')

In [None]:
from IPython.display import display
display(img)