# Extract and Cartoonize an Image

Configure these values before you start:

* `SOURCE_IMAGE_URL`: The URL of the image you intend to process;
* `CARTOON_SIZE`: The width $\times$ height of the generated cartoon image, written as a string such as `"1024x1024"`.
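
The size string can be checked before any API call is made. The sketch below is a minimal example, assuming the image endpoint used later in this notebook accepts only the square sizes `256x256`, `512x512`, and `1024x1024`. The names `ALLOWED_SIZES` and `parse_cartoon_size` are hypothetical and not part of the notebook.

```python
# Hypothetical helper: validate a "WIDTHxHEIGHT" string before calling the API.
# Assumption: the endpoint accepts only these three square sizes.
ALLOWED_SIZES = {"256x256", "512x512", "1024x1024"}


def parse_cartoon_size(size: str) -> tuple[int, int]:
    """Return (width, height), or raise ValueError for an unsupported size."""
    if size not in ALLOWED_SIZES:
        raise ValueError(
            f"Unsupported size {size!r}; expected one of {sorted(ALLOWED_SIZES)}"
        )
    width, height = (int(part) for part in size.split("x"))
    return width, height


print(parse_cartoon_size("1024x1024"))
```

Failing early on a bad size avoids running the detection step only to have the generation request rejected afterwards.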

In [1]:
SOURCE_IMAGE_URL = "https://cdn7.dissolve.com/p/D2115_143_759/D2115_143_759_1200.jpg"
CARTOON_SIZE = "1024x1024"

from IPython.display import Image

# Display the image
Image(url=SOURCE_IMAGE_URL, height=300)

## First: Get the most significant object in the image by detection confidence

> Model used: **detr-resnet-50** from [Hugging Face](https://huggingface.co/facebook/detr-resnet-50)

In [2]:
from transformers import DetrImageProcessor, DetrForObjectDetection
import torch
from PIL import Image as PILImage
import requests
import numpy as np

image = PILImage.open(requests.get(SOURCE_IMAGE_URL, stream=True).raw).convert('RGB')

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

inputs = processor(images=image, return_tensors="pt")
# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

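# Convert the raw outputs (class logits and normalized boxes) into pixel-space
# boxes in (x_min, y_min, x_max, y_max) format, keeping only detections with score > 0.9.
# target_sizes expects (height, width), while PIL's image.size is (width, height), hence [::-1].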
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )

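# Index of the detection with the highest confidence score
max_index = torch.argmax(results['scores']).item()
# Note: despite its name, max_confidence holds the class label (e.g. "teddy bear"), not the score
max_confidence = model.config.id2label[results["labels"][max_index].item()]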

print("\nTherefore, we select the object with max confidence to generate cartoon:", max_confidence)

Detected bottle with confidence 0.982 at location [44.87, 0.97, 181.52, 355.71]
Detected bed with confidence 0.912 at location [795.52, 317.47, 1199.16, 791.85]
Detected book with confidence 0.986 at location [291.74, 534.56, 495.68, 672.02]
Detected book with confidence 0.963 at location [337.93, 614.01, 542.14, 713.21]
Detected book with confidence 0.949 at location [347.23, 671.22, 561.6, 754.4]
Detected teddy bear with confidence 0.999 at location [565.73, 223.91, 898.75, 792.22]

Therefore, we select the object with max confidence to generate cartoon: teddy bear
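
The selection step can be illustrated without the model. The sketch below copies the detections printed above into plain tuples, picks the highest-scoring one, and converts its float box into an integer rectangle clamped to the image. The helper names are hypothetical, and the 1200 by 800 pixel image size is an assumption made for illustration; the notebook itself only selects the label and does not crop.

```python
import math

# Detections copied from the printed output above: (label, score, box).
detections = [
    ("bottle", 0.982, [44.87, 0.97, 181.52, 355.71]),
    ("bed", 0.912, [795.52, 317.47, 1199.16, 791.85]),
    ("book", 0.986, [291.74, 534.56, 495.68, 672.02]),
    ("book", 0.963, [337.93, 614.01, 542.14, 713.21]),
    ("book", 0.949, [347.23, 671.22, 561.6, 754.4]),
    ("teddy bear", 0.999, [565.73, 223.91, 898.75, 792.22]),
]


def select_top_detection(dets):
    """Return the (label, score, box) tuple with the highest score."""
    if not dets:
        raise ValueError("no detections cleared the threshold")
    return max(dets, key=lambda d: d[1])


def to_crop_box(box, image_width, image_height):
    """Round an (x_min, y_min, x_max, y_max) box outward and clamp it to the image."""
    x_min, y_min, x_max, y_max = box
    left = max(0, math.floor(x_min))
    upper = max(0, math.floor(y_min))
    right = min(image_width, math.ceil(x_max))
    lower = min(image_height, math.ceil(y_max))
    return left, upper, right, lower


label, score, box = select_top_detection(detections)
# Assumed image size (width=1200, height=800); read it from the real image in practice.
crop_box = to_crop_box(box, 1200, 800)
print(label, score, crop_box)
```

The resulting `(left, upper, right, lower)` tuple matches the argument order that PIL's `Image.crop` takes, so it could be used to cut the selected object out of the source image. The explicit empty-list check matters because nothing can be selected when no detection clears the threshold.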


## Second: Generate a cartoon image with the detected object's label as the prompt

> API used: [OpenAI](https://platform.openai.com/docs/guides/images)

The OpenAI API requires your secret API key. If you don't have one, please refer to [this page](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety).
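
A missing key is easier to diagnose if it is caught before the request is sent. The sketch below is a minimal example that reads the key from an environment variable and fails with a clear message when it is absent. The function name `require_api_key` is hypothetical; `OPENAI_API_KEY` is the variable name the following cell reads.

```python
import os


def require_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `env_var`, or raise if it is missing or blank."""
    key = os.getenv(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; add it to your environment or to a .env file"
        )
    return key


# Possible usage after load_dotenv() in the next cell:
# openai.api_key = require_api_key()
```

Keeping the key in the environment or in an untracked `.env` file, rather than in the notebook, keeps it out of any shared copy of the notebook.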

In [6]:
import openai
from dotenv import load_dotenv
import os

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY") # You have to get your own OpenAI API Key

print("This is to generate this cartoon:", max_confidence)

response = openai.Image.create(
  prompt="draw a cute, delightful, colorful, and single cartoon character of " + max_confidence, # Modify the prompt based on your intention
  n=1,
  size=CARTOON_SIZE
)

image_url = response['data'][0]['url']

print("Generated Cartoon URL:", image_url)
Image(url=image_url, height=300)

This is to generate this cartoon: teddy bear
Generated Cartoon URL: https://oaidalleapiprodscus.blob.core.windows.net/private/org-mBE2n0PaO4ZSs7r3e8XUTDGT/user-PYAqvMYpsXC4j1DtSEUbOYRN/img-6NSqZTEa35f0ofOKRSPOxSm9.png?st=2023-09-03T17%3A07%3A26Z&se=2023-09-03T19%3A07%3A26Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-09-03T05%3A39%3A08Z&ske=2023-09-04T05%3A39%3A08Z&sks=b&skv=2021-08-06&sig=iNtwUSdc3c1HahogkxHczP34wczl5Vlpw1ms2gIU1tg%3D
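
The generated URL is a signed, time-limited link, so the image should be saved before the link expires. The sketch below is a minimal example that reads the validity window from the query string, assuming the `st` and `se` parameters carry the start and expiry times as in Azure shared access signatures. The function name `sas_validity_window` is hypothetical, and the example URL is a shortened stand-in for the one printed above, keeping only the two time parameters.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def sas_validity_window(url: str):
    """Return (start, expiry) parsed from the `st` and `se` query parameters."""
    params = parse_qs(urlparse(url).query)  # parse_qs also decodes %3A to ":"
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start = datetime.strptime(params["st"][0], fmt).replace(tzinfo=timezone.utc)
    expiry = datetime.strptime(params["se"][0], fmt).replace(tzinfo=timezone.utc)
    return start, expiry


# Shortened stand-in for the URL above (path and other parameters omitted).
example_url = (
    "https://oaidalleapiprodscus.blob.core.windows.net/private/img.png"
    "?st=2023-09-03T17%3A07%3A26Z&se=2023-09-03T19%3A07%3A26Z"
)

start, expiry = sas_validity_window(example_url)
print("Link valid for:", expiry - start)
```

For the timestamps shown in the output above, the window is two hours, which is why the printed link no longer resolves when the notebook is read later.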
