# Lesson 2: Image Segmentation

<p style="background-color:#fff6e4; padding:15px; border-width:3px; border-color:#f5ecda; border-style:solid; border-radius:6px"> ⏳ <b>Note <code>(Kernel Starting)</code>:</b> This notebook takes about 30 seconds to be ready to use. You may start and watch the video while you wait.</p>

* In this classroom, the libraries have already been installed for you.
* If you would like to run this code on your own machine, you need to install the following:
    ```
    !pip install ultralytics torch
    ```

### Load the sample image

In [None]:
from PIL import Image
raw_image = Image.open("dogs.jpg")
raw_image

>Note: the images referenced in this notebook have already been uploaded to the Jupyter directory, in this classroom, for your convenience. For further details, please refer to the **Appendix** section located at the end of the lessons.

* Resize the image.

In [None]:
from utils import resize_image
resized_image = resize_image(raw_image, input_size=1024)
resized_image
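
`resize_image` lives in the course's `utils` file, which is not shown here. As a minimal sketch of one plausible behavior, assume it scales the longer side of the image to `input_size` while preserving the aspect ratio. The helper name `scaled_size` is hypothetical.

```python
def scaled_size(width, height, input_size=1024):
    """Return (new_width, new_height) with the longer side equal to input_size."""
    # Assumption: the longer side is scaled to input_size, keeping the aspect ratio.
    scale = input_size / max(width, height)
    return round(width * scale), round(height * scale)


# A 2048x1024 landscape image would become 1024x512 under this assumption.
print(scaled_size(2048, 1024))
```

With PIL, the resulting tuple could then be passed to `raw_image.resize(...)`.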

### Import and prepare the model

In [None]:
import torch

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

Info about [torch](https://pytorch.org/).

In [None]:
from ultralytics import YOLO
import os
model = YOLO('./FastSAM-x.pt')

Info about ['FastSAM'](https://docs.ultralytics.com/models/fast-sam/)

### Use the model

>Note: ```utils``` is an additional file containing helper functions that have already been written for you to use in this classroom.
For further details, please refer to the **Appendix** section located at the end of the lessons.

In [None]:
from utils import show_points_on_image

In [None]:
# Define the coordinates for the point in the image
# [x_axis, y_axis]
input_points = [ [350, 450 ] ]

In [None]:
input_labels = [1] # positive point

In [None]:
# Function written in the utils file
show_points_on_image(resized_image, input_points)
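
Prompt points are written as `[x_axis, y_axis]`, while image arrays are usually indexed as `[row, column]`, that is `[y, x]`. The sketch below illustrates that convention and checks that a point falls inside the image. Both helper names and the image size are hypothetical, chosen only for illustration.

```python
def validate_points(points, width, height):
    """Return True when every [x, y] point lies inside a width x height image."""
    return all(0 <= x < width and 0 <= y < height for x, y in points)


def to_row_col(point):
    """Convert an [x, y] prompt point to (row, col) array indexing."""
    x, y = point
    return y, x


input_points = [[350, 450]]
# The 1024x683 image size is an assumed example, not the size of dogs.jpg.
print(validate_points(input_points, width=1024, height=683))
print(to_row_col(input_points[0]))
```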

In [None]:
# Run the model
results = model(resized_image, device=device, retina_masks=True)

* Filter the mask based on the point defined before.

In [None]:
from utils import format_results, point_prompt

In [None]:
results = format_results(results[0], 0)

In [None]:
# Generate the masks
masks, _ = point_prompt(results, input_points, input_labels)

In [None]:
from utils import show_masks_on_image

In [None]:
# Visualize the generated masks
show_masks_on_image(resized_image, [masks])

* Define 'semantic masks' - two points to be masked.

In [None]:
# Specify two points in the same image
# [x_axis, y_axis]
input_points = [ [350, 450], [620, 450] ]

In [None]:
# Specify both points as "positive prompt"
input_labels = [1 , 1] # both positive points

In [None]:
# Visualize the points defined before
show_points_on_image(resized_image, input_points)

In [None]:
# Run the model
results = model(resized_image, device=device, retina_masks=True)

In [None]:
results = format_results(results[0], 0)

In [None]:
# Generate the masks
masks, _ = point_prompt(results, input_points, input_labels)

In [None]:
# Visualize the generated masks
show_masks_on_image(resized_image, [masks])

>Note: The results obtained from running this notebook may vary slightly from those demonstrated by the instructor in the video.

* Identify subsections of the image by adding a **negative prompt**.

In [None]:
# Define the coordinates for the points to be masked
# [x_axis, y_axis]
input_points = [ [350, 450], [400, 300]  ]

In [None]:
input_labels = [1, 0] # positive prompt, negative prompt

In [None]:
# Visualize the points defined before
show_points_on_image(resized_image, input_points, input_labels)

>Note: In the image above, the red star indicates the negative prompt and the green star indicates the positive prompt.

In [None]:
# Run the model
results = model(resized_image, device=device, retina_masks=True)

In [None]:
results = format_results(results[0], 0)

In [None]:
# Generate the masks
masks, _ = point_prompt(results, input_points, input_labels)
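
`point_prompt` comes from the course's `utils` file and its code is not shown here. The following is a simplified sketch of the idea rather than that implementation: it keeps every mask that contains a positive point and no negative point, then merges the kept masks. The function name `select_masks_by_points` and the toy masks are hypothetical.

```python
import numpy as np


def select_masks_by_points(masks, points, labels):
    """Union of masks that contain a positive point and no negative point.

    masks:  boolean array of shape (num_masks, height, width)
    points: list of [x, y] pixel coordinates
    labels: list of 1 (positive) or 0 (negative), one per point
    """
    selected = np.zeros(masks.shape[1:], dtype=bool)
    for mask in masks:
        # Arrays are indexed [row, col], so an [x, y] point is looked up as mask[y, x].
        hits = [(bool(mask[y, x]), label) for (x, y), label in zip(points, labels)]
        has_positive = any(hit and label == 1 for hit, label in hits)
        has_negative = any(hit and label == 0 for hit, label in hits)
        if has_positive and not has_negative:
            selected |= mask
    return selected


# Toy 4x4 example: a large "dog" mask and a smaller "jacket" mask inside it.
dog = np.zeros((4, 4), dtype=bool)
dog[:, :3] = True
jacket = np.zeros((4, 4), dtype=bool)
jacket[2:, :2] = True
masks = np.stack([dog, jacket])

# Positive point on the jacket, negative point on the dog outside the jacket.
result = select_masks_by_points(masks, points=[[0, 3], [2, 0]], labels=[1, 0])
print(result.astype(int))
```

In this toy setup the negative point rules out the whole-dog mask, so only the jacket mask remains.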

In [None]:
# Visualize the generated masks
show_masks_on_image(resized_image, [masks])

>Note: In the image above, only the jacket of the dog on the left was segmented, which is what the positive prompt asked for.

### Prompting with bounding boxes

In [None]:
from utils import box_prompt

In [None]:
# Set the coordinates for the box
# [xmin, ymin, xmax, ymax]
input_boxes = [530, 180, 780, 600]

In [None]:
from utils import show_boxes_on_image

In [None]:
# Visualize the bounding box defined with the coordinates above
show_boxes_on_image(resized_image, [input_boxes])

* Now, try to isolate the mask from the total output of the model.

In [None]:
from utils import box_prompt

In [None]:
results = model(resized_image, device=device, retina_masks=True)

In [None]:
#Generate the masks
masks = results[0].masks.data

In [None]:
masks

In [None]:
# Convert to True/False booleans
masks = masks > 0

In [None]:
masks

In [None]:
masks, _ = box_prompt(masks, input_boxes)
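
The code of `box_prompt` is in the course's `utils` file and is not shown here. One common way to pick a mask from a box, and the assumption behind this sketch, is to choose the mask with the highest intersection-over-union (IoU) against the box. The helper name `best_mask_for_box` and the toy masks are hypothetical.

```python
import numpy as np


def best_mask_for_box(masks, box):
    """Return (mask, index) of the mask with the highest IoU against the box.

    masks: boolean array of shape (num_masks, height, width)
    box:   [xmin, ymin, xmax, ymax] in pixels
    """
    xmin, ymin, xmax, ymax = box
    box_area = (xmax - xmin) * (ymax - ymin)
    # Pixels of each mask that fall inside the box.
    intersection = masks[:, ymin:ymax, xmin:xmax].sum(axis=(1, 2))
    mask_area = masks.sum(axis=(1, 2))
    union = box_area + mask_area - intersection
    iou = intersection / union
    best = int(np.argmax(iou))
    return masks[best], best


# Toy example: two 6x6 masks; the second one fills the box exactly.
masks = np.zeros((2, 6, 6), dtype=bool)
masks[0, :2, :2] = True
masks[1, 2:5, 1:4] = True
_, index = best_mask_for_box(masks, [1, 2, 4, 5])
print(index)
```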

In [None]:
# Visualize the masks
show_masks_on_image(resized_image, [masks])

In [None]:
# Print the segmentation mask, but in its raw format
masks

In [None]:
# To visualize, import matplotlib
from matplotlib import pyplot as plt

In [None]:
# Plot the binary mask as an image
plt.imshow(masks, cmap='gray')

### Try it yourself!
Try the image segmentation steps above with your own images.

In [None]:
# Start by opening an image. A sample image, istockphoto.jpeg,
# is already uploaded in this classroom
raw_image = Image.open('istockphoto.jpeg')
raw_image

In [None]:
# Resize image


In [None]:
# Define the coordinates for the point: [x_axis, y_axis]


In [None]:
# Define the positive or negative prompt


In [None]:
# show_points_on_image(resized_image, input_points)

### Additional Resources

* For more on how to use Comet for experiment tracking, check out this [Quickstart Guide](https://colab.research.google.com/drive/1jj9BgsFApkqnpPMLCHSDH-5MoL_bjvYq?usp=sharing) and the [Comet Docs](https://www.comet.com/docs/v2/).
* This course was based off a set of two blog articles from Comet. Explore them here for more on how to use newer versions of Stable Diffusion in this pipeline, additional tricks to improve your inpainting results, and a breakdown of the pipeline architecture:
  * [SAM + Stable Diffusion for Text-to-Image Inpainting](https://www.comet.com/site/blog/sam-stable-diffusion-for-text-to-image-inpainting/)
  * [Image Inpainting for SDXL 1.0 Base Model + Refiner](https://www.comet.com/site/blog/image-inpainting-for-sdxl-1-0-base-refiner/)


# Lesson 3: Object Detection

<p style="background-color:#fff6e4; padding:15px; border-width:3px; border-color:#f5ecda; border-style:solid; border-radius:6px"> ⏳ <b>Note <code>(Kernel Starting)</code>:</b> This notebook takes about 30 seconds to be ready to use. You may start and watch the video while you wait.</p>


* In this classroom, the libraries have already been installed for you.
* If you would like to run this code on your own machine, you need to install the following:
    ```
    !pip install comet_ml --quiet
    !pip install transformers
    !pip install ultralytics torch
    ```

### Set up Comet

In [None]:
import comet_ml

Info about ['Comet'](https://www.comet.com/site/)

In [None]:
comet_ml.init(anonymous=True, project_name="3: OWL-ViT + SAM")

In [None]:
exp = comet_ml.Experiment()

### Load the image

In [None]:
# To display the image
from PIL import Image

In [None]:
logged_artifact = exp.get_artifact("L3-data", "anmorgan24")

>Note: the images referenced in this notebook have already been uploaded to the Jupyter directory, in this classroom, for your convenience. For further details, please refer to the **Appendix** section located at the end of the lessons.

In [None]:
local_artifact = logged_artifact.download("./")

In [None]:
# Display the images
raw_image = Image.open("L3_data/dogs.jpg")
raw_image

### Get bounding boxes with OWL-ViT object detection model

>Note: the `transformers` library, which provides `pipeline`, is already installed for you in this classroom.

In [None]:
from transformers import pipeline

In [None]:
OWL_checkpoint = "google/owlvit-base-patch32"

Info about ['google/owlvit-base-patch32'](https://huggingface.co/google/owlvit-base-patch32)

* Build the pipeline for the detector model.

In [None]:
# Load the model
detector = pipeline(
    model= OWL_checkpoint,
    task="zero-shot-object-detection"
)

In [None]:
# What you want to identify in the image
text_prompt = "dog"

In [None]:
output = detector(
    raw_image,
    candidate_labels = [text_prompt]
)

In [None]:
# Print the output to identify the bounding boxes detected
output

* Use the **utils** functions to draw the boxes on top of the image.

>Note: ```utils``` is an additional file containing helper functions that have already been written for you to use in this classroom.
For further details, please refer to the **Appendix** section located at the end of the lessons.

In [None]:
from utils import preprocess_outputs

In [None]:
input_scores, input_labels, input_boxes = preprocess_outputs(output)
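
The zero-shot object detection pipeline returns a list of dictionaries, each with a `score`, a `label`, and a `box` holding `xmin`, `ymin`, `xmax`, and `ymax`. `preprocess_outputs` is defined in the course's `utils` file, so the sketch below is only a guess at what such a helper might do. It assumes the boxes are wrapped in an outer list, which would be consistent with the `input_boxes[0]` indexing used in this notebook. The function name and the detections are made up.

```python
def preprocess_outputs_sketch(output):
    """Split detector output into parallel lists of scores, labels, and boxes."""
    scores = [detection["score"] for detection in output]
    labels = [detection["label"] for detection in output]
    boxes = [
        [box["xmin"], box["ymin"], box["xmax"], box["ymax"]]
        for box in (detection["box"] for detection in output)
    ]
    # Assumption: boxes are wrapped in an outer list, so result[0] holds all boxes.
    return scores, labels, [boxes]


# Made-up detections in the pipeline's output format (values are illustrative only).
example_output = [
    {"score": 0.61, "label": "dog", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.58, "label": "dog", "box": {"xmin": 150, "ymin": 30, "xmax": 260, "ymax": 210}},
]
scores, labels, boxes = preprocess_outputs_sketch(example_output)
print(boxes[0])
```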

In [None]:
from utils import show_boxes_and_labels_on_image

In [None]:
# Show the image with the bounding boxes
show_boxes_and_labels_on_image(
    raw_image,
    input_boxes[0],
    input_labels,
    input_scores
)

### Get segmentation masks using Mobile SAM

In [None]:
# Load the SAM model from the imported ultralytics library
from ultralytics import SAM

In [None]:
SAM_version = "mobile_sam.pt"

Info about [mobile_sam.pt](https://docs.ultralytics.com/models/mobile-sam/)

In [None]:
model = SAM(SAM_version)

* Generate an array using numpy.

In [None]:
import numpy as np

In [None]:
labels = np.repeat(1, len(output))

In [None]:
# Print the labels: one positive label (1) per detected bounding box
labels

In [None]:
result = model.predict(
    raw_image,
    bboxes=input_boxes[0],
    labels=labels
)

In [None]:
result

In [None]:
masks = result[0].masks.data
masks

In [None]:
from utils import show_masks_on_image

In [None]:
# Visualize the masks
show_masks_on_image(
    raw_image,
    masks
)

>Note: The results obtained from running this notebook may vary slightly from those demonstrated by the instructor in the video.

### Image Editing: blur out faces

* Load the image.

In [None]:
from PIL import Image

In [None]:
image_path = "L3_data/people.jpg"

>Note: the images referenced in this notebook have already been uploaded to the Jupyter directory, in this classroom, for your convenience. For further details, please refer to the **Appendix** section located at the end of the lessons.

In [None]:
raw_image = Image.open(image_path)
raw_image

In [None]:
raw_image.size

* Resize the image.

In [None]:
# Width transformation
mywidth = 600
wpercent = mywidth / float(raw_image.size[0])
wpercent

In [None]:
# Height transformation
hsize = int( float(raw_image.size[1]) * wpercent )
hsize

In [None]:
# Resize
raw_image = raw_image.resize([mywidth, hsize])
raw_image

In [None]:
raw_image.size

In [None]:
# Save the resized image
image_path_resized = "people_resized.jpg"
raw_image.save(image_path_resized)

### Detect faces

In [None]:
candidate_labels = ["human face"]

In [None]:
# Define a new Comet experiment for this new pipeline
exp = comet_ml.Experiment()

In [None]:
# Log raw image to the experiment
_ = exp.log_image(
    raw_image,
    name = "Raw image"
)

* Create bounding boxes with OWL-ViT.

In [None]:
# Apply detector model to the raw image
output = detector(
    raw_image,
    candidate_labels=candidate_labels
)

In [None]:
input_scores, input_labels, input_boxes = preprocess_outputs(output)

In [None]:
# Print values of the bounding boxes identified
input_boxes

#### Log the images and bounding boxes.

In [None]:
metadata = {
    "OWL prompt": candidate_labels,
    "SAM version": SAM_version,
    "OWL version": OWL_checkpoint
}

In [None]:
from utils import make_bbox_annots

In [None]:
annotations = make_bbox_annots(
    input_scores,
    input_labels,
    input_boxes,
    metadata
)

In [None]:
_ = exp.log_image(
    raw_image,
    annotations= annotations,
    metadata=metadata,
    name= "OWL output"
)

### Segmentation masks using SAM

In [None]:
result = model.predict(
    image_path_resized,
    bboxes=input_boxes[0],
    labels=np.repeat(1, len(input_boxes[0]))
)

### Blur entire image first

In [None]:
from PIL.ImageFilter import GaussianBlur

In [None]:
blurred_img = raw_image.filter(GaussianBlur(radius=5))

In [None]:
blurred_img 

In [None]:
masks = result[0].masks.data.cpu().numpy()

In [None]:
# Obtain only a single mask
total_mask = np.zeros(masks[0].shape)

In [None]:
for mask in masks:
    total_mask = np.add(total_mask,mask)

In [None]:
output = np.where(
    np.expand_dims(total_mask != 0, axis=2),
    blurred_img,
    raw_image
)

In [None]:
import matplotlib.pyplot as plt

In [None]:
# Display the image with the faces blurred
plt.imshow(output)
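
The compositing step above can be reproduced on tiny arrays to see how the mask union and the broadcasting in `np.where` behave. This is a toy sketch with made-up pixel values. It uses `np.any` as an alternative to the explicit loop, which marks the same set of pixels.

```python
import numpy as np

# Two toy 2x2 masks (1 = face pixel) standing in for the SAM output.
masks = np.array([
    [[1, 0], [0, 0]],
    [[0, 0], [0, 1]],
], dtype=np.float32)

# Union of all masks: True wherever any mask is non-zero.
total_mask = np.any(masks != 0, axis=0)

# Toy RGB images: the "raw" image is all 200, the "blurred" one is all 50.
raw = np.full((2, 2, 3), 200, dtype=np.uint8)
blurred = np.full((2, 2, 3), 50, dtype=np.uint8)

# expand_dims turns the (H, W) mask into (H, W, 1) so it broadcasts over RGB.
composite = np.where(np.expand_dims(total_mask, axis=2), blurred, raw)
print(composite[:, :, 0])
```

Pixels covered by any mask take the blurred value, and all other pixels keep the raw value.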

* Log this image in the **Comet** platform.

In [None]:
metadata = {
    "OWL prompt": candidate_labels,
    "SAM version": SAM_version,
    "OWL version": OWL_checkpoint
}

In [None]:
_ = exp.log_image(
    output,
    name="Blurred masks",
    metadata = metadata,
    annotations=None
)

### Blur just faces of those not wearing sunglasses

In [None]:
# New label
candidate_labels = ["a person without sunglasses"]

* Re-run the pipeline.

In [None]:
exp = comet_ml.Experiment()

In [None]:
_ = exp.log_image(raw_image, name="Raw image")

In [None]:
output = detector(raw_image, candidate_labels=candidate_labels)

In [None]:
input_scores, input_labels, input_boxes = preprocess_outputs(output)

In [None]:
# Print the bounding boxes
input_boxes

* Explore in the **Comet** platform what is happening.

In [None]:
from utils import make_bbox_annots

In [None]:
metadata = {
    "OWL prompt": candidate_labels,
    "SAM version": SAM_version,
    "OWL version": OWL_checkpoint,
}

In [None]:
annotations = make_bbox_annots(
    input_scores,
    input_labels,
    input_boxes,
    metadata
)

In [None]:
_ = exp.log_image(
    raw_image,
    annotations=annotations,
    metadata=metadata,
    name="OWL output no sunglasses"
)

In [None]:
result = model.predict(
    image_path_resized,
    bboxes=input_boxes[0],
    labels=np.repeat(1, len(input_boxes[0]))
)

In [None]:
from PIL.ImageFilter import GaussianBlur
blurred_img = raw_image.filter(GaussianBlur(radius=5))

In [None]:
masks = result[0].masks.data.cpu().numpy()

total_mask = np.zeros(masks[0].shape)
for mask in masks:
    total_mask = np.add(total_mask, mask)

In [None]:
# Print the result
output = np.where(
    np.expand_dims(total_mask != 0, axis=2),
    blurred_img,
    raw_image
)
plt.imshow(output)

* Analyze results in the **Comet** platform.

In [None]:
metadata = {
    "OWL prompt": candidate_labels,
    "SAM version": SAM_version,
    "OWL version": OWL_checkpoint,
}

In [None]:
_ = exp.log_image(
    output,
    name="Blurred masks no sunglasses",
    metadata=metadata,
    annotations=None
)

### Try it yourself!
Try the image editing steps above with the following images.

>Note: the images referenced in this notebook have already been uploaded to the Jupyter directory, in this classroom, for your convenience. For further details, please refer to the **Appendix** section located at the end of the lessons.

In [None]:
cafe_img = Image.open("L3_data/cafe.jpg")
cafe_img

In [None]:
crosswalk_img = Image.open("L3_data/crosswalk.jpg")
crosswalk_img

In [None]:
metro_img = Image.open("L3_data/metro.jpg")
metro_img

In [None]:
friends_img = Image.open("L3_data/friends.jpg")
friends_img

### Additional Resources

* For more on how to use Comet for experiment tracking, check out this [Quickstart Guide](https://colab.research.google.com/drive/1jj9BgsFApkqnpPMLCHSDH-5MoL_bjvYq?usp=sharing) and the [Comet Docs](https://www.comet.com/docs/v2/).
* This course was based off a set of two blog articles from Comet. Explore them here for more on how to use newer versions of Stable Diffusion in this pipeline, additional tricks to improve your inpainting results, and a breakdown of the pipeline architecture:
  * [SAM + Stable Diffusion for Text-to-Image Inpainting](https://www.comet.com/site/blog/sam-stable-diffusion-for-text-to-image-inpainting/)
  * [Image Inpainting for SDXL 1.0 Base Model + Refiner](https://www.comet.com/site/blog/image-inpainting-for-sdxl-1-0-base-refiner/)

# Lesson 4: Image Generation

### Set up Comet

In [None]:
import comet_ml

In [None]:
comet_ml.init(anonymous=True, project_name="4: Diffusion Prompting")

In [None]:
# Create the Comet Experiment for logging
exp = comet_ml.Experiment()

logged_artifact = exp.get_artifact("L4-data", "anmorgan24")
local_artifact = logged_artifact.download("./")

### Load images

In [None]:
from PIL import Image

In [None]:
image = Image.open("L4_data/boy-with-kitten.jpg").resize((256, 256))
image_mask = Image.open("L4_data/cat_binary_mask.png").resize((256, 256))

>Note: the images referenced in this notebook have already been uploaded to the Jupyter directory, in this classroom, for your convenience. For further details, please refer to the **Appendix** section located at the end of the lessons.

In [None]:
import matplotlib.pyplot as plt

In [None]:
# Print the image
plt.imshow(image)

In [None]:
# Print the mask
plt.imshow(image_mask)

### Import and prepare the model

#### Import [torch](https://pytorch.org/).

In [None]:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

* Initialize the Stable Diffusion inpainting pipeline.
  - Note: to learn more about the `float16` and `bfloat16` data types and when to use each, see the short course ["Quantization Fundamentals", lesson "Data Types and Sizes"](https://learn.deeplearning.ai/courses/quantization-fundamentals/lesson/3/data-types-and-sizes).

In [None]:
from diffusers import StableDiffusionInpaintPipeline
sd_pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.bfloat16,
    low_cpu_mem_usage=False if torch.cuda.is_available() else True,
).to(device)

In [None]:
# Set the value of seed manually for reproducibility of the results
seed = 66733
generator = torch.Generator(device).manual_seed(seed)

In [None]:
prompt = "a realistic green dragon"

### Note
- Starting from this point, the code that generates the images will take too long to run in the given classroom environment.
  - So the code is left as markdown in the classroom.
- In the classroom, and regardless of whether you have access to GPUs, you can still run the code that retrieves these image generation results using the experiment tracking tool.
> - Thank you for your understanding as we try to make these courses free and accessible to as many people as possible. 💕 💫

- **Hardware requirements:** 
To generate the images in this lab, a machine with at least 8 GB of CPU memory (RAM) should suffice, and you can expect results within approximately 3 minutes for 3 inference steps. With 10 steps, a CPU may take about 10 minutes, and with 100 steps considerably longer.
Alternatively, consider using a GPU for faster processing. With a GPU, whether a local one or one from a platform such as [Colab GPU](https://colab.research.google.com/), all three steps can be completed in under 1 second.

#### Define new **Comet** experiment.
- The following image generation code will take hours without a GPU.
- Its results are saved with an experiment tracking tool (Comet), so that you can retrieve them in this classroom environment (and on any computer, regardless of GPU access).

```Python
exp = comet_ml.Experiment()

output = sd_pipe(
  image=image,
  mask_image=image_mask,
  prompt=prompt,
  generator=generator,
  num_inference_steps=3,
)
```


```Python
generated_image = output.images[0]

exp.log_image(
    generated_image,
    name=f"{prompt}",
    metadata={
        "prompt": prompt,
        "seed": seed,
        "num_inference_steps": 3
    }
)

exp.end()
```

#### Retrieve the experiment results
- Regardless of the environment that you are running in, you can retrieve the results of the experiment using the experiment tracking tool (Comet).

In [None]:
import io

reference_experiment = comet_ml.APIExperiment(
    workspace="ckaiser",
    project_name="4-diffusion-prompting",
    previous_experiment="b1b9e80bb0054b52a8914beec97d36a6"
)

reference_image = reference_experiment.get_asset_by_name(f"{prompt}")

- Display the `reference_image`.

In [None]:
plt.imshow(Image.open(io.BytesIO(reference_image)))

### Note: 
- We'll now explore different hyperparameters.
- As before, running the image generation code would take hours in the classroom environment (or in any environment without GPUs).

* Set up a different 'number of inference steps'.


```Python
exp = comet_ml.Experiment()

prompt = "a realistic green dragon"

exp.log_parameters({
    "seed": seed,
    "num_inference_steps": 100
})
```

```Python
output = sd_pipe(
  image=image,
  mask_image=image_mask,
  prompt=prompt,
  generator=generator,
  num_inference_steps=100,
)

generated_image = output.images[0]

exp.log_image(
    generated_image,
    name=f"{prompt}",
    metadata={
        "prompt": prompt,
        "seed": seed,
        "num_inference_steps": 100
    }
)

exp.end()
```

#### Retrieve the experiment results
- In the classroom or in any environment, you can retrieve the results of the image generation run by accessing the logs.

In [None]:
reference_experiment = comet_ml.APIExperiment(
    workspace="ckaiser",
    project_name="4-diffusion-prompting",
    previous_experiment="948c8e6cfd23420c86a0de5f65719955"
)

reference_image = reference_experiment.get_asset_by_name(f"{prompt}")

In [None]:
plt.imshow(Image.open(io.BytesIO(reference_image)))

#### Set up the different 'Guidance Scale' values.


- This code is best run on a GPU. It's left as markdown in the classroom.

```Python
import numpy as np
guidance_scale_values = [x for x in np.arange(0, 21, 10)]
```

```Python
exp = comet_ml.Experiment()

prompt = "a realistic green dragon"

num_inference_steps = 100 #if torch.cuda.is_available() else 10

exp.log_parameters({
    "seed": seed,
})
```

- Pass the guidance_scale to this pipeline

```Python
for guidance_scale in guidance_scale_values:

    output = sd_pipe(
      image=image,
      mask_image=image_mask,
      prompt=prompt,
      generator=generator,
      num_inference_steps=num_inference_steps,
      guidance_scale=guidance_scale
    )

    generated_image = output.images[0]

    exp.log_image(
        generated_image,
        name=f"{prompt}",
        metadata={
            "prompt": prompt,
            "seed": seed,
            "num_inference_steps": num_inference_steps,
            "guidance_scale": guidance_scale
        }
    )

exp.end()
```
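
The guidance scale is commonly applied through classifier-free guidance, where the model's prediction without the prompt is pushed toward its prediction with the prompt. The sketch below shows that general formula on plain lists. It is not the pipeline's exact code path, and pipelines may special-case small values (for example, skipping guidance when the scale is at most 1). The function name and the toy predictions are hypothetical.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: move from the unconditional prediction toward the text one."""
    return [
        uncond + guidance_scale * (text - uncond)
        for uncond, text in zip(noise_uncond, noise_text)
    ]


noise_uncond = [0.0, 1.0]   # toy prediction without the prompt
noise_text = [1.0, 3.0]     # toy prediction with the prompt

for scale in [0, 10, 20]:   # same values as np.arange(0, 21, 10)
    print(scale, apply_guidance(noise_uncond, noise_text, scale))
```

Larger scales amplify the difference between the two predictions, which is why high values follow the prompt more strongly.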

#### Retrieve the experiment results
- As before, regardless of whether you have access to GPUs or not, you can retrieve the results of the image generation code from the logs.

In [None]:
reference_experiment = comet_ml.APIExperiment(
    workspace="ckaiser",
    project_name="4-diffusion-prompting",
    previous_experiment="b34b94f94c594802b7090b6f2f1224f2"
)

reference_experiment.display(tab="images")

#### Set up another hyperparameter: 'strength'.

- Add the strength hyperparameter
- This code is best run on a GPU. It's left as markdown in the classroom.

```Python
strength_values = [x for x in np.arange(0.1, 1.1, 0.2)]
```

```Python
exp = comet_ml.Experiment()

prompt = "a realistic green dragon"

num_inference_steps = 200 if torch.cuda.is_available() else 10

exp.log_parameters({
    "seed": seed,
})

```

```Python
for strength in strength_values:

    output = sd_pipe(
      image=image,
      mask_image=image_mask,
      prompt=prompt,
      generator=generator,
      num_inference_steps=num_inference_steps,
      strength=strength
    )

    generated_image = output.images[0]

    exp.log_image(
        generated_image,
        name=f"{prompt}",
        metadata={
            "prompt": prompt,
            "seed": seed,
            "num_inference_steps": num_inference_steps,
            "strength": strength
        }
    )

exp.end()
```
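
As a rough guide to what `strength` changes, the sketch below assumes the pipeline derives the number of denoising steps it actually runs from `num_inference_steps * strength`, as image-to-image-style pipelines in `diffusers` typically do. Treat the mapping as an assumption and check the pipeline documentation for the exact behavior. The helper name is hypothetical.

```python
def effective_steps(num_inference_steps, strength):
    """Approximate number of denoising steps actually run for a given strength."""
    # Assumption: steps scale linearly with strength and are capped at the requested count.
    return min(int(num_inference_steps * strength), num_inference_steps)


for strength in [0.1, 0.5, 1.0]:
    print(strength, effective_steps(200, strength))
```

Under this assumption, a low strength adds little noise and runs few steps, so the masked region stays close to the original image.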

#### Retrieve the experiment results
- With experiment tracking, you can compare the most recent run with the earlier ones.

In [None]:
reference_experiment = comet_ml.APIExperiment(
    workspace="ckaiser",
    project_name="4-diffusion-prompting",
    previous_experiment="2964615a382d46f09c3a36c50c74deef"
)

reference_experiment.display(tab="images")

### Try adding a Negative Prompt.
- If you set the negative prompt to "cartoon", you are asking the image generation model not to generate an image that looks like a cartoon.
- Again, the image generation code is best run on a GPU.

```Python
exp = comet_ml.Experiment()

prompt = "a realistic green dragon"
negative_prompt = "cartoon"

num_inference_steps = 100 if torch.cuda.is_available() else 10

exp.log_parameters({
    "seed": seed,
})

```

```Python
output = sd_pipe(
  image=image,
  mask_image=image_mask,
  prompt=prompt,
  negative_prompt=negative_prompt,
  generator=generator,
  num_inference_steps=num_inference_steps,
  guidance_scale=10
)

generated_image = output.images[0]

exp.log_image(
    generated_image,
    name=f"{prompt}",
    metadata={
        "prompt": prompt,
        "seed": seed,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": 10
    }
)

exp.end()
```

#### Retrieve the experiment results

In [None]:
reference_experiment = comet_ml.APIExperiment(
    workspace="ckaiser",
    project_name="4-diffusion-prompting",
    previous_experiment="f05b04ac203a4f9aa606ea6cf9417fa3"
)

reference_image = reference_experiment.get_asset_by_name(f"{prompt}")

plt.imshow(Image.open(io.BytesIO(reference_image)))


### Additional Resources
* For more on how to use Comet for experiment tracking, check out this [Quickstart Guide](https://colab.research.google.com/drive/1jj9BgsFApkqnpPMLCHSDH-5MoL_bjvYq?usp=sharing) and the [Comet Docs](https://www.comet.com/docs/v2/).
* This course was based off a set of two blog articles from Comet. Explore them here for more on how to use newer versions of Stable Diffusion in this pipeline, additional tricks to improve your inpainting results, and a breakdown of the pipeline architecture:
  * [SAM + Stable Diffusion for Text-to-Image Inpainting](https://www.comet.com/site/blog/sam-stable-diffusion-for-text-to-image-inpainting/)
  * [Image Inpainting for SDXL 1.0 Base Model + Refiner](https://www.comet.com/site/blog/image-inpainting-for-sdxl-1-0-base-refiner/)

# Lesson 5: Fine-Tuning

<p style="background-color:#fff6e4; padding:15px; border-width:3px; border-color:#f5ecda; border-style:solid; border-radius:6px"> ⏳ <b>Note <code>(Kernel Starting)</code>:</b> This notebook takes about 30 seconds to be ready to use. You may start and watch the video while you wait.</p>

* In this classroom, the libraries have already been installed for you.
* If you would like to run this code on your own machine, you need to install the following:
    ```
    !pip install -q accelerate torch diffusers transformers comet_ml
    ```

### Set up Comet

* Here you will use the [HuggingFace DreamBooth](https://huggingface.co/docs/diffusers/en/training/dreambooth) training.

In [None]:
import comet_ml

In [None]:
comet_ml.init(anonymous=True)

### Import and prepare the model

In [None]:
import torch

if torch.cuda.is_available():
    model_name = 'stabilityai/stable-diffusion-xl-base-1.0'
else:
    model_name = './models/runwayml/stable-diffusion-v1-5'

In [None]:
# Define hyperparameters

hyperparameters = {
    "instance_prompt": "a photo of a [V] man",
    "class_prompt": "a photo of a man",
    "seed": 4329,
    "pretrained_model_name_or_path": model_name,
    "resolution": 1024 if torch.cuda.is_available() else 512,
    "num_inference_steps": 50,
    "guidance_scale": 5.0,
    "num_class_images": 200,
    "prior_loss_weight": 1.0
}

* Set new **Comet** experiment

In [None]:
experiment = comet_ml.Experiment()

### Load images

In [None]:
from utils import DreamBoothTrainer

In [None]:
trainer = DreamBoothTrainer(hyperparameters)

#### Note
- The code that generates images requires a GPU to run.
- The code is left here in markdown, but if you have access to GPUs outside of the classroom, you can run it there.
- In the classroom, you'll still be able to follow along by retrieving the generated images from the experiment tracking tool (Comet).

```Python
# To run the training pipeline
trainer.generate_class_images()
```

```Python
# To see the source code of generate_class_images (IPython syntax)
??trainer.generate_class_images
```

#### Get class images (using artifacts).

In [None]:
import shutil

In [None]:
# Get images
class_artifact = experiment.get_artifact('ckaiser/class-images-15')
class_artifact.download('./')

In [None]:
shutil.unpack_archive('./class.zip', './class')

>Note: the images referenced in this notebook have already been uploaded to the Jupyter directory, in this classroom, for your convenience. For further details, please refer to the **Appendix** section located at the end of the lessons.

In [None]:
# Print some images
trainer.display_images("class")

* Get the instance dataset (images of Andrew)

In [None]:
andrew_artifact = experiment.get_artifact('ckaiser/andrew-dataset')
andrew_artifact.download('./')

shutil.unpack_archive('./andrew-dataset.zip', './instance')

In [None]:
# Print some images
trainer.display_images("instance")

### Initialize the model
- It will take some time (several minutes) to initialize the model.

In [None]:
tokenizer, text_encoder, vae, unet = trainer.initialize_models()

> Note: see the video lesson for the LoRA explanation.

In [None]:
# Add noise to generate images in Stable Diffusion
from diffusers import DDPMScheduler

noise_scheduler = DDPMScheduler.from_pretrained(
    trainer.hyperparameters.pretrained_model_name_or_path,
    subfolder="scheduler"
)
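
The noise scheduler adds noise to the training latents according to a schedule. A minimal sketch of the standard DDPM forward step follows, assuming a linear beta schedule with illustrative values. The scheduler loaded above takes its real schedule from the checkpoint's configuration, so the numbers here are assumptions, and both function names are hypothetical.

```python
import math


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed values)."""
    alpha_bars, running = [], 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, noise, alpha_bar_t):
    """x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise, element-wise."""
    signal, sigma = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [signal * x + sigma * n for x, n in zip(x0, noise)]


alpha_bars = alpha_bar_schedule()
x0, noise = [1.0, -1.0], [0.5, 0.5]
print(add_noise(x0, noise, alpha_bars[0]))    # early timestep: close to x0
print(add_noise(x0, noise, alpha_bars[-1]))   # last timestep: close to the noise
```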

In [None]:
unet = trainer.initialize_lora(unet)
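
`trainer.initialize_lora` is part of the course's `utils` file. As a rough illustration of why LoRA is cheap to train, the sketch below assumes the standard low-rank formulation, in which a frozen weight matrix of shape `d_out x d_in` is adapted by two small matrices of rank `r`. The layer size and rank are made-up numbers, not the values used by the trainer.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full_params, lora_params) for one linear layer."""
    full_params = d_in * d_out
    # LoRA trains A (rank x d_in) and B (d_out x rank) instead of the full matrix.
    lora_params = rank * d_in + d_out * rank
    return full_params, lora_params


full, lora = lora_param_counts(d_in=1280, d_out=1280, rank=4)
print(full, lora, f"{lora / full:.2%}")
```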

In [None]:
optimizer, params_to_optimize = trainer.initialize_optimizer(unet)

In [None]:
# Initialize the datasets
train_dataset, train_dataloader = trainer.prepare_dataset(tokenizer, text_encoder)
lr_scheduler = trainer.initialize_scheduler(train_dataloader, optimizer)

In [None]:
unet, optimizer, train_dataloader, lr_scheduler = trainer.accelerator.prepare(
    unet, optimizer, train_dataloader, lr_scheduler)

In [None]:
total_batch_size = \
    trainer.hyperparameters.train_batch_size * \
    trainer.hyperparameters.gradient_accumulation_steps
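
`total_batch_size` is the number of samples that contribute to each optimizer update. The arithmetic can be sketched with hypothetical values; the real ones come from `trainer.hyperparameters` and `train_dataloader`.

```python
import math

# Hypothetical values; the real ones come from trainer.hyperparameters.
train_batch_size = 2
gradient_accumulation_steps = 4
num_batches_per_epoch = 100     # stand-in for len(train_dataloader)
max_train_steps = 400

# Samples that contribute to each optimizer update.
total_batch_size = train_batch_size * gradient_accumulation_steps

# Optimizer updates per epoch, and epochs needed to reach max_train_steps.
updates_per_epoch = math.ceil(num_batches_per_epoch / gradient_accumulation_steps)
num_train_epochs = math.ceil(max_train_steps / updates_per_epoch)

print(total_batch_size, updates_per_epoch, num_train_epochs)
```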

#### Note
- Starting from this point, the code demonstrated by the instructor will not execute in this notebook due to computational resource constraints. However, we provide the code here for you to run if you have access to a GPU or similar resources.
- Thank you for your understanding as we work to provide free and accessible courses.

```Python
import torch.nn.functional as F  # provides F.mse_loss for the training loop
from tqdm import tqdm

global_step = 0
epoch = 0

progress_bar = tqdm(
    range(0, trainer.hyperparameters.max_train_steps),
    desc="Steps"
)
```

```Python
for epoch in range(0, trainer.hyperparameters.num_train_epochs):
    unet.train()

    for step, batch in enumerate(train_dataloader):
        with trainer.accelerator.accumulate(unet):
            pixel_values = batch["pixel_values"].to(dtype=vae.dtype)
            model_input = vae.encode(pixel_values).latent_dist.sample()
            model_input = model_input * vae.config.scaling_factor

            noise = torch.randn_like(model_input)
            bsz, channels, height, width = model_input.shape

            timesteps = torch.randint(
                0,
                noise_scheduler.config.num_train_timesteps,
                (bsz,),
                device=model_input.device
            )

            timesteps = timesteps.long()
            noisy_model_input = noise_scheduler.add_noise(
                model_input,
                noise,
                timesteps
            )

            encoder_hidden_states = batch["input_ids"]

            model_pred = unet(
                noisy_model_input,
                timesteps,
                encoder_hidden_states,
                return_dict=False,
            )[0]

            target = noise

            model_pred, model_pred_prior = torch.chunk(model_pred, 2, dim=0)
            target, target_prior = torch.chunk(target, 2, dim=0)

            instance_loss = \
                F.mse_loss(
                    model_pred.float(),
                    target.float(),
                    reduction="mean"
                )
            
            prior_loss = \
                F.mse_loss(
                    model_pred_prior.float(),
                    target_prior.float(),
                    reduction="mean"
                )
            
            loss = \
                instance_loss + \
                trainer.hyperparameters.prior_loss_weight * \
                prior_loss
            
            trainer.accelerator.backward(loss)
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
            global_step += 1

        loss_metrics = {
            "loss": loss.detach().item(),
            "prior_loss": prior_loss.detach().item(),
            "lr": lr_scheduler.get_last_lr()[0],
        }

        experiment.log_metrics(loss_metrics, step=global_step)

        progress_bar.set_postfix(**loss_metrics)
        progress_bar.update(1)


        if global_step >= trainer.hyperparameters.max_train_steps:
            break

    trainer.save_lora_weights(unet)
experiment.add_tag("dreambooth-training")
experiment.log_parameters(trainer.hyperparameters)
trainer.accelerator.end_training()
```
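
The loss in the loop above combines two terms: an instance loss on the first half of the batch and a prior loss on the second half, weighted by `prior_loss_weight`. The plain-Python sketch below mirrors that arithmetic on small lists so it can be checked by hand. It assumes the batch holds the instance examples first and the class (prior) examples second, which is what splitting with `torch.chunk(..., 2, dim=0)` implies. The helper names and the numbers are made up for illustration.

```python
def mse(pred, target):
    """Mean squared error over two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(pred, target, prior_loss_weight):
    """Split the batch in half and combine instance and prior losses.

    Hypothetical helper: the first half is treated as instance examples,
    the second half as class (prior) examples.
    """
    half = len(pred) // 2
    instance_loss = mse(pred[:half], target[:half])
    prior_loss = mse(pred[half:], target[half:])
    total_loss = instance_loss + prior_loss_weight * prior_loss
    return total_loss, instance_loss, prior_loss


# Toy "predictions" and "targets": two instance values followed by two prior values.
pred = [1.0, 2.0, 0.0, 4.0]
target = [1.0, 0.0, 0.0, 2.0]

total_loss, instance_loss, prior_loss = prior_preservation_loss(
    pred, target, prior_loss_weight=0.5
)
print(total_loss, instance_loss, prior_loss)
```

A larger `prior_loss_weight` pushes the model to keep its general notion of the class, while a smaller one lets it focus on the specific subject.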

#### Retrieve the training results
- You can get the training results using the experiment tracking tool, Comet.

In [None]:
training_experiment = \
    comet_ml.APIExperiment(
        previous_experiment="d92519b1f657497e8569a2c8e989b457"
    )


In [None]:
# See the experiment
training_experiment.display()


* Prompts to generate images of Andrew.

In [None]:
prompts = [
    "a photo of a [V] man playing basketball",
    "a photo of a [V] man riding a horse",
    "a photo of a [V] man at the summit of a mountain",
    "a photo of a [V] man driving a convertible",
    "a photo of a [V] man riding a skateboard on a huge halfpipe",
    "a mural of a [V] man, painted by graffiti artists"
]

validation_prompts = [
    "a photo of a man playing basketball",
    "a photo of a man riding a horse",
    "a photo of a man at the summit of a mountain",
    "a photo of a man driving a convertible",
    "a photo of a man riding a skateboard on a huge halfpipe",
    "a mural of a man, painted by graffiti artists"
]
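
The two lists above differ only in the `[V]` identifier: the first set targets the fine-tuned subject, and the second gives a baseline for the generic class. As a small sketch, the validation prompts could also be derived from the identifier prompts instead of being typed twice. The `strip_identifier` helper and the variable names here are hypothetical and are not part of the course `utils`.

```python
IDENTIFIER = "[V]"

prompts_with_identifier = [
    "a photo of a [V] man playing basketball",
    "a photo of a [V] man riding a horse",
    "a mural of a [V] man, painted by graffiti artists",
]


def strip_identifier(prompt, identifier=IDENTIFIER):
    """Remove the identifier token and collapse the leftover double space."""
    return " ".join(prompt.replace(identifier, " ").split())


derived_validation_prompts = [strip_identifier(p) for p in prompts_with_identifier]

for prompt in derived_validation_prompts:
    print(prompt)
```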

#### Note
- The following code requires GPUs.

```Python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.load_lora_weights("./andrew-model")

for prompt in prompts:
    with torch.no_grad():
        images = pipeline(
            prompt=prompt,
        ).images

        experiment.log_image(images[0], metadata={
            "prompt": prompt,
            "model": hyperparameters.pretrained_model_name_or_path,
        })

for prompt in validation_prompts:
    with torch.no_grad():
        images = pipeline(
            prompt=prompt,
        ).images

        experiment.log_image(images[0], metadata={
            "prompt": prompt,
            "model": hyperparameters.pretrained_model_name_or_path,
        })
```

#### Retrieve the image generation results
- You can view the results of image generation regardless of whether you have access to GPUs, using the experiment tracking tool.

In [None]:
inference_experiment = comet_ml.APIExperiment(
        previous_experiment="0eb292126ab5476ab0c863061a400bdc"
    )


In [None]:
# See the experiment
inference_experiment.display(tab="images")


### Additional Resources
* For more on how to use Comet for experiment tracking, check out this [Quickstart Guide](https://colab.research.google.com/drive/1jj9BgsFApkqnpPMLCHSDH-5MoL_bjvYq?usp=sharing) and the [Comet Docs](https://www.comet.com/docs/v2/).
* This course is based on a set of two blog articles from Comet. Explore them for more on how to use newer versions of Stable Diffusion in this pipeline, additional tricks to improve your inpainting results, and a breakdown of the pipeline architecture:
  * [SAM + Stable Diffusion for Text-to-Image Inpainting](https://www.comet.com/site/blog/sam-stable-diffusion-for-text-to-image-inpainting/)
  * [Image Inpainting for SDXL 1.0 Base Model + Refiner](https://www.comet.com/site/blog/image-inpainting-for-sdxl-1-0-base-refiner/)

## Did you like this course?

- If you liked this course, please consider giving it a rating and sharing what you liked. 💕
- If you did not like this course, please share what you think could have made it better. 🙏

#### A note about the "Course Review" page.
The rating options range from 0 to 10 and are used to calculate the "Net Promoter Score":
- A score of 9 or 10 means you like the course.💫 💕
- A score of 7 or 8 means you feel neutral about the course (neither like nor dislike). 🙄
- A score from 0 to 6 means that you do not like the course. 😭