# Tutorial 01: quickstart

Let's train a quick set of black-and-white patches to disguise a toy car from a YOLO model pretrained on MSCOCO.

In [None]:
import numpy as np
import pandas as pd
import torch
import ultralytics

In [None]:
import electricmayhem.whitebox as em

In [None]:
COCO_CLASSES = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat',
                'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench','bird', 'cat',
                'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack',
                'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
                 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
                'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
                 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair',
                'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
                'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book',
                'clock', 'vase', 'scissors', 'teddy bear', 'hair drier','toothbrush']

## create

We're going to train single-channel patches but embed them in RGB images. We'll start the pipeline with `em.PatchStacker()`, which stacks each patch into 3 channels:

In [None]:
stacker = em.PatchStacker(num_channels=3)
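
To make the stacking step concrete, here's a minimal sketch of the underlying operation, written in NumPy so it runs without a GPU. It only illustrates channel stacking; it is not `em.PatchStacker`'s actual implementation, which may differ:

```python
import numpy as np

# A batch containing one single-channel 64x64 patch: (batch, channels, height, width)
patch = np.full((1, 1, 64, 64), 0.5, dtype=np.float32)

# Repeat the single channel three times along the channel axis
stacked = np.repeat(patch, 3, axis=1)

print(stacked.shape)  # (1, 3, 64, 64)
```

With torch tensors the same result can be had from `patch.repeat(1, 3, 1, 1)`. Every channel carries identical values, so the patch stays black-and-white once it's embedded in an RGB image.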

## implant

We'll need to start with a dataset of target images in a pandas `DataFrame`. Each row will contain information on where to implant one patch in a target image (so if we're training multiple patches, one target image may take up several rows in the `DataFrame`). The dataset will have columns:

* **image:** a path to the target image. Assumes all target images have the same dimensions.
* **ulx**, **uly**, **llx**, **lly**, **urx**, **ury**, **lrx**, **lry:** pixel coordinates giving the corners of the patch within the image ("upper left x", etc)
* **patch:** name of the patch (omit this if you're only training one)
* **split:** whether this image is "train" or "eval". If you omit this column all target images will be used for train **and** eval. I'd strongly recommend against that though!
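
As a rough sketch of what a label file with these columns could look like, the snippet below writes a tiny CSV using only the standard library. The file names and pixel coordinates are made up for illustration, and the column order is an assumption; only the column names come from the list above:

```python
import csv
import io

columns = ["image", "ulx", "uly", "llx", "lly", "urx", "ury", "lrx", "lry", "patch", "split"]

# Hypothetical rows: paths and coordinates are invented for this example.
# Note that the first image appears twice, once per patch.
rows = [
    ["images/img_000.png", 100, 50, 100, 114, 164, 50, 164, 114, "hood", "train"],
    ["images/img_000.png", 200, 60, 200, 124, 264, 60, 264, 124, "roof", "train"],
    ["images/img_001.png", 110, 55, 110, 119, 174, 55, 174, 119, "hood", "eval"],
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

A file like this can be loaded with `pd.read_csv` to get the `DataFrame`.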

This particular dataset contains 4 patches that were localized using ArUco tags and painted out using Stable Diffusion. For this tutorial we'll only use the 3 patches on the car and skip the ground patch.

In [None]:
labels = pd.read_csv("data/toycar/toycar_warp_dataset.csv")
labels = labels[labels.patch != "ground"]
len(labels)

In [None]:
labels.head()

Names of the 3 patches we'll train:

In [None]:
labels.patch.unique()

The `em.WarpPatchImplanter()` class will take care of differentiably deforming and implanting patches (with kornia doing most of the heavy lifting). We need two inputs:

* the `DataFrame` of target labels
* a dictionary of patch shapes (at the point of implanting, so they'll be 3-channel); the implanter will use this to precompute transformation matrices

In [None]:
patch_shapes = {k:(3,64,64) for k in ['hood', 'roof', 'door']}
imp = em.WarpPatchImplanter(labels, patch_shapes=patch_shapes, dataset_name="toycar_warp_no_ground")
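
For intuition about what a precomputed transformation matrix might look like: four corner correspondences are enough to determine a 3x3 perspective transform. The NumPy sketch below solves for one directly. It's a generic textbook construction, not the implanter's code (kornia handles this inside the library), and the destination corners and the helper names `perspective_matrix` and `apply_perspective` are hypothetical:

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for the 3x3 matrix H that maps four src points onto four dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and similarly for v
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_perspective(H, point):
    """Map a single (x, y) point through H using homogeneous coordinates."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w

# Corners of a 64x64 patch, ordered upper-left, lower-left, upper-right, lower-right
src = [(0, 0), (0, 63), (63, 0), (63, 63)]
# Hypothetical (ulx, uly), (llx, lly), (urx, ury), (lrx, lry) from one label row
dst = [(100, 50), (95, 120), (170, 55), (165, 118)]

H = perspective_matrix(src, dst)
print(apply_perspective(H, (0, 0)))
```

Since the matrix depends only on the patch shape and the labeled corners, it can be computed once per row before training starts.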

Quick visual check:

In [None]:
imp.plot_boxes()

## compose

The main tool `electricmayhem` has so far is `em.KorniaAugmentationPipeline()`, which just wraps the `kornia.augmentation` API. Initialize it with a dictionary of image augmentations, where each value is the keyword arguments that augmentation takes.

In [None]:
aug = em.KorniaAugmentationPipeline({"ColorJiggle":{"brightness":0.2, "contrast":0.2, "hue":0.1, "saturation":0.1},
                                    "RandomAffine":{"scale":(0.9,1.1), "shear":10, "padding_mode":"reflection", "degrees":0}})

## a quick visual check

Let's put together the start of our pipeline, just using the create/implant/compose steps, and run some patches through it.

First create the pipeline:

In [None]:
pipeline_start = stacker+imp+aug
type(pipeline_start)
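
If you're curious how chaining stages with `+` can work, here's a toy sketch in plain Python. The `Stage` and `ToyPipeline` classes are hypothetical stand-ins invented for this example and say nothing about how `electricmayhem` implements it internally; they just show the operator-overloading pattern and the "return an output plus a dictionary" convention:

```python
class Stage:
    """Hypothetical stand-in for a single pipeline stage."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def __call__(self, x, **kwargs):
        # Return the transformed input plus some stage-specific information
        return self.fn(x), {self.name: "ran"}

    def __add__(self, other):
        return ToyPipeline([self]) + other


class ToyPipeline:
    """Hypothetical container that runs its stages in order."""
    def __init__(self, stages):
        self.stages = list(stages)

    def __add__(self, other):
        extra = other.stages if isinstance(other, ToyPipeline) else [other]
        return ToyPipeline(self.stages + extra)

    def __call__(self, x, **kwargs):
        info = {}
        for stage in self.stages:
            x, stage_info = stage(x, **kwargs)
            info.update(stage_info)
        return x, info


toy = Stage("double", lambda v: 2 * v) + Stage("increment", lambda v: v + 1) + Stage("square", lambda v: v * v)
output, info = toy(3)
print(output)  # 49
```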

Then initialize a dictionary of **batches** of patches (in this case batchsize=1):

In [None]:
patches = {k:0.5*torch.ones((1,1,64,64)).type(torch.float32) for k in ['hood', 'roof', 'door']}

Pass the inputs through the pipeline. It will return a tuple containing the output (a batch of implanted images) and a dictionary of additional information (not much there right now):

In [None]:
output, kwargs = pipeline_start(patches)

In [None]:
em.plot(output[0])

If we run the patches through in `evaluate` mode, the results should look similar, but will be pulled from a different set of target images:

In [None]:
output_eval, kwargs = pipeline_start(patches, evaluate=True)
em.plot(output_eval[0])

And if we run the patches through in `control` mode it should repeat the last batch **exactly**, but without the patch implanted:

In [None]:
output_control, kwargs = pipeline_start(patches, evaluate=True, control=True)
em.plot(output_control[0])

## infer

When you load a model using the `ultralytics` library, the `ultralytics.models.yolo.model.YOLO` object it returns isn't really designed for doing adversarial attacks; we want to pull one of the lower-level objects out. 

* The `YOLO` object contains a `model` attribute that we can compute gradients through
* Make sure you set it to `eval()` mode before training, to freeze the batchnorm layers and make sure outputs are in the appropriate format

The `em.YOLOWrapper()` class reformats outputs from different YOLO versions into the v5 format: a list whose first element is a tensor of shape

* `[batch_size, num_boxes, 5+num_classes]`

where the final dimension is `[x, y, w, h, objectness, score_class1, score_class2, ...]` and box coordinates are in pixels.

Some newer versions (like the `ultralytics` versions we're using here) output a different format and omit the objectness score, so the wrapper will compute it as the highest-value class score for each batch index/box index combination.
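
Here's a small NumPy sketch of that conversion. It assumes the raw newer-style output is laid out as `[batch_size, 4+num_classes, num_boxes]`; that layout is an assumption made for this example, so check it against your model. The `to_v5_format` helper is hypothetical, not the wrapper's actual code:

```python
import numpy as np

def to_v5_format(raw):
    """Convert (batch, 4+num_classes, num_boxes) into (batch, num_boxes, 5+num_classes)."""
    preds = np.transpose(raw, (0, 2, 1))        # (batch, num_boxes, 4+num_classes)
    boxes = preds[:, :, :4]                     # x, y, w, h
    class_scores = preds[:, :, 4:]              # one score per class
    # No objectness in this format, so use the highest class score for each box
    objectness = class_scores.max(axis=2, keepdims=True)
    return np.concatenate([boxes, objectness, class_scores], axis=2)

rng = np.random.default_rng(0)
raw = rng.random((2, 4 + 80, 10)).astype(np.float32)  # 2 images, 80 classes, 10 boxes
v5 = to_v5_format(raw)
print(v5.shape)  # (2, 10, 85)
```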

In [None]:
model = ultralytics.YOLO("yolov8n.pt")

We don't need the class names for training, but it'll make our tensorboard logs more interpretable:

In [None]:
yolo = em.YOLOWrapper(model.model.eval(), yolo_version=8, classnames=COCO_CLASSES)

## assemble the pipeline

Take all of the steps we built above and assemble into a `Pipeline` object:

In [None]:
pipeline = stacker+imp+aug+yolo

## Write a loss function

Loss and metrics need to be adapted fairly closely to the specific problem you're solving, so this part happens outside the `electricmayhem` API. Write a Python function that takes the pipeline outputs (a list containing a rank-3 tensor) and accepts `**kwargs` (where any extra information generated by your pipeline stages will be available).

The function should return a dictionary where each key is a metric or a term in your loss function, and each value is the **unaggregated** value (so each should be a 1D tensor of length `batch_size`).

In [None]:
def loss(output, **kwargs):
    # output[0] should be (batch, num_boxes, 5+num_classes)
    # index 4 is the objectness score; for YOLOv8 the wrapper sets it to the
    # highest class score for each box
    maxdetect_boxes = output[0][:,:,4] # (batch, num_boxes)
    # strongest detection in each image
    maxdetect = torch.max(maxdetect_boxes, 1)[0]  # (batch,)
    # let's also compute an attack success rate (ASR) at a 25% detection threshold
    asr25 = (maxdetect < 0.25).type(torch.float32)

    return {"maxdetect":maxdetect, "asr25":asr25}
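
To see what this function produces, here's a NumPy mirror of it run on a hand-built fake detector output. The `toy_loss` name and the numbers are invented for illustration; the point is that every returned value has one entry per image in the batch:

```python
import numpy as np

def toy_loss(output):
    # NumPy mirror of the loss above: objectness lives at index 4
    maxdetect = output[0][:, :, 4].max(axis=1)      # (batch,)
    asr25 = (maxdetect < 0.25).astype(np.float32)   # (batch,)
    return {"maxdetect": maxdetect, "asr25": asr25}

# Fake detector output: 2 images, 3 boxes, 5 + 80 values per box
preds = np.zeros((2, 3, 85), dtype=np.float32)
preds[0, 1, 4] = 0.9   # image 0 has one confident detection
preds[1, 2, 4] = 0.1   # image 1 has only a weak one

terms = toy_loss([preds])
print(terms["maxdetect"].shape, terms["asr25"].shape)  # (2,) (2,)
```

Image 0 still has a detection above the 25% threshold, so its `asr25` is 0; image 1 doesn't, so its `asr25` is 1.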

Pass the loss function to your pipeline along with a dictionary giving the shapes of a batch of test patches, so it can check the inputs/outputs before you start training:

In [None]:
pipeline.set_loss(loss, test_patch_shape={k:(2,1,64,64) for k in ['hood', 'roof', 'door']})

## Train the patch

We'll do this in four steps:

1. Tell the pipeline where to log diagnostics (for TensorBoard) and metrics (in MLFlow)
2. Initialize your patches
3. Copy the pipeline to your GPU
4. Train the actual patch!

For the first step, start an MLFlow server (if you don't already have one running) by running `mlflow server` from the command line.

In [None]:
pipeline.set_logging(logdir="logs/01_quickstart",
                    mlflow_uri="http://127.0.0.1:5000",
                    experiment_name="electricmayhem_tutorial_01_quickstart")

Second, explicitly tell it to initialize the patches. Alternatively, you can pass it a dictionary of pre-initialized patches.

In [None]:
pipeline.initialize_patch_params(patch_shape={k:(1,64,64) for k in ['hood', 'roof', 'door']})

Third, copy the pipeline to the GPU. All of our classes inherit from `torch.nn.Module`, so this should look familiar:

In [None]:
pipeline.cuda();

Now, train the patch. The first two arguments of `pipeline.train_patch()` are the batch size and number of training steps.

* `eval_every` and `num_eval_steps` set how many steps to run between calls to `pipeline.evaluate()` and how many eval batches to run each time.
* `optimizer` can take values `'adam'`, `'bim'` (basic iterative method/iterative FGSM), or `'mifgsm'` (momentum-iterative FGSM)
* Weights for every term in your loss function are set to zero unless specified here (so you have to specify at least one). That's the `maxdetect=1` below.
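
For readers who haven't met the two FGSM-style optimizers, here's a NumPy sketch of their update rules in generic textbook form. This is not `electricmayhem`'s implementation: the function names, the `decay` parameter, and the choice to clip patches to `[0, 1]` are assumptions made for the example. Since we're minimizing a loss, the steps go against the gradient:

```python
import numpy as np

def bim_step(patch, grad, lr):
    """Basic iterative method: fixed-size step against the sign of the gradient."""
    return np.clip(patch - lr * np.sign(grad), 0.0, 1.0)

def mifgsm_step(patch, grad, velocity, lr, decay=1.0):
    """Momentum-iterative FGSM: accumulate L1-normalized gradients, then take a sign step."""
    velocity = decay * velocity + grad / (np.abs(grad).sum() + 1e-12)
    return np.clip(patch - lr * np.sign(velocity), 0.0, 1.0), velocity

# Toy objective: squared distance between the patch and an all-black target
target = np.zeros((1, 4, 4))
patch = np.full((1, 4, 4), 0.5)
velocity = np.zeros_like(patch)

for _ in range(10):
    grad = 2 * (patch - target)   # gradient of the squared distance
    patch, velocity = mifgsm_step(patch, grad, velocity, lr=0.05)

print(patch.max())
```

Both rules use only the sign of the (accumulated) gradient, so every pixel moves by exactly `lr` per step regardless of gradient magnitude.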

In [None]:
patch = pipeline.train_patch(
    20,
    1000,
    learning_rate=0.01, 
    eval_every=10, #100,
    num_eval_steps=10,
    optimizer='adam',
    lr_decay='cosine',
    maxdetect=1.,
)

`pipeline.train_patch()` returns a dictionary containing the trained patches (though they'll all be saved as MLFlow artifacts too, just to be safe).
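
A note on the `lr_decay='cosine'` argument in the call above: a cosine schedule usually means the learning rate follows half a cosine wave from its starting value down to zero. The sketch below shows that standard formula; whether the library uses exactly this form (for example, with no warmup and a floor of zero) is an assumption:

```python
import math

def cosine_lr(step, total_steps, base_lr):
    """Cosine decay from base_lr at step 0 down to 0 at total_steps."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * step / total_steps))

# Learning rate at a few points during a 1000-step run starting at 0.01
for step in (0, 250, 500, 750, 1000):
    print(step, round(cosine_lr(step, 1000, 0.01), 5))
```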

The final evaluation results will be available in `pipeline.df` (also in MLFlow):

In [None]:
pipeline.df.head()

We can use these results to drill down into which factors in our pipeline affect patch performance:

In [None]:
em.viz.eval_result_permutation_importance(pipeline.df, "asr25_delta");