# Tutorial 07: Bayesian optimization

In this notebook we'll take a first shot at reimagining the CVPR 2024 paper *Overload: Latency Attacks on Object Detection for Edge Devices* by Chen *et al.* as a physical patch attack. "Overload" exploits the quadratic scaling of non-maximum suppression (NMS): by generating a lot of overlapping detections, it tries to slow down the postprocessing phase of inference.

We'll set up a pipeline for training a ground patch to generate false positives. But since what we really want is to increase NMS time, we'll run the pipeline iteratively and train a Gaussian process to predict the best parameter combinations for patch training.
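
The outer loop described above can be sketched in a few lines. This is a minimal illustration, not `electricmayhem`'s implementation: `run_experiment` is a hypothetical stand-in for one full patch-training run, the toy objective is made up, and plain random search takes the place of the Gaussian process, which would instead use the history of results to propose each new set of parameters.

```python
import math
import random


def run_experiment(learning_rate, weight):
    """Hypothetical stand-in for one patch-training run; returns a score to maximize."""
    # Toy objective (an assumption) that peaks at learning_rate=1e-2, weight=0.5.
    return -((math.log10(learning_rate) + 2.0) ** 2) - (weight - 0.5) ** 2


rng = random.Random(0)
history = []
for _ in range(50):
    # Random search stands in for the Gaussian process proposal step.
    params = {
        "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform over [1e-4, 1e-1]
        "weight": rng.uniform(0, 1),
    }
    history.append((run_experiment(**params), params))

# Keep the best experiment seen so far (we are maximizing the objective).
best_score, best_params = max(history, key=lambda item: item[0])
print(best_score, best_params)
```

In the real pipeline each experiment is expensive, which is the reason to spend effort on choosing the next set of parameters rather than drawing them at random.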

In [None]:
import numpy as np
import pandas as pd
import torch
import ultralytics
import time

In [None]:
import electricmayhem.whitebox as em

In [None]:
COCO_CLASSES = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat',
                'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench','bird', 'cat',
                'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack',
                'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
                 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
                'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
                 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair',
                'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
                'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book',
                'clock', 'vase', 'scissors', 'teddy bear', 'hair drier','toothbrush']

## create

I'm starting with the guess that a tiled patch would be a cheap way to create many minimally-overlapping detections: boxes that barely overlap don't suppress each other, so more of them survive for every other detection to be compared against.
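
To see why the amount of overlap matters, here is a minimal pure-Python sketch of greedy NMS that counts IoU comparisons. It's an illustration under simplifying assumptions: the `iou` and `greedy_nms` helpers are hypothetical and are not the `ultralytics` implementation (which is vectorized), though the amount of pairwise work should scale similarly.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Return (kept indices, number of IoU comparisons performed)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep, comparisons = [], 0
    while order:
        best = order.pop(0)  # highest-scoring box still in play
        keep.append(best)
        survivors = []
        for i in order:
            comparisons += 1
            # Boxes overlapping the kept box too much are suppressed.
            if iou(boxes[best], boxes[i]) <= iou_thresh:
                survivors.append(i)
        order = survivors
    return keep, comparisons


n = 100
disjoint = [(2.0 * i, 0.0, 2.0 * i + 1.0, 1.0) for i in range(n)]  # no overlap at all
stacked = [(0.0, 0.0, 1.0, 1.0)] * n  # every box identical
scores = [1.0 - i / (2 * n) for i in range(n)]

_, disjoint_comparisons = greedy_nms(disjoint, scores)
_, stacked_comparisons = greedy_nms(stacked, scores)
print(disjoint_comparisons, stacked_comparisons)
```

With 100 boxes, the disjoint layout needs 4950 comparisons (n(n-1)/2) while the fully stacked layout needs only 99, because the top box suppresses everything else in the first pass.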

In [None]:
tile_size = 256
patch_size = 64

In [None]:
tiler = em.PatchTiler({"ground":(tile_size, tile_size)})

In [None]:
proofer = em.SoftProofer("data/profile.icc")

## implant

Reuse the same target dataset from tutorial 01, but just the ground patch this time.

In [None]:
labels = pd.read_csv("data/toycar/toycar_warp_dataset.csv")
labels = labels[labels.patch == "ground"]
len(labels)

In [None]:
labels.head()

In [None]:
patch_shapes = {k:(3,patch_size, patch_size) for k in ['ground']}
imp = em.WarpPatchImplanter(labels, patch_shapes=patch_shapes, dataset_name="toycar_warp_only_ground")

## compose

In [None]:
aug = em.KorniaAugmentationPipeline({"ColorJiggle":{"brightness":0.2, "contrast":0.2, "hue":0.1, "saturation":0.1},
                                    "RandomAffine":{"scale":(0.9,1.1), "shear":10, "padding_mode":"reflection", "degrees":0}})

## infer

For this test I'll just use one model. Feel free to make this as complicated as you want.

In [None]:
yolov8n = ultralytics.YOLO("yolov8n.pt").model.eval()

One slight change to `em.YOLOWrapper` compared to other notebooks: let's raise the `iouthresh` kwarg, which **only** affects visualizations, so we can see how many boxes are actually created.

In [None]:
yolo = em.YOLOWrapper(yolov8n, yolo_version=8, classnames=COCO_CLASSES, iouthresh=1.)

## assemble the pipeline

Take all of the steps we built above and assemble them into a `Pipeline` object:

In [None]:
pipeline = tiler+proofer+imp+aug+yolo

## Write a loss function

Note that in this case, success means the patch is detected with a score **above** 0.25 (the default minimum) instead of below.

I think there are likely better loss terms than the ones I chose here; these are literally the first two things I thought of:

* Finding the max detection score of each batch/box index, and taking the average value of `1-score`
* Same as above but capping the max detection score at 0.3, so once a particular box is enough to get detected we don't try to get it higher

If I really wanted to get an Overload patch working, this is probably where I'd spend some time.
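
As a quick worked example of the two terms above, here is a pure-Python sketch on four made-up box scores. The function names mirror the loss keys used in this notebook but are hypothetical helpers, and the cap of 0.3 is the value mentioned above.

```python
def inverse_maxdetect(scores):
    """Average of 1 - score over all boxes."""
    return sum(1.0 - s for s in scores) / len(scores)


def hard_threshold(scores, cap=0.3):
    """Same as above, but each score is capped at `cap` first."""
    return sum(1.0 - min(s, cap) for s in scores) / len(scores)


scores = [0.05, 0.2, 0.6, 0.9]  # made-up detection scores for four boxes
bumped = [0.05, 0.2, 0.6, 0.95]  # same, with an already-detected box pushed higher

print(inverse_maxdetect(scores), inverse_maxdetect(bumped))  # uncapped term keeps dropping
print(hard_threshold(scores), hard_threshold(bumped))  # capped term does not change
```

Raising a score that is already past the cap lowers the first term but leaves the second untouched, so the capped term only rewards boxes that are still below the cap.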

In [None]:
threshold = 0.3

def loss(output, **kwargs):
    maxdetect_boxes = output[0][:,:,4] # (batch, num_boxes)
    maxdetect = torch.max(maxdetect_boxes, 1)[0]  # (batch,)

    inverse_maxdetect = torch.mean(1-maxdetect_boxes, -1)
    hard_threshold = torch.mean(1 - torch.minimum(maxdetect_boxes, torch.tensor(threshold)), -1)    

    # how many boxes per image are above the default detection threshold? this is a handy
    # diagnostic to log next to the NMS time
    boxcount = torch.sum((maxdetect_boxes >= 0.25).type(torch.float32), -1)

    # run NMS, collecting a timestamp before and after so we can estimate the wall time. this is probably
    # not a perfect estimate of NMS performance on a mobile device but the scaling should be the same.
    with torch.no_grad():
        detects = output[0].permute(0,2,1) # (batch, 5+num_classes, num_boxes)
        detects = torch.concatenate([detects[:,:4,:], detects[:,5:,:]],1) # (batch, 4+num_classes, num_boxes)
        t0 = time.perf_counter() # monotonic, high-resolution clock for interval timing
        nms = ultralytics.utils.ops.non_max_suppression(detects, conf_thres=0.1)
        t1 = time.perf_counter()

    outdict = {
        "inverse_maxdetect":inverse_maxdetect,
        "hard_threshold":hard_threshold,
        "boxcount":boxcount,
        "nms_time":(t1-t0)*torch.ones_like(maxdetect)
    }
    return outdict

Pass the loss function to your pipeline along with a dictionary giving the shapes of a batch of test patches, so it can check the inputs/outputs before you start training:

In [None]:
pipeline.set_loss(loss, test_patch_shape={k:(2,3,patch_size, patch_size) for k in ['ground']})

## Train the patches

The `pipeline.optimize()` method takes in most of the inputs you'd normally use to prepare a patch for training: logging directory, MLflow location and experiment, and patch dimensions for initialization.

After that, when you specify keyword arguments for training, you can replace any of them with an interval to tell `electricmayhem` to include them in a Gaussian process. There are several ways to specify variables to optimize:

* A tuple with low and high values (`hard_threshold=(0,1)`)
* A tuple with low and high values and "log" to sample on a logarithmic scale (`learning_rate=(1e-4, 1e-1, "log")`)
* A tuple with low and high values and "int" to sample integer values (`accumulate=(1, 25, "int")`)
* A list of categorical values, such as for optimizer or LR decay (`optimizer=["adam", "mifgsm"]`)
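
To make the four kinds of specification above concrete, here is a small sketch that draws one value from each. `sample_param` is a hypothetical helper written for this illustration, not part of `electricmayhem`; it draws at random, whereas a Gaussian process would pick values based on earlier results. It also assumes the "int" range includes both endpoints, which I haven't checked against the library.

```python
import math
import random


def sample_param(spec, rng):
    """Draw one value from a search-space spec (hypothetical helper)."""
    if isinstance(spec, list):
        return rng.choice(spec)  # categorical
    if isinstance(spec, tuple):
        low, high = spec[0], spec[1]
        kind = spec[2] if len(spec) == 3 else "linear"
        if kind == "log":
            # uniform in log-space, so each decade is equally likely
            return math.exp(rng.uniform(math.log(low), math.log(high)))
        if kind == "int":
            return rng.randint(low, high)  # assumption: both endpoints included
        return rng.uniform(low, high)
    return spec  # anything else is treated as a fixed value


rng = random.Random(0)
search_space = {
    "hard_threshold": (0, 1),
    "learning_rate": (1e-4, 1e-1, "log"),
    "accumulate": (1, 25, "int"),
    "optimizer": ["adam", "mifgsm"],
    "lr_decay": "cosine",  # fixed, not optimized
}
trial = {name: sample_param(spec, rng) for name, spec in search_space.items()}
print(trial)
```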

In [None]:
pipeline.cuda();

In [None]:
pipeline.optimize(
    "nms_time", # this is the objective for the black-box optimization loop
    "logs_latency_attack/",
    {"ground":(3,patch_size, patch_size)},
    1000, # number of experiments (I stopped after a few dozen)
    2500, # budget of steps per experiment
    24, # batch size
    num_eval_steps=100,
    mlflow_uri="http://127.0.0.1:5000",
    experiment_name="electricmayhem_tutorial_07_bayesian_optimization_2",
    extra_params={"tile_size":tile_size, "patch_size":patch_size},
    minimize=False,
    learning_rate=(1e-4, 1e-1, "log"),
    lr_decay="cosine",
    optimizer=["adam", "mifgsm"],
    inverse_maxdetect=(0,1),
    hard_threshold=(0,1)
)

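Note that the loss function returns two trainable terms, `inverse_maxdetect` and `hard_threshold`, so each one needs an explicit weight. In the call above both weights are given as intervals, which lets the Gaussian process choose them for each experiment.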

In [None]:
ultralytics.utils.ops.non_max_suppression?

In [None]:
patch

In [None]:
patch["ground"].shape

In [None]:
foo, _ = tiler({"ground":patch["ground"].unsqueeze(0)}, evaluate=True)

In [None]:
em.plot(patch["ground"])

In [None]:
em.plot(foo["ground"])

In [None]:
foo["ground"].shape

In [None]:
import numpy as np
import matplotlib.pyplot as plt


In [None]:
def sig(x):
    return 1/(1+np.exp(-x))

In [None]:
x = np.linspace(-10,10,100)
plt.plot(x, sig(x));

In [None]:
torch.minimum?