## Bounding Box model

The concept and data come from this Kaggle kernel: [Bounding Box Model](https://www.kaggle.com/martinpiotte/bounding-box-model)

Also from this notebook: [github](https://github.com/radekosmulski/whale/blob/master/fluke_detection_redux.ipynb)

The idea is to standardize the image focus and make it easier for the classification model to recognize the whale id. The data provided was created manually by placing landmarks on the whale tail and using their minimum and maximum coordinates to create the cropping location.

There are 1200 bounding box samples. The data is not from this competition but from the [playground](https://www.kaggle.com/c/whale-categorization-playground) competition.

Make sure to upgrade your fastai. I was using 1.0.38 and 1.0.39, which gave errors when creating the databunch until I updated to 1.0.41. Error: `can't convert np.ndarray of type numpy.object_.`
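A small guard at the top of the notebook can catch an outdated install before the databunch step fails. The sketch below is only an illustration: `version_tuple`, `is_supported` and `MIN_FASTAI` are hypothetical names (not part of fastai), and the minimum is simply the version reported as working above.

```python
def version_tuple(version):
    """Turn '1.0.41' into (1, 0, 41); stops at the first non-numeric piece such as 'dev0'."""
    parts = []
    for piece in version.split('.'):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

# Assumption: 1.0.41 is the version reported as working in this notebook
MIN_FASTAI = (1, 0, 41)

def is_supported(version, minimum=MIN_FASTAI):
    """True when the dotted version string is at least `minimum`."""
    return version_tuple(version) >= minimum
```

In the notebook this could be used as `assert is_supported(fastai.__version__)` right after the imports.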

In [1]:
import fastai
from fastai import *
from fastai.vision import *
print(f'fastai version: {fastai.__version__}')
print(f'torch version: {torch.__version__}')

verbose = True  # print out extra details?

# import matplotlib.patches.Path
from matplotlib.patches import Rectangle
%matplotlib inline

import json
import warnings
warnings.filterwarnings('ignore')
# to stop fastai from printing out "UserWarning: Tensor is int32: upgrading to int64;"

fastai version: 1.0.41
torch version: 1.0.0


ModuleNotFoundError: No module named 'pynvml'

In [None]:
# global setting
data_fp = Path('data')
data_train = data_fp/'train'
data_playground = data_fp/'train_playground'
data_test = data_fp/'train_playground'  # this should be changed to train for the cropping
crop_fp = data_fp/'cropping.txt'

In [None]:
bs = 16
num_workers = 3  # set to zero when using a Kaggle kernel, otherwise it crashes the kernel
sz = 224 ## resize images

## Look into the cropping dataset

In [None]:
with open(crop_fp, 'rt') as f:
    crop_ls = [r.split(',') for r in f.read().split('\n') if len(r.split(',')) > 1]

In [None]:
crop_data = [(img, [(int(coords[i]), int(coords[i+1])) for i in range(0, len(coords), 2)]) 
                                                         for img, *coords in crop_ls]

set([len(r[1]) for r in crop_data])

The number of landmark points varies per image, ranging from 4 to 11. To obtain a bounding box, take the min and max values along the x and y axes.

Each step converts the data into the format needed by the next:
```python
    text file: "image_filename,x1,y1,x2,y2,...\n..."
    crop_ls:   [image_filename, x1, y1, x2, y2, ...]
    crop_data: (image_filename, [(x1, y1), (x2, y2), ...])
```
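The steps above can be sketched on a single line of text without any fastai dependency. The input line below is made up for illustration (it is not taken from `cropping.txt`), and `parse_crop_line` and `points_to_bbox` are hypothetical helper names.

```python
def parse_crop_line(line):
    """Split 'name,x1,y1,x2,y2,...' into (name, [(x1, y1), (x2, y2), ...])."""
    name, *coords = line.strip().split(',')
    values = [int(c) for c in coords]
    # Pair up even-indexed values (x) with odd-indexed values (y)
    points = list(zip(values[0::2], values[1::2]))
    return name, points

def points_to_bbox(points):
    """Return (xmin, ymin, width, height) enclosing all landmark points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)

# Made-up example line with four landmarks
name, points = parse_crop_line("whale.jpg,10,40,200,35,110,90,15,60")
bbox = points_to_bbox(points)
```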

In [None]:
crop_data[0]

In [None]:
def get_bbox(coords):
    x, y = [x for x,_ in coords], [y for _,y in coords]
    xmin, xmax= min(x), max(x)
    ymin, ymax = min(y), max(y)
    # (xmin, ymin) anchor corner, width and height; in image coordinates this is the upper-left corner
    return xmin, ymin, xmax-xmin, ymax-ymin

def draw_bbox(box):
    return Rectangle((box[0], box[1]), box[2], box[3],
                     linewidth=1, edgecolor='r', facecolor='none')

def img_bbox(data):
    img = PIL.Image.open(data_playground/data[0])
    _, ax = plt.subplots(figsize=(10,10))
    ax.imshow(img)
    ax.axis('off')
    ax.scatter([x for x,_ in data[1]], [y for _,y in data[1]], marker='o', c='r')
    ax.add_patch(draw_bbox(get_bbox(data[1])))
    ax.set_title(data[0])
    plt.show()

In [None]:
# bbox is created from the landmarks
img_bbox(crop_data[0])

## Convert data into Coco dataset format and then to fastai format


**Coco format**
```json
{
    "categories": [
        {"id": 0, "name": "whale"},
        {"id": 1, "name": "placeholder"}
    ],
    "images": [
        {"id": 1000, "file_name": "whale1.jpg"},
        {"id": 1001, "file_name": "whale2.jpg"}
    ],
    "annotations": [
        {"image_id": 1000, "bbox": [x, y, width, height], "category_id": 0},
        {"image_id": 1001, "bbox": [x, y, width, height], "category_id": 0}
    ]
}
```

**Fastai Format** for multiple objects in an image
```python
[
    [image_fn, image_fn],
    [
        [[[top, left, bottom, right],
          [top, left, bottom, right]],
         ['whale', 'whale']],
        [[[top, left, bottom, right],
          [top, left, bottom, right]],
         ['whale', 'whale']]
    ]
]
```

In [None]:
images, annotations = [], []
for i, v in enumerate(crop_data, start=1000):
    images.append({"id": i, "file_name": v[0]})
    annotations.append({"image_id": i, "bbox": get_bbox(v[1]), "category_id": 0})

categories = [{"id": 0, "name": "whale"}]

coco_whale = {"categories": categories,
              "images": images,
              "annotations": annotations}

with open("data/coco_whale.json", "w+") as f:
    json.dump(obj=coco_whale, fp=f, indent=4)
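Before relying on the exported file, it may be worth checking that the dictionary is internally consistent. The sketch below is a hypothetical helper (`check_coco` is not a fastai or COCO API function) that only inspects the three keys used in this notebook.

```python
def check_coco(coco):
    """Return a list of problems found in a COCO-style dict (empty if none)."""
    problems = []
    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    for ann in coco["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"unknown category_id {ann['category_id']}")
        _, _, width, height = ann["bbox"]
        if width <= 0 or height <= 0:
            problems.append(f"empty bbox for image_id {ann['image_id']}")
    return problems

# Made-up sample in the same shape as coco_whale
sample = {
    "categories": [{"id": 0, "name": "whale"}],
    "images": [{"id": 1000, "file_name": "whale1.jpg"}],
    "annotations": [{"image_id": 1000, "bbox": (10, 35, 190, 55), "category_id": 0}],
}
sample_problems = check_coco(sample)
```

Calling `check_coco(coco_whale)` would need to happen before the cleanup cell that deletes `coco_whale`.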

In [None]:
del images, annotations, categories, coco_whale, i, v, crop_data, crop_fp, crop_ls

## Testing that the conversion was done correctly

In [None]:
tmp_images, tmp_lbl_bbox = get_annotations('data/coco_whale.json')
len(tmp_images)

The COCO dataset format is (x, y, width, height).

fastai expects (y_upper_left, x_upper_left, y_lower_right, x_lower_right), with the origin in the upper left-hand corner of the image.
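The relationship between the two layouts can be written as a pair of small functions. This is a sketch for illustration only; `coco_to_fastai` and `fastai_to_coco` are hypothetical names, and my understanding is that `get_annotations` performs an equivalent conversion internally when it reads the JSON file.

```python
def coco_to_fastai(bbox):
    """(x, y, width, height) -> [top, left, bottom, right]."""
    x, y, width, height = bbox
    return [y, x, y + height, x + width]

def fastai_to_coco(bbox):
    """[top, left, bottom, right] -> [x, y, width, height]."""
    top, left, bottom, right = bbox
    return [left, top, right - left, bottom - top]

# Made-up box: x=10, y=35, width=190, height=55
converted = coco_to_fastai([10, 35, 190, 55])
```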

In [None]:
img = open_image(Path('data/train_playground')/tmp_images[0])
print(f'BBox coords: {tmp_lbl_bbox[0][0]}')
bbox = ImageBBox.create(*img.size, tmp_lbl_bbox[0][0])
img.show(y=bbox, figsize=(10,10))

In [None]:
del tmp_images, tmp_lbl_bbox, img, bbox

## Create DataBunch with Coco Format

In [None]:
images, lbl_bbox = get_annotations('data/coco_whale.json')
img2bbox = dict(zip(images, lbl_bbox))
get_y_func = lambda o: img2bbox[Path(o).name]

In [None]:
if verbose: print(lbl_bbox[:2])  # a bare expression inside an if block is not displayed

In [None]:
tfm = get_transforms(flip_vert=False, 
                     # doesn't make sense to have upside down tails
                     max_rotate=0.3,
                      # rotating too much will cause the bbox to be super large and not accurate
                     max_zoom=1)
                     # remove zooming 
    
if verbose: # show the list of transformation
    for i in tfm:
        for j in i: print(j)
        print("\n")

In [None]:
# class OneObjectCategoryList(ObjectCategoryList):
#     def analyze_pred(self, pred): return [pred.unsqueeze(0), torch.zeros(1).long().unsqueeze(0)]
# class ObjectItemListOne(ImageItemList):
#     _label_cls,_square_show_res = OneObjectCategoryList,False

# It might just be that Windows is not supported, which is why I am getting [Errno 32] Broken pipe
# whenever I call data.show_batch

In [None]:
data = (ObjectItemList.from_df(pd.DataFrame(data=images), path=data_fp, folder='train_playground')
        .random_split_by_pct(seed=52)                          
        #How to split in train/valid? -> randomly with the default 20% in valid
        .label_from_func(get_y_func)
        #How to find the labels? -> use get_y_func
        .add_test_folder('test')  # TODO change to actual competition data
        .transform(tfm,  # use the customised transforms defined above (no vertical flip, no zoom)
                   tfm_y= True, 
                   size=sz, 
                   resize_method=ResizeMethod.SQUISH,
                   padding_mode='border')
        #Data augmentation? -> Standard transforms with tfm_y=True
        .databunch(bs=bs, collate_fn=bb_pad_collate, num_workers=num_workers)   
        #Finally we convert to a DataBunch and we use bb_pad_collate
        .normalize(imagenet_stats))

In [None]:
data.test_ds.tfm_y = False  # test set has no y value so no transformation for it

In [None]:
len(data.test_ds)

In [None]:
idx = 65
fig, axes = plt.subplots(3,3, figsize=(9,9))
for i, ax in enumerate(axes.flat):
    img = data.train_ds[idx]
    # image is augmented each time it is retrieved
    img[0].show(y=img[1], ax=ax)

In [None]:
data.show_batch(rows=2)

## For troubleshooting. Remove when ready. 

In [None]:
data.valid_ds.y

In [None]:
data.test_ds.y[0].data

In [None]:
count = 1
for i in data.test_ds.y:
    count += 1
    if count % 100 == 0: print(i)

In [None]:
count = 1

In [None]:
count += 1

In [None]:
count % 100  # same modulus as the loop above

## Training

The first attempt, resnet18 with a simple custom head of `nn.Sequential(Flatten(), nn.Linear(25088,4))`, did not give good results: the predicted x coordinates fall outside the range [-1,1], consistently around -2.8 and 1.5, and the predicted y range is too narrow. I believe the reason is that the bbox x coordinates are consistently at the edge of the image, so landing outside it is understandable.

Second attempt:
TODO:
* Increase the complexity of the custom head with some non-linear features
* new loss function
* new metrics
* more augmentation
* Use a larger resnet
* increase bs
* using  fit_one_cycle instead of fit
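One candidate for the "new metrics" item is intersection over union (IoU). The sketch below is plain Python rather than a fastai metric, assumes boxes in the (top, left, bottom, right) order used by fastai, and uses the hypothetical name `iou`; it works the same for pixel coordinates or the scaled [-1,1] coordinates.

```python
def iou(box_a, box_b):
    """Intersection over union of two (top, left, bottom, right) boxes."""
    top = max(box_a[0], box_b[0])
    left = max(box_a[1], box_b[1])
    bottom = min(box_a[2], box_b[2])
    right = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect
    inter = max(0.0, bottom - top) * max(0.0, right - left)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Full image versus its lower-right quadrant in [-1, 1] coordinates
quarter = iou((-1, -1, 1, 1), (0, 0, 1, 1))
```

To use it on model outputs, the prediction and target tensors would first need converting to lists, for example with `.tolist()`.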


In [None]:
# L1Loss is used instead of MSE because MSE penalizes large mistakes more than it should
def loss_func(preds, targs, class_idx, **kwargs):
    return nn.L1Loss()(preds, targs.squeeze())
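The difference between the two losses can be seen on a toy example. The numbers below are made up, and `l1_loss` and `mse_loss` are plain-Python stand-ins written for this illustration, not the PyTorch modules.

```python
def l1_loss(preds, targs):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(preds, targs)) / len(preds)

def mse_loss(preds, targs):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targs)) / len(preds)

targs = [0.0, 0.0, 0.0, 0.0]
small_errors = [0.5, 0.5, 0.5, 0.5]  # every coordinate is off by 0.5
one_outlier = [0.0, 0.0, 0.0, 2.0]   # three exact, one off by 2.0

l1_small, l1_outlier = l1_loss(small_errors, targs), l1_loss(one_outlier, targs)
mse_small, mse_outlier = mse_loss(small_errors, targs), mse_loss(one_outlier, targs)
```

Both predictions have the same L1 loss, while MSE rates the single-outlier prediction four times worse, which is the behaviour the comment above refers to.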

In [None]:
head_reg4 = nn.Sequential(
    Flatten(), 
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(25088,256),
    nn.ReLU(),
    nn.BatchNorm1d(256),
    nn.Dropout(0.5),
    nn.Linear(256, 4))
    # Maybe add nn.Tanh since the target values are in [-1,1]
learn = create_cnn(data=data, arch=models.resnet18, pretrained=True, custom_head=head_reg4,
#                    model_dir = '/tmp/models'  ## For kaggle kernel 
                  )
learn.loss_func = loss_func
# change the loss function??

In [None]:
if verbose: print(learn.summary())

In [None]:
learn.lr_find()
learn.recorder.plot()

In [None]:
learn.fit_one_cycle(15, pct_start=0.5)

In [None]:
learn.recorder.plot_losses()

In [None]:
learn.unfreeze()

In [None]:
learn.lr_find()
learn.recorder.plot()

In [None]:
learn.fit_one_cycle(15, max_lr = slice(0.001, 0.001/5), pct_start=0.5)

In [None]:
learn.save('bounding-box-model', return_path=True)

## Check Validation Set

In [None]:
# implement different loss function like detn_l1
    # IoU??
preds, targs = learn.get_preds(ds_type=DatasetType.Valid)
targs = targs.squeeze()  # fastai outputs multiple objects per image but we only have 1
# making sure the preds values are within the picture
preds = torch.clamp(preds, -1,1)
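Clamped predictions are still in the scaled [-1,1] space. A minimal sketch of mapping them back to pixel indices follows, assuming the same (top, left, bottom, right) order and the `(v + 1) * size / 2` scaling used later in this notebook; `clamp` and `to_pixels` are hypothetical helper names.

```python
def clamp(value, low=-1.0, high=1.0):
    """Restrict a scalar to the range [low, high]."""
    return max(low, min(high, value))

def scale_to_size(value, size):
    """Map a coordinate from [-1, 1] to [0, size] and round to an integer."""
    return int(round((value + 1) * size / 2))

def to_pixels(bbox, height, width):
    """Map (top, left, bottom, right) from [-1, 1] to pixel indices."""
    top, left, bottom, right = (clamp(v) for v in bbox)
    return (scale_to_size(top, height), scale_to_size(left, width),
            scale_to_size(bottom, height), scale_to_size(right, width))

# A centred box covering half of each axis of a 224x224 image
centre_box = to_pixels([-0.5, -0.5, 0.5, 0.5], 224, 224)
```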

In [None]:
# check to see what the output looks like
if verbose:
    print(preds.shape, targs.shape)
    print(preds[:2])
    print(targs[:2])

In [None]:
np.random.seed(24)
n = 10  # look at n samples, must be even
idxs = np.random.randint(0,len(data.valid_ds), size=n)
_, axes = plt.subplots(nrows=n//2, ncols=2, figsize = (n, n*2))
for i, ax in zip(idxs, axes.flat):
    img = data.valid_ds[i][0].data  # calling .data applies the resize; before that the image has its original size
    img_name = Path(data.valid_ds.items[i]).name
    img_size = img.shape[1:]
    targ, pred = targs[i], preds[i]
    if verbose: print(f'target: {targ}, pred: {pred}')
    Image(img).show(ax=ax,
                    # target is white
                     y=ImageBBox.create(*img_size, 
                                        bboxes=targ.unsqueeze(0),
                                        scale=False),
                    title=img_name)
    # Prediction is red
    ImageBBox.create(*img_size, 
                     bboxes=pred.unsqueeze(0),
                     scale=False).show(ax=ax, color='red')

In [None]:
# TODO: Display the predictions that are furthest off.
## use the custom loss function
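One way to approach this TODO is to rank validation samples by their per-sample mean absolute error. The sketch below uses plain lists and a hypothetical `worst_predictions` helper; with the tensors from `get_preds`, `preds.tolist()` and `targs.tolist()` could be passed in.

```python
def worst_predictions(preds, targs, k=5):
    """Return the indices of the k samples with the largest mean absolute error."""
    errors = []
    for idx, (pred, targ) in enumerate(zip(preds, targs)):
        err = sum(abs(p - t) for p, t in zip(pred, targ)) / len(pred)
        errors.append((err, idx))
    errors.sort(reverse=True)  # largest error first
    return [idx for _, idx in errors[:k]]

# Made-up predictions against identical targets
toy_targs = [[0.0, 0.0, 1.0, 1.0]] * 3
toy_preds = [[0.0, 0.0, 1.0, 1.0], [0.5, 0.5, 1.0, 1.0], [-1.0, -1.0, 1.0, 1.0]]
worst = worst_predictions(toy_preds, toy_targs, k=2)
```

The returned indices can then replace the random `idxs` in the plotting cell above.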

## Crop image based on pred (not ready)

In [None]:
learn.data = data

In [None]:
cropped_fp = data_fp/"train-crop-224"
cropped_fp.mkdir(parents=True, exist_ok=True) # save crop images to reduce computation

In [None]:
test_fp = data_fp/'train_playground'
files = get_files(test_fp)

In [None]:
res = learn.get_preds(ds_type=DatasetType.Test)

In [None]:
%debug

In [None]:
tmp = iter(learn.data.test_dl.batch_sampler)

In [None]:
type(learn.data.test_ds)

In [None]:
learn.pred_batch(ds_type=DatasetType.Test)

In [None]:
learn.pred_batch??

In [None]:
learn.data.num_workers = 0

In [None]:
learn.data.test_dl.num_workers

In [None]:
tmp = learn.data.one_batch(DatasetType.Test)

In [None]:
%debug

In [None]:
tmp = learn.data.valid_ds[0][1]

In [None]:
tmp.data[1]

In [None]:
learn.pred_batch(ds_type=DatasetType.Test, batch=([tmp[0].data],[tmp[1]]))

In [None]:
tmp = learn.data.valid_ds[0]

In [None]:
[tmp[1]]*2

In [None]:
tmp[1].data

In [None]:
ImageBBox.create(224, 224, bboxes=[[-.5,-.5,.5,.5]], scale=False, labels=[0], classes=['whale'])

In [None]:
learn.data.test_ds.y = [tmp]* 7960

In [None]:
type(learn.data.valid_ds.y)

In [None]:
type(learn.data.test_ds)

In [None]:
learn.data.valid_ds.y.new([ImageBBox.create(224, 224, bboxes=[[-.5,-.5,.5,.5]], scale=False, labels=[0], classes=['whale'])]*10)

In [None]:
tmp = learn.get_preds(ds_type=DatasetType.Valid)

In [None]:
res = learn.get_preds(ds_type=DatasetType.Test)

In [None]:
tmp = learn.data.one_batch(DatasetType.Test)

In [None]:
cropped_fp.name

In [None]:
img = open_image('data\\train_playground\\.\\9c855d38.jpg')

In [None]:
print(img.size)
print(bbox.data)
img

In [None]:
(bbox.data[0]+1) * torch.tensor([img.size[0]//2,img.size[1]//2]*2).float()

In [None]:
img.show( y=ImageBBox.create(*img.size, 
                             bboxes=(bbox.data[0]+1) * torch.tensor([img.size[0]//2,img.size[1]//2]*2).float(), 
                             scale=True) )

In [None]:
(bbox.data[0]+1).shape

In [None]:
torch.tensor([[1.,2.]]).shape

In [None]:
(bbox.data[0]+1).squeeze().unsqueeze(1) * torch.tensor([[1.,2.]]).squeeze().unsqueeze(1)

In [None]:
torch.empty(5,3,4,1).shape

In [None]:
torch.empty(  3,1,1).shape

In [None]:
img, bbox = data.valid_ds[89]
print(img.size)
img.show(y=bbox)

In [None]:
print(img.data.size())

In [None]:
crop_bbox = ((bbox.data[0]+1)*112).int().squeeze().numpy(); crop_bbox

In [None]:
crop_img = Image(img.data[:, crop_bbox[0]:crop_bbox[2], crop_bbox[1]:crop_bbox[3]])
print(crop_img)
crop_img

In [None]:
crop_img.resize(224)

In [None]:
crop_pad(Image(img.data[:, crop_bbox[0]:crop_bbox[2], crop_bbox[1]:crop_bbox[3]]), 224, 'zeros')

In [None]:
crop_bbox[0], crop_bbox[2], crop_bbox[1],crop_bbox[3]

In [None]:
i

In [None]:
crop_pad(img, 224, 'reflection', row_pct=, col_pct=)

In [None]:
crop_pad??  ## resize_method=ResizeMethod.PAD, padding_mode='reflection'  Do not want to continue squishing the image

In [None]:
Image(img)

In [None]:
Image(img[:, 0:112, 0:112])

In [None]:
type(preds)

In [None]:
pd.DataFrame(data = preds.numpy()).to_csv('testing.csv')

In [None]:
# TODO: Export the model as well so people do not need to retrain