# Use Case #1

## Finding related images without fine-tuning, on toy data

The notebook below builds an MVP for this simple use case.

### User Story

The user provides a query image and the number of similar images to find in the Imagenette dataset.

The system returns the requested number of images from the dataset that are most similar to the query.

### Internal Steps

1. Download the data using `!python '../src/data/get_imagenette.py'`; it is stored at `../data/raw/imagenette-160`.
2. Create a `DataBunch` object (it packages the train and validation datasets and dataloaders). No validation dataset is needed; all the data can be used as train.
3. Create a `fastai` learner based on a pre-trained ResNet-18 (small enough to run on a laptop).
4. Pass the data through the learner for prediction and collect activations from the layer preceding the fully connected layer.
5. Pass the query image through the learner and collect its activations.
6. Find the indices of the dataset activations closest to the query image activations.
7. Use the indices to retrieve the images and return them to the user.
8. Plot the query image and the returned images **[optional]**.
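
Steps 6 and 7 can be sketched with plain NumPy. The notebook does not say which distance measure is intended, so cosine similarity is an assumption here, and the names `find_similar`, `dataset_acts` and `query_vec` are hypothetical placeholders for the collected activations rather than anything defined in this notebook.

```python
import numpy as np

def find_similar(query_vec, dataset_acts, k):
    """Return indices of the k rows of dataset_acts most similar to query_vec."""
    # Normalise every row (and the query) to unit length so that
    # a dot product equals cosine similarity. Assumes no all-zero rows.
    acts = dataset_acts / np.linalg.norm(dataset_acts, axis=1, keepdims=True)
    query = query_vec / np.linalg.norm(query_vec)
    sims = acts @ query
    # Sort by descending similarity and keep the first k indices.
    return np.argsort(-sims)[:k]

# Toy activations: 4 "images" with 2 features each (made-up numbers).
dataset_acts = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
query_vec = np.array([1.0, 0.0])

top = find_similar(query_vec, dataset_acts, k=2)
```

The returned indices can then be used to index into the dataset (for example `data.train_ds[i][0]`) to retrieve the images themselves.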


## Imports

In [None]:
from pathlib import Path
from fastai.vision import *
from fastai.metrics import accuracy
from fastai.callbacks.hooks import *

In [None]:
DATA_PATH = '../data/raw/imagenette-160'
GET_DATA_PATH = '../src/data/get_imagenette.py'

## Getting the Data

In [None]:
!ls ../src/data/

In [None]:
# # $ interpolates a Python variable into a Jupyter shell (`!`) command
# !python $GET_DATA_PATH

In [None]:
!ls $DATA_PATH

## Creating the DataBunch

In [None]:
bs = 16
size = 160

data = (ImageList.from_folder(DATA_PATH)
        .use_partial_data(0.01)
        .split_none()
        .label_from_folder()
        .transform(size=size)
        .databunch(bs=bs, num_workers=0)
        .normalize(imagenet_stats))

data.show_batch(rows=4, figsize=(8,8))

In [None]:
print(data.train_ds)
print('-'*42)
print(data.valid_ds)

## Creating the CallbackHook

A callback hook is registered with the learner and gives access to the activations of a given layer.
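
The underlying mechanism is PyTorch's forward hook. The sketch below is not the fastai `HookCallback` API; it uses plain PyTorch on a tiny stand-in model (the layer sizes and the names `store_output`, `captured` and `features` are made up for illustration) to show how the output of the layer just before the final linear layer can be captured while batches pass through.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a network body plus classifier head.
model = nn.Sequential(
    nn.Linear(8, 4),
    nn.ReLU(),        # layer whose output we want to capture
    nn.Linear(4, 2),  # final fully connected layer
)

captured = []

def store_output(module, inputs, output):
    # Forward hooks receive (module, inputs, output).
    # Detach and copy to CPU so stored tensors do not hold on to the graph.
    captured.append(output.detach().cpu())

handle = model[1].register_forward_hook(store_output)

model.eval()
batches = [torch.randn(3, 8), torch.randn(5, 8)]  # two toy batches
with torch.no_grad():  # forward passes only, weights are never updated
    for batch in batches:
        model(batch)

handle.remove()  # stop capturing once all batches have been seen
features = torch.cat(captured)  # one row of activations per input item
```

Removing the hook through its handle matters in a notebook: otherwise every later forward pass keeps appending to the list.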

## Creating the Learner
### Trying to Collect **All** Activations [NO GOOD]

In [None]:
class StoreHook(HookCallback):
    def on_train_begin(self, **kwargs):
        print("NOICE!")
        super().on_train_begin(**kwargs)
        self.acts = []
    def on_batch_begin(self, **kwargs):
        print("beginning batch")
    def hook(self, m, i, o):
#         print("Hooking!!!")
        return o
    def on_batch_end(self, train, **kwargs):
        self.acts.append(self.hooks.stored)
        print("Batch Done!")
        
class StoreHook2(Callback):
    def __init__(self, module):
        print("Initting!!!")
        super().__init__()
        self.custom_hook = hook_output(module)
        self.outputs = []
        
    def on_batch_end(self, train, **kwargs): 
        if (not train): self.outputs.append(self.custom_hook.stored)
        
learner = cnn_learner(data, models.resnet18, callback_fns=partial(StoreHook, do_remove=False))

f = learner.fit_one_cycle(1, max_lr=0)
# p = learner.get_preds(data.train_ds)

In [None]:
learner.fit(1)

In [None]:
learner.fit_one_cycle(1)

In [None]:
learner.callbacks

In [None]:
learner.callback_fns

In [None]:
learner.callbacks = [cb(learner) for cb in learner.callback_fns]

In [None]:
learner.get_preds(data.train_ds)

In [None]:
learner.fit_one_cycle??

In [None]:
class StoreHook(HookCallback):
    def hook(self, m, i, o):
        pass
    def on_batch_end(self, train, **kwargs):
        print("Batch Done!")
        
learner = cnn_learner(data, models.resnet18, callback_fns=StoreHook)

learner.fit_one_cycle(1)

learner.get_preds

In [None]:
class StoreHook(HookCallback):
    def hook(self, m, i, o):
        pass
    def on_batch_end(self, train, **kwargs):
        print("Batch Done!")     
        
learner = cnn_learner(data, models.resnet18)

store_hook_callback = StoreHook(learner)
learner.callbacks += [store_hook_callback]

learner.get_preds(data.train_ds)

In [None]:
learner.get_preds(data.train_ds)

In [None]:
learner.model.__class__.__name__

In [None]:
len([m for m in learner.model.modules() if list(m.parameters())])

In [None]:
len(learner.store_hook.acts[0])

In [None]:
learner.get_preds(data.train_ds)

## Hook Approach

In [None]:
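activations = []

def store_activations(module, inputs, output):
    # Forward-hook signature is (module, inputs, output);
    # keep a detached CPU copy of each batch's activations
    activations.append(output.detach().cpu())

# learner = cnn_learner(data, models.resnet18, metrics=[accuracy])
learner = cnn_learner(data, models.resnet18)

# Hook the third module from the end of the model, which sits before the final layer
hook_handle = list(learner.model.modules())[-3].register_forward_hook(store_activations)

# DatasetType.Fix: the training set, unshuffled and without training-time augmentation
p = learner.get_preds(DatasetType.Fix)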

In [None]:
len(activations)

In [None]:
a = torch.cat(activations)

In [None]:
a.shape

In [None]:
data.train_ds[0][0]

In [None]:
preds = learner.get_preds(DatasetType.Fix)

In [None]:
type(preds)

In [None]:
len(preds)

In [None]:
preds[0].shape

In [None]:
preds[1].shape

In [None]:
probas = preds[0].numpy()

In [None]:
probas.shape

In [None]:
np.argmax(np.max(probas, axis=1))

In [None]:
data.train_ds[38][0]

## Good Approach

1. Create the `DataBunch`.
2. Create the model by hand.
3. Create the hook callback, pass it the model modules whose activations should be collected, and wrap it in a `partial` for learner creation.
4. Create the learner.
5. Train with zero learning rate (so the weights stay unchanged) and collect the activations.
6. ...
7. PROFIT!!!