This competition focuses on determining whether a 32x32px image contains a cactus. The dataset comes from Efren López-Jiménez's master's thesis, _Sistema embebido para la supervisión inteligente de terrenos con vehículos aéreos no tripulados_, which translates to _Embedded system for intelligent monitoring of land with unmanned aerial vehicles_.

The abstract of the paper reads as follows:

> The terrain survillance by an aerial vehicle provides the view of an area of interest represented by a polygon through an on-board sensor. In the present work we propose an integral system, it has three mainly approaches: coverage path planning, object recognize and its processing on an embedded system. For solve the coverage path planning problem, we used the rotating caliper algorithm proposed by Vasquez. For the classification and recognition object we used Lenet-5 convolutional neural network proposed by Lecun. The dataset was obtained from the cactus plants images, captured by an unmanned aerial vehicle (UAV) at San Antonio Nanahuatipan, Oax, biosphere reserve. Finally we performed the couple between the embedded system and the UAV. 

[Source](https://www.researchgate.net/publication/329453166_Sistema_embebido_para_la_supervision_inteligente_de_terrenos_con_vehiculos_aereos_no_tripulados)

This is an interesting application of machine learning, where classification is one step in the process of mapping an area with a UAV. The difficulty of the challenge is not in predicting the class of the image, which is trivial with today's state-of-the-art techniques. Rather, the challenge seems to be integrating the classifier into the UAV's embedded processing system, where the size of the network and the processing power available to it are highly constrained.

In [None]:
from fastai import *
from fastai.vision import *
import fastai
fastai.__version__

In [None]:
path = Path('../input')
path.ls()

In [None]:
labs = pd.read_csv(path/'train.csv')
labs.head()

## Let's Look at Some of the Data

There are 17500 training images.

In [None]:
labs.has_cactus.count()

About 13,000 of the training images contain a cactus and roughly 4,500 do not.

In [None]:
labs['has_cactus'].value_counts().plot(kind='bar')
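
The imbalance shown in the bar chart can also be expressed as class shares. The following is a minimal sketch, assuming plain Python lists rather than the notebook's DataFrame; `class_shares` and `toy_labels` are hypothetical names, and the toy labels are made up for illustration and are not the competition data.

```python
from collections import Counter

def class_shares(labels):
    """Return {label: fraction of samples} for an iterable of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical toy labels (1 = cactus, 0 = no cactus), NOT the real train.csv
toy_labels = [1, 1, 1, 0]
shares = class_shares(toy_labels)
print(shares)
```

On the real labels, `labs['has_cactus'].value_counts(normalize=True)` should give the same kind of summary directly in pandas.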

The augmentations we can use for this data are probably limited, since changing the few pixel values that we have can drastically change the image. For example, rotations introduce pixelation in the output. However, since the images are taken from a drone looking down at the ground, there is no natural "up" direction, so we probably do want to use vertical flips.

In [None]:
data = (ImageList.from_csv(path, 'train.csv', folder='train/train')
        .random_split_by_pct()
        .label_from_df()
        .add_test_folder('test/test')
        .transform(get_transforms(max_rotate=0, max_lighting=0.1, max_zoom=1, flip_vert=True), size=32)
        .databunch(bs=256))
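
To illustrate why flips are a safe augmentation at this resolution, here is a minimal sketch on a toy 3x3 "image" stored as nested lists. The helper names and the toy image are hypothetical and are not part of fastai. Flips and 90-degree rotations only reorder existing pixels, whereas a rotation by an arbitrary angle has to interpolate between pixels, which is where the pixelation comes from. As far as I understand the fastai v1 documentation, `flip_vert=True` enables exactly this family of flips and 90-degree rotations.

```python
def flip_vertical(image):
    """Flip a 2D image (a list of rows) top to bottom."""
    return [row[:] for row in image[::-1]]

def rotate_90(image):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]

def pixels(image):
    """Return all pixel values of a 2D image in sorted order."""
    return sorted(value for row in image for value in row)

# Hypothetical toy image, each number stands for one pixel value
toy_image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]

flipped = flip_vertical(toy_image)
rotated = rotate_90(toy_image)
restored = flip_vertical(flipped)

# Both transforms keep exactly the same pixel values, only their positions change
print(pixels(flipped) == pixels(toy_image), pixels(rotated) == pixels(toy_image))
```

Flipping twice returns the original image, so no information is lost.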

In [None]:
data.show_batch(5, figsize=(6,6))

## Let's Create a Good Model

Define the AUC metric that Kaggle uses to score the competition.

In [None]:
# https://www.kaggle.com/guntherthepenguin/fastai-v1-densenet169
from sklearn.metrics import roc_auc_score

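def auc_score(y_pred, y_true, tens=True):
    # Score using the second output column, i.e. the "has cactus" class
    score = roc_auc_score(y_true, torch.sigmoid(y_pred)[:, 1])
    # Return a tensor by default; pass tens=False to get a plain float
    return tensor(score) if tens else score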
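
For intuition about what this metric measures, AUC can be read as the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. The sketch below computes it that way with a simple quadratic loop. This is an illustrative assumption, not how scikit-learn implements `roc_auc_score`, and the function names and toy scores are hypothetical. It also shows that squashing the scores with a sigmoid leaves the AUC unchanged, because the AUC depends only on the ranking.

```python
import math

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical toy scores and labels (1 = cactus, 0 = no cactus)
toy_logits = [2.0, 0.5, -1.0, 0.1]
toy_labels = [1, 0, 0, 1]

raw_auc = auc(toy_logits, toy_labels)
squashed_auc = auc([sigmoid(x) for x in toy_logits], toy_labels)
print(raw_auc, squashed_auc)
```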

Let's use a ResNet50 to show that we can get good results on this task.

In [None]:
learn = create_cnn(data, models.resnet50, path=".", metrics=[accuracy, auc_score])

In [None]:
learn.lr_find()
learn.recorder.plot()

In [None]:
learn.fit_one_cycle(5, 1e-2)

After 5 epochs of training we attain a validation accuracy of 99%, showing that this isn't very difficult for powerful models. Let's learn a bit more about the `learn` object. 

## Computational Size of Model

We can use the objgraph package to visualise the connections of the `learn` object.

In [None]:
! pip install objgraph xdot -q
import objgraph
objgraph.show_refs([learn])

There's clearly a lot going on in the model. Additionally, the model weights are 114MB, which is fairly large!

In [None]:
learn.save('model-resnet')
print(Path('./models/model-resnet.pth').stat().st_size//(1024*1024), 'MB')

Let's see the RAM usage when we're using the model for inference.

In [None]:
# Export the model for inference
learn.export()

We can use the [IPython Memwatcher](https://github.com/FrancescAlted/ipython_memwatcher) package to monitor RAM usage when predicting.

In [None]:
!pip install git+https://github.com/FrancescAlted/ipython_memwatcher -q

In [None]:
from ipython_memwatcher import MemWatcher
mw = MemWatcher()
mw.start_watching_memory()

In [None]:
learn = load_learner("")

In [None]:
img = open_image(path/'train/train'/labs.id.iloc[0])

In [None]:
learn.predict(img)

Similarly, the RAM usage is fairly high for the classification of a single image.

In [None]:
print(mw.measurements)
mw.stop_watching_memory()

Finally, let's look at the number of parameters and FLOPs the model uses with this [simple library](https://github.com/Lyken17/pytorch-OpCounter). However, the library had not implemented some of the `nn` modules, such as `nn.AdaptiveAvgPool2d`, so I implemented them and made a pull request to the repo. In the meantime, you can install the library from my branch.

In [None]:
!pip install git+https://github.com/Tom2718/pytorch-OpCounter -q --upgrade

In [None]:
from thop import profile

In [None]:
model = learn.model
flops, params = profile(model, input_size=(1, 3, 32,32), 
                        custom_ops={Flatten: None})

print('FLOPs:', flops//1e6, 'M')
print('Params:', params//1e6, 'M')
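
To give a feel for where these numbers come from, here is a minimal sketch of the usual way parameters and multiply-accumulate operations are counted for a single convolutional layer. The function `conv2d_cost` and the example layer are hypothetical. Whether the profiler above reports multiply-accumulates or counts each multiply and add separately is an assumption I have not verified, so treat this as an order-of-magnitude illustration only.

```python
def conv2d_cost(in_channels, out_channels, kernel_size, out_height, out_width, bias=True):
    """Return (parameter count, multiply-accumulate count) for one 2D convolution.

    Each output channel has one kernel_size x kernel_size filter per input
    channel, plus an optional bias. Each output pixel needs one
    multiply-accumulate per filter weight.
    """
    weights_per_filter = in_channels * kernel_size * kernel_size
    params = out_channels * (weights_per_filter + (1 if bias else 0))
    macs = out_channels * weights_per_filter * out_height * out_width
    return params, macs

# Hypothetical layer: 3x3 convolution, 3 -> 64 channels, 32x32 output map
params, macs = conv2d_cost(3, 64, 3, 32, 32)
print('Params:', params)
print('MACs:', macs)
```

Summing this over every layer is, roughly, what a profiler does for the whole network.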

## Reduce Model Size

Now we will use a different architecture that is optimized for a small footprint: SqueezeNet.

In [None]:
learn = create_cnn(data, models.squeezenet1_1, path=".", metrics=[accuracy, auc_score])

With a few more training epochs, we can reach similar accuracy to the ResNet model:

In [None]:
learn.lr_find()
learn.recorder.plot()

In [None]:
learn.fit_one_cycle(10, 5e-2)

In [None]:
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()

In [None]:
learn.fit_one_cycle(7, slice(5e-4, 5e-3))

Repeating the above steps, we can see the computational impact that this model has.

In [None]:
learn.save('model-squeezenet')
print(Path('./models/model-squeezenet.pth').stat().st_size//(1024*1024), 'MB')

In [None]:
learn.export()
mw.start_watching_memory()

In [None]:
learn = load_learner("")

In [None]:
img = open_image(path/'train/train'/labs.id.iloc[0])

In [None]:
learn.predict(img)

In [None]:
print(mw.measurements)
mw.stop_watching_memory()

In [None]:
model = learn.model
flops, params = profile(model, input_size=(1, 3, 32,32), custom_ops={Flatten: None})

print('FLOPs:', flops//1e6, 'M')
print('Params:', params//1e6, 'M')

This is clearly less computationally expensive!

## A Smaller ResNet

Let's wrap this up by comparing these results with a ResNet18.

In [None]:
learn = create_cnn(data, models.resnet18, path=".", metrics=[accuracy, auc_score])

In [None]:
learn.fit_one_cycle(10, 5e-2)

In [None]:
learn.save('model-resnet18')
print(Path('./models/model-resnet18.pth').stat().st_size//(1024*1024), 'MB')

In [None]:
learn.export()
mw.start_watching_memory()

In [None]:
learn = load_learner("")

In [None]:
img = open_image(path/'train/train'/labs.id.iloc[0])

In [None]:
learn.predict(img)

In [None]:
print(mw.measurements)
mw.stop_watching_memory()

In [None]:
model = learn.model
flops, params = profile(model, input_size=(1, 3, 32,32), custom_ops={Flatten: None})

print('FLOPs:', flops//1e6, 'M')
print('Params:', params//1e6, 'M')

## Final Thoughts

Let's tabulate our results:

|               | Model Size (MB) | Loading Model RAM (MB) | Predict RAM (MB) | FLOPs (M) | Params (M) |
|---------------|-----------------|------------------------|------------------|-----------|------------|
|      ResNet50 |       114       |          0.008         |       0.488      |     88    |     25     |
|      ResNet18 |        48       |          0.254         |       0.566      |     38    |     11     |
| SqueezeNet1_1 |        14       |          0.004         |       0.191      |     4     |      1     |


As we observed, the smaller the model, the fewer computational resources it required. These results, however, should be taken with a pinch of salt: the computer they were run on is far more powerful than the embedded computer that might be found on a UAV. They are still relevant when looking at relative differences in compute.
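
The relative differences can be read straight off the table. The sketch below divides each model's numbers by those of SqueezeNet1_1. The values are copied from the table above, while the dictionary layout and the `relative_to` helper are hypothetical. Since the FLOPs and parameter counts in the table were rounded down to whole millions, the ratios are only approximate.

```python
# Values copied from the results table (model size in MB, FLOPs and params in millions)
results = {
    "ResNet50": {"size_mb": 114, "flops_m": 88, "params_m": 25},
    "ResNet18": {"size_mb": 48, "flops_m": 38, "params_m": 11},
    "SqueezeNet1_1": {"size_mb": 14, "flops_m": 4, "params_m": 1},
}

def relative_to(baseline, table):
    """Express every metric of every model as a multiple of the baseline model."""
    base = table[baseline]
    return {
        name: {key: round(value / base[key], 1) for key, value in metrics.items()}
        for name, metrics in table.items()
    }

ratios = relative_to("SqueezeNet1_1", results)
for name, metrics in ratios.items():
    print(name, metrics)
```

By these figures, ResNet50 is roughly 8 times larger on disk than SqueezeNet1_1 and needs roughly 22 times the FLOPs.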

MemWatcher has the limitation of reporting the combined RAM usage of the notebook, so the more variables the notebook has defined above, the more RAM is reported. The most representative runs were done by restarting the kernel and running just one model, and these are the results you see in the table. They will certainly be different when the kernel is committed. If I get more time, I will try to do multiple runs of the same model so we can get a better result.
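
Repeating a measurement several times could look something like the sketch below. It uses the standard library's `tracemalloc` instead of MemWatcher, which is a swapped-in technique: `tracemalloc` only sees memory allocated through Python's allocator, so it would miss most of the native memory PyTorch uses for tensors. The `toy_workload` function is a hypothetical stand-in for `learn.predict(img)`. The point is only the pattern of isolating each run and averaging the results.

```python
import statistics
import tracemalloc

def peak_python_memory(fn, runs=3):
    """Run fn several times and return the peak traced allocation in bytes for each run."""
    peaks = []
    for _ in range(runs):
        tracemalloc.start()  # start from a clean trace for every run
        try:
            fn()
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        peaks.append(peak)
    return peaks

def toy_workload():
    # Hypothetical stand-in for learn.predict(img): allocate about 1 MB
    buffer = bytearray(1_000_000)
    return len(buffer)

peaks = peak_python_memory(toy_workload, runs=3)
print('Mean peak (MB):', statistics.mean(peaks) / 1e6)
```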

Thanks for reading!

In [None]:
plt.plot([14,48,114], [4,38,88])
plt.xlabel('Model Size (MB)')
plt.ylabel('FLOPs (M)')
plt.title('Model Size vs FLOPs')