# Fluffy recognition model

Trying to build a model that recognises whether an image has something fluffy in it.

## Prepare environment

Install and import the fastai and fastbook libraries.

In [1]:
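# Pin fastai to 2.5, then install fastbook
!pip install fastai==2.5
!pip install -Uqq fastbook

import fastbook
from fastbook import *
from fastai.vision.all import *   # fastai's vision API: untar_data, ImageDataLoaders, cnn_learner, ...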

Collecting fastai==2.5
  Downloading fastai-2.5.0-py3-none-any.whl (188 kB)
[?25l[K     |█▊                              | 10 kB 22.9 MB/s eta 0:00:01[K     |███▌                            | 20 kB 9.1 MB/s eta 0:00:01[K     |█████▏                          | 30 kB 7.8 MB/s eta 0:00:01[K     |███████                         | 40 kB 7.2 MB/s eta 0:00:01[K     |████████▊                       | 51 kB 5.3 MB/s eta 0:00:01[K     |██████████▍                     | 61 kB 5.7 MB/s eta 0:00:01[K     |████████████▏                   | 71 kB 5.5 MB/s eta 0:00:01[K     |█████████████▉                  | 81 kB 6.2 MB/s eta 0:00:01[K     |███████████████▋                | 92 kB 5.0 MB/s eta 0:00:01[K     |█████████████████▍              | 102 kB 5.4 MB/s eta 0:00:01[K     |███████████████████             | 112 kB 5.4 MB/s eta 0:00:01[K     |████████████████████▉           | 122 kB 5.4 MB/s eta 0:00:01[K     |██████████████████████▋         | 133 kB 5.4 MB/s eta 0:00:01[

Download images

In [2]:
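# Archive of labelled images; untar_data downloads and extracts it and returns the extracted folder's path
IMG_URL = "https://github.com/mihailthebuilder/fluffy-nb/raw/main/fluffy-images.tar.xz"
path = untar_data(IMG_URL)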
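`untar_data` downloads the archive into fastai's data directory (here `/root/.fastai/data`) and extracts it. The extraction step for a `.tar.xz` file can be sketched with the standard library alone; the helper name `extract_tar_xz` is hypothetical, and this is an illustration rather than fastai's own implementation.

```python
import tarfile
from pathlib import Path


def extract_tar_xz(archive: Path, dest: Path) -> Path:
    """Hypothetical helper: extract an xz-compressed tarball into `dest`."""
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:xz" opens the archive as an LZMA/xz-compressed tar file.
    # Only extract archives from sources you trust.
    with tarfile.open(archive, mode="r:xz") as tar:
        tar.extractall(dest)
    return dest
```

For example, `extract_tar_xz(Path("fluffy-images.tar.xz"), Path("data"))` would leave the archive's contents under `data/`; `untar_data` additionally takes care of the download and of not repeating work it has already done.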

Check which files were downloaded and how they are split between fluffy and not fluffy.

In [3]:
file_paths = get_image_files(path)
print(file_paths[:3])

total_files = len(file_paths)
print(f"total files - {total_files}")

# Labelling convention: a file name starting with a lowercase letter is fluffy,
# one starting with an uppercase letter is not
def is_fluffy(x): return x[0].islower()

fluffy_files = len([x for x in file_paths if is_fluffy(x.name)])
print(f"fluffy files - {fluffy_files}")

[Path('/root/.fastai/data/fluffy-images/NCFMXCIKQGPTPMMKOIUU.JPEG.jpeg.jpg'), Path('/root/.fastai/data/fluffy-images/obgpwenkqxhdnyconiut.jpg'), Path('/root/.fastai/data/fluffy-images/mnslxqrfucrfrlmyneto.jpg')]
total files - 283
fluffy files - 127
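The labels come entirely from the file-naming convention encoded in `is_fluffy`. Applying the same rule to the three file names printed above shows how each one is labelled:

```python
def is_fluffy(name: str) -> bool:
    # Same rule as the notebook: a lowercase first character means "fluffy".
    return name[0].islower()


names = [
    "NCFMXCIKQGPTPMMKOIUU.JPEG.jpeg.jpg",
    "obgpwenkqxhdnyconiut.jpg",
    "mnslxqrfucrfrlmyneto.jpg",
]
labels = {n: is_fluffy(n) for n in names}
for name, fluffy in labels.items():
    print(f"{name}: {'fluffy' if fluffy else 'not fluffy'}")
```

One caveat of this rule: a name starting with a digit or symbol is counted as not fluffy, because `str.islower()` is `False` for a string with no cased characters.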


## Establish baseline error rate

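The baseline model always predicts that an image is **not** fluffy, so its error rate is simply the share of images that are fluffy. Because fewer than half of the images are fluffy (127 of 283), this is also the best a model can do by always predicting a single class.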

In [4]:
fluffy_ratio = fluffy_files / total_files

print(f"baseline - {fluffy_ratio:.2f}")

baseline - 0.45
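The baseline figure follows directly from the counts printed earlier. A minimal check of the arithmetic, with the counts hard-coded from the output above, confirms that "always not fluffy" is the majority-class guess and that its error is 127/283:

```python
total_files = 283
fluffy_files = 127
not_fluffy_files = total_files - fluffy_files  # 156

# Always predicting "not fluffy" is wrong exactly on the fluffy images.
baseline_error = fluffy_files / total_files
# A majority-class baseline is wrong on the smaller class, whichever it is.
majority_error = min(fluffy_files, not_fluffy_files) / total_files

print(f"baseline error - {baseline_error:.4f}")  # 0.4488
```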


## Prepare data

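Build the `DataLoaders` for training: hold out a random 20% of the images (with a fixed seed) for validation, label each file with `is_fluffy`, and resize every image to 500×500 pixels.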

In [5]:
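dls = ImageDataLoaders.from_name_func(
    path, file_paths,
    valid_pct=0.2, seed=42,   # repeatable random 20% validation split
    label_func=is_fluffy,     # label each file True/False from its name
    item_tfms=Resize(500))    # resize each image to 500x500 (cropping by default)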
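`valid_pct=0.2` holds out a random 20% of the files for validation, and `seed=42` makes that split repeatable. fastai's `RandomSplitter` sizes the validation set as `int(valid_pct * len(items))`, which for 283 files gives 56 validation images and 227 training images. The sketch below reproduces only the sizes and the disjointness of the two sets; it shuffles with Python's `random` module instead of PyTorch, so it will not pick the same files as fastai, and `random_split` is a hypothetical name.

```python
import random


def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle indices and hold out int(valid_pct * n) of them for validation."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)  # seeded, so the split is repeatable
    cut = int(valid_pct * len(items))
    valid = [items[i] for i in idx[:cut]]
    train = [items[i] for i in idx[cut:]]
    return train, valid


files = [f"img_{i}.jpg" for i in range(283)]  # stand-in for file_paths
train, valid = random_split(files)
print(len(train), len(valid))  # 227 56
```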

## Train model

In [9]:
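# Pretrained ResNet-34 with a new classification head; track error rate on the validation set
learn = cnn_learner(dls, resnet34, metrics=error_rate)
# fine_tune(2): train only the new head for 1 epoch, then unfreeze and train the whole network for 2 epochs
learn.fine_tune(2)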

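| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 1.468605 | 0.404443 | 0.178571 | 00:06 |

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.468031 | 0.209574 | 0.089286 | 00:07 |
| 1 | 0.337211 | 0.188692 | 0.053571 | 00:08 |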


Across runs, the final error rate usually lands somewhere between 3% and 7%; this run finished at about 5.4%, far below the 45% baseline.
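Because the validation set holds 56 images (20% of 283, rounded down), each logged `error_rate` is a whole number of mistakes divided by 56. Converting the values from the training log back into counts:

```python
valid_size = int(0.2 * 283)  # 56 validation images

logged_error_rates = [0.178571, 0.089286, 0.053571]  # from the training log
mistakes = [round(rate * valid_size) for rate in logged_error_rates]
print(mistakes)  # [10, 5, 3]

baseline_error = 127 / 283
print(f"final error {logged_error_rates[-1]:.1%} vs baseline {baseline_error:.1%}")
```

So the final model misclassifies 3 of the 56 validation images, which is also why error rates move in steps of roughly 1.8 percentage points.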

## Try out model

Upload your image

In [None]:
uploader = widgets.FileUpload()
uploader

FileUpload(value={}, description='Upload')

Apply the model to the image

In [None]:
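# `uploader.data` is a list of the uploaded files' raw bytes (ipywidgets 7 API)
img = PILImage.create(uploader.data[0])
# predict returns (decoded label, label index, class probabilities)
fluffy,_,probs = learn.predict(img)
print(f"Is this fluffy?: {fluffy}.")
# the labels are sorted as [False, True], so probs[1] is the probability of "fluffy"
print(f"Probability it's fluffy: {probs[1].item():.6f}")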

Is this fluffy?: True.
Probability it's fluffy: 0.646041
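`learn.predict` returns the decoded label, its index in the vocabulary, and the class probabilities. Since `is_fluffy` returns booleans, the vocabulary is the sorted pair `[False, True]`, which is why `probs[1]` is the probability of fluffy. The hypothetical `predict_from_logits` below mimics that return shape in plain Python, using a softmax over two made-up logits; it illustrates the indexing and is not fastai's code.

```python
import math

VOCAB = sorted({True, False})  # [False, True]: index 1 means "fluffy"


def predict_from_logits(logits, vocab=VOCAB):
    """Hypothetical stand-in returning (label, index, probabilities)."""
    # Softmax: exponentiate (shifted by the max for numerical stability), then normalise.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)  # most likely class
    return vocab[idx], idx, probs


label, idx, probs = predict_from_logits([0.2, 0.8])
print(f"Is this fluffy?: {label}. Probability it's fluffy: {probs[1]:.3f}")
```

With two classes the predicted label flips exactly when `probs[1]` crosses 0.5, so a probability such as 0.646 is a fairly uncertain "fluffy".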


## Previous experiments
- 12.10.2021 - resnet34, 224-pixel images - tried different numbers of epochs; 2 was the best, with error rates between 5% and 8%
- 12.10.2021 - resnet34 with 500-pixel images improved error rates to 3%-6%; 2 epochs still seem the best

## Other notes
- can't use a ResNet with more than 34 layers together with 500-pixel images, as the GPU runs out of memory
- a bug relating to file names made the results from before 12.10.2021 useless
- consider trying EfficientNetV2; it's state of the art and I've already built it for experiments on 11.10.2021
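A rough way to see why 500-pixel inputs strain GPU memory: the convolutional activations kept for backpropagation grow roughly in proportion to the number of input pixels, and a 500×500 image has about five times as many pixels as a 224×224 one. This is a back-of-the-envelope estimate that ignores the weights and optimizer state, which do not change with image size; `pixel_ratio` is a hypothetical helper.

```python
def pixel_ratio(new_side: int, old_side: int) -> float:
    """Ratio of pixel counts between two square image sizes."""
    return (new_side * new_side) / (old_side * old_side)


ratio = pixel_ratio(500, 224)
print(f"500px images have {ratio:.2f}x the pixels of 224px images")  # 4.98x
```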