# `001-adjust-hyperparameters-1`

Task: change some basic hyperparameters of notebook 000

## Setup

In [1]:
```python
# Set up fastai if needed.
try: import fastbook
except ImportError:
    import subprocess, sys
    # Install into the same interpreter that is running this notebook.
    subprocess.run([sys.executable, '-m', 'pip', 'install', '-Uq', 'fastbook'])

# Import fastai code.
from fastai.vision.all import *

# Set a seed for reproducibility.
set_seed(12345, reproducible=True)
```

## Task

Starting with the basic classifier of notebook `000`, report the effect on validation accuracy of each of the following changes:

* Hold out 90% (instead of 20%) of the data for validation. (How many images will the training set have now?) How does the accuracy compare?
* Use the *breed* of the dog/cat as the target (the breed is the `.name` of the file, up until the last underscore). *Peek at chapter 5 if you can't figure out how to do this.*

For each, run two trials, each with a different seed value passed to `set_seed`.
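The training-set size for each hold-out fraction can be worked out directly. Below is a minimal stdlib sketch of a seeded random split. `split_sizes` and `random_split` are hypothetical helpers, not fastai functions; they assume the validation count is truncated as `int(valid_pct * n)`, which is our understanding of how fastai's `RandomSplitter` computes the cut. The shuffle uses Python's `random`, so the specific indices will not match fastai's, only the sizes. The image count 7,390 comes from the `(#7390)` output in the solution. The seed comparison at the end shows why changing the seed gives a different trial: the same seed reproduces the same split, a different seed reshuffles.

```python
import random

def split_sizes(n_items: int, valid_pct: float) -> tuple[int, int]:
    """Return (n_train, n_valid), assuming the validation count is int(valid_pct * n)."""
    n_valid = int(valid_pct * n_items)
    return n_items - n_valid, n_valid

def random_split(n_items: int, valid_pct: float, seed: int) -> tuple[list[int], list[int]]:
    """Hypothetical helper: shuffle indices with a seeded RNG, then cut off the validation part."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)  # private RNG, so global random state is untouched
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]  # (train, valid)

n_images = 7390  # from the (#7390) output in the solution

for pct in (0.2, 0.9):
    n_train, n_valid = split_sizes(n_images, pct)
    print(f"valid_pct={pct}: {n_train} train / {n_valid} valid")

train_a, _ = random_split(n_images, 0.9, seed=42)
train_b, _ = random_split(n_images, 0.9, seed=42)
train_c, _ = random_split(n_images, 0.9, seed=6)
print(train_a == train_b)  # same seed: identical split
print(train_a == train_c)  # different seed: a different split
```

Under these assumptions, `valid_pct=0.9` leaves 739 training images and 6,651 validation images, compared with 5,912 and 1,478 at `valid_pct=0.2`.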

## Solution

In [2]:
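```python
path = untar_data(URLs.PETS)/'images'

def is_cat(x):
    # Cat breeds are capitalized in the file names; dog breeds are lowercase.
    return "cat" if x[0].isupper() else "dog"

# Aside: fastai's `L` list (from fastcore) has `attrgot`, which collects one
# attribute from every item; that makes labelling easy.
# (Only the last expression in a cell is displayed, so this line shows nothing.)
get_image_files(path).attrgot('name')

def get_breed(x): return x.rsplit('_', 1)[0]  # everything before the last underscore

(
    get_image_files(path)
    .attrgot('name')
    .map(get_breed)
)
```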

```
(#7390) ['japanese_chin','scottish_terrier','american_bulldog','Russian_Blue','leonberger','Sphynx','Maine_Coon','boxer','British_Shorthair','american_pit_bull_terrier'...]
```
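The two labelling functions can be checked without downloading the dataset. This sketch uses plain `pathlib` paths in place of `get_image_files` output; the file names are made-up examples following the `<breed>_<number>.jpg` pattern implied by the output above, and the list comprehension stands in for `.attrgot('name')`. Like `is_cat`, it relies on cat breeds being capitalized in the file names.

```python
from collections import Counter
from pathlib import PurePosixPath

def is_cat(x: str) -> str:
    # Cat breeds are capitalized in this dataset's file names; dog breeds are lowercase.
    return "cat" if x[0].isupper() else "dog"

def get_breed(x: str) -> str:
    # Keep everything before the last underscore: "Russian_Blue_12.jpg" -> "Russian_Blue".
    return x.rsplit('_', 1)[0]

# Hypothetical file names in the <breed>_<number>.jpg pattern (not real dataset files).
files = [PurePosixPath('images') / n for n in (
    'japanese_chin_3.jpg', 'Russian_Blue_12.jpg',
    'american_pit_bull_terrier_7.jpg', 'Russian_Blue_40.jpg',
)]

names = [f.name for f in files]   # plain-Python equivalent of .attrgot('name')
breeds = [get_breed(n) for n in names]
species = [is_cat(n) for n in names]
print(breeds)
print(species)
print(Counter(breeds))            # number of examples per breed label
```

Splitting on the *last* underscore matters because multi-word breeds such as `american_pit_bull_terrier` contain underscores themselves.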

In [3]:
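```python
# Using 90% of images for validation and is_cat() (round 1)
# `seed` controls which images land in the validation set.
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.9, seed=42,
    label_func=is_cat, item_tfms=Resize(224)
)

# `cnn_learner` was renamed `vision_learner` in fastai 2.7; the old name still works with a deprecation warning.
learn = cnn_learner(dls, resnet18, metrics=error_rate)

# fine_tune(1): one epoch training only the new head (pretrained body frozen),
# then one epoch with all layers unfrozen, hence the two result tables.
learn.fine_tune(1)
```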

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.729838 | 0.139915 | 0.050669 | 00:26 |

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.109671 | 0.164134 | 0.05969 | 00:28 |


In [4]:
```python
# Using 90% of images for validation and is_cat() (round 2)
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.9, seed=6,
    label_func=is_cat, item_tfms=Resize(224)
)

learn = cnn_learner(dls, resnet18, metrics=error_rate)

learn.fine_tune(1)
```

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.542902 | 0.104198 | 0.035634 | 00:26 |

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.0945 | 0.126817 | 0.0427 | 00:28 |


In [5]:
```python
# Using 20% of images for validation and get_breed() (round 1)
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=get_breed, item_tfms=Resize(224)
)

learn = cnn_learner(dls, resnet18, metrics=error_rate)

learn.fine_tune(1)
```

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 1.566645 | 0.390776 | 0.124493 | 01:01 |

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.478852 | 0.300433 | 0.096076 | 01:20 |


In [6]:
```python
# Using 20% of images for validation and get_breed() (round 2)
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=6,
    label_func=get_breed, item_tfms=Resize(224)
)

learn = cnn_learner(dls, resnet18, metrics=error_rate)

learn.fine_tune(1)
```

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 1.534847 | 0.407924 | 0.124493 | 01:01 |

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.495951 | 0.299078 | 0.101489 | 01:20 |


## Analysis

**Report the effect of each change: (1) Is the original or modified classifier more accurate? (2) What is your confidence in that conclusion?**

1. The original classifier was more accurate.

2. I'm very confident in this conclusion, because the original classifier has a much larger training set. Holding out 90% of the images for validation leaves only 10% (739 of the 7,390 images) for training, versus 5,912 images with the original 20% hold-out, so the model has far less data to learn from; its final error rates in the two trials were 5.97% and 4.27%. The original classifier is also more accurate than the breed classifier, which must distinguish 37 breeds instead of 2 classes with only about 160 training images per breed on average; its final error rates in the two trials were 9.61% and 10.15%.
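As a quick check on the numbers behind this conclusion, the sketch below turns the final-epoch `error_rate` values from the training logs into accuracies and approximate misclassification counts. The validation-set sizes (6,651 and 1,478) are an assumption: they take the split size to be `int(valid_pct * 7390)`. The baseline from notebook `000` is not included, since its numbers are not part of this notebook.

```python
# Final-epoch error_rate values copied from the training logs above.
# valid_size is assumed to be int(valid_pct * 7390).
results = {
    "90% valid, is_cat": {"valid_size": 6651, "error_rates": [0.05969, 0.0427]},
    "20% valid, get_breed": {"valid_size": 1478, "error_rates": [0.096076, 0.101489]},
}

summary = {}
for setting, r in results.items():
    accuracies = [1 - e for e in r["error_rates"]]
    # error_rate * validation size should be close to a whole number of images
    n_wrong = [round(e * r["valid_size"]) for e in r["error_rates"]]
    mean_acc = sum(accuracies) / len(accuracies)
    summary[setting] = (mean_acc, n_wrong)
    print(f"{setting}: accuracies {[round(a, 4) for a in accuracies]}, "
          f"mean {mean_acc:.4f}, misclassified {n_wrong}")
```

Each product `error_rate * valid_size` lands within a few thousandths of a whole number, which is consistent with the assumed validation-set sizes.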