# Bark or Not Bark?

Partitioning the source images for training yields a very large number of patches. For reference, the 3000x4000 images I was taking in Pittsburgh produce 196 patches of 250x250 *each*. Due to their sizes, a great many of these patches contain extraneous data that will only interfere in training. Often, they are mostly sky, asphalt, grass, or other things that were in the general vicinity of the tree. I foolishly thought it would be possible to manually sort these images, and in doing so created a dataset of ~30K binary images. At that point, it made sense to train a binary classifier to apply it to the rest of the images.

First, we import standard libraries. The constellation of stuff from `torch` is infrastructure for training the model, we use a pretrained EfficientNet B4 provided by `timm`, and we need `Image` to actually load images.

In [None]:
import torch
from torch import nn, optim, utils
from torch.optim import lr_scheduler
from torchvision import transforms, models, datasets
from torch.utils.data import Dataset, DataLoader
import timm

from PIL import Image

import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

The good thing about this dataset is that it was relatively easy to train. Without changing many parameters, I was able to train one to 93% accuracy with a pronounced bias toward rejecting images, which is suitable for my purposes.

In [1]:
EPOCHS = 25
BATCH_SIZE = 4
N_CLASSES = 2

# Loading the Data

Here we load the patches. Notice that we are getting the 250x250 images. Something that I didn't anticipate and found a bit strange looking back was that the sorter trained on 500x500 images was unusable when trained on images of 250x250. I'm wondering now if this would be the case the other way around, and the information theoretical implications of this in general.

In [2]:
dataset_folder = "dataset/reject_accept_data/250x250/"
train_folder = dataset_folder + "train/"
test_folder = dataset_folder + "test/"
#train_file = dataset_folder + "processed.csv"

Here we define some basic image transforms and actually load the data. If you're playing along at home, see what happens when you don't add a `Normalize()`.

In [4]:
train_transforms = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.0406],
                        std=[0.229, 0.224, 0.225])
])

train_data = datasets.ImageFolder(root=train_folder, transform=train_transforms)
train_data_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True, num_workers=1)

test_data = datasets.ImageFolder(root=test_folder, transform=train_transforms)
test_data_loader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=True, num_workers=1)

In [5]:
train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_data, batch_size=BATCH_SIZE, shuffle=False)

# Setting Up the Model

First, it's always a good idea to get a sense of the variety of models that are on offer in `timm`.

In [6]:
timm.list_models(pretrained=True)

['adv_inception_v3',
 'bat_resnext26ts',
 'beit_base_patch16_224',
 'beit_base_patch16_224_in22k',
 'beit_base_patch16_384',
 'beit_large_patch16_224',
 'beit_large_patch16_224_in22k',
 'beit_large_patch16_384',
 'beit_large_patch16_512',
 'beitv2_base_patch16_224',
 'beitv2_base_patch16_224_in22k',
 'beitv2_large_patch16_224',
 'beitv2_large_patch16_224_in22k',
 'botnet26t_256',
 'cait_m36_384',
 'cait_m48_448',
 'cait_s24_224',
 'cait_s24_384',
 'cait_s36_384',
 'cait_xs24_384',
 'cait_xxs24_224',
 'cait_xxs24_384',
 'cait_xxs36_224',
 'cait_xxs36_384',
 'coat_lite_mini',
 'coat_lite_small',
 'coat_lite_tiny',
 'coat_mini',
 'coat_tiny',
 'coatnet_0_rw_224',
 'coatnet_1_rw_224',
 'coatnet_bn_0_rw_224',
 'coatnet_nano_rw_224',
 'coatnet_rmlp_1_rw_224',
 'coatnet_rmlp_2_rw_224',
 'coatnet_rmlp_nano_rw_224',
 'coatnext_nano_rw_224',
 'convit_base',
 'convit_small',
 'convit_tiny',
 'convmixer_768_32',
 'convmixer_1024_20_ks9_p14',
 'convmixer_1536_20',
 'convnext_atto',
 'convnext_atto_

I suspect a great many of these would suffice, but for the sake of familiarity, we'll just be using an EfficientNet B4. This is a good general-purpose architecture built with ResNets, and the weights at the end are small.

In [7]:
model = timm.create_model("tf_efficientnet_b4_ns", pretrained=True, num_classes=N_CLASSES)
# Change this to the actual sorter on Beatrice
model.load_state_dict(torch.load("reject_accept_sorter_500x500.bin"))
model.to(device)

EfficientNet(
  (conv_stem): Conv2dSame(3, 48, kernel_size=(3, 3), stride=(2, 2), bias=False)
  (bn1): BatchNormAct2d(
    48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
    (drop): Identity()
    (act): SiLU(inplace=True)
  )
  (blocks): Sequential(
    (0): Sequential(
      (0): DepthwiseSeparableConv(
        (conv_dw): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)
        (bn1): BatchNormAct2d(
          48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): SiLU(inplace=True)
        )
        (se): SqueezeExcite(
          (conv_reduce): Conv2d(48, 12, kernel_size=(1, 1), stride=(1, 1))
          (act1): SiLU(inplace=True)
          (conv_expand): Conv2d(12, 48, kernel_size=(1, 1), stride=(1, 1))
          (gate): Sigmoid()
        )
        (conv_pw): Conv2d(48, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn2): BatchNormAct2d(
          24, ep

And to set up the model parameters, we're using cross-entropy loss, training with an Adam optmizer, and using a scheduler that decreases the learning rate at 200, 400, and 600 images. Pretty standard.

In [8]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[200, 400, 600], gamma=0.5)
#scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0)

# Training and Testing

In [11]:
for epoch in range(EPOCHS):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device, dtype=torch.float)
        labels = labels.to(device, dtype=torch.long)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print('Epoch [%d/%d], Iter [%d/%d] Loss: %.4f' % (epoch+1, EPOCHS, i+1, len(train_data)//BATCH_SIZE, loss.item()))
        
    if (epoch+1) % 5 == 0:
        torch.save(model.state_dict(), f"reject_accept_sorter_250x250-epoch{epoch}.bin")
        print(f"Saved at epoch {epoch}")
        

Epoch [1/25], Iter [100/247] Loss: 0.1688
Epoch [1/25], Iter [200/247] Loss: 0.0699
Epoch [2/25], Iter [100/247] Loss: 0.0265
Epoch [2/25], Iter [200/247] Loss: 0.0709
Epoch [3/25], Iter [100/247] Loss: 0.3459
Epoch [3/25], Iter [200/247] Loss: 0.0770
Epoch [4/25], Iter [100/247] Loss: 0.0038
Epoch [4/25], Iter [200/247] Loss: 0.0560
Epoch [5/25], Iter [100/247] Loss: 0.0019
Epoch [5/25], Iter [200/247] Loss: 0.0067
Saved at epoch 4
Epoch [6/25], Iter [100/247] Loss: 0.0229
Epoch [6/25], Iter [200/247] Loss: 0.0015
Epoch [7/25], Iter [100/247] Loss: 0.0119
Epoch [7/25], Iter [200/247] Loss: 0.0508
Epoch [8/25], Iter [100/247] Loss: 0.0078
Epoch [8/25], Iter [200/247] Loss: 0.0018
Epoch [9/25], Iter [100/247] Loss: 0.0035
Epoch [9/25], Iter [200/247] Loss: 0.0030
Epoch [10/25], Iter [100/247] Loss: 0.0525
Epoch [10/25], Iter [200/247] Loss: 0.0000
Saved at epoch 9
Epoch [11/25], Iter [100/247] Loss: 0.0076
Epoch [11/25], Iter [200/247] Loss: 0.0007
Epoch [12/25], Iter [100/247] Loss: 0.

I've gotten into the habit of saving and loading the weights at this point. It's convenient for trying different things in Jupyter notebooks, such as when I reload one and just want to test instead of training something again.

In [14]:
#torch.save(model.state_dict(), "test.bin")

In [17]:
model.load_state_dict(torch.load("reject_accept_sorter_250x250-epoch9.bin"))
model.to(device)

EfficientNet(
  (conv_stem): Conv2dSame(3, 48, kernel_size=(3, 3), stride=(2, 2), bias=False)
  (bn1): BatchNormAct2d(
    48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
    (drop): Identity()
    (act): SiLU(inplace=True)
  )
  (blocks): Sequential(
    (0): Sequential(
      (0): DepthwiseSeparableConv(
        (conv_dw): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)
        (bn1): BatchNormAct2d(
          48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): SiLU(inplace=True)
        )
        (se): SqueezeExcite(
          (conv_reduce): Conv2d(48, 12, kernel_size=(1, 1), stride=(1, 1))
          (act1): SiLU(inplace=True)
          (conv_expand): Conv2d(12, 48, kernel_size=(1, 1), stride=(1, 1))
          (gate): Sigmoid()
        )
        (conv_pw): Conv2d(48, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn2): BatchNormAct2d(
          24, ep

In [18]:
model.eval()
correct = 0
total = 0
for images, labels, in test_loader:
    images = images.to(device, dtype=torch.float)
    labels = labels.to(device, dtype=torch.long)
    outputs = model(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()
print('Test accuracy: %.6f%%' % (100.0*correct/total))

Test accuracy: 90.900002%
