# Building a State-of-the-Art Bacterial Classifier with fast.ai

Students of the fast.ai course have used the library to produce state-of-the-art results on a wide range of tasks. Examples given in the course include environmental sound classification and handwritten Devanagari character prediction.

In this notebook we'll use the library for state-of-the-art bacteria classification with the [DIBaS dataset](http://misztal.edu.pl/software/databases/dibas/). DIBaS (Digital Image of Bacterial Species) contains 660 images covering 33 different genera and species of bacteria.

You can also check out the full blog post on [Building a Bacterial Classifier with fast.ai](https://blog.paperspace.com/building-a-state-of-the-art-bacterial-classifier-with-paperspace-gradient-and-fast-ai/) by Harsh Sikka.

## Load libraries

In [None]:
!pip uninstall torch -y
!pip install torch==1.4.0 torchvision==0.5.0
!pip install fastai==1.0.60
!pip install bs4

In [None]:
import os
import requests
import urllib.request
import zipfile
import matplotlib.pyplot as plt
from fastai.vision import *
from fastai.metrics import error_rate
from bs4 import BeautifulSoup

## Download and extract bacteria dataset

In [None]:
# Create the download and extraction directories if they do not already exist
os.makedirs('./storage/dibas_zips', exist_ok=True)
os.makedirs('./storage/dibas_images', exist_ok=True)

In [None]:
# Parse the webpage; images are saved in a separate .zip file for each strain of bacteria
url = 'http://misztal.edu.pl/software/databases/dibas/'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

In [None]:
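# Collect every link on the page, skipping anchors that have no href attribute
links = [tag['href'] for tag in soup.find_all('a', href=True)]

for link in links:
    if ".zip" in link:
        # Everything after "/dibas/" in the URL is the archive's file name
        file_name = link.partition("/dibas/")[2]
        zip_path = './storage/dibas_zips/' + file_name
        urllib.request.urlretrieve(link, zip_path)
        # The context manager closes the archive even if extraction fails
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall('./storage/dibas_images/')
        print("Downloaded and extracted: " + file_name)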
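
The loop above keeps only the links that point at `.zip` archives and takes the archive name from whatever follows `/dibas/` in the URL. Here is a minimal offline sketch of that step; the `hrefs` below are hypothetical stand-ins, not the actual links on the DIBaS page.

```python
# Hypothetical hrefs for illustration; the real page may list different links.
hrefs = [
    "http://misztal.edu.pl/software/databases/dibas/Example.species.zip",
    "http://misztal.edu.pl/software/databases/dibas/",
    "#top",
]

# str.partition returns (before, separator, after);
# index 2 is the text that follows "/dibas/".
zip_names = [h.partition("/dibas/")[2] for h in hrefs if ".zip" in h]
print(zip_names)
```

Links without `.zip` are dropped before `partition` is applied, so only archive names end up in `zip_names`.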

In [None]:
# Count the extracted files
len(os.listdir('./storage/dibas_images/'))

## Verify images

In [None]:
# Delete files that cannot be opened as images and downsize any larger than 500 px
verify_images('./storage/dibas_images/', delete=True, max_size=500)

## Train our model

In [None]:
bs = 64
fnames = get_image_files('./storage/dibas_images/')
fnames[:5]

In [None]:
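np.random.seed(42)  # fix the seed so the random train/validation split is reproducible
# Capture the label: everything between the last "/" and the trailing "_<digits>.tif"
pat = r'/([^/]+)_\d+\.tif$'
data = ImageDataBunch.from_name_re('.', fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)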
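
The class label for each image comes from its file name: the pattern captures everything between the last `/` and the trailing `_<digits>.tif`. The sketch below applies the pattern with the standard `re` module so you can check what it extracts. The dot before `tif` is escaped here so it matches a literal dot, and the paths are made-up examples rather than actual DIBaS file names.

```python
import re

# The label pattern, with the dot before "tif" escaped.
pat = r'/([^/]+)_\d+\.tif$'

# Hypothetical paths for illustration only.
paths = [
    "storage/dibas_images/Genus.species_0001.tif",
    "storage/dibas_images/Other_genus_0012.tif",
]

# The first capture group is the label.
labels = [re.search(pat, p).group(1) for p in paths]
print(labels)
```

Because `[^/]+` is greedy, a name that itself contains underscores still yields the full label, and only the final `_<digits>.tif` suffix is stripped.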

In [None]:
# create_cnn is deprecated in fastai v1; cnn_learner is its current name
learn = cnn_learner(data, models.resnet34, metrics=error_rate)

In [None]:
learn.fit_one_cycle(4)
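
The `error_rate` metric reported during training is the fraction of validation images whose predicted class differs from the true label, that is, one minus accuracy. Here is a minimal pure-Python sketch with made-up scores and labels; `error_rate_sketch` is our own illustrative name, not part of fastai.

```python
def error_rate_sketch(scores, targets):
    """Fraction of rows whose highest-scoring class differs from the target."""
    preds = [row.index(max(row)) for row in scores]
    wrong = sum(p != t for p, t in zip(preds, targets))
    return wrong / len(targets)

# Made-up class scores for four images over three classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
targets = [1, 0, 1, 0]

rate = error_rate_sketch(scores, targets)
print(rate)
```

Only the third image is misclassified here, so one of the four predictions counts toward the error.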

## Find optimal learning rate

In [None]:
# Switch to a deeper ResNet-50 backbone (cnn_learner replaces the deprecated create_cnn)
learn = cnn_learner(data, models.resnet50, metrics=error_rate)
learn.lr_find()
learn.recorder.plot()

In [None]:
learn.fit_one_cycle(8)

In [None]:
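# Save the weights from the frozen stage so we can return to them later
learn.save('stage-1-50')
# Unfreeze all layers and fine-tune, with smaller learning rates for the earlier layers
learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-6,1e-4))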
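
Passing `max_lr=slice(1e-6,1e-4)` trains the earliest layers with the smallest rate and the final layers with the largest. To our understanding, fastai v1 spreads the rates in between evenly on a log scale across the learner's layer groups. The sketch below reproduces that spacing in plain Python, assuming three layer groups; `spread_lrs` is a hypothetical helper, not a fastai function.

```python
def spread_lrs(start, stop, n_groups):
    """Return n_groups learning rates from start to stop, evenly spaced on a log scale."""
    if n_groups == 1:
        return [stop]
    # Constant multiplicative step between consecutive groups.
    step = (stop / start) ** (1 / (n_groups - 1))
    return [start * step ** i for i in range(n_groups)]

# Assumption: the learner has three layer groups.
lrs = spread_lrs(1e-6, 1e-4, 3)
print(lrs)
```

With these bounds each group trains roughly ten times faster than the one before it, which leaves the pretrained early layers largely intact while the later layers adapt.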