This notebook outlines the steps I took to train the Bear Classifier model, which you can see in action here: https://huggingface.co/spaces/emiranda182/bear_classifier

Outline
1. Set up the environment and import libraries
2. Download the dataset
3. Organize and verify the data
4. Create a DataBlock
5. Create a DataLoader
6. Train and evaluate the model
7. Export the trained model


___________________________________________________________________________

1. Set up the environment and import libraries

In [2]:
#hide
! [ -e /content ] && pip install -Uqq fastbook fastai duckduckgo_search
import fastbook
fastbook.setup_book()

In [3]:
#hide
from fastbook import *
from fastai.vision.widgets import *
from duckduckgo_search import DDGS

In [8]:
# Helper that returns up to max_images image URLs from a DuckDuckGo image search
def search_images(keywords, max_images=200):
    return L(DDGS().images(keywords, max_results=max_images)).itemgot('image')
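`DDGS().images` returns a list of result dictionaries, and fastcore's `L(...).itemgot('image')` pulls out the value stored under each result's `'image'` key (the full-size image URL). Here is a plain-Python equivalent, run on two made-up results instead of a real search; `extract_image_urls` is a hypothetical name used only for illustration:

```python
# Hypothetical sample shaped like DDGS().images() results (real results carry more keys)
sample_results = [
    {"title": "Grizzly bear", "image": "https://example.com/grizzly1.jpg"},
    {"title": "Grizzly in river", "image": "https://example.com/grizzly2.jpg"},
]

def extract_image_urls(results):
    """Plain-Python equivalent of L(results).itemgot('image')."""
    return [r["image"] for r in results]

urls = extract_image_urls(sample_results)
print(urls)
```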

2. Download the dataset: Search for grizzly, black, and teddy bear images, then download them into labeled folders (one folder per bear type). The download only runs if the "bears" folder doesn't already exist. Note: the original images I used to train the model are included in the repo under the "bears" folder.

In [None]:
bear_types = 'grizzly','black','teddy'
path = Path('bears')

# Download up to 150 images for each bear type (skipped if the folder already exists)
if not path.exists():
    path.mkdir()
    for o in bear_types:
        dest = (path/o)
        dest.mkdir(exist_ok=True)
        results = search_images(f'{o} bear', max_images=150)
        download_images(dest, urls=results)

3. Organize and verify the data: List the downloaded images to confirm each one sits in its bear type's subfolder, then find any files that can't be opened as images (failed or corrupted downloads) and delete them.

In [None]:
fns = get_image_files(path)
fns
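To check that every image landed in the right subfolder, it helps to count files per label folder. The sketch below does that with the standard library only, on a throwaway directory tree shaped like bears/<label>/<image>. The extension set is an assumption: fastai's get_image_files instead uses every extension that Python's mimetypes module maps to an image type. `count_images_per_label` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path
import tempfile

# Extensions treated as images in this sketch (an assumption, not fastai's exact list)
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp"}

def count_images_per_label(root):
    """Recursively find image files under root and count them by parent folder name."""
    files = [p for p in Path(root).rglob("*") if p.is_file() and p.suffix.lower() in IMAGE_EXTS]
    return Counter(p.parent.name for p in files)

# Build a tiny tree shaped like bears/<label>/<image> to try it on
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "bears"
    for label, n in [("grizzly", 3), ("black", 2), ("teddy", 1)]:
        (root / label).mkdir(parents=True)
        for i in range(n):
            (root / label / f"{i:08d}.jpg").touch()
    counts = count_images_per_label(root)

print(counts)
```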

In [None]:
# Find files that can't be opened as images, then delete them
failed = verify_images(fns)
failed.map(Path.unlink);
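verify_images tries to open each file as an image and returns the ones that fail; failed.map(Path.unlink) then deletes them. As a rough, stdlib-only stand-in, the sketch below flags files whose first bytes don't match a known JPEG, PNG, GIF, or WebP signature. This is a swapped-in, weaker check: unlike fastai it doesn't decode the image, so a truncated JPEG with a valid header would still pass. `looks_like_image` and `find_bad_files` are hypothetical names.

```python
from pathlib import Path
import tempfile

# Leading "magic bytes" of a few common image formats
SIGNATURES = [
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"GIF87a", b"GIF89a",   # GIF
]

def looks_like_image(path):
    """Return True if the file starts with a known image signature (header check only)."""
    with open(path, "rb") as f:
        head = f.read(12)
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":  # WebP container
        return True
    return any(head.startswith(sig) for sig in SIGNATURES)

def find_bad_files(paths):
    """Return the files that don't look like images."""
    return [p for p in paths if not looks_like_image(p)]

with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.jpg"
    bad = Path(tmp) / "bad.jpg"
    good.write_bytes(b"\xff\xd8\xff\xe0" + b"\x00" * 16)  # JPEG header plus padding
    bad.write_bytes(b"<html>Not an image</html>")          # e.g. an error page saved as .jpg
    failed_sketch = find_bad_files([good, bad])
    failed_names = [p.name for p in failed_sketch]
    for p in failed_sketch:  # delete them, like failed.map(Path.unlink)
        p.unlink()
    bad_exists = bad.exists()

print(failed_names)
```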

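4. Create a DataBlock: Define how the images are labeled (by the name of their parent folder), split into training and validation sets (a random 20% held out for validation, with a fixed seed so the split is reproducible), and preprocessed (resized to 128x128 pixels).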

In [None]:
bears = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))
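The labeling and splitting rules above can be sketched in plain Python. `parent_label_sketch` mirrors what parent_label does (take the parent folder's name), and `random_split_sketch` follows the same shape as RandomSplitter (shuffle the indices, cut off the first valid_pct of them for validation). Both names are hypothetical, and because this uses Python's `random` rather than PyTorch's generator, the same seed will not reproduce fastai's exact split. The file list is made up.

```python
import random
from pathlib import Path

def parent_label_sketch(path):
    """Label = name of the folder containing the file, e.g. bears/grizzly/x.jpg -> 'grizzly'."""
    return Path(path).parent.name

def random_split_sketch(items, valid_pct=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out valid_pct of them for validation."""
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * len(items))
    return idxs[cut:], idxs[:cut]  # (train indices, valid indices)

# 10 made-up files per bear type
files = [f"bears/{label}/{i}.jpg" for label in ("grizzly", "black", "teddy") for i in range(10)]
train_idx, valid_idx = random_split_sketch(files)
labels = [parent_label_sketch(f) for f in files]
print(len(train_idx), len(valid_idx))     # 24 6
print(labels[0], labels[10], labels[20])  # grizzly black teddy
```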

5. Create a DataLoader: Add data augmentation to the DataBlock, then use it to build the DataLoaders (one for training, one for validation) that feed batches of images to the model during training.

In [None]:
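# Replace Resize(128) with random crops and add batch-level augmentation
bears = bears.new(
    item_tfms=RandomResizedCrop(224, min_scale=0.5),  # random crop covering 50-100% of the image, resized to 224x224
    batch_tfms=aug_transforms(mult=4))                # flips, rotation, zoom, lighting, warp; mult=4 strengthens rotation, lighting and warp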
dls = bears.dataloaders(path)
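To see what RandomResizedCrop(224, min_scale=0.5) does to each image, here is a stdlib sketch of how such a crop box is typically chosen: pick a target area between 50% and 100% of the image, pick an aspect ratio between 3/4 and 4/3 (sampled on a log scale), and retry until the box fits. The chosen region would then be resized to 224x224. This follows the common RandomResizedCrop recipe rather than fastai's exact code; in particular, the whole-image fallback after 10 failed attempts is a simplification. `random_resized_crop_box` is a hypothetical helper.

```python
import math
import random

def random_resized_crop_box(width, height, min_scale=0.5, ratio=(3/4, 4/3), rng=random, attempts=10):
    """Pick a random crop (left, top, crop_w, crop_h) covering min_scale..100% of the image area."""
    area = width * height
    for _ in range(attempts):
        target_area = rng.uniform(min_scale, 1.0) * area
        # Sample the aspect ratio log-uniformly so wide and tall crops are equally likely
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            left = rng.randint(0, width - crop_w)
            top = rng.randint(0, height - crop_h)
            return left, top, crop_w, crop_h
    # Simplified fallback: use the whole image
    return 0, 0, width, height

box = random_resized_crop_box(640, 480, rng=random.Random(0))
print(box)  # this region would then be resized to 224x224
```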

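6. Train and evaluate the model: Fine-tune a pretrained model (resnet18) on the dataset. fine_tune(4) first trains only the new classification head for one epoch, then trains the whole network for 4 more epochs. After each epoch it prints the training loss, validation loss, and error rate, so you can check how well the model is performing as it learns.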

In [None]:
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)
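The error_rate metric is the fraction of validation images whose highest-scoring class isn't the true label, i.e. 1 - accuracy. A plain-Python sketch with made-up predictions for the three classes (`error_rate_sketch` is a hypothetical name):

```python
def error_rate_sketch(probs, targets):
    """Fraction of samples where argmax(probs) != target (1 - accuracy)."""
    wrong = 0
    for p, t in zip(probs, targets):
        pred = max(range(len(p)), key=p.__getitem__)  # index of the highest score
        wrong += pred != t
    return wrong / len(targets)

# Made-up outputs for classes ['black', 'grizzly', 'teddy']
probs = [
    [0.80, 0.15, 0.05],  # predicted black
    [0.10, 0.70, 0.20],  # predicted grizzly
    [0.30, 0.60, 0.10],  # predicted grizzly
    [0.05, 0.05, 0.90],  # predicted teddy
]
targets = [0, 1, 0, 2]  # the third image is actually a black bear
print(error_rate_sketch(probs, targets))  # 0.25
```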

7. Export the trained model: learn.export() saves the trained model, together with the preprocessing steps it expects, to an export.pkl file. This file is what we'll use to run inference in production on Hugging Face Spaces.

In [None]:
learn.export()

path = Path()
path.ls(file_exts='.pkl') # Should list 'export.pkl'
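On the Spaces side, the exported file is typically loaded with fastai's load_learner('export.pkl'), and learn.predict(img) returns the predicted label, its index, and the per-class probabilities. The sketch below covers only the last, framework-free step: pairing those probabilities with the class names (learn.dls.vocab) into a {label: probability} dict, which is the shape a Gradio Label output accepts. The function name `to_label_dict` and the numbers are hypothetical.

```python
def to_label_dict(vocab, probs):
    """Pair class names with probabilities, e.g. for a Gradio Label output."""
    return {label: float(p) for label, p in zip(vocab, probs)}

# In the app this would look roughly like (requires fastai):
#   learn = load_learner('export.pkl')
#   _, _, probs = learn.predict(img)
#   to_label_dict(learn.dls.vocab, probs)
vocab = ["black", "grizzly", "teddy"]  # fastai sorts category names alphabetically by default
probs = [0.02, 0.95, 0.03]              # made-up probabilities
result = to_label_dict(vocab, probs)
print(max(result, key=result.get))  # grizzly
```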