## Is it BAYC or Punks??

In [2]:
#     Kaggle requires phone verification to use the internet or a GPU. If you haven't done that yet, the cell below will fail
#    This code is only here to check that your internet is enabled. It doesn't do anything else.
#    Here's a help thread on getting your phone number verified: https://www.kaggle.com/product-feedback/135367

import socket,warnings
try:
    socket.setdefaulttimeout(1)
    socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect(('1.1.1.1', 53))
except socket.error as ex: raise Exception("STOP: No internet. Click '>|' in top right and set 'Internet' switch to on")

In [4]:
# It's a good idea to ensure you're running the latest version of any libraries you need.
# `!pip install -Uqq <libraries>` upgrades to the latest version of <libraries>
# NB: You can safely ignore any warnings or errors pip spits out about running as root or incompatibilities
import os
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')

if iskaggle:
    !pip install -Uqq fastai

Here's an outline of the image classifier we'll build:
1. Use DuckDuckGo to search for images of "Bored Ape Yacht Club (BAYC) images"
1. Use DuckDuckGo to search for images of "Cryptopunks (Punks) images"
1. Fine-tune a pretrained neural network to recognise these two groups
1. Try running this model on the BAYC picture we download in Step 1 and see if it works.

## Step 1: Download images of BAYC and Punks

In [5]:
from fastcore.all import *
import time

def search_images(term, max_images=200):
    '''Search DuckDuckGo for images related to `term` (a str) and return their URLs.'''
    url = 'https://duckduckgo.com/'
    res = urlread(url,data={'q':term})
    searchObj = re.search(r'vqd=([\d-]+)\&', res)
    requestUrl = url + 'i.js'
    params = dict(l='us-en', o='json', q=term, vqd=searchObj.group(1), f=',,,', p='1', v7exp='a')
    urls,data = set(),{'next':1}
    while len(urls)<max_images and 'next' in data:
        data = urljson(requestUrl,data=params)
        urls.update(L(data['results']).itemgot('image'))
        requestUrl = url + data['next']
        time.sleep(0.2)
    return L(urls)[:max_images]
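
The search depends on a `vqd` token that DuckDuckGo embeds in its landing page. If the regex finds nothing, `searchObj.group(1)` fails with an unhelpful `AttributeError`. Here is a minimal sketch of the same extraction with an explicit error, assuming the token keeps the `vqd=<digits and dashes>&` shape that the pattern above expects. The `extract_vqd` helper and the sample page fragment are hypothetical, for illustration only:

```python
import re

def extract_vqd(html):
    """Return the vqd token from a search landing page (hypothetical helper)."""
    match = re.search(r'vqd=([\d-]+)&', html)
    if match is None:
        # The page layout may have changed, or the request may have been blocked.
        raise ValueError("No vqd token found; the search page format may have changed")
    return match.group(1)

# Made-up page fragment, only to show the pattern matching.
sample = "nrj('/d.js?q=apes&vqd=4-1234567890&p=1')"
print(extract_vqd(sample))
```

Failing early like this makes it easier to tell a changed search page apart from a problem in the download loop.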

In [8]:
urls = search_images('Bored Ape Yacht Club', max_images=1)
# urls[0]

In [9]:
# BAYC 
from fastdownload import download_url
dest = 'bayc.jpg'
download_url(urls[0], dest, show_progress=False)

from fastai.vision.all import *
im = Image.open(dest)
im.to_thumb(256,256)

In [10]:
# Punks
download_url(search_images('cryptopunks', max_images=1)[0], 'cryptopunks.jpg', show_progress=False)
Image.open('cryptopunks.jpg').to_thumb(256,256)

Our searches seem to be giving reasonable results, so we'll grab up to 200 examples each of BAYC and Punks images, and save each group to its own folder:

In [25]:
searches = 'Bored Ape Yacht Club','Cryptopunks'
path = Path('bayc_or_punks')

if not path.exists():
    path.mkdir()
for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True)
    results = search_images(o)
    download_images(dest, urls=results)

    # remove any png images to avoid some errors
    # [os.remove(path/o/file) for file in os.listdir(path/o) if file.endswith('.png')]

    # convert all images to RGBA
    # os.listdir returns bare file names, so join them with dest before opening or saving
    for image in os.listdir(dest):
        im = Image.open(dest/image)
        im.convert("RGBA").save(dest/f"{image}2.png")

## Step 2: Train our model

Some photos might not download correctly which could cause our model training to fail, so we'll remove them:

In [26]:
failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
len(failed)

To train a model, we'll use `DataLoaders`, which is an object that contains a *training set* (the images used to create a model) and a *validation set* (the images used to check the accuracy of a model -- not used during training). 

In [13]:
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path)

dls.show_batch(max_n=6)
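
To make two of the `DataBlock` arguments concrete: `get_y=parent_label` takes each image's label from the name of the folder it sits in, and `RandomSplitter(valid_pct=0.2, seed=42)` holds out a random 20% of the files for validation. Here is a plain-Python sketch of both ideas with made-up file names. `label_from_parent` and `random_split` are hypothetical stand-ins, not fastai's implementation, and they won't reproduce fastai's exact shuffle:

```python
import random
from pathlib import Path

def label_from_parent(fname):
    """Use the parent folder name as the label (the idea behind parent_label)."""
    return Path(fname).parent.name

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out valid_pct of them."""
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * len(items))
    return idxs[cut:], idxs[:cut]  # (training indices, validation indices)

# Made-up file names mirroring the folder layout created in Step 1.
files = [Path('bayc_or_punks') / name / f'{i}.jpg'
         for name in ('Bored Ape Yacht Club', 'Cryptopunks') for i in range(5)]

train_idx, valid_idx = random_split(files)
print(label_from_parent(files[0]))
print(len(train_idx), len(valid_idx))
```

Fixing the seed means the same images land in the validation set on every run, which keeps results comparable between experiments.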

Now we're ready to train our model. One of the fastest widely used computer vision models is `resnet18`. You can train this in a few minutes, even on a CPU! (On a GPU, it generally takes under 10 seconds.)

`fastai` comes with a helpful `fine_tune()` method which automatically uses best practices for fine tuning a pre-trained model, so we'll use that.

In [20]:
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)
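
The `error_rate` metric passed to `vision_learner` is the fraction of validation images the model labels incorrectly, so a perfect validation score shows up as an error rate of 0. Here is a small sketch of that relationship, using made-up predictions and labels. The `error_rate_of` helper is hypothetical and separate from fastai's own `error_rate`:

```python
def error_rate_of(preds, targets):
    """Fraction of predictions that differ from the true labels."""
    wrong = sum(p != t for p, t in zip(preds, targets))
    return wrong / len(targets)

# Made-up labels: one of four predictions is wrong.
targets = ['bayc', 'punks', 'bayc', 'punks']
preds   = ['bayc', 'punks', 'punks', 'punks']

err = error_rate_of(preds, targets)
print(f"error rate: {err:.2f}, accuracy: {1 - err:.2f}")
```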

Generally when I run this I see 100% accuracy on the validation set (although it might vary a bit from run to run).

"Fine-tuning" a model means that we're starting with a model someone else has trained using some other dataset (called the *pretrained model*), and adjusting the weights a little bit so that the model learns to recognise our particular dataset. In this case, the pretrained model was trained to recognise photos in *ImageNet*, a widely used computer vision dataset with images covering 1000 categories.

In [22]:
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

Not exactly sure what went wrong here. It was running just fine this afternoon, and now I can only get images from one category.
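
One quick way to investigate a symptom like this is to count how many image files each class folder actually contains. If one folder is empty or missing, the `DataLoaders` will only ever see one category. Here is a minimal sketch using only the standard library, assuming the `bayc_or_punks/<class name>/` layout from Step 1. `count_images_per_class` is a hypothetical helper, and the extension list is an assumption:

```python
from collections import Counter
from pathlib import Path

IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.gif', '.webp'}  # assumed set of extensions

def count_images_per_class(root):
    """Count image files in each immediate subfolder of root."""
    counts = Counter()
    for folder in Path(root).iterdir():
        if folder.is_dir():
            counts[folder.name] = sum(
                1 for f in folder.iterdir() if f.suffix.lower() in IMAGE_EXTS)
    return counts

root = Path('bayc_or_punks')
if root.exists():
    print(count_images_per_class(root))
```

A count of zero for either class would point to the download or conversion step rather than to the model.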

In [27]:
interp.plot_top_losses(5, nrows=1)
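
`plot_top_losses` shows the validation images the model got most wrong, ranked by loss. Assuming the learner uses cross-entropy loss, the per-image loss is the negative log of the probability assigned to the true class, so confident mistakes rank highest. Here is a sketch with made-up probabilities; the file names and the `top_losses` helper are hypothetical:

```python
import math

def top_losses(items, k=5):
    """Rank (name, prob_of_true_class) pairs by cross-entropy loss, largest first."""
    scored = [(name, -math.log(p)) for name, p in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Made-up probabilities the model assigned to each image's true class.
items = [('a.jpg', 0.99), ('b.jpg', 0.40), ('c.jpg', 0.05), ('d.jpg', 0.80)]

for name, loss in top_losses(items, k=2):
    print(f"{name}: loss={loss:.3f}")
```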

## Step 3: Use our model (and build your own!)

In [28]:
from fastai.vision.widgets import *

cleaner = ImageClassifierCleaner(learn)
cleaner

Once we've removed the images flagged in the cleaner and exported the model, let's see what it thinks about the BAYC image we downloaded at the start:

In [27]:
for idx in cleaner.delete(): cleaner.fns[idx].unlink()
# for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)

In [28]:
# export learner
learn.export()
path = Path()
path.ls(file_exts='.pkl')

In [31]:
learn_inf = load_learner(path/'export.pkl')

In [32]:
learn_inf.dls.vocab

In [1]:
is_bayc,_,probs = learn.predict(PILImage.create('bayc.jpg'))

print(f"This is a: {is_bayc}.")
print(f"Probability it's a BAYC: {probs[0]:.4f}")
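
`probs[0]` is the BAYC probability only because the entries in `probs` follow the order of the vocab shown by `learn_inf.dls.vocab`, and this assumes 'Bored Ape Yacht Club' comes first there. Here is a sketch of looking the probability up by class name instead of by position, using a made-up vocab and probabilities as plain lists. With fastai these would come from `dls.vocab` and `predict`; `prob_for` is a hypothetical helper:

```python
def prob_for(label, vocab, probs):
    """Return the probability for label by finding its position in vocab."""
    return probs[list(vocab).index(label)]

# Made-up values standing in for learn_inf.dls.vocab and the predict() output.
vocab = ['Bored Ape Yacht Club', 'Cryptopunks']
probs = [0.97, 0.03]

print(f"Probability it's a BAYC: {prob_for('Bored Ape Yacht Club', vocab, probs):.4f}")
```

Looking up by name keeps the printed probability correct even if a class is later added or renamed and the vocab order shifts.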