# Creating your own dataset from Google Images

*by: Francisco Ingham and Jeremy Howard. Inspired by [Adrian Rosebrock](https://www.pyimagesearch.com/2017/12/04/how-to-create-a-deep-learning-dataset-using-google-images/)*

In this tutorial we will see how to easily create an image dataset through Google Images. **Note**: You will have to repeat these steps for any new category you want to Google (e.g. once for dogs and once for cats).

# Downloading Data

We will build a three-class dataset: teddy bears vs. grizzly bears vs. black bears.

The [google-images-download](https://github.com/hardikvasa/google-images-download) library fetches images from a Google Images search.

In [None]:
!pip install google_images_download

In [None]:
from google_images_download import google_images_download   #importing the library

In [None]:
response = google_images_download.googleimagesdownload()   #class instantiation

In [1]:
images_to_download = ["grizzly bear", "teddy bear", "black bear"]

#images_to_download = ["pizza", "pasta", "hot dog"]


In [29]:
images_to_download = sorted(images_to_download)
print (images_to_download)
num_images_to_download = 100
output_directory = "data/train"   # the cells below read the class folders from data/train
safe_search = True

['black bear', 'grizzly bear', 'teddy bear']


In [None]:
for image_class in images_to_download:
    print(f"Downloading {image_class}")
    # arguments the library will pass to Google search
    arguments = {
        "keywords": image_class,
        "limit": num_images_to_download,
        "print_urls": False,
        "output_directory": output_directory,
        "safe_search": safe_search,
        "format": "jpg",
    }

    # passing the arguments to the function
    paths = response.download(arguments)
    # printing absolute paths of the downloaded images
    #print(paths)
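
Some downloads can fail, so it is worth checking how many files actually arrived for each class before going further. The following is a minimal sketch using only the standard library. `count_images` is a hypothetical helper (not part of google-images-download), and it assumes each keyword ends up in its own subfolder of the output directory.

```python
from pathlib import Path

def count_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return {class_folder_name: number_of_image_files} for each subfolder of root."""
    root = Path(root)
    if not root.is_dir():
        return {}  # nothing has been downloaded yet
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in extensions
        )
    return counts

# Pass the same folder you used as output_directory in the download cell.
for name, n in count_images("data").items():
    print(f"{name}: {n} images")
```

A class with far fewer files than the requested limit is a hint to rerun the download for that keyword.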

In [3]:
from fastai import *
from fastai.vision import *

Choose an appropriate root folder for your labeled images. Each category is stored in its own subfolder, so you can run the download steps multiple times to grab different labels.

`path` points at that root folder, so this line only needs to run once.

In [4]:
path = Path('data')

Then we can remove any images that can't be opened:

In [5]:
image_path = path/"train"

In [6]:
image_path.ls()

[PosixPath('data/train/black bear'),
 PosixPath('data/train/teddy bear'),
 PosixPath('data/train/grizzly bear')]

In [None]:
for c in images_to_download:
    print(c)
    verify_images(image_path/c, delete=True, max_workers=8)

## View data

In [None]:
np.random.seed(42)
data = ImageDataBunch.from_folder(path, train="train", valid_pct=0.2,
        ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
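
Fixing the seed before building the `ImageDataBunch` means the random 20% validation split is the same on every run, so error rates stay comparable between runs. The idea can be sketched in plain Python. This illustrates seeded hold-out splitting in general, not fastai's internal implementation, and `split_train_valid` is a hypothetical helper.

```python
import random

def split_train_valid(items, valid_pct=0.2, seed=42):
    """Shuffle items with a fixed seed and hold out valid_pct of them for validation."""
    rng = random.Random(seed)          # private generator, global random state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]

files = [f"img_{i:03d}.jpg" for i in range(100)]   # made-up file names
train, valid = split_train_valid(files)
print(len(train), len(valid))                      # 80 20
```

Calling `split_train_valid(files)` again returns exactly the same two lists, which is the property the seed buys you.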

Good! Let's take a look at some of our pictures then.

In [None]:
data.classes

In [None]:
data.show_batch(rows=3, figsize=(7,8))

In [None]:
data.classes, data.c, len(data.train_ds), len(data.valid_ds)

## Train model

In [None]:
learn = create_cnn(data, models.resnet34, metrics=error_rate)

In [None]:
learn.fit_one_cycle(4)

In [None]:
learn.unfreeze()

In [None]:
learn.fit_one_cycle(2, max_lr=slice(3e-5,3e-4))

In [12]:
final_model_name = 'final'

In [None]:
learn.save(final_model_name)

## Interpretation

In [None]:
learn.load(final_model_name)

In [None]:
interp = ClassificationInterpretation.from_learner(learn)

In [None]:
interp.plot_confusion_matrix()
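
A confusion matrix counts, for every pair of actual and predicted class, how many validation images landed in that cell; everything off the diagonal is a mistake. A minimal sketch of that bookkeeping in plain Python follows. The labels are made up for illustration and are not results from this notebook, and `confusion_counts` is a hypothetical helper.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

# Made-up labels for illustration only.
actual    = ["black bear", "black bear", "grizzly bear", "teddy bear"]
predicted = ["black bear", "grizzly bear", "grizzly bear", "teddy bear"]

counts = confusion_counts(actual, predicted)
labels = sorted(set(actual) | set(predicted))
for a in labels:
    # one row per actual class, one column per predicted class
    print(a.ljust(14), [counts[(a, p)] for p in labels])
```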

## Putting your model in production

In [None]:
data.classes

You probably want to use CPU for inference, except at massive scale (and you almost certainly don't need to train in real time). If you don't have a GPU, that happens automatically. To test your model on CPU when a GPU is present, uncomment the following line:

In [None]:
# fastai.defaults.device = torch.device('cpu')

In [None]:
image_path.ls()[0]

In [None]:
sample_img = image_path.ls()[0].ls()[5]
sample_img

In [None]:
img = open_image(sample_img)
img

In [13]:
classes = images_to_download


In [None]:
data2 = ImageDataBunch.single_from_classes(path, classes
                                           , tfms=get_transforms()
                                           , size=224).normalize(imagenet_stats)
learn = create_cnn(data2, models.resnet34)
learn.load(final_model_name)

In [None]:
data2.classes, data2.c

In [None]:
pred_class,pred_idx,losses = learn.predict(img)
pred_class

In [None]:
losses

In [None]:
softmax = nn.Softmax(dim=-1)
probabilities = softmax(losses)
probabilities
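
Softmax turns a vector of raw scores into probabilities that are all positive and sum to 1. Here is a dependency-free sketch of the computation; the scores are made up. Whether the third value returned by `learn.predict` is already normalized may depend on your fastai version, so it is worth checking whether it already sums to 1 before applying softmax to it.

```python
import math

def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

raw_scores = [2.0, 0.5, -1.0]                    # made-up values, one per class
probs = softmax(raw_scores)
print(probs, sum(probs))
```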

In [28]:
ClassificationLearner

fastai.vision.learner.ClassificationLearner

In [27]:
type(model)

fastai.vision.learner.ClassificationLearner

In [23]:
??open_image

Signature: open_image(fn:Union[pathlib.Path, str]) -> fastai.vision.image.Image
Source:   
def open_image(fn:PathOrStr)->Image:
    "Return `Image` object created from image in file `fn`."
    x = PIL.Image.open(fn).convert('RGB')
    return Image(pil2tensor(x).float().div_(255))

In [14]:
def load_model(path, classes, model_name, architecture=models.resnet34, image_size=224):
    "Rebuild a learner for single-image inference and load the saved weights `model_name`."
    data = ImageDataBunch.single_from_classes(
        path, classes, tfms=get_transforms(), size=image_size
    ).normalize(imagenet_stats)
    learn = create_cnn(data, architecture)
    learn.load(model_name)
    return learn

In [15]:
model = load_model(path="data" ,classes=classes, model_name=final_model_name )

In [20]:
doc(ImageDataBunch.single_from_classes)

In [16]:
from io import BytesIO

In [17]:
url = "https://upload.wikimedia.org/wikipedia/commons/0/08/01_Schwarzb%C3%A4r.jpg"
# use a new name so the google_images_download instance in `response` is not overwritten
http_response = requests.get(url)
img = open_image(BytesIO(http_response.content))

In [18]:
type(img)

fastai.vision.image.Image

In [19]:
pred_class,pred_idx,losses = model.predict(img)
pred_class

'black bear'

In [None]:
losses

In [None]:
softmax = nn.Softmax(dim=-1)
probabilities = softmax(losses)
probabilities

In [None]:
learn.data.classes

In [None]:
losses.tolist()

In [None]:
losses.flatten()

In [None]:
list(map (float, losses ))
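
The conversions above turn the tensor into plain Python floats, which is the form a JSON response needs. Pairing each float with its class name and sorting gives a readable ranking. A small sketch with made-up scores follows; `rank_predictions` is a hypothetical helper, and in the notebook you would pass `learn.data.classes` and `losses` instead.

```python
def rank_predictions(class_names, scores):
    """Pair each class name with its score and sort from most to least likely."""
    return sorted(zip(class_names, map(float, scores)), key=lambda p: p[1], reverse=True)

example_classes = ["black bear", "grizzly bear", "teddy bear"]
example_scores = [0.07, 0.91, 0.02]              # made-up values for illustration
for name, score in rank_predictions(example_classes, example_scores):
    print(f"{name}: {score:.2f}")
```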

# Predict from URL


In [None]:
doc(open_image)

To serve predictions over HTTP, you might create a route something like this ([thanks](https://github.com/simonw/cougar-or-not) to Simon Willison for the structure of this code):

```python
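# `app`, `get_bytes`, `learner`, and `JSONResponse` are assumed to be set up elsewhere in the app.
@app.route("/classify-url", methods=["GET"])
async def classify_url(request):
    # download the image referenced by the `url` query parameter
    img_bytes = await get_bytes(request.query_params["url"])
    img = open_image(BytesIO(img_bytes))
    _,_,losses = learner.predict(img)
    # return every class with its score, highest first
    return JSONResponse({
        "predictions": sorted(
            zip(learner.data.classes, map(float, losses)),
            key=lambda p: p[1],
            reverse=True
        )
    })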
```

(This example is for the [Starlette](https://www.starlette.io/) web app toolkit.)
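
The route above calls a `get_bytes` helper that is not defined in this notebook. One possible stand-in, using only the standard library, is sketched below. It is an assumption about what such a helper could look like, not the code from the linked project: it runs the blocking `urllib` download in a worker thread so the event loop stays free.

```python
import asyncio
import urllib.request

async def get_bytes(url, timeout=10):
    """Fetch the raw bytes behind url without blocking the event loop."""
    def _fetch():
        # urlopen is blocking, so it is executed in a worker thread below
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _fetch)

# Offline demo: urllib also understands data: URLs, so no network is needed here.
demo = asyncio.run(get_bytes("data:text/plain;base64,aGVsbG8="))
print(demo)   # b'hello'
```

Inside a notebook an event loop is usually already running, so there you would write `await get_bytes(...)` instead of wrapping the call in `asyncio.run`.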