# DuckDuckGo Image Scraper

This was originally an image scraper notebook for creating deep learning datasets.

It has since been turned into an installable library, which is much easier to use: you can drop a few lines of code into your own notebook as you experiment.

This notebook now shows you how to use the library.

Docs are at [joedockrill.github.io/jmd_imagescraper/](https://joedockrill.github.io/jmd_imagescraper/)

Hugs & kisses, Joe Dockrill.

## Install



In [None]:
!pip install -q jmd_imagescraper

  Preparing metadata (setup.py) ... done
  Building wheel for bs4 (setup.py) ... done


## Download images

In [None]:
from pathlib import Path
root = Path().cwd()/"images"

from jmd_imagescraper.core import * # don't worry, it's designed to work with import *

duckduckgo_search(root, "Angry", "angry expression of human face", max_results=70)
duckduckgo_search(root, "Happy", "happy  expression of human face", max_results=70)
duckduckgo_search(root, "Sad", "sad  expression of human face", max_results=70)
duckduckgo_search(root, "Neutral", "neutral  expression of human face", max_results=70)

# file paths are returned so if you want to snag a list of downloaded files as you go, do this:

# images = []
# images.extend(duckduckgo_search(root, "Cats", "cute kittens", max_results=10))
# images.extend(duckduckgo_search(root, "Dogs", "cute puppies", max_results=10))
# images.extend(duckduckgo_search(root, "Birds", "cute baby ducks and chickens", max_results=10))
# images

Duckduckgo search: angry expression of human face
Downloading results into /content/images/Angry


Duckduckgo search: happy  expression of human face
Downloading results into /content/images/Happy


Duckduckgo search: sad  expression of human face
Downloading results into /content/images/Sad


Duckduckgo search: neutral  expression of human face
Downloading results into /content/images/Neutral


[PosixPath('/content/images/Neutral/071_b9cd70cb.jpg'),
 PosixPath('/content/images/Neutral/072_f975911b.jpg'),
 PosixPath('/content/images/Neutral/073_f9117d6c.jpg'),
 PosixPath('/content/images/Neutral/074_c4176952.jpg'),
 PosixPath('/content/images/Neutral/075_0e9366d3.jpg'),
 PosixPath('/content/images/Neutral/076_2f4181d3.jpg'),
 PosixPath('/content/images/Neutral/077_66849ddb.jpg'),
 PosixPath('/content/images/Neutral/078_8ad9b38f.jpg'),
 PosixPath('/content/images/Neutral/079_1de30716.jpg'),
 PosixPath('/content/images/Neutral/080_e7c5867f.jpg'),
 PosixPath('/content/images/Neutral/081_f3f3a8f1.jpg'),
 PosixPath('/content/images/Neutral/082_d963b897.jpg'),
 PosixPath('/content/images/Neutral/083_2c6f2eec.jpg'),
 PosixPath('/content/images/Neutral/084_ccdb7177.jpg'),
 PosixPath('/content/images/Neutral/085_d69f0ae9.jpg'),
 PosixPath('/content/images/Neutral/086_57219f5c.jpg'),
 PosixPath('/content/images/Neutral/087_ee149353.jpg'),
 PosixPath('/content/images/Neutral/088_7c47f635
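
After the searches finish, it can be useful to check how many files actually landed in each label folder. The following is a minimal sketch using only `pathlib`. The helper name `count_images_per_label` is hypothetical and not part of jmd_imagescraper, and it assumes one sub-folder per label under `root`, as in the output above.

```python
from pathlib import Path

def count_images_per_label(root):
    """Return {label_folder_name: number_of_files} for each sub-folder of root.

    Hypothetical helper, not part of jmd_imagescraper.
    """
    root = Path(root)
    if not root.is_dir():
        return {}  # nothing downloaded yet
    return {
        d.name: sum(1 for f in d.iterdir() if f.is_file())
        for d in sorted(root.iterdir())
        if d.is_dir()
    }

# Example: count_images_per_label(Path.cwd() / "images")
```

A large imbalance between labels is a hint to rerun one of the searches with different keywords before training on the dataset.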

## Changing params across multiple searches

In [None]:
from pathlib import Path
root = Path().cwd()/"images"

from jmd_imagescraper.core import *

# If you're going to override default params across multiple searches you can use a
# dictionary like this (so you can change search params for the entire dataset once).

params = {
    "max_results": 80,             # this can go up to 477 at the time of writing
    "img_size":    ImgSize.Cached,
    "img_type":    ImgType.Photo,
    "img_layout":  ImgLayout.Square,
    #"img_color":   ImgColor.Purple
}

duckduckgo_search(root, "Angry", "angry expression of human face", **params)
duckduckgo_search(root, "Happy", "happy  expression of human face", **params)
duckduckgo_search(root, "Sad", "sad  expression of human face", **params)
duckduckgo_search(root, "Neutral", "neutral  expression of human face", **params)

Duckduckgo search: angry expression of human face
Downloading results into /content/images/Angry


Duckduckgo search: happy  expression of human face
Downloading results into /content/images/Happy


Duckduckgo search: sad  expression of human face
Downloading results into /content/images/Sad


Duckduckgo search: neutral  expression of human face
Downloading results into /content/images/Neutral


[PosixPath('/content/images/Neutral/001_e4e742ea.jpg'),
 PosixPath('/content/images/Neutral/002_abb93047.jpg'),
 PosixPath('/content/images/Neutral/003_e5105c63.jpg'),
 PosixPath('/content/images/Neutral/004_87d80635.jpg'),
 PosixPath('/content/images/Neutral/005_37a55323.jpg'),
 PosixPath('/content/images/Neutral/006_9113b4b4.jpg'),
 PosixPath('/content/images/Neutral/007_bd08ff72.jpg'),
 PosixPath('/content/images/Neutral/008_42c760e8.jpg'),
 PosixPath('/content/images/Neutral/009_d74a8f82.jpg'),
 PosixPath('/content/images/Neutral/010_d73ef529.jpg'),
 PosixPath('/content/images/Neutral/011_ee91d69f.jpg'),
 PosixPath('/content/images/Neutral/012_284908ba.jpg'),
 PosixPath('/content/images/Neutral/013_f667e152.jpg'),
 PosixPath('/content/images/Neutral/014_6f94238c.jpg'),
 PosixPath('/content/images/Neutral/015_ce4c4c33.jpg'),
 PosixPath('/content/images/Neutral/016_413457f2.jpg'),
 PosixPath('/content/images/Neutral/017_ffaf6427.jpg'),
 PosixPath('/content/images/Neutral/018_f5a1d53a
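
The `**params` syntax used above is plain Python keyword-argument unpacking. The sketch below illustrates it with a hypothetical stand-in function, `fake_search`, which only echoes its arguments and does not call the library. It also shows one way to override a single setting for one search without changing the shared dictionary.

```python
def fake_search(root, label, keywords, max_results=100, **options):
    """Hypothetical stand-in for a search function; it only echoes its arguments."""
    return {"label": label, "keywords": keywords, "max_results": max_results, **options}

# Plain strings are used as option values here to keep the sketch self-contained.
params = {"max_results": 80, "img_layout": "Square"}

# **params expands to max_results=80, img_layout="Square"
call = fake_search("images", "Angry", "angry expression of human face", **params)

# Build a merged copy to override one setting for a single search;
# the shared params dict is left untouched.
one_off = fake_search("images", "Sad", "sad expression of human face",
                      **{**params, "max_results": 20})
```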

## Deleting all images

In [None]:
if root.is_dir():  # rmtree can fail if the folder is already gone, so check first
    rmtree(root)
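
If you only want to redo one label rather than wipe the whole dataset, you can clear a single sub-folder instead. This is a sketch using the standard library's `shutil.rmtree` rather than the library's own `rmtree`; the helper name `reset_label_folder` is hypothetical.

```python
import shutil
from pathlib import Path

def reset_label_folder(root, label):
    """Remove root/label (if present) and recreate it empty; return the folder path.

    Hypothetical helper, not part of jmd_imagescraper.
    """
    folder = Path(root) / label
    shutil.rmtree(folder, ignore_errors=True)  # no error if the folder is missing
    folder.mkdir(parents=True, exist_ok=True)
    return folder

# Example: reset_label_folder(Path.cwd() / "images", "Neutral")
```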

## Displaying the image cleaner

Use this to get rid of unsuitable images without leaving your notebook.

In [None]:
from jmd_imagescraper.imagecleaner import *

display_image_cleaner(root)


## Create a zip to download or transfer to Google Drive

In [None]:
# create zip

ZIP_NAME = "images.zip" # maybe change this?

!rm -f {ZIP_NAME}
!zip -q -r {ZIP_NAME} {root}
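
The shell commands above rely on `zip` being installed, and because `root` is an absolute path the archive entries typically carry the full directory path. As an alternative, here is a pure-Python sketch using `shutil.make_archive` from the standard library. The helper name `zip_folder` is hypothetical; entries in the resulting archive start with the folder's own name (for example `images/Angry/...`).

```python
import shutil
from pathlib import Path

def zip_folder(folder, zip_name="images.zip"):
    """Zip `folder` into `zip_name`; archive entries start with the folder's own name.

    Hypothetical helper; an existing archive with the same name is overwritten.
    """
    folder = Path(folder).resolve()
    zip_path = Path(zip_name).resolve()
    base = zip_path.with_suffix("")  # make_archive appends ".zip" itself
    created = shutil.make_archive(
        str(base), "zip", root_dir=folder.parent, base_dir=folder.name
    )
    return Path(created)

# Example: zip_folder(Path.cwd() / "images", "images.zip")
```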

In [None]:
# download to your local system

from google.colab import files
files.download(ZIP_NAME)


In [None]:
# copy to google drive

from google.colab import drive
import shutil

DESTINATION_FOLDER = "Datasets" # where would you like this in Google Drive?

drive.mount("/content/drive")
folder = Path("/content/drive/My Drive")/DESTINATION_FOLDER
folder.mkdir(parents=True, exist_ok=True)

shutil.copyfile(ZIP_NAME, str(folder/ZIP_NAME))

## Create a CSV file of URLs

If you'd rather distribute a file of image URLs and labels, and have people download the images themselves, you can do so here.

In [None]:
CSV_NAME = "images.csv" # maybe change this?

!rm -f {CSV_NAME}

csv = Path.cwd()/CSV_NAME
save_urls_to_csv(csv, "Nice", "nice clowns", max_results=5)
save_urls_to_csv(csv, "Scary", "scary clowns", max_results=5)
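
To sanity-check the file before sharing it, you can count the rows per label with the standard `csv` module. This is only a sketch: it assumes the file has a header row, and the column name `Label` is an assumption, so check the first line of your own file and pass the right name. The helper `count_rows_per_label` is hypothetical.

```python
import csv as csv_module  # aliased because the cell above uses `csv` as a variable name
from collections import Counter

def count_rows_per_label(csv_path, label_col="Label"):
    """Count CSV rows per label. `label_col` is an assumed column name."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv_module.DictReader(f)
        if label_col not in (reader.fieldnames or []):
            raise KeyError(
                f"column {label_col!r} not found; header is {reader.fieldnames}"
            )
        return Counter(row[label_col] for row in reader)

# Example: count_rows_per_label("images.csv")
```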