
Image Crawler

Crawl images from Baidu, Bing, and Google (BBG)
for a given keyword using a web browser.

1. Install Dependencies

1.1 Create a virtual environment using conda.
Only tested with Python 3.6 and 3.7.

conda create --name crawler python=3.7
source activate crawler

1.2 Download the browser driver for Selenium.
Find and download the driver for the browser you use:
https://www.seleniumhq.org/download/

1.3 Add the folder containing the driver to the system PATH.
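
If the driver is set up correctly, Selenium can start the browser without an explicit driver path. Below is a minimal sanity check; it assumes Chrome with chromedriver on PATH (use the equivalent driver class for your browser, e.g. Firefox with geckodriver).

# Sanity check: Selenium should find the driver on PATH and open the browser.
from selenium import webdriver

driver = webdriver.Chrome()         # assumes chromedriver is on PATH
driver.get('https://www.bing.com')  # open any page to confirm it works
print(driver.title)
driver.quit()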

2. Get Code

Clone the repo and install the required packages.

git clone https://github.com/quqixun/ImageCrawler.git
cd ImageCrawler
pip install -r requires.txt

Run the test script to crawl my favourite pandas from BBG.

cd src
python panda.py

3. Explanation

  • ImageCrawler: obtains the HTTP links of the desired images;
  • ImageDownloader: downloads the images to a local directory.
import os

from image_crawler import ImageCrawler
from image_downloader import ImageDownloader

Four parameters are used by ImageCrawler and ImageDownloader:

  • keyword: the search keyword (you know, the keyword);
  • n_scroll: number of times to scroll in the browser;
  • link_save_dir: directory that holds all image links;
  • image_save_dir: directory for the downloaded images (where to find all the cute pandas).
n_scroll = 5
keyword = 'panda'

link_save_dir = os.path.join('../data/links', keyword)
image_save_dir = os.path.join('../data/images', keyword)

Crawl image links using Baidu

engine = 'baidu'
baidu_links_name = 'baidu_links.csv'

baidu_ic = ImageCrawler(engine)
baidu_ic.run(keyword, n_scroll)
baidu_ic.save_links(link_save_dir, baidu_links_name)

Crawl image links using Bing

engine = 'bing'
bing_links_name = 'bing_links.csv'

bing_ic = ImageCrawler(engine)
bing_ic.run(keyword, n_scroll)
bing_ic.save_links(link_save_dir, bing_links_name)

Crawl image links using Google

engine = 'google'
google_links_name = 'google_links.csv'

google_ic = ImageCrawler(engine)
google_ic.run(keyword, n_scroll)
google_ic.save_links(link_save_dir, google_links_name)

Download images

ider = ImageDownloader(link_save_dir)
ider.run(image_save_dir)
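
The three per-engine snippets above follow the same pattern, so they can also be collapsed into a single loop. A sketch using only the ImageCrawler and ImageDownloader calls shown above:

import os

from image_crawler import ImageCrawler
from image_downloader import ImageDownloader

keyword = 'panda'
n_scroll = 5
link_save_dir = os.path.join('../data/links', keyword)
image_save_dir = os.path.join('../data/images', keyword)

# Crawl links with each engine, then download everything in one pass.
for engine in ['baidu', 'bing', 'google']:
    ic = ImageCrawler(engine)
    ic.run(keyword, n_scroll)
    ic.save_links(link_save_dir, '{}_links.csv'.format(engine))

ider = ImageDownloader(link_save_dir)
ider.run(image_save_dir)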

4. Panda

Here is a panda we found.
